Reading outputs with MSNoiseResult

MSNoiseResult — user-facing class for loading computed results.

MSNoiseResult is the recommended entry point for reading any output produced by the MSNoise pipeline from a notebook or script. You do not need to know the on-disk path layout or the low-level core.io functions: just build a result object for your lineage and call the appropriate get_* method.

Two construction paths exist depending on whether a database is available:

  • With a database — from_ids() (integer config-set IDs per step) or from_names() (explicit step-name strings), plus list() to enumerate all completed results in a category.

  • Without a database — from_bundle(), which loads a portable bundle previously written by export_bundle().

For the full reference see msnoise.results.MSNoiseResult.

Dynamic method gating

Methods are dynamically gated: only methods whose required step category is present in the lineage are exposed. Accessing an unavailable method raises AttributeError with a clear message, and the method is absent from dir() and tab-completion, so Jupyter’s autocomplete only shows what is actually readable.
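To make the gating concrete, here is a minimal, illustrative sketch of the mechanism, not MSNoise's actual implementation: overriding __getattribute__ and __dir__ against a method-to-step map (the two-entry map below is a hypothetical subset):

```python
class GatedResult:
    """Illustrative sketch of dynamic method gating (not MSNoise's code)."""

    # Hypothetical subset of the method -> required-step-category map
    _REQUIRES = {"get_ccf": "stack", "get_ccf_raw": "cc"}

    def __init__(self, lineage_names):
        # lineage_names like ["preprocess_1", "cc_1", "filter_1"]
        self._categories = {n.rsplit("_", 1)[0] for n in lineage_names}

    def __getattribute__(self, name):
        req = GatedResult._REQUIRES.get(name)
        if req is not None:
            cats = object.__getattribute__(self, "_categories")
            if req not in cats:
                raise AttributeError(
                    f"'{name}' requires step category '{req}' in the lineage"
                )
        return object.__getattribute__(self, name)

    def __dir__(self):
        # Hide gated methods from dir() and tab-completion
        cats = object.__getattribute__(self, "_categories")
        return [n for n in super().__dir__()
                if GatedResult._REQUIRES.get(n) is None
                or GatedResult._REQUIRES[n] in cats]

    def get_ccf(self):
        return "stacked CCFs"

    def get_ccf_raw(self):
        return "raw CCFs"
```

A filter-level instance then exposes get_ccf_raw but hides get_ccf from both attribute access and dir(), mirroring the r_cc behaviour shown further down this page.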

Constructing a result object (DB-connected)

Use MSNoiseResult.from_ids() with the integer config-set numbers for each step you want to include. You only need to go as far down the pipeline as the data you want to read:

from msnoise.results import MSNoiseResult
from msnoise.core.db import connect

db = connect()

# ----- read stacked CCFs and everything downstream -----
r = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1,
                           stack=1, refstack=1)
da = r.get_ccf("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", ("1D", "1D"))

# ----- read raw (pre-stack) CC outputs only -----
# Initialise only to filter level — get_ccf_raw is still available.
r_cc = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1)
da = r_cc.get_ccf_raw("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ",
                       date="2023-01-01", kind="all")

# ----- read all the way to dv/v -----
r_dvv = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1,
                               stack=1, refstack=1,
                               mwcs=1, mwcs_dtt=1, mwcs_dtt_dvv=1)
ds = r_dvv.get_dvv(pair_type="CC", components="ZZ", mov_stack=("1D", "1D"))

# Only methods valid for this lineage are visible
r_cc.get_ccf(...)      # AttributeError — 'stack' not in lineage
r_cc.get_ccf_raw(...)  # works — 'cc' is in lineage

Available get_* methods

Method            Requires in lineage    Returns
get_ccf_raw       cc                     Raw per-window or daily-stacked CCFs written by compute_cc
get_ccf           stack                  Moving-stacked CCFs written by stack
get_ref           refstack               Reference stacks written by stack_refstack
get_mwcs          mwcs                   MWCS results written by compute_mwcs
get_mwcs_dtt      mwcs_dtt               MWCS dt/t results written by compute_mwcs_dtt
get_stretching    stretching             Stretching results written by compute_stretching
get_wct           wavelet                Wavelet coherence results written by compute_wavelet
get_wct_dtt       wavelet_dtt            WCT dt/t results written by compute_wavelet_dtt
get_dvv           any dvv step           Aggregated dv/v written by compute_<method>_dvv
get_psd           psd                    Daily PSD written by psd_compute
get_psd_rms       psd_rms                PSD RMS written by psd_compute_rms

Reading raw CC outputs (pre-stack)

get_ccf_raw reads the files written directly by s03_compute_no_rotation before any stacking. Two storage layouts exist depending on the config:

  • kind="all" — per-window CCFs under _output/all/ with dims (times, taxis) where times is the window start time.

  • kind="daily" — daily-stacked CCFs under _output/daily/ with dim (taxis,) per file.

The method only requires cc in the lineage, so you can call it on a filter-level result without needing stack or anything further:

r = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1)

# Single day — per-window DataArray with dims (times, taxis)
da = r.get_ccf_raw("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ",
                    date="2023-01-01", kind="all")

# All days for one pair, daily stacks — dict keyed by date string
d = r.get_ccf_raw("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", kind="daily")

# Concatenate all days into a single DataArray (times, taxis)
import xarray as xr
da_all = xr.concat(d.values(), dim="times").sortby("times")

Iterating over all completed results

list() returns every MSNoiseResult for which at least one Done job exists in the given category:

for r in MSNoiseResult.list(db, "mwcs_dtt"):
    ds = r.get_mwcs_dtt("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", ("1D", "1D"))

Exporting dv/v with provenance

export_dvv() writes dv/v NetCDF files that embed full provenance (lineage string, all config parameters, station metadata):

r = MSNoiseResult.list(db, "mwcs_dtt_dvv")[0]
written = r.export_dvv("exports/")
for f in written:
    print(f)  # dvv_CC_ZZ__pre1-cc1-f1-stk1-ref1-mwcs1-dtt1-dvv1__m1D-1D.nc

# Reload and inspect the embedded params
import xarray as xr
from msnoise.params import MSNoiseParams
ds = xr.open_dataset(written[0])
params = MSNoiseParams.from_yaml_string(ds.attrs["msnoise_params"])
print(params.mwcs.mwcs_wlen)
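The exported file name packs pair type, components, a lineage tag, and the mov_stack into fixed positions. When you only have the files on hand, a regex along these lines can recover the fields; the pattern is inferred from the example name above and is not an official API, so prefer the embedded NetCDF attributes when available:

```python
import re

# Pattern inferred from: dvv_CC_ZZ__pre1-cc1-f1-stk1-ref1-mwcs1-dtt1-dvv1__m1D-1D.nc
NAME_RE = re.compile(
    r"dvv_(?P<pair_type>[^_]+)_(?P<components>[^_]+)"
    r"__(?P<lineage_tag>.+?)__m(?P<mov_stack>.+)\.nc$"
)

name = "dvv_CC_ZZ__pre1-cc1-f1-stk1-ref1-mwcs1-dtt1-dvv1__m1D-1D.nc"
fields = NAME_RE.match(name).groupdict()
# fields["pair_type"] is "CC", fields["components"] is "ZZ"
mov_stack = tuple(fields["mov_stack"].split("-"))  # ("1D", "1D")
```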

Portable bundles — HPC to laptop, or journal supplementary data

export_bundle() packages a selectable slice of the computed _output/ tree into a self-describing directory (or .zip), together with params.yaml (full lineage config) and MANIFEST.json (sha256 per file, station list, original output path). The bundle can be read on any machine with MSNoise installed — no database required.
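The integrity side of this is plain content hashing. As a self-contained sketch under assumed key names (the real MANIFEST.json also records the station list, MSNoise version, and original output path):

```python
import hashlib
import os

def _sha256(path, chunk=1 << 20):
    # Stream the file so large .nc files are never fully loaded into memory
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def build_manifest(bundle_root):
    # Map bundle-relative path -> sha256, like the per-file digests described above
    files = {}
    for dirpath, _, names in os.walk(bundle_root):
        for n in sorted(names):
            p = os.path.join(dirpath, n)
            files[os.path.relpath(p, bundle_root)] = _sha256(p)
    return {"files": files}

def check_manifest(bundle_root, manifest):
    # Re-hash every listed file and compare, as verify() does
    return all(
        _sha256(os.path.join(bundle_root, rel)) == digest
        for rel, digest in manifest["files"].items()
    )
```

Flipping a single byte in any bundled file makes check_manifest return False, which is what makes such a manifest useful for published supplementary data.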

Selecting what to include with the from_step= argument:

from_step=         What is copied
"stack"            Moving-stacked CCFs + all downstream steps (large)
"refstack"         Reference stacks + all dv/v branches (moderate; default for CC workflows)
"mwcs_dtt_dvv"     Only the final MWCS dv/v aggregates (small; ideal for paper supplements)
"psd"              PSD daily files + RMS (moderate; default for PSD-only workflows)
"cc" / "filter"    Raw per-window CCFs + everything downstream (very large; opt-in)
None               Earliest step with _output in the lineage (auto-detected)

The full lineage directory nesting is preserved verbatim inside the bundle root, so from_bundle() can set output_folder = bundle_root and all xr_get_*() calls resolve paths identically — nothing in core/io.py needed to change.

Typical HPC → laptop workflow:

# ── On the HPC (has DB) ───────────────────────────────────────────────
from msnoise.results import MSNoiseResult
from msnoise.core.db import connect

db = connect()
r = MSNoiseResult.list(db, "mwcs_dtt_dvv")[0]

# Export reference stacks + all dv/v branches as a zip:
bundle = r.export_bundle(
    "belgium_2023/",
    from_step="refstack",
    compress=True,          # → belgium_2023.zip  (~50 MB typical network)
)

# Export only the final dv/v for a journal data-availability statement:
r.export_bundle("paper_SI/", from_step="mwcs_dtt_dvv", compress=True)

# ── rsync to local machine ────────────────────────────────────────────
# rsync -avz hpc:/scratch/user/msnoise/belgium_2023.zip ./

# ── On the laptop / at the reviewer's machine (no DB needed) ─────────
from msnoise.results import MSNoiseResult

r2 = MSNoiseResult.from_bundle("belgium_2023.zip")

# Optional integrity check (recommended for published data):
r2.verify()
# OK — 347 files verified.

# Full MSNoiseResult API works immediately:
ds  = r2.get_dvv("CC", "ZZ", ("1D", "1D"))
da  = r2.get_ccf("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", ("1D", "1D"))
ref = r2.get_ref("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ")

# Navigate branches (folder scan — no DB):
for branch in r2.branches():
    print(branch)
# MSNoiseResult(category='mwcs_dtt_dvv', lineage='...', available_methods=[...])
# MSNoiseResult(category='stretching_dvv', lineage='...', available_methods=[...])

# Inspect the exact parameters that produced the result:
print(r2.params.mwcs.mwcs_wlen)        # → 10.0
print(r2.params.cc.cc_sampling_rate)   # → 20.0

# params.yaml is also preserved in MANIFEST.json alongside the original
# HPC output_folder path, msnoise version, and generation timestamp —
# sufficient for a data-availability or reproducibility statement.
API reference

class msnoise.results.MSNoiseResult(db, lineage_names: list, _params=None)

A resolved pipeline branch with dynamically gated data-loading methods.

Only get_* methods whose required step category appears in lineage_names are accessible. Accessing a gated method raises AttributeError with a helpful message and the method does not appear in dir() or tab-completion.

Do not instantiate directly — use from_ids(), from_names(), or list().

Attributes:

lineage_names : list[str]
    Ordered step-name strings, e.g. ['preprocess_1', 'cc_1', 'filter_1', 'stack_1', 'refstack_1'].

category : str
    Category of the terminal step, e.g. 'refstack'.

params : MSNoiseParams
    Merged configuration parameters for this lineage.

output_folder : str
    Root output folder.

classmethod from_names(db, names: list) MSNoiseResult

Build from an explicit list of step-name strings.

classmethod from_ids(db, preprocess=None, cc=None, psd=None, psd_rms=None, filter=None, stack=None, refstack=None, mwcs=None, mwcs_dtt=None, mwcs_dtt_dvv=None, stretching=None, stretching_dvv=None, wavelet=None, wavelet_dtt=None, wavelet_dtt_dvv=None) MSNoiseResult

Build from integer configset IDs in canonical workflow order.

classmethod list(db, category: str, include_empty: bool = False) list

Return all done MSNoiseResult objects for a step category.

classmethod from_bundle(path: str) MSNoiseResult

Load a read-only MSNoiseResult from a bundle directory or .zip file produced by export_bundle().

No database connection is required. All get_* methods work immediately after construction. branches() uses a folder scan rather than the DB.

Parameters:

path – Path to a bundle directory or a .zip file.

Returns:

MSNoiseResult with _db=None.

Raises:

FileNotFoundError – if params.yaml is absent from the bundle.

Example — directory bundle:

r = MSNoiseResult.from_bundle("/data/msnoise_bundle/")
ds = r.get_dvv("CC", "ZZ", ("1D", "1D"))

Example — zip bundle:

r = MSNoiseResult.from_bundle("/data/bundle.zip")
da = r.get_ccf("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", ("1D", "1D"))
r.verify()

branches(include_empty: bool = False) list

Return downstream MSNoiseResult objects one step below this lineage.

In bundle / DB-free mode (_db is None) the DAG topology is taken from get_workflow_chains() and child directories are checked for the presence of an _output subdirectory to confirm they were actually computed.

verify(verbose: bool = False) bool

Verify bundle integrity against MANIFEST.json.

Re-hashes every file listed in the manifest and compares to the stored sha256 digest. Prints a summary line; returns True if all pass.

Parameters:

verbose – If True, print every file path as it is checked.

Returns:

True if all checksums match, False otherwise.

Raises:

FileNotFoundError – if MANIFEST.json is absent (not a bundle, or loaded via from_ids() / from_names()).

Example:

r = MSNoiseResult.from_bundle("/data/bundle/")
assert r.verify(), "Bundle integrity check failed"

get_ccf(pair=None, components=None, mov_stack=None)

Load CCFs from the stack step. Requires ‘stack’ in lineage.

Returns:

xarray.DataArray (single pair/comp/ms) or dict of DataArrays keyed by (pair, comp, mov_stack).

get_ccf_raw(pair=None, components=None, date=None, kind='all')

Load raw CC step outputs (per-window or daily-stacked CCFs).

Reads the files written by s03_compute_no_rotation before stacking. Requires only 'cc' in the lineage, so a result initialised down to filter_N (e.g. MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1)) exposes this method.

Path layout on disk:

<o> / preprocess_1 / cc_1 / filter_1 / _output / all|daily
             / <comp> / <sta1>_<sta2> / <YYYY-MM-DD>.nc
Parameters:

pair : str or None
    Station pair in "NET.STA.LOC:NET.STA.LOC" format. When None, all available pairs are returned.

components : str or None
    Component string, e.g. "ZZ". None = all.

date : str or None
    ISO date "YYYY-MM-DD". None = all available dates.

kind : {"all", "daily"}
    "all" — per-window CCFs (_output/all/), dims (times, taxis) per file. "daily" — daily-stacked CCFs (_output/daily/), dim (taxis,) per file, expanded with a times coordinate so results can be concatenated by the caller.

Returns:

xarray.DataArray
    When pair, components and date are all specified.

dict
    Otherwise: keys are (pair_key, comp, date_str) tuples, collapsed to 1-tuples for any dimension fixed by the caller.

Raises:
ValueError

If no cc_* or filter step is found in the lineage.

Example:

r = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1)

# Single file — per-window DataArray (times, taxis)
da = r.get_ccf_raw("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ",
                   date="2023-01-01", kind="all")

# All dates for one pair/comp — daily stacks
d = r.get_ccf_raw("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", kind="daily")
# d is {date_str: DataArray(times=[T], taxis=...)}

get_ref(pair=None, components=None)

Load reference stacks. Requires ‘refstack’ in lineage.

Returns:

xarray.DataArray (single pair/comp) or dict of DataArrays keyed by (pair, comp).

get_mwcs(pair=None, components=None, mov_stack=None)

Load MWCS results. Requires ‘mwcs’ in lineage.

Returns:

xarray.Dataset or dict of Datasets.

get_mwcs_dtt(pair=None, components=None, mov_stack=None)

Load MWCS-DTT results. Requires ‘mwcs_dtt’ in lineage.

Returns:

xarray.Dataset or dict of Datasets.

get_stretching(pair=None, components=None, mov_stack=None)

Load stretching results. Requires ‘stretching’ in lineage.

Returns:

xarray.Dataset or dict of Datasets.

get_dvv(pair_type='ALL', components=None, mov_stack=None)

Load DVV aggregate results. Requires a DVV step in lineage.

Returns:

xarray.Dataset or dict of Datasets keyed by (pair_type, comp, mov_stack).

get_dvv_pairs(pair_type='ALL', components=None, mov_stack=None)

Load per-pair dv/v time series. Requires a DVV step in lineage.

Returns a Dataset with dims (pair, times) and variables dvv and err, containing the standardised dv/v for every pair that was included in the aggregate — useful for per-pair QC, selection, and rejection.

Same signature as get_dvv(). Returns None for any combination where the pairs file was not yet produced (e.g. results computed before this feature was added).

Returns:

xarray.Dataset or dict of Datasets keyed by (pair_type, comp, mov_stack).

Example:

r = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1,
                           stack=1, refstack=1, mwcs=1,
                           mwcs_dtt=1, mwcs_dtt_dvv=1)
ds = r.get_dvv_pairs("CC", "ZZ", ("1D", "1D"))
# ds["dvv"].sel(pair="PF.FOR.00:PF.CSS.00").plot()
# mask bad pairs and re-aggregate:
good = ds["dvv"].where(ds["err"] < 0.01).mean("pair")

get_wct(pair=None, components=None, mov_stack=None)

Load WCT results. Requires ‘wavelet’ in lineage.

Returns:

xarray.Dataset or dict of Datasets.

get_wct_dtt(pair=None, components=None, mov_stack=None)

Load WCT dt/t results. Requires ‘wavelet_dtt’ in lineage.

Returns:

xarray.Dataset or dict of Datasets.

get_psd(seed_id=None, day=None)

Load PSD results. Requires ‘psd’ in lineage.

Returns:

xarray.Dataset or dict of Datasets keyed by (seed_id, day).

get_psd_rms(seed_id=None)

Load PSD RMS results. Requires ‘psd_rms’ in lineage.

Returns:

xarray.Dataset or dict of Datasets keyed by seed_id.

export_bundle(dest: str, from_step: str | None = None, compress: bool = False, overwrite: bool = False) str

Export a portable, self-describing bundle of computed results.

Copies the selected portion of the _output/ tree together with params.yaml and MANIFEST.json (sha256 per file) into dest. The result can be read on any machine with MSNoise installed using from_bundle() — no database connection required.

The full lineage directory nesting is preserved verbatim, so that from_bundle() can set output_folder = bundle_root and all existing xr_get_*() calls resolve paths identically.

Parameters:
dest:

Output directory path. Created if absent. If compress is True, a <dest>.zip file is written instead and the temporary directory is removed on success.

from_step:

Category name of the first step whose _output/ to include. Must be present in the lineage. All downstream steps are included automatically; ancestor directory names are created as empty path components to preserve the full nesting.

Accepted values (examples): "stack", "refstack", "mwcs_dtt_dvv", "psd". "cc" advances automatically to "filter" because raw CCFs are physically stored under the filter-step directory.

None (default) picks the earliest step in the lineage that has an _output directory ("stack" for CC workflows, "psd" for PSD-only workflows).

compress:

If True, zip the bundle directory after writing and return the .zip path. The directory is removed on success.

overwrite:

If True, overwrite an existing dest directory or .zip.

Returns:
str

Absolute path to the bundle directory or .zip file written.

Raises:
ValueError

If from_step is not found in the lineage.

FileExistsError

If dest (or <dest>.zip) already exists and overwrite is False.

FileNotFoundError

If the source directory for from_step does not exist on disk.

Examples

Export from refstack downwards, compressed:

db = connect()
r = MSNoiseResult.list(db, "mwcs_dtt_dvv")[0]
path = r.export_bundle(
    "belgium_2023/",
    from_step="refstack",
    compress=True,
)
# → belgium_2023.zip

Read on another machine:

r2 = MSNoiseResult.from_bundle("belgium_2023.zip")
r2.verify()
ds = r2.get_dvv("CC", "ZZ", ("1D", "1D"))

Journal supplementary data (dvv only):

r.export_bundle(
    "paper_SI/",
    from_step="mwcs_dtt_dvv",
    compress=True,
)

static to_dataframe(ds)

Convert an xarray Dataset or DataArray returned by any get_* method to a DataFrame.

This is the recommended escape hatch for users who need pandas for custom analysis, plotting, or export. All get_* methods return xarray objects; call this helper when you need a DataFrame:

r = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1,
                           stack=1, refstack=1, mwcs=1, mwcs_dtt=1)
ds = r.get_mwcs_dtt("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", ("1D", "1D"))
df = MSNoiseResult.to_dataframe(ds)

For Datasets with multiple variables the result is a DataFrame with a MultiIndex on the columns. For DataArrays the result is a flat DataFrame indexed by times.

Parameters:

ds : xarray.Dataset or xarray.DataArray.

Returns:

DataFrame.

export_dvv(path: str, pair_type: str = 'CC', components: str | None = None, mov_stack: tuple | None = None) list

Export dv/v time series bundled with full parameter provenance.

Writes one NetCDF file per (components, mov_stack) combination. Each file embeds the dv/v statistics (mean, std, median, n_pairs, and any weighted / trimmed variants) plus global attributes for full reproducibility:

lineage

Slash-separated step-name path.

msnoise_params

Full YAML dump of params — one block per config category. Load offline with yaml.safe_load(ds.attrs["msnoise_params"]).

msnoise_version

Package version string.

generated

ISO-8601 UTC timestamp.

pair_type, components, mov_stack

Provenance of the specific slice exported.

Parameters:
path:

Output directory (created if absent) or a full .nc path. When a directory is given, files are named dvv_<pair_type>_<comp>__<lineage_tag>__m<ms>.nc. A full path requires exactly one (components, mov_stack) to be specified.

pair_type:

Pair-type filter (default "CC").

components:

Component string, e.g. "ZZ". None exports all found.

mov_stack:

Tuple (window, step). None exports all found.

Returns:
list of str

Paths of files written.

Example:

db = connect()
r = MSNoiseResult.list(db, "mwcs_dtt_dvv")[0]
written = r.export_dvv("exports/", pair_type="CC")
for f in written:
    print(f)  # dvv_CC_ZZ__pre1-cc1-f1-stk1-ref1-mwcs1-dtt1-dvv1__m1D-1D.nc

# Reload and inspect provenance
import xarray as xr
import yaml
ds = xr.open_dataset(written[0])
params = yaml.safe_load(ds.attrs["msnoise_params"])
print(params["mwcs"]["mwcs_wlen"])