Reading outputs with MSNoiseResult

MSNoiseResult — user-facing class for loading computed results.

MSNoiseResult is the recommended entry point for reading any output produced by the MSNoise pipeline from a notebook or script. You do not need to know the on-disk path layout or the low-level core.io functions: just build a result object for your lineage and call the appropriate get_* method.

Two construction paths exist depending on whether a database is available:

  • With a database — from_ids() (integer config-set IDs per step) or from_names() (explicit step-name strings), plus list() to enumerate all completed results in a category.

  • Without a database — from_bundle(), which loads a portable bundle previously written by export_bundle().

For the full reference see msnoise.results.MSNoiseResult.

Dynamic method gating

Methods are dynamically gated: only methods whose required step category is present in the lineage are exposed. Accessing an unavailable method raises AttributeError with a clear message, and the method is absent from dir() and tab-completion, so Jupyter’s autocomplete only shows what is actually readable.
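To make the gating concrete, here is a minimal, illustrative sketch of the mechanism, not MSNoise's actual implementation: overriding __getattribute__ and __dir__ against a method-to-step map (the two-entry map below is a hypothetical subset):

```python
class GatedResult:
    """Illustrative sketch of dynamic method gating (not MSNoise's code)."""

    # Hypothetical subset of the method -> required-step-category map
    _REQUIRES = {"get_ccf": "stack", "get_ccf_raw": "cc"}

    def __init__(self, lineage_names):
        # lineage_names like ["preprocess_1", "cc_1", "filter_1"]
        self._categories = {n.rsplit("_", 1)[0] for n in lineage_names}

    def __getattribute__(self, name):
        req = GatedResult._REQUIRES.get(name)
        if req is not None:
            cats = object.__getattribute__(self, "_categories")
            if req not in cats:
                raise AttributeError(
                    f"'{name}' requires step category '{req}' in the lineage"
                )
        return object.__getattribute__(self, name)

    def __dir__(self):
        # Hide gated methods from dir() and tab-completion
        cats = object.__getattribute__(self, "_categories")
        return [n for n in super().__dir__()
                if GatedResult._REQUIRES.get(n) is None
                or GatedResult._REQUIRES[n] in cats]

    def get_ccf(self):
        return "stacked CCFs"

    def get_ccf_raw(self):
        return "raw CCFs"
```

A filter-level instance then exposes get_ccf_raw but hides get_ccf from both attribute access and dir(), mirroring the r_cc behaviour shown further down this page.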

Constructing a result object (DB-connected)

Use MSNoiseResult.from_ids() with the integer config-set numbers for each step you want to include. You only need to go as far down the pipeline as the data you want to read:

from msnoise.results import MSNoiseResult
from msnoise.core.db import connect

db = connect()

# ----- read stacked CCFs and everything downstream -----
r = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1,
                           stack=1, refstack=1)
da = r.get_ccf("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", ("1D", "1D"))

# ----- read raw (pre-stack) CC outputs only -----
# Initialise only to filter level — get_ccf_raw is still available.
r_cc = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1)
da = r_cc.get_ccf_raw("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ",
                       date="2023-01-01", kind="all")

# ----- read all the way to dv/v -----
r_dvv = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1,
                               stack=1, refstack=1,
                               mwcs=1, mwcs_dtt=1, mwcs_dtt_dvv=1)
ds = r_dvv.get_dvv(pair_type="CC", components="ZZ", mov_stack=("1D", "1D"))

# Only methods valid for this lineage are visible
r_cc.get_ccf(...)      # AttributeError — 'stack' not in lineage
r_cc.get_ccf_raw(...)  # works — 'cc' is in lineage

Available get_* methods

Method            Requires in lineage    Returns
get_ccf_raw       cc                     Raw per-window or daily-stacked CCFs written by compute_cc
get_ccf           stack                  Moving-stacked CCFs written by stack
get_ref           refstack               Reference stacks written by stack_refstack
get_mwcs          mwcs                   MWCS results written by compute_mwcs
get_mwcs_dtt      mwcs_dtt               MWCS dt/t results written by compute_mwcs_dtt
get_stretching    stretching             Stretching results written by compute_stretching
get_wct           wavelet                Wavelet coherence results written by compute_wavelet
get_wct_dtt       wavelet_dtt            WCT dt/t results written by compute_wavelet_dtt
get_dvv           any dvv step           Aggregated dv/v written by compute_<method>_dvv
get_psd           psd                    Daily PSD written by psd_compute
get_psd_rms       psd_rms                PSD RMS written by psd_compute_rms

Reading raw CC outputs (pre-stack)

get_ccf_raw reads the files written directly by s03_compute_no_rotation before any stacking. Two storage layouts exist depending on the config:

  • kind="all" — per-window CCFs under _output/all/ with dims (times, taxis) where times is the window start time.

  • kind="daily" — daily-stacked CCFs under _output/daily/ with dim (taxis,) per file.

The method only requires cc in the lineage, so you can call it on a filter-level result without needing stack or anything further:

r = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1)

# Single day — per-window DataArray with dims (times, taxis)
da = r.get_ccf_raw("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ",
                    date="2023-01-01", kind="all")

# All days for one pair, daily stacks — dict keyed by date string
d = r.get_ccf_raw("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", kind="daily")

# Concatenate all days into a single DataArray (times, taxis)
import xarray as xr
da_all = xr.concat(d.values(), dim="times").sortby("times")

Iterating over all completed results

list() returns every MSNoiseResult for which at least one Done job exists in the given category:

for r in MSNoiseResult.list(db, "mwcs_dtt"):
    ds = r.get_mwcs_dtt("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", ("1D", "1D"))

Exporting dv/v with provenance

export_dvv() writes dv/v NetCDF files that embed full provenance (lineage string, all config parameters, station metadata):

r = MSNoiseResult.list(db, "mwcs_dtt_dvv")[0]
written = r.export_dvv("exports/")
for f in written:
    print(f)  # dvv_CC_ZZ__pre1-cc1-f1-stk1-ref1-mwcs1-dtt1-dvv1__m1D-1D.nc

# Reload and inspect the embedded params
import xarray as xr
from msnoise.params import MSNoiseParams
ds = xr.open_dataset(written[0])
params = MSNoiseParams.from_yaml_string(ds.attrs["msnoise_params"])
print(params.mwcs.mwcs_wlen)
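The exported file name packs pair type, components, a lineage tag, and the mov_stack into fixed positions. When you only have the files on hand, a regex along these lines can recover the fields; the pattern is inferred from the example name above and is not an official API, so prefer the embedded NetCDF attributes when available:

```python
import re

# Pattern inferred from: dvv_CC_ZZ__pre1-cc1-f1-stk1-ref1-mwcs1-dtt1-dvv1__m1D-1D.nc
NAME_RE = re.compile(
    r"dvv_(?P<pair_type>[^_]+)_(?P<components>[^_]+)"
    r"__(?P<lineage_tag>.+?)__m(?P<mov_stack>.+)\.nc$"
)

name = "dvv_CC_ZZ__pre1-cc1-f1-stk1-ref1-mwcs1-dtt1-dvv1__m1D-1D.nc"
fields = NAME_RE.match(name).groupdict()
# fields["pair_type"] is "CC", fields["components"] is "ZZ"
mov_stack = tuple(fields["mov_stack"].split("-"))  # ("1D", "1D")
```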

Portable bundles — HPC to laptop, or journal supplementary data

export_bundle() packages a selectable slice of the computed _output/ tree into a self-describing directory (or .zip), together with params.yaml (full lineage config) and MANIFEST.json (sha256 per file, station list, original output path). The bundle can be read on any machine with MSNoise installed — no database required.
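The integrity side of this is plain content hashing. As a self-contained sketch under assumed key names (the real MANIFEST.json also records the station list, MSNoise version, and original output path):

```python
import hashlib
import os

def _sha256(path, chunk=1 << 20):
    # Stream the file so large .nc files are never fully loaded into memory
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def build_manifest(bundle_root):
    # Map bundle-relative path -> sha256, like the per-file digests described above
    files = {}
    for dirpath, _, names in os.walk(bundle_root):
        for n in sorted(names):
            p = os.path.join(dirpath, n)
            files[os.path.relpath(p, bundle_root)] = _sha256(p)
    return {"files": files}

def check_manifest(bundle_root, manifest):
    # Re-hash every listed file and compare, as verify() does
    return all(
        _sha256(os.path.join(bundle_root, rel)) == digest
        for rel, digest in manifest["files"].items()
    )
```

Flipping a single byte in any bundled file makes check_manifest return False, which is what makes such a manifest useful for published supplementary data.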

Selecting what to include with the from_step= argument:

from_step=         What is copied
"stack"            Moving-stacked CCFs + all downstream steps (large)
"refstack"         Reference stacks + all dv/v branches (moderate; default for CC workflows)
"mwcs_dtt_dvv"     Only the final MWCS dv/v aggregates (small; ideal for paper supplements)
"psd"              PSD daily files + RMS (moderate; default for PSD-only workflows)
"cc" / "filter"    Raw per-window CCFs + everything downstream (very large; opt-in)
None               Earliest step with _output in the lineage (auto-detected)

The full lineage directory nesting is preserved verbatim inside the bundle root, so from_bundle() can set output_folder = bundle_root and all xr_get_*() calls resolve paths identically — nothing in core/io.py needed to change.

Typical HPC → laptop workflow:

# ── On the HPC (has DB) ───────────────────────────────────────────────
from msnoise.results import MSNoiseResult
from msnoise.core.db import connect

db = connect()
r = MSNoiseResult.list(db, "mwcs_dtt_dvv")[0]

# Export reference stacks + all dv/v branches as a zip:
bundle = r.export_bundle(
    "belgium_2023/",
    from_step="refstack",
    compress=True,          # → belgium_2023.zip  (~50 MB typical network)
)

# Export only the final dv/v for a journal data-availability statement:
r.export_bundle("paper_SI/", from_step="mwcs_dtt_dvv", compress=True)

# ── rsync to local machine ────────────────────────────────────────────
# rsync -avz hpc:/scratch/user/msnoise/belgium_2023.zip ./

# ── On the laptop / at the reviewer's machine (no DB needed) ─────────
from msnoise.results import MSNoiseResult

r2 = MSNoiseResult.from_bundle("belgium_2023.zip")

# Optional integrity check (recommended for published data):
r2.verify()
# OK — 347 files verified.

# Full MSNoiseResult API works immediately:
ds  = r2.get_dvv("CC", "ZZ", ("1D", "1D"))
da  = r2.get_ccf("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", ("1D", "1D"))
ref = r2.get_ref("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ")

# Navigate branches (folder scan — no DB):
for branch in r2.branches():
    print(branch)
# MSNoiseResult(category='mwcs_dtt_dvv', lineage='...', available_methods=[...])
# MSNoiseResult(category='stretching_dvv', lineage='...', available_methods=[...])

# Inspect the exact parameters that produced the result:
print(r2.params.mwcs.mwcs_wlen)        # → 10.0
print(r2.params.cc.cc_sampling_rate)   # → 20.0

# params.yaml is also preserved in MANIFEST.json alongside the original
# HPC output_folder path, msnoise version, and generation timestamp —
# sufficient for a data-availability or reproducibility statement.
API reference

class msnoise.results.MSNoiseResult(db, lineage_names: list, _params=None)

A resolved pipeline branch with dynamically gated data-loading methods.

Only get_* methods whose required step category appears in lineage_names are accessible. Accessing a gated method raises AttributeError with a helpful message and the method does not appear in dir() or tab-completion.

Do not instantiate directly — use from_ids(), from_names(), or list().

Attributes:

lineage_names : list[str]
    Ordered step-name strings, e.g. ['preprocess_1', 'cc_1', 'filter_1', 'stack_1', 'refstack_1'].

category : str
    Category of the terminal step, e.g. 'refstack'.

params : MSNoiseParams
    Merged configuration parameters for this lineage.

output_folder : str
    Root output folder.

classmethod from_names(db, names: list) MSNoiseResult

Build from an explicit list of step-name strings.

classmethod from_ids(db, preprocess=None, cc=None, psd=None, psd_rms=None, filter=None, stack=None, refstack=None, mwcs=None, mwcs_dtt=None, mwcs_dtt_dvv=None, stretching=None, stretching_dvv=None, wavelet=None, wavelet_dtt=None, wavelet_dtt_dvv=None) MSNoiseResult

Build from integer configset IDs in canonical workflow order.

classmethod list(db, category: str, include_empty: bool = False) list

Return all done MSNoiseResult objects for a step category.

classmethod from_bundle(path: str) MSNoiseResult

Load a read-only MSNoiseResult from a bundle directory or .zip file produced by export_bundle().

No database connection is required. All get_* methods work immediately after construction. branches() uses a folder scan rather than the DB.

Parameters:

path – Path to a bundle directory or a .zip file.

Returns:

MSNoiseResult with _db=None.

Raises:

FileNotFoundError – if params.yaml is absent from the bundle.

Example — directory bundle:

r = MSNoiseResult.from_bundle("/data/msnoise_bundle/")
ds = r.get_dvv("CC", "ZZ", ("1D", "1D"))

Example — zip bundle:

r = MSNoiseResult.from_bundle("/data/bundle.zip")
da = r.get_ccf("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", ("1D", "1D"))
r.verify()

branches(include_empty: bool = False) list

Return downstream MSNoiseResult objects one step below this lineage.

In bundle / DB-free mode (_db is None) the DAG topology is taken from get_workflow_chains() and child directories are checked for the presence of an _output subdirectory to confirm they were actually computed.

verify(verbose: bool = False) bool

Verify bundle integrity against MANIFEST.json.

Re-hashes every file listed in the manifest and compares to the stored sha256 digest. Prints a summary line; returns True if all pass.

Parameters:

verbose – If True, print every file path as it is checked.

Returns:

True if all checksums match, False otherwise.

Raises:

FileNotFoundError – if MANIFEST.json is absent (not a bundle, or loaded via from_ids() / from_names()).

Example:

r = MSNoiseResult.from_bundle("/data/bundle/")
assert r.verify(), "Bundle integrity check failed"

get_ccf(pair=None, components=None, mov_stack=None)

Load CCFs from the stack step. Requires ‘stack’ in lineage.

Returns:

xarray.DataArray (single pair/comp/ms) or dict of DataArrays keyed by (pair, comp, mov_stack).

get_ccf_raw(pair=None, components=None, date=None, kind='all')

Load raw CC step outputs (per-window or daily-stacked CCFs).

Reads the files written by s03_compute_no_rotation before stacking. Requires only 'cc' in the lineage, so a result initialised down to filter_N (e.g. MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1)) exposes this method.

Path layout on disk:

<o> / preprocess_1 / cc_1 / filter_1 / _output / all|daily
             / <comp> / <sta1>_<sta2> / <YYYY-MM-DD>.nc
Parameters:

pair : str or None
    Station pair in "NET.STA.LOC:NET.STA.LOC" format. When None, all available pairs are returned.

components : str or None
    Component string, e.g. "ZZ". None = all.

date : str or None
    ISO date "YYYY-MM-DD". None = all available dates.

kind : {"all", "daily"}
    "all" — per-window CCFs (_output/all/), dims (times, taxis) per file. "daily" — daily-stacked CCFs (_output/daily/), dim (taxis,) per file, expanded with a times coordinate so results can be concatenated by the caller.

Returns:

xarray.DataArray
    When pair, components and date are all specified.

dict
    Otherwise: keys are (pair_key, comp, date_str) tuples, collapsed to 1-tuples for any dimension fixed by the caller.

Raises:
ValueError

If no cc_* or filter step is found in the lineage.

Example:

r = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1)

# Single file — per-window DataArray (times, taxis)
da = r.get_ccf_raw("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ",
                   date="2023-01-01", kind="all")

# All dates for one pair/comp — daily stacks
d = r.get_ccf_raw("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", kind="daily")
# d is {date_str: DataArray(times=[T], taxis=...)}

get_ref(pair=None, components=None)

Load reference stacks. Requires ‘refstack’ in lineage.

Returns:

xarray.DataArray (single pair/comp) or dict of DataArrays keyed by (pair, comp).

get_mwcs(pair=None, components=None, mov_stack=None)

Load MWCS results. Requires ‘mwcs’ in lineage.

Returns:

xarray.Dataset or dict of Datasets.

get_mwcs_dtt(pair=None, components=None, mov_stack=None)

Load MWCS-DTT results. Requires ‘mwcs_dtt’ in lineage.

Returns:

xarray.Dataset or dict of Datasets.

get_stretching(pair=None, components=None, mov_stack=None)

Load stretching results. Requires ‘stretching’ in lineage.

Returns:

xarray.Dataset or dict of Datasets.

get_dvv(pair_type='ALL', components=None, mov_stack=None)

Load DVV aggregate results. Requires a DVV step in lineage.

Returns:

xarray.Dataset or dict of Datasets keyed by (pair_type, comp, mov_stack).

get_dvv_pairs(pair_type='ALL', components=None, mov_stack=None)

Load per-pair dv/v time series. Requires a DVV step in lineage.

Returns a Dataset with dims (pair, times) and variables dvv and err, containing the standardised dv/v for every pair that was included in the aggregate — useful for per-pair QC, selection, and rejection.

Same signature as get_dvv(). Returns None for any combination where the pairs file was not yet produced (e.g. results computed before this feature was added).

Returns:

xarray.Dataset or dict of Datasets keyed by (pair_type, comp, mov_stack).

Example:

r = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1,
                           stack=1, refstack=1, mwcs=1,
                           mwcs_dtt=1, mwcs_dtt_dvv=1)
ds = r.get_dvv_pairs("CC", "ZZ", ("1D", "1D"))
# ds["dvv"].sel(pair="PF.FOR.00:PF.CSS.00").plot()
# mask bad pairs and re-aggregate:
good = ds["dvv"].where(ds["err"] < 0.01).mean("pair")

get_wct(pair=None, components=None, mov_stack=None)

Load WCT results. Requires ‘wavelet’ in lineage.

Returns:

xarray.Dataset or dict of Datasets.

get_wct_dtt(pair=None, components=None, mov_stack=None)

Load WCT dt/t results. Requires ‘wavelet_dtt’ in lineage.

Returns:

xarray.Dataset or dict of Datasets.

get_psd(seed_id=None, day=None)

Load PSD results. Requires ‘psd’ in lineage.

Returns:

xarray.Dataset or dict of Datasets keyed by (seed_id, day).

get_psd_rms(seed_id=None)

Load PSD RMS results. Requires ‘psd_rms’ in lineage.

Returns:

xarray.Dataset or dict of Datasets keyed by seed_id.

export_bundle(dest: str, from_step: str | None = None, compress: bool = False, overwrite: bool = False) str

Export a portable, self-describing bundle of computed results.

Copies the selected portion of the _output/ tree together with params.yaml and MANIFEST.json (sha256 per file) into dest. The result can be read on any machine with MSNoise installed using from_bundle() — no database connection required.

The full lineage directory nesting is preserved verbatim, so that from_bundle() can set output_folder = bundle_root and all existing xr_get_*() calls resolve paths identically.

Parameters:
dest:

Output directory path. Created if absent. If compress is True, a <dest>.zip file is written instead and the temporary directory is removed on success.

from_step:

Category name of the first step whose _output/ to include. Must be present in the lineage. All downstream steps are included automatically; ancestor directory names are created as empty path components to preserve the full nesting.

Accepted values (examples): "stack", "refstack", "mwcs_dtt_dvv", "psd". "cc" advances automatically to "filter" because raw CCFs are physically stored under the filter-step directory.

None (default) picks the earliest step in the lineage that has an _output directory ("stack" for CC workflows, "psd" for PSD-only workflows).

compress:

If True, zip the bundle directory after writing and return the .zip path. The directory is removed on success.

overwrite:

If True, overwrite an existing dest directory or .zip.

Returns:
str

Absolute path to the bundle directory or .zip file written.

Raises:
ValueError

If from_step is not found in the lineage.

FileExistsError

If dest (or <dest>.zip) already exists and overwrite is False.

FileNotFoundError

If the source directory for from_step does not exist on disk.

Examples

Export from refstack downwards, compressed:

db = connect()
r = MSNoiseResult.list(db, "mwcs_dtt_dvv")[0]
path = r.export_bundle(
    "belgium_2023/",
    from_step="refstack",
    compress=True,
)
# → belgium_2023.zip

Read on another machine:

r2 = MSNoiseResult.from_bundle("belgium_2023.zip")
r2.verify()
ds = r2.get_dvv("CC", "ZZ", ("1D", "1D"))

Journal supplementary data (dvv only):

r.export_bundle(
    "paper_SI/",
    from_step="mwcs_dtt_dvv",
    compress=True,
)

static to_dataframe(ds)

Convert an xarray Dataset or DataArray returned by any get_* method to a DataFrame.

This is the recommended escape hatch for users who need pandas for custom analysis, plotting, or export. All get_* methods return xarray objects; call this helper when you need a DataFrame:

r = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1,
                           stack=1, refstack=1, mwcs=1, mwcs_dtt=1)
ds = r.get_mwcs_dtt("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", ("1D", "1D"))
df = MSNoiseResult.to_dataframe(ds)

For Datasets with multiple variables the result is a DataFrame with a MultiIndex on the columns. For DataArrays the result is a flat DataFrame indexed by times.

Parameters:

ds : xarray.Dataset or xarray.DataArray.

Returns:

DataFrame.

export_dvv(path: str, pair_type: str = 'CC', components: str | None = None, mov_stack: tuple | None = None) list

Export dv/v time series bundled with full parameter provenance.

Writes one NetCDF file per (components, mov_stack) combination. Each file embeds the dv/v statistics (mean, std, median, n_pairs, and any weighted / trimmed variants) plus global attributes for full reproducibility:

lineage

Slash-separated step-name path.

msnoise_params

Full YAML dump of params — one block per config category. Load offline with yaml.safe_load(ds.attrs["msnoise_params"]).

msnoise_version

Package version string.

generated

ISO-8601 UTC timestamp.

pair_type, components, mov_stack

Provenance of the specific slice exported.

Parameters:
path:

Output directory (created if absent) or a full .nc path. When a directory is given, files are named dvv_<pair_type>_<comp>__<lineage_tag>__m<ms>.nc. A full path requires exactly one (components, mov_stack) to be specified.

pair_type:

Pair-type filter (default "CC").

components:

Component string, e.g. "ZZ". None exports all found.

mov_stack:

Tuple (window, step). None exports all found.

Returns:
list of str

Paths of files written.

Example:

db = connect()
r = MSNoiseResult.list(db, "mwcs_dtt_dvv")[0]
written = r.export_dvv("exports/", pair_type="CC")
for f in written:
    print(f)  # dvv_CC_ZZ__pre1-cc1-f1-stk1-ref1-mwcs1-dtt1-dvv1__m1D-1D.nc

# Reload and inspect provenance
import xarray as xr
import yaml
ds = xr.open_dataset(written[0])
params = yaml.safe_load(ds.attrs["msnoise_params"])
print(params["mwcs"]["mwcs_wlen"])