Workflow Concepts (MSNoise 2.x)
MSNoise 2.x introduced a fundamentally different architecture from 1.x. This page explains the key concepts — config sets, workflow steps, lineages, and jobs — that underpin everything else.
Config Sets
A config set is a named collection of parameters for one processing category.
Every category (cc, mwcs, filter, stack, …) can have one or more
config sets, numbered starting at 1.
You can think of a config set as “one specific parameterisation of one processing step”.
Creating and inspecting config sets
# List all config sets
msnoise config list_sets
# Create a second mwcs config set (inherits defaults)
msnoise config create_set mwcs
# Inspect and edit
msnoise config list mwcs # all mwcs sets
msnoise config list mwcs.2 # mwcs set 2 only
msnoise config set mwcs.2.freqmin 1.0
msnoise config set mwcs.2.freqmax 5.0
# Copy an existing set as a starting point
msnoise config copy_set mwcs 1 mwcs 2
Parameter notation uses dots: category.set_number.param_name.
For set 1 (the default), the set number can be omitted: mwcs.freqmin.
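The dotted notation can be illustrated with a small parser. This is only a sketch of the convention described above: `ParamKey` and `parse_param_key` are hypothetical names invented for illustration and are not part of MSNoise.

```python
from typing import NamedTuple


class ParamKey(NamedTuple):
    category: str
    set_number: int
    name: str


def parse_param_key(key: str) -> ParamKey:
    """Split 'category.set_number.param_name'; the set number defaults to 1."""
    parts = key.split(".")
    if len(parts) == 3:
        category, set_number, name = parts
        return ParamKey(category, int(set_number), name)
    if len(parts) == 2:
        # Short form: 'mwcs.freqmin' means set 1 of the mwcs category.
        category, name = parts
        return ParamKey(category, 1, name)
    raise ValueError(f"expected category[.set_number].param_name, got {key!r}")


print(parse_param_key("mwcs.2.freqmin"))
print(parse_param_key("mwcs.freqmin"))
```

Both the long form and the short form resolve to an explicit (category, set number, parameter) triple, so mwcs.freqmin and mwcs.1.freqmin name the same value.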
Practical example: two filter bands
Create a second filter config set for a higher frequency band:
msnoise config create_set filter
msnoise config set filter.2.freqmin 1.0
msnoise config set filter.2.freqmax 5.0
After running msnoise db upgrade (or project initialisation), MSNoise creates
a second CC branch through filter_2. Both filter branches share the same
preprocessed data and CC results, but produce independent stacked CCFs,
MWCS results, and dv/v curves — all written to separate folders under OUTPUT/.
Workflow Steps and Links
Every config set becomes a WorkflowStep in the database (table
WorkflowStep). Steps are then wired together by WorkflowLinks
(table WorkflowLink), which form the processing DAG.
The topology follows the canonical order:
global_1
├── preprocess_1 → cc_1 → filter_1 → stack_1 ─────────────────────────────┐
│                         filter_2 → stack_2    (siblings of refstack)    │
│                                 └─ refstack_1 → mwcs_1 → mwcs_dtt_1 → mwcs_dtt_dvv_1
│                                                 stretching_1 → stretching_dvv_1
│                                                 wavelet_1 → wavelet_dtt_1 → wavelet_dtt_dvv_1
└── psd_1 → psd_rms_1
stack_N and refstack_M are **siblings** (both children of filter).
mwcs jobs encode both parents: …/stack_N/refstack_M/mwcs_1
Steps are created automatically from config sets. The admin web UI
(msnoise admin) lets you view and edit the workflow graph.
Pass-through categories
Some categories (filter, global) are pass-through nodes: they have
no worker script and no jobs. They exist purely as parameter namespaces.
propagate_downstream recurses through them transparently.
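The idea of recursing through pass-through nodes can be sketched in a few lines. The graph below is a hand-written, hypothetical slice of a workflow, and `next_worker_steps` is an invented helper; it is not the actual propagate_downstream implementation, only an illustration of skipping nodes that have no jobs.

```python
# Hypothetical slice of a workflow graph (step -> children), for illustration only.
CHILDREN = {
    "cc_1": ["filter_1", "filter_2"],
    "filter_1": ["stack_1", "refstack_1"],
    "filter_2": ["stack_2"],
    "stack_1": [],
    "stack_2": [],
    "refstack_1": [],
}

# Categories that have no worker script and therefore no jobs.
PASS_THROUGH = {"filter", "global"}


def category_of(step_name: str) -> str:
    """'mwcs_dtt_1' -> 'mwcs_dtt' (strip the trailing set number)."""
    return step_name.rsplit("_", 1)[0]


def next_worker_steps(step_name: str) -> list[str]:
    """Return the nearest downstream steps that actually run jobs."""
    found = []
    for child in CHILDREN.get(step_name, []):
        if category_of(child) in PASS_THROUGH:
            # Skip the pass-through node and look at its children instead.
            found.extend(next_worker_steps(child))
        else:
            found.append(child)
    return found


print(next_worker_steps("cc_1"))  # ['stack_1', 'refstack_1', 'stack_2']
```

In this toy graph a finished cc_1 job would seed work directly for the stack and refstack steps, with the filter nodes contributing parameters but no jobs of their own.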
Lineages
A lineage encodes the full path from the root of the DAG to the current step.
It is stored as a /-separated string of step names:
preprocess_1/cc_1/filter_1/refstack_1 ← refstack (sibling of stack)
preprocess_1/cc_1/filter_1/stack_1/refstack_1/mwcs_1 ← mwcs encodes both parents
Every job carries a lineage ID that resolves to this string. The lineage serves two purposes:
Output path: the lineage string becomes the folder hierarchy under OUTPUT/. For example, MWCS results for the filter_1 branch land at:
OUTPUT/preprocess_1/cc_1/filter_1/stack_1/refstack_1/mwcs_1/_output/
while the same step on the filter_2 branch lands at:
OUTPUT/preprocess_1/cc_1/filter_2/stack_1/refstack_1/mwcs_1/_output/
This means two config set branches never overwrite each other.
Parameter resolution: when a worker loads its parameters, it reads the config for every step in its lineage, building a layered MSNoiseParams object. params.mwcs.freqmin therefore always refers to the correct config set for this particular branch.
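Layered parameter resolution can be pictured as one namespace per category, filled from whichever config set the lineage names. The sketch below assumes a hypothetical in-memory `CONFIG` dictionary and a hypothetical `resolve_params` helper; the real MSNoiseParams object is built from the database and may differ in detail.

```python
from types import SimpleNamespace

# Hypothetical config store: (category, set_number) -> parameters.
CONFIG = {
    ("filter", 1): {"freqmin": 0.1, "freqmax": 1.0},
    ("filter", 2): {"freqmin": 1.0, "freqmax": 5.0},
    ("mwcs", 1): {"mwcs_wlen": 10},
}


def resolve_params(lineage_str: str) -> SimpleNamespace:
    """Build one namespace per category from the steps named in a lineage."""
    params = SimpleNamespace()
    for step_name in lineage_str.split("/"):
        # 'filter_2' -> category 'filter', set number 2
        category, set_number = step_name.rsplit("_", 1)
        values = CONFIG.get((category, int(set_number)), {})
        setattr(params, category, SimpleNamespace(**values))
    return params


low = resolve_params("preprocess_1/cc_1/filter_1/stack_1/refstack_1/mwcs_1")
high = resolve_params("preprocess_1/cc_1/filter_2/stack_1/refstack_1/mwcs_1")
print(low.filter.freqmin, high.filter.freqmin)  # 0.1 1.0
```

The two lineages differ only in the filter step, so the same attribute access returns a different value on each branch while the shared mwcs settings stay identical.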
Reading lineages
from msnoise.plugins import connect, get_next_lineage_batch
db = connect()
batch = get_next_lineage_batch(db, "mwcs", group_by="pair_lineage")
print(batch["lineage_str"]) # e.g. "preprocess_1/.../mwcs_1"
print(batch["lineage_names"]) # ["preprocess_1", "cc_1", ..., "mwcs_1"]
print(batch["params"].mwcs.freqmin) # correct for this branch
Jobs
A job is one unit of work: (day, pair, step, lineage). Jobs are stored
in the Job table with a flag:
| Flag | Meaning |
|---|---|
| T | Todo: ready to be claimed by a worker |
| I | In progress: claimed atomically by one worker |
| D | Done: completed successfully |
| F | Failed: worker raised an exception |
Job lifecycle
msnoise new_jobs → creates T jobs for preprocess and psd
worker claims job → T → I (atomic, safe for parallel workers)
worker finishes → I → D
propagate_downstream → creates T jobs for the next step(s)
worker raises exception → I → F
msnoise reset cc_1 → resets I/F jobs back to T (step name)
msnoise reset cc → resets all cc_N steps (category name)
msnoise reset cc_1 --all → resets ALL cc_1 jobs to T (including D)
msnoise reset cc -a -d → resets cc_1 + all downstream steps
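The lifecycle above amounts to a small state machine over the four flags. The following sketch encodes only the transitions listed on this page; `advance` and `reset` are hypothetical helpers written for illustration and say nothing about how MSNoise stores or updates jobs internally.

```python
# Transitions described in the lifecycle: (current flag, event) -> new flag.
TRANSITIONS = {
    ("T", "claim"): "I",
    ("I", "finish"): "D",
    ("I", "fail"): "F",
}


def advance(flag: str, event: str) -> str:
    """Apply one lifecycle event, rejecting transitions the page does not list."""
    try:
        return TRANSITIONS[(flag, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {flag!r} on {event!r}") from None


def reset(flag: str, include_done: bool = False) -> str:
    """Mimic a reset: I/F go back to T; D only when include_done (like --all)."""
    if flag in ("I", "F") or (include_done and flag == "D"):
        return "T"
    return flag


flag = advance(advance("T", "claim"), "fail")  # T -> I -> F
print(flag, reset(flag), reset("D"), reset("D", include_done=True))  # F T D T
```

A plain reset leaves completed jobs untouched, which is why reprocessing finished work requires the --all variant.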
Run the full pipeline
After seeding jobs, run every step in dependency order with one command:
msnoise utils run_workflow # sequential, 1 worker
msnoise utils run_workflow -t 8 # 8 parallel workers per step
msnoise utils run_workflow --dry-run # preview without executing
msnoise utils run_workflow --export-script run.sh # write a shell script
See Run the full workflow automatically in the how-to guide for the full option reference.
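"Dependency order" here means a topological order of the workflow DAG. As a rough illustration, the standard-library graphlib module can compute such an order for a hypothetical single-branch workflow; the `PARENTS` mapping is written by hand for this example, and this is not a claim about how run_workflow is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical single-branch workflow: step -> set of parent steps.
PARENTS = {
    "preprocess_1": set(),
    "cc_1": {"preprocess_1"},
    "filter_1": {"cc_1"},
    "stack_1": {"filter_1"},
    "refstack_1": {"filter_1"},
    "mwcs_1": {"stack_1", "refstack_1"},  # mwcs has two parents
    "mwcs_dtt_1": {"mwcs_1"},
    "mwcs_dtt_dvv_1": {"mwcs_dtt_1"},
}

# static_order() yields every step after all of its parents.
order = list(TopologicalSorter(PARENTS).static_order())
print(order)
```

The relative position of the sibling steps stack_1 and refstack_1 is not fixed by the graph, but both are guaranteed to appear before mwcs_1.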
HPC mode
When global.hpc is Y, propagate_downstream is not called
inline by workers. Instead, the operator runs msnoise new_jobs --after X
manually between steps to trigger downstream job creation.
msnoise utils run_workflow --hpc handles this automatically.
Output Paths
All output files follow a predictable hierarchy:
OUTPUT / lineage_names_upstream / step_name / _output / mov_stack / component / pair.nc
where lineage_names_upstream is the lineage excluding the current step.
Examples:
OUTPUT/preprocess_1/cc_1/filter_1/_output/all/ZZ/YA.UV05.00_YA.UV06.00/2024-01-01.nc
OUTPUT/preprocess_1/cc_1/filter_1/stack_1/_output/1D_1D/ZZ/YA.UV05.00_YA.UV06.00.nc
OUTPUT/preprocess_1/cc_1/filter_1/refstack_1/_output/REF/ZZ/YA.UV05.00_YA.UV06.00.nc ← refstack folder (no stack_N prefix)
OUTPUT/preprocess_1/cc_1/filter_1/stack_1/refstack_1/mwcs_1/_output/1D_1D/ZZ/YA.UV05.00_YA.UV06.00.nc ← mwcs encodes both
OUTPUT/psd_1/_output/daily/YA.UV05.00.HHZ/2024-01-01.nc
OUTPUT/psd_1/psd_rms_1/_output/YA.UV05.00.HHZ.nc
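The hierarchy can be reproduced with pathlib. This sketch uses a hypothetical `result_path` helper and assumes that the mov_stack folder name is the two tuple elements joined with an underscore, which is inferred from the 1D_1D folder in the examples rather than stated by the page.

```python
from pathlib import PurePosixPath


def result_path(root, lineage_names, mov_stack, component, pair):
    """Compose root/<lineage>/_output/<mov_stack>/<component>/<pair>.nc."""
    mov_dir = "_".join(mov_stack)  # ("1D", "1D") -> "1D_1D" (assumed convention)
    # Append ".nc" by hand: station codes contain dots, so with_suffix() is unsafe.
    return PurePosixPath(root, *lineage_names, "_output", mov_dir, component, f"{pair}.nc")


path = result_path(
    "OUTPUT",
    ["preprocess_1", "cc_1", "filter_1", "stack_1"],
    ("1D", "1D"),
    "ZZ",
    "YA.UV05.00_YA.UV06.00",
)
print(path)
```

The printed path matches the stack_1 example listed above. In practice MSNoiseResult hides this layout, so a helper like this is mostly useful for understanding what is on disk.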
Reading Results: MSNoiseResult
Once the pipeline has run you do not need to know the on-disk path layout or
call low-level core.io functions. The recommended interface is
MSNoiseResult, a single object that:
knows which lineage branch it covers;
exposes only the get_* methods valid for that branch (invalid ones raise AttributeError and are hidden from tab-completion);
returns xarray Dataset or DataArray objects that you can slice, plot, or convert to pandas with one call.
from msnoise.results import MSNoiseResult
from msnoise.core.db import connect
db = connect()
# Stacked CCFs
r = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1, stack=1)
da = r.get_ccf("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ", ("1D", "1D"))
# Raw (pre-stack) CC outputs: the lineage stops at filter, no stack step needed
r_cc = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1)
da = r_cc.get_ccf_raw("BE.UCC..HHZ:BE.MEM..HHZ", "ZZ",
                      date="2023-01-01", kind="all")
# dv/v via MWCS
r_dvv = MSNoiseResult.from_ids(db, preprocess=1, cc=1, filter=1,
                               stack=1, refstack=1,
                               mwcs=1, mwcs_dtt=1, mwcs_dtt_dvv=1)
ds = r_dvv.get_dvv(pair_type="CC", components="ZZ", mov_stack=("1D", "1D"))
# PSDs
r_psd = MSNoiseResult.from_ids(db, psd=1, psd_rms=1)
ds = r_psd.get_psd("BE.UCC..HHZ", day="2023-01-01")
Every workflow step page links back to the full guide.
See also
Reading outputs with MSNoiseResult — full MSNoiseResult guide with all methods,
the kind="all"/"daily" CC output modes, branch navigation, and
dv/v export with provenance.
Common recipes
Two filter frequency bands
msnoise config create_set filter
msnoise config set filter.1.freqmin 0.1
msnoise config set filter.1.freqmax 1.0
msnoise config set filter.2.freqmin 1.0
msnoise config set filter.2.freqmax 5.0
msnoise db upgrade
Two MWCS window lengths
msnoise config create_set mwcs
msnoise config set mwcs.1.mwcs_wlen 10
msnoise config set mwcs.2.mwcs_wlen 20
msnoise db upgrade
Multiple moving stack windows
Moving stacks are configured within a single stack config set as a
tuple-of-tuples. No additional config set is needed:
msnoise config set stack.mov_stack "(('1D','1D'),('7D','1D'),('30D','1D'))"
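The quoted value is a Python literal, so it can be inspected with the standard library. The sketch below assumes each inner pair is (window length, step), which this page does not spell out, and it shows only how such a string decodes, not how MSNoise itself parses it.

```python
import ast

raw = "(('1D','1D'),('7D','1D'),('30D','1D'))"

# literal_eval only accepts Python literals, so it never executes code.
mov_stacks = ast.literal_eval(raw)

# Assumed meaning of each pair: (window length, step).
for window, step in mov_stacks:
    print(f"{window}_{step}")
```

This prints 1D_1D, 7D_1D and 30D_1D, one per line, which matches the style of the folder name seen in the output path examples.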
Checking the workflow graph
msnoise admin # opens the web UI at http://localhost:5000
# → Workflow → Steps / Links
or from Python:
from msnoise.plugins import connect, get_workflow_steps, get_workflow_chains
db = connect()
for step in get_workflow_steps(db):
    print(step.step_name, step.category, step.set_number)
print(get_workflow_chains()) # full topology (plugin-aware)