Bulk Waveform Download (FDSN/EIDA)
When your project uses a remote FDSN or EIDA DataSource, there are
two ways to get waveforms into the pipeline:
Mode |
How it works |
When to use |
|---|---|---|
Bulk download |
|
Large date ranges, HPC clusters, reproducible paper datasets, slow or unreliable FDSN connections. |
On-the-fly (stream) |
The preprocess step fetches each day’s waveforms directly from FDSN during processing. No local archive needed. |
Short date ranges, fast reliable connections, interactive exploration. |
Bulk download workflow
# 1. Initialise the project (from a YAML or manually)
msnoise db init --from-yaml project.yaml
# 2. Populate station metadata (sets used_location_codes / used_channel_names)
msnoise utils import-stationxml https://...
# or: msnoise admin → Stations → mark used=True
# 3. Download all raw waveforms into the local SDS archive
msnoise utils download
# 4. Scan the archive to populate data availability
msnoise scan_archive --init
# 5. Run the full pipeline
msnoise utils run_workflow
The SDS write root is resolved automatically in this order:
--sds-path PATHoption (explicit override).The single unambiguous local SDS
DataSourceURI from the database../SDS(with a warning when no local SDS DataSource is found, or when multiple local SDS sources are configured).
StationXML files are stored under <sds_root>/../stationxml/ and are never
overwritten if already present. Traces are never discarded for missing
instrument response (sanitize=False).
# Override the SDS path or restrict the date range:
msnoise utils download --sds-path /data/SDS
msnoise utils download --startdate 2013-06-01 --enddate 2013-06-30
On-the-fly (stream) workflow
# 1. Initialise the project
msnoise db init --from-yaml project.yaml
# 2. Create preprocess jobs directly (no scan_archive needed)
msnoise utils create_preprocess_jobs --date_range 2013-04-01 2014-10-31
# 3. Run the pipeline — preprocess fetches from FDSN on demand
msnoise utils run_workflow
Prompt at db init
When msnoise db init --from-yaml detects a remote DataSource and the
project has explicit startdate/enddate values, it prompts:
DataSource(s) 'resif-fdsn' use remote FDSN/EIDA.
How would you like to proceed?
[1] Bulk download first — fetch all raw waveforms into SDS, then run pipeline
[2] Stream from FDSN — preprocess fetches on-the-fly (creates jobs now)
[3] Skip — I'll handle it manually