Reproducible Papers client (MRP)

MSNoise Reproducible Papers (MRP) client.

Provides programmatic access to the MSNoise Reproducible Papers registry — a curated collection of project.yaml files and optional data bundles that reproduce published studies using MSNoise.

Quick start:

from msnoise.papers import MRP

mrp = MRP()
mrp.list_papers()

paper = mrp.get_paper("2016_DePlaen_PitonDeLaFournaise")
paper.info()

# Downloads the archive on first call; cached locally afterwards.
project = paper.get_project("stack")
for result in project.list("stack"):
    ds = result.get_ccf()

The returned MSNoiseProject is identical to one obtained via from_archive() — all get_* methods work without a database connection.

Browsing available papers

MRP.list_papers() prints a table of all papers in the registry:

mrp = MRP()
mrp.list_papers()
# ID                                    Year  Net       Levels          ✓
# 2016_DePlaen_PitonDeLaFournaise       2016  PF......  stack, dvv      ✅

Loading a paper

paper = mrp.get_paper("2016_DePlaen_PitonDeLaFournaise")
paper.info()
# Paper:   2016_DePlaen_PitonDeLaFournaise
# journal_abbrev: GRL
# ...
# bundle_levels_available: ['stack', 'dvv']

Papers with multiple datasets (e.g. two volcanoes) expose multiple project files. Pass project= to disambiguate:

paper = mrp.get_paper("2023_Yates_PitonRuapehu")
project_pdf     = paper.get_project("dvv", project="pdf")
project_ruapehu = paper.get_project("dvv", project="ruapehu")

Cache management

Downloaded archives are stored in the platform user-cache directory (~/.cache/msnoise-mrp/ on Linux). To free space:

mrp.clear_cache("2016_DePlaen_PitonDeLaFournaise")  # one paper
mrp.clear_cache()                                    # all archives

Registry metadata and small paper files are never deleted by MRP.clear_cache(). To force a fresh registry download:

mrp = MRP(force_refresh=True)
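
Before clearing anything, it can help to see how much space the cached archives occupy. The following is a minimal sketch, not part of MSNoise: archive_usage is a hypothetical helper, and it assumes only what the text states, namely that archives are .tar.zst files somewhere under the cache directory. The layout inside the cache is not documented here, so the search is recursive.

```python
from pathlib import Path


def archive_usage(cache_dir):
    """Return {archive path: size in bytes} for every .tar.zst file under cache_dir.

    Hypothetical helper for illustration; it only reads the cache and
    never deletes anything.
    """
    cache_dir = Path(cache_dir).expanduser()
    if not cache_dir.is_dir():
        return {}
    return {
        path: path.stat().st_size
        for path in sorted(cache_dir.rglob("*.tar.zst"))
        if path.is_file()
    }


if __name__ == "__main__":
    # Default Linux location quoted above; pass your own cache_dir otherwise.
    usage = archive_usage("~/.cache/msnoise-mrp")
    for path, size in usage.items():
        print(f"{size / 1e6:10.1f} MB  {path}")
    print(f"total: {sum(usage.values()) / 1e6:.1f} MB")
```

Registry metadata and small paper files are skipped on purpose, since MRP.clear_cache() leaves them in place as well.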

Contributing a paper

See the CONTRIBUTING guide in the registry repository. In brief:

  1. Fork the repo, create papers/<YYYY_Author_Title>/

  2. Add project.yaml, citation.bib, meta.yaml, README.md

  3. Run python scripts/update_registry.py && python scripts/update_readme.py

  4. Open a PR — CI validates schemas and runs msnoise db init on every project*.yaml
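
A local pre-flight check can catch a missing file before CI does. The sketch below is an illustration only: check_paper_folder is a hypothetical helper, not one of the registry scripts, and the folder-name pattern is an assumed reading of <YYYY_Author_Title>. Because step 4 mentions project*.yaml, any file matching that pattern is accepted in place of project.yaml.

```python
import re
from pathlib import Path

# Files from step 2, apart from the project file(s), which are checked by pattern.
OTHER_REQUIRED = ("citation.bib", "meta.yaml", "README.md")

# Assumption: four digits, then at least two underscore-separated parts.
# The registry's real naming rule may be stricter or looser.
FOLDER_PATTERN = re.compile(r"^\d{4}_[^_]+_.+$")


def check_paper_folder(folder):
    """Return a list of problems found in a paper folder (empty list: none found)."""
    folder = Path(folder)
    problems = []
    if not FOLDER_PATTERN.match(folder.name):
        problems.append(
            f"folder name {folder.name!r} does not look like YYYY_Author_Title"
        )
    if not any(folder.glob("project*.yaml")):
        problems.append("no project*.yaml file found")
    for name in OTHER_REQUIRED:
        if not (folder / name).is_file():
            problems.append(f"missing {name}")
    return problems
```

This does not replace the schema validation or the msnoise db init run performed by CI; it only checks that the expected files are present.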

exception msnoise.papers.LevelNotAvailable

Raised when a requested bundle level is absent from bundle_pointer.yaml.

exception msnoise.papers.AmbiguousProject

Raised when a paper has multiple project files and no project= kwarg was supplied to MRPPaper.get_project().

class msnoise.papers.MRP(cache_dir: str | Path | None = None, force_refresh: bool = False)

Client for the MSNoise Reproducible Papers registry.

Parameters:
  • cache_dir – Local directory used to cache downloaded files. Defaults to the platform user-cache directory for "msnoise-mrp" (e.g. ~/.cache/msnoise-mrp on Linux).

  • force_refresh – If True, re-download registry.yaml even if a cached copy exists. This flag does not affect paper archives that are already cached; call clear_cache() first to force a fresh download of those.

list_papers() → None

Pretty-print a table of available papers.

get_paper(paper_id: str) → MRPPaper

Fetch a paper’s metadata and return an MRPPaper object.

Downloads project*.yaml, meta.yaml, and bundle_pointer.yaml (if present) from the registry into the local cache.

Parameters:

paper_id – Folder name in the registry, e.g. "2016_DePlaen_PitonDeLaFournaise".

Raises:

KeyError – if paper_id is not listed in the registry.

clear_cache(paper_id: str | None = None) → None

Delete downloaded archives from the local cache.

Only .tar.zst archives are deleted; the registry cache (registry.yaml) and metadata files are kept.

Parameters:

paper_id – Delete archives for this paper only. None (default) deletes all paper archives.

class msnoise.papers.MRPPaper(paper_id: str, cache_dir: Path, mrp: MRP)

Represents a single paper in the MRP registry.

Not constructed directly — obtain via MRP.get_paper().

property project_yaml: dict

The raw parsed project.yaml for the default project.

property projects: dict[str, str]

Map of project name → absolute path to project*.yaml in cache.

A paper with a single project.yaml has key "default". Papers with multiple datasets have keys like "pdf", "ruapehu", derived from project_<name>.yaml filenames.
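
The naming rule above can be sketched as a small function. This is an illustration of the mapping as described, not the library's own code; project_key is a hypothetical name.

```python
from pathlib import Path


def project_key(filename):
    """Map a project file name to its key in the projects dict.

    'project.yaml'        -> 'default'
    'project_<name>.yaml' -> '<name>'
    """
    stem = Path(filename).stem  # file name without directory or ".yaml"
    prefix = "project_"
    if stem == "project":
        return "default"
    if stem.startswith(prefix) and len(stem) > len(prefix):
        return stem[len(prefix):]
    raise ValueError(f"not a project file name: {filename!r}")


names = ["project_pdf.yaml", "project_ruapehu.yaml"]
keys = [project_key(name) for name in names]  # ['pdf', 'ruapehu']
```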

info() → None

Print metadata for this paper.

get_project(level: str | list[str], project: str = 'default') → msnoise.project.MSNoiseProject

Download archive(s) for level and return an MSNoiseProject.

Each archive is downloaded once and kept in the local cache, so subsequent calls return immediately and skip any level that has already been extracted. Call MRP.clear_cache() to force a fresh download.

Parameters:
  • level – Entry level(s) to download. Pass a single string (e.g. "stack"), a list (["stack", "dvv"]), or "all" to download every level in bundle_pointer.yaml. All archives are extracted into the same directory.

  • project – Project name for papers with multiple datasets. Omit it (or pass "default") for single-project papers.

Raises:
  • LevelNotAvailable – if a requested level is absent.

  • AmbiguousProject – if multiple projects exist and project was not specified.

  • FileNotFoundError – if no bundle_pointer.yaml exists.
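
The three accepted forms of level (a string, a list, or "all") and the LevelNotAvailable case can be sketched as follows. This is a guess at the behaviour described above, not the actual implementation: resolve_levels is a hypothetical helper, and the exception class is a local stand-in so the example runs without MSNoise installed.

```python
class LevelNotAvailable(Exception):
    """Local stand-in for msnoise.papers.LevelNotAvailable."""


def resolve_levels(level, available):
    """Expand the `level` argument into the list of levels to download.

    level:     a single level name, a list of names, or "all".
    available: level names listed in bundle_pointer.yaml.
    """
    if level == "all":
        return list(available)
    requested = [level] if isinstance(level, str) else list(level)
    missing = [name for name in requested if name not in available]
    if missing:
        raise LevelNotAvailable(
            f"not available: {missing}; available: {list(available)}"
        )
    return requested


available = ["stack", "dvv"]
single = resolve_levels("stack", available)           # ['stack']
several = resolve_levels(["stack", "dvv"], available)  # ['stack', 'dvv']
everything = resolve_levels("all", available)          # ['stack', 'dvv']
```

Checking every requested level before downloading anything means a typo in one name fails fast instead of after a large archive has already been fetched.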