Reproducible Papers client (MRP)
MSNoise Reproducible Papers (MRP) client.
Provides programmatic access to the MSNoise Reproducible Papers registry — a
curated collection of project.yaml files and optional data bundles that
reproduce published studies using MSNoise.
Quick start:
from msnoise.papers import MRP
mrp = MRP()
mrp.list_papers()
paper = mrp.get_paper("2016_DePlaen_PitonDeLaFournaise")
paper.info()
# Downloads the archive on first call; cached locally afterwards.
project = paper.get_project("stack")
for result in project.list("stack"):
    ds = result.get_ccf()
The returned MSNoiseProject is identical to one obtained via from_archive():
all get_* methods work without a database connection.
Browsing available papers
MRP.list_papers() prints a table of all papers in the registry:
mrp = MRP()
mrp.list_papers()
# ID Year Net Levels ✓
# 2016_DePlaen_PitonDeLaFournaise 2016 PF...... stack, dvv ✅
Loading a paper
paper = mrp.get_paper("2016_DePlaen_PitonDeLaFournaise")
paper.info()
# Paper: 2016_DePlaen_PitonDeLaFournaise
# journal_abbrev: GRL
# ...
# bundle_levels_available: ['stack', 'dvv']
Papers with multiple datasets (e.g. two volcanoes) expose multiple project
files. Pass project= to disambiguate:
paper = mrp.get_paper("2023_Yates_PitonRuapehu")
project_pdf = paper.get_project("dvv", project="pdf")
project_ruapehu = paper.get_project("dvv", project="ruapehu")
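The disambiguation rule can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's code: select_project is a hypothetical helper, the local AmbiguousProject class merely stands in for msnoise.papers.AmbiguousProject, and the cache paths are made up.

```python
class AmbiguousProject(Exception):
    """Local stand-in for msnoise.papers.AmbiguousProject (illustration only)."""


def select_project(projects, project="default"):
    """Return the project file path for `project` (hypothetical helper).

    `projects` mimics the documented mapping of project name -> path.
    """
    if project in projects:
        return projects[project]
    if project == "default" and len(projects) > 1:
        # Several project files and no explicit choice: refuse to guess.
        raise AmbiguousProject(
            "multiple projects available, pass project= one of: "
            + ", ".join(sorted(projects))
        )
    raise KeyError(project)


# Made-up paths, shaped like the documented `projects` mapping.
single = {"default": "/cache/paper_a/project.yaml"}
multi = {
    "pdf": "/cache/paper_b/project_pdf.yaml",
    "ruapehu": "/cache/paper_b/project_ruapehu.yaml",
}

chosen_single = select_project(single)
chosen_multi = select_project(multi, project="ruapehu")
```

With a single project file the default lookup succeeds; with two files, omitting project= raises the stand-in exception, which mirrors the documented behaviour of get_project().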
Cache management
Downloaded archives are stored in the platform user-cache directory
(~/.cache/msnoise-mrp/ on Linux). To free space:
mrp.clear_cache("2016_DePlaen_PitonDeLaFournaise") # one paper
mrp.clear_cache() # all archives
Registry metadata and small paper files are never deleted by
MRP.clear_cache(). To force a fresh registry download:
mrp = MRP(force_refresh=True)
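Before clearing anything it can help to see what occupies the cache. The sketch below uses only the standard library and makes no MSNoise calls; cache_usage is a hypothetical helper, and the layout it is demonstrated on (one sub-folder per paper ID plus a top-level registry.yaml, with a made-up archive file name) is an assumption rather than the documented cache structure.

```python
import tempfile
from pathlib import Path


def cache_usage(cache_dir):
    """Return {top-level entry name: total size in bytes} for a directory."""
    usage = {}
    for entry in sorted(Path(cache_dir).iterdir()):
        if entry.is_dir():
            # Sum every regular file below this sub-folder.
            size = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
        else:
            size = entry.stat().st_size
        usage[entry.name] = size
    return usage


# Demo on a throwaway directory so no real cache is touched.
with tempfile.TemporaryDirectory() as tmp:
    paper_dir = Path(tmp) / "2016_DePlaen_PitonDeLaFournaise"
    paper_dir.mkdir()
    (paper_dir / "meta.yaml").write_bytes(b"x" * 10)
    (paper_dir / "stack_archive.bin").write_bytes(b"x" * 1000)  # made-up name
    (Path(tmp) / "registry.yaml").write_bytes(b"x" * 5)
    demo_usage = cache_usage(tmp)

print(demo_usage)
```

To inspect a real cache, pass the directory given as cache_dir to MRP(), or the platform default mentioned above.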
Contributing a paper
See the CONTRIBUTING guide in the registry repository. In brief:
1. Fork the repo, create papers/<YYYY_Author_Title>/
2. Add project.yaml, citation.bib, meta.yaml, README.md
3. Run python scripts/update_registry.py && python scripts/update_readme.py
4. Open a PR. CI validates schemas and runs msnoise db init on every project*.yaml
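A quick local check before opening the PR could look like the sketch below. It is not part of the registry tooling: check_paper_folder is a hypothetical helper, and the folder-name pattern is only one assumed reading of <YYYY_Author_Title>; the schema validation run by CI remains authoritative.

```python
import re
import tempfile
from pathlib import Path

# Files named in the steps above; project files are matched separately.
REQUIRED = ("citation.bib", "meta.yaml", "README.md")
# Assumed reading of <YYYY_Author_Title>; the registry's real rule may differ.
FOLDER_RE = re.compile(r"^\d{4}_[^_]+_[^_]+$")


def check_paper_folder(folder):
    """Return a list of problems found in a paper folder (empty means OK)."""
    folder = Path(folder)
    problems = []
    if not FOLDER_RE.match(folder.name):
        problems.append(f"folder name {folder.name!r} is not YYYY_Author_Title")
    for name in REQUIRED:
        if not (folder / name).is_file():
            problems.append(f"missing {name}")
    # Accept project.yaml as well as project_<name>.yaml files.
    if not list(folder.glob("project*.yaml")):
        problems.append("missing project*.yaml")
    return problems


# Demo on a throwaway folder that lacks two of the required files.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "2016_DePlaen_PitonDeLaFournaise"
    demo.mkdir()
    (demo / "project.yaml").write_text("# placeholder\n")
    (demo / "meta.yaml").write_text("# placeholder\n")
    demo_problems = check_paper_folder(demo)

print(demo_problems)
```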
- exception msnoise.papers.LevelNotAvailable
Raised when a requested bundle level is absent from
bundle_pointer.yaml.
- exception msnoise.papers.AmbiguousProject
Raised when a paper has multiple project files and no
project= kwarg was supplied to MRPPaper.get_project().
- class msnoise.papers.MRP(cache_dir: str | Path | None = None, force_refresh: bool = False)
Client for the MSNoise Reproducible Papers registry.
- Parameters:
cache_dir – Local directory used to cache downloaded files. Defaults to the platform user-cache directory for "msnoise-mrp" (e.g. ~/.cache/msnoise-mrp on Linux).
force_refresh – If True, re-download registry.yaml even if a cached copy exists. Downloaded paper archives are never re-downloaded; use clear_cache() to force a fresh download.
- get_paper(paper_id: str) → MRPPaper
Fetch a paper’s metadata and return an MRPPaper object.
Downloads project*.yaml, meta.yaml, and bundle_pointer.yaml (if present) from the registry into the local cache.
- Parameters:
paper_id – Folder name in the registry, e.g. "2016_DePlaen_PitonDeLaFournaise".
- Raises:
KeyError – if paper_id is not listed in the registry.
- class msnoise.papers.MRPPaper(paper_id: str, cache_dir: Path, mrp: MRP)
Represents a single paper in the MRP registry.
Not constructed directly; obtain via MRP.get_paper().
- property projects: dict[str, str]
Map of project name → absolute path to project*.yaml in cache.
A paper with a single project.yaml has key "default". Papers with multiple datasets have keys like "pdf", "ruapehu", derived from project_<name>.yaml filenames.
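The naming rule for the projects keys can be sketched as follows. This is a minimal illustration of the rule as described, not the library's implementation; project_key and build_projects are hypothetical names, and the example paths are made up.

```python
from pathlib import Path


def project_key(filename):
    """Derive the project name from a project file name (hypothetical helper)."""
    stem = Path(filename).stem
    if stem == "project":
        # A lone project.yaml is exposed under the key "default".
        return "default"
    prefix = "project_"
    if stem.startswith(prefix) and len(stem) > len(prefix):
        # project_<name>.yaml is exposed under the key <name>.
        return stem[len(prefix):]
    raise ValueError(f"not a project file: {filename}")


def build_projects(paths):
    """Build a name -> path mapping shaped like the documented property."""
    return {project_key(p): str(p) for p in paths}


single = build_projects(["/cache/paper_a/project.yaml"])
multi = build_projects(
    ["/cache/paper_b/project_pdf.yaml", "/cache/paper_b/project_ruapehu.yaml"]
)
```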
- get_project(level: str | list[str], project: str = 'default') → msnoise.project.MSNoiseProject
Download archive(s) for level and return an MSNoiseProject.
Archives are downloaded once and cached permanently; subsequent calls return immediately (or skip already-extracted levels). Use MRP.clear_cache() to force a fresh download.
- Parameters:
level – Entry level(s) to download. Pass a single string (e.g. "stack"), a list (["stack", "dvv"]), or "all" to download every level in bundle_pointer.yaml. All archives are extracted into the same directory.
project – Project name for papers with multiple datasets. Omit (or "default") for single-project papers.
- Raises:
LevelNotAvailable – if a requested level is absent.
AmbiguousProject – if multiple projects exist and project was not specified.
FileNotFoundError – if no bundle_pointer.yaml exists.
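The three accepted forms of level (a single string, a list, or "all") can be illustrated with a small normalisation sketch. It is written under assumptions and is not the library's code: resolve_levels is a hypothetical helper, the local LevelNotAvailable class stands in for msnoise.papers.LevelNotAvailable, and available plays the role of the levels listed in bundle_pointer.yaml.

```python
class LevelNotAvailable(Exception):
    """Local stand-in for msnoise.papers.LevelNotAvailable (illustration only)."""


def resolve_levels(level, available):
    """Normalise `level` to a list and check it against `available`."""
    if level == "all":
        requested = list(available)
    elif isinstance(level, str):
        requested = [level]
    else:
        requested = list(level)
    missing = [lv for lv in requested if lv not in available]
    if missing:
        raise LevelNotAvailable(
            "not available: " + ", ".join(missing)
            + " (available: " + ", ".join(available) + ")"
        )
    return requested


available = ["stack", "dvv"]  # as in bundle_levels_available above
one = resolve_levels("stack", available)
several = resolve_levels(["dvv", "stack"], available)
everything = resolve_levels("all", available)
```

Catching the exception and reading paper.info() for bundle_levels_available is a reasonable way to recover when a level is missing.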