{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# CC vs PCC \u2014 Side-by-side Comparison\n\nThis notebook builds a minimal MSNoise project from scratch and runs two\nparallel correlation lineages on the same dataset:\n\n| Step name | Method | Notes |\n|-----------------|--------|------------------------------------|\n| `preprocess_1` | \u2014 | shared preprocessing |\n| `cc_1` | CC | standard geometrically-normalised |\n| `cc_2` | PCC | phase cross-correlation (v=2) |\n| `filter_1` | \u2014 | shared band-pass filter |\n| `stack_1` | linear | shared stacking config |\n\nBoth `cc_1` and `cc_2` feed the same `filter_1` \u2192 `stack_1` steps.\nThe final section retrieves stacked CCFs from both lineages and plots\nthem side-by-side for a direct comparison.\n\n## Prerequisites\n\n* MSNoise installed (with the PCC patch applied).\n* A one-day MiniSEED dataset for **at least two stations**.\n The classic MSNoise tutorial dataset (network `YA`, stations `UV05`,\n `UV06`, `UV10`, day 2010-09-01 / Julian day 244, stored in PDF layout)\n is used as the reference, but **any SDS or PDF archive works** \u2014 just\n adjust the USER SETTINGS cell below.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0 \u00b7 Imports\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# %matplotlib inline\nimport os\nimport shutil\nimport logging\nimport tempfile\n\nimport numpy as np\nimport matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt\n\n\n# MSNoise core\nfrom msnoise.core.db import connect, create_database_inifile\nfrom msnoise.core.config import (\n create_config_set, update_config\n)\nfrom msnoise.core.stations import update_station\nfrom msnoise.core.workflow import (\n create_workflow_steps_from_config_sets,\n create_workflow_links_from_steps,\n get_workflow_steps,\n reset_jobs\n)\nfrom msnoise.msnoise_table_def import declare_tables, 
DataAvailability\nfrom msnoise.results import MSNoiseResult\n\n# MSNoise compute steps\nfrom msnoise.s01_scan_archive import main as scan_archive\nfrom msnoise.s02_new_jobs import main as new_jobs\nfrom msnoise.s02_preprocessing import main as preprocess\nfrom msnoise.s03_compute_no_rotation import main as compute_cc\nfrom msnoise.s04_stack_mov import main as stack_mov\n\nlogging.basicConfig(level=logging.WARNING) # suppress INFO noise in notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1 \u00b7 User Settings\n\n**Edit this cell to match your dataset.**\n\nThe defaults match the classic MSNoise tutorial dataset (YA network,\nstations UV05/UV06/UV10, PDF archive layout).\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# \u2500\u2500 Path to your waveform archive \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n# Can be an absolute path or relative.\n# The installer will store it verbatim in the DataSource table.\nDATA_PATH = r\"C:\\Users\\tlecocq\\AppData\\Local\\msnoise-testdata\\msnoise-testdata\\Cache\\1.1\\classic\\data\"\n\n# \u2500\u2500 Archive layout \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n# \"PDF\" \u2192 YEAR/STA/CHAN.D/NET.STA.LOC.CHAN.D.YEAR.JDAY\n# \"SDS\" \u2192 YEAR/NET/STA/CHAN.D/NET.STA.LOC.CHAN.D.YEAR.JDAY\nDATA_STRUCTURE = \"PDF\" # \u2190 \"SDS\" or \"PDF\"\n\n# \u2500\u2500 Network / channel filter used when scanning 
the archive \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nNETWORK_CODE = \"YA\"\nCHANNELS = \"HHZ\"\n\n# \u2500\u2500 Date range to process (ISO format, inclusive) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nSTARTDATE = \"2010-09-01\"\nENDDATE = \"2010-09-01\"\n\n# \u2500\u2500 Station coordinates (net, sta, lon\u00b0E, lat\u00b0N, elev_m) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n# Fill in your actual stations. These are the YA tutorial stations.\nSTATIONS = [\n (\"YA\", \"UV05\", 29.735, -17.817, 1174.0),\n (\"YA\", \"UV06\", 29.785, -17.827, 1162.0),\n (\"YA\", \"UV10\", 29.790, -17.847, 1180.0),\n]\n\n# \u2500\u2500 CC config (shared by both cc_1 and cc_2) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nCC_SAMPLING_RATE = 20.0 # Hz \u2014 resample target\nMAXLAG = 120.0 # seconds\nCORR_DURATION = 1800.0 # seconds per window\nCOMPONENTS = \"ZZ\"\n\n# \u2500\u2500 Filter passband \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nFREQMIN = 0.1 # Hz\nFREQMAX = 4.0 # Hz\n\n# \u2500\u2500 Stack config 
\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nMOV_STACK = \"(('1h','1h'),)\" # 1-hour stack\n\n# \u2500\u2500 Working directory (project folder) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n# A fresh temporary directory is created automatically.\n# Set to a fixed path if you want a persistent project between kernel restarts.\nWORK_DIR = None # None \u2192 auto temp dir" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2 \u00b7 Create Project Directory and MSNoise Database\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if WORK_DIR is None:\n WORK_DIR = tempfile.mkdtemp(prefix=\"msnoise_ccvspcc_\")\n print(f\"Created temporary project directory: {WORK_DIR}\")\nelse:\n os.makedirs(WORK_DIR, exist_ok=True)\n print(f\"Using project directory: {WORK_DIR}\")\n\nos.chdir(WORK_DIR)\n\n# Initialise SQLite database\ncreate_database_inifile(\n tech=1,\n hostname=os.path.join(WORK_DIR, \"msnoise.sqlite\"),\n database=\"\", username=\"\", password=\"\", prefix=\"\",\n)\n\n# Create all tables\ndb = connect()\ndeclare_tables().Base.metadata.create_all(db.get_bind())\n\nprint(\"Database created:\", os.path.join(WORK_DIR, \"msnoise.sqlite\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3 \u00b7 Create Default Config Sets and Workflow Skeleton\n\nWe create **one** config set for every category except `cc`, for which we\nwill create **two** (cc_1 = CC, cc_2 = 
PCC).\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ALL_CATEGORIES = [\n \"global\", \"preprocess\", \"filter\", \"stack\", \"refstack\",\n \"mwcs\", \"mwcs_dtt\", \"mwcs_dtt_dvv\",\n \"stretching\", \"stretching_dvv\",\n \"wavelet\", \"wavelet_dtt\", \"wavelet_dtt_dvv\",\n \"psd\", \"psd_rms\",\n]\n\nfor cat in ALL_CATEGORIES:\n sn = create_config_set(db, cat)\n print(f\" {cat}_1 created (set_number={sn})\")\n\n# CC set 1 \u2014 standard cross-correlation\ncc1_sn = create_config_set(db, \"cc\")\nprint(f\" cc_{cc1_sn} created \u2190 standard CC\")\n\n# CC set 2 \u2014 phase cross-correlation\ncc2_sn = create_config_set(db, \"cc\")\nprint(f\" cc_{cc2_sn} created \u2190 PCC\")\n\ndb.commit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4 \u00b7 Configure Global Parameters\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "OUTPUT_FOLDER = os.path.join(WORK_DIR, \"OUTPUT\")\n\nglobal_params = {\n \"output_folder\": OUTPUT_FOLDER,\n \"startdate\": STARTDATE,\n \"enddate\": ENDDATE,\n \"hpc\": \"N\", # propagate_downstream fires automatically\n}\nfor k, v in global_params.items():\n update_config(db, k, v, category=\"global\", set_number=1)\n print(f\" global.{k} = {v!r}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5 \u00b7 Configure the Two CC Sets\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# \u2500\u2500 Shared CC parameters \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\ncc_shared = {\n \"cc_sampling_rate\": 
str(CC_SAMPLING_RATE),\n \"maxlag\": str(MAXLAG),\n \"corr_duration\": str(CORR_DURATION),\n \"components_to_compute\": COMPONENTS,\n \"keep_all\": \"Y\",\n \"keep_days\": \"Y\",\n \"whitening\": \"A\", # whiten all inter-station pairs\n \"winsorizing\": \"3.0\", # 3\u00d7 RMS clipping\n \"clip_after_whiten\": \"N\",\n \"stack_method\": \"linear\",\n}\n\nfor k, v in cc_shared.items():\n update_config(db, k, v, category=\"cc\", set_number=cc1_sn)\n update_config(db, k, v, category=\"cc\", set_number=cc2_sn)\n\n# \u2500\u2500 cc_1: standard CC \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nupdate_config(db, \"cc_type\", \"CC\", category=\"cc\", set_number=cc1_sn)\nupdate_config(db, \"whitening\", \"A\", category=\"cc\", set_number=cc1_sn)\nprint(f\"cc_{cc1_sn}: cc_type = CC (standard cross-correlation)\")\n\n# \u2500\u2500 cc_2: PCC (Phase Cross-Correlation v=2) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n#\n# What happens in the PCC branch of s03:\n# 1. The filter_N bandpass is applied to the time-domain traces so each\n# filter produces a genuinely distinct CCF (CC does this via whiten2's\n# frequency window; PCC needs it done explicitly).\n# 2. If whitening != \"N\", whiten2 is also applied within the band BEFORE\n# AmpNorm. This flattens the spectrum so the phase signal is broadband\n# across the full filter passband rather than dominated by the most\n# energetic frequency in the band (\"spectrally whitened PCC\",\n# Ventosa 2019 \u00a73.1). 
With whitening=\"N\" you get the classic PCC\n# result whose bandwidth is set solely by the preprocessing bandpass.\n#\n# Winsorising: AmpNorm already discards all amplitude information per-sample,\n# so winsorising (clipping at N\u00d7RMS) is redundant for PCC. Disable it.\nupdate_config(db, \"cc_type\", \"PCC\", category=\"cc\", set_number=cc2_sn)\nupdate_config(db, \"winsorizing\", \"0\", category=\"cc\", set_number=cc2_sn)\nupdate_config(db, \"whitening\", \"A\", category=\"cc\", set_number=cc2_sn)\nprint(f\"cc_{cc2_sn}: cc_type = PCC (phase cross-correlation v=2)\")\nprint(f\" winsorizing=0 (redundant: AmpNorm already discards amplitude)\")\nprint(f\" whitening=A (spectral whitening within band before AmpNorm)\")\n\ndb.commit()\n\n# Reset any pre-existing jobs so both cc steps start from a clean queue\nreset_jobs(db, \"cc_1\", alljobs=True)\nreset_jobs(db, \"cc_2\", alljobs=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6 \u00b7 Configure Filter and Stack\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "update_config(db, \"freqmin\", str(FREQMIN), category=\"filter\", set_number=1)\nupdate_config(db, \"freqmax\", str(FREQMAX), category=\"filter\", set_number=1)\nupdate_config(db, \"CC\", \"Y\", category=\"filter\", set_number=1)\n\nupdate_config(db, \"mov_stack\", MOV_STACK, category=\"stack\", set_number=1)\nupdate_config(db, \"ref_begin\", STARTDATE, category=\"refstack\", set_number=1)\nupdate_config(db, \"ref_end\", ENDDATE, category=\"refstack\", set_number=1)\n\ndb.commit()\nprint(f\"filter_1: {FREQMIN}\u2013{FREQMAX} Hz\")\nprint(f\"stack_1: {MOV_STACK}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7 \u00b7 Configure DataSource and Station Table\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Create the default DataSource (the installer normally does this; when\n# initialising the DB manually we must insert it 
ourselves).\nDataSource = declare_tables().DataSource\nds = DataSource(\n name=\"local\",\n uri=os.path.realpath(DATA_PATH),\n data_structure=DATA_STRUCTURE,\n auth_env=\"MSNOISE\",\n network_code=NETWORK_CODE,\n channels=CHANNELS,\n)\ndb.add(ds)\ndb.commit()\nprint(f\"DataSource created: uri={os.path.realpath(DATA_PATH)!r}\")\nprint(f\" data_structure={DATA_STRUCTURE!r}\")\n\n# Add stations\nfor net, sta, lon, lat, elev in STATIONS:\n update_station(db, net=net, sta=sta, X=lon, Y=lat, altitude=elev,\n coordinates=\"DEG\", used=1)\n print(f\" Added station {net}.{sta} ({lon:.3f}\u00b0E, {lat:.3f}\u00b0N)\")\n\ndb.commit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8 \u00b7 Build the Workflow Graph\n\nThe topology we want:\n\n```\npreprocess_1 \u2500\u2500\u25ba cc_1 \u2500\u2500\u25ba filter_1 \u2500\u2500\u25ba stack_1\n \u2514\u2500\u2500\u25ba cc_2 \u2500\u2500\u25ba \u2191\n```\n\nBoth cc steps feed the **same** filter_1 and stack_1.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Create WorkflowSteps from all config sets (auto-discovers preprocess_1,\n# cc_1, cc_2, filter_1, stack_1, \u2026)\ncreated, existing, err = create_workflow_steps_from_config_sets(db)\nassert err is None, f\"Error creating workflow steps: {err}\"\nprint(f\"Workflow steps: {created} created, {existing} already existed\")\n\n# Auto-link following MSNoise's canonical dependency rules\ncreated_links, existing_links, err = create_workflow_links_from_steps(db)\nassert err is None, f\"Error creating workflow links: {err}\"\nprint(f\"Workflow links: {created_links} created, {existing_links} already existed\")\n\n# Verify the topology\nsteps = {s.step_name: s for s in get_workflow_steps(db)}\nprint(\"\\nWorkflow steps present:\", sorted(steps.keys()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9 \u00b7 Scan Archive and Seed Jobs\n\n" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Scan the waveform archive \u2192 populate DataAvailability table\nscan_archive(init=True, threads=1)\n\n# Update loc/chan on Station rows from DataAvailability\nfrom sqlalchemy import text as _text\n_db2 = connect()\nfor sta in _db2.query(declare_tables().Station):\n data = _db2.query(DataAvailability). \\\n filter(_text(\"net=:net\")).filter(_text(\"sta=:sta\")). \\\n group_by(DataAvailability.net, DataAvailability.sta,\n DataAvailability.loc, DataAvailability.chan). \\\n params(net=sta.net, sta=sta.sta).all()\n sta.used_location_codes = \",\".join(sorted({d.loc for d in data}))\n sta.used_channel_names = \",\".join(sorted({d.chan for d in data}))\n_db2.commit()\n_db2.close()\n\n# Verify availability\n_db3 = connect()\nn_da = _db3.query(DataAvailability).count()\nprint(f\"DataAvailability rows: {n_da}\")\n_db3.close()\n\n# Seed initial jobs (creates preprocess_1 T-jobs)\nnew_jobs(init=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 10 \u00b7 Run the Pipeline\n\nSteps run sequentially in-process. 
For large datasets use `msnoise` CLI\nor HPC submission instead.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"\u2500\u2500 preprocess \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\")\npreprocess()\n\nprint(\"\u2500\u2500 new_jobs --after preprocess \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\")\nnew_jobs(after=\"preprocess\")\n\nprint(\"\u2500\u2500 compute_cc (cc_1 = CC and cc_2 = PCC) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\")\ncompute_cc()\n\nprint(\"\u2500\u2500 new_jobs --after cc \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\")\nnew_jobs(after=\"cc\")\n\nprint(\"\u2500\u2500 stack_mov \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\")\nstack_mov(stype=\"mov\")\n\nprint(\"\u2500\u2500 new_jobs --after stack 
\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\")\nnew_jobs(after=\"stack\")\n\nprint(\"Done.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 11 \u00b7 Gather MSNoiseResult Objects\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "db = connect()\n\n# Lineage for CC branch: preprocess_1 / cc_1 / filter_1 / stack_1\nresult_cc = MSNoiseResult.from_ids(\n db,\n preprocess=1,\n cc=cc1_sn, # cc_1\n filter=1,\n stack=1,\n)\nprint(\"CC lineage:\", result_cc.lineage_names)\n\n# Lineage for PCC branch: preprocess_1 / cc_2 / filter_1 / stack_1\nresult_pcc = MSNoiseResult.from_ids(\n db,\n preprocess=1,\n cc=cc2_sn, # cc_2\n filter=1,\n stack=1,\n)\nprint(\"PCC lineage:\", result_pcc.lineage_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 12 \u00b7 Compare Stacked CCFs\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Retrieve all stacked CCFs for both lineages\nccfs_cc = result_cc.get_ccf()\nccfs_pcc = result_pcc.get_ccf()\n\nprint(f\"CC result: {len(ccfs_cc)} CCF(s) found\")\nprint(f\"PCC result: {len(ccfs_pcc)} CCF(s) found\")\n\nfor key in sorted(ccfs_cc):\n pair, comp, ms = key\n print(f\" {pair} {comp} {ms[0]}\u2013{ms[1]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 13 \u00b7 Plot: CC vs PCC per Pair\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fig_dir = os.path.join(WORK_DIR, \"figures\")\nos.makedirs(fig_dir, exist_ok=True)\n\ncommon_keys = sorted(set(ccfs_cc) & set(ccfs_pcc))\nif not common_keys:\n print(\"No common keys found between CC and PCC results \u2014 check that \"\n \"the 
pipeline completed successfully.\")\nelse:\n n = len(common_keys)\n fig, axes = plt.subplots(n, 2, figsize=(14, 3.5 * n), squeeze=False)\n fig.suptitle(\n f\"Stacked CCF comparison: CC (left) vs PCC (right)\\n\"\n f\"Filter {FREQMIN}\u2013{FREQMAX} Hz | {STARTDATE}\",\n fontsize=13, y=1.01,\n )\n\n for row, key in enumerate(common_keys):\n pair, comp, ms = key\n da_cc = ccfs_cc[key]\n da_pcc = ccfs_pcc[key]\n\n # Time axis (seconds)\n taxis = da_cc.coords[\"taxis\"].values if \"taxis\" in da_cc.coords \\\n else np.linspace(-MAXLAG, MAXLAG, da_cc.shape[-1])\n\n # Stack over the time dimension if multiple days present\n ccf_cc = np.atleast_1d(da_cc.values)\n ccf_pcc = np.atleast_1d(da_pcc.values)\n\n if ccf_cc.ndim > 1:\n ccf_cc = ccf_cc.mean(axis=0)\n if ccf_pcc.ndim > 1:\n ccf_pcc = ccf_pcc.mean(axis=0)\n\n label = f\"{pair} {comp}\"\n\n # \u2500\u2500 CC panel \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n ax = axes[row, 0]\n ax.plot(taxis, ccf_cc, color=\"#1f77b4\", lw=0.9)\n ax.axvline(0, color=\"k\", lw=0.6, ls=\"--\", alpha=0.4)\n ax.set_title(f\"{label} \u2014 CC\")\n ax.set_xlabel(\"Lag (s)\")\n ax.set_ylabel(\"Amplitude\")\n ax.set_xlim(-MAXLAG, MAXLAG)\n\n # \u2500\u2500 PCC panel \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n ax = axes[row, 1]\n ax.plot(taxis, ccf_pcc, color=\"#d62728\", lw=0.9)\n ax.axvline(0, color=\"k\", lw=0.6, ls=\"--\", 
alpha=0.4)\n ax.set_title(f\"{label} \u2014 PCC\")\n ax.set_xlabel(\"Lag (s)\")\n ax.set_ylabel(\"Amplitude\")\n ax.set_xlim(-MAXLAG, MAXLAG)\n\n plt.tight_layout()\n outfig = os.path.join(fig_dir, \"cc_vs_pcc.png\")\n fig.savefig(outfig, dpi=150, bbox_inches=\"tight\")\n plt.show()\n print(f\"Saved: {outfig}\")\n\n# Quick look at the amplitude spectra of the last CC/PCC pair plotted above,\n# each spectrum normalised to its own maximum.\nA = np.fft.fft(ccf_cc)\nf = np.fft.fftfreq(len(A), d=1.0 / CC_SAMPLING_RATE)[:len(A)//2]\nA /= np.abs(A).max()\nB = np.fft.fft(ccf_pcc)\nB /= np.abs(B).max()\nplt.figure(figsize=(15, 5))\nplt.plot(f, abs(A[:len(A)//2]), label=\"CC\")\nplt.plot(f, abs(B[:len(B)//2]), label=\"PCC\")\nplt.xlim(0, FREQMAX + 1)\nplt.xlabel(\"Frequency (Hz)\")\nplt.ylabel(\"Normalised amplitude\")\nplt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 14 \u00b7 Overlay: CC vs PCC on the same axes (one panel per pair)\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if common_keys:\n fig2, axes2 = plt.subplots(1, len(common_keys),\n figsize=(6 * len(common_keys), 4),\n squeeze=False)\n fig2.suptitle(\n f\"CC (blue) vs PCC (red) | {FREQMIN}\u2013{FREQMAX} Hz | {STARTDATE}\",\n fontsize=12,\n )\n\n for col, key in enumerate(common_keys):\n pair, comp, ms = key\n da_cc = ccfs_cc[key]\n da_pcc = ccfs_pcc[key]\n\n taxis = da_cc.coords[\"taxis\"].values if \"taxis\" in da_cc.coords \\\n else np.linspace(-MAXLAG, MAXLAG, da_cc.values.shape[-1])\n\n ccf_cc = da_cc.values\n ccf_pcc = da_pcc.values\n if ccf_cc.ndim > 1: ccf_cc = ccf_cc.mean(axis=0)\n if ccf_pcc.ndim > 1: ccf_pcc = ccf_pcc.mean(axis=0)\n\n # Normalise to ABSMAX for visual overlay\n def _norm(x):\n m = np.abs(x).max()\n return x / m if m > 0 else x\n\n ax = axes2[0, col]\n ax.plot(taxis, _norm(ccf_cc), color=\"#1f77b4\", lw=1.0,\n label=\"CC\", alpha=0.85)\n ax.plot(taxis, _norm(ccf_pcc), color=\"#d62728\", lw=1.0,\n label=\"PCC\", alpha=0.85)\n ax.axvline(0, color=\"k\", lw=0.6, ls=\"--\", alpha=0.3)\n ax.set_title(f\"{pair} {comp}\")\n ax.set_xlabel(\"Lag (s)\")\n ax.set_ylabel(\"Normalised amplitude\")\n ax.set_xlim(-MAXLAG, MAXLAG)\n 
ax.legend(loc=\"upper right\", fontsize=9)\n\n plt.tight_layout()\n outfig2 = os.path.join(fig_dir, \"cc_vs_pcc_overlay.png\")\n fig2.savefig(outfig2, dpi=150, bbox_inches=\"tight\")\n plt.show()\n print(f\"Saved: {outfig2}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 15 \u00b7 Summary Table\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\n\nrows = []\nfor key in common_keys:\n pair, comp, ms = key\n cc_arr = ccfs_cc[key].values\n pcc_arr = ccfs_pcc[key].values\n if cc_arr.ndim > 1: cc_arr = cc_arr.mean(axis=0)\n if pcc_arr.ndim > 1: pcc_arr = pcc_arr.mean(axis=0)\n rows.append({\n \"Pair\": pair,\n \"Component\": comp,\n \"Mov. stack\": f\"{ms[0]}\u2013{ms[1]}\",\n \"CC peak\": f\"{np.abs(cc_arr).max():.4g}\",\n \"PCC peak\": f\"{np.abs(pcc_arr).max():.4g}\",\n # Peak position converted from samples to seconds via CC_SAMPLING_RATE\n \"CC lag@peak (s)\": f\"{(np.argmax(np.abs(cc_arr)) - len(cc_arr)//2) / CC_SAMPLING_RATE:.1f}\",\n \"PCC lag@peak (s)\": f\"{(np.argmax(np.abs(pcc_arr)) - len(pcc_arr)//2) / CC_SAMPLING_RATE:.1f}\",\n })\n\ndf = pd.DataFrame(rows)\nprint(df.to_string(index=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notes\n\n### Why PCC output looks different from CC\n\n* **CC** retains amplitude information: large-energy windows (earthquakes,\n transients) dominate the stack even after winsorising. The CCF contains\n the full filter-band content weighted by window energy.\n\n* **PCC** (v=2) projects every sample of the analytic signal onto the unit\n circle before correlating \u2014 amplitude is discarded per-sample. The output\n is bounded to [\u22121, 1] by construction. 
The PCC branch in s03:\n (a) bandpasses to the filter_N frequencies so each filter is distinct,\n (b) optionally whitens within the band (controlled by `whitening` config)\n to give a broadband phase signal (\"spectrally whitened PCC\"),\n (c) then calls `pcc_xcorr` which computes AmpNorm + FFT cross-correlation.\n\n* **Winsorising** is disabled for `cc_2`: it clips amplitude extremes, which\n is meaningless after AmpNorm has already equalised every sample to unit\n amplitude.\n\n* **Whitening = \"N\" variant**: set `whitening` to `\"N\"` on `cc_2` for\n classic (non-whitened) PCC. The result will be dominated by the most\n energetic frequency within the preprocessing bandpass \u2014 often visually\n narrower than CC. The default (`whitening=\"A\"`) gives a more\n spectrally balanced PCC that compares directly to CC.\n\n### Adjusting the comparison\n\n* Change `FREQMIN` / `FREQMAX` in the User Settings cell and re-run from\n \u00a76 onwards \u2014 both lineages share `filter_1`, so re-running the stack\n step is sufficient (no need to redo preprocessing or CC).\n* The `result_cc` and `result_pcc` objects expose the full MSNoiseResult\n API: call `.get_mwcs()`, `.get_dvv()`, etc. after running the downstream\n steps on each lineage.\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.15" } }, "nbformat": 4, "nbformat_minor": 0 }