{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# CC vs PCC \u2014 Side-by-side Comparison\n\nThis notebook builds a minimal MSNoise project from scratch and runs two\nparallel correlation lineages on the same dataset:\n\n| Step name | Method | Notes |\n|-----------------|--------|------------------------------------|\n| `preprocess_1` | \u2014 | shared preprocessing |\n| `cc_1` | CC | standard geometrically-normalised |\n| `cc_2` | PCC | phase cross-correlation (v=2) |\n| `filter_1` | \u2014 | shared band-pass filter |\n| `stack_1` | linear | shared stacking config |\n\nBoth `cc_1` and `cc_2` feed the same `filter_1` \u2192 `stack_1` steps.\nThe final section retrieves stacked CCFs from both lineages and plots\nthem side-by-side for a direct comparison.\n\n## Prerequisites\n\n* MSNoise installed (with the PCC patch applied).\n* A one-day MiniSEED dataset for **at least two stations**.\n The classic MSNoise tutorial dataset (network `YA`, stations `UV05`,\n `UV06`, `UV10`, day 2010-09-01 / Julian day 244, stored in PDF layout)\n is used as the reference, but **any SDS or PDF archive works** \u2014 just\n adjust the USER SETTINGS cell below.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0 \u00b7 Imports\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# %matplotlib inline\nimport os\nimport shutil\nimport logging\nimport tempfile\n\nimport numpy as np\nimport matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt\n\n\n# MSNoise core\nfrom msnoise.core.db import connect, create_database_inifile\nfrom msnoise.core.config import (\n create_config_set, update_config\n)\nfrom msnoise.core.stations import update_station\nfrom msnoise.core.workflow import (\n create_workflow_steps_from_config_sets,\n create_workflow_links_from_steps,\n get_workflow_steps,\n reset_jobs\n)\nfrom msnoise.msnoise_table_def import declare_tables, 
DataAvailability\nfrom msnoise.results import MSNoiseResult\n\n# MSNoise compute steps\nfrom msnoise.s01_scan_archive import main as scan_archive\nfrom msnoise.s02_new_jobs import main as new_jobs\nfrom msnoise.s02_preprocessing import main as preprocess\nfrom msnoise.s03_compute_no_rotation import main as compute_cc\nfrom msnoise.s04_stack_mov import main as stack_mov\n\nlogging.basicConfig(level=logging.WARNING) # suppress INFO noise in notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1 \u00b7 User Settings\n\n**Edit this cell to match your dataset.**\n\nThe defaults match the classic MSNoise tutorial dataset (YA network,\nstations UV05/UV06/UV10, PDF archive layout).\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# \u2500\u2500 Path to your waveform archive \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n# Can be an absolute path or relative.\n# The installer will store it verbatim in the DataSource table.\nDATA_PATH = r\"C:\\Users\\tlecocq\\AppData\\Local\\msnoise-testdata\\msnoise-testdata\\Cache\\1.1\\classic\\data\"\n\n# \u2500\u2500 Archive layout \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n# \"PDF\" \u2192 YEAR/STA/CHAN.D/NET.STA.LOC.CHAN.D.YEAR.JDAY\n# \"SDS\" \u2192 YEAR/NET/STA/CHAN.D/NET.STA.LOC.CHAN.D.YEAR.JDAY\nDATA_STRUCTURE = \"PDF\" # \u2190 \"SDS\" or \"PDF\"\n\n# \u2500\u2500 Network / channel filter used when scanning 
the archive \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nNETWORK_CODE = \"YA\"\nCHANNELS = \"HHZ\"\n\n# \u2500\u2500 Date range to process (ISO format, inclusive) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nSTARTDATE = \"2010-09-01\"\nENDDATE = \"2010-09-01\"\n\n# \u2500\u2500 Station coordinates (net, sta, lon\u00b0E, lat\u00b0N, elev_m) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n# Fill in your actual stations. These are the YA tutorial stations.\nSTATIONS = [\n (\"YA\", \"UV05\", 29.735, -17.817, 1174.0),\n (\"YA\", \"UV06\", 29.785, -17.827, 1162.0),\n (\"YA\", \"UV10\", 29.790, -17.847, 1180.0),\n]\n\n# \u2500\u2500 CC config (shared by both cc_1 and cc_2) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nCC_SAMPLING_RATE = 20.0 # Hz \u2014 resample target\nMAXLAG = 120.0 # seconds\nCORR_DURATION = 1800.0 # seconds per window\nCOMPONENTS = \"ZZ\"\n\n# \u2500\u2500 Filter passband \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nFREQMIN = 0.1 # Hz\nFREQMAX = 4.0 # Hz\n\n# \u2500\u2500 Stack config 
\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nMOV_STACK = \"(('1h','1h'),)\" # 1-hour stack\n\n# \u2500\u2500 Working directory (project folder) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n# A fresh temporary directory is created automatically.\n# Set to a fixed path if you want a persistent project between kernel restarts.\nWORK_DIR = None # None \u2192 auto temp dir" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2 \u00b7 Create Project Directory and MSNoise Database\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if WORK_DIR is None:\n WORK_DIR = tempfile.mkdtemp(prefix=\"msnoise_ccvspcc_\")\n print(f\"Created temporary project directory: {WORK_DIR}\")\nelse:\n os.makedirs(WORK_DIR, exist_ok=True)\n print(f\"Using project directory: {WORK_DIR}\")\n\nos.chdir(WORK_DIR)\n\n# Initialise SQLite database\ncreate_database_inifile(\n tech=1,\n hostname=os.path.join(WORK_DIR, \"msnoise.sqlite\"),\n database=\"\", username=\"\", password=\"\", prefix=\"\",\n)\n\n# Create all tables\ndb = connect()\ndeclare_tables().Base.metadata.create_all(db.get_bind())\n\nprint(\"Database created:\", os.path.join(WORK_DIR, \"msnoise.sqlite\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3 \u00b7 Create Default Config Sets and Workflow Skeleton\n\nWe create **one** config set for every category except `cc`, for which we\nwill create **two** (cc_1 = CC, cc_2 = 
PCC).\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ALL_CATEGORIES = [\n \"global\", \"preprocess\", \"filter\", \"stack\", \"refstack\",\n \"mwcs\", \"mwcs_dtt\", \"mwcs_dtt_dvv\",\n \"stretching\", \"stretching_dvv\",\n \"wavelet\", \"wavelet_dtt\", \"wavelet_dtt_dvv\",\n \"psd\", \"psd_rms\",\n]\n\nfor cat in ALL_CATEGORIES:\n sn = create_config_set(db, cat)\n print(f\" {cat}_1 created (set_number={sn})\")\n\n# CC set 1 \u2014 standard cross-correlation\ncc1_sn = create_config_set(db, \"cc\")\nprint(f\" cc_{cc1_sn} created \u2190 standard CC\")\n\n# CC set 2 \u2014 phase cross-correlation\ncc2_sn = create_config_set(db, \"cc\")\nprint(f\" cc_{cc2_sn} created \u2190 PCC\")\n\ndb.commit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4 \u00b7 Configure Global Parameters\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "OUTPUT_FOLDER = os.path.join(WORK_DIR, \"OUTPUT\")\n\nglobal_params = {\n \"output_folder\": OUTPUT_FOLDER,\n \"startdate\": STARTDATE,\n \"enddate\": ENDDATE,\n \"hpc\": \"N\", # propagate_downstream fires automatically\n}\nfor k, v in global_params.items():\n update_config(db, k, v, category=\"global\", set_number=1)\n print(f\" global.{k} = {v!r}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5 \u00b7 Configure the Two CC Sets\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# \u2500\u2500 Shared CC parameters \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\ncc_shared = {\n \"cc_sampling_rate\": 
str(CC_SAMPLING_RATE),\n \"maxlag\": str(MAXLAG),\n \"corr_duration\": str(CORR_DURATION),\n \"components_to_compute\": COMPONENTS,\n \"keep_all\": \"Y\",\n \"keep_days\": \"Y\",\n \"whitening\": \"A\", # whiten all inter-station pairs\n \"winsorizing\": \"3.0\", # 3\u00d7 RMS clipping\n \"clip_after_whiten\": \"N\",\n \"stack_method\": \"linear\",\n}\n\nfor k, v in cc_shared.items():\n update_config(db, k, v, category=\"cc\", set_number=cc1_sn)\n update_config(db, k, v, category=\"cc\", set_number=cc2_sn)\n\n# \u2500\u2500 cc_1: standard CC \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nupdate_config(db, \"cc_type\", \"CC\", category=\"cc\", set_number=cc1_sn)\nupdate_config(db, \"whitening\", \"A\", category=\"cc\", set_number=cc1_sn)\nprint(f\"cc_{cc1_sn}: cc_type = CC (standard cross-correlation)\")\n\n# \u2500\u2500 cc_2: PCC (Phase Cross-Correlation v=2) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n#\n# What happens in the PCC branch of s03:\n# 1. The filter_N bandpass is applied to the time-domain traces so each\n# filter produces a genuinely distinct CCF (CC does this via whiten2's\n# frequency window; PCC needs it done explicitly).\n# 2. If whitening != \"N\", whiten2 is also applied within the band BEFORE\n# AmpNorm. This flattens the spectrum so the phase signal is broadband\n# across the full filter passband rather than dominated by the most\n# energetic frequency in the band (\"spectrally whitened PCC\",\n# Ventosa 2019 \u00a73.1). 
With whitening=\"N\" you get the classic PCC\n# result whose bandwidth is set solely by the preprocessing bandpass.\n#\n# Winsorising: AmpNorm already discards all amplitude information per-sample,\n# so winsorising (clipping at N\u00d7RMS) is redundant for PCC. Disable it.\nupdate_config(db, \"cc_type\", \"PCC\", category=\"cc\", set_number=cc2_sn)\nupdate_config(db, \"winsorizing\", \"0\", category=\"cc\", set_number=cc2_sn)\nupdate_config(db, \"whitening\", \"A\", category=\"cc\", set_number=cc2_sn)\nprint(f\"cc_{cc2_sn}: cc_type = PCC (phase cross-correlation v=2)\")\nprint(f\" winsorizing=0 (redundant: AmpNorm already discards amplitude)\")\nprint(f\" whitening=A (spectral whitening within band before AmpNorm)\")\n\ndb.commit()\n\n# Reset any pre-existing jobs so both cc steps start from a clean queue\nreset_jobs(db, \"cc_1\", alljobs=True)\nreset_jobs(db, \"cc_2\", alljobs=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6 \u00b7 Configure Filter and Stack\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "update_config(db, \"freqmin\", str(FREQMIN), category=\"filter\", set_number=1)\nupdate_config(db, \"freqmax\", str(FREQMAX), category=\"filter\", set_number=1)\nupdate_config(db, \"CC\", \"Y\", category=\"filter\", set_number=1)\n\nupdate_config(db, \"mov_stack\", MOV_STACK, category=\"stack\", set_number=1)\nupdate_config(db, \"ref_begin\", STARTDATE, category=\"refstack\", set_number=1)\nupdate_config(db, \"ref_end\", ENDDATE, category=\"refstack\", set_number=1)\n\ndb.commit()\nprint(f\"filter_1: {FREQMIN}\u2013{FREQMAX} Hz\")\nprint(f\"stack_1: {MOV_STACK}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7 \u00b7 Configure DataSource and Station Table\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Create the default DataSource (the installer normally does this; when\n# initialising the DB manually we must insert it 
ourselves).\nDataSource = declare_tables().DataSource\nds = DataSource(\n name=\"local\",\n uri=os.path.realpath(DATA_PATH),\n data_structure=DATA_STRUCTURE,\n auth_env=\"MSNOISE\",\n network_code=NETWORK_CODE,\n channels=CHANNELS,\n)\ndb.add(ds)\ndb.commit()\nprint(f\"DataSource created: uri={os.path.realpath(DATA_PATH)!r}\")\nprint(f\" data_structure={DATA_STRUCTURE!r}\")\n\n# Add stations\nfor net, sta, lon, lat, elev in STATIONS:\n update_station(db, net=net, sta=sta, X=lon, Y=lat, altitude=elev,\n coordinates=\"DEG\", used=1)\n print(f\" Added station {net}.{sta} ({lon:.3f}\u00b0E, {lat:.3f}\u00b0N)\")\n\ndb.commit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8 \u00b7 Build the Workflow Graph\n\nThe topology we want:\n\n```\npreprocess_1 \u2500\u2500\u25ba cc_1 \u2500\u2500\u25ba filter_1 \u2500\u2500\u25ba stack_1\n \u2514\u2500\u2500\u25ba cc_2 \u2500\u2500\u25ba \u2191\n```\n\nBoth cc steps feed the **same** filter_1 and stack_1.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Create WorkflowSteps from all config sets (auto-discovers preprocess_1,\n# cc_1, cc_2, filter_1, stack_1, \u2026)\ncreated, existing, err = create_workflow_steps_from_config_sets(db)\nassert err is None, f\"Error creating workflow steps: {err}\"\nprint(f\"Workflow steps: {created} created, {existing} already existed\")\n\n# Auto-link following MSNoise's canonical dependency rules\ncreated_links, existing_links, err = create_workflow_links_from_steps(db)\nassert err is None, f\"Error creating workflow links: {err}\"\nprint(f\"Workflow links: {created_links} created, {existing_links} already existed\")\n\n# Verify the topology\nsteps = {s.step_name: s for s in get_workflow_steps(db)}\nprint(\"\\nWorkflow steps present:\", sorted(steps.keys()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9 \u00b7 Scan Archive and Seed Jobs\n\n" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Scan the waveform archive \u2192 populate DataAvailability table\nscan_archive(init=True, threads=1)\n\n# Update loc/chan on Station rows from DataAvailability\nfrom sqlalchemy import text as _text\n_db2 = connect()\nfor sta in _db2.query(declare_tables().Station):\n data = _db2.query(DataAvailability). \\\n filter(_text(\"net=:net\")).filter(_text(\"sta=:sta\")). \\\n group_by(DataAvailability.net, DataAvailability.sta,\n DataAvailability.loc, DataAvailability.chan). \\\n params(net=sta.net, sta=sta.sta).all()\n sta.used_location_codes = \",\".join(sorted({d.loc for d in data}))\n sta.used_channel_names = \",\".join(sorted({d.chan for d in data}))\n_db2.commit()\n_db2.close()\n\n# Verify availability\n_db3 = connect()\nn_da = _db3.query(DataAvailability).count()\nprint(f\"DataAvailability rows: {n_da}\")\n_db3.close()\n\n# Seed initial jobs (creates preprocess_1 T-jobs)\nnew_jobs(init=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 10 \u00b7 Run the Pipeline\n\nSteps run sequentially in-process. 
For large datasets use `msnoise` CLI\nor HPC submission instead.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"\u2500\u2500 preprocess \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\")\npreprocess()\n\nprint(\"\u2500\u2500 new_jobs --after preprocess \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\")\nnew_jobs(after=\"preprocess\")\n\nprint(\"\u2500\u2500 compute_cc (cc_1 = CC and cc_2 = PCC) \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\")\ncompute_cc()\n\nprint(\"\u2500\u2500 new_jobs --after cc \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\")\nnew_jobs(after=\"cc\")\n\nprint(\"\u2500\u2500 stack_mov \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\")\nstack_mov(stype=\"mov\")\n\nprint(\"\u2500\u2500 new_jobs --after stack 
\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\")\nnew_jobs(after=\"stack\")\n\nprint(\"Done.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 11 \u00b7 Gather MSNoiseResult Objects\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "db = connect()\n\n# Lineage for CC branch: preprocess_1 / cc_1 / filter_1 / stack_1\nresult_cc = MSNoiseResult.from_ids(\n db,\n preprocess=1,\n cc=cc1_sn, # cc_1\n filter=1,\n stack=1,\n)\nprint(\"CC lineage:\", result_cc.lineage_names)\n\n# Lineage for PCC branch: preprocess_1 / cc_2 / filter_1 / stack_1\nresult_pcc = MSNoiseResult.from_ids(\n db,\n preprocess=1,\n cc=cc2_sn, # cc_2\n filter=1,\n stack=1,\n)\nprint(\"PCC lineage:\", result_pcc.lineage_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 12 \u00b7 Compare Stacked CCFs\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Retrieve all stacked CCFs for both lineages\nccfs_cc = result_cc.get_ccf()\nccfs_pcc = result_pcc.get_ccf()\n\nprint(f\"CC result: {len(ccfs_cc)} CCF(s) found\")\nprint(f\"PCC result: {len(ccfs_pcc)} CCF(s) found\")\n\nfor key in sorted(ccfs_cc):\n pair, comp, ms = key\n print(f\" {pair} {comp} {ms[0]}\u2013{ms[1]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 13 \u00b7 Plot: CC vs PCC per Pair\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fig_dir = os.path.join(WORK_DIR, \"figures\")\nos.makedirs(fig_dir, exist_ok=True)\n\ncommon_keys = sorted(set(ccfs_cc) & set(ccfs_pcc))\nif not common_keys:\n print(\"No common keys found between CC and PCC results \u2014 check that \"\n \"the 
pipeline completed successfully.\")\nelse:\n n = len(common_keys)\n fig, axes = plt.subplots(n, 2, figsize=(14, 3.5 * n), squeeze=False)\n fig.suptitle(\n f\"Stacked CCF comparison: CC (left) vs PCC (right)\\n\"\n f\"Filter {FREQMIN}\u2013{FREQMAX} Hz | {STARTDATE}\",\n fontsize=13, y=1.01,\n )\n\n for row, key in enumerate(common_keys):\n pair, comp, ms = key\n da_cc = ccfs_cc[key]\n da_pcc = ccfs_pcc[key]\n\n # Time axis (seconds)\n taxis = da_cc.coords[\"taxis\"].values if \"taxis\" in da_cc.coords \\\n else np.linspace(-MAXLAG, MAXLAG, da_cc.shape[-1])\n\n # Stack over the time dimension if multiple days present\n ccf_cc = np.atleast_1d(da_cc.values)\n ccf_pcc = np.atleast_1d(da_pcc.values)\n\n if ccf_cc.ndim > 1:\n ccf_cc = ccf_cc.mean(axis=0)\n if ccf_pcc.ndim > 1:\n ccf_pcc = ccf_pcc.mean(axis=0)\n\n label = f\"{pair} {comp}\"\n\n # \u2500\u2500 CC panel \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n ax = axes[row, 0]\n ax.plot(taxis, ccf_cc, color=\"#1f77b4\", lw=0.9)\n ax.axvline(0, color=\"k\", lw=0.6, ls=\"--\", alpha=0.4)\n ax.set_title(f\"{label} \u2014 CC\")\n ax.set_xlabel(\"Lag (s)\")\n ax.set_ylabel(\"Amplitude\")\n ax.set_xlim(-MAXLAG, MAXLAG)\n\n # \u2500\u2500 PCC panel \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n ax = axes[row, 1]\n ax.plot(taxis, ccf_pcc, color=\"#d62728\", lw=0.9)\n ax.axvline(0, color=\"k\", lw=0.6, ls=\"--\", 
alpha=0.4)\n ax.set_title(f\"{label} \u2014 PCC\")\n ax.set_xlabel(\"Lag (s)\")\n ax.set_ylabel(\"Amplitude\")\n ax.set_xlim(-MAXLAG, MAXLAG)\n\n plt.tight_layout()\n outfig = os.path.join(fig_dir, \"cc_vs_pcc.png\")\n fig.savefig(outfig, dpi=150, bbox_inches=\"tight\")\n plt.show()\n print(f\"Saved: {outfig}\")\n\n# Quick look at the amplitude spectra of the last CC/PCC pair plotted above,\n# each spectrum normalised to its own maximum.\nA = np.fft.fft(ccf_cc)\nf = np.fft.fftfreq(len(A), d=1.0 / CC_SAMPLING_RATE)[:len(A)//2]\nA /= np.abs(A).max()\nB = np.fft.fft(ccf_pcc)\nB /= np.abs(B).max()\nplt.figure(figsize=(15, 5))\nplt.plot(f, abs(A[:len(A)//2]), label=\"CC\")\nplt.plot(f, abs(B[:len(B)//2]), label=\"PCC\")\nplt.xlim(0, FREQMAX + 1)\nplt.xlabel(\"Frequency (Hz)\")\nplt.ylabel(\"Normalised amplitude\")\nplt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 14 \u00b7 Overlay: CC vs PCC on the same axes (one panel per pair)\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if common_keys:\n fig2, axes2 = plt.subplots(1, len(common_keys),\n figsize=(6 * len(common_keys), 4),\n squeeze=False)\n fig2.suptitle(\n f\"CC (blue) vs PCC (red) | {FREQMIN}\u2013{FREQMAX} Hz | {STARTDATE}\",\n fontsize=12,\n )\n\n for col, key in enumerate(common_keys):\n pair, comp, ms = key\n da_cc = ccfs_cc[key]\n da_pcc = ccfs_pcc[key]\n\n taxis = da_cc.coords[\"taxis\"].values if \"taxis\" in da_cc.coords \\\n else np.linspace(-MAXLAG, MAXLAG, da_cc.values.shape[-1])\n\n ccf_cc = da_cc.values\n ccf_pcc = da_pcc.values\n if ccf_cc.ndim > 1: ccf_cc = ccf_cc.mean(axis=0)\n if ccf_pcc.ndim > 1: ccf_pcc = ccf_pcc.mean(axis=0)\n\n # Normalise to ABSMAX for visual overlay\n def _norm(x):\n m = np.abs(x).max()\n return x / m if m > 0 else x\n\n ax = axes2[0, col]\n ax.plot(taxis, _norm(ccf_cc), color=\"#1f77b4\", lw=1.0,\n label=\"CC\", alpha=0.85)\n ax.plot(taxis, _norm(ccf_pcc), color=\"#d62728\", lw=1.0,\n label=\"PCC\", alpha=0.85)\n ax.axvline(0, color=\"k\", lw=0.6, ls=\"--\", alpha=0.3)\n ax.set_title(f\"{pair} {comp}\")\n ax.set_xlabel(\"Lag (s)\")\n ax.set_ylabel(\"Normalised amplitude\")\n ax.set_xlim(-MAXLAG, MAXLAG)\n 
ax.legend(loc=\"upper right\", fontsize=9)\n\n plt.tight_layout()\n outfig2 = os.path.join(fig_dir, \"cc_vs_pcc_overlay.png\")\n fig2.savefig(outfig2, dpi=150, bbox_inches=\"tight\")\n plt.show()\n print(f\"Saved: {outfig2}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 15 \u00b7 Summary Table\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\n\nrows = []\nfor key in common_keys:\n pair, comp, ms = key\n cc_arr = ccfs_cc[key].values\n pcc_arr = ccfs_pcc[key].values\n if cc_arr.ndim > 1: cc_arr = cc_arr.mean(axis=0)\n if pcc_arr.ndim > 1: pcc_arr = pcc_arr.mean(axis=0)\n rows.append({\n \"Pair\": pair,\n \"Component\": comp,\n \"Mov. stack\": f\"{ms[0]}\u2013{ms[1]}\",\n \"CC peak\": f\"{np.abs(cc_arr).max():.4g}\",\n \"PCC peak\": f\"{np.abs(pcc_arr).max():.4g}\",\n # Peak position converted from samples to seconds via CC_SAMPLING_RATE\n \"CC lag@peak (s)\": f\"{(np.argmax(np.abs(cc_arr)) - len(cc_arr)//2) / CC_SAMPLING_RATE:.1f}\",\n \"PCC lag@peak (s)\": f\"{(np.argmax(np.abs(pcc_arr)) - len(pcc_arr)//2) / CC_SAMPLING_RATE:.1f}\",\n })\n\ndf = pd.DataFrame(rows)\nprint(df.to_string(index=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notes\n\n### Why PCC output looks different from CC\n\n* **CC** retains amplitude information: large-energy windows (earthquakes,\n transients) dominate the stack even after winsorising. The CCF contains\n the full filter-band content weighted by window energy.\n\n* **PCC** (v=2) projects every sample of the analytic signal onto the unit\n circle before correlating \u2014 amplitude is discarded per-sample. The output\n is bounded to [\u22121, 1] by construction. 
The PCC branch in s03:\n (a) bandpasses to the filter_N frequencies so each filter is distinct,\n (b) optionally whitens within the band (controlled by `whitening` config)\n to give a broadband phase signal (\"spectrally whitened PCC\"),\n (c) then calls `pcc_xcorr` which computes AmpNorm + FFT cross-correlation.\n\n* **Winsorising** is disabled for `cc_2`: it clips amplitude extremes, which\n is meaningless after AmpNorm has already equalised every sample to unit\n amplitude.\n\n* **Whitening = \"N\" variant**: set `whitening` to `\"N\"` on `cc_2` for\n classic (non-whitened) PCC. The result will be dominated by the most\n energetic frequency within the preprocessing bandpass \u2014 often visually\n narrower than CC. The default (`whitening=\"A\"`) gives a more\n spectrally balanced PCC that compares directly to CC.\n\n### Adjusting the comparison\n\n* Change `FREQMIN` / `FREQMAX` in the User Settings cell and re-run from\n \u00a76 onwards \u2014 both lineages share `filter_1`, so re-running the stack\n step is sufficient (no need to redo preprocessing or CC).\n* The `result_cc` and `result_pcc` objects expose the full MSNoiseResult\n API: call `.get_mwcs()`, `.get_dvv()`, etc. after running the downstream\n steps on each lineage.\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.15" } }, "nbformat": 4, "nbformat_minor": 0 }