Compute Cross-Correlations

Computation of inter-station and single-station cross-correlation functions.

This module implements the msnoise cc compute worker. It groups jobs marked **T**odo in the database by day, marks them **I**n-progress, processes all sliding windows and filters for that day, then marks them **D**one. Multiple instances can run in parallel because each worker atomically claims its own batch of jobs.

Overview

The worker operates on 2-D arrays of shape (n_stations, N) — one row per pre-processed trace, one column per sample — and processes all station pairs simultaneously for each time window. Two independent correlation algorithms are available:

  • CC — classic Generalised Normalised Cross-Correlation (GNCC), computed in the frequency domain via FFT (see Cross-Correlation (CC) below).

  • PCC — Phase Cross-Correlation v2 (PCC2), a transient-robust alternative that operates in the time domain before taking any FFT (see Phase Cross-Correlation (PCC2) below).

The algorithm is selected per correlation mode through three independent configuration parameters: cc_type (inter-station CC), cc_type_single_station_AC (auto-correlation) and cc_type_single_station_SC (same-station cross-component).

Cross-Correlation (CC)

The classic ambient-noise cross-correlation (Shapiro & Campillo 2004; Bensen et al. 2007) is computed using the tiled-batch implementation myCorr2().

Shared pre-processing per time window (CC and SC modes — NOT AC):

  1. Temporal normalisation — optional Winsorising (clipping to winsorizing × RMS) applied either before whitening (default) or after (clip_after_whiten = Y).

  2. Time-domain taper — a cosine (Hann) taper of width cc_taper_fraction × N is applied symmetrically to both ends to suppress spectral leakage.

  3. Single bandpass per filter band — one bandpassed copy _data_bp is computed from the raw tapered data at the start of each filter iteration and reused by CC and SC modes. AC always uses _data_bp algorithms (CC, PCC). No mode bandpasses the data a second time.

Processing chain — CC algorithm:

For each requested pair \((i, j)\) in each filter band:

  • whitening = N_data_bp fed directly into the cross-spectrum:

    \[C_{ij}[k] = X_i^*[k] \cdot X_j[k], \quad X_i = \mathcal{F}\{x_i^\text{bp}[n]\}\]
  • whitening N — raw (non-bandpassed) data fed into FFT, then spectral whitening applied in-place by whiten2() (which itself bandlimits to [f_\text{low}, f_\text{high}], making a prior bandpass redundant):

    \[\tilde{X}_i[k] = \operatorname{whiten2}(\mathcal{F}\{x_i[n]\})\]

    Three whitening shapes are available via whitening_type:

    • Brutal (default) — |\tilde{X}[k]| = 1 inside the passband with a cosine-tapered transition.

    • HANN — Hann-weighted one-bit normalisation inside the passband.

    • PSD — divide by smoothed PSD, then clip outlier bins at the 5th–95th percentile.

The IFFT, folding and optional normalisation follow in all cases:

\[c_{ij}[\tau] = \mathcal{F}^{-1}\{C_{ij}[k]\} / N\]

Normalisation (cc_normalisation): POW — divide by the product of RMS energies \((e_i \cdot e_j)\); MAX — divide by the CCF maximum; ABSMAX — divide by the absolute maximum.

References

Bensen, G. D., Ritzwoller, M. H., Barmin, M. P., Levshin, A. L., Lin, F., Moschetti, M. P., Shapiro, N. M., & Yang, Y. (2007). Processing seismic ambient noise data to obtain reliable broad-band surface wave dispersion measurements. Geophysical Journal International, 169(3), 1239–1260. https://doi.org/10.1111/j.1365-246X.2007.03374.x

Shapiro, N. M., & Campillo, M. (2004). Emergence of broadband Rayleigh waves from correlations of the ambient seismic noise. Geophysical Research Letters, 31(7), L07614. https://doi.org/10.1029/2004GL019491

Phase Cross-Correlation (PCC2)

Phase Cross-Correlation (Schimmel 1999; Schimmel et al. 2011) is an alternative to the classic GNCC that discards amplitude information entirely at every sample, making it intrinsically robust against transient noise (earthquakes, instrumental glitches) without requiring explicit temporal normalisation such as one-bit or clipping. MSNoise implements PCC version 2 (PCC2), the FFT-accelerated formulation introduced by Ventosa et al. (2019, 2023) as part of the FastPCC package. The implementation is self-contained in pcc_xcorr() and does not depend on the phasecorr or fastpcc external packages.

Processing chain — PCC2 algorithm:

For each trace the shared _data_bp (already bandpassed to \([f_\text{low}, f_\text{high}]\)) is used directly:

  1. Optional spectral whitening — if whitening N, _data_bp is FFT-whitened within the band and transformed back to the time domain. This distributes energy evenly across the passband before the per-sample amplitude normalisation, so the resulting phase signal reflects broadband coherence rather than being dominated by the most energetic frequency.

  2. Analytic signal and amplitude normalisation — the complex analytic signal \(x_i^{(a)}[n]\) is computed via the Hilbert transform (scipy.signal.hilbert()). Each sample is divided by its own amplitude to produce the phase signal \(\varphi_i\):

    \[\varphi_i[n] = \frac{x_i^{(a)}[n]}{|x_i^{(a)}[n]| + \varepsilon}\]

    where \(\varepsilon = 10^{-6} \max_n |x_i^{(a)}[n]|\) is a numerical stability floor (matching FastPCC’s AmpNormf convention). By construction \(|\varphi_i[n]| \leq 1\), and amplitude transients (earthquakes, glitches) are reduced to exactly the same per-sample weight as the ambient noise — no explicit temporal normalisation is needed.

  3. Zero-padding — each phase signal is zero-padded to \(N_z = \text{next\_fast\_len}(N + \tau_\text{max})\) to compute a linear (non-circular) cross-correlation.

  4. FFT cross-spectrum and IFFT — all phase signals are pre-transformed once, then for each pair \((i, j)\):

    \[\text{PCC2}_{ij}[\tau] = \mathcal{F}^{-1}\!\left\{ \Phi_i^*[k] \cdot \Phi_j[k] \right\} \Big/ N\]

    Division by \(N\) (not \(N_z\)) gives a peak amplitude of approximately 1 for identical signals.

  5. Lag unwrapping — positive lags \(0 \ldots \tau_\text{max}\) come from full[0:\tau_\text{max}+1], negative lags \(-\tau_\text{max} \ldots -1\) from full[N_z-\tau_\text{max}:N_z].

  6. NormalisationMAX or ABSMAX as for CC; POW is silently ignored because amplitudes are discarded in step 2.

References

Schimmel, M. (1999). Phase cross-correlations: design, comparisons, and applications. Bulletin of the Seismological Society of America, 89(5), 1366–1378. https://doi.org/10.1785/BSSA0890051366

Schimmel, M., Stutzmann, E., & Gallart, J. (2011). Using instantaneous phase coherence for signal extraction from ambient noise data at a local to a global scale. Geophysical Journal International, 184(1), 494–506. https://doi.org/10.1111/j.1365-246X.2010.04861.x

Ventosa, S., Schimmel, M., & Stutzmann, E. (2019). Towards the processing of large data volumes with phase cross-correlation. Seismological Research Letters, 90(4), 1663–1669. https://doi.org/10.1785/0220190022

Ventosa, S., & Schimmel, M. (2023). FastPCC: Fast phase cross-correlation algorithm for large seismic datasets. IEEE Transactions on Geoscience and Remote Sensing, 61, 1–17. https://doi.org/10.1109/TGRS.2023.3294302

Stacking daily windows

For each corr_duration-long window, and for each configured filter, the CCF (CC or PCC2) is computed and accumulated. If keep_all = Y the individual window CCFs are written to disk. By default (keep_days = Y), all windows for the day are stacked into a single daily CCF using either a linear mean or Phase-Weighted Stack (PWS; Schimmel & Paulssen 1997):

Note

PWS is provided as an experimental option. It has not been systematically cross-validated. Use with caution.

Schimmel, M., & Paulssen, H. (1997). Noise reduction and detection of weak, coherent signals through phase-weighted stacks. Geophysical Journal International, 130(2), 497–505. https://doi.org/10.1111/j.1365-246X.1997.tb05664.x

Configuration Parameters

The following parameters (modifiable via msnoise admin) are used for this step:

  • cc.components_to_compute : List (comma separated) of components to compute between two different stations (default=ZZ)

  • cc.components_to_compute_single_station : List (comma separated) of components within a single station. ZZ would be the autocorrelation of Z component, while ZE or ZN are the cross-components. Defaults to [], no single-station computations are done. (default=)

  • cc.cc_sampling_rate : Sampling Rate for the CrossCorrelation (in Hz) (default=20.0)

  • cc.cc_normalisation : Normalisation method for individual corr_duration) CCFs, default is not normalized (NO), but can be normalized based on the power of the two traces (POW), or by the maximum (MAX) or absolute maximum (ABSMAX) of the CCF (default=NO)

  • cc.cc_type : Cross-Correlation type: CC for classical FFT-based correlation ; PCC for Phase Crosse Correlation (FastPCC’s PCC2, Ventosa et al) (default=CC)

  • cc.cc_type_single_station_AC : Auto-Correlation type (same as for cc_type) (default=CC)

  • cc.cc_type_single_station_SC : Cross-Correlation type for Cross-Components (same as for cc_type: CC or PCC) (default=CC)

  • cc.cc_taper_fraction : Fraction of each trace to taper (cosine) before cross-correlation. Applied symmetrically to both ends. Standard seismological convention is 0.04 (4%). Range: 0.0 (no taper) to 0.5 (full Hann window). (default=0.04)

  • cc.clip_after_whiten : Do the clipping (winsorizing) after whitening? (default=N)

  • cc.overlap : Amount of overlap between data windows [0:1] (default=0.0)

  • cc.maxlag : Maximum lag (in seconds) (default=120.0)

  • cc.corr_duration : Data windows to correlate (in seconds) (default=1800.0)

  • cc.winsorizing : Winsorizing at N time RMS , 0 disables winsorizing, -1 enables 1-bit normalization (default=3.0)

  • cc.whitening : Whiten Traces before cross-correlation: [A]ll (except for autocorr), [N]one, or only if [C]omponents are different (default=A)

  • cc.whitening_type : Type of spectral whitening function to use: [B]rutal (amplitude to 1.0), divide spectrum by its [PSD] or by band-passing the white spectrum with a hanning-window [HANN]. WARNING: only works for compute_cc, not compute_cc_rot, where it will always be B (default=B)

  • cc.keep_all : Keep all cross-corr (length: corr_duration) (default=Y)

  • cc.keep_days : Keep all daily cross-corr (default=Y)

  • cc.stack_method : Stack Method: Linear Mean or Phase Weighted Stack (default=linear)

  • cc.pws_timegate : If stack_method =’pws’, width of the smoothing (in seconds) (default=10.0)

  • cc.pws_power : If stack_method =”pws”, Power of the Weighting (default=2.0)

  • global.hpc : Is MSNoise going to run on an HPC? (default=N)

Computing the Cross-Correlations

The following two diagrams describe the complete execution flow of msnoise cc compute_cc. Both are rendered from the DOT source embedded below, so they stay in sync with the code automatically.

Figure 1 — Outer job loop shows how days are consumed, time windows slid, and per-filter results accumulated and saved.

digraph cc_outer { graph [ rankdir=LR fontname="Helvetica,Arial,sans-serif" fontsize=12 nodesep=0.4 ranksep=0.6 bgcolor="white" pad=0.4 label="Figure 1 – Outer job loop (s03_compute_no_rotation)" labelloc=t labeljust=l ] node [fontname="Helvetica,Arial,sans-serif" fontsize=11 style=filled penwidth=1.5] edge [fontname="Helvetica,Arial,sans-serif" fontsize=10 color="#444444" penwidth=1.2] node [shape=oval fillcolor="#263238" fontcolor=white color="#263238"] START [label="begin"] END [label="end"] node [shape=diamond fillcolor="#E3F2FD" fontcolor="#0D1B2A" color="#1565C0"] D_JOB [label="New CC\njob?"] D_SLIDE [label="Next time\nwindow?"] D_GAPS [label="Any trace\nhas gaps?"] D_SHORT [label="All traces\ntoo short?"] D_WTYPE [label="whitening_type\n== PSD?"] D_WMODE [label="whitening\n!= N?"] D_FILTER[label="Next filter\nband?"] node [shape=diamond fillcolor="#E8EAF6" fontcolor="#0D1B2A" color="#283593"] D_KEEPALL [label="keep_all?"] D_KEEPDAY [label="keep_days?"] node [shape=box fillcolor="#FAFAFA" fontcolor="#0D1B2A" color="#607D8B" style="filled,rounded"] P_STREAM [label="Load preprocessed\nstreams for day"] P_SLIDE [label="Slide window\n(corr_duration, overlap)"] P_RMGAP [label="Remove gapped\ntraces"] P_PREP [label="Detrend / demean\nWinsorise (if !clip_after_whiten)\nCosine taper (cc_taper_fraction %)"] P_PSD [label="Precompute Welch\nPSD per station"] P_BP_N [label="Bandpass _data\n[f_low, f_high]"] P_FIDX [label="Build pair indices\n(AC · CC · SC)"] P_ACC [label="Accumulate window CCF\ninto allcorr dict"] node [shape=box fillcolor="#FFF8E1" fontcolor="#0D1B2A" color="#F9A825" style="filled,rounded,bold"] P_PERFILTER [label="▶ Per-filter processing\n(see Figure 2)"] node [shape=box fillcolor="#E8EAF6" fontcolor="#0D1B2A" color="#283593" style="filled,rounded"] P_KEEPALL_S [label="Save all-window CCFs\n(xr_save_ccf_all)"] P_STACK [label="Stack windows\n(linear or PWS)"] P_SAVEDAY [label="Save daily CCF\n(xr_save_ccf_daily)"] P_JOBS [label="massive_update_job → Done\npropagate_downstream"] // Force two rows via an invisible break node node [shape=point width=0 height=0 style=invis] BREAK // Row 1: start → window loop → prep/whitening { rank=same; START; D_JOB; END } { rank=same; P_STREAM; D_SLIDE } { rank=same; P_SLIDE; P_RMGAP; D_GAPS; D_SHORT } { rank=same; P_PREP; D_WTYPE; P_PSD } // Row break sentinel — pin BREAK at same rank as P_FIDX // Row 2: filter loop → per-filter → save { rank=same; P_FIDX; D_FILTER } { rank=same; BREAK; D_WMODE; P_BP_N; P_PERFILTER; P_ACC } { rank=same; D_KEEPALL; P_KEEPALL_S; D_KEEPDAY; P_STACK; P_SAVEDAY; P_JOBS } // Invisible structural edge to force rank break after row 1 D_WTYPE -> BREAK [style=invis] BREAK -> D_WMODE [style=invis] START -> D_JOB D_JOB -> END [label="no"] D_JOB -> P_STREAM [label="yes"] P_STREAM -> D_SLIDE D_SLIDE -> D_KEEPALL [label="no more\nwindows" constraint=false] D_SLIDE -> P_SLIDE [label="yes"] P_SLIDE -> P_RMGAP P_RMGAP -> D_GAPS D_GAPS -> D_SLIDE [label="yes" constraint=false] D_GAPS -> D_SHORT [label="no"] D_SHORT -> D_SLIDE [label="yes" constraint=false] D_SHORT -> P_PREP [label="no"] P_PREP -> D_WTYPE D_WTYPE -> P_PSD [label="yes (PSD)"] D_WTYPE -> P_FIDX [label="no"] P_PSD -> P_FIDX P_FIDX -> D_FILTER D_FILTER -> D_KEEPALL [label="no more filters" constraint=false] D_FILTER -> D_WMODE [label="yes"] D_WMODE -> P_BP_N [label="no\n(whitening=N)"] D_WMODE -> P_PERFILTER [label="yes"] P_BP_N -> P_PERFILTER P_PERFILTER -> P_ACC P_ACC -> D_FILTER [constraint=false] D_KEEPALL -> P_KEEPALL_S [label="yes"] D_KEEPALL -> D_KEEPDAY [label="no"] P_KEEPALL_S -> D_KEEPDAY D_KEEPDAY -> P_STACK [label="yes"] D_KEEPDAY -> P_JOBS [label="no"] P_STACK -> P_SAVEDAY P_SAVEDAY -> P_JOBS P_JOBS -> D_JOB [constraint=false] }

Figure 2 — Per-filter processing shows the three parallel correlation modes (AC, CC, SC) and the CC/PCC2 algorithm branch within each. The entry node carries _data_bp (bandpassed once per filter iteration, shared by all modes) and _data_raw (used only by CC/SC with whitening != N, where whiten2 applies the spectral window itself).

digraph cc_perfilter { graph [ rankdir=TB fontname="Helvetica,Arial,sans-serif" fontsize=12 nodesep=0.4 ranksep=0.5 bgcolor="white" pad=0.4 label="Figure 2 – Per-filter processing (each time window × filter band)" labelloc=t labeljust=l ] node [fontname="Helvetica,Arial,sans-serif" fontsize=11 style=filled penwidth=1.5] edge [fontname="Helvetica,Arial,sans-serif" fontsize=10 color="#444444" penwidth=1.2] node [shape=oval fillcolor="#607D8B" fontcolor=white color="#607D8B"] ENTRY [label="enter\n_data_bp (bandpassed once)\n_data_raw (for CC+whiten path)"] EXIT [label="return\n(to outer loop)"] node [shape=box fillcolor="#F0F4F8" fontcolor="#0D1B2A" color="#607D8B" style="filled,rounded"] P_ACC [label="Accumulate CCF\ninto allcorr"] subgraph cluster_ac { label="Auto-Correlation (AC)" labelloc=t color="#F9A825" style=rounded penwidth=2.0 node [shape=diamond fillcolor="#FFF9C4" fontcolor="#0D1B2A" color="#F9A825"] D_AC [label="AC pairs?"] D_ACTYPE [label="cc_type\nsingle_AC?"] D_CAW_AC [label="clip_after\nwhiten?"] node [shape=box fillcolor="#FFFDE7" fontcolor="#0D1B2A" color="#F9A825" style="filled,rounded"] P_AC_CAW [label="Winsorise (time-series)"] node [shape=box fillcolor="#FFF9C4" fontcolor="#0D1B2A" color="#F9A825" style="filled,rounded"] P_AC_CC_F [label="FFT → nfft"] P_AC_CC_W [label="whiten2 (inplace)\nif whitening != N"] P_AC_CC_E [label="Compute RMS energy"] P_AC_CC_C [label="myCorr2\n(IFFT · fold · normalise)"] P_AC_PCC_W [label="FFT → whiten2 → IFFT\n(optional, whitening != N)"] P_AC_PCC_C [label="pcc_xcorr\nHilbert → φ(t) → FFT\ncross-spec → IFFT / N"] } subgraph cluster_cc { label="Cross-Correlation (CC)" labelloc=t color="#2E7D32" style=rounded penwidth=2.0 node [shape=diamond fillcolor="#E8F5E9" fontcolor="#0D1B2A" color="#2E7D32"] D_CC [label="CC pairs?"] D_CCTYPE [label="cc_type?"] D_CAW_CC [label="clip_after\nwhiten?"] D_CC_WN [label="whitening\n!= N?"] node [shape=box fillcolor="#E8F5E9" fontcolor="#0D1B2A" color="#2E7D32" style="filled,rounded"] P_CC_RAW [label="FFT(_data_raw)\n→ whiten2 (inplace)"] P_CC_BP_F [label="FFT(_data_bp)"] P_CC_CAW [label="Winsorise (FFT domain)"] P_CC_E [label="Compute RMS energy"] P_CC_CORR [label="myCorr2\n(IFFT · fold · normalise)"] P_CC_PW [label="FFT → whiten2 → IFFT\n(optional, whitening != N)"] P_CC_PCORR [label="pcc_xcorr\nHilbert → φ(t) → FFT\ncross-spec → IFFT / N"] } subgraph cluster_sc { label="Same-station Cross-Component (SC)" labelloc=t color="#6A1B9A" style=rounded penwidth=2.0 node [shape=diamond fillcolor="#F3E5F5" fontcolor="#0D1B2A" color="#6A1B9A"] D_SC [label="SC pairs?"] D_SCTYPE [label="cc_type\nsingle_SC?"] D_CAW_SC [label="clip_after\nwhiten?"] D_SC_WN [label="whitening\n!= N?"] node [shape=box fillcolor="#F3E5F5" fontcolor="#0D1B2A" color="#6A1B9A" style="filled,rounded"] P_SC_RAW [label="FFT(_data_raw)\n→ whiten2 (inplace)"] P_SC_BP_F [label="FFT(_data_bp)"] P_SC_CAW [label="Winsorise (FFT domain)"] P_SC_E [label="Compute RMS energy"] P_SC_CORR [label="myCorr2\n(IFFT · fold · normalise)"] P_SC_PW [label="FFT → whiten2 → IFFT\n(optional, whitening != N)"] P_SC_PCORR [label="pcc_xcorr\nHilbert → φ(t) → FFT\ncross-spec → IFFT / N"] } ENTRY -> D_AC // AC D_AC -> D_CC [label="no"] D_AC -> D_CAW_AC [label="yes"] D_CAW_AC -> P_AC_CAW [label="yes"] D_CAW_AC -> D_ACTYPE [label="no"] P_AC_CAW -> D_ACTYPE D_ACTYPE -> P_AC_CC_F [label="CC"] D_ACTYPE -> P_AC_PCC_W [label="PCC"] P_AC_CC_F -> P_AC_CC_W P_AC_CC_W -> P_AC_CC_E P_AC_CC_E -> P_AC_CC_C P_AC_PCC_W -> P_AC_PCC_C P_AC_CC_C -> P_ACC P_AC_PCC_C -> P_ACC P_ACC -> D_CC // CC D_CC -> D_SC [label="no"] D_CC -> D_CCTYPE [label="yes"] D_CCTYPE -> D_CC_WN [label="CC"] D_CCTYPE -> P_CC_PW [label="PCC"] D_CC_WN -> P_CC_RAW [label="yes\n(_data_raw)"] D_CC_WN -> P_CC_BP_F [label="no\n(_data_bp)"] P_CC_RAW -> D_CAW_CC P_CC_BP_F -> D_CAW_CC D_CAW_CC -> P_CC_CAW [label="yes"] D_CAW_CC -> P_CC_E [label="no"] P_CC_CAW -> P_CC_E P_CC_E -> P_CC_CORR P_CC_PW -> P_CC_PCORR P_CC_CORR -> P_ACC P_CC_PCORR -> P_ACC P_ACC -> D_SC // SC D_SC -> EXIT [label="no"] D_SC -> D_SCTYPE [label="yes"] D_SCTYPE -> D_SC_WN [label="CC"] D_SCTYPE -> P_SC_PW [label="PCC"] D_SC_WN -> P_SC_RAW [label="yes\n(_data_raw)"] D_SC_WN -> P_SC_BP_F [label="no\n(_data_bp)"] P_SC_RAW -> D_CAW_SC P_SC_BP_F -> D_CAW_SC D_CAW_SC -> P_SC_CAW [label="yes"] D_CAW_SC -> P_SC_E [label="no"] P_SC_CAW -> P_SC_E P_SC_E -> P_SC_CORR P_SC_PW -> P_SC_PCORR P_SC_CORR -> P_ACC P_SC_PCORR -> P_ACC P_ACC -> EXIT [constraint=false] }

Usage

To run this script:

$ msnoise cc compute_cc

This step also supports parallel processing/threading:

$ msnoise -t 4 cc compute_cc

will start 4 instances of the code (after 1 second delay to avoid database conflicts). This works both with SQLite and MySQL but be aware problems could occur with SQLite.

See also

Reading these results in Python — use MSNoiseResult:

from msnoise.results import MSNoiseResult
from msnoise.core.db import connect
db = connect()
r = MSNoiseResult.from_ids(db, ...)  # include the steps you need
# then call r.get_ccf_raw(...)

See Reading outputs with MSNoiseResult for the full guide and all available methods.