Compute Cross-Correlations
Computation of inter-station and single-station cross-correlation functions.
This module implements the msnoise cc compute worker. It groups jobs
marked **T**odo in the database by day, marks them **I**n-progress, processes
all sliding windows and filters for that day, then marks them **D**one.
Multiple instances can run in parallel because each worker atomically claims
its own batch of jobs.
Overview
The worker operates on 2-D arrays of shape (n_stations, N) — one row per
pre-processed trace, one column per sample — and processes all station pairs
simultaneously for each time window. Two independent correlation algorithms
are available:
CC — classic Generalised Normalised Cross-Correlation (GNCC), computed in the frequency domain via FFT (see Cross-Correlation (CC) below).
PCC — Phase Cross-Correlation v2 (PCC2), a transient-robust alternative that operates in the time domain before taking any FFT (see Phase Cross-Correlation (PCC2) below).
The algorithm is selected per correlation mode through three independent
configuration parameters: cc_type (inter-station CC),
cc_type_single_station_AC (auto-correlation) and
cc_type_single_station_SC (same-station cross-component).
Cross-Correlation (CC)
The classic ambient-noise cross-correlation (Shapiro and Campillo[1];
Bensen et al.[2]) is computed using the tiled-batch implementation
myCorr2().
Shared pre-processing per time window (CC and SC modes — NOT AC):
Temporal normalisation — optional Winsorising (clipping to
winsorizing× RMS) applied either before whitening (default) or after (clip_after_whiten = Y).Time-domain taper — a cosine (Hann) taper of width
cc_taper_fraction × Nis applied symmetrically to both ends to suppress spectral leakage.Single bandpass per filter band — one bandpassed copy
_data_bpis computed from the raw tapered data at the start of each filter iteration and reused by CC and SC modes. AC always uses _data_bp algorithms (CC, PCC). No mode bandpasses the data a second time.
Processing chain — CC algorithm:
For each requested pair \((i, j)\) in each filter band:
whitening = N—_data_bpfed directly into the cross-spectrum:\[C_{ij}[k] = X_i^*[k] \cdot X_j[k], \quad X_i = \mathcal{F}\{x_i^\text{bp}[n]\}\]whitening ≠ N— raw (non-bandpassed) data fed into FFT, then spectral whitening applied in-place bywhiten2()(which itself bandlimits to[f_\text{low}, f_\text{high}], making a prior bandpass redundant):\[\tilde{X}_i[k] = \operatorname{whiten2}(\mathcal{F}\{x_i[n]\})\]Three whitening shapes are available via
whitening_type:Brutal (default) —
|\tilde{X}[k]| = 1inside the passband with a cosine-tapered transition.HANN — Hann-weighted one-bit normalisation inside the passband.
PSD — divide by smoothed PSD, then clip outlier bins at the 5th–95th percentile.
The IFFT, folding and optional normalisation follow in all cases:
Normalisation (cc_normalisation): POW — divide by the
product of RMS energies \((e_i \cdot e_j)\); MAX — divide by
the CCF maximum; ABSMAX — divide by the absolute maximum.
References
Phase Cross-Correlation (PCC2)
Phase Cross-Correlation (Schimmel[3]; Schimmel et al.[4]) is an
alternative to the classic GNCC that discards amplitude information entirely
at every sample, making it intrinsically robust against transient noise
(earthquakes, instrumental glitches) without requiring explicit temporal
normalisation such as one-bit or clipping. MSNoise implements PCC version 2
(PCC2), the FFT-accelerated formulation introduced by Ventosa et al.[5] and
Ventosa and Schimmel[6] as part of the FastPCC package. The implementation is self-contained
in pcc_xcorr() and does not depend on the
phasecorr or fastpcc external packages.
Processing chain — PCC2 algorithm:
For each trace the shared _data_bp (already bandpassed to
\([f_\text{low}, f_\text{high}]\)) is used directly:
Optional spectral whitening — if
whitening ≠ N,_data_bpis FFT-whitened within the band and transformed back to the time domain. This distributes energy evenly across the passband before the per-sample amplitude normalisation, so the resulting phase signal reflects broadband coherence rather than being dominated by the most energetic frequency.Analytic signal and amplitude normalisation — the complex analytic signal \(x_i^{(a)}[n]\) is computed via the Hilbert transform (
scipy.signal.hilbert()). Each sample is divided by its own amplitude to produce the phase signal \(\varphi_i\):\[\varphi_i[n] = \frac{x_i^{(a)}[n]}{|x_i^{(a)}[n]| + \varepsilon}\]where \(\varepsilon = 10^{-6} \max_n |x_i^{(a)}[n]|\) is a numerical stability floor (matching FastPCC’s
AmpNormfconvention). By construction \(|\varphi_i[n]| \leq 1\), and amplitude transients (earthquakes, glitches) are reduced to exactly the same per-sample weight as the ambient noise — no explicit temporal normalisation is needed.Zero-padding — each phase signal is zero-padded to \(N_z = \text{next\_fast\_len}(N + \tau_\text{max})\) to compute a linear (non-circular) cross-correlation.
FFT cross-spectrum and IFFT — all phase signals are pre-transformed once, then for each pair \((i, j)\):
\[\text{PCC2}_{ij}[\tau] = \mathcal{F}^{-1}\!\left\{ \Phi_i^*[k] \cdot \Phi_j[k] \right\} \Big/ N\]Division by \(N\) (not \(N_z\)) gives a peak amplitude of approximately 1 for identical signals.
Lag unwrapping — positive lags \(0 \ldots \tau_\text{max}\) come from
full[0:\tau_\text{max}+1], negative lags \(-\tau_\text{max} \ldots -1\) fromfull[N_z-\tau_\text{max}:N_z].Normalisation —
MAXorABSMAXas for CC;POWis silently ignored because amplitudes are discarded in step 2.
References
Same-Station Cross-Component (SC)
When components_to_compute_single_station contains cross-component pairs
(e.g. ZN, ZE, NE) with station indices \(i = j\) but different
component indices, MSNoise computes the cross-correlation between two components
of the same sensor. SC was introduced for seismic monitoring by
De Plaen et al.[7], who showed it outperforms inter-station CC when only
a single broadband station is available:
where \(\alpha, \beta \in \{Z, N, E\}\) are distinct component labels at the same station \(i\). The SC function reconstructs an approximation to the Green’s function between two virtual sensors co-located at the same site but oriented differently, making it sensitive to surface-wave velocity changes in the shallow crust.
The algorithm (CC or PCC2) is selected independently via
cc_type_single_station_SC.
References
Auto-Correlation (AC)
When components_to_compute_single_station includes same-component pairs
(e.g. ZZ) and station indices satisfy \(i = j\), MSNoise computes
the zero-offset autocorrelation function (ACF). The ACF is the
zero-lag cross-correlation of a trace with a time-delayed copy of itself:
which is equivalent to the inverse Fourier transform of the power spectrum \(|X_i(\nu)|^2\). The ACF provides a zero-offset reflection response beneath the station — the same signal that would be recorded if the station were both source and receiver (Romero and Schimmel[8]). AC was used for volcano monitoring alongside SC by De Plaen et al.[7], and the combination of AC with PCC was demonstrated at Mt. Etna by De Plaen et al.[9].
The algorithm (CC or PCC2) is selected independently for AC via
cc_type_single_station_AC. PCC2 is strongly recommended for AC because
it suppresses the zero-lag spike inherent to the classical autocorrelogram
and is insensitive to amplitude transients without requiring explicit
temporal normalisation.
Note
Spectral whitening must not be applied before AC (set
whitening = N when cc_type_single_station_AC = PCC). Whitening
before autocorrelation collapses the ACF to a sinc function, destroying
all structural information.
References
Stacking daily windows
For each corr_duration-long window, and for each configured filter, the
CCF (CC or PCC2) is computed and accumulated. If keep_all = Y the
individual window CCFs are written to disk. By default (keep_days = Y),
all windows for the day are stacked into a single daily CCF using
linear, pws (Schimmel and Paulssen[10]), or tfpws
(Schimmel and Gallart[11]) — controlled by stack_method.
Note
PWS and tf-PWS are provided as experimental options. They have not been systematically cross-validated for CC/PCC daily stacking. Use with caution.
Configuration Parameters
The following parameters (modifiable via msnoise admin) are used for
this step:
cc.components_to_compute: List (comma separated) of components to compute between two different stations (default=ZZ)cc.components_to_compute_single_station: List (comma separated) of components within a single station. ZZ would be the autocorrelation of Z component, while ZE or ZN are the cross-components. Defaults to [], no single-station computations are done. (default=)cc.cc_sampling_rate: Sampling Rate for the CrossCorrelation (in Hz) (default=20.0)cc.cc_normalisation: Normalisation method for individual corr_duration) CCFs, default is not normalized (NO), but can be normalized based on the power of the two traces (POW), or by the maximum (MAX) or absolute maximum (ABSMAX) of the CCF (default=NO)cc.cc_type: Cross-Correlation type: CC for classical FFT-based correlation ; PCC for Phase Crosse Correlation (FastPCC’s PCC2, Ventosa et al) (default=CC)cc.cc_type_single_station_AC: Auto-Correlation type (same as for cc_type) (default=CC)cc.cc_type_single_station_SC: Cross-Correlation type for Cross-Components (same as for cc_type: CC or PCC) (default=CC)cc.cc_taper_fraction: Fraction of each trace to taper (cosine) before cross-correlation. Applied symmetrically to both ends. Standard seismological convention is 0.04 (4%). Range: 0.0 (no taper) to 0.5 (full Hann window). (default=0.04)cc.clip_after_whiten: Do the clipping (winsorizing) after whitening? (default=N)cc.overlap: Amount of overlap between data windows [0:1] (default=0.0)cc.maxlag: Maximum lag (in seconds) (default=120.0)cc.corr_duration: Data windows to correlate (in seconds) (default=1800.0)cc.winsorizing: Winsorizing at N time RMS , 0 disables winsorizing, -1 enables 1-bit normalization (default=3.0)cc.whitening: Whiten Traces before cross-correlation: [A]ll (except for autocorr), [N]one, or only if [C]omponents are different (default=A)cc.whitening_type: Type of spectral whitening function to use: [B]rutal (amplitude to 1.0), divide spectrum by its [PSD] or by band-passing the white spectrum with a hanning-window [HANN]. WARNING: only works for compute_cc, not compute_cc_rot, where it will always be B (default=B)cc.keep_all: Keep all cross-corr (length:corr_duration) (default=Y)cc.keep_days: Keep all daily cross-corr (default=Y)cc.stack_method: Stack Method: Linear Mean or Phase Weighted Stack or Time-Frequency Phase Weighted Stack (default=linear)cc.pws_timegate: Ifstack_method=’pws’, width of the smoothing (in seconds) (default=10.0)cc.pws_power: Ifstack_methodispwsortfpws, power of the coherence weight (default=2.0)global.hpc: Is MSNoise going to run on an HPC? (default=N)
Computing the Cross-Correlations
The following two diagrams describe the complete execution flow of
msnoise cc compute_cc. Both are rendered from the DOT source embedded
below, so they stay in sync with the code automatically.
Figure 1 — Outer job loop shows how days are consumed, time windows slid, and per-filter results accumulated and saved.
digraph cc_outer { graph [ rankdir=LR fontname="Helvetica,Arial,sans-serif" fontsize=12 nodesep=0.4 ranksep=0.6 bgcolor="white" pad=0.4 label="Figure 1 – Outer job loop (s03_compute_no_rotation)" labelloc=t labeljust=l ] node [fontname="Helvetica,Arial,sans-serif" fontsize=11 style=filled penwidth=1.5] edge [fontname="Helvetica,Arial,sans-serif" fontsize=10 color="#444444" penwidth=1.2] node [shape=oval fillcolor="#263238" fontcolor=white color="#263238"] START [label="begin"] END [label="end"] node [shape=diamond fillcolor="#E3F2FD" fontcolor="#0D1B2A" color="#1565C0"] D_JOB [label="New CC\njob?"] D_SLIDE [label="Next time\nwindow?"] D_GAPS [label="Any trace\nhas gaps?"] D_SHORT [label="All traces\ntoo short?"] D_WTYPE [label="whitening_type\n== PSD?"] D_WMODE [label="whitening\n!= N?"] D_FILTER[label="Next filter\nband?"] node [shape=diamond fillcolor="#E8EAF6" fontcolor="#0D1B2A" color="#283593"] D_KEEPALL [label="keep_all?"] D_KEEPDAY [label="keep_days?"] node [shape=box fillcolor="#FAFAFA" fontcolor="#0D1B2A" color="#607D8B" style="filled,rounded"] P_STREAM [label="Load preprocessed\nstreams for day"] P_SLIDE [label="Slide window\n(corr_duration, overlap)"] P_RMGAP [label="Remove gapped\ntraces"] P_PREP [label="Detrend / demean\nWinsorise (if !clip_after_whiten)\nCosine taper (cc_taper_fraction %)"] P_PSD [label="Precompute Welch\nPSD per station"] P_BP_N [label="Bandpass _data\n[f_low, f_high]"] P_FIDX [label="Build pair indices\n(AC · CC · SC)"] P_ACC [label="Accumulate window CCF\ninto allcorr dict"] node [shape=box fillcolor="#FFF8E1" fontcolor="#0D1B2A" color="#F9A825" style="filled,rounded,bold"] P_PERFILTER [label="▶ Per-filter processing\n(see Figure 2)"] node [shape=box fillcolor="#E8EAF6" fontcolor="#0D1B2A" color="#283593" style="filled,rounded"] P_KEEPALL_S [label="Save all-window CCFs\n(xr_save_ccf_all)"] P_STACK [label="Stack windows\n(linear or PWS)"] P_SAVEDAY [label="Save daily CCF\n(xr_save_ccf_daily)"] P_JOBS [label="massive_update_job → Done\npropagate_downstream"] // Force two rows via an invisible break node node [shape=point width=0 height=0 style=invis] BREAK // Row 1: start → window loop → prep/whitening { rank=same; START; D_JOB; END } { rank=same; P_STREAM; D_SLIDE } { rank=same; P_SLIDE; P_RMGAP; D_GAPS; D_SHORT } { rank=same; P_PREP; D_WTYPE; P_PSD } // Row break sentinel — pin BREAK at same rank as P_FIDX // Row 2: filter loop → per-filter → save { rank=same; P_FIDX; D_FILTER } { rank=same; BREAK; D_WMODE; P_BP_N; P_PERFILTER; P_ACC } { rank=same; D_KEEPALL; P_KEEPALL_S; D_KEEPDAY; P_STACK; P_SAVEDAY; P_JOBS } // Invisible structural edge to force rank break after row 1 D_WTYPE -> BREAK [style=invis] BREAK -> D_WMODE [style=invis] START -> D_JOB D_JOB -> END [label="no"] D_JOB -> P_STREAM [label="yes"] P_STREAM -> D_SLIDE D_SLIDE -> D_KEEPALL [label="no more\nwindows" constraint=false] D_SLIDE -> P_SLIDE [label="yes"] P_SLIDE -> P_RMGAP P_RMGAP -> D_GAPS D_GAPS -> D_SLIDE [label="yes" constraint=false] D_GAPS -> D_SHORT [label="no"] D_SHORT -> D_SLIDE [label="yes" constraint=false] D_SHORT -> P_PREP [label="no"] P_PREP -> D_WTYPE D_WTYPE -> P_PSD [label="yes (PSD)"] D_WTYPE -> P_FIDX [label="no"] P_PSD -> P_FIDX P_FIDX -> D_FILTER D_FILTER -> D_KEEPALL [label="no more filters" constraint=false] D_FILTER -> D_WMODE [label="yes"] D_WMODE -> P_BP_N [label="no\n(whitening=N)"] D_WMODE -> P_PERFILTER [label="yes"] P_BP_N -> P_PERFILTER P_PERFILTER -> P_ACC P_ACC -> D_FILTER [constraint=false] D_KEEPALL -> P_KEEPALL_S [label="yes"] D_KEEPALL -> D_KEEPDAY [label="no"] P_KEEPALL_S -> D_KEEPDAY D_KEEPDAY -> P_STACK [label="yes"] D_KEEPDAY -> P_JOBS [label="no"] P_STACK -> P_SAVEDAY P_SAVEDAY -> P_JOBS P_JOBS -> D_JOB [constraint=false] }Figure 2 — Per-filter processing shows the three parallel correlation
modes (AC, CC, SC) and the CC/PCC2 algorithm branch within each. The entry
node carries _data_bp (bandpassed once per filter iteration, shared by
all modes) and _data_raw (used only by CC/SC with whitening != N,
where whiten2 applies the spectral window itself).
Usage
To run this script:
$ msnoise cc compute_cc
This step also supports parallel processing/threading:
$ msnoise -t 4 cc compute_cc
will start 4 instances of the code (after 1 second delay to avoid database conflicts). This works both with SQLite and MySQL but be aware problems could occur with SQLite.
See also
Reading these results in Python — use MSNoiseResult:
from msnoise.results import MSNoiseResult
from msnoise.core.db import connect
db = connect()
r = MSNoiseResult.from_ids(db, ...) # include the steps you need
# then call r.get_ccf_raw(...)
See Reading outputs with MSNoiseResult for the full guide and all available methods.