Compute Cross-Correlations
Computation of inter-station and single-station cross-correlation functions.
This module implements the msnoise cc compute worker. It groups jobs
marked **T**odo in the database by day, marks them **I**n-progress, processes
all sliding windows and filters for that day, then marks them **D**one.
Multiple instances can run in parallel because each worker atomically claims
its own batch of jobs.
Overview
The worker operates on 2-D arrays of shape (n_stations, N) — one row per
pre-processed trace, one column per sample — and processes all station pairs
simultaneously for each time window. Two independent correlation algorithms
are available:
CC — classic Generalised Normalised Cross-Correlation (GNCC), computed in the frequency domain via FFT (see Cross-Correlation (CC) below).
PCC — Phase Cross-Correlation v2 (PCC2), a transient-robust alternative that operates in the time domain before taking any FFT (see Phase Cross-Correlation (PCC2) below).
The algorithm is selected per correlation mode through three independent
configuration parameters: cc_type (inter-station CC),
cc_type_single_station_AC (auto-correlation) and
cc_type_single_station_SC (same-station cross-component).
Cross-Correlation (CC)
The classic ambient-noise cross-correlation (Shapiro & Campillo 2004;
Bensen et al. 2007) is computed using the tiled-batch implementation
myCorr2().
Shared pre-processing per time window (CC and SC modes — NOT AC):
Temporal normalisation — optional Winsorising (clipping to
winsorizing× RMS) applied either before whitening (default) or after (clip_after_whiten = Y).Time-domain taper — a cosine (Hann) taper of width
cc_taper_fraction × Nis applied symmetrically to both ends to suppress spectral leakage.Single bandpass per filter band — one bandpassed copy
_data_bpis computed from the raw tapered data at the start of each filter iteration and reused by CC and SC modes. AC always uses _data_bp algorithms (CC, PCC). No mode bandpasses the data a second time.
Processing chain — CC algorithm:
For each requested pair \((i, j)\) in each filter band:
whitening = N—_data_bpfed directly into the cross-spectrum:\[C_{ij}[k] = X_i^*[k] \cdot X_j[k], \quad X_i = \mathcal{F}\{x_i^\text{bp}[n]\}\]whitening ≠ N— raw (non-bandpassed) data fed into FFT, then spectral whitening applied in-place bywhiten2()(which itself bandlimits to[f_\text{low}, f_\text{high}], making a prior bandpass redundant):\[\tilde{X}_i[k] = \operatorname{whiten2}(\mathcal{F}\{x_i[n]\})\]Three whitening shapes are available via
whitening_type:Brutal (default) —
|\tilde{X}[k]| = 1inside the passband with a cosine-tapered transition.HANN — Hann-weighted one-bit normalisation inside the passband.
PSD — divide by smoothed PSD, then clip outlier bins at the 5th–95th percentile.
The IFFT, folding and optional normalisation follow in all cases:
Normalisation (cc_normalisation): POW — divide by the
product of RMS energies \((e_i \cdot e_j)\); MAX — divide by
the CCF maximum; ABSMAX — divide by the absolute maximum.
References
Bensen, G. D., Ritzwoller, M. H., Barmin, M. P., Levshin, A. L., Lin, F., Moschetti, M. P., Shapiro, N. M., & Yang, Y. (2007). Processing seismic ambient noise data to obtain reliable broad-band surface wave dispersion measurements. Geophysical Journal International, 169(3), 1239–1260. https://doi.org/10.1111/j.1365-246X.2007.03374.x
Shapiro, N. M., & Campillo, M. (2004). Emergence of broadband Rayleigh waves from correlations of the ambient seismic noise. Geophysical Research Letters, 31(7), L07614. https://doi.org/10.1029/2004GL019491
Phase Cross-Correlation (PCC2)
Phase Cross-Correlation (Schimmel 1999; Schimmel et al. 2011) is an
alternative to the classic GNCC that discards amplitude information entirely
at every sample, making it intrinsically robust against transient noise
(earthquakes, instrumental glitches) without requiring explicit temporal
normalisation such as one-bit or clipping. MSNoise implements PCC version 2
(PCC2), the FFT-accelerated formulation introduced by Ventosa et al. (2019,
2023) as part of the FastPCC package. The implementation is self-contained
in pcc_xcorr() and does not depend on the
phasecorr or fastpcc external packages.
Processing chain — PCC2 algorithm:
For each trace the shared _data_bp (already bandpassed to
\([f_\text{low}, f_\text{high}]\)) is used directly:
Optional spectral whitening — if
whitening ≠ N,_data_bpis FFT-whitened within the band and transformed back to the time domain. This distributes energy evenly across the passband before the per-sample amplitude normalisation, so the resulting phase signal reflects broadband coherence rather than being dominated by the most energetic frequency.Analytic signal and amplitude normalisation — the complex analytic signal \(x_i^{(a)}[n]\) is computed via the Hilbert transform (
scipy.signal.hilbert()). Each sample is divided by its own amplitude to produce the phase signal \(\varphi_i\):\[\varphi_i[n] = \frac{x_i^{(a)}[n]}{|x_i^{(a)}[n]| + \varepsilon}\]where \(\varepsilon = 10^{-6} \max_n |x_i^{(a)}[n]|\) is a numerical stability floor (matching FastPCC’s
AmpNormfconvention). By construction \(|\varphi_i[n]| \leq 1\), and amplitude transients (earthquakes, glitches) are reduced to exactly the same per-sample weight as the ambient noise — no explicit temporal normalisation is needed.Zero-padding — each phase signal is zero-padded to \(N_z = \text{next\_fast\_len}(N + \tau_\text{max})\) to compute a linear (non-circular) cross-correlation.
FFT cross-spectrum and IFFT — all phase signals are pre-transformed once, then for each pair \((i, j)\):
\[\text{PCC2}_{ij}[\tau] = \mathcal{F}^{-1}\!\left\{ \Phi_i^*[k] \cdot \Phi_j[k] \right\} \Big/ N\]Division by \(N\) (not \(N_z\)) gives a peak amplitude of approximately 1 for identical signals.
Lag unwrapping — positive lags \(0 \ldots \tau_\text{max}\) come from
full[0:\tau_\text{max}+1], negative lags \(-\tau_\text{max} \ldots -1\) fromfull[N_z-\tau_\text{max}:N_z].Normalisation —
MAXorABSMAXas for CC;POWis silently ignored because amplitudes are discarded in step 2.
References
Schimmel, M. (1999). Phase cross-correlations: design, comparisons, and applications. Bulletin of the Seismological Society of America, 89(5), 1366–1378. https://doi.org/10.1785/BSSA0890051366
Schimmel, M., Stutzmann, E., & Gallart, J. (2011). Using instantaneous phase coherence for signal extraction from ambient noise data at a local to a global scale. Geophysical Journal International, 184(1), 494–506. https://doi.org/10.1111/j.1365-246X.2010.04861.x
Ventosa, S., Schimmel, M., & Stutzmann, E. (2019). Towards the processing of large data volumes with phase cross-correlation. Seismological Research Letters, 90(4), 1663–1669. https://doi.org/10.1785/0220190022
Ventosa, S., & Schimmel, M. (2023). FastPCC: Fast phase cross-correlation algorithm for large seismic datasets. IEEE Transactions on Geoscience and Remote Sensing, 61, 1–17. https://doi.org/10.1109/TGRS.2023.3294302
Stacking daily windows
For each corr_duration-long window, and for each configured filter, the
CCF (CC or PCC2) is computed and accumulated. If keep_all = Y the
individual window CCFs are written to disk. By default (keep_days = Y),
all windows for the day are stacked into a single daily CCF using either a
linear mean or Phase-Weighted Stack (PWS; Schimmel & Paulssen 1997):
Note
PWS is provided as an experimental option. It has not been systematically cross-validated. Use with caution.
Schimmel, M., & Paulssen, H. (1997). Noise reduction and detection of weak, coherent signals through phase-weighted stacks. Geophysical Journal International, 130(2), 497–505. https://doi.org/10.1111/j.1365-246X.1997.tb05664.x
Configuration Parameters
The following parameters (modifiable via msnoise admin) are used for
this step:
cc.components_to_compute: List (comma separated) of components to compute between two different stations (default=ZZ)cc.components_to_compute_single_station: List (comma separated) of components within a single station. ZZ would be the autocorrelation of Z component, while ZE or ZN are the cross-components. Defaults to [], no single-station computations are done. (default=)cc.cc_sampling_rate: Sampling Rate for the CrossCorrelation (in Hz) (default=20.0)cc.cc_normalisation: Normalisation method for individual corr_duration) CCFs, default is not normalized (NO), but can be normalized based on the power of the two traces (POW), or by the maximum (MAX) or absolute maximum (ABSMAX) of the CCF (default=NO)cc.cc_type: Cross-Correlation type: CC for classical FFT-based correlation ; PCC for Phase Crosse Correlation (FastPCC’s PCC2, Ventosa et al) (default=CC)cc.cc_type_single_station_AC: Auto-Correlation type (same as for cc_type) (default=CC)cc.cc_type_single_station_SC: Cross-Correlation type for Cross-Components (same as for cc_type: CC or PCC) (default=CC)cc.cc_taper_fraction: Fraction of each trace to taper (cosine) before cross-correlation. Applied symmetrically to both ends. Standard seismological convention is 0.04 (4%). Range: 0.0 (no taper) to 0.5 (full Hann window). (default=0.04)cc.clip_after_whiten: Do the clipping (winsorizing) after whitening? (default=N)cc.overlap: Amount of overlap between data windows [0:1] (default=0.0)cc.maxlag: Maximum lag (in seconds) (default=120.0)cc.corr_duration: Data windows to correlate (in seconds) (default=1800.0)cc.winsorizing: Winsorizing at N time RMS , 0 disables winsorizing, -1 enables 1-bit normalization (default=3.0)cc.whitening: Whiten Traces before cross-correlation: [A]ll (except for autocorr), [N]one, or only if [C]omponents are different (default=A)cc.whitening_type: Type of spectral whitening function to use: [B]rutal (amplitude to 1.0), divide spectrum by its [PSD] or by band-passing the white spectrum with a hanning-window [HANN]. WARNING: only works for compute_cc, not compute_cc_rot, where it will always be B (default=B)cc.keep_all: Keep all cross-corr (length:corr_duration) (default=Y)cc.keep_days: Keep all daily cross-corr (default=Y)cc.stack_method: Stack Method: Linear Mean or Phase Weighted Stack (default=linear)cc.pws_timegate: Ifstack_method=’pws’, width of the smoothing (in seconds) (default=10.0)cc.pws_power: Ifstack_method=”pws”, Power of the Weighting (default=2.0)global.hpc: Is MSNoise going to run on an HPC? (default=N)
Computing the Cross-Correlations
The following two diagrams describe the complete execution flow of
msnoise cc compute_cc. Both are rendered from the DOT source embedded
below, so they stay in sync with the code automatically.
Figure 1 — Outer job loop shows how days are consumed, time windows slid, and per-filter results accumulated and saved.
Figure 2 — Per-filter processing shows the three parallel correlation
modes (AC, CC, SC) and the CC/PCC2 algorithm branch within each. The entry
node carries _data_bp (bandpassed once per filter iteration, shared by
all modes) and _data_raw (used only by CC/SC with whitening != N,
where whiten2 applies the spectral window itself).
Usage
To run this script:
$ msnoise cc compute_cc
This step also supports parallel processing/threading:
$ msnoise -t 4 cc compute_cc
will start 4 instances of the code (after 1 second delay to avoid database conflicts). This works both with SQLite and MySQL but be aware problems could occur with SQLite.
See also
Reading these results in Python — use MSNoiseResult:
from msnoise.results import MSNoiseResult
from msnoise.core.db import connect
db = connect()
r = MSNoiseResult.from_ids(db, ...) # include the steps you need
# then call r.get_ccf_raw(...)
See Reading outputs with MSNoiseResult for the full guide and all available methods.