Compute Cross-Correlations

This code is responsible for the computation of the cross-correlation functions.

This script will group jobs marked “T”odo in the database by day and process them using the following scheme. As soon as one day is selected, the corresponding jobs are marked “I”n Progress in the database. This allows running several instances of this script in parallel.

Configuration Parameters

  • cc_sampling_rate: Sampling Rate for the CrossCorrelation [20.0] (default=20.0)
  • analysis_duration: Duration of the Analysis (total in seconds : 3600, [86400]) (default=86400)
  • overlap: Amount of overlap between data windows [0:1[ [0.] (default=0.0)
  • maxlag: Maximum lag (in seconds) [120.0] (default=120.)
  • corr_duration: Data windows to correlate (in seconds) [1800.] (default=1800.)
  • windsorizing: Windsorizing at N time RMS , 0 disables windsorizing, -1 enables 1-bit normalization [3] (default=3)
  • resampling_method: Resampling method Decimate/[Lanczos] (default=Lanczos)
  • remove_response: Remove instrument response Y/[N] (default=N)
  • response_format: Remove instrument file format [dataless]/inventory/paz/resp (default=dataless)
  • response_path: Instrument correction file(s) location (path relative to db.ini), defaults to ‘./inventory’, i.e. a subfolder in the current project folder.<br>All files in that folder will be parsed. (default=inventory)
  • response_prefilt: Remove instrument correction pre-filter (0.005, 0.006, 30.0, 35.0) (default=(0.005, 0.006, 30.0, 35.0))
  • preprocess_lowpass: Preprocessing Low-pass value in Hz [8.0] (default=8.0)
  • preprocess_highpass: Preprocessing High-pass value in Hz [0.01] (default=0.01)
  • preprocess_max_gap: Preprocessing maximum gap length that will be filled by interpolation [10.0] seconds (default=10.0) | new in 1.6
  • keep_all: Keep all cross-corr (length: corr_duration) [Y]/N (default=N)
  • keep_days: Keep all daily cross-corr [Y]/N (default=Y)
  • stack_method: Stack Method: Linear Mean or Phase Weighted Stack: [linear]/pws (default=linear)
  • pws_timegate: If stack_method=’pws’, width of the smoothing in seconds : 10.0 (default=10.0)
  • pws_power: If stack_method=’pws’, Power of the Weighting: 2.0 (default=2.0)
  • whitening: Whiten Traces before cross-correlation: All (except for autocorr), None, or only if components are different [A]/N/C (default=A) | new in 1.5
  • hpc: Is MSNoise going to run on an HPC? Y/[N] (default=N) | new in 1.6

Waveform Pre-processing

Pairs are first split and a station list is created. The database is then queried to get file paths. For each station, all files potentially containing data for the day are opened. The traces are then merged and splitted, to obtain the most continuous chunks possible. The different chunks are then demeaned, tapered and merged again to a 1-day long trace. If a chunk is not aligned on the sampling grid (that is, start at a integer times the sample spacing in s) , the chunk is phase-shifted in the frequency domain. This requires tapering and fft/ifft. If the gap between two chunks is small, compared to a configurable value (preprocess_max_gap), the gap is filled with interpolated values. Larger gaps will not be filled with interpolated values.

Warning

As from MSNoise 1.5, traces are no longer padded by or merged with 0s.

Each 1-day long trace is then high-passed (at preprocess_highpass Hz), then if needed low-passed (at preprocess_lowpass Hz) and decimated/downsampled. Decimation/Downsampling are configurable (resampling_method) and users are advised testing Decimate. One advantage of Downsampling over Decimation is that it is able to downsample the data by any factor, not only integer factors. Downsampling can be achieved with the new ObsPy Lanczos resampler.

If configured, each 1-day long trace is corrected for its instrument response. Currently, only dataless seed and inventory XML are supported.

As from MSNoise 1.5, the preprocessing routine is separated from the compute_cc and can be used by plugins with their own parameters. The routine returns a Stream object containing all the traces for all the stations/components.

Processing

Once all traces are preprocessed, station pairs are processed sequentially. If a component different from ZZ is to be computed, the traces are first rotated. This supposes the user has provided the station coordinates in the station table. The rotation is computed for Radial and Transverse components.

Then, for each corr_duration window in the signal, and for each filter configured in the database, the traces are clipped to windsorizing times the RMS (or 1-bit converted) and then whitened in the frequency domain (see Whitening) between the frequency bounds. The whitening procedure can be skipped by setting the whitening configuration to None. The two other whitening modes are “[A]ll except for auto-correlation” or “Only if [C]omponents are different”. This allows skipping the whitening when, for example, computing ZZ components for very close by stations (much closer than the wavelength sampled), leading to spatial autocorrelation issues.

When both traces are ready, the cross-correlation function is computed (see Correlation). The function returned contains data for time lags corresponding to maxlag in the acausal (negative lags) and causal (positive lags) parts.

Stacking and Saving Results

If configured (setting keep_all to ‘Y’), each corr_duration CCF is saved to the hard disk. By default, the keep_days setting is set to True and so “N = 1 day / corr_duration” CCF are stacked and saved to the hard disk in the STACKS/001_DAYS folder.

Note

Currently, the keep-all data (every CCF) are not used by next steps.

If stack_method is ‘linear’, then a simple mean CCF of all windows is saved as the daily CCF. On the other hand, if stack_method is ‘pws’, then all the Phase Weighted Stack (PWS) is computed and saved as the daily CCF. The PWS is done in two steps: first the mean coherence between the instataneous phases of all windows is calculated, and eventually serves a weighting factor on the mean. The smoothness of this weighting array is defined using the pws_timegate parameter in the configuration. The weighting array is the power of the mean coherence array. If pws_power is equal to 0, a linear stack is done (then it’s faster to do set stack_method = ‘linear’). Usual value is 2.

Warning

PWS is largely untested, not cross-validated. It looks good, but that doesn’t mean a lot, does it? Use with Caution! And if you cross-validate it, please let us know!!

Schimmel, M. and Paulssen H., “Noise reduction and detection of weak, coherent signals through phase-weighted stacks”. Geophysical Journal International 130, 2 (1997): 497-505.

Once done, each job is marked “D”one in the database and, unless hpc is Y, STACK jobs are inserted/updated in the database.

To run this script:

$ msnoise compute_cc

This step also supports parallel processing/threading:

$ msnoise -t 4 compute_cc

will start 4 instances of the code (after 1 second delay to avoid database conflicts). This works both with SQLite and MySQL but be aware problems could occur with SQLite.

New in version 1.4: The Instrument Response removal & The Phase Weighted Stack & Parallel Processing

New in version 1.5: The Obspy Lanczos resampling method, gives similar results as the scikits.samplerate package, thus removing the requirement for it. This method is defined by default.

New in version 1.5: The preprocessing routine is separated from the compute_cc and can be called by external plugins.

New in version 1.6: The preprocess_max_gap parameter. The hpc parameter that can prevent the automatic creation of STACK jobs.