Bulletin of the Seismological Society of America, Vol. 95, No. 2, pp. 684-698, April 2005, doi: 10.1785/0120040007

Assessing the Quality of Earthquake Catalogues: Estimating the Magnitude of Completeness and Its Uncertainty

by Jochen Woessner and Stefan Wiemer

Abstract

We introduce a new method to determine the magnitude of completeness Mc and its uncertainty.

Our method models the entire magnitude range (EMR method), consisting of the self-similar complete part of the frequency-magnitude distribution and the incomplete portion, thus providing a comprehensive seismicity model. We compare the EMR method with three existing techniques, finding that EMR shows a superior performance when applied to synthetic test cases or real data from regional and global earthquake catalogues. This method, however, is also the most computationally intensive. Accurate knowledge of Mc is essential for many seismicity-based studies, and particularly for mapping out seismicity parameters such as the b-value of the Gutenberg-Richter relationship.

By explicitly computing the uncertainties in Mc using a bootstrap approach, we show that uncertainties in b-values are larger than traditionally assumed, especially when considering small sample sizes. As examples, we investigated temporal variations of Mc for the 1992 Landers aftershock sequence and found that it was underestimated on average by 0.2 with former techniques. Mapping Mc on a global scale reveals considerable spatial variations for the Harvard Centroid Moment Tensor (CMT) catalogue (5.3 ≤ Mc ≤ 6.0) and the International Seismological Centre (ISC) catalogue (4.3 ≤ Mc ≤ 5.0).

Introduction

Earthquake catalogues are one of the most important products of seismology. They provide a comprehensive database useful for numerous studies related to seismotectonics, seismicity, earthquake physics, and hazard analysis. A critical issue to be addressed before any scientific analysis is to assess the quality, consistency, and homogeneity of the data. Any earthquake catalogue is the result of signals recorded on a complex, spatially and temporally heterogeneous network of seismometers, and processed by humans using a variety of software and assumptions.

Consequently, the resulting seismicity record is far from being calibrated, in the sense of a laboratory physical experiment. Thus, even the best earthquake catalogues are heterogeneous and inconsistent in space and time because of networks’ limitations to detect signals, and are likely to show as many man-made changes in reporting as natural ones (Habermann, 1987; Habermann, 1991; Habermann and Creamer, 1994; Zuniga and Wiemer, 1999). Unraveling and understanding this complex fabric is a challenging yet essential task.

In this study, we address one specific aspect of quality control: the assessment of the magnitude of completeness, Mc, which is defined as the lowest magnitude at which 100% of the events in a space-time volume are detected (Rydelek and Sacks, 1989; Taylor et al., 1990; Wiemer and Wyss, 2000). This definition is not strict in a mathematical sense, and is connected to the assumption of a power-law behavior of the larger magnitudes.

Below Mc, a fraction of events is missed by the network (1) because they are too small to be recorded on enough stations; (2) because network operators decided that events below a certain threshold are not of interest; or (3) in case of an aftershock sequence, because they are too small to be detected within the coda of larger events. We compare methods to estimate Mc based on the assumption that, for a given volume, a simple power law can approximate the frequency-magnitude distribution (FMD).

The FMD describes the relationship between the frequency of occurrence and the magnitude of earthquakes (Ishimoto and Iida, 1939; Gutenberg and Richter, 1944):

$$\log_{10} N(M) = a - bM, \qquad (1)$$

where N(M) refers to the frequency of earthquakes with magnitudes larger than or equal to M. The b-value describes the relative size distribution of events. To estimate the b-value, a maximum-likelihood technique is the most appropriate measure:

$$b = \frac{\log_{10}(e)}{\langle M \rangle - \left( M_c - \dfrac{\Delta M_{\mathrm{bin}}}{2} \right)}. \qquad (2)$$

Here ⟨M⟩ is the mean magnitude of the sample and ΔMbin is the binning width of the catalogue (Aki, 1965; Bender, 1983; Utsu, 1999).
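Equations (1) and (2) translate directly into a few lines of code. The following is a minimal Python sketch, not the ZMAP implementation; the function names `b_value_mle` and `a_value`, the default bin width of 0.1, and the sample magnitudes are assumptions made purely for illustration.

```python
import math


def b_value_mle(magnitudes, mc, bin_width=0.1):
    """Maximum-likelihood b-value following Eq. (2).

    Only events with magnitude >= mc enter the estimate. The default
    bin_width of 0.1 is an assumption; use the catalogue's own binning.
    """
    # A small tolerance guards against floating-point noise at the cutoff.
    complete = [m for m in magnitudes if m >= mc - 1e-9]
    if not complete:
        raise ValueError("no events at or above mc")
    mean_mag = sum(complete) / len(complete)
    denominator = mean_mag - (mc - bin_width / 2.0)
    if denominator <= 0:
        raise ValueError("mean magnitude must exceed mc - bin_width / 2")
    return math.log10(math.e) / denominator


def a_value(n_events, b, mc):
    """a-value from Eq. (1), given n_events = N(mc) events with M >= mc."""
    return math.log10(n_events) + b * mc


# Hypothetical magnitudes binned to 0.1 units; the two smallest fall below mc.
sample = [1.0, 1.0, 1.1, 1.2, 1.5, 0.7, 0.8]
b = b_value_mle(sample, mc=1.0)
a = a_value(5, b, mc=1.0)
```

With so few events the estimate is of course meaningless in practice; the sample only shows which events enter the mean and how the half-bin correction shifts the reference magnitude.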

Rydelek and Sacks (2003) criticized Wiemer and Wyss (2000), who had performed detailed mapping of Mc, for using the assumption of earthquake self-similarity in their methods. However, Wiemer and Wyss (2003) maintain that the assumption of self-similarity is in most cases well founded, and that breaks in earthquake scaling claimed by Rydelek and Sacks (2003) are indeed caused by temporal and spatial heterogeneity in Mc.

The assumption that seismic events are self-similar for the entire range of observable events is supported by studies of, for example, von Seggern et al. (2003) and Ide and Beroza (2001). A "safe" way to deal with the dependence of b- and a-values on Mc is to choose a large value of Mc, but this seems overly conservative: it decreases the amount of available data, reducing spatial and temporal resolution and increasing uncertainties due to smaller sample sizes.

Maximizing data availability while avoiding bias due to underestimated Mc is desirable; moreover, it is essential when one is interested in questions such as studying breaks in magnitude scaling (Abercrombie and Brune, 1994; Knopoff, 2000; Taylor et al., 1990; von Seggern et al., 2003). Unless the space-time history of Mc = Mc(x, y, z, t) is taken into consideration, a study would have to conservatively assume the highest Mc observed.

The task is further complicated by the need to determine Mc automatically, since in most applications numerous determinations of Mc are needed when mapping parameters such as seismicity rates or b-values (Wiemer and Wyss, 2000; Wiemer, 2001).

A reliable Mc determination is vital for numerous seismicity- and hazard-related studies. Transients in seismicity rates, for example, have increasingly been scrutinized, as they are closely linked to changes in stress or strain, such as static and dynamic triggering phenomena (e.g., Gomberg et al., 2001; Stein, 1999). Other examples of studies that are sensitive to Mc are scaling-related investigations (Knopoff, 2000; Main, 2000) or aftershock sequences (Enescu and Ito, 2002; Woessner et al., 2004). In our own work on seismic quiescence (Wiemer and Wyss, 1994; Wyss and Wiemer, 2000), b-value mapping (Wiemer and Wyss, 2002; Gerstenberger et al., 2001), and time-dependent hazard (Wiemer, 2000), for example, we often found Mc to be the most critical parameter of the analysis. Knowledge of Mc(x, y, z, t) is important, because a minute change of ΔMc = 0.1 leads (assuming b = 1.0) to a 25% change in seismicity rates; a change of ΔMc = 0.3 reduces the rate by a factor of two.

The requirements for an algorithm to determine Mc in our assessment are: (1) to calculate Mc automatically for a variety of datasets; (2) to give reliable uncertainty estimates; and (3) to conserve computer time. We specifically limit our study to techniques based on parametric data of modern earthquake catalogues. A number of researchers have investigated detection capability by studying signal-to-noise ratios at particular stations (Gomberg, 1991; Kvaerna et al., 2002a,b); however, these waveform-based techniques are generally too time-consuming to be practical for most studies. We also focus on recent instrumental catalogues, ignoring the question of how to best determine Mc in historical datasets commonly used in seismic hazard assessment (Albarello et al., 2001; Faeh et al., 2003). In order to evaluate the performance of the different algorithms, we use synthetically created regional and global data sets.
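The rate sensitivity quoted above for b = 1.0 follows directly from the Gutenberg-Richter relation: shifting the cutoff magnitude by ΔMc changes the number of events above it by a factor of 10^(b·ΔMc). The short Python check below assumes nothing beyond that relation; the function name `rate_factor` is illustrative.

```python
def rate_factor(delta_mc, b=1.0):
    """Ratio N(Mc) / N(Mc + delta_mc) implied by log10 N(M) = a - b*M."""
    return 10.0 ** (b * delta_mc)


for delta in (0.1, 0.3):
    factor = rate_factor(delta)
    # Fraction of events lost when the cutoff is raised by delta.
    lost = 1.0 - 1.0 / factor
    print(f"dMc = {delta}: rate ratio {factor:.2f}, {lost:.0%} fewer events")
```

A shift of 0.1 gives a ratio of about 1.26, that is, roughly a quarter more events when the cutoff is lowered, and a shift of 0.3 gives a ratio very close to two.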

We believe that the review and comparison of adaptable methods presented in this article, and the introduction of uncertainties in Mc, are an important contribution for improving seismicity-related studies.

Data

For the comparison of methods to determine Mc, we chose subsets of six different catalogues with diverse properties. The catalogues analyzed are freely available from the websites of the specific agencies:

• Regional catalogue: We selected a subset of the Earthquake Catalogue of Switzerland (ECOS) of the Swiss Seismological Service (SSS) in the southern province Wallis for the period 1992-2002 (Fig. 1A), providing a local magnitude ML (Deichmann et al., 2002).
• Regional catalogue: We chose a subset of the Northern California Seismic Network (NCSN) catalogue focused on the San Francisco Bay area for the period 1998-2002, using the preferred magnitude (Fig. 1B).
• Volcanic region: We use a subset of the earthquake catalogue maintained by the National Research Institute for Earth Science and Disaster Prevention (NIED), reporting a local magnitude ML. The subset spans a small volcanic region in the Kanto province for the period 1992-2002 (Fig. 1C).
• Aftershock sequence: We selected a seven-year period (1992-1999) from the Landers 1992 MW 7.3 aftershock sequence, using the earthquakes recorded by the Southern California Seismic Network (SCSN), a cooperative project of Caltech and the U.S. Geological Survey, distributed through the Southern California Earthquake Data Center (SCEDC), reporting a local magnitude ML (Fig. 1D).
• Global datasets:
  a. The Harvard Centroid Moment Tensor (CMT) catalogue, reporting the moment magnitude MW, is used for the time period 1983-2002. Only shallow events (d < 70 km) are used for mapping purposes.
  b. The International Seismological Centre (ISC) catalogue is analyzed for the period 1980-2000 and magnitudes mb ≥ 4.3. Only shallow events (d < 70 km) are used. The cut-off magnitude was chosen due to the temporal heterogeneity of the catalogue. Surface-wave magnitudes are taken to equal mb where no mb is reported.

From this point on, we refer to the generic expression "magnitude" that corresponds to the magnitude of the respective earthquake catalogue outlined above.

Figure 1. Earthquakes used in this study: (A) subset of the earthquake catalogue of Switzerland (ECOS) in the southern province Wallis; (B) subset of the NCSN catalogue comprising the San Francisco Bay area; (C) subset of the NIED catalogue in the Kanto province, with the triangles indicating volcanoes; and (D) the Landers 1992 aftershock sequence from the SCSN catalogue. California maps display known faults in light gray.

Methods

Methods to estimate the magnitude of completeness of earthquake catalogues are based on two fundamentally different assumptions. Most methods assume self-similarity of the earthquake process, which consequently implies a power-law distribution of earthquakes in the magnitude and in the seismic moment domain. One other approach relies on the assumption that the detection threshold due to noise decreases at night; thus the magnitude of completeness is determined using the day-to-night ratio of earthquake frequency (Rydelek and Sacks, 1989; Taylor et al., 1990).

In this study, we compare only methods assuming self-similarity of the earthquake process:

1. Entire-magnitude-range method (EMR), modified from Ogata and Katsura (1993)
2. Maximum-curvature method (MAXC) (Wiemer and Wyss, 2000)
3. Goodness-of-fit test (GFT) (Wiemer and Wyss, 2000)
4. Mc by b-value stability (MBS) (Cao and Gao, 2002)

These methods are described below and are illustrated schematically in Figure 2. The code is freely available together with the seismicity analysis software package ZMAP (Wiemer, 2001), which is written in Mathworks’ commercial software language Matlab (http://www.mathworks.com).

EMR Method

We developed a method to estimate Mc that uses the entire data set, including the range of magnitudes reported incompletely. Our approach is similar to that of Ogata and Katsura (1993), and uses a maximum-likelihood estimator for a model that consists of two parts: one to model the complete part, and one to sample the incomplete part of the frequency-magnitude distribution (Fig. 2).

We use the entire magnitude range to obtain a more robust estimate of Mc, especially for mapping purposes. For data above an assumed Mc, we presume a power-law behavior. We compute a- and b-values using a maximum-likelihood estimate (Aki, 1965; Utsu, 1965). For data below the assumed Mc, a normal cumulative distribution function q(M | μ, σ) that describes the detection capability as a function of magnitude is fitted to the data. q(M | μ, σ) denotes the probability of a seismic network to detect an earthquake of a certain magnitude and can be written as:

$$q(M \mid \mu, \sigma) = \begin{cases} \dfrac{1}{\sigma\sqrt{2\pi}} \displaystyle\int_{-\infty}^{M} \exp\left( -\dfrac{(M' - \mu)^2}{2\sigma^2} \right) dM', & M < M_c, \\ 1, & M \ge M_c. \end{cases} \qquad (3)$$

Figure 2. EMR method applied to the NCSN-catalogue data (1998-2001): Mc = 1.2, b = 0.98, a = 5.25, μ = 0.73, σ = 0.21. (A) Cumulative and non-cumulative FMD and model on logarithmic scale with the arrow indicating Mc. (B) Normal CDF fit (gray line) to the data below Mc = 1.2 on linear scale. Standard deviations of the model, dashed gray line; original data, diamonds; non-cumulative FMD of EMR-model, circles. (C) Choice of the best model from the maximum-likelihood estimates denoted with an arrow pointing to the resulting Mc-value.

Here, μ is the magnitude at which 50% of the earthquakes are detected and σ denotes the standard deviation describing the width of the range where earthquakes are partially detected. Higher values of σ indicate that the detection capability of a specific network decreases faster. Earthquakes with magnitudes equal to or greater than Mc are assumed to be detected with a probability of one. The free parameters μ and σ are estimated using a maximum-likelihood estimate.
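The detection model of Equation (3) can be evaluated with the error function. The sketch below is a minimal Python illustration under stated assumptions: the function name `detection_probability` is hypothetical, the normal CDF is evaluated at the magnitude itself, and the parameter values (μ = 0.73, σ = 0.21, Mc = 1.2) are illustrative numbers loosely following the Figure 2 example rather than a fit performed here.

```python
import math


def detection_probability(m, mu, sigma, mc):
    """Probability that an event of magnitude m is detected.

    Below mc this is the normal CDF with mean mu and standard deviation
    sigma; at or above mc detection is assumed to be certain.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if m >= mc:
        return 1.0
    return 0.5 * (1.0 + math.erf((m - mu) / (sigma * math.sqrt(2.0))))


# Illustrative parameters (assumed, not fitted here).
mu, sigma, mc = 0.73, 0.21, 1.2
for m in (0.4, 0.73, 1.0, 1.2):
    print(f"M = {m:.2f}: q = {detection_probability(m, mu, sigma, mc):.3f}")
```

By construction the probability equals 0.5 at M = μ and is set to one from Mc upward, so the fitted curve only shapes the incomplete part of the distribution.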

The best-fitting model is the one that maximizes the log-likelihood function for four parameters: μ and σ, as well as a and b. As the negative log-likelihoods are computed, we changed the sign for display reasons so that the minimum actually shows the maximum-likelihood estimate in Figure 2C. The circles in Figure 2B show the best fit for the dataset in Figure 2A. We tested four functions to fit the incomplete part of real earthquake catalogues: three cumulative distribution functions (exponential, lognormal, and normal) and an exponential decay. The latter two cumulative distribution functions (CDF) are competitive when computing the likelihood score. However, the normal CDF generally best fits the data from regional to worldwide earthquake catalogues compared to the other functions. The EMR method creates a comprehensive seismicity model. To evaluate if this model is acceptable compared to the actual data, we adopt a Kolmogorov-Smirnov test (KS test) at the 0.05 significance level to examine the goodness-of-fit (Conover, 1999). The test assumes that the two samples are random and mutually independent. The null hypothesis H0 of the test is that the