TY - JOUR
T1 - Local earthquakes detection
T2 - A benchmark dataset of 3-component seismograms built on a global scale
AU - Magrini, Fabrizio
AU - Jozinović, Dario
AU - Cammarano, Fabio
AU - Michelini, Alberto
AU - Boschi, Lapo
N1 - Publisher Copyright:
© 2020 The Author(s)
PY - 2020/12
Y1 - 2020/12
AB - Machine learning is becoming increasingly important in scientific and technological progress, owing to its ability to build models that describe complex data and generalize well. The wealth of seismic data publicly available today calls for automated, fast, and reliable tools to carry out a multitude of tasks, such as detecting small, local earthquakes in areas with sparse receiver coverage. Such an application of machine learning, however, must be built on a large amount of labeled seismograms, which are neither easy to obtain nor easy to compile. In this study we present a large dataset of seismograms recorded along the vertical, north, and east components of 1487 broad-band or very broad-band receivers distributed worldwide; it includes 629,095 3-component seismograms generated by 304,878 local earthquakes and labeled as earthquakes (EQ), and 615,847 labeled as noise (AN). Applying machine learning to this dataset shows that a simple convolutional neural network with 67,939 parameters can discriminate between earthquake and noise recordings at a single station, even in regions not represented in the training set. The network achieves an accuracy of 96.7%, 95.3%, and 93.2% on the training, validation, and test sets, respectively, showing that the wide variety of geological and tectonic settings covered by our data supports the generalization capabilities of the algorithm and makes it applicable to real-time detection of local events. We make the database publicly available, aiming to provide the seismological and broader scientific communities with a time-series benchmark that can serve as a testing ground for signal processing.
KW - Benchmark dataset
KW - Earthquake detection algorithm
KW - Seismology
KW - Supervised machine learning
UR - http://www.scopus.com/inward/record.url?scp=85108273121&partnerID=8YFLogxK
U2 - 10.1016/j.aiig.2020.04.001
DO - 10.1016/j.aiig.2020.04.001
M3 - Article
AN - SCOPUS:85108273121
SN - 2666-5441
VL - 1
SP - 1
EP - 10
JO - Artificial Intelligence in Geosciences
JF - Artificial Intelligence in Geosciences
ER -