Sage Journals: Discover world-class research

Abstract

We present a application programming interface (API)-enabled relational database of global earthquake ground motion intensity measures, associated metadata, and processed time-series data. Raw ground motion records were processed by the authors using either manual or semi-automated processing procedures, and every processed record has passed a quality review by a trained analyst. Computed intensity measures include peak acceleration and velocity, pseudo-spectral acceleration response spectra, cumulative absolute velocity, Arias Intensity, and Fourier amplitude spectra. The processed time-series data, associated metadata, and ground motion intensity measures were organized into a web-served relational database consisting of 32 tables connected by primary/foreign key pairs. Ground motion metadata and intensity measures (but not time-series) from the Next-Generation Attenuation (NGA)-East and NGA-West2 projects and the Hellenic Strong-Motion Database are also contained in the database. As of this writing (June 2025) the database includes intensity measures and metadata for 76,242 multi-component ground motions recorded at 9927 stations for 1391 events, and is approximately 73.5 GB in size. The database is built using the MySQL relational database management system, and is accessible through a web interface and also an API, which allows users to retrieve data using straightforward and intuitive uniform resource locators (URLs). Compared with more traditional file-download-based methods for data release, the relational database (1) increases storage efficiency, (2) improves data integrity, and (3) enables users to query the data subset they wish to retrieve rather than downloading the entire database and loading it into memory. Furthermore, the web-served nature of the database means that users have immediate access to ground motion data following collection, review, and uploading. Periodic static releases of the database will be published as a means of archiving and facilitating reproducibility. The database has been designed to accommodate growth, with ongoing efforts to integrate global ground motion data (e.g. data development for the NGA-West3 project).

Keywords

Ground motion relational database intensity measures metadata application programming interface

Introduction and project motivation

There have been many efforts to develop collections of ground motion data and metadata (e.g. Ancheta et al., 2014; Bahrampouri et al., 2021; Bozorgnia and Stewart, 2020; Castro et al., 2022; Goulet et al., 2021; Hutchinson et al., 2024; Ji et al., 2023; Luzi et al., 2016; Palmer et al., 2022; Rekoske et al., 2020; Rennolet et al., 2018; Sandikkaya et al., 2024). While these efforts have produced substantial volumes of data, there are potential issues for researchers and engineers who wish to use all combinations of datasets for model development. These issues are related to inconsistencies and incompatibilities between databases developed in different projects. Inconsistencies may arise in the form of differences in processing procedures, metadata assignment protocols, or how intensity measures are computed. Incompatibilities arise from different organizational structures used in different databases. Data integrity may also arise within a database depending on how the data are organized (different parameter values for the same site in a flatfile, typos, etc.). When researchers and engineers utilize data from multiple databases these issues need to be considered.

The Engineering Strong-Motion Database (ESM) (Luzi et al., 2016) organized ground motion data and metadata coming from different sources in the European-Mediterranean region. The Next-Generation Attenuation (NGA) program developed a commonly used suite of databases that are divided by tectonic regime: shallow crustal events in active tectonic regions (NGA-West2—Ancheta et al., 2014), crustal events in stable continental regions (NGA-East—Goulet et al., 2021), and subduction earthquakes (NGA-Sub—Bozorgnia and Stewart, 2020). Each of the NGA projects has developed a uniformly processed dataset of recorded earthquake ground motions and associated source, path, and site metadata; however there is no linkage across the three databases. Since completion of the NGA-West2 and -East data collection efforts around 2011, the amount of seismic instrumentation has increased, particularly in the state of California (Kuyuk and Allen, 2013), which means that the quantity and spatial distribution of data produced from a single event can be much greater than similar events included in earlier datasets. This has led to efforts to curate databases from more recent events (e.g. Farghal et al., 2020; Ji et al., 2023; Rekoske et al., 2020).

Ground motion processing has historically been performed manually for each record, but processing tools have evolved to a point where fully automated procedures can be used to select corner frequencies, accept/reject records based on measured quantities like signal-to-noise ratio (SNR), assign metadata, and compute intensity measures. One such approach is implemented in the Python package gmprocess developed by the United States Geological Survey (USGS) (Hearne et al., 2019; Thompson et al., 2025). Automation is attractive because it significantly reduces the time required to process each motion and collect the corresponding metadata and intensity measures for each recording. Fully automated procedures have been leveraged to assemble empirical databases for ground motion studies in the United States and elsewhere (e.g. Aagaard, 2023a, 2023b, 2023c; Bahrampouri et al., 2021; Cabas et al., 2023; Farghal et al., 2020; Gomez et al., 2022; Hutchinson et al., 2024; Ji et al., 2023; Parker and Baltay, 2022; Rekoske et al. 2020, 2022; Thompson et al., 2023b; Wong et al., 2022). Some of these efforts implemented additional quality assurance measures beyond the default set of checks performed in the automated routines such as checking for multiple wave-train arrivals and/or comparing computed intensity measures to expected values from ground motion models (GMMs) (e.g. Ji et al., 2023; Rekoske et al. 2020). When such additional measures are implemented, they are not always well documented.

Creating algorithms that match the judgment of experienced analysts is challenging, and because that judgment is used to ensure data quality while optimizing usable bandwidth, additional measures of quality assurance may be required for accepting/rejecting recordings and for corner frequency selection for data sets to be used in important applications. Furthermore, essential metadata, such as the preferred finite-fault solutions and basin depth parameters that have been used in the NGA projects (e.g. Ahdi et al., 2022; Contreras et al., 2022; Goulet et al., 2021b), are not contained in the databases queried by gmprocess; therefore, supplemental data collection efforts are often needed.

The subject of this article is a ground motion database (GMDB) that has been developed for engineering applications and, at the time of writing (June 2025), contains data from several published databases including NGA-West2, NGA-East, and the Hellenic strong-motion database (Margaris et al., 2021), as well as data from more recent events that was processed by one or more of the authors. An important feature of the data included in the GMDB is the application of consistent protocols that include human inspection during record processing and source, path, and site metadata compilation. Following this introduction, we present the ground motion data stored in the database, and the procedures used to process new data and assign metadata using protocols consistent with those established during NGA projects. The GMDB has been assembled as a relational database, the organization of which is presented after discussion of the data. Finally, we discuss how to interact with the GMDB through a publicly accessible web portal and application programming interface (API).

Ground motion datasets

The GMDB includes ground motions (including time-series) processed by one or more of the authors for recent events in the western US (WUS), central and eastern North America (CENA), and Türkiye. The GMDB metadata (not including time-series) compilation began with the following published datasets: the NGA-West2 global database for shallow crustal earthquakes in active tectonic regions (Ancheta et al., 2014), the NGA-East database for CENA for shallow crustal earthquakes in stable continental regions (Goulet et al., 2021), and the Hellenic database from mostly shallow crustal and some subduction earthquakes within and near Greece assembled using protocols consistent with NGA projects (Margaris et al., 2021). As appropriate, metadata has been updated using results of more recent studies (e.g. updated site parameters based on new site measurements). The information from published databases were adapted into the GMDB data structure presented below, but otherwise were not modified unless explicitly stated.

The authors have processed ground motions in a consistent manner for a number of events since 2011 as part of prior studies. Those studies include site response and path studies in southern and northern California (Buckreis et al., 2023b, 2024a; Nweke et al., 2022; Wang, 2020 and Mohammed et al., 2025), studies to assess the combined data misfits of NGA-East GMMs with site response models (Ramos-Sepulveda et al., 2024), studies to examine the usability of ground motions recorded by the Community Seismic Network (CSN) (Mohammed et al., 2024), and ground motion analysis for the 2014 Napa earthquake, 2019 Ridgecrest earthquake sequence, and 2023 Kahramanmaraş, Türkiye earthquake sequence (Ahdi et al., 2020; Kishida et al. 2014 and Buckreis et al., 2024b, respectively). These efforts significantly increased the quantity of available ground motions in California, CENA, and the Mediterranean relative to NGA databases (Figure 1). The “Other” slice shown in Figure 1 includes data from Alaska, China, Japan, Taiwan, and New Zealand from the NGA-West2 database. The following subsection describes the different data collection efforts.

Figure 1.

Distribution of recordings in the Ground Motion DataBase (GMDB) by region and collection study.

Data selection, processing, and distributions

Although the domains of the recent collection efforts contributing to the GMDB are different, protocols consistent with NGA projects were used to select and curate the data in each study. In all cases, only events with magnitudes (M) ≥ 4.0 since 2011 were initially considered because spectral shape effects at small M can produce apparent and non-physical site nonlinearities (Stafford et al., 2017). A few events that were processed ended up having M < 4.0 after the metadata were refined. Wang (2020) and Buckreis et al. (2022) collectively processed records from 26 events recorded in the greater Los Angeles area and 72 events recorded in the Sacramento-San Joaquin Delta region, respectively. Ramos-Sepúlveda et al. (2024) processed records for 101 CENA events. Mohammed et al. (2024) processed data from CSN and traditional sensors for 28 greater Los Angeles area earthquakes. Mohammed et al. (2025) processed data from 28 San Francisco Bay Area (SFBA) events that had not previously been processed by Buckreis et al. (2022). Kishida et al. (2014) processed data from the August 24, 2014 (UTC) M6.02 South Napa earthquake. Ahdi et al. (2020) processed data from the July 4 M6.48, July 5 M5.47, and July 6 M7.06 (UTC dates) events in the 2019 Ridgecrest earthquake sequence. Buckreis et al. (2024b) processed data for the February 6 M7.81, M6.81, and M7.74, and February 20 M6.37 (UTC dates) events in the 2023 Kahramanmaraş, Türkiye earthquake sequence. Figures 2 –4 show the locations of events and stations in each of these studies.

Figure 2.

Locations of event focal mechanisms and stations in California and neighboring states. Symbol colors represent different studies as shown in the legend.

Figure 3.

Locations of event focal mechanisms and stations in Central and Eastern North America (CENA). Symbol colors represent different studies as shown in the legend. Boundary between Western United States (WUS) and CENA given by Moschetti et al. (2024).

Figure 4.

Locations of event focal mechanisms and stations in the Mediterranean. Symbol colors represent different studies as shown in the legend.

Raw time-series records for all available stations for each U.S. event were obtained from the Incorporated Research Institutions for Seismology (IRIS; Trabant et al., 2012) or directly from seismic network operators. Records from events in and around California were cross-checked against those available from the Center for Engineering Strong Motion Data (CESMD) data repository maintained by the California Strong Motion Instrumentation Program (CSMIP) (https://www.strongmotioncenter.org/) and those from the California Department of Water Resources (DWR) seismic network. Buckreis et al. (2024b) obtained waveforms from the Earthquake Data Center System of Türkiye (TDVMS; https://tdvms.afad.gov.tr/) and IRIS. For all regions, data screening was performed to remove apparently unreliable and duplicated records, which may exist due to multiple collocated instruments at a site. In the event of collocated accelerometers and seismometers with meaningful recorded signals, which occurred for portions of the data processed by Kishida et al. (2014), Ahdi et al. (2020), Wang (2020), Buckreis et al. (2022), Buckreis et al. (2024b), Ramos-Sepúlveda et al. (2024), and Mohammed et al. (2024), we preferred the motion recorded by the seismometer unless there is evidence of amplitude clipping, in which case the time-series recorded by the accelerometer is used. In the case of data prepared by Mohammed et al. (2025), and anticipated future efforts, the motion with the widest usable bandwidth as a function of corner frequencies selected during signal processing and sampling rate are preferred.

Each record component was processed individually according to standard protocols developed within NGA projects (e.g. Goulet et al., 2021). In the case of the records assembled in Ahdi et al. (2020), Wang (2020), and Buckreis et al. (2022, 2024b), signal processing was performed manually in R (R Core Team, 2022) using a processing code developed during NGA-West2 (Ancheta et al., 2014). The records assembled by Ramos-Sepúlveda et al. (2024) and Mohammed et al. (2025, 2024) were processed semi-automatically using a version of gmprocess (Thompson et al., 2025) that automatically adjusts high-pass corner frequencies to remove displacement artifacts and includes a manual review using a tool that allows corner frequencies to be adjusted or records to be rejected (Ramos-Sepúlveda et al., 2025). Because every record is reviewed and potentially adjusted by an analyst, the manner in which we have used gmprocess is equivalent to the NGA processing procedure, but is significantly more efficient because the automated selections made by gmprocess often agree with analysts’ judgments. Both the manual and semi-automated methods consider SNR of the windowed Fourier amplitude spectra (FAS), assess the displacement trace in the time domain, and assess the shape of the pseudo-acceleration response spectrum to select high- and low-pass corner frequencies applied using acausal Butterworth filters. The high-pass corner frequencies selected using gmprocess are lower on average than those selected manually, which can affect long-period response spectral ordinates (e.g. Section 3.2.3.2 of Kramer and Stewart, 2025).

In total, we processed 34,371 records in California, 7029 records in CENA, and 1215 records in Türkiye as summarized in Table 1. The distribution of these data with respect to M, closest distance to the rupture surface (R_RUP), and site condition as represented by time-averaged shear-wave velocity in the upper 30 m of the site (V_S₃₀) are shown in, Figures 5 –7. The additional data expand significantly the number of records over a wide range of M, distance, and site conditions compared with the previous NGA databases. Figure 8 presents the number of usable horizontal ground motions per period defined as T < 1/(1.25f_cHP), where f_cHP is the greater of the two horizontal high-pass corner frequencies selected during signal processing. The recently added data in California have a more rapid rate of decay of number of usable records with period for T > 10 s than the NGA data. This likely results from the relatively small M of the recently added events, which produce less energy at long periods and increases the likelihood that records will be noise-dominated at these long periods. Earthquake ground motions in CENA contain more high-frequency content than compared with events in WUS; therefore, the shortest-usable period is also taken into consideration (Boore and Goulet, 2014). The records processed by Ramos-Sepúlveda et al. (2024) have a higher percentage of usable short-period spectral accelerations than the previous NGA-East data. This is attributed to improvements in sensor quality and increases in Nyquist frequency since the NGA-East project, thereby improving SNR at short-periods and widening the usable frequency range.

Table 1.

Summary of the number of events, stations, and records in each dataset

Dataset	No. of Events	No. of Stations	No. of Records
NGA-West2: Bozorgnia et al. (2014)	599	4152	21,502
Napa: Kishida et al. (2014)	1	419	419
Ridgecrest: Ahdi et al. (2020)	3	816	1402
Southern CA: Wang (2020)	28	980	6225
NGA-East: Goulet et al. (2021)	82	1271	9376
Greece: Margaris et al. (2021)	458	324	2749
Delta: Buckreis et al. (2022)	70	1423	9988
Türkiye: Buckreis et al. (2024b)	4	502	1215
CENA: Ramos-Sepúlveda et al. (2024)	101	1517	7029
CSN: Mohammed et al. (2024)	28	2186	11,195
SFBA: Mohammed et al. (2025)	28	1125	5142

Figure 5.

Data distributions for California and neighboring states with respect to magnitude (M), rupture distance (R_RUP), and site condition (V_S30).

Figure 6.

Data distributions for CENA with respect to M, R_RUP, and V_S₃₀.

Figure 7.

Data distributions for the Mediterranean region with respect to M, R_RUP, and V_S₃₀.

Figure 8.

Number of usable ground motion as a function of period for (a) California and neighboring states, (b) Central and Eastern North America (CENA), and (c) Mediterranean region.

Intensity measures

Intensity measures (IMs) are parameters computed from ground motion time-series and are commonly used to quantify certain engineering attributes of ground motions. NGA databases typically include peak IMs such as peak ground acceleration (PGA) and peak ground velocity (PGV), 5%-damped pseudo-spectral accelerations (PSA) for 111 oscillator periods between 0.01 and 20 s, Arias Intensity (I_A; Arias, 1970), and times corresponding to select percentiles of I_A used to facilitate calculation of significant durations (I_A-times). NGA-Sub and NGA-East expanded the selection of IMs to include Equivalent Amplitude Spectra (EAS), which combines the two horizontal components of FAS along with smoothing and downsampling (Kottke et al., 2021). Other IMs include the cumulative absolute velocity (CAV; Electrical Power Research Institute (EPRI), 1988) and the CAV computed on acceleration time histories considering a threshold of 5 cm/s² (CAV₅; Kramer and Mitchell, 2006).

In the present dataset, all IMs excluding EAS are calculated for individual components (two orthogonal-horizontal and one vertical) as well as the combined horizontal minimum-, median-, and maximum-component (RotD0, RotD50, and RotD100, respectively) defined by Boore (2010). PGA, PGV, and PSA are obtained using the RCTC package in R (Wang et al., 2017). I_A, I_A-times, CAV, and CAV₅ are computed using Python codes validated against NGA codes. EAS is calculated using Python routines consistent with Kottke (2020). PSA are computed at 111 oscillator periods between 0.01 and 20 s, and EAS are computed at 390 frequencies between 0.01318257 and 102.3293 Hz (100 points evenly log-sampled per decade). These periods and frequencies correspond to the standard values incorporated into NGA database products.

Metadata

The ground motion data discussed in the previous section are most useful when accompanied by source, path, and site parameters. These parameters are associated with each individual ground motion, and are required for model development and for time-series record selection for use in response history analyses and other research studies and engineering applications. This section presents the sources of metadata and the methods used to assign metadata to records, sites, and earthquakes. These methods incorporate approaches for resolving conflicts between metadata contained in the original NGA datasets in some cases.

Seismic-source parameters

Seismic-source parameters include origin date and time, hypocenter coordinates (longitude, latitude, and depth), seismic moment, M, and nodal plane solution (strike, dip, and rake). These data are taken from the Global Centroid Moment Tensor (CMT) project (Ekström et al., 2012), the National Earthquake Information Center (NEIC) at the USGS, or local earthquake catalogs such as the Northern California Earthquake Data Center (NCEDC). CMT origin times, M, and nodal plane solutions were preferred when available because they are derived using global recordings that average out radiation pattern variability and provide between-region consistency. CMT hypocenter coordinates are not preferred because the solution corresponds to the center of the earthquake moment distribution in time and space (Ekström et al., 2012), which may not align well with the location of the initial slip. Therefore, the locations as reported by local networks are preferred, which was taken as the Advanced National Seismic System (ANSS) Comprehensive Earthquake Catalog (ComCat) (USGS Earthquake Hazards Program, 2017) for events in North America. For cases where the CMT does not report information, the preferred origin solution (i.e. time, M, and nodal plane solutions) as reported by the ANNS ComCat were adopted. Faulting mechanisms are assigned based on rake-angle relationships given in Ancheta et al. (2013). This approach was applied for all of the events considered in this study, and is consistent with, approaches adopted in previous NGA projects.

Parameters that describe the fault rupture surface as one or more rectangles (upper-left corner coordinates and dimensional length and width) are necessary to calculate source-to-site distances. Finite-fault solutions are often formulated for large magnitude events based on inversion of recorded surface motions. Of the post-NGA events in the GMDB, only the M6.02 August 24, 2014 South Napa (Dreger et al., 2015), M6.48 July 4, 2019 Ridgecrest and M7.06 July 6, 2019 Ridgecrest events (Xu et al., 2020), the February 6 M7.81 and M7.7, and February 20 M6.37 Kahramanmaraş, Türkiye earthquakes (Goldberg et al., 2023; Xu et al., 2023 and Barbot et al., 2023, respectively) have published finite-fault models (FFMs). The FFMs were selected and trimmed according to the procedures outlined in Contreras et al. (2022).

FFMs for the remaining events were generated using the simulation procedure described in Contreras et al. (2022), performed using an updated version of the program originally developed by Chiou and Youngs (2008). This algorithm generates a stochastic set of possible rupture surfaces given the available source metadata, and selects the most probable surface that does not result in atypically short or long finite-fault distances for any given site. We re-coded the procedure for use in Python (Buckreis, 2024) and updated the procedure to include M-based scaling relationships for stable-continental-type events (Leonard, 2014) and more recent relationships for active shallow crustal events (Leonard, 2014; Thingbaijam et al. 2017). Appropriate models for rupture area and aspect ratio given M are selected by specifying the tectonic regime (i.e. shallow crustal or stable continental). When simulating ruptures for active shallow-crustal-type events, relationships proposed by Wells and Coppersmith (1994), Chiou and Youngs (2008), Leonard (2014), and Thingbaijam et al. (2017) are used with equal weights. Simulations were performed using both nodal plane solutions with equal weight. Rupture surfaces for NGA-West2 and Margaris et al. (2021) events were re-simulated using this approach, and their distance metrics were updated.

Site / station parameters

We distinguish sites from stations because stations include information about instruments and networks that may change over time, whereas site data are constant. Furthermore, multiple stations may be installed at the same site, and separating them avoids duplication in data entry. Site parameters include location (latitude, longitude, and elevation), topographic slope, terrain class, surficial geological unit, recommended V_S₃₀, and depth parameters z_1.0 and z_2.5 which represent depths to V_S of 1.0 and 2.5 km/s, respectively. A 30 arc-sec digital elevation model from Farr and Kobrick (2000) was used in QGIS to assign terrain class defined by Iwahashi and Pike (2007) and Iwahashi et al. (2018). Geological units for sites within California were assigned from Wills et al. (2015), sites in Oregon, Nevada, and Utah were assigned units from local geologic maps (Biek et al., 2010; Ludington et al., 2005; Rowley et al., 2006; Smith and Roe, 2015), and sites across CENA were assigned using the suite of geologic maps applied for V_S₃₀ assignments by Parker et al. (2017) (individual citations are provided in https://doi.org/10.17603/ds2-w0zm-y670).

V_S ₃₀ values were assigned according to the following order of preference:

Computed from a nearby (<150 m; same geology) measured V_S profile to a depth (z_p) of at least 30 m.

Estimated using the depth extrapolation relationship of Dai et al. (2013) with region-specific regression coefficients from Kwak et al. (2017a) when a nearby shallow measured V_S profile exists (z_p < 30 m).

Estimated using regional V_S₃₀ proxy-based relationships (e.g. peat-thickness proxy model for sites located in the Delta, described in Buckreis et al., 2025a).

Estimated using a weighted combination of Kriging-interpolated (Thompson, 2018), slope- (Wald and Allen, 2007), terrain- (Yong et al., 2012), and/or geology-based proxy models as described in Wang (2020).

Geology-based V_S₃₀ proxy models exist for California (Wills et al., 2015), Oregon (Ahdi et al., 2017), Utah (Choi, 2024; McDonald and Ashland, 2008), and CENA (Parker et al., 2017); however, no models are published for Nevada, where a significant number of new sites are located. Given the relative proximity of most Nevada sites to California, we relate geologic units given by Ludington et al. (2005) to counterparts in Wills et al. (2015) and observed good agreement at the border between California and Nevada with respect to the continuity of mapped units. On this basis, geology-based V_S₃₀ estimates from Wills et al. (2015) were applied to Nevada sites. Because most of the V_S measurements in Nevada use the ReMi method (Louie, 2001), which is not considered to be reliable over the depth range considered for V_S30 (Cox and Beekman, 2011), we are unable to validate that the median values reported by Wills et al. (2015) are suitable for mapped units in Nevada. Given the proximity and continuity of mapped units and the fact that geological properties are not expected to change based on political boundaries, we consider the assignments to be appropriate.

Following the hierarchy above, some sites from the NGA-West2 and NGA-East projects were updated in cases where a measured V_S30 became available after the NGA data collection effort concluded. Measured V_S profiles were accessed from the V_S profile database (e.g. Kwak et al., 2021) and measured V_S₃₀ values were obtained from the USGS V_S30 compilation (McPhillips et al., 2020). NGA-West2 had 391 sites in California with measured V_S₃₀ values, whereas the updated California GMDB subset contains 847 sites with measured V_S₃₀ values. Similarly, the number of sites in CENA with measured V_S₃₀ values has grown from 77 in NGA-East to 134 in the GMDB. Fig. 9 presents the distribution of V_S₃₀ values partitioned by assignment method (i.e. measured vs proxy-based) and geographic region.

Figure 9.

(a) Distribution of V_S₃₀ partitioned by assignment method (computed from a measured V_S profile or estimated using proxy-based models); (b) distribution of measured and proxy-based V_S₃₀ assignments in different regions.

Shear-wave isosurface depth parameters (z_1.0 and z_2.5) were obtained from the most recently published seismic velocity models for sites located in Quaternary basins in California and the rest of the WUS (Aagaard and Hirakawa, 2021; Boyd 2022; Lee et al., 2014; Shaw et al., 2015; Magistrale et al., 2008). Sites located within the modeled domain of more than one model are assigned multiple depth values for each depth term. For California sites located outside of basins, generally having Pleistocene, Tertiary, or Mesozoic surficial deposits, a geology-based model was applied to estimate z₁ (Shams et al., 2025). Non-California sites located outside of the domains of seismic velocity models are not assigned isosurface depths. For CENA, isosurface depth parameters are not available as in the WUS, but sediment-depths relating to the thickness of sediments above basement rocks in the Atlantic and Gulf coastal plains are available from Boyd et al. (2024) and are provided in the GMDB.

Station parameters include station name, location, the network and station codes specified by the Standard for the Exchange of Earthquake Data (SEED) (IRIS, 2012), station type/housing (i.e. free-field vs in a structure), sensor depth, installation and removal dates, and information about the network that operates the station. Station locations were compared among the NGA flatfiles, IRIS, NCEDC, CESMD, and the Southern California Earthquake Data Center (SCEDC) housed at Caltech. For nearly all stations, locations agreed well and data from the catalog with the most significant digits was adopted. In some cases, the station parameters were different among the different sources of information, and were handled on a case-by-case basis using a semi-automated procedure to resolve the conflicts. Stations spaced within 1 km of each other (or ∼0.01° in coordinate precision) were flagged as initial candidate duplicate stations. These stations were then further screened based on similarity of station code and station name using fuzzy logic. Fuzzy logic assigns values ranging from zero to one, which represent completely false or completely true, respectively (Zadeh, 1988). “Fuzzy string matching” is an application of fuzzy logic with the goal of identifying strings that approximately match a pattern. Using the fuzzywuzzy package in Python (Inc, 2014), this approach assigns similarity scores to pairs of station codes and names for potential matches. A normalized aggregate score is computed as the sum of individual scores divided by two, and represents the degree to which two stations can be considered a match. We found that stations with an aggregate score larger than 0.85 and separation distances less than 0.2 km (∼0.002° coordinate precision) generally corresponded to matching stations, and were flagged as such. Pairs with low aggregate scores (i.e. lower than 0.15) and separation distance larger than 0.5 km are considered distinct. The remaining pairs were sorted by aggregate score in descending order and screened manually. The same semi-automated procedure was utilized to ascertain whether the ground motions processed by the authors were from new stations, or from existing stations in NGA databases. Using this approach, 9927 seismic stations were deemed to be unique after unifying the eleven datasets. This process corrected errors in the NGA-West2 database in which stations had previously been separated and are now aggregated (e.g. Tarzana—Cedar Hill (station sequence number, SSN 340) and Tarzana—Cedar Hill A (SSN 341), which recorded different earthquakes, but are the same station (CE 24436)).

In addition to screening for duplicate stations, the aforementioned approach was adapted to assign networks (and their associated network codes) to the majority of NGA-West2 stations, because this information was not always accurately recorded. The motivation for this effort arose from the fact that it is easier to screen for duplicate stations when their unique network and station SEED code combinations, as defined by the International Federation of Digital Seismograph Networks (FDSN), are available. This process will need to be undertaken each time new data are added. We are able to assign networks to all but 94 NGA-West2 California stations by comparing NGA-West2 station metadata to all available California stations downloaded from IRIS, NCEDC, SCEDC, and CESMD. We suspect most of these 94 stations to be older-decommissioned stations that will not produce data in the future.

Path parameters

Site-to-source distances are calculated between station locations and one or more finite-fault segments used to represent the rupture surface using P4CF (Chiou, 2021). Distance metrics include R_RUP, closest distance from the site to any point on the surface projection of the rupture surface (R_JB), distance measured perpendicular to the fault strike from the surface projection of the top edge of the rupture surface (R_x), distance measured parallel to the fault strike from the midpoint of the surface projection of the rupture surface (R_y), the average distance to the fault plane (R_avg; Chiou et al., 2000; Ohno et al., 1993), the epicentral distance (R_epi), and the hypocentral distance (R_hyp). In addition, P4CF provides the coordinates (longitude, latitude, and depth) of the closest point on the rupture surface to each site.

Relational database

The objective of the work presented herein is to develop a web-served publicly accessible relational database that users can query. Motivation for a web-served relational database is driven by three factors:

Users often desire specific fields of data that can be accessed by targeted queries rather than having to download undesired data.

The relational structure improves data/metadata integrity by avoiding repetition of data entry that would be required if all metadata were stored in a single table (e.g. site parameters for different recordings made at the same site) (Krüger, 2004).

Indexes can make database searches faster than input / output operations on local files.

The relational database utilizes the MySQL InnoDB storage engine and can be queried using Structured Query Language (SQL). Fields have predefined data types that include numeric (e.g. “INT,”“FLOAT,” etc.), temporal (e.g. “DATE” and “DATETIME”), or string types (e.g. “VARCHAR,”“MEDIUMTEXT,” etc.), which ensures data integrity by only allowing entries conforming to valid formats. This section describes the database structure, the reasoning for its organization, and other elements developed to help users access the data.

Database structure

The organizational structure of the database—the schema—defines the tables, fields, and relationships among the tables in the database. The current GMDB schema contains 32 tables that can be broadly grouped into five categories: event, site/station, ground motion, auxiliary, and junction. A list of table names in each group is provided in Tables 2 through 6. A diagram depicting the tables and their relationships is presented in Fig. 10. Linkages between tables are identified by shared fields called “keys.” A primary key consists of one or more fields that uniquely identify a table entry, and a foreign key is one or more fields in a separate table that identify the entry in the table containing the corresponding primary key. For example, network_id is the primary key in the network table and a foreign key in the station table to identify that a particular station is associated with a particular network.

Table 2.

Event tables

Table Name	Table Description	No. of Fields
aftershock_mainshock	Intra-event distance between aftershock and mainshock events	4
event	Earthquake events	34
event_type	Event types	2
finite_fault	Finite-fault model parameters	9
finite_fault_kinematic_parameter	Finite-fault kinematic parameters	10
finite_fault_segment	Finite-fault rupture surface representations	13

Table 3.

Site/Station tables

Table Name	Table Description	No. of Fields
basin_model	Model for basin parameters	4
basin_site	Basin parameters (e.g. z_1.0 and z_2.5)	7
network	Seismic network operators	8
station	Station information	12
site	Site information	28
vs30_code	Methods used to assign preferred V_S₃₀	3
z_code	Methods used to assign preferred z values	2

Table 4.

Ground motion tables

Table Name	Table Description	No. of Fields
motion	Ground motion parameters	6
path	Path metrics	29
fourier_spectra	Fourier amplitude spectra	7
instensity_measure	Intensity measures	89
response_spectra	Pseudo-spectral accelerations	10
time_series_metadata	Time-series processing metadata	27
time_series_data	Time-series data	5

Table 5.

Auxiliary tables

Table Name	Table Description	No. of Fields
citation	Citation information	4
collection	A collection groups motions, events, or stations together into a common set	4
geometry	GeoJSON geometries	7
user	User information	19
version	Database versions	5

Table 6.

Junction tables

Table Name	Table Description	No. of Fields
collection_motion	Mapping motions to collections and record sequence numbers	4
event_eqid	Mapping earthquakes to NGA earthquake identification numbers	4
event_geometry	Mapping earthquake locations to geometry objects	3
site_geometry	Mapping site locations to geometry objects	3
station_ssn	Mapping stations to NGA station sequence numbers	4
version_time_series_metadata	Mapping time-series to versions	3
vs30_citation	Citation for V_S30 assignment	3

Figure 10.

Diagram of simplified GMDB schema depicting primary and foreign key relationships; symbol colors and shapes indicate the nature of the table contents; lighter shades (e.g. path table) indicate tables whose contents are computed from values stored in other tables. Arrow direction indicates direction of foreign key dependency (e.g. event_id, path_id, and station_id are foreign keys in the motion table).

The GMDB utilizes indexed integer primary and foreign keys that are automatically assigned each time a new entry is inserted into the database. Using single integer keys facilitates simpler SQL queries when tables are joined, but does not inherently enforce uniqueness of the combination of fields that define a unique entry (which is inherent to composite keys consisting of multiple columns). We separately require certain combinations of fields to be unique using the UNIQUE INDEX option in MySQL. For example, the combination of network_code and network_name must be unique in the network table when a new network is inserted into the database. A complete description of all of the fields in each table is too lengthy to document here, but is provided at Buckreis et al. (2023a) (https://doi.org/10.34948/G4RP4K). Pertinent details about each table are included in the subsections below.

Event tables

Event metadata are organized into six tables: event, finite_fault, finite_fault_segment, finite_fault_kinematic_parameter, aftershock_mainshock, and event_type. The event table stores the most common earthquake-related metadata, including event name, hypocenter location, origin time, M, strike, dip, rake, etc. In addition to source parameters, the event table also records the unique ID assigned by the ANSS ComCat in the comcat_id field. All numeric fields are stored as “FLOAT” data types, and descriptive fields including event_name, magnitude_type, and comcat_id are stored as “VARCHAR” string objects. Longitude and latitude fields are stored as “FLOAT(8,5)” data types, which preserves up to five significant digits after the decimal point (approximately 1.11 m resolution, although event locations are typically reported with three significant digits after the decimal point). Origin times are stored as “DATETIME” objects in the datetime field which allows the use of time operations to easily and efficiently construct queries with temporal constraints. Faulting mechanisms are assigned based on rake-angle relationships defined in Ancheta et al. (2013) in the mechanism_based_on_rake field, and the presence of surface rupture is stored as a binary indicator (0=“no” or 1=“yes”) in the coseismic_surface_rupture field. Many fields including mechanism_based_on_rake, magnitude_uncertainty_study_class, basis_for_surface_rupture, extensional_regime among others are stored as integer-valued codes to facilitate efficient querying. These data types are extended to similar fields in other tables.

Rupture surface representations and associated data for each event are stored in the finite_fault, finite_fault_segment, and finite_fault_kinematic_parameter tables. The finite_fault table stores general rupture surface information including the total length, width, area, and depth to top of the rupture model, as well as an integer FFM field which takes the value of “1” if the representation is derived from a published FFM, or “0” if the rupture surface is simulated as described earlier. The finite_fault table is connected to the event table via the event_id foreign key. Although it is possible for several FFM representations to exist for a single event, the GMDB presently only records information derived from the preferred model as selected using the approach described in Contreras et al. (2022). The location and geometric properties of each rupture surface are stored in the finite_fault_segment table, and are linked to the finite_fault table via the finite_fault_id foreign key. Fault ruptures may be represented by multiple segments; therefore, the finite_fault_id foreign key represents a “one-to-many” relationship, meaning that one finite_fault entry can be related to multiple finite_fault_segment entries. Kinematic parameters including average fault displacement, average slip velocity, and more are stored in the finite_fault_kinematic_parameter table for events with published FFMs.

The aftershock_mainshock table stores data which can be used to identify aftershocks that re-rupture the mainshock fault plane or occur within the damaged zone within a time window for aftershocks (i.e. “Class 2” events) (Wooddell and Abrahamson, 2014). Inclusion of event classes facilitates investigations of ground motion differences between “Class 1” events (mainshocks, foreshocks, triggered events, and off-plane aftershocks) and Class 2 events (e.g. Abrahamson et al., 2014; Boore et al., 2014; Campbell and Bozorgnia, 2014). The table functions as a junction table between two events in the event table, where event_id corresponds to the event_id of the Class 2 event and mainshock_event_id is the event_id of the corresponding Class 1 event. The shortest distance between the surface projection of the centroid of the rupture surface of the trial aftershock and the closest point on the edge of the surface projection of the mainshock rupture surface (CR_JB) is stored in the crjb field. Only pairs of events which occur within the Gardner and Knopoff (1974) time window are considered, all others are assumed to be Class 1. Users distinguish Class 1 and Class 2 events in the aftershock_mainshock table by comparing CR_JB with a specified threshold distance of their choosing (e.g. CR_JB < 15 km can be used to suggest Class 2; Ancheta et al., 2013).

The event_type table specifies whether the tectonic regime is shallow crustal, subduction interface, subduction intraslab, outer-rise, induced, stable continental, or undetermined. In previous NGA projects, separate databases were organized for different tectonic regimes. However, the GMDB has been organized with anticipation of storing data from all three NGA databases in a single centralized location. We recognize that GMMs must be derived separately for different tectonic regimes, and that including data from multiple regimes marks a departure from past practices. However, there are benefits to combining them—for example, site and station data are unaffected by earthquake type, and therefore can be entered once for a given station and used for all event types. The alternative would be multiple databases with repeated entries, which introduces potential data integrity problems. For example, V_S₃₀ might be updated at a site in one database, but not in another, leaving users wondering which is more trustworthy.

Site / station tables

Station and site metadata are organized into seven tables: network, station, site, vs30_code, z_code, basin_model, and basin_site. The network table stores metadata about seismic networks including their name, SEED network code, operator, and operating dates reported by FDSN. The network_id is used as a foreign key in the station table, which contains information about individual stations including the name, SEED station code, station type/housing (e.g. free-field vs structure), depth, and location. The primary key in the station table is station_id. We considered but ultimately decided against storing instrument information in the database, such as serial numbers. Our reasons for not including detailed instrument information are (1) the GMDB contains adequate information to find detailed instrument information in IRIS; (2) if we were to scrape those data into our database, there is potential for conflicting information between the two data sources, and IRIS is a better source for this information; and (3) we anticipate that most ground motion modelers will not require this information in their work. We do, however, identify whether a ground motion was recorded by an accelerometer or seismometer.

The site table contains metadata related to the physical attributes of the location of the recording instrument, such as geologic and topographic information. Latitude, longitude, and elevation are also assigned to the site, which are used to populate other metadata fields from published maps. The station table contains site_id as a foreign key to map sites to stations in a one-to-many relationship. Most sites have only one station, so the site location is usually the same as the station location. Exceptions include vertical arrays consisting of two or more stations at the same site, collocated stations operated by different networks (with different SEED network and station codes), and stations spanning different epochs (e.g. decommissioned stations that were replaced by a new station with a different SEED station code). In some cases, V_S₃₀ and geologic conditions may vary over short distances, in which case each station may be assigned to its own site. An example is the Borrego Valley Differential Array (Kato et al., 1998), which contains stations spaced at distances as short as 10 m. Measured V_S₃₀ values are available at each station location, so a site was created for each station. Similarly, the number of sites may change over time as geophysical / geological data become available. For instance, two stations in a medium sized structure (370–1500 m³), but in opposite corners, are generally considered one site; however, hypothetically, they may be split into two sites if future site characterization deems the site conditions to be significantly different between the two corners.

The site table also records the preferred V_S₃₀ value recommended for analysis. The preferred V_S₃₀ value is that which best reflects the median estimate as obtained through agreed upon protocols that have evolved over time (e.g. Chiou et al. 2008, Ancheta et al. 2014, Wang 2020, Ahdi et al. 2022). As discussed previously, V_S₃₀ may be computed from a measured V_S profile or estimated from proxy-based models. The site table includes vs30_code_id to record this type of information. An integer-valued primary key is assigned to each method in the vs30_code table, which is used to populate the vs30_code_id field. The vs30_lnstd field records the uncertainty associated with the preferred V_S₃₀ value. The vs30_reference entry provides additional information related to how the preferred V_S₃₀ was obtained, such as the reference to the V_S profile used to calculate V_S₃₀ as indicated by a string containing the data provider and profile identifier (e.g., database name-unique identifier). Current providers of measured-V_S information include:

Turkish Disaster and Emergency Management Authority (AFAD): AFAD—network-station code

CESMD: CESMD—network-station code

NGA-West 2 flatfile: NGAW2—ssn

NGA-East flatfile: NGAE—ssn

USGS V_S₃₀ Compilation (McPhillips et al., 2020): USGS—Id

Shear-wave velocity profile database (VSPDB, Kwak et al., 2021): VSPDB—profile id

For sites where V_S₃₀ is estimated from proxy-based methods, vs30_reference describes the type of proxy model(s) used. Abbreviations for proxy-model types include:

“Geo”: geology- or hybrid geology-slope proxies (e.g. Wills et al., 2015).

“Kri”: Kriging interpolation if site is located near measured V_S₃₀ (e.g. Thompson, 2018).

“Terr”: terrain-based proxies (e.g. Yong, 2016).

“Pea17—group number”: proxy-based V_S₃₀ estimation in CENA proposed by Parker et al. (2017).

The expanded list of proxy models implemented in the NGA-Sub project (Ahdi et al., 2022) have not yet been implemented in the GMDB.

The basin_model and basin_site tables are used to assign z_1.0 and z_2.5 to sites within the geographic domain of seismic velocity models. Given that the boundaries described by several such models may overlap (e.g. CVM-S and CVM-H in southern California), the GMDB needed a way to allow for multiple estimates to be assigned. We store multiple depth parameters because different parameters are used by different modelers. To meet this need, the basin_site table acts as a “many-to-many” junction between the site and basin_model tables via relationships of site_id and basin_model_id foreign keys. The basin_model table contains identifying information for each model (full name, abbreviation, etc.), and each model is assigned a unique basin_model_id. Within the basin_site table pairs of basin_model_id and site_id are assigned appropriate z_1.0 and z_2.5 values. An advantage of this structure is that it is flexible enough to accommodate additional estimates as newer models are published. The site table includes fields for the preferred z_1.0 and z_2.5 estimates with corresponding foreign keys to the z_code table to record the method by which the preferred assignment was arrived at (e.g. from a measured V_S profile, proxy-based model, or CVM).

Ground motion tables

Ground-motion metadata are organized into two tables: path and motion. The path table stores all distance metrics in addition to a hanging_wall indicator defined in Ancheta et al. (2013). The motion table joins the event, path, and station tables with the time_series_metadata table, thus relating all aspects of a ground motion (i.e. source, source-to-site path, and site metadata) with the time-series and intensity measures. Each source-to-station record in the motion table is assigned a unique motion_id, and also records the user_id of the individual or entity (e.g. an NGA project) that was responsible for assembling the ground motion data.

Ground motion data are organized into five tables: time_series_metadata, time_series_data, intensity_measure, response_spectra, and fourier_spectra. The time_series_metadata table stores information directly related to the time-series data and processing. Each motion_id in the motion table relates to one entry in the time_series_metadata table for the three generalized as-recorded components (H1, H2, and V), and records information about the time-series including component azimuths, type of recorder, sampling rate, and original source of data. We depart from the SEED specification in that the GMDB does not record channel or instrument codes. Our justification is that the SEED channel code can be reconstructed from the information contained in the sampling_rate, instrument_type, and azimuth fields, which are more descriptive than channel codes. Similarly, metadata stored in the station table can be used to identify a location code by cross-checking IRIS. Processing metadata includes the method of filtering (manual vs semi-automated), filter type, filter orders, and selected high- and low-pass corner frequencies, which control record usable bandwidth.

Acceleration time-series data are stored separately in the time_series_data table, where a time_series_metadata_id foreign key is used to relate each time_series_data_id to the time_series_metadata table. The time-series data are stored within this table as a “MEDIUMTEXT” object formatted using JavaScript Object Notation (JSON), which can be read and parsed by many programming languages. The JSON strings have a dictionary-like structure where acceleration (in units of g) is represented as a list of values (e.g. {“acceleration (g)”:[a₁, a₂, …, a_N]}, where a_i represents the acceleration value and N is the number of points). Times are not stored in JSON strings because a time-array can easily be constructed from the corresponding sampling rate in the time_series_metadata table. Benefits of storing the time-series data as strings instead of “FLOAT” objects are (1) we anticipate users will rarely wish to query a portion of a time-series so we bundle them into a single string; (2) the time-series data are ordered as an array in the JSON string, but each FLOAT value would require its own time value if stored in a traditional database format; and (3) the InnoDB engine compresses strings in memory, so the JSON string format requires less memory and results in faster queries than if the data were stored as floats.

The intensity_measure table stores all IMs discussed previously except PSA and EAS for each component. PGA, PGV, CAV, CAV₅, I_A, and I_A-times at every fifth percentile (5 to 95) are stored as “DOUBLE” objects. PSA are stored in the response_spectra table, and as-recorded component FAS and EAS are stored in the fourier_spectra table. All three tables include time_series_metadata_id as foreign keys.

Auxiliary information tables

Auxiliary information describes anything that is not inherently related to ground motion data, and is organized into five tables: citation, collection, user, geometry, and version. The citation table stores information about all references used throughout the GMDB, which includes a full citation and uniform resource locator (URL) or digital object identifier (DOI). Each reference is assigned a unique citation_id which is used as a foreign key in other tables to track the original source of data (e.g. the finite_fault_citation_id field in the finite_fault table relates the FFM data to the appropriate publication). The best practice for researchers who utilize GMDB data is to reference all original sources of data.

The collection table stores metadata related to a collection of motions, events, or stations, and includes fields for a name, description, and citation_id (if related to a publication). Types of collections include published datasets (e.g. NGA-West2), data subsets describing regional data (e.g. California ground motions), and specific datasets used by researchers in connection with a particular study. Collections referenced in the event_eqid and station_ssn tables will only be of the first type because they map unique identifiers across different databases; however, those referenced in the collection_motion table can be of any type. The primary function of “collections” is to provide a means to quickly and efficiently query sub-datasets associated with particular research projects.

The user table contains information about each user including a username, email, full name, and number of data requests. Passwords are also stored in an encrypted password field, and are used to authenticate when authorized users connect to the GMDB server. As a means of quality assurance, user_id’s are assigned to each ground motion in the time_series_metadata table and correspond to the user who processed the ground motion data. User roles are also assigned in the user table; current roles include “users,” who have read-only permissions to publicly released data, “modelers” who have read-only permissions to released and restricted data, and “admins,” who have full edit privileges. Additional roles may be created as needed.

Finally, the geometry table stores geospatial data in GeoJSON format (Butler et al., 2016) for defining geographic features. The purpose of this table is to provide a container for supplementary geospatial data commonly used in ground motion analyses (e.g. the outline of a particular sedimentary basin). Data are stored in the geometry_geojson field (“MEDIUMTEXT”), which can be read and parsed by most programming languages or GIS software. The geometry_class field is used to identify the nature of the type of data contained within the GeoJSON, and is populated using codes defined in the online documentation. Name, description, and citation_id (if associated with a publication) are also stored for each entry.

Junction tables

Junction tables are used to efficiently store many-to-many relationships between two tables. There are seven junction tables in the GMDB: collection_motion, event_eqid, station_ssn, event_geometry, site_geometry, vs30_citation, and version_time_series_metadata. The collection_motion table relates the collection and motion tables and contains a primary key (collection_motion_id), two foreign keys (collection_id and motion_id), and a “record sequence number” (RSN) assigned by an individual collection (e.g. NGA-West2). The purpose of the table is to group sub-datasets of records together within the database structure to facilitate efficient querying of routine requests, and to map RSN’s from external databases to GMDB motion_ids. The many-to-many relationships defined in the table means that a collection_id may have many motion_ids, and an individual motion_id may belong to several collection_ids.

The “earthquake ID” (EQID) and “station sequence numbers” (SSNs) assigned to events and stations in NGA products could not be adopted as primary keys in the event and station tables, respectively, because NGA-West2 and NGA-East each started their assignments at “1.” Accordingly, the EQID and SSN fields are unique within, but not between, database products. We, therefore, assign unique event_ids and station_ids to ensure that the primary keys are unique; however, users may wish to search the GMDB using familiar EQIDs and/or SSNs assigned by individual collection efforts unified in the GMDB (e.g. NGA-West2 and NGA-East). The event_eqid and station_ssn tables provide these mappings by acting as junction tables between the collection and event or station tables, respectively. Both tables contain four fields: (1) a primary key (event_eqid_id or station_ssn_id), (2) collection_id foreign key, (3) event_id or station_id foreign key, and (4) eqid for the EQID or ssn for the SSN. The relationships between the primary keys (event_eqid_id and station_ssn_id) to collection_id and event_id or station_id are both “one-to-many,” meaning that a given event_id or station_id can only have one EQID or SSN for a given collection_id; however, individual event_id or station_ids may appear numerous times if they exist in multiple collections.

Sometimes information about the location of the earthquake or site is also important, such as knowing if they occur within a particular geographic area. This type of metadata is stored in the event_geometry and site_geometry tables, which act as “many-to-many” junction tables between the geometry and event or site tables, respectively, meaning that one event or site can be related to many geometry objects, and a particular geometry can also be related to many events or sites. The relationships are defined by identifying event_id and site_ids who’s locations (hypocenter or site coordinates) map within the geometry object (if a polygon) associated with a particular geometry_id. For example, if a user is only interested in data from events that originate in the Bay Area of northern California, they can limit their query by filtering on the associated geometry_id through the event_geometry table. The information in the site_geometry table can be used to identify sites that are located inside of particular basins (e.g. Los Angeles basin), or used to assign geomorphic site categories (e.g. basin, basin_edge, valley, mountain_hill—Nweke et al., 2022), provided the necessary geometry object is defined in the geometry table.

The vs30_citation table is used to provide citations for V_S₃₀ assignments in the site table, and acts as a one-to-many junction table (i.e. if several proxy-based estimates are combined to provide the recommended V_S₃₀ for a given site, then the citation_ids for each model used are related to its site_id). In cases where V_S₃₀ is computed from a measured V_S profile, the citation_id for the data publication will be assigned to the site_id.

Version control

The GMDB is constantly changing as admins enter new data and update existing fields, which poses significant challenges for reproducibility of research findings. Version control is needed so users are able to query specific data collections that were used in a study. Two general approaches for version control in relational databases are (1) schema-based, in which version information is stored within database tables and versioning is handled through SQL queries, and (2) copy-based, in which the database is copied and archived at a specific point in time and version control is achieved by querying the appropriate version. Schema-based methods are complicated to implement because they require additional tables and fields, and the SQL queries must be written specifically to retrieve data from the desired version. Copy-based methods are simple to implement, but involve significant duplication of information in the different database copies, which is inefficient with respect to memory. We adopt a blend of these approaches within the GMDB. We use a schema-based approach for the time_series_metadata, time_series_data, intensity_measure, response_spectra, fourier_spectra, user, version, and version_time_series_metadata tables, which occupy 99.85% of the total database memory. Duplicating this information with each version is deemed unnecessary and inefficient. We adopt a copy-based method for the remaining tables, which occupy only 0.15% of the database memory.

The version control approach is illustrated in Fig. 10. The schema-based version control database contains the large tables (e.g. time_series_data, intensity_measure, fourier_spectra, and response_spectra), a version table, and a junction table identifying the version of each time-series, which is easily mapped to each entry in the large tables. The version table contains the version number, description, and database name for the copy-based version control tables. Once an entry in one of the large tables is assigned a version, the version number is permanent. For example, if a time-series needs to be updated, the existing time-series will not be overwritten. Rather, a new time-series will be inserted into the time_series_data table, and the version_time_series_metadata junction table will map the time-series entities to the appropriate versions. The copy-based version control approach involves copying the small tables when new versions are released, and connecting to the correct database. By default, the API will return data from the most recent version. The best practice for researchers who utilize GMDB data is to reference the specific GMDB version used to enable reproducibility.

Application programming interface

This section discusses how the ground motion data and associated metadata in the GMDB are made publicly accessible. One approach would be to open the database to user SQL queries, but this would require users to be proficient in SQL and to understand the database structure to write queries. Moreover, allowing users to submit custom SQL queries is also a security threat. To overcome these challenges, we developed a representational state transfer (REST) API and online tool to assist users with accessing and interacting with the data. This API enables users to request data from the GMDB using relatively straightforward query string parameters appended to the end of a URL that serves as an endpoint. Using the API to retrieve data requires only a basic understanding of the database structure and no knowledge of SQL. Key features of the API are explained here, and complete documentation can be found at Buckreis et al. (2023a) (https://doi.org/10.34948/G4RP4K).

The API can be accessed through a web portal, where data can be viewed in HTML tables, or through HTTP requests, where data are returned in a JSON format. The API requires authentication and authorization to prevent unauthorized users and bots from accessing data. When using the web portal, authentication is provided through a login script, and authorization is handled by a dropdown menu defining the user’s role. When retrieving JSON data, users must first authenticate by submitting their username and password via a Basic Auth HTTP request. Successful authentication returns an authorization token that must then be included in the header of subsequent HTTP requests to retrieve data.

To request data from a certain table, the user appends an endpoint to the end of the base URL: https://gmdatabase.org/. For example, if a user is interested in earthquake-source metadata, they can query the event table: https://gmdatabase.org/events. If the URL is pasted into a web browser and the user is authenticated, results are rendered in an HTML table, as shown in the screenshot in Fig. 11 depicting a portion of the returned data. Additional query string parameters separated by ampersands can be appended following a question mark after the name of the table to refine how the data are returned. Users can sort by a specified field either ascending or descending, specify the number of records to return, or filter records by specifying a field name, delimiter, and value. For example, if the user wants to retrieve events with magnitude larger than 5 sorted from oldest to newest, the URL would be: https://gmdatabase.org/events?sort=datetime&direction=asc&magnitude>5, in which “sort,”“direction,” and “magnitude” are query string parameters and “datetime,”“asc,” and “5” are the user-defined values.

Figure 11.

Screenshot of information returned by API (https://gmdatabase.org/events).

In addition to providing access to individual tables, the GMDB has a flatfile endpoint in which data from the following tables are joined together into a flattened representation: motion, event, station, path, finite_fault, finite_fault_kinematic_parameter, network, station, site, time_series_metadata, intensity_measure, and either response_spectra or fourier_spectra. The flatfile endpoint provides information that is similar to the publicly available NGA flatfiles. The flatfile endpoint supports the same sorting and filtering capabilities that are present for the individual table endpoints, and is accessible both through the web interface and API. A flatfile can be extracted from the database through the API using the following URL: https://www.gmdatabase.org/flatfile.

Users can view time-series data through the web GUI, though results are truncated to the first 100 characters of the JSON strings to facilitate responsive web viewing (https://gmdatabase.org/timeSeriesData). However, users can access the full JSON strings by submitting an HTTP request through the API.

Summary and recommendations

The GMDB departs from past practices for disseminating ground motion data by making data available through a web-served database via an API. The GMDB currently stores the unified ground motion data from active tectonic and stable continental regions assembled by NGA-West2, NGA-East, Kishida et al. (2014), Ahdi et al. (2020), Wang (2020), Margaris et al. (2021), Buckreis et al. (2022, 2024b), Ramos-Sepúlveda et al. (2024), and Mohammed et al. (2025, 2024) The California portion of the database contains 49,797 multi-component records from 525 events, the CENA subset is made up of 16,405 multi-component records from 183 events, and the Mediterranean subset is made of 4403 multi-component records from 559 events.

As of this writing (June 2025), multiple efforts are ongoing to extend the GMDB, including global data from shallow crustal earthquakes in active tectonic regions as part of the NGA-West3 database (Buckreis et al., 2024c). The schema was developed with these expansions in mind; however, organizational additions and/or alterations are anticipated as the database grows. The source and site metadata compiled as part of ongoing projects, such as NGA-West3, are publicly available in the GMDB since these types of information are generally compiled from public sources. However, the ground motions and associated intensity measures are restricted pending formal release by the project.

The GMDB API is a complex product that is still routinely undergoing updates and advancements. Requests for individual tables and batches of time-series data are operational, and ongoing developments aim to provide customizable datasets containing ground metadata and intensity measures in an end-to-end computing workflow for the next generation of ground motion analytics. We encourage researchers and engineers who utilize the GMDB to reference the specific GMDB version used and to cite all original sources of data and metadata.

Footnotes

The authors thank three anonymous reviewers for their input on this article.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research,authorship,and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research,authorship,and/or publication of this article: Funding for this study included support from the California Department of Water Resources (DWR),Agreement 4600012415;the USGS External Research Program,contract number G21AP10370 (CENA data);Pacific Gas & Electric Company (PG&E) and the California Strong Motion Instrumentation Program (CSMIP) (Ridgecrest and CSN data);the California Department of Transportation,award 65A0780 (NGA-West3);and USGS Contract G22AP00319 (SFBA data). We gratefully acknowledge this support. The views and conclusions in this paper are those of the authors and do not reflect the policy of the USGS,the State of California,or PG&E.

ORCID iDs

Tristan E. Buckreis

Chukwuebuka C. Nweke

Pengfei Wang

Scott J. Brandenberg

Renmin Pretell

Paolo Zimmaro

Jonathan P. Stewart

Data Resources

Raw time-series records were retrieved from the Incorporated Research Institutions for Seismology (IRIS) Data Management Center ( https://www.iris.edu ),the Center for Engineering Strong Motion Data (CESMD;https://www.strongmotioncenter.org /),the California Department of Water Resources (DWR),and from the Earthquake Data Center System of Türkiye (TDVMS;https://tdvms.afad.gov.tr /). The Community Seismic Network (CSN) waveforms were obtained from the CSN website ( http://csn.caltech.edu/data ). The ground motion data were processed using R (R Core Team,2022) or gmprocess (Hearne et al.,2019;Thompson et al.,2025) with a human review GUI which is available on GitHub ( https://github.com/meramossepu/HumanReviewGUI ) and there is a versioned release available on Zenodo (v1.0,at the time of publication;doi: 10.5281/zenodo.10815137). The GMDB API is accessible at

,and static releases of each version are available at Buckreis et al. (2025b) Archive of an earthquake ground motion relational database (GMDB) for engineering applications . DesignSafe-CI.,doi: 10.17603/ds2-w0zm-y670 v1.

References

Aagaard

(2023a) Ground-motion Records and Basin Depth Models for Portland, Oregon, 2005-2022. Sacramento, CA: Center for Engineering Strong Motion Data (CESMD).

Aagaard

(2023b) Ground-motion Records and Basin Depth Models for Reno, Nevada, 2008-2022. Sacramento, CA: Center for Engineering Strong Motion Data (CESMD).

Aagaard

(2023c) Ground-motion Records and Basin Depth Models for the Great Valley, California, 2000-2022. Sacramento, CA: Center for Engineering Strong Motion Data (CESMD).

Aagaard

Hirakawa

(2021) The USGS San Francisco Bay region 3D seismic velocity model (SF-CVM) version 21.1.0: Updates for the East and North Bay and a new storage scheme for 3D Earth models. In: Poster Presentation at 2021 SCEC Annual Meeting, SCEC Contribution, Online, 19 September, p. 11252.

Abrahamson

Silva

Kamai

(2014) Summary of the ASK14 ground-motion relation for active crustal regions. Earthquake Spectra 30: 1025–1055.

Ahdi

Ancheta

Contreras

Kishida

Kwak

Kwok

Parker

Bozorgnia

Stewart

(2017) NGA-Subduction site database. In: Proceedings of the 16th World Conference on Earthquake Engineering, Paper No. 4926, Santiago, Chile, 9–13 January.

Ahdi

Kwak

Ancheta

Contreras

Kishida

Kwok

AOL

Mazzoni

Ruz

Stewart

(2022) Site parameters applied in NGA-Sub database. Earthquake Spectra 38(1): 494–520.

Ahdi

Mazzoni

Kishida

Wang

Nweke

Kuehn

Contreras

Rowshandel

Stewart

Bozorgnia

(2020) Engineering characteristics of ground motions recorded in the 2019 Ridgecrest Earthquake Sequence. Bulletin of the Seismological Society of America 110: 1474–1494.

Ancheta

Darragh

Stewart

Seyhan

Silva

Chiou

BS-J

Wooddell

Graves

Kottke

Boore

Kishida

Donahue

(2013) PEER NGA-West2 Database, PEER Report No. 2013/03, Pacific Earthquake Engineering Research Center, University of California, Berkeley, CA, pp. 134.

10.

Ancheta

Darragh

Stewart

Seyhan

Silva

Chiou

BS-J

Wooddell

Graves

Kottke

Boore

Kishida

Donahue

(2014) NGA-West2 database. Earthquake Spectra 30(3): 989–1005.

11.

Arias

(1970) A measure of earthquake intensity. In: Hansen

(ed.) Seismic Design for Nuclear Power Plants. Cambridge, MA: Massachusetts Institute of Technology Press, pp. 438–483.

12.

Bahrampouri

Rodriguez-Marek

Shahi

Dawood

(2021) An updated database for ground motion parameters for KiK-net records. Earthquake Spectra 37(1): 505–522.

13.

Barbot

Luo

Wang

Hamiel

Piatibrotova

Javed

Braitenberg

Gurbuz

(2023) Slip distribution of the February 6, 2023 Mw 7.8 and Mw 7.6 Khramanmaras, Turkey earthquake sequence in the East Anatolian Fault Zone. Seismica 2(3): 1–17.

14.

Biek

Rowley

Hayden

Hacker

Wills

Hintze

Anderson

Brown

(2010) Geologic map of the St. George and East part of the Clover mountains 30’ x 60’ quadrangles, Washington and Iron counties, Utah. Utah Geological Survey Open-file Report 242DM, Utah Geological Survey, Salt Lake City, UT.

15.

Boore

(2010) Orientation-independent, nongeometric-mean measures of seismic intensity from two horizontal components of motion. Bulletin of the Seismological Society of America 100: 1830–1835.

16.

Boore

Goulet

(2014) The effect of sampling rate and anti-aliasing filters on high-frequency response spectra. Bulletin of Earthquake Engineering 12: 203–2016.

17.

Boore

Stewart

Seyhan

Atkinson

(2014) NGA-West2 equations for predicting PGA, PGV, and 5%-Damped PSA for Shallow Crustal Earthquakes. Bulletin of the Seismological Society of America 30: 1057–1085.

18.

Boyd

(2022) Updates to and applications of the USGS National Crustal Model for seismic hazard studies: U.S. Geological Survey Open-Fle Report OFR 22-02, U.S. Geological Survey, Reston, Virginia, USA.

19.

Boyd

Churchwell

Moschetti

Thompson

Chapman

Ilhan

Pratt

Ahdi

Rezaeian

(2024) Sediment thickness map of Atlantic and Gulf Coastal Plain strata, central and eastern U.S., and their influence on earthquake ground motions. Earthquake Spectra 40(1): 89–112.

20.

Bozorgnia

Stewart

(PI, editor) (2020). Data Resources for NGA-Subduction Project, PEER Report 2020/02, Pacific Earthquake Engineering Research center, UC Berkeley (Center Headquarters), 162 pages.

21.

Buckreis

Stewart

Brandenberg

Wang

Nweke

(2022) Data Files for California Subregional Path Effect Study. DesignSafe-CI. https://doi.org/10.17603/ds2-mbee-8941

22.

Buckreis

Nweke

Wang

Brandenberg

Ramos-Sepulveda

Shams

Pretell

Mazzoni

Zimmaro

Stewart

(2023a) Web portal for the global earthquake ground motion relational database, The B. John Garrick Institute for the Risk Sciences. DOI: 10.34948/G4RP4K

23.

Buckreis

(2024) Ccldpy (v1.0.0). Zenodo. Available at: https://doi.org/10.5281/zenodo.11511219

24.

Buckreis

Nweke

Wang

Brandenberg

Ramos-Sepúlveda

Shams

Mohammed

Pretell

Mazzoni

Zimmaro

Stewart

(2025b) Archive of an earthquake ground motion relational database (GMDB) for engineering applications. DesignSafe-CI., doi: 10.17603/ds2-w0zm-y670v1

25.

Buckreis

Pretell

Sandikkaya

Kale

Askan

Brandenberg

Stewart

(2024b) Engineering attributes of ground motions from February 2023 Türkiye earthquake sequence. Earthquake Spectra 40(4): 2598–2622.

26.

Buckreis

Shams

Mohammed

Kishida

Lanzano

Luzi

Zimmaro

Bradley

Huang

J-Y

Nweke

Brandenberg

Donahue

Bozorgnia

Stewart

(2024c) Overview of the NGA-West3 ground motion database. In: 2024 SCEC Annual Meeting, Palm Springs, CA, 8–11 September.

27.

Buckreis

Stewart

Brandenberg

Wang

(2023b) Subregional anelastic attenuation model for California. Bulletin of the Seismological Society of America 113(6): 2292–2310.

28.

Buckreis

Stewart

Brandenberg

Wang

(2024a) Small-strain site response of soft soils in the Sacramento-San Joaquin Delta region of California conditioned on VS30 and mHVSR. . Earthquake Spectra 40(1): 230–260.

29.

Buckreis

Wang

Stewart

Brandenberg

Driller

(2025a) Analysis of near-surface V_S data for site response applications in the Sacramento-San Joaquin Delta in California, Environmental & Engineering Geoscience 31(3): 177–192.

30.

Butler

Daly

Doyle

Gillies

Hagen

Schaub

(2016) RFC 7946: The GeoJSON Format. DOI: 10.17487/RFC7946.

31.

Cabas

Lorenzo-Velazquez

Abayo

Ramirez

Garcia

Pérodin

Hwang

Y-W

Dashti

Ganapati

Nicolas

Whitworth

MRZ

Guerrier

Fleur

Contreras

Lagesse

Marcelin

Remington

(2023) Intersectional impacts of the 2021 MW 7.2 Nippes, Haiti, Earthquake from geotechnical and social perspectives. Bulletin of the Seismological Society of America 113(1): 73–98.

32.

Campbell

Bozorgnia

(2014) NGA-West2 ground motion model for the average horizontal components of PGA, PGV, and 5%-damped linear acceleration response spectra. Earthquake Spectra 30: 1087–1115.

33.

Castro

Benavente

Crempien

JGF

Candia

de la Llera

(2022) A consistently processed strong-motion database for Chilean earthquakes. Seismological Research Letters 93(5): 2700–2719.

34.

Chiou

(2021) P4CF [Source code]. Available at: https://github.com/bc88bc/P4CF. Accessed June 9, 2025.

35.

Chiou

Darragh

Gregor

Silva

(2008) NGA project strong-motion database. Earthquake Spectra 24(1): 23–44.

36.

Chiou

BS-J

Youngs

(2008) NGA Model for the Average Horizontal Component of Peak Ground Motion and Response Spectra (PEER Report 2008/10). Berkeley, CA: Pacific Earthquake Engineering Research Center, University of California.

37.

Chiou

BS-J

Makdisi

Youngs

(2000) Style-of faulting and footwall/hangingwall effects on strong ground motion. FY 1995 NEHRP Award Number 143-95-G-2614, Final Report, p. 21.

38.

Choi

(2024) Enhancement of shear wave velocity profile database and application for VS30 estimation in Utah. MS thesis, University of California, Los Angeles, Los Angeles, CA.

39.

Contreras

Stewart

Kishida

Darragh

Chiou

BS-J

Mazzoni

Youngs

Kuehn

Ahdi

Wooddell

Boroschek

Rojas

Ordenes

(2022) NGA-Sub source and path database. Earthquake Spectra 38(2): 799–840.

40.

Cox

Beekman

(2011) Intramethod variability in ReMi dispersion measurements and Vs estimates at shallow bedrock sites. Journal of Geotechnical and Geoenvironmental Engineering 137(4): 354–362.

41.

Dai

Hou

(2013) A shear-wave velocity model for V_S30 estimation based on a conditional independence property, Bulletin of the Seismological Society of America, 103(6): 3354-3361. doi: 10.1785/0120130025

42.

Dreger

Huang

M-H

Rodgers

Taira

Wooddell

(2015) Kinematic finite source model for the 24 August 2014 South Napa, California, earthquake from joint inversion of seismic, GPS, and InSAR data. Seismological Research Letters 86(2A): 327–334.

43.

Ekström

Nettles

Dziewonski

(2012) The global CMT project 2004-2010: Centroid moment tensors for 13,017 earthquakes. Physics of the Earth Planetary Interiors 19: 200–201.

44.

Electrical Power Research Institute (EPRI) (1988) A Criterion for Determining Exceedance of the Operating Basis Earthquake. EPRI NP-5930, Palo Alto, CA.

45.

Farghal

Baltay

Langbein

(2020) Strain-estimated ground motions associated with recent earthquakes in California. Bulletin of the Seismological Society of America 110(6): 2766–2776.

46.

Farr

Kobrick

(2000) Shuttle Radar Topography Mission produces a wealth of data. EOS Transactions American Geophysical Union 81: 583–585.

47.

Gardner

Knopoff

(1974) Is the sequence of earthquakes in southern California, with aftershocks removed, Poissonian? Bulletin of the Seismological Society of America 64: 1363–1367.

48.

Goldberg

Taumaz

Reitman

Hatem

Yolsal-Çevikbilen

Barnhart

Irmak

Wald

Öcalan

Yeck

Özkan

Thompson-Jobe

Shelly

Thompson

DuRoss

Earle

Briggs

Benz

Erman

Doğan

Altuntaş

(2023) Rapid characterization of the February 2023 Kahramanmaraş, Türkiye, earthquake sequence. The Seismic Record 3(2): 156–167.

49.

Gomez

DFC

Vanacore

Cruzado

JAM

Suárez

(2022) Preliminary crustal GMPE for Puerto Rico after the 2020 SWPR seismic sequence. AGU Fall Meeting, 16 December.

50.

Goulet

Kishida

Ancheta

Cramer

Darragh

Silva

Hashash

YMA

Harmon

Parker

Stewart

Youngs

(2021) PEER NGA-East database. Earthquake Spectra 37: 1331–1353.

51.

Hearne

Thompson

Schovanec

Rekoske

Aagaard

Worden

(2019) USGS automated ground motion processing software. USGS Software Release. Earthquake Hazards Program, Reston, VA.

52.

Hutchinson

Zhu

Bradley

Lee

Wotherspoon

Dupuis

Schill

Motha

Manea

Kaiser

(2024) The 2023 New Zealand ground-motion database. Bulletin of the Seismological Society of America 114(1): 291–310.

53.

Inc

(2014) fuzzywuzzy: Fuzzy String Matching in Python [Source Code]. Available at: https://pypi.org/project/fuzzywuzzy/0.18.0/. Accessed July 2021.

54.

Incorporated Research Institutions for Seismology (IRIS) (2012) Standard for the Exchange of Earthquake Data (SEED) Reference Manual. Washington, DC: Incorporated Research Institutions for Seismology.

55.

Iwahashi

Pike

(2007) Automated classifications of topography from DEMs by an unsupervised nested-means algorithm and a three-part geometric signature. Geomorphology 86: 409–440.

56.

Iwahashi

Kamiya

Matsuoka

Yamazaki

(2018) Global terrain classification using 280 m DEMs: segmentation, clustering, and reclassification. Progress in Earth and Planetary Science 5: 1–31.

57.

Cabas

Kottke

Pilz

Macedo

Liu

(2023) A Designsafe earthquake ground motion database for California and surrounding regions. Earthquake Spectra 39(1): 702–721.

58.

Kato

Takemura

Uchiyama

Iizuka

Nigbor

(1998) Borrego Valley downhole array in southern California: instrumentation and preliminary site effect study. In: Proceedings of the 2Nd International Symposium on the Effects of Surface Geology on Seismic Motion, Yokohama, Japan, 1–3 December.

59.

Kishida

Mazzoni

Bozorgnia

Chiou

Darragh

Haddadi

Kayen

Markham

Muin

Silva

(2014) Investigation of ground motions recorded during the 2014 South Napa earthquake. In: Proceedings SMIP16 Seminar on Utilization of Strong Motion Data, California Strong Motion Instrumentation Program, Irvine, CA, 6 October.

60.

Kottke

(2020) arkottke/pykooh: v0.3.1 (2020-07-17). Available at: https://doi.org/10.5281/zenodo.3950071

61.

Kottke

Abrahamson

Boore

Bozogrnia

Goulet

Hollenback

Kishida

Ktenidou

Rathje

Silva

Thompson

Wang

(2021) Selection of random vibration theory procedures for the NGA-East project and ground motion modeling. Earthquake Spectra 37(S1): 1420–1439.

62.

Kramer

Mitchell

(2006) Ground motion intensity measures for liquefaction hazard evaluation. Earthquake Spectra 22(2): 413–438.

63.

Kramer

Stewart

(2025) Geotechnical Earthquake Engineering. 2nd edn. New York: Taylor & Francis, p. 1060.

64.

Krüger

(2004) Why you should use a relational database instead of a spreadsheet? Cybernetics and Systems: An International Journal 35: 683–696.

65.

Kuyuk

Allen

(2013) Optimal seismic network density for earthquake early warning: A case study from California. Seismological Research Letters 84(6): 946–954.

66.

Kwak

Ahdi

Wang

Zimmaro

Brandenberg

Stewart

(2021) Web Portal for Shear Wave Velocity and HVSR Databases in support of Site Response Research and Applications. Los Angeles, CA: UCLA Geotechnical Engineering Group.

67.

Kwak

Ancheta

Mitra

Ahdi

Zimmaro

Parker

Brandenberg

Stewart

(2017a) Performance evaluation of VSZ-to-VS30 correlation methods using global VS profile database. In: Proceedings of the 3rd International Conference on Performance-Based Design in Earthquake Geotechnical Engineering (eds Taiebat

Wijewickreme

Athanasopoulos-Zekkos

Boulanger

(eds), Paper No. 399, Vancouver, BC, Canada.

68.

Lee

Chen

Jordan

Maechling

Denolle

MAM

Beroza

(2014) Full 3-D tomography for crustal structure in Southern California based on the scattering-integral and adjoint-wavefield methods. Journal of Geophysical Research: Solid Earth 119: 6421–6451.

69.

Leonard

(2014) Self-consistent earthquake fault-scaling relations: update and extension to stable continental strike-slip faults. Bulletin of the Seismological Society of America 104(6): 2953–2965.

70.

Louie

(2001) Faster, better: Shear-wave velocity to 100 meters depth from Refraction Microtremor Arrays. Bulletin of the Seismological Society of America 91(2): 347–364.

71.

Ludington

Moring

Miller

Flynn

Hopkins

(2005) Preliminary integrated databases for the United States—Western States: California, Nevada, Arizona, and Washington: U.S. Geological Survey Open-file Report OFR 2005-1305, U.S. Geological Survey, Reston, VA.

72.

Luzi

Puglia

Russo

D’Amico

Felicetta

Pacor

Lanzano

Çeken

Clinton

Costa

Duni

Farzanegan

Gueguen

Ionescu

Kalogeras

Özener

Pesaresi

Sleeman

Strollo

Zare

(2016) The engineering strong-motion database: A platform to access Pan-European accelerometric data. Seismological Research Letters 87(4): 987–997.

73.

McDonald

Ashland

(2008) Earthquake Site-Conditions Map for the Wasatach Front Urban Corridor. Utah Geological Survey Special Study 125, Utah Geological Survey, Salt Lake City, UT.

74.

McPhillips

Herrick

Ahdi

Yong

Haefner

(2020) Updated Compilation of VS30 Data for the United States. Reston, VA: U.S. Geological Survey.

75.

Magistrale

Olsen

Pechmann

(2008) Construction and verification of a Wasatch front community velocity model. Technical Report No. HQGR.060012, 14 pp. Reston, VA: US Geological Survey.

76.

Margaris

Scordilis

Stewart

Boore

Theodoulidis

Kalogeras

Melis

Skarlatoudis

Klimis

Seyhan

(2021) Hellenic strong-motion database with uniformly assigned source and site metadata for the periods 1972 – 2015. Seismological Research Letters 92(3): 2065–2080.

77.

Mohammed

Buckreis

Shams

Nweke

Brandenberg

Stewart

(2025) Data files for San Francisco Bay Area (SFBA) Ground Motion Study. West Lafayette, IN: Designsafe-ci.

78.

Mohammed

Shams

Nweke

Buckreis

Kohler

Bozorgnia

Stewart

(2024) Usability of Community Seismic Network recordings for ground motion modeling. Earthquake Spectra 40(4): 2598–2622.

79.

Moschetti

Aagaard

Ahdi

Altekruse

Boyd

Frankel

Herrick

Petersen

Powers

Rezaeian

Shumway

Smith

Stephenson

Thompson

Withers

(2024) The 2023 US national seismic hazard model: Ground-motion characterization for the conterminous United States. Earthquake Spectra 40(2): 1158–1190.

80.

Nweke

Stewart

Wang

Brandenberg

(2022) Site response of sedimentary basins and other geomorphic provinces in southern California. Earthquake Spectra 38(4): 2341–2370.

81.

Ohno

Ohta

Ikerua

Takemura

(1993) Revision of attenuation formula considering the effect of fault size to evaluate strong motion spectra in near field. Tectonophysics 218: 69–81.

82.

Palmer

Atkinson

Ladak

Molnar

Ghofrani

(2022) Eastern Canada Earthquake Ground Motion and Site Characterization Database. West Lafayette, IN: DesignSafe-CI.

83.

Parker

Baltay

(2022) Empirical map-based nonergodic models of site response in the Greater Los Angeles Area. Bulletin of the Seismological Society of America 112(3): 1607–1629.

84.

Parker

Harmon

Stewart

Hashash

YMA

Kottke

Rathje

Silva

Campbell

(2017) Proxy-based VS30 estimation in Central and Eastern North America. Bulletin of the Seismological Society of America 107(1): 117–131.

85.

R Core Team (2022) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available at: https://www.R-project.org/, Accessed June 2022.

86.

Ramos-Sepúlveda

Brandenberg

Buckreis

Parker

Stewart

(2025) High-pass corner frequency selection and review tool for use in ground-motion processing. Seismological Research Letters doi: 10.1785/0220240265.

87.

Ramos-Sepulveda

Stewart

Parker

Moschetti

Thompson

Brandenberg

Hashash

YMA

Rathje

(2024) Data-driven adjustments for combined use of NGA-East hard-rock ground motion and site amplification models. Earthquake Spectra 40(2): 1132–1157.

88.

Rekoske

Moschetti

Thompson

(2022) Basin and site effects in the U.S. Pacific Northwest estimated from small-magnitude earthquakes. Bulletin of the Seismological Society of America 112(1): 438–456.

89.

Rekoske

Thompson

Moschetti

Hearne

Aagaard

Parker

(2020) The 2019 Ridgecrest, California, earthquake sequence ground motions: Processed records and derived intensity metrics. Seismological Research Letters 91(4): 2010–2023.

90.

Rennolet

Moschetti

Thompson

Yeck

(2018) A flatfile of ground motion intensity measurements from induced earthquakes in Oklahoma and Kansas. Earthquake Spectra 34(1): 1–20.

91.

Rowley

Williams

Vice

Maxwell

Hacker

Snee

Mackin

(2006) Interim geologic map of the Cedar City 30’ x 60’ quadrangle, Iron and Washington counties, Utah. Utah Geological Survey Open-File Report 476DM, Utah Geological Survey, Salt Lake City, UT.

92.

Sandikkaya

Güryuva

Kale Okçu1

ÖO

İçen

Yenier

Akkar

(2024) An updated strong-motion database of Türkiye (SMD-TR). Earthquake Spectra 40(1): 847–870.

93.

Shams

Buckreis

Nweke

Brandenberg

Stewart

(2025) Proxy-based estimation of depth to 1.0 km/s shear wave isosurface conditional on surface geology for California sites. In: Geo-extreme 2025 Conference, ASCE Geoinstitute, Long Beach, CA, 2–5 November.

94.

Shaw

Plesch

Tape

Suess

Jordan

Ely

Hauksson

Tromp

Tanimoto

Graves

Olsen

Nicholson

Maechling

Rivero

Lovely

Brankman

Munster

(2015) Unified Structural Representation of the southern California crust and upper mantle. Earth and Planetary Science Letters 415: 1–15.

95.

Smith

Roe

(2015) Oregon geologic data compilation (OGDC)—Release 6, Oregon Department of Geology and Mineral Industries (DOGAMI). Open-File Rept. O-13-12, Electronic Dataset. https://pubs.oregon.gov/dogami/dds/p-OGDC-6.htm.

96.

Stafford

Rodriguez-Marek

Edwards

Kruiver

Bommer

(2017) Scenario dependence of linear site-effect factors for short-period response spectral ordinates. Bulletin of the Seismological Society of America 107: 2859–2872.

97.

Thingbaijam

Mai

Goda

(2017) New empirical earthquake-source scaling laws. Bulletin of the Seismological Society of America 107(5): 2225–2246.

98.

Thompson

(2018) An updated Vs30 map for California with Geologic and Topographic Constraints. Reston, VA: U.S. Geological Survey.

99.

Thompson

Boyd

Churchwell

Moschetti

(2023b) Central and Eastern U.S. Earthquake Ground Motion Database: 2010 To 2020. Sacramento, CA: Center for Engineering Strong Motion Data (CESMD).

100.

Thompson

Hearne

Aagaard

Rekoske

Worden

Moschetti

Hunsinger

Ferragut

Parker

Smith

Kottke

(2025) Automated, near real-time ground-motion processing at the U.S. Geological Survey. Seismological Research Letters 96(1): 538–553.

101.

Trabant

Hutko

Bahavar

Karstens

Ahern

Aster

(2012) Data Products at the IRIS DMC: Stepping Stones for Research and Other Applications. Seismological Research Letters 83(5): 846–854.

102.

US Geological Survey (USGS) Earthquake Hazards Program (2017) Advanced National Seismic System (ANSS) Comprehensive Catalog of Earthquake Events and Products. Reston, VA: U.S. Geological Survey.

103.

Wald

Allen

(2007) Topographic slope as a proxy for seismic site conditions and amplification. Bulletin of the Seismological Society of America 97: 1379–1395.

104.

Wang

(2020) Predictability and repeatability of non-ergodic site response for diverse geological conditions. PhD Thesis, University of California, Los Angeles, Los Angeles, CA.

105.

Wang

Stewart

Bozorgina

Boore

Kishida

(2017) R Package for Computation of Earthquake Ground Motion Response Spectra. Report No. 2017/09. PEER, University of California, Berkeley, Berkeley, CA.

106.

Wells

Coppersmith

(1994) New empirical relationships among magnitude, rupture length, rupture width, rupture area, and surface displacement, Bulletin of the Seismological Society of America 84(4): 974–1002. doi: 10.1785/BSSA0840040974

107.

Wills

Gutierrez

Perez

Branum

(2015) A next generation VS30 map for California based on geology and topography. Bulletin of the Seismological Society of America 105: 3083–3091.

108.

Wong

Darragh

Smith

Silva

Kishida

(2022) Ground motion model for shallow crustal and deep earthquakes in Hawaii and analyses of the 2018 M6.9 Kalapana sequence. Earthquake Spectra 38(1): 579–614.

109.

Wooddell

Abrahamson

(2014) Classification of mainshocks and aftershocks in NGA-West 2. Earthquake Spectra 30(3): 1257–1267.

110.

Mohanna

Meng

Ampuero

J-P

Yunjun

Hasnain

Chu

Liang

(2023) The overall-subshear and multi-segment rupture of the 2023 Mw7.8 Kahramanmaraş, Turkey earthquake in millennia supercycle. Communications Earth & Environment 4: 379.

111.

Sandwell

Smith-Konter

(2020) Coseismic displacements and surface fractures from Sentinel-1 InSAR: 2019 Ridgecrest earthquakes. Seismological Research Letters 91(4): 1979–1985.

112.

Yong

(2016) Comparison of measured and proxy-based VS30 values in California. Earthquake Spectra 32: 171–192.

113.

Yong

Hough

Iwahashi

Braverman

(2012) A terrain based site conditions map of California with implications for the contiguous United States. Bulletin of the Seismological Society of America 102: 114–122.

114.

Zadeh

(1988) Fuzzy logic. Computer 21(4): 83–93.

A Global Application Programming Interface–Enabled Earthquake Ground Motion Relational Database for Engineering Applications

Abstract

Keywords

Introduction and project motivation

Ground motion datasets

Data selection, processing, and distributions

Intensity measures

Metadata

Seismic-source parameters

Site / station parameters

Path parameters

Relational database

Database structure

Event tables

Site / station tables

Ground motion tables

Auxiliary information tables

Junction tables

Version control

Application programming interface

Summary and recommendations

Footnotes

Declaration of conflicting interests

Funding

ORCID iDs

Data Resources

References