Abstract
Introduction
Numerous scholars have proposed solutions to the problem of seismic data reconstruction. These methods can be broadly grouped into prediction-filter-based methods, sparse-transform-based methods, low-rank-matrix-based methods, wave-equation-based methods, and data-driven machine-learning algorithms. Prediction-filter-based reconstruction offers high accuracy and fast computation, and is therefore widely used in industrial practice. Claerbout implemented an interpolation method based on prediction-error filtering in the T-X (time-space) domain, but its accuracy is limited by the number of iterations, resulting in a large computational effort.1 Spitz proposed an F-X (frequency-space) prediction-filter interpolation algorithm that exploits the linear predictability of data in the F-X domain, extracting autoregressive coefficients for different frequency components to achieve two-dimensional reconstruction of seismic data.2 Sparse-transform-based reconstruction applies a mathematical transform to the data and exploits the sparsity of the data in the transform domain. Common transforms are the Fourier transform,3–5 Radon transform,6,7 Curvelet transform,8 and Seislet transform.9,10 Methods based on sparse transforms can reconstruct missing data with high accuracy and require no other geophysical prior information, but they are less robust to spatial aliasing. Low-rank-matrix-based reconstruction assumes that the original complete seismic data are of low rank and that undersampling increases the matrix rank. The common solution is to reconstruct the data through low-rank matrix factorization or nuclear-norm minimization. Oropeza et al.11 completed data reconstruction by building Hankel matrices and performing rank reduction with singular value decomposition and a projection-onto-convex-sets algorithm. Emmanuel et al.12 reconstructed irregular data by recasting matrix rank reduction as a nuclear-norm minimization problem. Liu et al.13 accelerated the rank-reduction process with an accelerated proximal gradient descent method, improving the computational speed of the algorithm. Zhang et al.14 proposed a hybrid mathematical model of rank reduction and sparsity promotion and used it to reconstruct 3D seismic data. Wang et al.15 proposed a joint sparsity-promotion method to recover irregular data sampled on non-uniform grids, with good results.
With the development of machine learning in recent years, deep network models have achieved good results in seismic data processing, imaging, and inversion.16–22 Some scholars have applied deep learning to seismic data reconstruction; the main idea is to use a network to construct a nonlinear mapping from missing data to complete data. Wang et al.23 used a deep residual network (ResNet) to learn the mapping between regularly missing and complete data, which solved the regular-data interpolation problem but was not effective for irregular gaps. Meng et al.24 proposed a seismic data reconstruction method based on an unsupervised ResNet; it needs neither a complete seismic data set nor a training set, which solves the labeling problem, and is applicable to regular-data interpolation. The generative adversarial network (GAN)25 has shown a strong ability in data reconstruction, and Alwon et al.26 achieved good results using generative adversarial models for trace interpolation of regularly sampled data. Kaur et al.27 further proposed using a generative adversarial network to reconstruct 3D seismic data. Because GAN training is unstable and prone to mode collapse, Alexey Dosovitskiy et al.28 added the traditional L1/L2 reconstruction loss to the original GAN error function to construct a hybrid objective for optimization, in which the task of the discriminator remains unchanged: the generator must not only fool the discriminator but also ensure that the generated data are as similar as possible to the original data. The GAN error compensates for the blurriness and lack of high-frequency information of the L1/L2 reconstruction loss, while the L1/L2 reconstruction loss provides the training stability required for the GAN error to converge. Phillip Isola et al.29 discussed the difference between adding L1 and L2 norm reconstruction losses to the original GAN error function and showed that the L1 norm introduces less blurring. In addition, Marano G. Carlo et al.30 presented a recent review of the application of GANs in earthquake-related geophysical research, covering recent work on GAN-synthesized earthquake events based on artificial intelligence and emphasizing the advantages and limitations of current adversarial-learning research applied to seismology.
In fact, GANs essentially solve an inverse problem with multiple admissible solutions, so both training stability and multi-modal output should be considered. Although jointly optimizing the GAN loss and a reconstruction loss has been shown to stabilize the network, it has the side effect of preventing the network from learning the full, multi-modal data distribution. In this article, we adopt the moment reconstruction loss proposed by Soochan Lee31 as an alternative to the conventional reconstruction loss. This error function ensures stable network training without degrading the fidelity or multi-modality of the data reconstruction. The core idea is to use a maximum-likelihood loss to predict the sample moments of the true data distribution; GAN training is then aided by minimizing the distance between the sample moments of the generated distribution and the predicted sample moments, rather than by directly penalizing amplitude differences between input and output data with an L1/L2 loss, as most algorithms do.
As far as we know, GANs constrained by a moment reconstruction error have not yet been applied in geophysical research. The rest of this article is organized as follows. In the second section, the basic principle of GANs is introduced and the details of the moment reconstruction loss-constrained GAN (MR-GAN) interpolation method are described. In the third section, interpolation results on Marmousi2 synthetic data and field data are given and compared with traditional transform-domain methods and the conventional GAN method. Finally, the fourth section concludes.
Theory and method
Principle of GANs
The GAN was proposed by Goodfellow et al. in 2014. The main idea is to build two neural networks that learn against each other. The generator G fits the sample distribution: G tries to produce samples close to the true distribution in order to fool the discriminator D. The discriminator D distinguishes whether data are sampled from the true distribution or from the distribution predicted by G. The performance of both improves over the iterations until the discriminator can no longer distinguish whether the data are sampled from the true distribution or from the distribution predicted by the generator, at which point the generator is optimal.
The objective function of the GAN is defined as the minimization of a distance measure between the true data distribution and the predicted data distribution. To obtain the optimal generator, the min–max optimization scheme of equation (1) is used to train the GAN.
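Equation (1) is not reproduced in this extract; for reference, the standard GAN min–max objective of Goodfellow et al. that it refers to takes the form:

```latex
\min_G \max_D V(D, G)
= \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

Here the discriminator D is trained to maximize V while the generator G is trained to minimize it.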
Compared with other generative models, the optimization approach of the GAN requires no pre-modeling and no prior knowledge of what distribution the data obey. Instead, it samples directly from an unknown distribution, so in theory it can approximate the true data distribution completely, which is the biggest advantage of the GAN. However, the drawback of dispensing with pre-modeling is that the model is too unconstrained, which makes the training process very unstable. During optimization, the generated data often differ greatly from the desired results or even bear no relation to them at all.
To improve the quality of the samples generated by the GAN model, a common optimization strategy is to add the error function of a conventional generative network as a regularization term to the adversarial loss, combining the advantages of both to optimize the model.
The objective function of a conventional generative network is generally defined as the pointwise distance between the generated data and the true data; in seismic data interpolation, this corresponds to the amplitude difference at each sampling point. Minimizing this objective penalizes sampling points in the generated data that deviate excessively from the original data, improving the quality of GAN-generated samples and accelerating convergence. The amplitude reconstruction loss is typically taken as the L1 or L2 distance between the generated and true amplitudes.
The final hybrid error function is the weighted sum of the adversarial loss and the amplitude reconstruction loss:
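The hybrid objective referenced here (equation (4)) is not reproduced in this extract; in the standard pix2pix-style formulation, with a weight λ on the reconstruction term, it takes the following form (sketched for the L2 case):

```latex
G^{*} = \arg\min_G \max_D \;
\mathcal{L}_{\mathrm{GAN}}(G, D) + \lambda \, \mathcal{L}_{L2}(G),
\qquad
\mathcal{L}_{L2}(G) = \mathbb{E}_{x, y, z}\big[\lVert y - G(x, z) \rVert_2^2\big]
```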
Mismatch between amplitude reconstruction loss and adversarial loss
Next, we show why the existing hybrid error functions do not constitute a joint optimization. Taking the L2 loss as an example, according to the bias–variance decomposition,32 the L2 error function can be decomposed into three terms, as shown in equation (5):
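Equation (5) is not reproduced in this extract. A standard form of the decomposition, for a target y and a prediction ŷ that are independent, splits the expected L2 error into a squared difference of means, the variance of the prediction, and the variance of the data; driving the L2 loss down therefore forces the prediction variance toward zero, which is what suppresses multi-modal output:

```latex
\mathbb{E}\big[(y - \hat{y})^2\big]
= \big(\mathbb{E}[y] - \mathbb{E}[\hat{y}]\big)^2
+ \mathrm{Var}(\hat{y})
+ \mathrm{Var}(y)
```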
As shown in equation (6), when the hybrid error (equation (4)) is optimal, we denote
Sample moment reconstruction loss
To make the definition of the reconstruction loss consistent with the maximum-likelihood interpretation of the GAN loss, without losing the multi-modality of the model output, the sample moment reconstruction (MR) loss is used in this article instead of the conventional amplitude reconstruction loss: maximum likelihood is applied to the sample moments rather than directly to the amplitude differences of the reconstructed data. Under a prior constraint on the data distribution (assuming the data obey a Gaussian distribution), the sample moments correspond to the mean and variance. As shown in Figure 1, the overall architecture of MR-GAN follows that of the GAN, but the difference is that the generator of MR-GAN generates multiple samples per input, whose sample moments are matched against the moments predicted from the true data distribution.

Figure 1. (a) Traditional GAN. (b) MR-GAN.
The MR loss is defined as equation (8); it replaces the amplitude reconstruction loss in equation (4) with the sample moment reconstruction loss, and the final objective is the weighted sum of the GAN loss and the MR loss.
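The exact form of the MR loss is given in the cited work.31 As a minimal sketch under the Gaussian assumption above (the function names `gaussian_nll` and `moment_reconstruction_loss` are hypothetical, not the authors' code), the idea is: a moment predictor is trained with a Gaussian negative log-likelihood on the true data, and the generator is penalized by the distance between the sample moments of its K outputs and the predicted moments.

```python
import numpy as np

def gaussian_nll(y, mu, var, eps=1e-6):
    """Negative Gaussian log-likelihood (up to an additive constant),
    used to train the moment predictor on the true data."""
    var = var + eps
    return 0.5 * np.mean(np.log(var) + (y - mu) ** 2 / var)

def moment_reconstruction_loss(samples, mu_pred, var_pred):
    """Penalize the distance between the sample moments of K generated
    outputs and the moments predicted from the true distribution.
    `samples` has shape (K, ...): K generated samples per input."""
    mu_gen = samples.mean(axis=0)
    var_gen = samples.var(axis=0)
    return (np.mean((mu_gen - mu_pred) ** 2)
            + np.mean((var_gen - var_pred) ** 2))

# Toy check: samples drawn around the predicted moments give a small
# loss; shifted samples give a large one.
rng = np.random.default_rng(0)
mu, var = np.zeros(128), np.ones(128)
samples = rng.normal(mu, np.sqrt(var), size=(64, 128))
loss_close = moment_reconstruction_loss(samples, mu, var)
loss_far = moment_reconstruction_loss(samples + 5.0, mu, var)
```

In a full training loop these terms would stand in for the L2 reconstruction penalty in the hybrid objective, while the adversarial term is unchanged.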
Moment reconstruction loss-constrained GAN (MR-GAN) data interpolation processing
To verify the practicality of the aforementioned method, a new error function was constructed to implement it. The network model is divided into two parts, the generative model and the discriminative model. The generator performs the data reconstruction, using a network architecture based on the U-net structure,33 which consists of 12 layers. The discriminator evaluates the reconstructed data and consists of five layers. The size of the convolution kernel of each layer is
Since data-driven deep learning algorithms are sensitive to the features of the data samples, the reconstruction quality may degrade when the training data and test data differ substantially. To keep the features of the data to be interpolated consistent with those of the training data, we construct the training samples by undersampling the data to be interpolated a second time. Before the seismic data are input, the full section is cut into 128 × 128 blocks along the time and spatial directions. Adjacent windows overlap by 32 samples, and at the output the amplitudes are averaged over the overlapped areas to eliminate boundary differences. All data blocks are normalized before entering the network, reconstructed, and then de-normalized back to the original amplitude range.
Experiment
In the following, experiments are conducted for both regularly and randomly missing data. We compare the improved MR-GAN method against the L2+GAN method, the F-K domain POCS (projection onto convex sets) method, and the curvelet domain POCS method.
To evaluate the interpolation performance of the proposed method on seismic data, three metrics are used: mean square error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM). The three metrics are calculated as shown in equation (10), where
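Equation (10) is not reproduced in this extract; the three metrics are standard, and a minimal sketch is given below (with a simplified, single-window SSIM rather than the usual locally windowed version, and the peak for PSNR taken as the maximum absolute amplitude of the reference section; conventions vary):

```python
import numpy as np

def mse(x, y):
    """Mean square error between reference x and reconstruction y."""
    return np.mean((x - y) ** 2)

def psnr(x, y):
    """Peak signal-to-noise ratio in dB."""
    peak = np.max(np.abs(x))
    return 10.0 * np.log10(peak ** 2 / mse(x, y))

def ssim_global(x, y):
    """Global (single-window) SSIM with the standard constants
    C1 = (0.01 L)^2, C2 = (0.03 L)^2, where L is the dynamic range."""
    L = x.max() - x.min()
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Lower MSE, higher PSNR, and SSIM closer to 1 all indicate a reconstruction closer to the original data.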
Regularly missing seismic data interpolation
The regularly undersampled data are interpolated first; only the case of one missing trace between every two traces is considered here. The data were generated by finite-difference forward modeling of the acoustic wave equation on the classical Marmousi2 model (after resampling, shown in Figure 2). The forward-modeling parameters were a 50 Hz Ricker wavelet and a total of 801 geophones with a trace spacing of 10 m and a shot spacing of 10 m. There were 561 shots in total, each recorded by 241 traces, with a sampling interval of 2 ms and 501 samples per trace. A 50% regular decimation is applied to the total number of geophones and hence to the recording traces of each shot. Since the direct wave does not need to be reconstructed and its strong near-offset amplitudes would affect the data preprocessing, the direct wave is muted before the data are input into the trained MR-GAN network. The data reconstruction results are shown in Figure 3.

Figure 2. Marmousi2 model after resampling.

Figure 3. MR-GAN reconstruction results for 50% regularly missing data. (a) Original data, (b) 50% regularly missing data, (c) MR-GAN reconstruction results, (d) reconstruction residuals.
The results show that after MR-GAN interpolation the data are consistent with the original data: the events are smooth and continuous and correlate well with adjacent traces, and the reconstructed PSNR is 22.171 dB. In the residual profile the residual energy is very small, with only a slight error at events of large curvature.
Randomly missing seismic data interpolation
The interpolation results for 50% randomly missing Marmousi2 data are shown in Figure 4. The MR-GAN network can also reconstruct randomly sampled data, but the residual plots show that some useful signal remains in the residuals; the final reconstruction PSNR is 18.464 dB.

Figure 4. MR-GAN reconstruction results for 50% randomly missing data. (a) Original data, (b) 50% randomly missing data, (c) MR-GAN reconstruction results, (d) reconstruction residuals.
It can also be seen that the seismic traces in the middle of a missing region are recovered with lower correlation than those near the missing boundary. The reconstruction of regions with many consecutive missing traces shows that the more neighboring sampled traces are available, the easier it is for the network to interpolate the missing data, while the difference in correlation between traces near the missing boundary and traces in the middle of the gap grows with the complexity of the interpolated region. Overall, the MR-GAN network performs better on the 50% regularly missing data than on the 50% randomly missing data. Consecutive missing traces give the algorithm less prior information, and the randomness of the gaps makes the data distribution harder to learn, whereas under regular sampling the algorithm captures the distribution characteristics of the data more easily.
Comparison of interpolation effects of different methods
In the following, the interpolation results of the MR-GAN network are compared with those of the network optimized with the L2+GAN error function, as well as with two conventional interpolation methods, the F-K domain POCS method and the curvelet domain POCS method. 50% of the traces of the original data are randomly removed, which simulates a realistic non-uniform acquisition scenario; the reconstruction results of the four methods are shown in Figure 5.

Figure 5. Comparison of the effects of different interpolation methods. (a) Original data, (b) 50% randomly missing data, (c) F-K domain POCS reconstruction results, (d) curvelet domain POCS reconstruction results, (e) GAN+L2 reconstruction results, (f) GAN+MR reconstruction results, (g) F-K domain POCS reconstruction residuals, (h) curvelet domain POCS reconstruction residuals, (i) GAN+L2 reconstruction residuals, (j) GAN+MR reconstruction residuals.
From the four sets of reconstruction results and residual plots, it can be seen that the F-K domain POCS method has a large reconstruction error: much valid signal remains in the residuals, and considerable random noise is generated. The curvelet domain POCS method performs better; although its MSE and PSNR are not outstanding among the methods, the hyperbolic events of the reconstructed data differ little from the original single shot, and the overall reconstruction preserves amplitudes well. The seismic traces reconstructed by the GAN+L2-optimized network differ from the adjacent known amplitudes, there are local discontinuities in the hyperbolas of adjacent traces, boundary differences are visible at the gaps on the grayscale plot, and individual traces are not recovered. In contrast, for the GAN+MR method used in this article, the reconstructed results have smaller residuals from the original data and show no regular leakage of valid signal; moreover, because the trained network parameters are saved, reconstruction amounts to evaluating a known mapping, so the processing time is short. To show the differences in amplitude preservation more clearly, trace 141, one of the missing traces, is extracted and its amplitudes are compared in Figure 6. In addition, the MSE, PSNR, and SSIM values are listed in Table 1, confirming the overall superior performance of the proposed method. Table 1 also gives the training time of the different methods and the time needed to process this single-shot gather. The training time is the time to train on the training set constructed as described above; no additional training was performed for the interpolation of these data.
Although the GAN+MR method is more expensive to train than the GAN+L2 method, this does not affect the processing time for real data interpolation. For regions of consecutive missing traces the MR-GAN approach shows a slight error in the residual profiles, but it still produces data closer to the true seismic data without generating additional random noise or blurred interpolated traces. The numerical experiments were performed with PyTorch on a PC running Windows 10 with an NVIDIA GeForce GTX 1080 Ti. The size of a single input block is 128 × 128, the training set contains 5000 samples, and the network was trained for 50 epochs with the Adam optimizer, using the hyperparameters batch_size = 8, initial learning rate LR = 1e-3, and decay_weight = 1e-5.

Figure 6. Comparison plots of trace 141 for different interpolation methods. (a) F-K domain POCS method, (b) curvelet domain POCS method, (c) GAN+L2 method, (d) GAN+MR method.
Table 1. Evaluation indexes of different interpolation methods.
Actual data interpolation
To verify the applicability of the proposed method to post-stack data, an inline section of a post-stack volume was decimated with 50% randomly missing traces and processed with the proposed method; the reconstructed results are shown in Figure 7.

Figure 7. Actual post-stack data interpolation. (a) Original data, (b) 50% randomly missing data, (c) reconstruction results of the GAN+MR method, (d) reconstruction residuals of the GAN+MR method.
It can be seen that the slope reflections and parallel strata reconstructed by the MR-GAN network are almost indistinguishable from the original data in the shallow part of the profile, but in the deeper part, where the structure is more complex, cluttered reflections and some random-noise interference make interpolation of the steep deep reflections difficult. In addition, there is sometimes a resolution difference between the original and interpolated data: the traces generated by the network show lower resolution and some non-geological artifacts, for example brighter and darker streaks at the missing boundaries. In general, the interpolated data of the MR-GAN network correlate well with the original data, the residuals consist essentially of irregular noise, and the method still performs well on post-stack data.
Conclusions
In this article, we demonstrated the incoherence of the conventional GAN loss combined with an L2 amplitude reconstruction loss. To prevent mode collapse during training, we train the generative adversarial network with a maximum-likelihood constraint on the sample moments of the distribution; the model can then approximate the true data distribution more closely, because the moment reconstruction loss has the same mathematical meaning as the maximum-likelihood estimate defined by the GAN loss.
Experimental tests on seismic data reconstruction under both regular and random missing patterns, comparisons with other interpolation methods, and the various reconstruction quality metrics all reflect the advantages of the MR-GAN network and verify its effectiveness for seismic data interpolation. Although much current deep-learning-based seismic interpolation relies on deep prior networks for unsupervised learning, we note that the method in this article only modifies the error function in the generator optimization, so it can also be extended to deep prior GANs for unsupervised learning. Moreover, this moment reconstruction loss-constrained GAN applies not only to data interpolation but to any conditional generation task, and it can easily be transferred to most GANs because it changes only the error function, not the network structure.
Finally, although the improved method achieves good interpolation results, the algorithm still needs further improvement in some cases. For example, there is room for improvement when too many consecutive traces are missing. In addition, when the data to be reconstructed differ markedly from the training data, the reconstruction error increases. How to generate a large amount of training data related to the actual data is also an open problem in practical processing. In the future, one could build dedicated training sets according to the data characteristics of different survey areas, instead of applying a single training set to all data. There are many research directions for generative adversarial networks, and this article discusses only some of them; other aspects, such as network stability, optimization algorithms, and network architecture, are left for follow-up research.
Highlights
The method is based on a moment reconstruction error constraint. Because it adopts a data-driven deep learning approach, it requires no prior assumptions and handles both regularly and irregularly missing data well.
Compared with conventional GAN and other traditional methods, the proposed method has higher reconstruction accuracy and better amplitude preservation.
