Introduction
Minimizing the sum of squared residuals is a common estimation method for regression functions, but regression functions are not always known or of interest, particularly when motivated by an underlying structural equation. If a distribution for the random terms is specified, maximizing the likelihood function is an alternative estimation method that does not require an explicit regression function. However, cases can arise in which the regression function is not known, no additional moment conditions are indicated, and we have a distribution for the random quantities, yet maximum likelihood estimation is difficult to implement. This article presents a simulated moments estimation method for such cases, similar to the simulation-based extensions of other common estimators such as maximum simulated likelihood and simulated scores (Gourieroux & Monfort, 1993; McFadden, 1989; McFadden & Ruud, 1994; Stern, 1997; Train, 2003).
Consider a simple example in which a response

Graphs of the equation
In the following section, we present the estimator and its asymptotic properties. We then present a Monte Carlo investigation of finite sample properties via two examples. In the final section, we discuss implications and limitations of the LSSE estimator.
LSSE
To understand the LSSE estimator and its asymptotic properties, consider a response or dependent variable
with
The function
where the integral represents the regression and ε
Monte Carlo integration provides an unbiased estimator,
The term η
Denoting the simulated residual as
and, for sample size
For
The LSSE estimator is the value of θ that minimizes the sample residual vector relative to the metric determined by
The large-sample properties of the LSSE estimator are established by showing that it is asymptotically equivalent to the standard nonlinear least squares (NLS) estimator. Essentially, for each property considered, the key quantities of interest can be expressed as the corresponding standard NLS quantities plus a simulation bias that is asymptotically zero.
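To make the idea concrete, the estimator can be sketched as follows. This is a minimal illustration in Python (the article's experiments use STATA) with a hypothetical structural model, y = exp(βx + δ) + ε with δ ~ N(0, σ²) unobserved, chosen only for this sketch and not taken from the article: the regression function is approximated by averaging the structural function over simulated draws of δ, and θ = (β, ln σ) minimizes the sum of squared simulated residuals.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical structural model (an illustrative assumption, not the
# article's): y = exp(b*x + delta) + eps with delta ~ N(0, s^2) unobserved,
# so the regression function E[y|x] is an integral over delta that we
# approximate by Monte Carlo averaging over simulated draws.
rng = np.random.default_rng(0)
n, R = 2000, 200
b_true, s_true = 0.8, 0.5
x = rng.uniform(0.0, 1.0, n)
y = np.exp(b_true * x + rng.normal(0.0, s_true, n)) + rng.normal(0.0, 0.1, n)

# Fixed systematic grid of draws for the numeric integration
z = norm.ppf((np.arange(R) + 0.5) / R)

def ssr(theta):
    """Sum of squared simulated residuals at theta = (b, ln s)."""
    b, log_s = theta
    s = np.exp(log_s)
    # Simulated regression function: average over the R draws per observation
    m = np.exp(b * x[:, None] + s * z[None, :]).mean(axis=1)
    return np.sum((y - m) ** 2)

fit = minimize(ssr, x0=np.array([0.5, 0.0]), method="Nelder-Mead")
b_hat, s_hat = fit.x[0], np.exp(fit.x[1])
```

Because the same fixed grid of draws is reused at every trial value of θ, the objective is a smooth function of θ and any standard NLS optimizer applies.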
As shown in Appendix A, we see that if the assumptions underlying consistency and normality of the NLS estimator hold for the model under consideration, then for increasing number of Monte Carlo iterations used in the numeric integration (
Simulation Experiments
In this section, we present two Monte Carlo experiments to show how the LSSE estimator works. The first experiment estimates the parameters of a simple model, and the second estimates the parameters of a more complex model. We then extend the second experiment to include 10 regressors with a large sample size, to show how the LSSE estimator functions in the multiple-variable setting and to consider computational speed. As with any estimator for which only the asymptotic properties are generally known (e.g., maximum likelihood and generalized method of moments), the finite sample properties of the estimator depend on the function being considered. Consequently, these examples do not indicate convergence rates for other models; as with maximum likelihood estimation, each model needs to be investigated separately.
Because the LSSE estimator optimizes nonlinear functions, some randomly generated data sets can produce degenerate estimates (variance estimates near zero) or simply fail to converge from the automatically generated initial values. Although in a real-world analysis we would carefully consider the model, parameterization, and initial values to achieve proper estimates, such an approach is not practical when running a Monte Carlo experiment across thousands of samples. Consequently, we drop such data sets and sample new ones; we base the results only on data sets that converge properly.
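The drop-and-redraw rule can be sketched as a short loop; the helper names below (`draw_data`, `fit`, `var_floor`) are hypothetical placeholders for the sampling and estimation routines of the model under study, not the article's STATA implementation.

```python
def estimate_with_resampling(draw_data, fit, max_tries=50, var_floor=1e-4):
    """Draw a data set, attempt estimation, and redraw whenever the fit
    fails to converge or is degenerate (variance estimate near zero)."""
    for _ in range(max_tries):
        result = fit(draw_data())  # fit returns None on non-convergence
        if result is not None and result.get("sigma", 0.0) > var_floor:
            return result
    raise RuntimeError("no non-degenerate convergent sample obtained")
```

In a Monte Carlo experiment this wrapper is simply called once per replication, so the reported results are based only on properly converged samples.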
In this section, we discuss the general process of obtaining the LSSE estimates. We do not present programming details, however, as they are idiosyncratic to the programming language used (in our case, STATA). For those familiar with STATA, Appendix B details the programs we used with STATA's nl command (nonlinear regression) to produce LSSE estimates.
The Simple Model
Our first example is the estimation of parameters for the model

LSSE procedure for the simple model.
Data samples were drawn from the preceding model with
In Step 2a of Figure 2, we draw a sample of values from the specified distribution of δ for each individual in the data set. It is computationally more effective to draw a systematic sample from the distribution than a random sample (or the pseudo-random sample that a computer's "random" number generator would produce). For example, suppose we want to represent the distribution of a variable

Comparison between a grid sample and a random sample of a normal variable.
For computational efficiency then, we use a 200-element (i.e.,
for each observation
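The systematic (grid) sample described above can be illustrated with a short sketch using Python's standard library rather than STATA: the inverse normal CDF is evaluated at r equally spaced midpoint probabilities, so even a modest number of points reproduces the distribution's moments far better than the same number of pseudo-random draws.

```python
import random
from statistics import NormalDist, fmean, stdev

def grid_sample(mu=0.0, sigma=1.0, r=200):
    """Systematic sample: inverse normal CDF evaluated at the r equally
    spaced midpoint probabilities (k + 0.5)/r, k = 0, ..., r - 1."""
    nd = NormalDist(mu, sigma)
    return [nd.inv_cdf((k + 0.5) / r) for k in range(r)]

grid = grid_sample()
pseudo = [random.gauss(0.0, 1.0) for _ in range(200)]

# The grid sample's mean is zero by symmetry and its standard deviation is
# very close to 1; a 200-draw pseudo-random sample typically misses both.
print(f"grid   mean {fmean(grid):+.6f}, sd {stdev(grid):.4f}")
print(f"pseudo mean {fmean(pseudo):+.6f}, sd {stdev(pseudo):.4f}")
```

The same fixed grid is then reused for every observation and every trial parameter value during estimation.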
Table 1 presents the results of our Monte Carlo experiment. Consistent with the asymptotic properties, the mean of the parameter estimates approaches the true values for all parameters as sample size increases (absolute bias, the absolute value of the difference between the estimate and the true value, spans 0.001 to 0.28 across parameters at a sample size of 5,000). However, the variance parameter ln(σ) converges more slowly and therefore exhibits a larger bias at the sample sizes used here (absolute bias of 0.28, compared with the next largest absolute bias of 0.03 for the µ parameter). Also, as expected, the standard deviation of the estimator's sampling distribution decreases with sample size (e.g., the standard deviation for the β estimator is 1.154 at a sample size of 300 but only 0.21 at a sample size of 5,000). Moreover, as the histograms and both skewness and kurtosis indicate, the distribution of parameter estimates converges toward normality as sample size increases (a normal distribution has skewness 0 and kurtosis 3). The averages of the robust standard error estimators approximate the standard deviations of the parameter sampling distributions in all cases except for the variance parameter (absolute bias spanning 0.017 for α to 4.275 for ln(σ) in the
Simulation Results for the
The Complex Model
Veazie and Cai (2007) proposed a model of the relationship between a person’s sense of uniqueness θ (i.e., how dissimilar from others a person believes herself to be, which can be expressed as a function of other variables
for
and
with

Graphs of Equation 8 with δ = 0 (an inexperienced person).

LSSE procedure for the complex model.
Data samples were drawn from this model, with
Table 2 shows the results for estimates of β0, β1, and ln(σ). The estimator converges rapidly to the true parameter values (absolute bias spans 0.012 for ln(σ) to less than 0.001 for β0 in the
Simulation Results for the Complex Model (3,000 Monte Carlo Samples).
To show that the estimator applies to a multivariable setting, we expanded the preceding model such that the parameter θ
Table 3 shows the average parameter estimates across 1,000 data sets, each with a sample size of 10,000 observations, again using the Modified Latin Hypercube to obtain 200 samples for the numeric integration. In this experiment, we used a larger sample size along with more variables to provide a sense of computational cost: analysis was done on a desktop computer with a 32-bit operating system and a 3.2 GHz quad-core processor (although STATA used only two processors), and each estimate took approximately 1.5 min to obtain. Results indicate that the LSSE estimator does well when the model includes more explanatory variables (i.e., the true values and the means of the estimates are similar, the largest bias being 0.009 for the variance parameter). Moreover, as in the preceding examples, the average of the robust standard error estimates closely approximates the standard deviations of the parameter sampling distributions (bias spanning 0.01 to less than 0.001).
Parameters Estimates for the Complex Model With 10 Regressor Variables.
Discussion
The LSSE estimator is consistent in both sample size and the number of simulation draws, and it is asymptotically normal if the number of simulation draws rises faster than the square root of the sample size. This suggests that LSSE is a promising estimator for structural models that lack explicit regression functions but for which we have a probability model of the unmeasured quantities. The two Monte Carlo experiments, using finite samples of 300, 500, and 5,000, indicate that the estimator indeed converges toward a normal distribution with diminishing bias and increasing precision.
To automate the Monte Carlo experiments across thousands of samples, and to focus on the main properties of the LSSE estimator, we did not directly address the potential for heteroscedasticity in each model but used a robust standard error estimator instead, which uses a diagonal matrix for
The robust standard error estimator used in the Monte Carlo experiments performed well (as indicated by the similarity between the standard deviation of the estimates and the mean of the standard error estimates). However, it was the nonsimulated nonlinear least squares standard error estimator typically reported by statistical software, which does not account for noise due to the simulation process; such an estimator will underestimate the true standard error. The LSSE estimator that uses direct random draws is approximately
There are limitations to the LSSE method. First, the asymptotic properties depend on the regularity conditions and asymptotic properties of an unknown regression function. A pragmatic solution is to engage a Monte Carlo investigation of the finite sample properties of a proposed model prior to its estimation on real data. This is achieved by running a Monte Carlo experiment similar to those presented above, only in this case based on the model being considered and the sample size of interest. If the model produces reasonable results for the specified sample size, then the use of LSSE would be indicated. If the LSSE cannot satisfactorily reproduce model parameters for the given sample size, then either the unknown regression function is not amenable to consistent estimation by standard NLS or the rate of convergence is too slow for the model and sample size to be useful.
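The pre-estimation Monte Carlo investigation described above can be sketched generically: simulate data from the proposed model at a plausible parameter value and the intended sample size, re-estimate repeatedly, and inspect the bias and spread of the estimates. The helper names (`draw_data`, `estimate`) are hypothetical stand-ins for the routines of the model being considered.

```python
import numpy as np

def monte_carlo_check(draw_data, estimate, theta_true, reps=500, seed=0):
    """Pre-estimation check: repeatedly simulate data from the proposed
    model, re-estimate, and report the bias and spread of the estimates."""
    rng = np.random.default_rng(seed)
    est = np.array([estimate(draw_data(rng)) for _ in range(reps)])
    return {"bias": est.mean(axis=0) - np.asarray(theta_true),
            "sd": est.std(axis=0, ddof=1)}
```

If the reported bias is acceptable at the sample size of interest, use of LSSE would be indicated; otherwise the model or sample size is not suited to the method.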
Second, convergence to normality of some parameters (particularly the variance parameters) can be slower than others, but it is clear that convergence toward normality is being achieved, as we expect from the estimator’s asymptotic properties. If convergence to approximate normality is not yet achieved (which often can be determined by inspecting the estimator’s bootstrapped distribution), then resampling techniques such as the bootstrap or jackknife may be used to obtain standard errors,
Finally, LSSE estimation employs the minimization of squared errors, but unlike ordinary least squares and nonlinear least squares it is not asymptotically immune to distributional assumptions. If the distribution of random quantities is misspecified, then the simulated mean will not converge to the proper expectation. In this respect, the LSSE estimator is similar to the maximum likelihood estimator because it depends on the specification of a probability model. Hence, like maximum likelihood estimation, care must be taken in specifying the distribution.
As with maximum likelihood estimation, specification tests can be useful in identifying a statistically adequate model; however, the usefulness of such tests for the distribution of latent variables will depend on the structural model’s form and characteristics of the distributions being considered. For the simple Monte Carlo experiment presented above, such tests worked reasonably well. Using a data set of 1,000 observations, we compared the LSSE estimator based on the correct log-normal distribution for δ with one misspecified as δ having a normal distribution and one misspecified as δ having a beta distribution on the unit interval. Clarke’s (2003, 2007) test for non-nested models rejected both the normal and the beta distributions in favor of the correct log-normal distribution (
Similarly, specification of the model’s functional form can also be investigated using methods appropriate to linear and nonlinear regression. For example, using a sample of 1,000 observations from our simple Monte Carlo experiment, we compared the correct specification of
General advice for use of LSSE estimation follows that for estimation of nonlinear models by any means. Because asymptotic properties of estimators for nonlinear models can be of little comfort in finite samples, Monte Carlo investigations for the given sample size ought to be used to determine whether the potential bias is acceptable, to determine the proper standard error estimator and test statistic, and, for simulation estimators such as this one, to select the number of simulations
The applied researcher should find the LSSE estimator a useful tool when other methods are not available, particularly researchers familiar with standard statistical software that can estimate nonlinear least squares from user-specified functions. For example, we used the nonlinear least squares command (i.e., the "nl" command) in STATA Version 11 to implement the Monte Carlo experiments presented above; the required user-written program that calculates the conditional expectation for each observation merely needs to embed a loop across observations that implements a Monte Carlo integration routine. The main pragmatic trade-off is that better results accrue to larger data sets, which require greater computational time. However, the rapid increase in the computational speed of today's computers makes this trade-off a diminishing concern.
