Mixed logit (MIXL) models, also called random parameter logit models, are commonly used discrete choice models. Stata offers built-in support for fitting panel MIXL models using simulated maximum likelihood via the cmxtmixlogit command, and there are several community-contributed commands that fit different types of panel MIXL models using either simulated maximum likelihood (Hole 2007, 2015; Gu, Hole, and Knox 2013) or Bayesian Markov chain Monte Carlo (MCMC) methods (Baker 2014).
In this article, I describe the garbage_mixl command, which fits garbage class MIXL models in Stata.1 The garbage class MIXL model (Jonker 2022) is an extension of the standard panel MIXL model that is primarily aimed at automatic screening and accounting for respondents with low data quality in discrete choice experiments. However, the garbage_mixl command can also fit standard panel MIXL models, and because of the specifics of the implementation, it provides a fast and reliable alternative to Stata’s cmxtmixlogit command and the community-contributed estimation commands of Hole (2007) and Baker (2014), although with a few limitations (see section 3) and fewer customization options.
The article is organized as follows: section 2 provides a brief overview of the standard and garbage class MIXL model. Section 3 explains the estimation strategy. Section 4 describes the command’s syntax and estimation options, and section 5 presents several examples. Section 6 provides a comparison between the garbage_mixl command and other estimation commands, and section 7 concludes.
The standard and garbage class MIXL model
The standard panel MIXL model assumes that there are N respondents that each complete T discrete choices, each from a set of J alternatives that are described by K explanatory variables. The explanatory variables can be summarized as
and the observed choices in the dependent variable
with entries equal to 1 for the alternative that was chosen and zero for all other alternatives in the choice tasks; that is,
Each respondent is presumed to have chosen the option that provides the highest utility, with the utility function specified as
and with the error term ∈ assumed to be independently and identically Gumbel distributed. This implies that the probability of respondent i choosing alternative j in choice task t can be defined as
In the standard MIXL model, the mixing distribution of the respondents’ β coefficients is multivariate normal (MVN) with mean vector μ and covariance matrix Σ; that is,
which means that the likelihood contribution of each individual respondent is defined as
and with the overall (sample) likelihood function defined as the product of respondent- specific likelihoods over all respondents in the dataset; see, for example, Train (2009).
The garbage class MIXL model is an extension of the standard panel MIXL model in which the structural part of the utility function is multiplied with a class membership parameter ,
No other modifications are involved, and the interpretation of the class-membership parameter is also straightforward: φί represents the respondent-specific probability of being assigned to the standard MIXL utility specification. If φί equals 1, respondents are entirely assigned to the standard MIXL utility specification . If φί equals 0, there is no contribution from the structural part of the utility function, and respondents make random choices based on the error term . When aggregated, the sample average reflects the class share of the MIXL model and the garbage class share. The latter provides a readily available and objectively comparable estimate of the number of low-quality respondents in the dataset (Jonker 2022).
Estimation strategy
Because the MIXL likelihood function is not analytically tractable, the integral in (1) needs to be approximated. In Stata, MIXL model estimation is typically performed using simulated maximum-likelihood methods (see, for example, Train [2009]). Simulated maximum-likelihood estimation has a few practical disadvantages, including the risk of convergence in local optima. This necessitates that multiple MIXL model estimations be fit from different initial values. Additionally, the quality of pseudo-random draws may degrade in higher dimensions. For instance, standard Halton draws should not be used for models with more than 10–12 random parameters, even though some Stata commands allow up to 20 random parameters. In contrast, the implemented Bayesian estimation approach for the garbage_mixl command offers an attractive alternative designed to provide fast and robust results, particularly in higher dimensions. The default priors are defined as follows:
Normal priors with a mean of zero and standard deviation (SD) of 10 are assigned to the population mean parameters (μ).
A Huang and Wand (2013) scaled inverse-Wishart prior is assigned to the population covariance matrix (Σ), which results in marginal uniform distributions on the correlation parameters and half-t distributions with 2 degrees of freedom and SDs of 10 on the SDs of the random parameters.
bernoulli(0.5) priors are assigned to the MIXL class-selection parameters (φί).
Gibbs sampling is used to update the μ and Σ parameters, inverse transform sampling to update the φί parameters, and a combination of Metropolis-within-Gibbs (MWG) with antithetic updates (Bédard, Douc, and Moulin 2014) and a Metropolis- adjusted Langevin algorithm (MALA) (Marshall and Roberts 2012) to update the individual-level preference parameters (βi). Both the MVG and the MALA samplers are tuned at increasing intervals during the burn-in phase, with diminishing adaptation and with target acceptance rates of 0.44 and 0.57, respectively.
To minimize the risk of nonconvergence and to ensure sufficiently accurate approximations of the posterior, the default and minimum number of MCMC draws of the garbage_mixl command is set to 50,000 burn-in and 50,000 additional MCMC draws. This implies a minimum of 100,000 MCMC draws; in comparison, the bayesmixedlogit command (Baker 2014) has a default of 1,000 MCMC draws. In addition to the large burn-in phase, an MWG sampler with antithetic updates is used to reliably converge from the initial values to the posterior distribution during the initial 10,000 draws of the burn-in phase, updating only 1 parameter at a time. The MWG sampler is also used to create an initial estimate of the empirical variance-covariance matrix based on draws 4,000–10,000. The latter is used to transition to a more efficient and faster MALA sampler that updates the βi parameters simultaneously but requires an estimate of the variance-covariance matrix to effectively explore the posterior. When one fits standard panel MIXL models, the MALA sampler is used for all respondents as the final sampling algorithm. However, in the garbage class MIXL model, there are insufficient burn-in draws to reliably tune the MALA sampler when respondents are predominantly assigned to the garbage class. Hence, for respondents who have a probability of MIXL class membership smaller than 0.5, MWG is used as the final sampling algorithm.
To further assist and improve convergence of the MCMC samplers, relatively good initial values are obtained using an initial maximum-likelihood estimation. In this initial MIXL model optimization, the integral in (1) is approximated using a standard Laplace approximation (see, for example, Bleistein and Handelsman [1986, chap. 8] and Small [2010, chap. 6]). The Laplace approximation is an analytical (rather than sampling-based) approximation and allows for a MIXL model to be fit within seconds (see table A in the online supplemental). This provides relatively good initial values for the population mean (μ), covariance matrix (Σ), and all individual-level preference parameters (βi), with all class-selection parameters (φί) initialized at 1 when fitting garbage class MIXL models. Because the Laplace MIXL estimates are used only as initial values and more precise initial estimates are superfluous, a tolerance of 1e-5 on the L2 norm of the gradient is used as the stopping criterion, and the number of Broyden- Fletcher-Goldfarb-Shannon model optimization iterations is maximized at 25 + 2 × K.
To further improve computational efficiency, all estimations are implemented in C++ and linked to Stata via a plugin (https://www.stata.com/plugins/). This implies that the Stata graphical user interface freezes during garbage_mixl estimations, which is an inherent and unavoidable limitation of Stata’s C++ plugin functionality. Another limitation of the garbage_mixl command is that it requires an equal number of choice options per choice task in the dataset. This is not an inherent limitation but enforced to simplify the data transfer from Stata to the C++ plugin and to simplify the implementation of the likelihood and gradient calculations. Finally, because the MIXL class-selection parameters are identified based on the choice data of each individual respondent, garbage MIXL models tend to be sensitive to the priors that are assigned to the MIXL class-selection parameters. To make sure that there is enough information for each respondent and the class selection is not based entirely on the priors that are used, the garbage_mixl command requires at least as many choice tasks per respondent as random parameters when fitting garbage class MIXL models.
Syntax and options
Syntax
garbage_mixldepvar [if] [in], rand(varlist) id (varname) group(varname)
rand(varlist) is required and specifies the independent variables. All independent variables’ coefficients are assumed to be random; fixed coefficients are not supported.
id(varname) is required and specifies a numeric identifier variable for the decision makers.
group(varname) is required and specifies a numeric identifier variable for the choice occasions.
burn(#) specifies the length of the burn-in period. The default and minimum value is burn(50000).
mcmc(#) specifies the number of draws to be taken from the posterior distribution of the parameters. The default and minimum value is mcmc(50000).
bernoulli(#) specifies the prior distribution for the individual-level MIXL class probabilities in the garbage class MIXL model. The default is bernoulli(0.5). Note that the garbage class MIXL results tend to be sensitive to the Bernoulli prior, particularly with limited observations per decision maker. Therefore, the garbage_mixl command requires at least as many observations per decision maker as the number of random parameters specified.
nogarbage specifies that the garbage_mixl command fit a standard panel MIXL model.
diag specifies that a diagonal (uncorrelated) MIXL model is being fit, with half-normal (10) priors on the SDs of the normal mixing distributions.
invwishart replaces the default Huang and Wand (2013) scaled inverse-Wishart prior with a conventional inverse-Wishart prior on the variance-covariance matrix. The inverse-Wishart prior is specified with a unit scale matrix and degrees of freedom equal to the number of random parameters. This prior is more informative and more restrictive than the default prior, particularly when more random parameters are specified, and is seldom recommended.
Examples
To show how the garbage_mixl command fits standard and garbage class MIXL models, let’s revisit the antibiotics example of Jonker (2022). This dataset can be downloaded from the Stata Journal website and contains 750 respondents who were recruited using a commercial survey sample provider. All respondents were asked to make 13 repeated choices between 2 types of antibiotics screening tests that were described using the following characteristics: 1) the speed in which the test results are available (fast versus slow), 2) the convenience of the diagnostic test (high versus low), and 3) the confidence in the test’s results (very high versus high). Figure 1 shows an example choice task as shown to the respondents. The dependent variable y in the dataset describes the choices that were made, and the independent variables’ speed, convenience, and confidence are dummy coded with the least favorable test characteristics selected as the omitted base- case levels.
Example discrete choice task
Example 1
Let’s start with fitting a standard panel MIXL model:
As shown, the achieved MALA acceptance rates range from 0.55 to 0.60. Even though the performance of MCMC sampling algorithms is robust to deviations from optimal acceptance rates (see, for example, Roberts and Rosenthal [2001]), it is reassuring that the MALA samplers have achieved acceptance rates close to the target rate of 0.57. Similarly, it is reassuring to observe that the Monte Carlo standard errors (MCSE) expressed as a percentage of the SDs of the posterior samples are smaller than 5–10%. This is a simple rule of thumb to determine whether the posterior distributions are sufficiently accurately approximated.
The standard panel MIXL model output presents posterior summary statistics for the MIXL mean estimates, while the (co)variance estimates are separated into SD and correlation estimates. This approach is akin to Stata’s cmxtmixlogit command, making it more convenient to interpret the degree of heterogeneity in and correlations among preferences for the explanatory variables. As indicated by the positive MIXL mean estimates, respondents on average prefer faster, more convenient, and more reliable tests. This has good face validity. There is also substantial variation in respondent-specific preferences with the MIXL SD estimates being approximately as large as or larger than the MIXL mean estimates. This is somewhat unexpected because it suggests that a substantial number of respondents in the tail of the normal distributions actually prefer slower, less convenient, and less reliable diagnostics tests. In addition, the correlations between the preference parameters all have 95% credible intervals that are positive and do not comprise 0. This is entirely unexpected because it indicates that when respondents find one particular aspect of the test more important, they also find the other aspects of the test more important (and vice versa). Of course, these unexpected results could also be explained by not appropriately accounting for a subset of respondents who flatlined on one of the choice options or randomly clicked through the survey, which is a notorious problem with discrete choice experiment data obtained from commercial survey sample providers.
Example 2
As explained, the garbage class MIXL model intends to provide MIXL estimates that are corrected for the presence of respondents with poor data quality. If such respondents are included in the model estimations, one would expect the standard panel MIXL mean estimates to be biased toward zero and the MIXL SD estimates to overstate the true amount of variability in preferences in the MIXL class. Let’s refit the model without the nogarbage option to see what happens:
When a garbage MIXL model is being fit, the achieved acceptance rates for both the MWG and the MALA updates are reported separately. It is still reassuring to observe that the achieved acceptance rates are close to the target values of 0.44 and 0.57. The MCSE expressed as a percentage of the SDs is again between 5–10%. This should be sufficient for accurate results, but to verify that the accuracy of the posterior estimates improves with additional MCMC draws, let’s rerun the model with 100,000 posterior draws before interpreting the results.
Example 3
As shown, the updated output is almost identical to the previous estimation results, but the MCSE are indeed smaller. Turning to the comparison between the garbage class and standard MIXL results, we should first notice that the size of the garbage class in the antibiotics dataset is approximately 34%. Accounting for respondents with random response patterns, the MIXL mean estimates become larger (that is, further away from zero, indicating higher average choice consistency), and the MIXL SD estimates relative to the means become smaller (implying less heterogeneity and, more importantly, a smaller proportion of respondents who prefer slower, less convenient, and less reliable diagnostic tests). Moreover, the signs of the mean posterior correlation estimates have switched and are now negative. This indicates that respondents in the MIXL class have stronger preferences for either the test’s speed or the test’s convenience and reliability. The reported negative correlations behaviorally make more sense than the positive correlations as reported by the standard panel MIXL model. Of course, the positive correlations in the standard panel MIXL model are still correct in that they accurately reflect the effect of pooling respondents with random and informative response patterns in a single MIXL estimation.
Example 4
As explained in the estimation strategy section, the default bernoulli(0.5) prior implies an equal prior probability of being in the garbage or MIXL class. For each respondent, this accurately reflects the absence of prior information about the most likely class assignment. However, at the sample level, this also corresponds to the prior expectation that the size of the garbage class is approximately 50%. The latter is probably unintended because commercial survey samples commonly result in 10–25% low-quality respondents (not 50%). Fortunately, the Bayesian estimation strategy provides a convenient framework to incorporate such prior expectations in the model estimation. For example, let’s see what happens when a bernoulli(0.8) prior is specified:
As shown, specifying a bernoulli(0.8) prior reduces the size of the garbage class to approximately 28% and reduces the size of the MIXL mean estimates. In all other respects, the results remain similar to those obtained with the default bernoulli(0.5) prior: The mean estimates are still larger than those in the standard MIXL model; the MIXL SD estimates are still smaller than the MIXL means; and the correlation estimates are still negative as opposed to positive in the standard MIXL model.
Example 5
The size of the garbage class provides a readily available estimate of the number of low-quality respondents in a dataset. Nevertheless, there are situations where it is convenient to be able to manually identify respondents with low data quality, such as when performing sensitivity analyses. Thus, the posterior probability of each included respondent belonging to the garbage class is made available via the ereturn matrix pr_gar. This matrix can, after performing a garbage class MIXL estimation, be inspected using the matrix list command:
. matrix list e(pr_gar)
e(pr_gar)[750,2]
pr_gar resp_id
r1 .88398 1
r2 .69107 2
r3 .00020 3
r4 .00089 4
r5 .99249 5
(output omitted)
r750 .00011 750
The listed matrix corresponds to the garbage class MIXL model estimates from the previous example, and the provided class membership probabilities can be used to selectively exclude respondents from the dataset. For instance, let’s observe what happens when respondents with garbage class membership probabilities ≥0.5 are removed from the sample and a standard panel MIXL model is fit on the subsample of respondents with acceptable data quality:
. local todrop
. forvalues i = 1/750 {
2. if e(pr_gar)[`i’,1]>= 0.5 {
3. local id = e(pr_gar)[`i’,2]
4. local todrop `todrop’ `id’
5. }
6. }
. foreach id of local todrop {
2. quietly drop if resp_id == `id’
3. }
As shown, the standard panel MIXL estimates are based on a subset of 540 respondents. Consequently, 28% of respondents are excluded, which roughly corresponds to the size of the garbage class in example 4. The parameter estimates are also very similar to the garbage class MIXL estimates from example 4. Conversely, the estimates markedly differ from the standard panel MIXL estimates presented in example 1, which were based on the full sample of 750 respondents.
Comparisons
To conclude the description of the garbage_mixl command, tables 1 and 2 provide a comparison of the standard and garbage MIXL model estimates as produced by the garbage_mixl command with estimation results obtained from the general-purpose JAGS software (Plummer 2007). In addition, table 1 provides comparisons with the other panel MIXL estimators that are available in Stata, that is, the default cmxtmixlogit as well as community-contributed mixlogit (Hole 2007) and bayesmixedlogit (Baker 2014) commands. All estimations were performed using Stata/MP 16.1 for Windows (x86-64) on a PC with an AMD 3950 CPU with default priors and default estimation settings for a panel MIXL model with correlated random parameters. Two changes to the default estimation settings were made—one for the mixlogit command, for which the default 50 Halton draws were deemed very inadequate (1,000 Halton draws were used instead), and one for the bayesmixedlogit command, for which the same number of draws as in JAGS and the garbage_mixl command was specified. Note that JAGS model code is provided in the online supplemental and that MCMC summary statistics for JAGS and bayesmixedlogit were created using the R-CODA package (Plummer et al. 2006).
Standard (panel) MIXL estimates
garbage_mixl(50k burn + 50k MCMC)
JAGS*(50k burn + 50k MCMC)
Cmxtmixlogit(695 Hammersley draws)
Mixlogit(1,000 Halton draws)
bayesmixedlogit*(50k burn + 50k MCMC)
Mean
Std. dev.
MCSE
Mean
Std. dev.
MCSE
Coef.
Std. err.
Coef.
Std. err.
Mean
Std. dev.
MCSE
Speed (mean)
1.23
0.08
0.0011
1.22
0.08
0.0026
1.25
0.08
1.23
0.08
1.22
0.08
0.0030
Convenience (mean)
1.84
0.11
0.0015
1.83
0.11
0.0023
1.86
0.11
1.84
0.11
1.83
0.11
0.0030
Confidence (mean)
3.62
0.18
0.0025
3.62
0.18
0.0056
3.65
0.19
3.62
0.18
3.60
0.18
0.0056
Speed (SD)
1.13
0.09
0.0023
1.12
0.09
0.0051
1.13
0.09
1.12
0.09
1.11
0.10
0.0051
Convenience (SD)
2.04
0.12
0.0022
2.04
0.12
0.0042
2.03
0.11
2.04
0.11
2.02
0.11
0.0039
Confidence (SD)
3.37
0.19
0.0031
3.37
0.18
0.0065
3.47
0.19
3.40
0.18
3.35
0.18
0.0066
Estimation time
44 sec
12 min 0 sec
31 min 32 sec
10 min 34 sec
1 day 19 hours
note: * Summary statistics of the MCMC draws created using the R-CODA package.
Garbage class MIXL estimates
garbage_mixl
JAGS
(50k burn + 100k MCMC)
(50k burn + 100k MCMC)
Coef.
Std. dev.
MCSE
Coef.
Std. dev.
MCSE
Speed (mean)
3.06
0.21
0.0068
3.07
0.22
0.0181
Convenience (mean)
5.05
0.37
0.0121
5.04
0.37
0.0245
Confidence (mean)
10.5
0.74
0.0278
10.5
0.75
0.0567
Speed (SD)
1.64
0.20
0.0088
1.65
0.19
0.0133
Convenience (SD)
4.06
0.36
0.0122
4.01
0.35
0.0216
Confidence (SD)
6.32
0.56
0.0267
6.33
0.56
0.0309
Garbage class share
0.34
0.01
0.0001
0.34
0.01
0.0003
Estimation time
1 min 56 sec
24 min 31 sec
As shown, all commands provide similar panel MIXL estimates with the main difference being the required run time of the command (table 1). In addition, the command garbage_mixl and JAGS software provide nearly identical results that are within Monte Carlo sampling precision (tables 1 and 2). This is as expected given the identical priors, large burn-in, and many posterior samples and is used as a test case to confirm that the garbage_mixl command provides correct results.
Conclusions
The garbage class MIXL model was designed to produce standard panel MIXL preference estimates that are automatically corrected for the impact of survey respondents with low-quality response patterns, such as flatlining and random responses, and provide an objective measure of data quality in discrete choice experiments. As shown in the provided examples, the inclusion of garbage respondents in standard panel MIXL estimations can have a substantial impact on the MIXL model estimates and result in substantively different conclusions about respondents’ preferences in a discrete choice experiment. Thus, the garbage_mixl estimation command presents a valuable addition to the already available MIXL model estimation commands in Stata.
One of the major benefits of the garbage_mixl command is its computational efficiency and robust performance, particularly for MIXL models with more random parameters. Another attractive feature is the implemented Laplace approximation optimization, which produces accurate initial values for the MCMC update algorithms within a few seconds. Moreover, the garbage_mixl command can fit standard panel MIXL models without a garbage class, which is convenient because the command is substantially faster than other estimation commands that can fit panel MIXL models in Stata.
The increased performance does come at a cost: The command requires an equal number of choice options per choice task, has fewer customization options, and, being based on Stata’s C++ plugin functionality, the user interface cannot provide real-time information on the progress of the model estimation. In addition, to guarantee that there is enough information for each respondent and ensure that the respondent-specific garbage class-selection parameters are not entirely dictated by the Bernoulli priors, the garbage_mixl command requires at least as many choice tasks per respondent as random parameters when fitting a garbage class MIXL model. This restriction is not imposed for standard panel MIXL model estimations.
Supplemental Material
Supplemental material
Supplemental material
Footnotes
8
I thank Prof. Dr. Bas Donkers and Prof. Dr. Martyn Plummer for their valuable comments and suggestions on the implementation of the Bayesian sampling algorithms. In addition,I want to express my gratitude to Dr. Piers Lawrence for his assistance with the implementation of the analytical derivatives used in the Laplace approximations.
9
To install the software files as they existed at the time of publication of this article,type . net sj 24-3 . net install st0753 (to install program files,if available) . net get st0753 (to install ancillary files,if available)
About the author
Marcel F. Jonker received his PhD in public health and epidemiology and is currently affiliated with the Erasmus School of Health Policy & Management and the Erasmus Choice Modelling Centre of the Erasmus University Rotterdam,The Netherlands.
References
1.
BakerM. J. 2014. Adaptive Markov chain Monte Carlo sampling and estimation in Mata. Stata Journal14: 623–661. 10.1177/1536867X1401400309.
2.
BedardM.DoucR.MoulinE.. 2014. Scaling analysis of delayed rejection MCMC methods. Methodology and Computing in Applied Probability16: 811–838. https: //doi.org /10.1007/s11009-013-9326-y.
3.
BleisteinN.HandelsmanR. A.. 1986. Asymptotic Expansions of Integrals. New York: Dover.
4.
GuY.HoleA. R.KnoxS.. 2013. Fitting the generalized multinomial logit model in Stata. Stata Journal13: 382–397. 10.1177/1536867X1301300213.
5.
HoleA. R. 2007. Fitting mixed logit models by using maximum simulated likelihood. Stata Journal7: 388–401. 10.1177/1536867X0700700306.
6.
Hole, A. R. 2015. mixlogitwtp: Stata module to estimate mixed logit models in WTP space. Statistical Software Components S458037, Department of Economics, Boston College. https://ideas.repec.org/c/boc /bocode/s458037.html.
JonkerM. F. 2022. The garbage class mixed logit model: Accounting for low-quality response patterns in discrete choice experiments. Value in Health25: 1871–1877. 10.1016/j.jval.2022.07.013.
9.
MarshallT.RobertsG. O.. 2012. An adaptive approach to Langevin MCMC. Statistics and Computing22: 1041–1057. 10.1007/s11222-011-9276-6.
PlummerM.BestN.CowlesK.VinesK.. 2006. CODA: Convergence diagnosis and output analysis for MCMC. R News6: 7–11.
12.
RobertsG. O.RosenthalJ. S.. 2001. Optimal scaling for various Metropolis- Hastings algorithms. Statistical Science16: 351–367. 10.1214/ss/1015346320.
13.
SmallC. G. 2010. Expansions and Asymptotics for Statistics. Boca Raton, FL: Chapman and Hall/CRC. 10.1201/9781420011029.
14.
TrainK. E. 2009. Discrete Choice Methods with Simulation. 2nd ed. Cambridge: Cambridge University Press. 10.1017/CB09780511805271
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.