
Abstract

In this article, I describe the lclogit2 command, an enhanced version of lclogit (Pacifico and Yoo, 2013, Stata Journal 13: 625–639). Like its predecessor, lclogit2 uses the expectation-maximization algorithm to fit latent class conditional logit (LCL) models. However, it executes the algorithm’s core algebraic operations in Mata and therefore runs considerably faster. It also allows linear constraints on parameters to be imposed more conveniently and flexibly. It comes with the parallel command lclogitml2, a new stand-alone command that uses gradient-based algorithms to fit LCL models. Both lclogit2 and lclogitml2 are supported by a new postestimation command, lclogitwtp2, that evaluates willingness-to-pay measures implied by fitted LCL models.

Keywords

st0601, lclogit2, lclogitml2, lclogitpr2, lclogitcov2, lclogitwtp2, latent class model, conditional logit, expectation-maximization algorithm, lclogit, fmm, finite mixture, mixlogit, mixed logit, willingness to pay

1 Introduction

The latent class conditional logit (LCL) model extends the conditional logit model (clogit in Stata) by incorporating a discrete representation of unobserved preference heterogeneity. Algebraically, the LCL likelihood function is a finite mixture of C different conditional logit likelihood functions. Stata 15 introduced the fmm command, which fits many finite mixture models; as of Stata 16, however, fmm does not support clogit as a component model. The community-contributed lclogit command (Pacifico and Yoo 2013) allows Stata users to fit LCL models. But it underuses Stata’s computing capabilities available via the Mata environment and does not allow the component conditional logit models to share any parameter in common.

In this article, I describe lclogit2, an enhanced version of lclogit. Like its predecessor, lclogit2 applies Bhat’s (1997) expectation-maximization (EM) algorithm to obtain the maximum likelihood estimates (MLEs) of LCL. The EM algorithm is an attractive method to maximize the nonconcave log-likelihood function of LCL because it offers greater numerical stability than the usual Newton-type techniques that ml maximize applies. Train (2008) provides a masterful summary of the source of this advantage.

lclogit2 comes with a parallel command, lclogitml2, that fits LCL models using the usual techniques for maximum likelihood estimation. While lclogitml2 is fully functional as a stand-alone command, it may also be used as a postestimation tool for lclogit2. The EM algorithm used by lclogit2 (and by lclogit) produces coefficient estimates without standard errors. To draw statistical inferences, users may pass active lclogit2 estimates as starting values to lclogitml2 and obtain the usual ml maximize output table with standard errors.

Major differences between the lclogit2 and lclogit commands may be summarized as follows. To facilitate discussion, let β_c denote a vector of coefficients for the cth clogit component, or latent class, of LCL.

First, lclogit2 estimates a given LCL specification considerably faster than lclogit by using Mata to execute the core algebraic operations of the EM algorithm. lclogit executes the same operations in the regular Stata environment.

Second, lclogit2 allows β_c to include homogeneous coefficients that are identical across classes, as well as heterogeneous coefficients that vary across classes. Hole’s (2007c) mixlogit command can fit a mixed logit model that includes a combination of nonrandom coefficients and multivariate normal random coefficients. The new feature of lclogit2 allows estimation of a latent class counterpart to such a model specification. lclogit assumes that every coefficient is heterogeneous.

Third, lclogit2 can incorporate any set of linear constraints on β_c for c = 1, 2, …, C, defined using Stata’s constraint command. The constraints may apply within a class (for example, two different coefficients in β_1 are equal to 0) as well as across different classes (for example, a coefficient in β_1 and the corresponding coefficient in β_2 are the same). lclogit can incorporate within-class constraints only and has peculiar syntax requirements for inputting the constraints.

Fourth, lclogitml2 is a stand-alone estimation command. lclogitml, which accompanies lclogit, is simply a wrapper that passes lclogit estimates to another community-contributed command, gllamm (Rabe-Hesketh, Skrondal, and Pickles 2002). This difference brings about several advantages:

a. lclogitml2 uses a log-likelihood evaluator coded in Mata. It estimates a given LCL specification considerably faster than gllamm, which uses an evaluator coded in the regular Stata environment.

b. lclogitml2 can inherit linear constraints defined for lclogit2. In contrast, to impose the same constraints across lclogit and lclogitml, users must define a set of constraints to comply with the syntax requirements of lclogit and another set of constraints to comply with those of gllamm.

c. lclogitml2 is better suited to fitting a model with many heterogeneous coefficients. Suppose that each β_c consists of K heterogeneous coefficients so that there are a total of C × K heterogeneous coefficients to estimate. In ml model’s vernacular, lclogitml2 will add C “equations”, where each equation comprises K coefficients for a particular class. In contrast, gllamm will add C × K equations, where each equation’s intercept is a particular coefficient. With a large C × K, a call to gllamm (via lclogitml) may fail with an error message stating that some equation is not found, presumably because there is a limit on the number of equations that ml model can receive from gllamm.

Finally, when lclogit2 or lclogitml2 results are active, a new postestimation tool, lclogitwtp2, can calculate willingness-to-pay (WTP) measures implied by the coefficient estimates. Within each class c, the WTP for attribute k is calculated as the ratio of the coefficient on that attribute to another coefficient that can be interpreted as the marginal utility of money. In nonmarket valuation studies, such WTP measures are often the main parameters of interest. To derive the WTP measures from lclogit or lclogitml estimates, users need to write their own postestimation programs.

2 Latent class conditional logit

Consider decision maker n making a choice from J alternatives in each of T choice occasions, where n = 1, 2,…, N. Alternative j, available to him or her in occasion t, is described by a row vector of K attributes, x _njt . Denote by y_njt a binary indicator that equals 1 if his or her choice is alternative j, and 0 otherwise. Under the conditional logit model (clogit in Stata), the joint likelihood of his or her T choices is given by

P_{n}(β) = \prod_{t=1}^{T} \prod_{j=1}^{J} \left( \frac{\exp(x_{njt} β)}{\sum_{h=1}^{J} \exp(x_{nht} β)} \right)^{y_{njt}} \qquad (1)

where β is a column vector of K coefficients, which can be interpreted as the marginal utilities of the corresponding entries in x_njt. In fact, clogit (as well as lclogit2 and lclogitml2) can also accommodate datasets with T varying across decision makers and J varying across decision makers, choice occasions, or both. Strictly speaking, T and J in (1) should then be written as T_n and J_nt, but the subscripts are omitted for notational simplicity.¹
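The joint likelihood in (1) can be evaluated by hand for a small case. The following Python sketch is an illustration only and is not part of clogit or lclogit2; the function names, the attribute values, and the coefficient vector are hypothetical. It also lets the number of alternatives differ across the two choice occasions, as the paragraph above allows.

```python
import math

def clogit_prob(x_rows, beta, chosen):
    """Probability of the chosen alternative in one choice occasion."""
    utilities = [sum(xk * bk for xk, bk in zip(x, beta)) for x in x_rows]
    denom = sum(math.exp(u) for u in utilities)
    return math.exp(utilities[chosen]) / denom

def joint_likelihood(occasions, beta):
    """P_n(beta): product of chosen-alternative probabilities over occasions."""
    p = 1.0
    for x_rows, chosen in occasions:
        p *= clogit_prob(x_rows, beta, chosen)
    return p

# Hypothetical decision maker with T = 2 occasions and K = 2 attributes.
# Each entry is (attribute rows, index of the chosen alternative).
# The first occasion has J = 3 alternatives, the second has J = 2.
occasions = [
    ([[1.0, 0.5], [0.0, 1.0], [0.0, 0.0]], 0),
    ([[0.5, 0.5], [0.0, 0.0]], 1),
]
beta = [0.8, -0.4]  # hypothetical coefficient vector
print(joint_likelihood(occasions, beta))
```

Because y_njt equals 1 only for the chosen alternative, the inner product over j in (1) reduces to the probability of that alternative, which is what `clogit_prob` returns.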

The LCL extends the conditional logit by incorporating a discrete representation of unobserved preference heterogeneity across decision makers. Specifically, LCL assumes that there are C distinct types, or “classes”, of decision makers and that each class c makes choices consistent with its own clogit model with utility coefficient vector β _c . Suppose that the probability that decision maker n belongs to class c is given by a fractional multinomial logit specification

π_{nc}(Θ) = \frac{\exp(z_{n} θ_{c})}{1 + \sum_{l=1}^{C-1} \exp(z_{n} θ_{l})} \qquad (2)

where z_n is a row vector of decision maker n’s characteristics and the usual constant regressor (that is, 1); θ_c is a conformable column vector of membership model coefficients for class c, with θ_C normalized to 0 for identification; and Θ = (θ_1, θ_2, …, θ_{C−1}) denotes a collection of the C − 1 identified membership coefficient vectors. Under LCL, the joint likelihood of decision maker n’s choices is given by

L_{n}(B, Θ) = \sum_{c=1}^{C} π_{nc}(Θ) P_{n}(β_{c}) \qquad (3)

where B = (β_1, β_2, …, β_C) denotes a collection of the C utility coefficient vectors and each P_n(β_c) is obtained by evaluating (1) at β = β_c.
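Equations (2) and (3) combine as follows. This is a minimal Python sketch under stated assumptions: the membership coefficient and the class-specific likelihood values are hypothetical numbers, and the function names are mine rather than anything used by the commands.

```python
import math

def membership_probs(z, thetas):
    """pi_nc for c = 1..C from the C - 1 identified vectors; theta_C is set to 0."""
    scores = [sum(zk * tk for zk, tk in zip(z, theta)) for theta in thetas]
    scores.append(0.0)  # normalized class C contributes exp(0) = 1
    denom = sum(math.exp(s) for s in scores)
    return [math.exp(s) / denom for s in scores]

def lcl_likelihood(pis, class_likelihoods):
    """L_n(B, Theta): membership-weighted sum of the P_n(beta_c)."""
    return sum(pi * p for pi, p in zip(pis, class_likelihoods))

z_n = [1.0]        # constant regressor only
thetas = [[0.4]]   # C = 2 classes, so one identified vector (hypothetical value)
pis = membership_probs(z_n, thetas)
class_likelihoods = [0.20, 0.05]  # hypothetical P_n(beta_1) and P_n(beta_2)
print(pis, lcl_likelihood(pis, class_likelihoods))
```

Appending a score of 0 for class C reproduces the "1 +" term in the denominator of (2), so the C probabilities sum to 1 by construction.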

The sample log-likelihood function under LCL can be constructed by adding up the natural log of L_n(B, Θ) across the N decision makers in the sample. The command lclogit2 computes the MLEs of B and Θ by using Bhat’s (1997) EM algorithm to maximize the sample log-likelihood function. The command lclogitml2 computes the MLEs of the same coefficients by using gradient-based maximization techniques that Stata’s ml programs rely on. Unless the EM algorithm has been terminated prior to achieving convergence, lclogitml2 should reproduce the existing lclogit2 estimates when the gradient-based maximization run uses those estimates as initial values. Train (2008) provides a lucid explanation for this equivalence.²
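For readers unfamiliar with EM for finite mixtures, the sketch below shows one generic expectation step and the update of class shares for the special case in which the membership model has only the constant regressor. It is a textbook illustration in Python and not a transcription of lclogit2’s Mata code; the update of each β_c, which requires a weighted conditional logit fit, is deliberately omitted, and all numbers are hypothetical.

```python
import math

def e_step(shares, class_likelihoods):
    """Posterior class probabilities per decision maker and the sample log likelihood."""
    posteriors, loglik = [], 0.0
    for p_n in class_likelihoods:      # p_n[c] holds P_n(beta_c)
        joint = [s * p for s, p in zip(shares, p_n)]
        l_n = sum(joint)               # L_n(B, Theta) as in (3)
        posteriors.append([j / l_n for j in joint])
        loglik += math.log(l_n)
    return posteriors, loglik

def update_shares(posteriors):
    """Constant-only membership model: new class shares are average posteriors."""
    n = len(posteriors)
    n_classes = len(posteriors[0])
    return [sum(h[c] for h in posteriors) / n for c in range(n_classes)]

# Hypothetical P_n(beta_c) for N = 3 decision makers and C = 2 classes.
class_likelihoods = [[0.20, 0.05], [0.02, 0.10], [0.08, 0.08]]
posteriors, loglik = e_step([0.5, 0.5], class_likelihoods)
new_shares = update_shares(posteriors)
print(posteriors, new_shares, loglik)
```

The third decision maker, whose choices are equally likely under both classes, keeps a posterior of 0.5 for each class, whereas the first is assigned to class 1 with posterior probability 0.8.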

3 Estimation: lclogit2 and lclogitml2

Both lclogit2 and lclogitml2 require the same data structure as clogit and its extensions, such as mixlogit (Hole 2007c) and lclogit (Pacifico and Yoo 2013). To clarify, recall the notation introduced in section 2. The data y_njt, x_njt, and z_n for each distinct triplet of indices {n, j, t} must be organized into a separate row in the dataset (that is, an observation in Stata’s vernacular). Within a block of data rows associated with decision maker n, y_njt and x_njt thus vary from row to row, whereas z_n is repeated across all rows.

The syntax diagram for lclogit2 is as follows:

lclogit2 depvar [ varlist1 ] [ if ] [ in ], group( varname ) rand( varlist2 ) nclasses( # ) [ id( varname ) membership( varlist3 ) constraints( numlist ) seed( # ) iterate( # ) ltolerance( # ) tolerance( # ) tolcheck nolog]

The syntax diagram for lclogitml2 is similar:

lclogitml2 depvar [ varlist1 ] [ if ] [ in ], group( varname ) rand( varlist2 ) nclasses( # ) [ id( varname ) membership( varlist3 ) constraints( numlist ) seed( # ) from( init_specs ) noninteractive_options ]

The indicator y_njt in section 2 refers to each observation on the dependent variable, depvar. Within a block of data rows associated with decision maker n and choice occasion t, depvar must be equal to 1 in the row describing the alternative that he or she actually chose and 0 in all the other rows.

Each command has three required options. The required option group( varname ) is identical to the namesake option in clogit, mixlogit, and lclogit and specifies a numeric variable that identifies distinct choice occasions faced by different decision makers. In the context of (1), the variable in question tells Stata which J data rows to use when evaluating the clogit formula inside the large round brackets. The variable must take a unique numeric value for each distinct pair of n and t, and the value must be repeated across all J data rows associated with that pair.

The required option rand( varlist2 ) is similar to the namesake option in mixlogit and specifies attribute variables whose utility coefficients are assumed to vary from class to class. Sometimes, users may wish to constrain a subset of utility coefficients to be identical across all classes. Such constraints can be conveniently requested by using the optional varlist1 to specify those attributes with class-invariant utility coefficients.³ To avoid contradiction, do not place a variable in both varlist2 and varlist1. The attribute vector x _njt in section 2 refers to each observation on varlist2 (and, if specified, varlist1).

Finally, the required option nclasses( # ) specifies the number of classes, C, in (3). In empirical research, it is common practice to choose the preferred number of classes by estimating an LCL specification repeatedly with different candidate values for C and inspecting which value optimizes the Bayesian information criterion (BIC). See section 5 for further discussion.

Optional options for lclogit2 include the following:

id( varname ) is identical to the namesake option in mixlogit and lclogit and specifies a numeric identifier variable for decision makers. This variable is expected to identify which block of data rows is associated with each decision maker n; its value must vary from decision maker to decision maker but remain constant within all data rows for the same decision maker. The default is to assume that group() and id() are identical, which is equivalent to assuming that each decision maker has faced only one choice occasion (that is, T = 1).

membership( varlist3 ) specifies independent variables for the class membership model in (2), except the constant regressor of 1, which is always assumed to be included. Together with the constant regressor, each observation on varlist3 makes up z _n , the vector of decision maker n’s characteristics. Within a block of data rows associated with decision maker n, the numerical values of varlist3 must remain constant across all rows. The default is to assume that varlist3 is empty; that is, z _n includes only the constant regressor.

constraints( numlist ) specifies linear constraints to be applied during estimation. The constraints must be defined using the Stata command constraint prior to estimation. The default is to impose no such constraints.

When using constraint, note that equation names for the utility coefficients on varlist1 and varlist2 are Fix and Class c, respectively, where c refers to a particular class number. Therefore, the coefficient on varname in varlist1 is [Fix] varname. The coefficient on varname in varlist2 is [Class1] varname for class 1, [Class2] varname for class 2, and so on.

seed( # ) sets the seed for pseudouniform random numbers used in computing starting values. See Pacifico and Yoo (2013) for the detailed procedure. The default seed is c(seed).

iterate( # ) specifies the maximum number of iterations. The default is iterate(1000).

ltolerance( # ) specifies the tolerance for the log likelihood. When the relative increase in the log likelihood over the last five iterations is less than the specified value, lclogit2 declares convergence. The default is ltolerance(0.00001).

tolerance( # ) specifies the tolerance for the coefficient vector. The default is tolerance(0.0004).

tolcheck requests the use of an extra convergence criterion to reduce the chance of false declaration of convergence. If this option is used, lclogit2 will declare convergence when 1) the relative increase in the log likelihood is smaller than ltolerance() and 2) the relative difference in the coefficient vector is smaller than tolerance() over the last five iterations.

nolog suppresses the display of an iteration log.
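The interplay of ltolerance(), tolerance(), and tolcheck can be pictured with the following Python sketch. The option descriptions above do not give the exact formulas, so both measures here are assumptions made for illustration: the relative increase is taken as the change in the log likelihood divided by its absolute value five iterations earlier, and the relative difference in coefficients as the largest value of |new − old| / (|old| + 1). The iteration histories are hypothetical.

```python
def rel_increase(loglik_history, lag=5):
    """Relative increase in the log likelihood over the last `lag` iterations (assumed form)."""
    old, new = loglik_history[-1 - lag], loglik_history[-1]
    return (new - old) / abs(old)

def max_rel_diff(b_new, b_old):
    """Largest relative change across coefficients (assumed form)."""
    return max(abs(n - o) / (abs(o) + 1.0) for n, o in zip(b_new, b_old))

def converged(loglik_history, coef_history, ltol=0.00001, tol=0.0004,
              tolcheck=False, lag=5):
    """Apply the log-likelihood criterion and, if requested, the coefficient criterion."""
    if len(loglik_history) <= lag:
        return False
    ok = rel_increase(loglik_history, lag) < ltol
    if tolcheck:
        ok = ok and max_rel_diff(coef_history[-1], coef_history[-1 - lag]) < tol
    return ok

# Hypothetical histories: the log likelihood has flattened out,
# but the first coefficient is still drifting.
loglik_history = [-1200.0, -1150.0, -1149.999, -1149.998,
                  -1149.997, -1149.996, -1149.995]
coef_history = [[0.9, -1.9], [1.0, -2.0], [1.0005, -2.0], [1.001, -2.0],
                [1.0015, -2.0], [1.0018, -2.0], [1.002, -2.0]]
print(converged(loglik_history, coef_history))
print(converged(loglik_history, coef_history, tolcheck=True))
```

In this constructed case, the log-likelihood criterion alone is satisfied while the coefficient criterion is not, which is the situation tolcheck is meant to guard against.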

As the syntax diagram above shows, many of the optional options for lclogit2 are also available for lclogitml2. Optional options unique to lclogitml2 are as follows:

from( init_specs ) is identical to the namesake option of mixlogit (Hole 2007c) and supplies custom starting values for the utility and membership coefficients, that is, B and Θ in (3). The default starting values are obtained by applying the same procedure as what Pacifico and Yoo (2013) describe for lclogit.

noninteractive_options refers to extra options for use with ml model in noninteractive mode; see [R] ml.

4 Postestimation: lclogitpr2, lclogitcov2, and lclogitwtp2

Both lclogit2 and lclogitml2 are supported by three postestimation commands: lclogitpr2, lclogitcov2, and lclogitwtp2. For each decision maker, lclogitpr2 predicts choice probabilities associated with each alternative in each choice situation that he or she has faced, as well as class membership probabilities. lclogitcov2 computes variances and covariances of class-specific utility coefficients β_1, β_2, …, β_C by considering them as a discrete random variable with probability masses given by class membership probabilities π_n1(Θ), π_n2(Θ), …, π_nC(Θ). Finally, lclogitwtp2 converts estimated utility coefficients into implied WTP measures similarly to how Hole’s (2007a) wtp works on clogit coefficients.
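The moment calculation attributed to lclogitcov2 above is the usual one for a discrete random vector. A minimal Python sketch for a single set of probability masses follows; the coefficient values and masses are hypothetical, and how the command aggregates masses that differ across decision makers is not addressed here.

```python
def discrete_mean_cov(betas, probs):
    """Mean vector and covariance matrix of class coefficients with the given masses."""
    k = len(betas[0])
    mean = [sum(p * b[i] for p, b in zip(probs, betas)) for i in range(k)]
    cov = [[sum(p * (b[i] - mean[i]) * (b[j] - mean[j])
                for p, b in zip(probs, betas))
            for j in range(k)]
           for i in range(k)]
    return mean, cov

betas = [[-0.5, 1.0], [-1.5, 3.0]]  # hypothetical beta_1 and beta_2
probs = [0.6, 0.4]                  # hypothetical class membership probabilities
mean, cov = discrete_mean_cov(betas, probs)
print(mean)
print(cov)
```

With two classes, the two coefficients are perfectly (here negatively) correlated, because every class moves both coefficients along the same line.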

The remainder of this section focuses on the syntax diagram for lclogitwtp2, a new postestimation tool that has no counterpart for lclogit. The other two postestimation commands have the same functionalities and syntax diagrams as lclogitpr and lclogitcov, which support lclogit, apart from the “2” suffix. Pacifico and Yoo (2013) describe lclogitpr and lclogitcov in detail.

4.1 Syntax for lclogitwtp2

The attribute vector x _njt typically includes a pecuniary attribute, which allows the researcher to estimate a utility coefficient that can be associated with the marginal utility of money. Often, the pecuniary attribute measures the cost of acquiring a particular alternative. For example, in Oviedo and Yoo (2017), each alternative is a reforestation project, and the cost is a required increase in the decision maker’s tax liabilities to finance that project. In some applications, the pecuniary attribute may measure income generated by a particular alternative instead. For example, in Doiron and Yoo (2017), each alternative is a junior nursing job, and the amount of income is salary earned from that job.

In most nonmarket valuation studies, the index function x_njt β is assumed to be linear in the pecuniary attribute. The marginal utility of money is then equal to −1 times the cost coefficient or, alternatively, to the income coefficient itself. Let β_{k,c} be an entry in β_c that is the utility coefficient on attribute k. The WTP for attribute k can be evaluated as −1 × β_{k,c}/β_{cost,c} or β_{k,c}/β_{income,c}, depending on whether the pecuniary attribute measures cost or income.⁴
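The two WTP formulas differ only in sign. The following Python sketch applies them to one class; the attribute names and coefficient values are hypothetical, and the function is an illustration of the arithmetic rather than of lclogitwtp2 itself.

```python
def wtp(coefs, money_var, kind="cost"):
    """WTP for every nonpecuniary attribute in one class.

    kind="cost":   -1 * beta_k / beta_cost
    kind="income":      beta_k / beta_income
    """
    if kind not in ("cost", "income"):
        raise ValueError("kind must be 'cost' or 'income'")
    sign = -1.0 if kind == "cost" else 1.0
    denom = coefs[money_var]
    return {name: sign * b / denom
            for name, b in coefs.items() if name != money_var}

class_1 = {"cost": -0.5, "time_saved": 1.0}    # hypothetical class 1 coefficients
class_2 = {"salary": 0.25, "parking": 0.5}     # hypothetical income-type example
print(wtp(class_1, "cost", kind="cost"))       # {'time_saved': 2.0}
print(wtp(class_2, "salary", kind="income"))   # {'parking': 2.0}
```

A negative cost coefficient combined with the −1 factor yields a positive WTP for a desirable attribute, which is why the cost and income cases need separate options.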

In the “cost” case, the syntax diagram for lclogitwtp2 is

lclogitwtp2, cost( varname ) [ nonlcom nlcom_options ]

Similarly, in the “income” case, the syntax diagram for lclogitwtp2 is

lclogitwtp2, income( varname ) [ nonlcom nlcom_options ]

The required option cost( varname ) or income( varname ) identifies the pecuniary attribute variable, whose coefficient enters the denominator of the WTP formula. When lclogit2 estimates are active, lclogitwtp2 simply reports the implied WTP measures. When lclogitml2 results are active, it also acts as a wrapper for Stata’s nlcom command, which it uses to compute standard errors and confidence intervals for the implied WTP measures.

The two optional options are relevant only when lclogitml2 results are active:

nonlcom requests that the command skip the nlcom step and report the WTP measures without test statistics. The default is to execute the nlcom step.

nlcom_options refers to options of nlcom; see [R] nlcom.
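Stata’s nlcom obtains standard errors of nonlinear combinations by the delta method. For the cost-type WTP ratio, the calculation can be sketched in Python as follows. The coefficient estimates, variances, and covariance are hypothetical numbers, and the function is a hand-worked illustration of the method rather than a call into Stata.

```python
import math

def wtp_cost_se(b_k, b_cost, var_k, var_cost, cov_k_cost):
    """Delta-method standard error of WTP = -b_k / b_cost."""
    wtp = -b_k / b_cost
    g_k = -1.0 / b_cost          # derivative of WTP with respect to b_k
    g_c = b_k / b_cost ** 2      # derivative of WTP with respect to b_cost
    variance = (g_k ** 2 * var_k + g_c ** 2 * var_cost
                + 2.0 * g_k * g_c * cov_k_cost)
    return wtp, math.sqrt(variance)

# Hypothetical estimates for one class.
estimate, se = wtp_cost_se(b_k=1.0, b_cost=-0.5,
                           var_k=0.04, var_cost=0.01, cov_k_cost=0.0)
print(estimate, se)
```

The calculation needs the variance-covariance matrix of the coefficient estimates, which is consistent with the nlcom step being available only when lclogitml2 results are active.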

5 Examples

Just as in clogit, both lclogit2 and lclogitml2 require that the data y_njt, x_njt, and z_n for each distinct triplet of indices {n, j, t} (see section 2 for the notation) be organized into a separate row in the dataset. As an example, consider transport.dta, available on the Stata Press website.⁵ This fictitious dataset has been generated to imitate a sample of N = 500 decision makers choosing from J = 4 alternative transport modes (car, public transport, bicycle, or walk) in each of T = 3 choice situations. Each choice occasion refers to a different time of the year, so the decision maker’s age in decades (age), income in $10,000s (income), and full- or part-time employment status (parttime) may vary from occasion to occasion. Each alternative mode is described by its cost (trcost) in dollars and required travel time (trtime) in hours. Before we proceed, the contents of trtime will be modified to measure savings in travel time relative to walking. Following this change, the coefficient on trtime can be interpreted as the marginal utility of one hour saved in travel time relative to walking.

. use https://www.stata-press.com/data/r16/transport
(Transportation choice data)

. quietly by id t: replace trtime = trtime[4] - trtime[_n]

The first 12 rows of the dataset are displayed below. The variables id, t, and alt identify decision makers (n = 1, 2,…, 500), choice occasions (t = 1, 2, 3), and alternatives (j = 1, 2, 3, 4), respectively. Each row of choice is y_njt , and each row of trcost and trtime is x _njt . Decision maker 1 turns out to be someone who traveled by car in all three occasions. While each row of age, income, and parttime records the decision maker’s characteristics, it is repeated only within a choice occasion, not across all data rows associated with the same decision maker. In other words, the row does not make up z _n , and the three variables cannot be included in varlist3 to model membership probabilities. Instead, users may consider interacting each demographic variable with trcost and trtime and including the interaction terms in varlist1 or varlist2. As Train (2009, chap. 3) explains, including the interaction terms is equivalent to specifying the utility coefficient on each attribute as a linear function of the demographic variables.

Like clogit, both lclogit2 and lclogitml2 require a variable that identifies all data rows associated with each distinct pair of decision maker n and choice occasion t. As the first command line below shows, such a variable can be generated using the egen command’s group() function. To include alternative-specific intercepts in the LCL model, the second command line creates alternative-specific constants. Variable asc1 is set to 1 in all data rows for car and 0 everywhere else. Variables asc2, asc3, and asc4 are similarly defined in relation to public transport, bicycle, and walk, respectively. The last variable will be excluded from the model to achieve identification.

. egen gid = group(id t)

. quietly tabulate alt, generate(asc)
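For readers who want to see what the two command lines above produce, the following Python sketch mimics them on a small stand-in dataset. The rows are hypothetical (2 decision makers instead of 500), and the functions only imitate the documented behavior of egen’s group() function and tabulate’s generate() option: one integer per distinct (id, t) pair in sorted order, and one indicator per alternative.

```python
def add_group_id(rows):
    """Assign 1, 2, ... to the distinct (id, t) pairs in sorted order."""
    pairs = sorted({(r["id"], r["t"]) for r in rows})
    index = {pair: g for g, pair in enumerate(pairs, start=1)}
    for r in rows:
        r["gid"] = index[(r["id"], r["t"])]

def add_asc(rows, n_alt):
    """Create asc1..asc<n_alt>, each equal to 1 in the rows of one alternative."""
    for r in rows:
        for a in range(1, n_alt + 1):
            r[f"asc{a}"] = 1 if r["alt"] == a else 0

# Hypothetical long-form data: 2 decision makers, 3 occasions, 4 alternatives.
rows = [{"id": i, "t": t, "alt": a}
        for i in (1, 2) for t in (1, 2, 3) for a in (1, 2, 3, 4)]
add_group_id(rows)
add_asc(rows, 4)
print(rows[0])
print(rows[-1])
```

Each value of gid labels one block of J = 4 rows, which is the block that enters one clogit probability in (1).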

How many classes, C, should LCL allow for? In many empirical studies, including my own (Yoo and Doiron 2013; Doiron and Yoo 2017, 2020; Oviedo and Yoo 2017), this question is addressed by repeatedly fitting the same LCL model with different numbers of classes and inspecting which number leads to the best model in terms of the BIC. lclogit2 calculates and stores the fitted model’s BIC in e(bic), facilitating this specification search.⁶

The lclogit2 example below shows that BIC is 2316.537 with two classes. The two-class model appears to be an optimal model for this fictitious dataset. While not reported, raising the number of classes (in ncl()) to three slightly worsens BIC to 2318.831, and raising it further to four results in numerical convergence problems. For each class c, the output table reports the estimates of utility coefficients β _c and membership probability (that is, class share) π_nc (Θ). Users interested in the estimates of Θ can inspect the full coefficient vector stored in e(b). In the present application, because z _n includes only the constant regressor (that is, varlist3 is empty), π_nc (Θ) is the same across all decision makers. If π_nc (Θ) varies from decision maker to decision maker, the output table will report sample average class shares.
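The specification search described above amounts to picking the number of classes with the smallest BIC. The Python sketch below uses the two BIC values quoted in this section. The `bic` helper states the textbook formula, and which sample-size count lclogit2 uses in the penalty term is not specified here, so that argument is left to the caller as an assumption.

```python
import math

def bic(loglik, n_params, n_obs):
    """Textbook BIC: -2 * lnL + (number of parameters) * ln(sample size)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# BIC values reported in the text for the two- and three-class models.
bic_by_classes = {2: 2316.537, 3: 2318.831}
best = min(bic_by_classes, key=bic_by_classes.get)
print(best)  # 2
```

Smaller values are better, so the two-class model is preferred over the three-class model in this comparison.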

To obtain standard errors for the lclogit2 estimates, users can pass the estimates through to lclogitml2 as initial values, as shown below. In the lclogitml2 output table, equations Class1, Class2, and Share1 correspond to β ₁, β ₂, and θ ₁, respectively.⁷ The estimation results are stored in Stata’s memory under the name ML_2 to be recalled later in other examples.

Note that in the present example, lclogitml2 manages to locate a slightly higher sample log likelihood than lclogit2, even though theoretically the EM algorithm should have located a local maximum. This type of numerical difference may arise because the default of lclogit2 is to declare convergence when the relative increase in the log likelihood is smaller than ltolerance() (see section 3), whereas lclogitml2 uses Stata’s gradient-based optimizers, which apply a stricter set of convergence criteria (see the help file for ml maximize). The tolcheck option of lclogit2, which was not available for lclogit, requests that the EM algorithm add the relative change in the coefficient vector as another criterion. Users who favor numerical accuracy over computational speed may execute lclogit2 with tolcheck to minimize, if not eliminate, the numerical difference.⁸

The new postestimation tool lclogitwtp2 allows users to convert the utility coefficients for Class1 and Class2 into their monetary equivalents or WTP measures. Because trcost measures the cost of each transport mode, the marginal utility of money is given by −1 × its coefficient. Thus, lclogitwtp2 must be executed with cost(trcost), instead of income(trcost), as the required option. The output is displayed below and includes standard errors and confidence intervals produced by nlcom because the active results are for lclogitml2.⁹ Had the active results been for lclogit2 instead, only the first table in the output would have been displayed. The coefficient on trtime in each class measures how much (in dollars) each person in that class is willing to pay for one hour saved in travel time relative to walking. To test a hypothesis involving two or more WTP coefficients, users may execute lclogitwtp2 with nlcom’s post option and then use the test command.

Both lclogit2 and lclogitml2 allow users to impose any set of linear constraints, defined by Stata’s constraint command in the usual manner. The constraints may apply within the same class, as well as between different classes. In contrast, lclogit can incorporate within-class constraints only and has peculiar syntax requirements for inputting the constraints.¹⁰ The lclogitml2 example below constrains the coefficient on trcost to be the same across class 1 and class 2. The output is omitted from reporting because it is identical in substance to another output example to follow immediately.

In a two-class model, constraining a coefficient to be the same across class 1 and class 2 is equivalent to making that coefficient class invariant. Users can introduce class-invariant coefficients more conveniently by moving relevant attribute variables from varlist2 in rand() to varlist1 as illustrated below. The required option rand() and associated distinction between varlist1 and varlist2 are irrelevant to lclogit. The older command assumes that all coefficients vary from class to class and expects all attribute variables to be specified in the position of varlist1.

The EM algorithm used by lclogit2 fits an unconstrained model faster than a more parsimonious model that includes class-invariant coefficients or other types of between-class constraints on utility coefficients (Fiebig and Yoo 2019).¹¹ As usual, the ml maximize techniques used by lclogitml2 tend to fit constrained models faster than unconstrained models, and users may therefore consider the sequence of estimation runs above as the default approach: using lclogit2 to fit an unconstrained model and then feeding the unconstrained estimates as starting values to lclogitml2, which imposes desired constraints. When the constrained maximum is far away from the unconstrained maximum, the default approach may result in convergence failure. In such cases, users may let lclogit2 impose the constraints despite the resulting slowdown and exploit the EM algorithm’s numerical stability to locate the constrained maximum.

The new lclogit2 and lclogitml2 commands take advantage of Mata and can reduce computer run times considerably relative to their predecessors, especially when there are many estimated parameters. On a Windows 10 laptop with Intel i5-8250U CPU and 16 GB RAM, for example, the new commands can fit the unconstrained two-class model above roughly twice as quickly as their predecessors: the lclogit2 run achieves convergence in about 11 seconds, and the subsequent lclogitml2 run in 9 seconds, whereas the equivalent lclogit and lclogitml runs take about 24 and 17 seconds, respectively. The run-time difference becomes more perceptible in absolute terms when the number of classes is increased to 3: the lclogit2 and lclogitml2 runs take about 75 and 30 seconds, whereas the lclogit and lclogitml runs take about 160 and 70 seconds. Of course, using Mata does not alter the fact that fitting a finite mixture model like LCL is a computer-intensive task. lclogit2 and lclogitml2 estimation runs in authentic applications (as opposed to the present application using an example dataset) may still take several hours, if not days, of computer time.¹²

6 Applications to other types of logit models

As explained by Cameron and Trivedi (2005, 498) and reiterated by Yan and Yoo (2019), the conditional logit (clogit in Stata) formula inside the large parentheses of (1) nests binary logit (logit) and multinomial logit (mlogit) formulas as special cases. Thus, in principle, users can use clogit to obtain the same estimation results as logit and mlogit. In practice, this requires reorganization of data beforehand. In the reshape command’s vernacular, clogit requires that the data be in “long” form, with multiple rows per group identified by group(), whereas logit and mlogit require that the data be in “wide” form, with one row per group. Adkins (2011) provides a detailed Stata example showing how to reorganize logit and mlogit data for the clogit analysis, which he attributes to Cameron and Trivedi (2010).
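The nesting of binary logit within the conditional logit formula can be checked numerically. In the Python sketch below, one “wide” logit observation with index xβ is rewritten as two “long” rows: the outcome of interest with utility xβ and a reference alternative whose utility is fixed at 0. The index value is hypothetical, and the sketch illustrates the algebra only, not the data reorganization steps in Adkins’s example.

```python
import math

def clogit_prob(utilities, chosen):
    """Conditional logit probability of the chosen alternative."""
    denom = sum(math.exp(u) for u in utilities)
    return math.exp(utilities[chosen]) / denom

def logit_prob(index):
    """Binary logit probability that the outcome equals 1."""
    return 1.0 / (1.0 + math.exp(-index))

xb = 0.7  # hypothetical value of the logit index
p_long = clogit_prob([xb, 0.0], chosen=0)  # two rows: outcome 1 and reference
p_wide = logit_prob(xb)
print(p_long, p_wide)
```

The multinomial logit case works the same way, with one long-form row per outcome and the base outcome’s utility fixed at 0.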

lclogit2 and lclogitml2 can estimate latent class extensions of logit and mlogit once the data have been suitably reorganized in accordance with Adkins’s (2011) example. Stata 15 introduced a new command, fmm, that can fit latent class extensions of several baseline models, including logit and mlogit.¹³ For cross-sectional data (T = 1), the latent class logit and mlogit models that lclogit2 and lclogitml2 fit are equivalent to what fmm fits. But fmm cannot fit models for panel data (T ≥ 2) that consider preference class membership as the decision maker’s time-invariant characteristic, that is, models that assume that someone from class c has the utility coefficient vector of that class throughout all time periods or choice occasions.¹⁴ lclogit2 and lclogitml2 can fit such panel models once a variable identifying decision makers has been specified in the option id().

Some stated preference surveys ask the decision makers to rank order all alternatives from most to least preferred, instead of simply asking them to choose their most preferred alternative. A popular baseline model for analyzing rank-ordered data is the rank-ordered logit (rologit) model.¹⁵ Suppose that the decision maker rank orders three different jobs described by salary, availability of on-site parking (1 for abundant and 0 for limited), and full-time contract status (1 for yes and 0 for no).¹⁶ The data organization example below satisfies rologit’s requirements, and the dependent variable rank shows that the decision maker’s most preferred job is job A (rank = 3) and least preferred job is job B (rank = 1), with job C coming in between (rank = 2).¹⁷

As Train (2009, 156–158) points out, rologit is so closely related to clogit that users may apply clogit to replicate rologit, and users may apply the extensions of clogit such as mixlogit to estimate the corresponding extensions of rologit. It follows that users can use lclogit2 and lclogitml2 to fit what Yoo and Doiron (2013) call the latent class rank-ordered logit model.¹⁸ This requires that the rank-ordered data be reorganized in a way that allows clogit to replicate rologit. Under rologit, the probability of ranking job A, job C, and job B as best, second-best, and worst, respectively, is given by a product of two clogit probabilities: the probability of choosing job A from {A, B, C} and that of choosing job C from {B, C}.¹⁹ Therefore, in Train’s vernacular, the rank-ordered data above can be “exploded” into data on two “pseudochoices”, where the first pseudochoice is made from {A, B, C} and the second pseudochoice is made from {B, C}. The command block below explodes the rank-ordered data as suggested and displays the resulting pseudochoice data that satisfy the data organization requirements of clogit.
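The explosion logic can be sketched independently of Stata. The Python functions below take the ranking from the job example (rank 3 for the most preferred job) and return the sequence of pseudochoices, then multiply the corresponding clogit probabilities. The function names and the utility values are hypothetical, and this is an illustration of the idea rather than the command block referred to above.

```python
import math

def explode(ranks):
    """Turn a ranking (larger rank = more preferred) into (chosen, choice set) pairs."""
    remaining = sorted(ranks, key=ranks.get, reverse=True)
    pseudochoices = []
    while len(remaining) > 1:
        pseudochoices.append((remaining[0], sorted(remaining)))
        remaining = remaining[1:]   # drop the alternative just chosen
    return pseudochoices

def ranking_prob(ranks, utility):
    """Rank-ordered logit probability as a product of clogit probabilities."""
    p = 1.0
    for chosen, choice_set in explode(ranks):
        denom = sum(math.exp(utility[a]) for a in choice_set)
        p *= math.exp(utility[chosen]) / denom
    return p

ranks = {"A": 3, "B": 1, "C": 2}
print(explode(ranks))  # [('A', ['A', 'B', 'C']), ('C', ['B', 'C'])]
print(ranking_prob(ranks, {"A": 0.0, "B": 0.0, "C": 0.0}))
```

With J ranked alternatives there are J − 1 pseudochoices, because the last remaining alternative is chosen with probability 1 and adds nothing to the likelihood.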

Given several pseudochoice data blocks organized as above, clogit, which replicates rologit, must be executed with the option group(gid) so that Stata can correctly identify data rows to be used in evaluating each clogit probability. lclogit2 and lclogitml2 must be executed with the options group(gid) and id(group), where the variable group in the option id() allows Stata to recognize that the utility coefficients remain invariant across all pseudochoice situations arising from the same choice situation. If more than one choice situation is observed per decision maker, id() can be altered to specify a variable that identifies individual decision makers instead.
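To see what id(group) accomplishes, the likelihood contribution of the three-job ranking can be written out. The notation is an assumption made here for illustration and may differ from that of section 2: π_c is the share of class c, β_c is the utility coefficient vector of class c (c = 1, …, C), and x_j is the attribute row of job j.

```latex
\[
L = \sum_{c=1}^{C} \pi_c
\left[
\frac{\exp(x_A \beta_c)}{\sum_{j \in \{A,B,C\}} \exp(x_j \beta_c)}
\times
\frac{\exp(x_C \beta_c)}{\sum_{j \in \{B,C\}} \exp(x_j \beta_c)}
\right]
\]
```

Both pseudochoice probabilities sit inside the same class-c term, so the ranking as a whole is attributed to one class. If each pseudochoice were instead treated as coming from a separate decision maker, each probability would be mixed over the classes on its own, which is a different model.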

There is a well-known variant of rank ordering known as best–worst scaling (BWS) (Louviere, Flynn, and Marley 2015). In an “object case” BWS task, the decision maker examines a set of attributes, say, {salary, parking, contract type}, and states which of those attributes is the most important (best) and which is the least important (worst) to his or her decision making. A popular baseline model for analyzing object case BWS data is the maximum-difference (max-diff) logit model. Once its psychological foundations are stripped away, the max-diff logit model is algebraically identical to clogit, meaning that users can apply lclogit2 and lclogitml2 to fit what Yoo and Doiron (2013) call the latent class max-diff logit model. Specifically, when there are K attributes, the max-diff logit model is algebraically identical to a clogit model defined over K×(K−1) alternatives, where each alternative is a particular two-permutation of the K attributes, that is, a distinct candidate pair of the best and worst attributes. To facilitate the max-diff analysis, we may organize the BWS data for the three-attribute example as follows:

In the present example, each data row describes one of the 3 × 2 = 6 candidate best–worst pairs. An attribute takes a value of 1 in the row where it makes up the most important or “best” element of the pair and −1 where it makes up the least important or “worst” element. The decision maker’s BWS response appears in row 4, where the dependent variable choice takes a value of 1 and attributes parking and contract take values of 1 and −1, respectively; the decision maker has stated that parking is the best attribute and contract type the worst attribute. Given several BWS data blocks organized in this way, the max-diff logit model can be fit by running a clogit regression of choice on any K − 1 = 2 out of the K = 3 attributes, where one attribute is omitted to achieve identification and the option group(group) must be specified to identify choice situations. lclogit2 and lclogitml2 can be used to extend the baseline clogit model as usual. Note that the clogit index (x_njt β in section 2) for each row is now the best–worst utility difference of the pair that it describes, for example, β_parking − β_contract in row 4. The term “maximum difference” refers to the assumption that the decision maker chooses the pair that maximizes the best–worst utility difference.
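The layout just described can be reproduced with a short Stata sketch. The helper variables best and worst and the row ordering are assumptions made here for illustration; only the +1/−1 coding, the six candidate pairs, and the response in row 4 come from the description above.

```stata
* One object case BWS task with K = 3 attributes: 3 x 2 = 6 candidate pairs.
* best and worst are hypothetical helper variables naming each pair's elements.
clear
input group str8 best str8 worst choice
1 "salary"   "parking"  0
1 "salary"   "contract" 0
1 "parking"  "salary"   0
1 "parking"  "contract" 1
1 "contract" "salary"   0
1 "contract" "parking"  0
end

* Code each attribute as +1 where it is the best element of the pair,
* -1 where it is the worst element, and 0 otherwise.
foreach a in salary parking contract {
    generate `a' = (best == "`a'") - (worst == "`a'")
}
list group choice salary parking contract, sepby(group)
```

Every row contains exactly one +1 and one −1, so the three attribute variables sum to zero in each row. This linear dependence is why only K − 1 = 2 of them can enter the clogit regression.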

Another type of BWS known as “profile case” BWS is identical to the object case, except that each attribute in question is associated with a particular level descriptor. For example, the decision maker may examine and state the best and worst out of three attribute levels, {salary of $2,000, limited on-site parking, part-time contract}.²⁰ The max-diff logit model for this type of response is algebraically identical to clogit too, and the data can be organized similarly to the object case example. For a full example of how to organize profile case data, see the Canadian Journal of Economics website for Doiron and Yoo (2020).

lclogit2 and lclogitml2 assume that the LCL model has been specified in what Train and Weeks (2005) classify as the “preference space”. Each estimated coefficient on an attribute is a utility coefficient, and lclogitwtp2 should be used to obtain the corresponding WTP measure. An alternative approach is to reparameterize the model in the “WTP space” by specifying the sample log likelihood directly as a function of the WTP measures. Hole’s (2007c, 2015) mixlogit and mixlogitwtp commands allow users to fit multivariate normal mixture logit models in the preference space and WTP space, respectively. The two commands lead to substantively different estimation results because, as explained by Train and Weeks (2005), multivariate normal utility coefficients do not imply multivariate normal WTP measures, and vice versa, unless the marginal utility of money is constant across all decision makers.²¹

In the context of a finite or discrete mixture logit model, which LCL is, whether users fit the model in one space or another is less critical. As Oviedo and Yoo (2017) point out, the set of mass points in a discrete mixing distribution that maximizes the sample log-likelihood function is invariant to whether the model is parameterized in the preference space or the WTP space. Therefore, the WTP measures derived from the utility coefficients (using lclogitwtp2) are the same as what users would have obtained if they reparameterized the model to fit the WTP measures directly.
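The invariance follows from a one-to-one reparameterization. As a sketch, suppose that class c has a price coefficient α_c ≠ 0 and a coefficient β_ck on attribute k, and adopt the common convention that WTP is the negative ratio of the two. The symbols α_c, β_ck, w_ck, p, and x_k are introduced here for illustration.

```latex
\[
w_{ck} = -\frac{\beta_{ck}}{\alpha_c}
\quad\Longrightarrow\quad
\alpha_c\, p + \sum_k \beta_{ck}\, x_k
= \alpha_c \Bigl( p - \sum_k w_{ck}\, x_k \Bigr).
\]
```

Each pair (α_c, β_c) maps to exactly one pair (α_c, w_c) and back, so the two parameterizations yield identical choice probabilities at corresponding points. The maximized log likelihood and the implied mass points therefore coincide.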

7 Acknowledgments

I thank an anonymous reviewer for helpful comments and constructive suggestions. I also thank users of lclogit for their feedback on the older command.

8 Programs and supplemental materials

To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type

References

Adkins, L. C. 2011. Alternative specific logit. http://www.learneconometrics.com/class/6243/notes/AlternativeSpecificLogit.pdf.

Bhat, C. R. 1997. An endogenous segmentation mode choice model with an application to intercity travel. Transportation Science 31: 34–48. https://doi.org/10.1287/trsc.31.1.34.

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge University Press.

Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Rev. ed. College Station, TX: Stata Press.

Doiron and H. I. Yoo. 2017. Temporal stability of stated preferences: The case of junior nursing jobs. Health Economics 26: 802–809. https://doi.org/10.1002/hec.3350.

Doiron and H. I. Yoo. 2020. Stated preferences over job characteristics: A panel study. Canadian Journal of Economics 53: 43–82. https://doi.org/10.1111/caje.12431.

Fiebig, D. G., and H. I. Yoo. 2019. Econometrics of stated preferences. In The Oxford Encyclopedia of Health Economics, ed. A. M. Jones. Oxford: Oxford University Press. https://doi.org/10.1093/acrefore/9780190625979.013.92.

Hole, A. R. 2007a. wtp: Stata module to estimate confidence intervals for willingness to pay measures. Statistical Software Components S456808, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s456808.html.

Hole, A. R. 2007b. A comparison of approaches to estimating confidence intervals for willingness to pay measures. Health Economics 16: 827–840. https://doi.org/10.1002/hec.1197.

Hole, A. R. 2007c. Fitting mixed logit models by using maximum simulated likelihood. Stata Journal 7: 388–401. https://doi.org/10.1177/1536867X0700700306.

Hole, A. R. 2015. mixlogitwtp: Stata module to estimate mixed logit models in WTP space. Statistical Software Components S458037, Department of Economics, Boston College. https://ideas.repec.org/c/boc/bocode/s458037.html.

Louviere, J. J., T. N. Flynn, and A. A. J. Marley. 2015. Best-Worst Scaling: Theory, Methods and Applications. Cambridge: Cambridge University Press.

Oviedo, J. L., and H. I. Yoo. 2017. A latent class nested logit model for rank-ordered data with application to cork oak reforestation. Environmental and Resource Economics 68: 1021–1051. https://doi.org/10.1007/s10640-016-0058-7.

Pacifico and H. I. Yoo. 2013. lclogit: A Stata command for fitting latent-class conditional logit models via the expectation-maximization algorithm. Stata Journal 13: 625–639. https://doi.org/10.1177/1536867X1301300312.

Rabe-Hesketh, Skrondal, and Pickles. 2002. Reliable estimation of generalized linear mixed models using adaptive quadrature. Stata Journal 2: 1–21. https://doi.org/10.1177/1536867X0200200101.

Revelt and Train. 1998. Mixed logit with repeated choices: Households’ choices of appliance efficiency level. Review of Economics and Statistics 80: 647–657. https://doi.org/10.1162/003465398557735.

Train and Weeks. 2005. Discrete choice models in preference space and willingness-to-pay space. In Applications of Simulation Methods in Environmental and Resource Economics, ed. Scarpa and Alberini, 1–16. Dordrecht: Springer.

Train, K. E. 2008. EM algorithms for nonparametric estimation of mixing distributions. Journal of Choice Modelling 1: 40–69. https://doi.org/10.1016/S1755-5345(13)70022-8.

Train, K. E. 2009. Discrete Choice Methods with Simulation. 2nd ed. Cambridge: Cambridge University Press.

Yan and H. I. Yoo. 2019. Semiparametric estimation of the random utility model with rank-ordered choice data. Journal of Econometrics 211: 414–438. https://doi.org/10.1016/j.jeconom.2019.03.003.

Yoo, H. I., and Doiron. 2013. The use of alternative preference elicitation methods in complex discrete choice experiments. Journal of Health Economics 32: 1166–1179. https://doi.org/10.1016/j.jhealeco.2013.09.009.