Recent empirical studies suggest that human behavior in queues causes workload-dependent service times. We investigate the translation of empirical service times into state-dependent queueing models. To this end, we identify two types of state-dependent models, static and dynamic, and two types of corresponding behavioral mechanisms. For example, we view customer early task initiation as a static mechanism and social speedup pressure as a dynamic mechanism. For each model type, we discuss behavioral mechanisms consistent with the model assumptions and indicate how empirical service times can be translated into model input parameters. We illustrate how translating service times into dynamic models can result in invalid service rates, which provides evidence against dynamic mechanisms. For dynamic models, we find that mean service times are in general not the inverse of service rates, the directional change in service rates is not always the opposite of the directional change in mean service times, and workload measurement timing can drastically impact mean service time patterns. We provide closed-form equations to convert service times into service rates and vice versa, and find conditions under which monotonic mean service times imply monotonic service rates and vice versa. Our results provide guidelines for researchers to select and specify an appropriate state-dependent queueing model from service time data, and expand the scope of previously published analytical results.
Recent empirical work has shown that human customers and servers, for a broad spectrum of service operations, behave adaptively. The performance (speed and work content; see Sections 2 and 3.4) of servers who respond to changes in the number of customers in the system (workload) is typically studied through estimated workload-dependent mean service times. It is not always obvious, however, how to correctly interpret patterns in mean service times. We show that, in some situations, it is incorrect to estimate server performance directly from mean service times. For example, mean service times moving in one direction do not necessarily imply that server speed moves in the opposite direction, holding the work content constant.
Server behavioral response to workload can be static or dynamic. By static behavior, we mean that server performance remains constant during each service. By dynamic behavior, we mean that server performance changes during a service, as the workload changes. We provide examples of static and dynamic mechanisms that have been investigated empirically in Section 3.4. As a preview, for example, we will argue that customer early task initiation is a static mechanism because it only impacts server performance before the beginning of service; in contrast, social speedup pressure is a dynamic mechanism because the server may speed up or slow down during the service.
Figure 1: Numerical examples (see EC.2 of the E-Companion for details): (a) monotonicity is preserved and (b) monotonicity is not preserved.
Static and dynamic behaviors are incorporated in static and dynamic queueing models. We limit our scope to models that quantify workload as the number of customers in the system, $n$, which we refer to as the workload or the state. In a static model, a service duration depends on the state at a single point in time, typically at the beginning of service, and is independent of state changes during the service. A static model is specified through a set of state-dependent service time distributions. In a dynamic model, a service duration can depend on state changes during the service. A dynamic model is specified through a set of state-dependent service rates. We limit our attention to static and dynamic models that make certain Markovian assumptions, which are stated precisely in Section 3.1. For a static model, mean service times equal the inverse of server speed, holding work content constant. This is not true for a dynamic model, as we demonstrate in Section 4.
Ideally, empirical work informs and inspires analytical researchers to develop more useful mathematical models, which in turn inspires further empirical research, in a positive feedback loop (Fisher, 2007; Fisher et al., 2020). This loop requires empiricists to identify an appropriate model and translate findings correctly, and analysts to formulate and validate models using empirical data. We focus on connecting these two streams of work. Our work aims to provide methods for the proper translation of empirical results into analytical models. In particular, we investigate the choice of an appropriate state-dependent model and how to translate mean service times from empirical data into the inputs needed to specify the model.
Although translating mean service times into static models is relatively straightforward, this is not the case for dynamic models. Our results on dynamic models contribute to the empirical literature in three ways. First, we identify a potential pitfall in estimating service rates from mean service times. In classical queueing models, such as $M/M/1$, the service rate is the inverse of the mean service time: $\mu = 1/E[S]$. It is tempting, but incorrect, to generalize this formula to:
$$\mu_n = \frac{1}{E[S_n]} \qquad (1)$$
for state $n$ of a dynamic model (see Section 3.1 for definitions of $\mu_n$ and $S_n$). We refer to (1) as a faulty generalization. The intuitive reason that (1) does not hold is that during a customer's service time, the system state can change, and therefore the service rate can change. We derive the correct relationship between $\mu_n$ and $E[S_n]$ (see (6)) and illustrate the magnitude of the errors that result from using (1) as an approximation for $\mu_n$ (see Section 6.3). Figure 1(a) shows a first example, in which the average approximation error is substantial. We expect this result to help empirical researchers better estimate the magnitude of the effect of workload on service rate.
Second, we highlight the possibility that mean service time data analyses might not reveal the correct directional change in service rates. Suppose that the variation in service rate is of interest, but the empirical analysis focuses on whether the amount by which mean service times change, if $n$ increases by one unit, is positive, negative, or zero. In this setting, it is tempting, but incorrect, to assume that service rates and mean service times move in opposite directions:
$$\operatorname{sgn}\left(\mu_{n+1} - \mu_n\right) = -\operatorname{sgn}\left(E[S_{n+1}] - E[S_n]\right), \qquad (2)$$
where $\operatorname{sgn}$ is the sign function. We show that (2) does not hold in general for dynamic models, as Figure 1(b) illustrates. In this example, even though $E[S_n]$ decreases, $\mu_n$ first increases and then decreases, following an inverted-U shape. We list conditions under which weaker versions of (2) hold in Section 4.3. For example, in Corollary 1, we show that if mean service times are convex decreasing and an additional condition holds, then service rates are increasing. We expect our monotonicity results to help empirical researchers identify hidden changes in service rates that may correspond to new mechanisms.
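A small numerical illustration makes the point concrete. The numbers below are hypothetical (they are not the data behind Figure 1(b)), and the translation formula is the single-server expression derived in Section 4.1:

```python
# Hypothetical, strictly decreasing mean service times E[S_1], ..., E[S_4]
# and an arrival rate; illustrative values only, not the paper's data.
lam = 1.0
ES = [1.0, 0.5, 0.48, 0.2]

# Single-server translation (Section 4.1):
#   mu_n = (1 + lam * (E[S_{n+1}] - E[S_n])) / E[S_n]
mu = [(1.0 + lam * (ES[n + 1] - ES[n])) / ES[n] for n in range(len(ES) - 1)]

print(mu)  # mu_1 < mu_2 but mu_2 > mu_3: the rates are not monotone
```

Even though the mean service times decrease in every step, the implied service rates first rise and then fall, so (2) fails.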
Third, we show that for single-server dynamic models, the formula for the translation of mean service times to service rates differs greatly if the workload is measured at the end instead of the beginning of the service. Furthermore, for given service rates, a change in measurement timing can reverse the monotonicity pattern of the mean service times. This finding indicates that service times for which the state is measured at different times are not directly comparable. In general, the measurement timing choice should depend on data availability and hypothesized mechanisms. Some papers (such as Chan et al., 2012) propose models consistent with specific workload measurement timing.
Moreover, our results for dynamic models contribute to the analytical literature and expand the scope of previously published managerial insights. Recent work that is grounded in empirical findings formulates dynamic models to provide operational suggestions, and either assumes known service rates (Do et al., 2018) or estimates service rates directly (Cho et al., 2019). In contrast, we demonstrate how to translate empirical mean service times into service rates. Our results facilitate operational improvements (such as selecting staffing levels or choosing between one pooled queue versus parallel dedicated queues) through applications of analytical models calibrated with empirical data. Given that empirical findings are typically expressed in mean service times, the ability to translate service time data into service rate estimates provides the missing puzzle piece. To summarize, we make the following key contributions:
We categorize and name two major state-dependent queueing model types, static and dynamic models, and connect them to corresponding behavioral mechanisms. The choice of model should depend on both the hypothesized mechanisms and whether the model is consistent with data.
Using numerical examples, we show that although translating empirical service times into static models is always possible, the translation process may fail for dynamic models, thus providing evidence against dynamic mechanisms. In cases where translation is possible for both model types, performance measures such as mean waiting times can differ drastically.
For dynamic models, we provide closed-form expressions to compute service rates from mean service times. These expressions demonstrate that (1) does not hold in general. Also, we provide exact formulas to compute mean service times from service rates and show that service times follow phase-type (PH) distributions.
Contrary to what appears to be implicitly assumed in some of the empirical literature, for dynamic models, we show that monotone mean service times do not guarantee monotone service rates. We provide conditions that, combined with monotonicity for one sequence (service rate or mean service time), guarantee monotonicity of the other sequence, thus showing that (2) holds in certain situations, but not in general.
In dynamic models, with given service rates, we show that whether the workload is measured at the beginning or end of service leads to drastically different patterns of mean service times, both in terms of the magnitude and the directional changes.
The remainder of the paper is organized as follows: Section 2 reviews related literature; Section 3 formulates and compares the models of interest, and states assumptions; Section 4 provides results for dynamic models; Section 5 provides results for the case in which workload is measured at the end of service; Section 6 presents case studies and numerical examples; and Section 7 concludes. An electronic companion provides a table of notation, all proofs, and other supplemental materials.
Related Literature
Our work is related to both empirical work on how service times depend on workload and analytical work on the formulation and analysis of state-dependent queueing models. We review these two streams separately.
We begin by defining general terms that are needed to explain how “workload” is defined in different streams of the literature:
The analytical models that we discuss assume a constant number of servers, $s$, but some of the empirical work involves settings where the number of servers varies with time. The unfinished work at time $t$ is the sum of the service times of all waiting customers and the residual service times of all customers currently in service. The virtual waiting time is the time that a customer arriving at time $t$ would have to wait until commencing service. The unfinished work equals the virtual waiting time for single-server but not for multi-server queues (Sonderman, 1979).
Empirical Studies: We focus our discussion on three issues: (a) how the workload is defined, (b) when the workload is measured, and (c) what pattern in the mean service times is revealed. Table 1 compares a sample of recent empirical papers in terms of these three issues.
Table 1: Empirical studies that involve the impact of workload on service time.
This study also reports the impact of "overwork," but we focus only on the impact of workload.
Workload definitions used in some of these studies include: an indicator function for whether occupancy is above a threshold; quadratic and spline functions; and the anticipated number of customers in the next period.
How is the workload defined? In the empirical studies of interest to us, regression models are used to estimate mean service times as a function of workload and other variables. Possible ways to define workload include the proportion of servers that are busy (the occupancy), the number of customers in the system ($n$), the number of customers in the queue, or functions involving one or more of the aforementioned choices; all of these definitions have been used in empirical studies (see Table 1, Column 2). The appropriate way to define workload in an empirical study depends on data availability, context, and the research question. It is important to consider which aspects of the workload are "visible" to servers, accurately and in real time. For example, in a hospital intensive care unit (ICU), the number of busy servers (i.e., the number of occupied beds) is visible, but the queue length (i.e., the number of patients waiting to be admitted to the ICU) is ambiguous and might not be visible to those working in the ICU (Berry Jaeker and Tucker, 2017; Kc and Terwiesch, 2009, 2012). Therefore, servers can react only to changes in the number of busy servers, not to changes in the queue length. In contrast, in a grocery store checkout lane, the number of waiting customers is visible to the server and can have an important impact on their behavior (Wang and Zhou, 2018).
When is the workload measured? Empirical researchers need to decide the time at which the workload relevant to a customer's service time is measured. Possibilities include the time of entry to the system, the time at which service began, the time at which service ended, or an average of the workload from the beginning to the end of service (see Table 1, Column 3). The appropriate choice depends on data availability and the research question. Arguments justifying the choice are not always provided. Kc and Terwiesch (2012) provide an example of a well-justified choice: workload should be measured at or near the end of an ICU patient's length of stay (LOS) because the way in which the LOS might be impacted by workload is through the discharge of the "least-sick patient" in order to make room for a new admission. We show in Section 6.5 that, for a given pattern of service rates, the pattern of corresponding mean service times can vary greatly depending on the time at which the workload is measured.
What pattern in the mean service times is revealed? Hypotheses for empirical studies of the impact of workload on server performance are typically expressed in terms of high-level features: for example, the relationship is increasing, decreasing, U-shaped, or inverted-U-shaped. The dependent variable is typically the service time for a particular customer. We show the patterns in Table 1 (Column 4), with ↑ denoting that mean service times increase with workload, ↓ denoting that mean service times decrease with workload, and a sequence of these symbols denoting a non-monotone pattern. Early studies anticipated finding generalizable patterns, but as the field has matured, researchers have come to recognize that workload can impact service times through a wide variety of mechanisms and the research focus has shifted to investigating individual mechanisms (Delasay et al., 2019).
Analytical Studies: Queueing theorists have studied state-dependent queueing models since the 1960s. Such models can potentially capture the empirically observed ways in which server performance depends on workload. As we discussed in Section 1, many of these models fall into one of two categories: static and dynamic models.
Note: For some papers, we describe a special case of a more general model analyzed in the paper.
This study investigates multiple models that differ only in terms of the number of servers.
The model is a static model only if the rescaling parameters are set to 1 for all states.
Model Properties: In all of the dynamic models in the literature, the number-in-system process $\{N(t)\}$ is Markovian. Therefore, the model is relatively easy to solve. However, the Markov property of $\{N(t)\}$ does not imply that service time distributions are exponential, as noted by Harris (1967). As we show in Section 4.2.2, service times in dynamic models follow infinite-state PH distributions. In static models, $\{N(t)\}$ is not Markovian, even for exponential service times, such as in Chan et al. (2017) and D'Auria et al. (2022). Therefore, analyses of static models are more complicated, may involve other state variables (such as the unfinished work or the virtual waiting time), and are typically limited to single-server models. The two multi-server static models listed in Table 2 (Chan et al., 2017; D'Auria et al., 2022) assume exponential service times as well as other limiting assumptions.
Translation: For static models, empirical service time data can be used directly to estimate state-dependent service time distributions, and thus specify the service process parameters. For dynamic models, a customer’s service time is the sum of a random number of sojourn times in system states visited during the service. To specify the service process in such models, one needs to translate empirical service times to sojourn time parameters.
None of the dynamic model papers in Table 2 investigate service time distributions. In empirical studies, as illustrated by the papers in Table 1, service times are typically the variables used to evaluate server performance. This creates a disconnect, which we address by developing methods to translate empirical service times into dynamic models, thus enabling empiricists to compare and evaluate both types of models. Another disconnect is that many empirical papers define workload in terms of the number of busy servers, whereas analytical papers allow the service process to depend on the number of customers in the system, the number in the queue, or the number of busy servers. We focus on models with a dependence on the number in the system, $n$, because the queue length and the number of busy servers are simple functions of $n$. Consequently, translating empirical findings that involve those quantities into a model that allows dependence on $n$ presents no difficulties.
Models and Notation
In this section, we start by outlining notational conventions. In Sections 3.1 to 3.4, we compare specific static and dynamic models in terms of their formulations, performance evaluation, service time definitions, and corresponding mechanisms. We state assumptions in Section 3.5.
We use uppercase bold letters to represent matrices and lowercase bold letters to represent vectors. The $(i,j)$th element of matrix $\mathbf{A}$ is $[\mathbf{A}]_{ij}$ and the $i$th element of vector $\mathbf{a}$ is $[\mathbf{a}]_i$. $\mathbf{I}$ is an identity matrix, $\mathbf{1}$ is a vector of ones, $\mathbf{0}$ is a vector of zeros, and $\mathbf{e}_i$ is the $i$th unit vector (i.e., $[\mathbf{e}_i]_j = 1$ for $j = i$ and $0$ otherwise), all of appropriate size. $\mathbb{1}\{C\}$ is an indicator function for condition $C$. The first-order and second-order forward differences for the sequence $\{x_n\}$ are $\Delta x_n = x_{n+1} - x_n$ and $\Delta^2 x_n = \Delta x_{n+1} - \Delta x_n$, respectively. We call a sequence convex if it is discretely convex (see Definition 1 in Zacharias and Pinedo, 2017). We use increasing and decreasing in their non-strict sense unless noted otherwise.
We assume throughout that there is a single queue, a single customer class, and a constant number of identical parallel servers that serve customers (who never abandon) from the queue in first-come-first-served order. Servers do not idle if there are customers waiting.
Model Formulations
In both the static and dynamic models, let $S_n$ be the service time random variable for a focal customer for whom the system state is $n$ immediately after service begins; that is, the workload is measured at the beginning of service.
The static model we focus on assumes exponentially distributed, state-dependent service times and a Poisson arrival process with rate $\lambda$. The service time for a customer whose service commences at time $t$ is $S_n$, where $n$ is the number of customers in the system immediately after service begins. Previous literature uses the Kendall notation $M/M(n)/s$ for this model, but we will show that static and dynamic models differ in many aspects, and using the same notation for both can be misleading. Therefore, we add an asterisk to distinguish the static model and denote it by $M/M(n)^*/s$.
The dynamic model we focus on, $M/M(n)/s$, assumes exponentially distributed sojourn times between state changes. Let the arrival rate be $\lambda$ and set $\mu_0 = 0$. We use $c_n = \min(n, s)$ to denote the number of busy servers in state $n$. The probability that a particular busy server completes service in state $n$ during an infinitesimal interval of length $\delta$ is $\mu_n \delta + o(\delta)$, which implicitly defines the (per-server) service rate $\mu_n$ in state $n$.
Note that the natural way to specify a static model is through the parameters of the state-dependent service time distributions, for example, their means. In contrast, the natural way to specify a dynamic model is through state-dependent service rates.
Performance Evaluation
Performance evaluation is the process of evaluating performance measures (such as mean waiting time) for a queueing model with specified input parameters. Common approaches for performance evaluation include simulation and obtaining analytical solutions.
It is easy and natural to obtain performance measures in $M/M(n)^*/s$ using simulation, because a customer's service time only needs to be generated once, at the beginning of service. After that, the service completion time becomes available for this particular service, and the system's next event (a customer arrival or departure) is determined. However, obtaining analytical results for this model is difficult. Only boundary results are available for multi-server cases (Chan et al., 2017), and exact solutions are available only for single-server cases (such as in Oz et al., 2017). Even for single-server cases, no published papers provide ready-to-use numerical algorithms to evaluate performance from the analytical solutions. The analytical challenge is that the number-in-system process $\{N(t)\}$ is not Markovian. To formulate the model as a Markov process, one needs to add state variables to keep track of the workload at the beginning of service for each busy server.
In contrast, it is easy and natural to obtain analytical solutions for $M/M(n)/s$: The process $\{N(t)\}$ is a time-homogeneous continuous-time Markov chain (CTMC), a birth-death process with birth rate $\lambda$ and death rate $\min(n, s)\,\mu_n$ in state $n$, as illustrated in Figure 2(a). We refer to this infinite-state CTMC as the "underlying chain" for $M/M(n)/s$. While obtaining analytical results for $M/M(n)/s$ is relatively easy, simulating it is significantly harder than simulating $M/M(n)^*/s$. One natural way to simulate $M/M(n)/s$ is to generate sojourn times between transitions. Once a transition occurs, we use independent random numbers to determine the type of event (an arrival or a departure) and which customer leaves the system (if the event is a departure). The sojourn time parameters are updated after each transition. To collect service times, we need to sum up all of a customer's sojourn times during its service.
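The simulation procedure just described can be sketched as follows for the single-server special case. This is our own illustrative code (the function and parameter names are not from the paper); it records, for each completed service, the state at the start of service and the realized service time:

```python
import random

def simulate_mmn1(lam, mu, horizon, seed=0):
    """Simulate a single-server dynamic model: Poisson(lam) arrivals;
    while the state is n, the customer in service completes at rate mu(n).
    Returns (state at start of service, service time) pairs."""
    rng = random.Random(seed)
    t, n = 0.0, 0
    service_start = None          # (time, state) when current service began
    samples = []
    while t < horizon:
        rate = lam + (mu(n) if n > 0 else 0.0)   # total event rate
        t += rng.expovariate(rate)               # sojourn time in state n
        if rng.random() < lam / rate:            # event is an arrival
            n += 1
            if n == 1:                           # server was idle: service begins
                service_start = (t, n)
        else:                                    # event is a service completion
            samples.append((service_start[1], t - service_start[0]))
            n -= 1
            service_start = (t, n) if n > 0 else None
    return samples
```

Note that the recorded service time is the sum of all sojourn times between the start of the focal customer's service and its completion, exactly as in the dynamic model's definition of $S_n$.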
Service Time Definitions
In $M/M(n)^*/s$, the distribution of the service time $S_n$ is defined explicitly. Translating $E[S_n]$ into model specifications is straightforward: $S_n$ is exponentially distributed, with rate parameter $1/E[S_n]$.
In contrast, in $M/M(n)/s$, the distribution of $S_n$ is implicitly defined, through sojourn time distributions, and must be obtained through model analysis. $S_n$ is the time to absorption in a modified version of the underlying chain, which we call the "absorbing chain," $\{X(t)\}$, as discussed in detail in Section 4.1 and illustrated in Figure 2(b). The absorbing chain begins in state $n$ at time $0$, corresponding to the beginning of service for the focal customer, and $S_n$ is the time to reach the absorbing state $\Delta$, corresponding to the completion of service. Therefore, $S_n$ is a first-passage time from $n$ to $\Delta$ in the absorbing chain, defined as:
$$S_n = \inf\{t > 0 : X(t) = \Delta \mid X(0) = n\}. \qquad (3)$$
$S_n$ equals the sum of a random number of independent and exponentially distributed sojourn times, that is, times from one instant at which the state changes to the next. Thus, the total duration of service depends on stochastic events (such as arrivals or service completions of non-focal customers) that occur after the focal customer's service has begun.
Therefore, in $M/M(n)/s$, translating $E[S_n]$ into $\mu_n$ is more complicated, as shown in Section 4.1, and $S_n$ does not follow an exponential distribution. Instead, it follows an infinite-state PH distribution, as discussed in Section 4.2.2. A table summarizing our notation is provided in the E-Companion.
We revisit the dynamic model in Section 5, with $\tilde{S}_n$ defined as the service time for a focal customer for whom the system state is $n$ immediately before service ends. That is, the workload is measured at the end of service.
Models and Mechanisms
The static and dynamic models imply different server behavior. To explain the differences, we express service time as work content divided by server speed (as in Delasay et al., 2019). For example, the work content and server speed for supermarket checkout could be approximated by the number of items to scan and the number of items scanned per time unit, respectively. Depending on how work content and speed are affected by the state, we identify two groups of mechanisms:
Dynamic Mechanisms
The remaining work content or the server speed changes during the service.
Static Mechanisms
Both the work content and the server speed are fixed during the service.
Dynamic mechanisms include social speedup pressure and task reduction. Static mechanisms include forgetting and customer early task initiation (see Table 2 in Delasay et al., 2019, for definitions). By fitting both models to data, one might find evidence against the dynamic model and thus against dynamic mechanisms. We illustrate this in Section 6.2.
Assumptions
We invoke the following assumptions for some of our analyses.
Assumption 1. $\lambda$, $\mu_n$, and $E[S_n]$, for $n \ge 1$, are positive and finite real numbers, and $s$ is a positive and finite integer.
The requirements on $\lambda$ and $\mu_n$ are standard for birth-death processes. Specifically, $\lambda < \infty$ and $\mu_n < \infty$ avoid instantaneous transitions, and $\lambda > 0$ and $\mu_n > 0$ ensure that the underlying chain is irreducible. Similarly, we assume $0 < E[S_n] < \infty$ to avoid technical difficulties.
We invoke Assumption 1 for all derivations and theoretical results. Some formulas we derive can be used for calculations, however, even if Assumption 1 fails to hold. We discuss this in Remark 1 in Section 4.
The service rates converge at state $K$ ($K \ge s$), that is: $\mu_n = \mu_K$ for $n \ge K$.
This assumption is invoked for certain derivations. As a special case, if $K = s$, then $\mu_n = \mu_s$ for $n \ge s$, and the service rates in the model depend only on the number of busy servers at time $t$, $\min(N(t), s)$, where $N(t)$ is the number of customers in the system. Defining workload as the number of busy servers or a function of it is common in the empirical literature (see Table 1).
We set $E[S_0] = 0$.
This assumption, which is always in effect, is made to simplify the statement of certain equations.
Dynamic Model: $M/M(n)/s$
Since translating $E[S_n]$ into the static model is straightforward (see Sections 3.1 and 3.3), we hereafter focus on translation issues in the dynamic model. In this section, we provide a series of results: formulas to compute $\mu_n$ given $E[S_n]$ in Section 4.1; formulas to compute $E[S_n]$ given $\mu_n$, along with a PH distribution analysis of $S_n$, for $M/M(n)/1$ in Section 4.2; and monotonicity results regarding the $\{\mu_n\}$ and $\{E[S_n]\}$ sequences in Section 4.3. Results for $E[S_n]$ given $\mu_n$ and an infinite-state PH distribution analysis of $S_n$ in $M/M(n)/s$ are in EC.3.2 and EC.3.4 of the E-Companion, respectively.
Service Rates Given Mean Service Times
Empirical studies typically evaluate state-dependent server performance in terms of estimated mean service times (see Section 2). Should one wish to use a dynamic model, one needs to translate the mean service times, $E[S_n]$, into service rates, $\mu_n$. We provide a closed-form solution for this task, namely, for computing $\mu_n$ given $E[S_n]$ and the arrival rate $\lambda$, which we assume have been estimated using empirical data. We show that, in general, the equation in (1) is a faulty generalization.
We use the absorbing chain $\{X(t)\}$ (see Figure 2(b)) to analyze $S_n$. As indicated in (3), $S_n$ is the time to absorption in the absorbing chain, starting in state $n$. The absorbing chain is obtained from the underlying chain (shown in Figure 2(a)) through the following modifications: (a) Remove state 0 (because service cannot begin in state 0) and add absorbing state $\Delta$ (corresponding to service completion); (b) set the transition rate from $n$ to $\Delta$ to the rate of service completion for the focal customer, that is, $\mu_n$; and (c) set the transition rate from $n$ to $n - 1$ to the rate at which non-focal customers depart, that is, $(c_n - 1)\mu_n$, where $c_n = \min(n, s)$ is the number of busy servers in state $n$. The sojourn time in state $n$ is exponentially distributed with rate $\lambda + c_n\mu_n$ and its expected value is $1/(\lambda + c_n\mu_n)$. We use first-step analysis (see Kulkarni, 2010, Section 3.1) to obtain a recursive equation for $E[S_n]$. This method involves conditioning on the next transition in the absorbing chain. Since we are interested in the next-transition probabilities, we use the embedded discrete-time Markov chain (DTMC) of the absorbing chain, $\{Y_k\}$, where $Y_k$ is the number of customers immediately after the $k$th transition (state change), for $k = 0, 1, 2, \ldots$.
Before absorption, the next transition ("the first step") for both the absorbing chain and its embedded DTMC must be one of the following: the focal customer finishes service ($F$), a new customer arrives ($A$), or a non-focal customer departs ($D$). We denote the probabilities of these events in state $n$ by $P_{F,n}$, $P_{A,n}$, and $P_{D,n}$, respectively, where
$$P_{F,n} = \frac{\mu_n}{\lambda + c_n\mu_n}, \qquad P_{A,n} = \frac{\lambda}{\lambda + c_n\mu_n}, \qquad P_{D,n} = \frac{(c_n - 1)\mu_n}{\lambda + c_n\mu_n}.$$
Therefore, using the Markov property:
$$E[S_n] = \frac{1}{\lambda + c_n\mu_n} + P_{A,n}\,E[S_{n+1}] + P_{D,n}\,E[S_{n-1}], \qquad n \ge 1, \qquad (4)$$
with $E[S_0] = 0$ by Assumption 3.
By solving (4) for $\mu_n$, we obtain the following expression for $\mu_n$ (recall that $c_n = \min(n, s)$):
$$\mu_n \left( c_n E[S_n] - (c_n - 1) E[S_{n-1}] \right) = 1 + \lambda \left( E[S_{n+1}] - E[S_n] \right), \qquad (5)$$
$$\mu_n = \frac{1 + \lambda \left( E[S_{n+1}] - E[S_n] \right)}{c_n E[S_n] - (c_n - 1) E[S_{n-1}]}. \qquad (6)$$
For $s = 1$, $c_n = 1$ and $P_{D,n} = 0$ for all $n$, and therefore (5) and (6) reduce to:
$$E[S_{n+1}] = \frac{(\lambda + \mu_n) E[S_n] - 1}{\lambda}, \qquad (7)$$
$$\mu_n = \frac{1 + \lambda \left( E[S_{n+1}] - E[S_n] \right)}{E[S_n]}. \qquad (8)$$
In empirical work, it is common to assume that service times depend only on the number of busy servers (see Table 1), which implies that $E[S_n] = E[S_{n+1}]$ for $n \ge s$. For $s = 1$, this implies that all service rates are equal. For $s > 1$, this implies that faulty generalization (1) holds true in the tail, that is:
$$\mu_n = \frac{1}{E[S_n]}, \qquad n > s. \qquad (9)$$
We implicitly invoked Assumption 1 in deriving (6), as we do for all of our derivations. However, equation (6) can be used to calculate $\mu_n$ values even if Assumption 1 does not hold. Such calculations could lead to negative or infinite $\mu_n$ values. It might not be known a priori whether Assumption 1 holds for a set of $E[S_n]$ values that have been estimated as part of an empirical study. Calculating the $\mu_n$ values using (6) helps one to check whether the assumption holds.
Based on (6), (8), and Lemma EC.3 of the E-Companion, we see that the numerator of $\mu_n$ is non-positive if $E[S_{n+1}] - E[S_n] \le -1/\lambda$, which implies that $\mu_n$ will be non-positive if we try to fit an $M/M(n)/s$ model. In other words, a sharp decrease from $E[S_n]$ to $E[S_{n+1}]$ can cause $\mu_n$ to become negative, and hence invalid, which we interpret as evidence that a data set is inconsistent with the dynamic model. The corresponding example in Section 6.2 illustrates this.
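For the single-server case, the translation in (8), together with the validity check just described, can be sketched as follows (illustrative code with our own naming; a non-positive rate signals data that are inconsistent with the dynamic model):

```python
def rates_from_mean_service_times(ES, lam):
    """Translate mean service times (ES[0] is E[S_1], ES[1] is E[S_2], ...)
    into service rates for the single-server dynamic model, using
        mu_n = (1 + lam * (E[S_{n+1}] - E[S_n])) / E[S_n].
    Returns mu_1, ..., mu_{len(ES)-1}; raises ValueError if a rate is invalid."""
    mus = []
    for n in range(len(ES) - 1):
        mu = (1.0 + lam * (ES[n + 1] - ES[n])) / ES[n]
        if mu <= 0:
            raise ValueError(
                f"mu_{n + 1} = {mu:.4f} <= 0: mean service times drop too "
                "sharply for any dynamic model to reproduce them")
        mus.append(mu)
    return mus
```

For constant mean service times the function returns constant rates equal to their inverse, while a drop of more than $1/\lambda$ between consecutive means triggers the error.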
Figure 3: The absorbing chain and the Markov jump process for $M/M(n)/1$ under Assumption 2: (a) absorbing chain and (b) absorbing Markov jump process.
Simulated mean waiting times with 95% confidence intervals.
Service Times Given Service Rates
Some empirical papers (such as Cho et al., 2019) estimate state-dependent service rates directly from data. Therefore, computing mean service times ($E[S_n]$) from estimated service rates ($\mu_n$) is of interest, and we provide formulas for this purpose in this subsection. An estimate of the arrival rate ($\lambda$) is needed to use the formulas. In addition to the mean, we characterize the distribution of the service time random variable $S_n$. We provide these results for the $M/M(n)/1$ model in the main text and analogous but more complicated results for the multi-server $M/M(n)/s$ model in EC.3.2 and EC.3.4 of the E-Companion.
Mean Service Times Given Service Rates
If $\mu_n$ and $\lambda$ are known, then (7) can be used to obtain $E[S_{n+1}]$ from $E[S_n]$. This is not sufficient, however; we also need to obtain an initial value, namely $E[S_1]$, which is a challenging task. The correct solution to (7) is the minimal non-negative solution (see Lemma EC.1 of the E-Companion). Our analysis provides the correct value for $E[S_1]$, which together with (7) can be used to obtain the remaining values in $\{E[S_n]\}$. We begin by defining the following sequence:
$$\phi_0 = 1, \qquad \phi_k = \prod_{i=1}^{k} \frac{\lambda}{\lambda + \mu_i}, \quad k \ge 1.$$
Theorem 1. If Assumption 1 holds, then:
$$E[S_1] = \sum_{k=1}^{\infty} \frac{\phi_{k-1}}{\lambda + \mu_k}. \qquad (10)$$
Furthermore, under Assumption 2,
$$E[S_1] = \sum_{k=1}^{K-1} \frac{\phi_{k-1}}{\lambda + \mu_k} + \frac{\phi_{K-1}}{\mu_K}. \qquad (11)$$
Wang and Zhou (2018) previously provided the infinite-series representation in (10). We show that under Assumption 2, one obtains a closed-form expression involving only a finite sum, as in (11). Theorem 1 cannot be generalized directly to $M/M(n)/s$ with $s > 1$; the analysis for that model is more complex. We present results for $M/M(n)/s$ in Theorem EC.1 in EC.3.2 of the E-Companion.
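Under Assumption 2, the finite closed form for $E[S_1]$ combined with recursion (7) yields all mean service times for the single-server model. The following sketch (our own code and naming, for illustration) implements that two-step procedure; the closed form is computed by accumulating, for each state, the probability of reaching it before absorption times its expected sojourn time, plus a geometric tail term for the lumped states beyond $K$:

```python
def mean_service_times(mu, lam, K, n_max):
    """E[S_1], ..., E[S_n_max] for the single-server dynamic model,
    assuming mu(n) = mu(K) for n >= K (Assumption 2).
    E[S_1] comes from the finite closed form; the rest from the recursion
        E[S_{n+1}] = ((lam + mu(n)) * E[S_n] - 1) / lam."""
    ES1 = 0.0
    reach_prob = 1.0                 # prob. of reaching state k before absorption
    for k in range(1, K):
        ES1 += reach_prob / (lam + mu(k))       # expected time spent in state k
        reach_prob *= lam / (lam + mu(k))       # arrival occurs before completion
    ES1 += reach_prob / mu(K)        # geometric tail: states K, K+1, ... lumped
    ES = [ES1]
    for n in range(1, n_max):
        ES.append(((lam + mu(n)) * ES[-1] - 1.0) / lam)
    return ES
```

With constant rates the function returns the constant mean $1/\mu$, and for $n \ge K$ the computed means settle at $1/\mu_K$, consistent with the lumped-tail argument of Section 4.2.2.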
Service Time Distributions
Service times in $M/M(n)^*/s$ and non-state-dependent models are exponentially distributed; in contrast, as we show in this section, service times in dynamic state-dependent models are PH distributed.
As explained in Section 4.1, we formulate $S_n$ for $M/M(n)/1$ as the time to absorption in the absorbing chain. So far, we have only studied the first moment, but the theory of PH distributions permits us to compute higher moments and the full distribution of $S_n$. Most of the theory of PH distributions assumes a Markov chain with a finite number of states, but the absorbing chain for $M/M(n)/1$ has an infinite number of states. Under Assumption 2, we can aggregate the states in the tail, resulting in a finite-state Markov chain, for which the time to absorption provides the exact distribution of $S_n$.
The time to absorption in a finite-state CTMC with a single absorbing state and all other states transient is, by definition, a PH random variable (Latouche and Ramaswami, 1999, Definition 2.3.1). The canonical form of the infinitesimal generator matrix (“generator” for short) is:
where is the set of transient states, is the absorbing state, is a subgenerator matrix of transition rates among the transient states, and is a column vector of absorption rates from the transient states. The row vector specifies the initial probabilities for . A PH distribution is represented by the notation PH. The parameters and suffice to fully specify the absorbing chain, because must equal (so that the rows of sum to zero), and the initial probability for state must equal (so that the initial probabilities sum to one). EC.3.3 of E-Companion provides the nonzero elements of (12) and of the initial probability vector.
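The PH machinery used here is standard (Latouche and Ramaswami, 1999): for X ~ PH(α, T), the mean is −αT⁻¹𝟙 and the CDF is F(t) = 1 − α e^{Tt}𝟙. A minimal sketch, checked against an Erlang-2 distribution (two sequential exponential phases of rate μ) whose closed forms, E[X] = 2/μ and F(t) = 1 − e^{−μt}(1 + μt), are known:

```python
import math
import numpy as np

def expm_taylor(M, K=40):
    # plain Taylor series for the matrix exponential; adequate for the
    # small, moderate-norm matrices used in this sketch
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, K):
        term = term @ M / k
        out = out + term
    return out

def ph_mean(alpha, T):
    # E[X] = -alpha @ inv(T) @ 1, computed via a linear solve
    return float(-alpha @ np.linalg.solve(T, np.ones(len(alpha))))

def ph_cdf(alpha, T, t):
    # F(t) = 1 - alpha @ expm(T t) @ 1
    return float(1.0 - alpha @ expm_taylor(T * t) @ np.ones(len(alpha)))

# Erlang-2 check: start in phase 1, move to phase 2, then absorb
mu = 1.5
alpha = np.array([1.0, 0.0])
T = np.array([[-mu, mu], [0.0, -mu]])
```

Higher moments follow analogously from E[X^k] = (−1)^k k! αT^{−k}𝟙.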
Under Assumption 2, the original absorbing chain in Figure 2(b) simplifies for in that only two transitions are possible from state : to with probability and to with probability (see Figure 3(a)). Therefore, each transition is a Bernoulli trial with a constant success probability , and the number of transitions until service completion is geometrically distributed. This observation allows us to lump all states together, as shown for state in Figure 3(b).
We modify the absorbing chain (Figure 3(a)) by allowing a virtual transition (a self-transition) in state , resulting in the transition diagram in Figure 3(b). We follow Kao (1996: Example 5.6.3) in referring to the corresponding process as a continuous-time Markov jump process (MJP; a CTMC with virtual transitions). We model as the time to absorption in the absorbing MJP in Figure 3(b), where state is the absorbing state and all other states, , are transient. The arrival and service completion probabilities in state in the absorbing chain are and , respectively; the former corresponds to a virtual transition in the absorbing MJP. Hence, we obtain:
and the remaining non-zero elements are the same as in EC.3.3 of E-Companion. If , then is distributed as PH() and PH distribution formulas for the cumulative distribution function and moments apply (see EC.32 to EC.34 of E-Companion). If , then has the same distribution as , which is exponential with rate , because states in the absorbing chain are aggregated into state in the absorbing MJP.
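The lumping argument can be checked by simulation. In this sketch (our hypothetical rates), each transition above the cutoff takes an Exp(λ + μ) amount of time and ends the service with constant probability μ/(λ + μ); a geometric number of such exponential stages is again exponential, with rate μ, so the simulated tail service time should have mean 1/μ:

```python
import random

def sample_tail_service_time(lam, mu, rng):
    # Repeated Bernoulli trials: each stage lasts Exp(lam + mu);
    # with probability mu/(lam + mu) the service completes, otherwise
    # the transition is a (virtual) arrival and the trial repeats.
    total = 0.0
    while True:
        total += rng.expovariate(lam + mu)
        if rng.random() < mu / (lam + mu):
            return total

rng = random.Random(7)
lam, mu = 0.8, 1.2
samples = [sample_tail_service_time(lam, mu, rng) for _ in range(200_000)]
mean = sum(samples) / len(samples)  # should be close to 1/mu
```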
This approach only works for . For in state , in addition to transitions to and , a transition to is possible, with probability (compare Figures 2(b) and 3(a)). Consequently, the Bernoulli success probability can change, if the process moves below state —something that was not possible for . Therefore, the states above are not identical and cannot be lumped together for . As a result, the service time does not follow a standard PH distribution because the related absorbing chain has an infinite number of states. However, we show with Theorem EC.2 in EC.3.4 of E-Companion that the distribution of in with belongs to the SPH class (an infinite-state generalization of PH; Shi et al., 2005).
Monotonicity Properties
We have seen (Figure 1(b)) that the direction of change in service rates is not always the opposite of the direction of change for mean service times, that is, (2) is indeed a faulty generalization. In this section, we present a series of results that are weaker than (2), but correct. We present a theorem with an if-and-only-if relationship between and ; and a corollary that shows how this relationship simplifies for . We present three additional corollaries with if–then relationships, stating that if one sequence is monotone, for , possibly combined with additional conditions, then the other sequence is monotone in the opposite direction, for .
if and only if:
and the following additional condition holds:
An intuitive explanation for (16) is that it sets a lower bound for , the rate of change for . Although (15) and (16) resemble the conditions for convex decreasing , we show numerically in Section 6.4 that a non-convex (convex) decreasing may (may not) lead to increasing .
If
then
Condition (19) sets an upper bound on the rate of change for before the monotonicity of is lost.
Our next corollary requires no additional conditions.
If increases () then decreases ().
Theorem 2 and Corollaries 1 and 2 hold for . In the important single-server case, the conditions in Theorem 2 and Corollary 1 simplify considerably, as we show in Corollaries 3 and 4.
For ,
if and only if:
From Assumption 1, (22) and Lemma EC.3 of E-Companion, the right-hand side of (23) is guaranteed to be non-positive.
For , if is convex decreasing ( and ) then increases ().
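Corollary 4 can be probed numerically. The sketch below uses the single-server conversion from mean service times to rates implied by first-step analysis (our reading of the recursion in Section 4; the numbers are hypothetical): a convex decreasing mean-service-time sequence with a constant tail should map to non-decreasing service rates.

```python
lam = 1.0
# convex decreasing mean service times (forward differences non-decreasing),
# held constant from state 8 on, mimicking Assumption 2
S = [1.0 + 1.0 / n for n in range(1, 9)] + [1.125] * 4
# single-server first-step relation: mu[n] = (1 + lam*S[n+1]) / S[n] - lam
mu = [(1 + lam * S[i + 1]) / S[i] - lam for i in range(len(S) - 1)]
```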
Corollary 3 and our other results are more widely applicable than Proposition 5 in Wang and Zhou (2018: Appendix A). We discuss the differences in detail in EC.3.6 of E-Companion.
Single-Server Model With Workload Measured at the End of the Service
So far, we have assumed that workload is measured at the beginning of service—an approach that makes sense if one views the workload as a cause and the service duration as an effect. However, empirical studies vary in whether workload is measured at the beginning of service, the end of service, or in some other way (see Table 1). It is therefore important to investigate whether the relationships that we derived for dynamic models change substantially if we modify the workload measurement timing. To achieve this, we assume that the researchers have measured workload at the end of service and aim to translate the results into a dynamic model.
We find that the relationship between mean service times and service rates changes drastically, which suggests that the choice of how to measure workload could significantly impact estimated service times.
Notably, the results in Section 4, for workload measured at the beginning of service, do not require stability. Only the results in this section, for workload measured at the end of service, require stability.
We define the system state for the focal customer as the number of customers in the system immediately before the end of service. Specifically, we suppose that the service of the focal customer begins at time and ends at time , for which ; we let denote the service time of the focal customer; and we set , as stated in Section 3.3.
If the current state is , , then the preceding (last) transition must have been either (a) the arrival of a customer to state , denoted ; or (b) the service completion of the customer preceding the focal customer, in state , denoted (meaning, after the preceding customer departed, the focal customer started service in state ). Therefore, by conditioning on the last event (last-step analysis), we obtain:
Next, to obtain , we employ the DTMC embedded in the CTMC for the system . If the service completion of the focal customer is the th transition in the embedded DTMC, then:
where and are the steady state probabilities for states and , respectively, is the arrival probability for state , and is the finish probability. The last equation follows from the local balance equation for the two states. Note that (25) also follows from time reversibility for birth-death processes. Using (25) in (24), we obtain a recursive equation for as:
Solving for from (26), we obtain:
Equation (27) was derived under Assumption 1 but can be used for calculations even if Assumption 1 does not hold; see Remark 1.
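The two measurement conventions can be compared in a small simulation. This is our sketch with hypothetical increasing rates, not the paper's case study: a single-server dynamic model in which the server works at rate μ(n) whenever n customers are present, so its speed changes mid-service. Each completed service duration is tabulated twice, by the state at the beginning of service and by the state just before the end of service.

```python
import random
from collections import defaultdict

lam = 0.8
def mu(n):
    return 0.5 + 0.5 * (1.0 - 2.0 ** (-n))  # hypothetical increasing rates

rng = random.Random(42)
by_begin, by_end = defaultdict(list), defaultdict(list)
t, n = 0.0, 0
svc_start = svc_state = None
while t < 200_000.0:
    if n == 0:
        t += rng.expovariate(lam)               # arrival to an empty system
        n, svc_start, svc_state = 1, t, 1
    else:
        t += rng.expovariate(lam + mu(n))
        if rng.random() < lam / (lam + mu(n)):  # arrival during a service
            n += 1
        else:                                   # departure of the customer in service
            by_begin[svc_state].append(t - svc_start)
            by_end[n].append(t - svc_start)     # state just before the departure
            n -= 1
            if n > 0:
                svc_start, svc_state = t, n     # next customer starts service

mean_begin = {k: sum(v) / len(v) for k, v in by_begin.items()}
mean_end = {k: sum(v) / len(v) for k, v in by_end.items()}
```

With increasing rates, the beginning-of-service means decrease in the state, while the end-of-service means need not follow the same pattern.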
Figure 6. Faulty generalization (1) example: (a) comparison of service times and (b) comparison of service rates.
Case Studies and Numerical Examples
In this section, we illustrate how empirical researchers might translate empirical findings into queueing models, as well as the implications of some of our results. First, we provide a case study based on empirical findings to illustrate how different models can lead to widely different predicted mean waiting times. Second, we perturb the case study’s empirical findings to illustrate how translation to a dynamic model could fail, which provides evidence against dynamic mechanisms. Third, focusing on dynamic models, we use simulated data to illustrate faulty generalizations (1) and (2). Fourth, we provide an example in which convex decreasing mean service times do not correspond to increasing service rates and another example in which non-convex but decreasing mean service times correspond to increasing service rates. Fifth, we illustrate how changing the timing of workload measurement can reverse the mean service time trend.
Case Study 1: Comparing Model Predictions
Wang and Zhou (2018) estimate mean service times in a supermarket checkout setting using log-linear regression. Using results from that paper and additional information provided by the authors, after averaging over the effects of explanatory variables other than (see EC.4.1 of E-Companion), we obtain:
To translate the estimated mean service times into a queueing model, one must make an assumption about asymptotic behavior. If one extrapolates to large values, (28) indicates that will eventually increase without bound, but Wang and Zhou (2018) report convex decreasing , which we take to mean that is convex decreasing within the range of their data. Therefore, we assume that follows (28) until it reaches its minimum (at ), and remains constant thereafter, resulting in:
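The truncation described here can be sketched as follows, with hypothetical log-quadratic coefficients (not the paper's estimates): follow the fitted mean-service-time curve down to its minimum, then hold the value constant so that the extrapolated means do not grow without bound as the workload increases.

```python
import math

# Hypothetical coefficients for a curve that is log-quadratic in the
# workload n; NOT the estimates from Wang and Zhou (2018)
a, b, c = 4.0, -0.10, 0.004

def fitted(n):
    return math.exp(a + b * n + c * n * n)

# workload at the fitted curve's minimum (search a generous range)
n_star = min(range(1, 101), key=fitted)

def capped(n):
    # follow the fitted curve until its minimum, constant thereafter
    return fitted(min(n, n_star))
```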
We translate these mean service times into three models: (the model that Wang and Zhou, 2018, assumed in their theoretical work), , and (consistent with their argument for a lognormal service time distribution; see their Figures 5 and 6). We compare the mean waiting times (in seconds) for the three models, with respect to the number of dedicated checkout lanes, , which we vary from 3 to 10. The paper does not provide the total arrival rate, but we assume it to be customers per second, and that customers select a queue randomly, resulting in independent single-server queues, each with an arrival rate of . For the static models ( and ), we use to specify the mean service times and the logged variance, equal to 0.184, to fully specify the log-normal service time distributions. For the dynamic model (), we use and to compute service rates, using (8).
We use simulation to estimate utilization rates and mean waiting times (see EC.4.2 of E-Companion for details). The utilization rates for the three models are similar: for , for , and for . However, Figure 4 shows that a single sequence of estimated mean service times (from (29)) results in noticeably different estimated mean waiting times across the three models, with showing lower mean waiting times. and show similar mean waiting times, but that is not always the case, as we show in Section 6.2.
Which model is most appropriate? To answer that question, one would ideally obtain empirical data on mean waiting time (or some other performance measure)—information that is not available in this case.
Case Study 2: Translation Failure as Evidence Against Dynamic Mechanisms
Next, we illustrate how the translation of empirical results into a dynamic model can fail, providing evidence against dynamic mechanisms. We do this in a fully controlled setting, in which we abstract from the empirical mean service times in (29) to a data-generating process that is a static model. Recall that this means that service times, conditional on starting state , are exponentially distributed with rate . For the purposes of the illustration, we set , from (29) for , and .
We begin by simulating data from this process (see EC.4.2 of E-Companion for details). Then we use the simulated data to estimate the arrival rate () and the mean service times, using the following regression model:
which results in:
Suppose that an empirical researcher attempts to translate the mean service times from (31) into a dynamic model. This would be reasonable, because the researcher would not know the true data-generating process but might know that the model has been studied extensively. The researcher might be surprised to find that using (8) to compute results in a negative value:
Figure 5 illustrates these findings. One may wonder whether the negative is statistically significant. We bootstrap model (30), using a “bootstrapping pairs” approach (see Efron and Tibshirani 1994, Section 5.2) to assess the robustness of negative . We find that 100 bootstrap samples of are all negative.
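The pairs bootstrap itself is standard (Efron and Tibshirani, 1994, Section 5.2): resample (x, y) pairs with replacement and recompute the statistic on each resample. A minimal sketch on a toy regression, with a simple slope standing in for the regression-based quantity in the text:

```python
import random

def slope(pairs):
    # ordinary least-squares slope for (x, y) pairs
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

def pairs_bootstrap(pairs, stat, B, rng):
    # resample whole (x, y) pairs with replacement, B times
    n = len(pairs)
    return [stat([pairs[rng.randrange(n)] for _ in range(n)]) for _ in range(B)]

rng = random.Random(0)
data = [(x, 2.0 * x + rng.gauss(0, 0.5)) for x in range(1, 21)]  # toy data, true slope 2
boots = pairs_bootstrap(data, slope, 100, rng)
```

The fraction of bootstrap replicates falling on one side of zero (here, the sign of a computed rate) is the kind of robustness check described in the text.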
The negative indicates that translation into is not possible. Within our fully controlled setting, we can draw a strong conclusion: the translation failed because is an incorrect model. The true model is . An empirical researcher would not be able to draw such a strong conclusion, because the researcher would not know the true model. However, mean service times estimated in an empirical study that results in negative service rates, using (8), provide evidence against , as well as the dynamic mechanisms that are consistent with that dynamic model—even if that evidence might not be conclusive. To see why (8) results in a negative value, note that within an model, for to be large, needs to be small, and (the probability that moves from State 1 to State 2 before the service completion) needs to be small. In this case, it is impossible to find a positive that is sufficiently small to make .
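The failure mode can be reproduced with the single-server conversion implied by first-step analysis (our reading of the recursion in Section 4; the numbers below are hypothetical): the computed rate turns negative exactly when mean service times drop by more than one mean interarrival time between adjacent states.

```python
lam = 0.5                    # mean interarrival time 1/lam = 2
S = [5.0, 2.5, 2.4, 2.4]     # hypothetical means; the first drop is 2.5 > 1/lam
# single-server relation: mu[n] = (1 + lam*S[n+1]) / S[n] - lam,
# which is negative whenever S[n] - S[n+1] > 1/lam
mu = [(1 + lam * S[i + 1]) / S[i] - lam for i in range(len(S) - 1)]
```

Here `mu[0]` comes out negative: no valid service rate can reproduce so steep a drop in mean service times within this model class.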
Wang and Zhou (2018) mention two mechanisms through which workload could impact server performance for the single-server queues in their setting: social speedup pressure and customer early task initiation. Hypothetically, if their mean service times had been as assumed in this subsection, then the failure to translate the mean service times into a dynamic model could be taken as evidence against social speedup pressure—a dynamic mechanism. The large mean service time for customers who arrive at an idle server () could be seen as evidence of a physical setup mechanism (a static mechanism), perhaps because cashiers need to log into the cash register before serving the first customer after a break.
Illustrations of Faulty Generalizations (1) and (2)
Next, we present two examples to illustrate faulty generalizations (1) and (2), for the translation of mean service times into dynamic models. For each example, similar to Case Study 2, we specify the true model, simulate data from the true model, estimate mean service times, and use (6) to translate the estimated mean service times into service rates.
The true models are models with for Example 1 (Example 2). We use the average of two sigmoid functions (see EC.4.3 of E-Companion for details) to generate . Weighted sums of sigmoid functions provide a flexible way to generate monotone and non-monotone service rates with finite limits and have been validated by real-world data (Cho et al., 2019).
Example 1 True Model Specification and Simulation
We use a U-shaped , shown in Figure 6(b). We simulate the true model for 100,000 s and exclude customers who arrived during the first 20,000 s as a warm-up period. We obtain service times for 160,345 customers and the largest starting state for service is 17.
Mean Service Time Estimation
We begin by estimating as the average of all service times that start in state . The resulting estimates are noisy for rarely visited states (see Figure 6(a)). To reduce noise, we extrapolate smoothly by setting for to . Furthermore, to translate the mean service times into service rates, we must extend beyond the range of the sample, and we do that by setting for . Figure 6(a) shows the resulting mean service times as (extended).
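The estimate-then-extend step can be sketched as follows (toy data; the smooth extrapolation in the text is replaced here by a simple constant extension beyond a cutoff, and noisy states beyond the cutoff are excluded):

```python
from collections import defaultdict

def state_means(samples, cutoff, horizon):
    """samples: iterable of (starting_state, service_time). States up to
    `cutoff` get their sample mean; states cutoff+1..horizon reuse the
    cutoff estimate, as the translation formulas need values there too."""
    buckets = defaultdict(list)
    for n, s in samples:
        buckets[n].append(s)
    means = {n: sum(v) / len(v) for n, v in buckets.items() if n <= cutoff}
    for n in range(cutoff + 1, horizon + 1):
        means[n] = means[cutoff]  # constant extension beyond the data
    return means

# toy observations (state, service time); the lone state-7 sample is
# treated as too noisy and dropped
toy = [(1, 1.2), (1, 1.0), (2, 0.9), (2, 0.7), (3, 0.8), (7, 0.5)]
m = state_means(toy, cutoff=3, horizon=6)
```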
Translation
Figure 6(b) shows the (extended) obtained from (extended) using (6). We also plot (extended) from faulty generalization (1) and the actual service rates for comparison. The (extended) matches the true pattern better with a mean absolute percentage error (MAPE) of . In contrast, even though (extended) captures the directional change in correctly, it results in a higher MAPE of .
Example 2 True Model Specification and Simulation
We use an inverted-U-shaped , shown in Figure 7(b). We simulate the true model for s and exclude customers who arrived during the first s as a warm-up period. We obtain service times for customers and the largest starting state for service is .
Mean Service Time Estimation
We use the same approach as in Example 1 to obtain ; for (extended), we extrapolate smoothly for to , and set (extended) for . Figure 7(a) shows the resulting and (extended).
Figure 7. Faulty generalization (2) example: (a) comparison of service times and (b) comparison of service rates.
Translation
Figure 7(b) shows that (extended) successfully captures the increase–decrease pattern for States 1 to 7 with a MAPE of . In contrast, (extended) incorrectly shows an increasing pattern for State 7 with a higher MAPE of , which supports our concern regarding faulty generalization (2).
As a check on our work for both examples, we compute the true using (EC.34 of E-Companion), and find that their values agree closely with (extended) for frequently visited states, but diverge slightly for () for Example 1 (Example 2).
Convex Decreasing Mean Service Times Are Not Equivalent to Increasing Service Rates
Convex decreasing guarantees increasing for single-server dynamic models (Corollary 4) but the same is not true for multi-server dynamic models. We use two examples to illustrate this.
We set , , with convex decreasing (; ) (see Figure 8(a)). The resulting is non-monotone, demonstrating that convex decreasing can result in non-increasing .
Figure 8. Numerical examples for Theorem 2: (a) convex decreasing ; non-increasing and (b) non-convex ; increasing .
We set , , with non-convex obtained from a sigmoid function (see Figure 8(b); see EC.4.3 of E-Companion for details). The resulting is increasing, demonstrating that non-convex can result in increasing . We verified the results for both examples using simulation.
Measurement Timing Can Reverse the Direction of Mean Service Times
Our analysis in Sections 4 and 5 demonstrates that in a single-server dynamic model, mean service time patterns differ markedly depending on whether the workload is measured at the beginning of service ( in (10)) or at the end of service ( in (26)).
To illustrate, we set , , and use a sigmoid function to obtain (see Figure 9; see EC.4.3 of E-Companion for details). We compute and using (10) and (26), respectively (see Figure 9). The server utilization is . The service rates increase, which results in mean service times that decrease when workload is measured at the beginning of service but increase when it is measured at the end. We verified these results using simulation.
Figure 9. Measurement timing example: comparison of and curves.
Conclusion
A growing body of empirical work documents the adaptive behavior of customers and servers in service operations, including a research stream that focuses on how workload impacts service times. We provide a framework and draw attention to important subtleties in the process of translating empirical findings into the selection and specification of state-dependent queueing models. Among the available models, we identify two model types (static and dynamic) and relate them to behavioral mechanisms. For dynamic models, we provide a formula to compute service rates from mean service times and caution against a pitfall in assessing server behavior (see (2)). This translation process is crucial because it connects empirical research to queueing models that are employed in the management of service and production operations. In particular, empirically grounded dynamic models have been used to support operational decisions such as system design and staffing (Dong et al., 2015; Do et al., 2018; Cho et al., 2019).
The static model that we focus on has not been analyzed exactly; instead, we emphasize the insights it provides into the service process from a behavioral perspective. Mathematical analysis of static models is challenging, but simulation analysis of such models is straightforward. Static and dynamic models correspond to different behavioral mechanisms. The choice of model type should depend on both the hypothesized mechanisms and consistency with the data.
We extend the analysis of dynamic models from the system level to the individual customer level. By formulating service times as first-passage times, we show that service times in dynamic models follow PH distributions. For single-server dynamic models, we show that workload measurement timing can have a drastic impact on service time patterns.
We identify two potential mistakes in translating service times into service rates, (1) and (2), for dynamic models. Using first-step analysis, we provide a closed-form solution for from , which corrects (1). We provide a series of conditions in the form of bounds on the forward differences of mean service times that guarantee (2) to hold. Our results demonstrate what mean service times imply about how service rates vary with workload, and reveal speed-ups and slow-downs hidden in mean service times. These results enable researchers to correctly assess the impact of different mechanisms on server performance.
Building on published empirical findings, we demonstrate two possible scenarios. First, it is possible that models of both types can be specified from the same mean service times. In this scenario, further investigation of system performance measures is needed to determine the more suitable model type. Second, the outcome of the translation process could indicate the inconsistency of dynamic models with mean service times, which provides evidence against this model type and thus its associated mechanisms. Despite the greater tractability of dynamic models, which makes it easier to obtain analytical results regarding, for example, admission policies, we demonstrate that fitting them to data is not always possible.
Our work suggests several topics that would benefit from further investigation. First, generalizing static and dynamic models to non-Markovian settings would benefit researchers by providing them with a less restricted toolkit. In particular, analysis of dynamic models with non-Markovian sojourn times could be a fruitful direction for future research. Comparisons of such models with static models with general service times, as well as the translation of mean service times into the inputs needed for such models, could provide insights for both the analytical and empirical literature streams.
Second, the translation process from empirical findings to a mathematical model may require extrapolation beyond a finite dataset. Most queueing models assume unlimited waiting room, which means that one needs to specify how service time distributions or rates vary as the queue length goes to infinity. In Section 6, we address this issue by extrapolating smoothly and assuming constant for unobserved states. However, this is only one possible approach: if a log-linear regression is used to estimate mean service times, then one natural approach is to extrapolate the curve beyond the data. This approach can be problematic because it typically results in either negative or very large mean service times as .
Third, not only should the hypothesized mechanisms inform the choice of queueing model type, but they should also inform the choice of statistical methods. For example, if the researchers suspect that server speed changes dynamically during service, survival analysis methods could be appropriate (see Ding et al., 2024, for an example).
Fourth, our findings in Sections 5 and 6.5 indicate that measurement timing could be an important factor in determining service time patterns. As support, Table 1 lists five papers that measure workload only at the beginning of service, of which three report service time patterns. One paper measures workload at the end of service and finds a service time pattern. From an empirical perspective, definitions of mechanisms (such as those proposed in Table 2 of Delasay et al., 2019) should include measurement timing, which could help researchers in data collection and model selection. The names of the server early task initiation and customer early task initiation mechanisms imply that they involve a reduction in work content that occurs near the beginning of service. In contrast, the definition in Table 2 of Delasay et al. (2019) of the task reduction mechanism is silent regarding the timing of work content reduction. Kc and Terwiesch (2012) argue convincingly that task reduction occurs near the end of service in ICU settings. However, in other settings, task reduction could in principle occur earlier during the service. From an analytical perspective, a “late task reduction” mechanism is inconsistent with the static and dynamic model types that we focus on in this paper, but it might be consistent with the type of dynamic model proposed in Chan et al. (2012).
Supplemental Material
Supplemental material, sj-pdf-1-pao-10.1177_10591478241309662, for Translating Empirical State-Dependent Service Times Into Queueing Models by Likang Ding, Bora Kolfal and Armann Ingolfsson in Production and Operations Management.
Acknowledgments
The authors thank the anonymous reviewers, an associate editor, and the department editor for their valuable suggestions. The authors thank Jingqi Wang and Yong-Pin Zhou for their generous sharing of regression results. The authors are grateful to Kenneth Schultz, Mohamad Soltani, and Jing Dong for their comments on earlier versions of this research.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Natural Sciences and Engineering Research Council of Canada (funding reference number RGPIN-2019-04355) and the NOVA Management of Technology Endowment (project ID: RES0062478) from the University of Alberta.
ORCID iD
Likang Ding
Supplemental Material
Supplemental material for this article is available online.
How to cite this article
Ding L, Kolfal B and Ingolfsson A (2024) Translating Empirical State-Dependent Service Times Into Queueing Models. Production and Operations Management 34(7): 2015–2031.
References
Baron O, Economou A and Manou A (2018) The state-dependent queue with orbit. Queueing Systems 90(1): 89–123.
Batt RJ and Terwiesch C (2017) Early task initiation and other load-adaptive mechanisms in the emergency department. Management Science 63(11): 3531–3551.
Bekker R and Borst SC (2006) Optimal admission control in queues with workload-dependent service rates. Probability in the Engineering and Informational Sciences 20(4): 543–570.
Bekker R, Borst SC, Boxma OJ, et al. (2004) Queues with workload-dependent arrival and service rates. Queueing Systems 46(3): 537–556.
Bekker R and Boxma OJ (2007) An queue with adaptable service speed. Stochastic Models 23(3): 373–396.
Berry Jaeker JA and Tucker A (2017) Past the point of speeding up: The negative effects of workload saturation on efficiency and patient severity. Management Science 63(4): 1042–1062.
Chan CW, Farias VF, Bambos N, et al. (2012) Optimizing intensive care unit discharge decisions with patient readmissions. Operations Research 60(6): 1323–1341.
Chan CW, Farias VF and Escobar G (2017) The impact of delays on service times in the intensive care unit. Management Science 63(7): 2049–2072.
Chan CW, Yom-Tov G and Escobar G (2014) When to use speedup: An examination of service systems with returns. Operations Research 62(2): 462–482.
Cho D, Bretthauer K, Cattani K, et al. (2019) Behavior aware service staffing. Production and Operations Management 28(5): 1285–1304.
Conway RW and Maxwell WL (1962) A queuing model with state dependent service rates. Journal of Industrial Engineering 12(2): 132–136.
D'Auria B, Adan IJ, Bekker R, et al. (2022) An queue with queueing-time dependent service rates. European Journal of Operational Research 299(2): 566–579.
Delasay M, Ingolfsson A and Kolfal B (2019) Load effect on service times. European Journal of Operational Research 279(3): 673–686.
Deo S and Jain A (2019) Slow first, fast later: Temporal speed-up in service episodes of finite duration. Production and Operations Management 28(5): 1061–1081.
Ding H, Tushe S, Kc DS, et al. (2024) Frontiers in operations: Valuing nursing productivity in emergency departments. Manufacturing & Service Operations Management 26(4): 1323–1337.
Do HT, Shunko M, Lucas MT, et al. (2018) Impact of behavioral factors on performance of multi-server queueing systems. Production and Operations Management 27(8): 1553–1573.
Dong J, Feldman P and Yom-Tov GB (2015) Service systems with slowdowns: Potential failures and proposed solutions. Operations Research 63(2): 305–324.
Efron B and Tibshirani RJ (1994) An Introduction to the Bootstrap. 1st ed. Boca Raton, FL: CRC Press.
Fisher M (2007) Strengthening the empirical base of operations management. Manufacturing & Service Operations Management 9(4): 368–382.
Fisher M, Olivares M and Staats BR (2020) Why empirical research is good for operations management, and what is good empirical operations management? Manufacturing & Service Operations Management 22(1): 170–178.
Gebhard RF (1967) A queuing process with bilevel hysteretic service-rate control. Naval Research Logistics Quarterly 14(1): 55–67.
George JM and Harrison JM (2001) Dynamic control of a queue with adjustable service rate. Operations Research 49(5): 720–731.
Gray WJ, Wang P and Scott M (1992) An -type queuing model with service times depending on queue length. Applied Mathematical Modelling 16(12): 652–658.
Gupta UC and Srinivasa Rao TSS (1998) On the analysis of single server finite queue with state dependent arrival and service processes. OR Spektrum 20(2): 83–89.
Hadidi N and Conolly BW (1969) On the improvement of the operational characteristics of single-server queues by the use of a queue-length-dependent service mechanism. Journal of the Royal Statistical Society: Series C (Applied Statistics) 18(3): 229–240.
Harris CM (1967) Queues with state-dependent stochastic service rates. Operations Research 15(1): 117–130.
Hillier FS, Conway RW and Maxwell WL (1964) A multiple server queueing model with state dependent service rate. Journal of Industrial Engineering 15(3): 153–157.
Kao EPC (1996) An Introduction to Stochastic Processes. New York: Courier Dover Publications.
Kc DS and Terwiesch C (2009) Impact of workload on service time and patient safety: An econometric analysis of hospital operations. Management Science 55(9): 1486–1498.
Kc DS and Terwiesch C (2012) An econometric analysis of patient flows in the cardiac intensive care unit. Manufacturing & Service Operations Management 14(1): 50–65.
Kulkarni VG (2010) Modeling and Analysis of Stochastic Systems. 2nd ed. Boca Raton, FL: CRC Press.
Latouche G and Ramaswami V (1999) Introduction to Matrix Analytic Methods in Stochastic Modeling. Philadelphia, PA: SIAM.
Mandelbaum A and Pats G (1995) State-dependent queues: Approximations and applications. Stochastic Networks 71: 239–282.
Oz B, Adan I and Haviv M (2017) A rate balance principle and its application to queueing models. Queueing Systems 87(1–2): 95–111.
Rastpour A, Ingolfsson A and Kolfal B (2020) Modeling yellow and red alert durations for ambulance systems. Production and Operations Management 29(8): 1972–1991.
Sakuma Y, Boxma O and Phung-Duc T (2021) An queue with workload-dependent processing speed and vacations. Queueing Systems 98: 373–405.
Schäl M (1971) The analysis of queues with state-dependent parameters by Markov renewal processes. Advances in Applied Probability 3(1): 155–175.
Shanthikumar J (1979) On a single-server queue with state-dependent service. Naval Research Logistics Quarterly 26(2): 305–309.
Shi D, Guo J and Liu L (2005) On the SPH-distribution class. Acta Mathematica Scientia 25(2): 201–214.
Sonderman D (1979) Comparing multi-server queues with finite waiting rooms, II: Different numbers of servers. Advances in Applied Probability 11(2): 448–455.
Tan T and Netessine S (2014) When does the devil make work? An empirical study of the impact of workload on worker productivity. Management Science 60(6): 1574–1593.
Wang J and Zhou Y (2018) Impact of queue configuration on service time: Evidence from a supermarket. Management Science 64(7): 3055–3075.
Zacharias C and Pinedo M (2017) Managing customer arrivals in service systems with multiple identical servers. Manufacturing & Service Operations Management 19(4): 639–656.