Abstract
Introduction
Statistical field experimentation, also referred to as control/treatment testing, A/B testing, and split testing, has been widely adopted as a development tool in product design for web-based software consumer products across industries (Kohavi & Longbotham, 2017). As the gold standard for causal inference, it allows one to accurately assess the quality of various business decisions. In the context of this article, we will be mostly interested in decisions regarding the pricing of products sold by hotels. Using different strategies to make those decisions can lead to differences in final revenues ranging from a few percent to over 20%, as demonstrated through field experimentation and other analytical techniques. Successful examples include the InterContinental Hotels Group (Koushik et al., 2012) and the Carlson Rezidor Hotel Group (Pekgün et al., 2013). The primary goal of this article is to provide a starting point for revenue managers and data scientists who are interested in using field experimentation to test new strategies to increase revenues. To do so, it identifies several opportunities that field experimentation can offer to revenue managers. It also proposes three concrete experimental designs that can be used to successfully exploit those opportunities, and it highlights relevant empirical methods that can be used to extract valuable information from field experiments.
Revenue management is a set of optimization techniques and heuristics developed originally in the 1970s and 1980s to help increase revenues (and ultimately profits) through appropriate allocation and pricing of seats in airlines (Belobaba, 1987). Since then, it has evolved to adapt to a multitude of complex commercial needs (product bundling and unbundling, personalized promotions, lifetime customer value estimates, etc.). The techniques and models have been applied across the perishable industry spectrum, including hospitality (Anderson & Xie, 2010; Hormby et al., 2010), rental cars (Oliveira et al., 2017), trucking, rail cargo, and airline cargo with various degrees of market penetration.
As adoption and complexity have increased, so have alternative solutions to the original optimization problem, whether considering more realistic formulations of the problem (Belobaba & Weatherford, 1996) or using modern techniques in decision making (Rana & Oliveira, 2014). In the hotel industry, such complexity materializes at many levels, from a diverse distribution enabled by online travel agents and social networks to customer acquisition journeys that encompass several platforms. This new landscape ultimately challenges many of the assumptions underlying the traditional models of revenue management. These changes in the practice of revenue management are consistent with other societal changes driven by technological advances and by the increased availability of data. In this evolving landscape, however, there has been limited focus in the literature on evaluating the merits of different implemented approaches in realistic settings, with some exceptions (Cohen et al., 2019; Koushik et al., 2012; Pekgün et al., 2013). In the hospitality industry, the authors could not find any public article describing in detail a field experiment to decide among several possible revenue strategies.
Technology companies operating within hospitality such as Booking.com, Expedia, and Airbnb, which provide support to hotel owner-operators primarily in the form of cloud-based software products, have broadly adopted online field experimentation. Airbnb not only uses experimentation in product design and marketing, but it also provides pricing suggestions for their hosts (Srinivasan, 2018). Several versions of the pricing algorithm may have been tested via online field experiments, even though limited details are available about these experiments (Ye et al., 2018). Similarly, Booking.com has publicly advertised their experimentation culture in product design (Kaufman et al., 2017). Expedia has been less prolific about publicizing their experimentation efforts, but in 2019, they described some work on multiarmed bandit optimization (Parfenov, 2019). However, none of these public results refer to pricing.
Owner-operators are even less prolific about any attempts at incorporating field experiments in their decision making. Digital marketing products that target the hotel industry offer randomization and significance analysis as part of their solutions. However, there are no public figures on the adoption levels of such tools in hotel marketing. In addition, to the authors’ knowledge, it is common for large hotel chains to run field experiments as part of major revenue management systems upgrades (as reported in Koushik et al., 2012). This conveys a desire among owner-operators’ business leaders to be part of this technological shift, but also suggests a low level of adoption.
Opportunities and Challenges for Field Experiments in Hotel Revenue Strategies
Revenue managers face a wealth of tactical and strategic decisions every day. Despite the efforts to make those decisions in a data-driven way, the presence of many confounding factors makes this a challenging task. In this section, we describe some pricing processes that could be measured and improved through the use of experimentation. We also discuss the practical challenges currently hindering the implementation of experimentation programs.
Opportunities
Ancillary pricing
Should breakfast be sold at $5 or $6? For a high-fixed-costs operation such as a hotel, this can be an impactful question to answer, representing differences in profits above 1%. The same question can be asked about cancelation insurance, high-speed internet, phone charging cables, gym access, or any other product that the hotel may sell (often at a high margin) as part of its accommodation offering, or room attributes (ocean view, bed size, etc.) (Masiero et al., 2015). As the price sensitivity of such items is likely to vary slowly over time, field experiments are ideally suited to help optimize their prices.
Distribution channel optimization
Should a hotel pay an extra 2% to an online travel agency (OTA) to be listed in the first position when people search for hotels in the area? This is a common strategic question for hotels, which can easily receive 30% of their bookings through OTAs and pay them commissions of 10% to 20%. In terms of profits, this can also amount to several percentage points. The answer might differ for different hotels and different seasons, depending on how differentiated the hotel is and on how customers reach the hotel through the OTA. With the right bookkeeping and metrics, this question can be successfully answered with an experiment or a set of experiments that are run on a regular basis.
Promotion success
Was a promotion successful at increasing the hotel’s revenue? Generally, the goal of a promotion will be to stimulate the number of bookings, up to a level that is considered appropriate by the revenue management team. The promotion will be considered a success if it reaches a specific bookings target. The impact on profits can be quite significant if a smaller discount can be offered to drive a similar level of volume. Experimental setups are more challenging in this context, due to price parity clauses and the difficulty of comparing two different promotions on equal footing. However, with a careful design, for certain hotels this question can also be answered by using a field experiment.
Choice of revenue management system
Which software provider should a hotel use for their revenue management system? This is a strategic decision that many hotel managers face. Case studies from software providers often claim a 5% to 20% revenue increase. However, just because a revenue management system performs well on average, it does not necessarily mean that it will be successful for all hotels. A 1% difference in performance between two revenue management systems can cover the full cost of the system itself. Strategically, it is quite important to make this choice (or the choice to not use a system at all) based on the type of causal inference that an experiment provides, rather than considerations such as system cost, which might be irrelevant when considering the final profits.
Practical Challenges
While the potential economic value of understanding strategic pricing decisions is impactful, as suggested in previous paragraphs, there are various scientific, technological, and organizational challenges that hinder the adoption of field experimentation. The solutions to these challenges involve several stakeholders, as discussed below.
Scientific challenges
There is extensive literature on controlled experiment design and analysis. This follows from the fact that this topic is highly relevant to real-world applications. At the same time, each application needs to perform customized modifications to the experimental design to guarantee that the assumptions underlying the statistical analysis are satisfied. One of these key assumptions is the stable unit treatment value assumption (SUTVA): the outcome of any unit must be unaffected by the treatment assigned to other units.
To use a less obvious example, we need to ensure that changing the revenue strategy of a specific room type in a given property will not affect the performance of rooms in the control group. Specifically, if two hotels of the same brand with similar offerings are in the same geographical area, an improvement in one hotel’s performance may be driven by a loss in the other hotel (i.e., demand is shifted rather than created), which would violate this assumption.
Fortunately, field experiments have been successfully run in networks and non-stationary environments in the past. In the “Experimental Designs” section, we outline three experimental designs that can help overcome some of these challenges. However, running a successful field experiment always requires verifying that the underlying assumptions are met, which increases friction for adoption.
Technological challenges
At the time of writing, no commercial experimentation system that automates or assists the design, implementation, bookkeeping, and analysis of field experiments for hotel revenue management exists. Given the large overhead in data preparation and analysis that any data science organization faces, it is hard to see wide adoption of experimentation practices without such a system. Certain experimental designs are better suited for manual setups, as we discuss in the “Experimental Designs” section, but the burden will often be too high for small organizations.
The technological challenges do not stop there. The hotel technology stack is quite diverse, and so is the customer journey through different acquisition channels. For example, a user may consult both the hotel website and an OTA, as well as consider product bundles before making the booking decision. The experimentation system has no way of knowing the different customer journeys, and this might violate some of the experiment assumptions. Fortunately, careful experimental design can help mitigate these issues, but some degree of user control is necessary (e.g., to minimize the effect of bookings through opaque distribution channels while an experiment is running).
Organizational challenges
A culture of experimentation has been touted as being one in which decision making is democratized, with less room for top-down decision making. In addition, in an experiment-first culture, small short-term losses that may arise from running unsuccessful experiments are a necessary step for long-term gains. These are two fundamental shifts that need to happen for an organization to use experiments in their decision making, and they can feel quite foreign to conservative managers.
A revenue management team focused on experimentation does not directly make pricing decisions. Instead, they decide which experiments to run by enabling a data science team to conduct these experiments. The results from the experiments are then used to make the final decision, either to implement the tested revenue strategy, or to stop pursuing the strategy, or to follow up by running a new set of refined experiments. This is a significant structural change at the core of the organization, with the appearance of a data science unit and a change in the role of the revenue manager, who would need to be familiar with experimental statistics.
Another important organizational challenge to experimentation is that the hotel brand or management group might not own certain properties (which are franchised). This bears several consequences. Most importantly, the hotel owners may not agree to participate in an experiment involving many hotels in the brand, or to implement the learnings that result from the experiment. In addition, a brand operating under a franchise model might not benefit from revenue increases directly, and thus has a weaker incentive to optimize pricing strategies for certain properties.
Given the practical challenges outlined in this section, it is hard to see field experimentation taking over all decision making in hotel revenue management. Even for all the opportunities outlined earlier in this section, the cost of setting up an experiment, learning about the techniques, and executing them will often be too high for most hotel operators. The focus, thus, should be first on high-value strategic decisions, such as the choice of a revenue management system. Only once expertise is built through the execution of such tasks, and the cost of running experiments is reduced, can one start considering some of the other opportunities described earlier.
Experimental Designs
Many of the challenges described in the previous section can be addressed through a careful experimental design. In the “Property Splits,” “Alternating Periods,” and “High-Frequency Price Updates” sections, we outline three types of experimental designs that can help overcome many of the challenges discussed previously. In the “Empirical Techniques” section, we highlight well-established empirical techniques that can be used to interpret the results of these experiments and establish causal claims.
Property Splits
Running an experiment on multiple properties, some of which receive the treatment (treated group) and some of which do not (control group), is perhaps the simplest conceptual design. Unfortunately, the simplest version of this design (in which half the properties receive the treatment and the other half do not) often is not appropriate to handle real-world complexity. The daily revenues of different properties and their composition are hardly ever comparable. A common way of handling this diversity is the use of stratification: properties are grouped into comparable strata (e.g., by size, market segment, or location), and the randomization is performed within each stratum.
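A stratified property split can be sketched as follows. This is a minimal illustration only; the property records, the tier attribute, and the even split per stratum are hypothetical choices, not part of the original study.

```python
import random
from collections import defaultdict

def stratified_split(properties, stratum_key, seed=0):
    """Randomly assign properties to treatment/control within each stratum,
    so both arms end up with a comparable composition."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for p in properties:
        strata[stratum_key(p)].append(p)
    treated, control = [], []
    for members in strata.values():
        rng.shuffle(members)
        half = len(members) // 2
        treated.extend(members[:half])
        control.extend(members[half:])
    return treated, control

# Hypothetical portfolio: stratify by a size tier attribute.
props = [{"id": i, "tier": "large" if i % 3 == 0 else "small"} for i in range(12)]
treated, control = stratified_split(props, stratum_key=lambda p: p["tier"])
```

Because shuffling happens within each stratum, large and small properties are equally represented in both arms, regardless of the random seed.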
A naive random split of properties is also liable to suffer from network effects. These can be minimized by performing a cluster-based randomization, rather than a simple binomial randomization. The difference between the two types of randomization is illustrated in Figure 1. The cluster-based variant assigns every strongly connected group of properties to the same arm, so that network effects are contained within an arm rather than crossing the treatment boundary.

Figure 1. A Binomial Randomization (Left Panel) and a Cluster-Based Randomization (Right Panel). The Solid Lines Represent Strong Connections (Such as Shared Customers or the Same Revenue Managers), Whereas the Dotted Lines Represent Weak Connections That Can Carry Small Network Effects.
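A cluster-based randomization can be sketched in a few lines. The cluster identifiers below are hypothetical; the key property is that a whole cluster of strongly connected properties always lands in the same arm.

```python
import random

def cluster_randomize(clusters, seed=0):
    """Assign whole clusters of strongly connected properties to the same
    arm, so spillovers stay inside an arm instead of crossing the
    treatment/control boundary."""
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)
    treated, control = [], []
    for i, cluster in enumerate(order):
        # Alternate arms over the shuffled cluster order to keep sizes close.
        (treated if i % 2 == 0 else control).extend(cluster)
    return treated, control

# Hypothetical clusters (e.g., hotels sharing a market or a revenue manager).
clusters = [["A1", "A2"], ["B1"], ["C1", "C2", "C3"], ["D1"]]
treated, control = cluster_randomize(clusters)
```

A binomial randomization would instead flip a coin per property, which can place two connected hotels in opposite arms.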
These procedures may not address biases that arise from reference effects. Temporal randomization can add an extra level of protection against biased sampling. For example, we can run an experiment that swaps the treatment and control properties in the middle of the experiment. This type of procedure can help handling reference effects such as seasonality, as we will discuss further in the next section.
These techniques help address the scientific and technological challenges discussed in the “Opportunities and Challenges for Field Experiments in Hotel Revenue Strategies” section. However, there are many things that could go wrong if the experimenter has not taken into consideration all the relevant variables. For this reason, it is always advisable to simultaneously run a closure test (or A/A test) as part of the experiment. For hotel rooms, which are highly differentiated over time, a good way of running a closure test is by dividing the properties into two groups. Stratification, clustering, and any other experimental design features are applied to both groups. In one of the groups, we run the controlled experiment, while in the other group no hotel is treated. The latter group is used as the A/A test, and no difference should be observed in the “treated” hotels in that group within the confidence intervals.
This design can be implemented manually most of the time, so that no complex experimentation system is needed to execute the experiment. However, its granularity is typically coarse. Accordingly, it can answer the question “what is the best revenue management system for a hotel group?,” but not necessarily “what is the best revenue management system for a specific hotel?” This is not always a weakness, because many questions about revenue strategy need to be asked at the hotel group level, but it does not allow for smaller or highly heterogeneous hotel groups to benefit from experimentation. For these types of applications, we next present two other types of experimental designs.
Alternating Periods
Not all hotel operators have enough properties to run a meaningful field experiment using the design presented above. Even for large operators, it might be interesting (and cost effective) to infer the effect of an intervention by using a relatively small set of hotels. The design described in this section expands on the temporal randomization used in the previous section to address reference effects. In this case, the temporal split doubles as a way of reducing reference effects and creating control and treatment samples. As only a few changes to the revenue strategy are required, the execution of this experimental design can also be done manually for certain applications, even though the execution overhead is significantly heavier than the previous experimental setup.
Figure 2 shows an example of how this temporal split could be implemented. As in the “Property Splits” section, this temporal split is along the dimension of a stay night. This means that each stay night will use one revenue strategy for all its bookings. Depending on the length of the booking window, the experiment start date may need to be several weeks before any stay night is treated so that the vast majority of bookings are received when the treatment is applied. The techniques introduced in the previous section can help handle the heterogeneity arising from different stay nights having different levels of remaining inventories at the beginning of the experiment.

Figure 2. Experiment With Temporal Split Using Stay Nights. The Boxes Illustrate Individual Stay Nights. Stratification for Each Day of the Week is Included, so That a Treated Monday is Compared to a Control Monday, a Treated Tuesday is Compared to a Control Tuesday, and so on.
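An alternating-periods schedule of this kind can be generated mechanically. The sketch below assumes two-week windows and a hypothetical eight-week experiment starting on a Monday; because every window spans whole weeks, each weekday appears equally often in both arms over a full cycle.

```python
from datetime import date, timedelta

def alternating_schedule(start, n_weeks, window_weeks=2):
    """Assign each stay night to treatment or control in alternating
    windows of whole weeks (default two weeks per window)."""
    assignment = {}
    for day in range(n_weeks * 7):
        night = start + timedelta(days=day)
        window = day // (window_weeks * 7)
        assignment[night] = "treatment" if window % 2 == 0 else "control"
    return assignment

# Hypothetical 8-week experiment starting on Monday, 2024-01-01.
schedule = alternating_schedule(date(2024, 1, 1), n_weeks=8)
```

Over eight weeks this yields 28 treated and 28 control stay nights, with four treated and four control Mondays, Tuesdays, and so on.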
Stratification by room types may be highly desirable, because different room types in a hotel will often be characterized by different product offerings. The treatment and control periods may, however, need to be synchronized between room types to avoid the risk of cannibalization between them.
The above architecture is careful to stagger treatment and control weeks to avoid reference effects that a monotonically varying demand over a period of a month could cause. If the experiment is run for a longer period, then the treatment and control weeks can be randomized and seasonal effects will disappear on average. Other reference effects might be relevant in this context. For example, a national holiday falling in a treatment period can strongly bias the results of the experiment. One can remove the days affected by the holiday from the analysis, but any such decision needs to happen at the time of the experimental design with human input or using anomaly detection methods on historical data.
Another common reference effect that can affect this experimental design is the business mix. One can use empirical techniques to alleviate this concern when analyzing the results (e.g., building a regression model that includes the business mix as a covariate, so that shifts in the mix between periods do not masquerade as treatment effects).
The example of Figure 2 uses treatment/control windows of 2 weeks. A different time window can be chosen, but short time windows are at risk from contamination between treatment and control regions through long stays, while long time windows are more sensitive to reference effects.
Performing a closure test in this experimental design is extremely important, because the temporal split can create heterogeneous groups, and covariates could have a strong impact on the experiment results. A good way to control for temporal heterogeneity is to build control samples from some room types or a similar property (i.e., having a synthetic control that is exposed to the same temporal effects but is never treated).
High-Frequency Price Updates
The main challenge faced by the experimental design described in the previous section is temporal heterogeneity. Hotel rooms are highly differentiated products, and the same room sold for one night can have a very different intrinsic value from the same room sold for a different night.
Another way of segmenting customers without bias is to split the inventory between the two revenue strategies over time. If a hotel is selling 100 rooms for a specific night, 50 of those rooms can be treated, while the other 50 rooms are used for control. To be able to sell both the treatment and control rooms, the time at which each strategy offers its rooms can be randomized. The first strategy might be active for a few hours before the other takes over, with the sequence of active periods drawn at random.

Figure 3. An Example Illustrating a Random Split of Sale Periods Between Two Revenue Strategies (Treatment and Control). In This Example, There are 5 Periods Per Day (288 Min Long) and All the Bookings are Received in the 3 Days Before the Stay Night. Illustrative.
The randomization of the periods in which each strategy is active is an effective way of handling potential biases arising from business mix and intra-day booking patterns. A natural way to build additional control samples to perform a closure test for this design is to split the inventory into three groups: one sample for treatment and two samples for control. This can be combined with ideas from the “Alternating Periods” section in cases where the inventory size does not allow creating three samples.
This design is not suited for all purposes. For example, it is hard to see how any question related to distribution channel optimization could be answered with this design, given that OTA contracts currently do not support dynamic changes in commissions and visibility. This design can also be suboptimal for questions affected by an inconsistent customer acquisition journey, because that breaks the assumption that a customer makes the buying decision within one booking period.
Other technological and commercial challenges may hinder the implementation of this design in practical situations. The execution of this experimental design requires frequent changes to the revenue strategy, making a manual implementation impractical, and thus an experimentation system is required. Even with an experimentation system, technological challenges may exist in other parts of the hotel technology stack (such as prices not being updated in the OTAs fast enough). Commercial challenges (such as OTAs offering automatic rebookings) may also limit the applicability of this design.
Empirical Techniques
Designing a field experiment and extracting information from the data after running the experiment require dedicated analysis methods. Field experimentation has been developed across several academic disciplines, particularly in econometrics (Angrist & Pischke, 2008), computer science (Wohlin et al., 2012), political science (Morton & Williams, 2010), and medical sciences (Matthews, 2006), so it is hard to find a single book or paper that synthesizes all the relevant techniques. This section does not aim to serve this purpose either. Instead, we introduce some of the empirical techniques that can be most relevant in a hospitality context. We cover techniques that are useful for pre-experimental analysis, identification of treatment effects, and post-experimental verifications that experimental assumptions held during the course of the experiment.
Most generally, the objectives of the analysis of a field experiment will be:
Determining the main confounding factors,
Choosing the experimental design to randomize over those confounding factors,
For confounding factors that cannot be controlled in the experimental design, establishing the methodology to control for them,
Verifying that experimental assumptions were not violated during the experiment,
Extracting (or identifying) the average treatment effect, or in certain cases, the distribution of the treatment effect over some dimensions of interest.
We have already implicitly identified some of the relevant confounding factors that can affect an experiment throughout this article. Discussions with the business operations units can be helpful in identifying which of these factors may be relevant to investigate for a specific experiment. A correlation analysis can be helpful in unveiling potential confounders. A cross-correlation analysis between potential confounders can also be used to understand whether some of them are superfluous. A principal component analysis is a powerful tool that can help identify the most relevant confounders. The experimental designs presented in the “Property Splits,” “Alternating Periods,” and “High-Frequency Price Updates” sections address some of these likely confounders and involve popular techniques to handle confounding factors in the design phase.
Not all confounding factors can be managed through an experimental design. In this case, we may know before the experiment that there will be, for example, an imbalance in the business mix between treatment and control. One technique that can help manage this situation post-experiment is creating a reweighted (post-stratified) control sample, in which the control observations are weighted so that their business mix matches that of the treatment group.
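One post-experiment adjustment for a known business-mix imbalance is post-stratification: average the control outcomes within each segment, then recombine the segment means using the treatment group's mix. The segments and spend levels below are hypothetical.

```python
from collections import defaultdict

def post_stratified_mean(control_obs, target_mix):
    """control_obs: (segment, outcome) pairs from the control sample.
    target_mix: segment -> share of that segment in the treatment group.
    Returns the control mean reweighted to the treatment business mix."""
    by_seg = defaultdict(list)
    for seg, y in control_obs:
        by_seg[seg].append(y)
    return sum(share * sum(by_seg[seg]) / len(by_seg[seg])
               for seg, share in target_mix.items())

# Hypothetical booking segments with different spend levels: the raw control
# mean is 127.5, but the treatment group skews toward business travelers.
control = [("leisure", 90.0), ("leisure", 100.0),
           ("business", 150.0), ("business", 170.0)]
adjusted = post_stratified_mean(control, {"leisure": 0.3, "business": 0.7})
```

The adjusted mean (140.5 here) is the control benchmark one would compare the treatment average against, removing the mechanical effect of the mix imbalance.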
The verification process of an experiment will at least involve making sure that the assumptions of the experimental setup hold during the experiment period. After running the experiment, one can use balance checks on the observed covariates, together with the closure (A/A) samples, to verify that no spurious differences between treatment and control emerged.
Typically, the data from a field experiment will be used to first estimate the average treatment effect. One can compare the averages by running a two-sample (Welch’s) t-test on the outcome metric, with test statistic

t = (mean_T - mean_C) / sqrt(s_T^2 / n_T + s_C^2 / n_C),

where mean_T and mean_C are the average outcomes of the treatment and control samples, s_T^2 and s_C^2 their sample variances, and n_T and n_C their sizes. The numerator is the estimate of the average treatment effect, and the null hypothesis of no effect is rejected when |t| exceeds the critical value of the corresponding t-distribution.
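A minimal sketch of a two-sample t-test for the average treatment effect, using the standard library; the nightly revenue figures are illustrative, not taken from any real experiment.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(treatment, control):
    """Difference in means estimates the average treatment effect; the
    Welch t-statistic scales it by the standard error of the difference."""
    ate = mean(treatment) - mean(control)
    se = sqrt(variance(treatment) / len(treatment)
              + variance(control) / len(control))
    return ate, ate / se

# Hypothetical nightly revenue samples from the two arms.
ate, t_stat = welch_t([102, 98, 110, 105, 99, 107],
                      [95, 97, 93, 101, 96, 94])
```

Here the estimated effect is 7.5 with a t-statistic around 3.4, which would be significant at conventional levels for these sample sizes; in practice one would compare |t| against the t-distribution with the Welch-Satterthwaite degrees of freedom.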
It has become common in the field experiment literature to go beyond the average treatment effect by investigating heterogeneous treatment effects, that is, how the treatment effect varies across dimensions of interest such as room type, booking channel, or season.
Summary and Implications
Over the last two decades, field experimentation has expanded from the academic world into the commercial spotlight by driving a wide range of business decisions. This has led software companies to run thousands of experiments concurrently, with multiple objectives. Despite its strong focus on data and algorithms, revenue management in the hospitality industry has largely remained detached from this trend: the use of field experimentation is mainly confined to big hospitality brands, amounts to only a handful of business cases, and has very little public documentation (in academic publications or elsewhere).
The objectives of this article were to motivate the use of field experimentation in hotel revenue management and to provide a starting point for revenue managers or data scientists who want to use field experimentation to increase revenues. To achieve the first objective, we discussed the economic opportunities of field experimentation in several important revenue decisions faced by hotel managers. We then identified the relevant challenges that the industry needs to overcome for moving toward an experimentation-first culture. Although not negligible, these challenges are not insurmountable either, especially in light of the great economic opportunity.
To achieve our second objective, three experimental designs, which are particularly well suited to control for common confounding factors, have been introduced. These three designs balance several trade-offs between the experiment’s granularity (entire hotel group, individual hotel, or small number of room nights) and the implementation feasibility (manual, requiring a simple experimentation system, requiring a sophisticated system). Table 1 summarizes some of the advantages and limitations of each design, making their complementarity explicit. Statistical methods that can be used to evaluate the results from experiments have also been discussed. We note that these designs and empirical techniques are also applicable to other industries in which customers cannot be randomly split in different groups for experimentation, such as brick-and-mortar retail, car rentals, and airlines. We hope that by making these designs explicit and publicly available, the barrier to entry will be reduced for hotel operators interested in field experimentation. Finally, we also hope that this article will stimulate further publications detailing the setups and results of field experiments in the hospitality industry.
Table 1. Advantages and Limitations of the Three Experimental Designs Introduced in This Paper.
