Abstract
Introduction
Statistical field experimentation, also referred to as control/treatment testing, A/B testing, and split testing, has been widely adopted as a development tool in product design for web-based software consumer products across industries (Kohavi & Longbotham, 2017). As the gold standard for causal inference, it allows one to accurately assess the quality of various business decisions. In the context of this article, we will be mostly interested in decisions regarding the pricing of products sold by hotels. Using different strategies to make those decisions can lead to differences in final revenues ranging from a few percent to over 20%, as demonstrated through field experimentation and other analytical techniques. Successful examples include the InterContinental Hotels Group (Koushik et al., 2012) and the Carlson Rezidor Hotel Group (Pekgün et al., 2013). The primary goal of this article is to provide a starting point for revenue managers and data scientists who are interested in using field experimentation to test new strategies to increase revenues. To do so, it identifies several opportunities that field experimentation can offer to revenue managers. It also proposes three concrete experimental designs that can be used to successfully exploit those opportunities, and it highlights relevant empirical methods that can be used to extract valuable information from field experiments.
Revenue management is a set of optimization techniques and heuristics developed originally in the 1970s and 1980s to help increase revenues (and ultimately profits) through appropriate allocation and pricing of seats in airlines (Belobaba, 1987). Since then, it has evolved to adapt to a multitude of complex commercial needs (product bundling and unbundling, personalized promotions, lifetime customer value estimates, etc.). The techniques and models have been applied across the perishable industry spectrum, including hospitality (Anderson & Xie, 2010; Hormby et al., 2010), rental cars (Oliveira et al., 2017), trucking, rail cargo, and airline cargo with various degrees of market penetration.
As adoption and complexity have increased, so have alternative solutions to the original optimization problem, whether considering more realistic formulations of the problem (Belobaba & Weatherford, 1996) or using modern techniques in decision making (Rana & Oliveira, 2014). In the hotel industry, such complexity materializes at many levels, from a diverse distribution enabled by online travel agents and social networks to customer acquisition journeys that encompass several platforms. This new landscape ultimately challenges many of the assumptions underlying the traditional models of revenue management. These changes in the practice of revenue management are consistent with other societal changes driven by technological advances and by the increased availability of data. In this evolving landscape, however, there has been limited focus in the literature on evaluating the merits of different implemented approaches in realistic settings, with some exceptions (Cohen et al., 2019; Koushik et al., 2012; Pekgün et al., 2013). In the hospitality industry, the authors could not find any public article describing in detail a field experiment to decide among several possible revenue strategies.
Technology companies operating within hospitality such as Booking.com, Expedia, and Airbnb, which provide support to hotel owner-operators primarily in the form of cloud-based software products, have broadly adopted online field experimentation. Airbnb not only uses experimentation in product design and marketing, but it also provides pricing suggestions for their hosts (Srinivasan, 2018). Several versions of the pricing algorithm may have been tested via online field experiments, even though limited details are available about these experiments (Ye et al., 2018). Similarly, Booking.com has publicly advertised their experimentation culture in product design (Kaufman et al., 2017). Expedia has been less prolific about publicizing their experimentation efforts, but in 2019, they described some work on multiarmed bandit optimization (Parfenov, 2019). However, none of these public results refer to pricing.
Owner-operators are even less prolific about any attempts at incorporating field experiments in their decision making. Digital marketing products that target the hotel industry offer randomization and significance analysis as part of their solutions. However, there are no public figures on the adoption levels of such tools in hotel marketing. In addition, to the authors’ knowledge, it is common for large hotel chains to run field experiments as part of major revenue management systems upgrades (as reported in Koushik et al., 2012). This conveys a desire among owner-operators’ business leaders to be part of this technological shift, but also suggests a low level of adoption.
Opportunities and Challenges for Field Experiments in Hotel Revenue Strategies
Revenue managers face a wealth of tactical and strategic decisions every day. Despite the efforts to make those decisions in a data-driven way, the presence of many confounding factors makes this a challenging task. In this section, we describe some pricing processes that could be measured and improved through the use of experimentation. We also discuss the practical challenges currently hindering the implementation of experimentation programs.
Opportunities
Ancillary pricing
Should breakfast be sold at $5 or $6? For a high-fixed-costs operation such as a hotel, this can be an impactful question to answer, representing differences in profits above 1%. The same question can be asked about cancelation insurance, high-speed internet, phone charging cables, gym access, or any other product that the hotel may sell (often at a high margin) as part of its accommodation offering, or room attributes (ocean view, bed size, etc.) (Masiero et al., 2015). As the price sensitivity of such items is likely to vary slowly over time, field experiments are ideally suited to help optimize their prices.
Distribution channel optimization
Should a hotel pay an extra 2% to an online travel agency (OTA) to be listed in the first position when people search for hotels in the area? This is a common strategic question for hotels, which can easily receive 30% of their bookings through OTAs and pay them commissions of 10% to 20%. In terms of profits, this can also amount to several percentage points. The answer might differ for different hotels and different seasons, depending on how differentiated the hotel is and on how customers reach the hotel through the OTA. With the right bookkeeping and metrics, this question can be successfully answered with an experiment or a set of experiments that are run on a regular basis.
Promotion success
Was a promotion successful at increasing the hotel’s revenue? Generally, the goal of a promotion will be to stimulate the number of bookings, up to a level that is considered appropriate by the revenue management team. The promotion will be considered a success if it reaches a specific bookings target. The impact on profits can be quite significant if a smaller discount can be offered to drive a similar level of volume. Experimental setups are more challenging in this context, due to price parity clauses and the difficulty of comparing two different promotions on equal footing. However, with a careful design, for certain hotels this question can also be answered by using a field experiment.
Choice of revenue management system
Which software provider should a hotel use for their revenue management system? This is a strategic decision that many hotel managers face. Case studies from software providers often claim a 5% to 20% revenue increase. However, just because a revenue management system performs well on average, it does not necessarily mean that it will be successful for all hotels. A 1% difference in performance between two revenue management systems can cover the full cost of the system itself. Strategically, it is quite important to make this choice (or the choice to not use a system at all) based on the type of causal inference that an experiment provides, rather than considerations such as system cost, which might be irrelevant when considering the final profits.
Practical Challenges
While the potential economic value of understanding strategic pricing decisions is impactful, as suggested in previous paragraphs, there are various scientific, technological, and organizational challenges that hinder the adoption of field experimentation. The solutions to these challenges involve several stakeholders, as discussed below.
Scientific challenges
There is extensive literature on controlled experiment design and analysis. This follows from the fact that this topic is highly relevant to real-world applications. At the same time, each application needs to perform customized modifications to the experimental design to guarantee that the assumptions underlying the statistical analysis are satisfied. One of these key assumptions is the stable unit treatment value assumption (SUTVA): the outcome of any unit must be unaffected by the treatment assigned to other units.
To use a less obvious example, we need to ensure that changing the revenue strategy of a specific room type in a given property will not affect the performance of rooms in the control group. Specifically, if two hotels of the same brand with similar offerings are in the same geographical area, an improvement in one hotel’s performance may be driven by a loss in the other hotel (i.e., demand is shifted rather than created), which would violate this assumption.
Fortunately, field experiments have been successfully run in networks and non-stationary environments in the past. In the “Experimental Designs” section, we outline three experimental designs that can help overcome some of these challenges. However, running a successful field experiment always requires verifying that the underlying assumptions are met, which increases friction for adoption.
Technological challenges
At the time of writing, no commercial experimentation system that automates or assists the design, implementation, bookkeeping, and analysis of field experiments for hotel revenue management exists. Given the large overhead in data preparation and analysis that any data science organization faces, it is hard to see wide adoption of experimentation practices without such a system. Certain experimental designs are better suited for manual setups, as we discuss in the “Experimental Designs” section, but the burden will often be too high for small organizations.
The technological challenges do not stop there. The hotel technology stack is quite diverse, and so is the customer journey through different acquisition channels. For example, a user may consult both the hotel website and an OTA, as well as consider product bundles before making the booking decision. The experimentation system has no way of knowing the different customer journeys, and this might violate some of the experiment assumptions. Fortunately, careful experimental design can help mitigate these issues, but some degree of user control is necessary (e.g., to minimize the effect of bookings through opaque distribution channels while an experiment is running).
Organizational challenges
A culture of experimentation has been touted as being one in which decision making is democratized, with less room for top-down decision making. In addition, in an experiment-first culture, small short-term losses that may arise from running unsuccessful experiments are a necessary step for long-term gains. These are two fundamental shifts that need to happen for an organization to use experiments in their decision making, and they can feel quite foreign to conservative managers.
A revenue management team focused on experimentation does not directly make pricing decisions. Instead, they decide which experiments to run by enabling a data science team to conduct these experiments. The results from the experiments are then used to make the final decision, either to implement the tested revenue strategy, or to stop pursuing the strategy, or to follow up by running a new set of refined experiments. This is a significant structural change at the core of the organization, with the appearance of a data science unit and a change in the role of the revenue manager, who would need to be familiar with experimental statistics.
Another important organizational challenge to experimentation is that the hotel brand or management group might not own certain properties (which are franchised). This bears several consequences. Most importantly, the hotel owners may not agree to participate in an experiment involving many hotels in the brand, or to implement the learnings that result from the experiment. In addition, a brand operating under a franchise model might not benefit from revenue increases directly, and thus has a weaker incentive to optimize pricing strategies for certain properties.
Given the practical challenges outlined in this section, it is hard to see field experimentation taking over all decision making in hotel revenue management. Even for all the opportunities outlined earlier in this section, the cost of setting up an experiment, learning about the techniques, and executing them will often be too high for most hotel operators. The focus, thus, should be first on high-value strategic decisions, such as the choice of a revenue management system. Only once expertise is built through the execution of such tasks, and the cost of running experiments is reduced, can one start considering some of the other opportunities described earlier.
Experimental Designs
Many of the challenges described in the previous section can be addressed through a careful experimental design. In the “Property Splits,” “Alternating Periods,” and “High-Frequency Price Updates” sections, we outline three types of experimental designs that can help overcome many of the challenges discussed previously. In the “Empirical Techniques” section, we highlight well-established empirical techniques that can be used to interpret the results of these experiments and establish causal claims.
Property Splits
Running an experiment on multiple properties, some of which receive the treatment (treated group) and some of which do not (control group), is perhaps the simplest conceptual design. Unfortunately, the simplest version of this design (in which half the properties receive the treatment and the other half do not) often is not appropriate to handle real-world complexity. The daily revenues of different properties and their composition are hardly ever comparable. A common way of handling this diversity is the use of stratification: properties are grouped into comparable strata (e.g., by size, market segment, or location), and the randomization is performed within each stratum.
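A stratified property split can be sketched as follows. This is a minimal illustration only; the property records, the tier attribute, and the even split per stratum are hypothetical choices, not part of the original study.

```python
import random
from collections import defaultdict

def stratified_split(properties, stratum_key, seed=0):
    """Randomly assign properties to treatment/control within each stratum,
    so both arms end up with a comparable composition."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for p in properties:
        strata[stratum_key(p)].append(p)
    treated, control = [], []
    for members in strata.values():
        rng.shuffle(members)
        half = len(members) // 2
        treated.extend(members[:half])
        control.extend(members[half:])
    return treated, control

# Hypothetical portfolio: stratify by a size tier attribute.
props = [{"id": i, "tier": "large" if i % 3 == 0 else "small"} for i in range(12)]
treated, control = stratified_split(props, stratum_key=lambda p: p["tier"])
```

Because shuffling happens within each stratum, large and small properties are equally represented in both arms, regardless of the random seed.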
A naive random split of properties is also liable to suffer from network effects. These can be minimized by performing a cluster-based randomization, rather than a simple binomial randomization. The difference between the two types of randomization is illustrated in Figure 1. The cluster-based variant assigns every strongly connected group of properties to the same arm, so that network effects are contained within an arm rather than crossing the treatment boundary.

Figure 1. A Binomial Randomization (Left Panel) and a Cluster-Based Randomization (Right Panel). The Solid Lines Represent Strong Connections (Such as Shared Customers or the Same Revenue Managers), Whereas the Dotted Lines Represent Weak Connections That Can Carry Small Network Effects.
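A cluster-based randomization can be sketched in a few lines. The cluster identifiers below are hypothetical; the key property is that a whole cluster of strongly connected properties always lands in the same arm.

```python
import random

def cluster_randomize(clusters, seed=0):
    """Assign whole clusters of strongly connected properties to the same
    arm, so spillovers stay inside an arm instead of crossing the
    treatment/control boundary."""
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)
    treated, control = [], []
    for i, cluster in enumerate(order):
        # Alternate arms over the shuffled cluster order to keep sizes close.
        (treated if i % 2 == 0 else control).extend(cluster)
    return treated, control

# Hypothetical clusters (e.g., hotels sharing a market or a revenue manager).
clusters = [["A1", "A2"], ["B1"], ["C1", "C2", "C3"], ["D1"]]
treated, control = cluster_randomize(clusters)
```

A binomial randomization would instead flip a coin per property, which can place two connected hotels in opposite arms.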
These procedures may not address biases that arise from reference effects. Temporal randomization can add an extra level of protection against biased sampling. For example, we can run an experiment that swaps the treatment and control properties in the middle of the experiment. This type of procedure can help handling reference effects such as seasonality, as we will discuss further in the next section.
These techniques help address the scientific and technological challenges discussed in the “Opportunities and Challenges for Field Experiments in Hotel Revenue Strategies” section. However, there are many things that could go wrong if the experimenter has not taken into consideration all the relevant variables. For this reason, it is always advisable to simultaneously run a closure test (or A/A test) as part of the experiment. For hotel rooms, which are highly differentiated over time, a good way of running a closure test is by dividing the properties into two groups. Stratification, clustering, and any other experimental design features are applied to both groups. In one of the groups, we run the controlled experiment, while in the other group no hotel is treated. The latter group is used as the A/A test, and no difference should be observed in the “treated” hotels in that group within the confidence intervals.
This design can be implemented manually most of the time, so that no complex experimentation system is needed to execute the experiment. However, its granularity is typically coarse. Accordingly, it can answer the question “what is the best revenue management system for a hotel group?,” but not necessarily “what is the best revenue management system for a specific hotel?” This is not always a weakness, because many questions about revenue strategy need to be asked at the hotel group level, but it does not allow for smaller or highly heterogeneous hotel groups to benefit from experimentation. For these types of applications, we next present two other types of experimental designs.
Alternating Periods
Not all hotel operators have enough properties to run a meaningful field experiment using the design presented above. Even for large operators, it might be interesting (and cost effective) to infer the effect of an intervention by using a relatively small set of hotels. The design described in this section expands on the temporal randomization used in the previous section to address reference effects. In this case, the temporal split doubles as a way of reducing reference effects and creating control and treatment samples. As only a few changes to the revenue strategy are required, the execution of this experimental design can also be done manually for certain applications, even though the execution overhead is significantly heavier than the previous experimental setup.
Figure 2 shows an example of how this temporal split could be implemented. As in the “Property Splits” section, this temporal split is along the dimension of a stay night. This means that each stay night will use one revenue strategy for all its bookings. Depending on the length of the booking window, the experiment start date may need to be several weeks before any stay night is treated so that the vast majority of bookings are received when the treatment is applied. The techniques introduced in the previous section can help handle the heterogeneity arising from different stay nights having different levels of remaining inventories at the beginning of the experiment.

Figure 2. Experiment With Temporal Split Using Stay Nights. The Boxes Illustrate Individual Stay Nights. Stratification for Each Day of the Week is Included, so That a Treated Monday is Compared to a Control Monday, a Treated Tuesday is Compared to a Control Tuesday, and so on.
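An alternating-periods schedule of this kind can be generated mechanically. The sketch below assumes two-week windows and a hypothetical eight-week experiment starting on a Monday; because every window spans whole weeks, each weekday appears equally often in both arms over a full cycle.

```python
from datetime import date, timedelta

def alternating_schedule(start, n_weeks, window_weeks=2):
    """Assign each stay night to treatment or control in alternating
    windows of whole weeks (default two weeks per window)."""
    assignment = {}
    for day in range(n_weeks * 7):
        night = start + timedelta(days=day)
        window = day // (window_weeks * 7)
        assignment[night] = "treatment" if window % 2 == 0 else "control"
    return assignment

# Hypothetical 8-week experiment starting on Monday, 2024-01-01.
schedule = alternating_schedule(date(2024, 1, 1), n_weeks=8)
```

Over eight weeks this yields 28 treated and 28 control stay nights, with four treated and four control Mondays, Tuesdays, and so on.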
Stratification by room types may be highly desirable, because different room types in a hotel will often be characterized by different product offerings. The treatment and control periods may, however, need to be synchronized between room types to avoid the risk of cannibalization between them.
The above architecture is careful to stagger treatment and control weeks to avoid reference effects that a monotonically varying demand over a period of a month could cause. If the experiment is run for a longer period, then the treatment and control weeks can be randomized and seasonal effects will disappear on average. Other reference effects might be relevant in this context. For example, a national holiday falling in a treatment period can strongly bias the results of the experiment. One can remove the days affected by the holiday from the analysis, but any such decision needs to happen at the time of the experimental design with human input or using anomaly detection methods on historical data.
Another common reference effect that can affect this experimental design is the business mix. One can use empirical techniques to alleviate this concern when analyzing the results (e.g., building a regression model that includes the business mix as a covariate, so that shifts in the mix between periods do not masquerade as treatment effects).
The example of Figure 2 uses treatment/control windows of 2 weeks. A different time window can be chosen, but short time windows are at risk from contamination between treatment and control regions through long stays, while long time windows are more sensitive to reference effects.
Performing a closure test in this experimental design is extremely important, because the temporal split can create heterogeneous groups, and covariates could have a strong impact on the experiment results. A good way to control for temporal heterogeneity is to build control samples from some room types or a similar property (i.e., having a synthetic control that is exposed to the same temporal effects but is never treated).
High-Frequency Price Updates
The main challenge faced by the experimental design described in the previous section is temporal heterogeneity. Hotel rooms are highly differentiated products, and the same room sold for one night can have a very different intrinsic value from the same room sold for a different night.
Another way of segmenting customers without bias is to split the inventory between the two revenue strategies over time. If a hotel is selling 100 rooms for a specific night, 50 of those rooms can be treated, while the other 50 rooms are used for control. To be able to sell both the treatment and control rooms, the time at which each strategy offers its rooms can be randomized. The first strategy might be active for a few hours before the other takes over, with the sequence of active periods drawn at random.

Figure 3. An Example Illustrating a Random Split of Sale Periods Between Two Revenue Strategies (Treatment and Control). In This Example, There are 5 Periods Per Day (288 Min Long) and All the Bookings are Received in the 3 Days Before the Stay Night. Illustrative.
The randomization of the periods in which each strategy is active is an effective way of handling potential biases arising from business mix and intra-day booking patterns. A natural way to build additional control samples to perform a closure test for this design is to split the inventory into three groups: one sample for treatment and two samples for control. This can be combined with ideas from the “Alternating Periods” section in cases where the inventory size does not allow creating three samples.
This design is not suited for all purposes. For example, it is hard to see how any question related to distribution channel optimization could be answered with this design, given that OTA contracts currently do not support dynamic changes in commissions and visibility. This design can also be suboptimal for questions affected by an inconsistent customer acquisition journey, because that breaks the assumption that a customer makes the buying decision within one booking period.
Other technological and commercial challenges may hinder the implementation of this design in practical situations. The execution of this experimental design requires frequent changes to the revenue strategy, making a manual implementation impractical, and thus an experimentation system is required. Even with an experimentation system, technological challenges may exist in other parts of the hotel technology stack (such as prices not being updated in the OTAs fast enough). Commercial challenges (such as OTAs offering automatic rebookings) may also limit the applicability of this design.
Empirical Techniques
Designing a field experiment and extracting information from the data after running the experiment require dedicated analysis methods. Field experimentation has been developed across several academic disciplines, particularly in econometrics (Angrist & Pischke, 2008), computer science (Wohlin et al., 2012), political science (Morton & Williams, 2010), and medical sciences (Matthews, 2006), so it is hard to find a single book or paper that synthesizes all the relevant techniques. This section does not aim to serve this purpose either. Instead, we introduce some of the empirical techniques that can be most relevant in a hospitality context. We cover techniques that are useful for pre-experimental analysis, identification of treatment effects, and post-experimental verifications that experimental assumptions held during the course of the experiment.
Most generally, the objectives of the analysis of a field experiment will be:
Determining the main confounding factors,
Choosing the experimental design to randomize over those confounding factors,
For confounding factors that cannot be controlled in the experimental design, establishing the methodology to control for them,
Verifying that experimental assumptions were not violated during the experiment,
Extracting (or identifying) the average treatment effect, or in certain cases, the distribution of the treatment effect over some dimensions of interest.
We have already implicitly identified some of the relevant confounding factors that can affect an experiment throughout this article. Discussions with the business operations units can be helpful in identifying which of these factors may be relevant to investigate for a specific experiment. A correlation analysis can be helpful in unveiling potential confounders. A cross-correlation analysis between potential confounders can also be used to understand whether some of them are superfluous. A principal component analysis is a powerful tool that can help identify the most relevant confounders. The experimental designs presented in the “Property Splits,” “Alternating Periods,” and “High-Frequency Price Updates” sections address some of these likely confounders and involve popular techniques to handle confounding factors in the design phase.
Not all confounding factors can be managed through an experimental design. In this case, we may know before the experiment that there will be, for example, an imbalance in the business mix between treatment and control. One technique that can help manage this situation post-experiment is creating a reweighted (post-stratified) control sample, in which the control observations are weighted so that their business mix matches that of the treatment group.
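One post-experiment adjustment for a known business-mix imbalance is post-stratification: average the control outcomes within each segment, then recombine the segment means using the treatment group's mix. The segments and spend levels below are hypothetical.

```python
from collections import defaultdict

def post_stratified_mean(control_obs, target_mix):
    """control_obs: (segment, outcome) pairs from the control sample.
    target_mix: segment -> share of that segment in the treatment group.
    Returns the control mean reweighted to the treatment business mix."""
    by_seg = defaultdict(list)
    for seg, y in control_obs:
        by_seg[seg].append(y)
    return sum(share * sum(by_seg[seg]) / len(by_seg[seg])
               for seg, share in target_mix.items())

# Hypothetical booking segments with different spend levels: the raw control
# mean is 127.5, but the treatment group skews toward business travelers.
control = [("leisure", 90.0), ("leisure", 100.0),
           ("business", 150.0), ("business", 170.0)]
adjusted = post_stratified_mean(control, {"leisure": 0.3, "business": 0.7})
```

The adjusted mean (140.5 here) is the control benchmark one would compare the treatment average against, removing the mechanical effect of the mix imbalance.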
The verification process of an experiment will at least involve making sure that the assumptions of the experimental setup hold during the experiment period. After running the experiment, one can use balance checks on the observed covariates, together with the closure (A/A) samples, to verify that no spurious differences between treatment and control emerged.
Typically, the data from a field experiment will be used to first estimate the average treatment effect. One can compare the averages by running a two-sample (Welch’s) t-test on the outcome metric, with test statistic

t = (mean_T - mean_C) / sqrt(s_T^2 / n_T + s_C^2 / n_C),

where mean_T and mean_C are the average outcomes of the treatment and control samples, s_T^2 and s_C^2 their sample variances, and n_T and n_C their sizes. The numerator is the estimate of the average treatment effect, and the null hypothesis of no effect is rejected when |t| exceeds the critical value of the corresponding t-distribution.
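A minimal sketch of a two-sample t-test for the average treatment effect, using the standard library; the nightly revenue figures are illustrative, not taken from any real experiment.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(treatment, control):
    """Difference in means estimates the average treatment effect; the
    Welch t-statistic scales it by the standard error of the difference."""
    ate = mean(treatment) - mean(control)
    se = sqrt(variance(treatment) / len(treatment)
              + variance(control) / len(control))
    return ate, ate / se

# Hypothetical nightly revenue samples from the two arms.
ate, t_stat = welch_t([102, 98, 110, 105, 99, 107],
                      [95, 97, 93, 101, 96, 94])
```

Here the estimated effect is 7.5 with a t-statistic around 3.4, which would be significant at conventional levels for these sample sizes; in practice one would compare |t| against the t-distribution with the Welch-Satterthwaite degrees of freedom.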
It has become common in the field experiment literature to go beyond the average treatment effect by investigating heterogeneous treatment effects, that is, how the treatment effect varies across dimensions of interest such as room type, booking channel, or season.
Summary and Implications
Over the last two decades, field experimentation has expanded from the academic world into the commercial spotlight by driving a wide range of business decisions. This has led software companies to run thousands of experiments concurrently, with multiple objectives. Despite its strong focus on data and algorithms, revenue management in the hospitality industry has largely remained detached from this trend: the use of field experimentation is mainly confined to big hospitality brands, amounts to only a handful of business cases, and has very little public documentation (in academic publications or elsewhere).
The objectives of this article were to motivate the use of field experimentation in hotel revenue management and to provide a starting point for revenue managers or data scientists who want to use field experimentation to increase revenues. To achieve the first objective, we discussed the economic opportunities of field experimentation in several important revenue decisions faced by hotel managers. We then identified the relevant challenges that the industry needs to overcome for moving toward an experimentation-first culture. Although not negligible, these challenges are not insurmountable either, especially in light of the great economic opportunity.
To achieve our second objective, three experimental designs, which are particularly well suited to control for common confounding factors, have been introduced. These three designs balance several trade-offs between the experiment’s granularity (entire hotel group, individual hotel, or small number of room nights) and the implementation feasibility (manual, requiring a simple experimentation system, requiring a sophisticated system). Table 1 summarizes some of the advantages and limitations of each design, making their complementarity explicit. Statistical methods that can be used to evaluate the results from experiments have also been discussed. We note that these designs and empirical techniques are also applicable to other industries in which customers cannot be randomly split in different groups for experimentation, such as brick-and-mortar retail, car rentals, and airlines. We hope that by making these designs explicit and publicly available, the barrier to entry will be reduced for hotel operators interested in field experimentation. Finally, we also hope that this article will stimulate further publications detailing the setups and results of field experiments in the hospitality industry.
Table 1. Advantages and Limitations of the Three Experimental Designs Introduced in This Paper.
