Abstract
Online reviews generated by consumers are becoming increasingly influential in today’s rapidly changing service business and particularly in the hotel industry (Mathwick and Mosteller 2017; Y. Wang et al. 2020; L. Wu et al. 2016). Driven by globalization, aging populations, reduced travel costs, and increased leisure time, the service-intensive hotel industry has witnessed a corresponding rise in demand (Mohammed, Guillet, and Law 2014). Moreover, due to the evolution of Web 2.0, the number of hotel reviews posted on the websites of online travel agents (OTAs) such as Booking.com and TripAdvisor.com has grown enormously (K. Lu and Elwalda 2016). Recent market research has shown that over 49% of travelers will not choose a hotel without reviewing online comments (World Travel Market 2014), and approximately 35% of consumers modify their schedules after checking posts on OTAs (L. Wu et al. 2016). In addition, online reviews can have an important influence on service organizations’ bottom line. For instance, Mathwick and Mosteller (2017) reported that a 1% improvement in online reputation could result in a 1.4% growth in revenue per hotel room.
Today, online reviews enable consumers to share their experiences and opinions at an unprecedented scale and speed. Such reviews present a substantial amount of rich information on competitors, particularly in the form of service comparisons (W. Wang, Yi, and Dai 2018). Although online reviews have been adopted throughout all areas of both service and manufacturing industries, the information included tends to be incredibly valuable for service industries (Mathwick and Mosteller 2017; L. Wu et al. 2016). Compared to physical products that typically have multiple features that can be easily classified and evaluated, the measurements that constitute “excellent” or “terrible” services tend to be complicated to objectively identify and define (Mankad et al. 2016). As a result, the subjective customer opinions that are embedded in online reviews become much more informative by comparison. Notably, the use of online reviews works for all service sectors, and managers today need to monitor and analyze both negative and positive online reviews in order to track the products, services, promotions, and sales offered by their competitors (Jin, Ji, and Gu 2016). Pelsmacker, Tilburg, and Holthof (2018) highlight that the volume and valence of online reviews reflect the competitive marketing strategies of service providers and can have an effect on their market performance. Therefore, it is of great importance to develop an approach to support the analysis of the competitiveness of a service provider and the identification of its key competitors by using online reviews.
A comprehensive literature review shows that research bridging online reviews and competitor identification in service research is in its infancy, as there is a lack of operational approaches to extend the scope of either area. On the one hand, prior studies have revealed the use and effects of online reviews in various fields, such as marketing (K. Lu and Elwalda 2016; Pelsmacker, Tilburg, and Holthof 2018; Ye, Law, and Gu 2009), information systems research (Chen and Yao 2016; Filieri et al. 2018; Mariani, Borghi, and Gretzel 2019), and innovation management (Algesheimer et al. 2011; K. Lu and Elwalda 2016; Moe and Trusov 2011; Zhan et al. 2020). However, none of these studies takes the perspective of service providers, and so they do not advance current discourse in relation to identifying key competitors and improving services. Moreover, most of these studies consider only a limited amount of information from online reviews (e.g., they might extract a single summary opinion from a review), which cannot provide managers with an integrated and comprehensive set of competitors. On the other hand, the service literature has established that consumer evaluations of a service are greatly affected by interactions among consumers, operational approaches, information systems, staff, and companies (Brown and Dev 2000). These factors have been studied in relation to service quality, service representatives, and service blueprinting (Holloway and Beatty 2003; Rapp et al. 2015; Tsai and Lu 2006), suggesting that services involve companies and customers in co-creation (Holloway and Beatty 2003; Kumar et al. 2010). In spite of their theoretical and practical implications, these factors have largely been overlooked in the operationalization of models that can identify competitor sets and harvest the value of online customer reviews (Antons and Breidbach 2018). 
Notably, an analytical framework is required that can integrate relevant attributes of a service and help companies analyze online customer reviews and identify their key competitors. Accordingly, research has increasingly suggested that new approaches, such as data analytics and machine-learning methods, are needed to improve service systems (Gur and Greckhamer 2019; Jin, Ji, and Gu 2016). This study argues that insights into the competitor set are more likely to be captured in rich online reviews than through company-based questionnaires. Therefore, the lack of an analytical framework centering on the identification of the competitor set is a critical oversight.
The main objective of this study is to develop an analytical approach to help managers harvest information from online reviews that will allow them to identify their competitor set. The study setting is the hotel industry, but the approach could be used by service companies in general. It is based on the integration of an improved
This research makes three key contributions to the literature and practice. First, the service attributes identified from consumers’ online reviews can support hotel managers in evaluating their perceived quality of services and their competitive environment. Importantly, those attributes depend on the market segment served by a particular hotel. Second, as online reviews normally include information on competitors, we propose a more effective analytical framework, based on a set of machine-learning techniques, for service managers to determine their key competitors and to identify their own company’s weaknesses and strengths. This will, in turn, allow them to develop appropriate marketing strategies and make appropriate service improvements. Third, the proposed framework offers the opportunity for real-time analysis of the competitor set by applying analytical techniques. That is, it enables managers to conduct dynamic analysis to monitor their key competitors and changes to the market environment by applying up-to-date information from consumers’ online reviews.
Literature Review
Two fields of the literature relate to the present study: the use of online reviews for value co-creation and service improvement (subsection Online Reviews for Value Co-Creation and Service Improvement) and competitor identification in the service domain (subsection Competitor Identification in the Service Domain). In addition, the existing methods and approaches for competitor identification are compared in subsection Methods and Approaches for Competitor Identification, and the settings for research regarding customers’ hotel selection via OTAs are presented in subsection Research Settings: Customer Hotel Selection via OTAs.
Online Reviews for Value Co-Creation and Service Improvement
The impact of consumer-company interactions on consumer evaluation of a service has long been discussed in the literature as part of the process nature of services (Antons and Breidbach 2018; Brown and Dev 2000; Parasuraman, Berry, and Zeithaml 1993). Identifying these interactions can help companies to enhance their understanding of the “customer encounter,” which is defined as a customer’s direct interactions with the service during a specific period (Ordenes et al. 2014). Studies show that these encounters are important for consumers’ evaluation of service quality (Parasuraman, Berry, and Zeithaml 1993), customer loyalty (Brodie et al. 2011; Kumar et al. 2010), and customer satisfaction (Algesheimer et al. 2011; Nasution and Mavondo 2008). According to L. Wu et al. (2016), key encounters between companies and consumers can happen through different channels, such as face-to-face interactions, telephone, email, and the internet. To increase the quality of these encounters, the literature has studied the co-creative nature of services, in which consumers’ evaluations are treated as the outcomes of the multiple activities provided and resources applied during the service (Kumar and Pansari 2016; Mathwick and Mosteller 2017).
Moreover, the service literature summarizes critical factors in the service process that enhance consumers’ realization of value (Filieri et al. 2018; Nasution and Mavondo 2008). When receiving services, consumers combine activities offered by the company with external resources and use various approaches to generate value for themselves (A. C. C. Lu, Gursoy, and Lu 2016; Ordenes et al. 2014). During the consumer-company interactions, consumers’ value creation can be affected by the information platform (e.g., online forums and communities) provided by the companies (Antons and Breidbach 2018; Thakur 2018). These value co-creation platforms offer both the company and the consumer access to information that enables various activities, and different results are possible based on how the interaction proceeds. Companies work as value facilitators who support consumers in their value creation by offering them the necessary information and resources (Ordenes et al. 2014).
To facilitate value co-creation and offer the right services to consumers (Kumar et al. 2010), it is important for companies to gain insights into consumers’ evaluations of their experiences and their perception of the value of the company’s services in a context defined by the consumers (Gao et al. 2018; Parasuraman, Berry, and Zeithaml 1993). Companies can achieve this by harvesting online customer reviews through information platforms during or following interactions (Gur and Greckhamer 2019; Jin, Ji, and Gu 2016). According to Tan et al. (2018), although online reviews cannot directly lead to value generation for companies, they can result in internal process development and actionable information for decision making if proper analytical approaches are in place. For example, an analytical approach can be developed to enable managers to collect all their online reviews and other sources of information on their interactions with consumers, so that they can evaluate the competitive environment effectively and respond to consumers’ feedback in a timely manner (W. Wang, Yi, and Dai 2018). Also, the company’s weaknesses and strengths can be evaluated, which in turn will allow managers to develop appropriate marketing strategies and service improvements. Moreover, online information can be analyzed much more quickly, and is much more up to date, than information gathered through traditional means (Antons and Breidbach 2018). Nonetheless, few studies in the service literature have systematically investigated the value co-creation process by harvesting the value of online reviews.
Competitor Identification in the Service Domain
Competition in the service industries is widely regarded as complex and dynamic (Du, Hu, and Damangir 2015; Nam, Joshi, and Kannan 2017). To identify competitors, managers normally focus on a small group of companies because of their bounded rationality and limited managerial resources (Peteraf and Bergen 2003). This approach is in line with the cognitive categorization view, which explains why companies identify only simple competitor sets and pay attention to just a few categories of business rivals (Baum and Lant 2003; Hatzijordanou, Bohn, and Terzidis 2019).
Service competitors can be defined in different ways. According to Ng, Westgren, and Sonka (2009), competition can be interpreted differently by stakeholders within a value chain, as they may have different perceptions of rivals. Most of the literature uses the concept of service substitution to define competitors, whereby service attributes (e.g., pricing and service cycle time) are compared to identify which other service providers are most similar (e.g., Clark and Montgomery 1999). The important service attributes classified by the early research build a strong foundation for understanding the dimensions on which competitors are best defined. For example, by identifying a company’s service shortfalls and strengths, SERVQUAL can help to define the competitor set and to determine which competitors share particular advantages and disadvantages (Brown and Dev 2000; Parasuraman, Berry, and Zeithaml 1993). In this way, managers are advised to evaluate the strengths and weaknesses of the company as well as those of its competitors through predefined scales and measurements (Clark and Montgomery 1999). However, this approach to competitor identification has been criticized for its subjective bias. Ng, Westgren, and Sonka (2009) suggest that when interpreting competition, managers may have different “blind spots” due to their personal characteristics and experiences.
The literature suggests that service improvement begins by comparing what consumers believe a firm ought to provide with what they perceive the firm’s actual service to be (Antons et al. 2018; Brown and Dev 2000; Gao et al. 2018). Accordingly, the competitors of service providers can be defined from the customer perspective by benchmarking the market preference for particular service attributes (Baum and Lant 2003; Sidhu, Nijssen, and Commandeur 2000). In other words, the customer perspective addresses how customers define the competitor set for a focal firm. This approach is consistent with the view of service demand, which defines competitors as all the companies that aim to meet a similar set of customer demands. According to Sidhu, Nijssen, and Commandeur (2000), in comparison with other perspectives, the customer perspective normally identifies a wider and larger competitor set (i.e., direct and immediate competitors, as well as potential competitors), which may even span different industries, so that competitive boundaries become blurry. Identifying competitors from the customer perspective can, nevertheless, reduce the adverse effects of short-sightedness, competition asymmetry, and the “competitive blind spots” of managers (J. B. Kim, Albuquerque, and Bronnenberg 2011; Ng, Westgren, and Sonka 2009).
In addition, how to evaluate competitors is another important topic within the growing body of service literature. The superiority of relative service metrics over absolute measures of satisfaction is emphasized (e.g., Keiningham, Buoye, and Ball 2015). Recent studies show that instead of using a numerical value to measure customer satisfaction (i.e., the absolute measures), a ranking of competitors according to customer satisfaction is found to be more strongly associated with their share of wallet (Keiningham et al. 2015). According to Keiningham et al. (2014), although the use of relative metrics is robust in measuring service success, the question of “relative to whom” might be challenging. Therefore, developing an effective approach to competitor identification is a critical preliminary step in understanding customers’ perceptions of and attitudes to service providers.
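The difference between absolute and relative metrics can be illustrated with a short sketch. The hotel names and satisfaction scores below are hypothetical, and the helper `relative_ranks` is an illustrative name rather than anything taken from the cited studies. It converts one customer's absolute scores into ranks within a given competitor set, which shows how the answer to "relative to whom" changes the result.

```python
def relative_ranks(scores):
    """Map each provider to its rank (1 = best) among the providers in `scores`.

    Ties share the same rank (competition ranking: 1, 2, 2, 4).
    """
    ordered = sorted(scores.values(), reverse=True)
    return {name: ordered.index(value) + 1 for name, value in scores.items()}


# Hypothetical absolute satisfaction scores from one customer (1-10 scale).
scores = {"Hotel A": 8, "Hotel B": 9, "Hotel C": 6}

# The rank of Hotel A depends on which competitors are included in the set.
full_set_ranks = relative_ranks(scores)
narrow_set = {name: s for name, s in scores.items() if name != "Hotel B"}
narrow_set_ranks = relative_ranks(narrow_set)

print(full_set_ranks)
print(narrow_set_ranks)
```

Hotel A's absolute score is identical in both cases, yet its relative position differs once Hotel B is left out of the competitor set.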
Methods and Approaches for Competitor Identification
To identify competitors in the service domain, researchers have applied different methods and approaches in terms of the nature of the data resources and expertise required. Two commonly used methods have been surveys and archival studies.
According to Gur and Greckhamer (2019), quantitative empirical studies using cross-sectional survey data are the most common method for identifying competitors. Such research has generally taken a company perspective or a customer perspective (Gur and Greckhamer 2019). On the one hand, national or international statistical data have been widely utilized in empirical studies of competitor identification (Cooper and Inoue 1996; J. Wu and Olk 2014). On the other hand, researchers have explored how companies satisfy customer needs and analyzed customers’ brand-switching behaviors (DeSarbo and Grewal 2007; Wieringa and Verhoef 2007). The study of brand switching (e.g., Roos, Edvardsson, and Gustafsson 2004; Wieringa and Verhoef 2007) normally employs behavioral panel data in which brand switching is observed (for example, when examining customer perceptions of brands) and may apply a log-linear modeling framework to investigate which brands have similar image profiles and whether they form sharing or switching partitions.
The second method is the archival study. The early research analyzing firms’ archival data to identify their competitors focused on defining strategic groups of firms that share certain characteristics such as strategies, resources, and environment (Peteraf and Bergen 2003). In the domain of service research, the banking sector was one of the earliest “laboratories” for researchers using archival data to identify competitors through the analysis of strategic groups (Amel and Rhoades 1988). Various approaches have been applied to identify the strategic groups so as to find closely competing companies within the same industry. However, the main issue with this method is that the structure and boundaries of these strategic groups are usually ambiguous (Baum and Lant 2003). In addition, the method has typically employed a two-step approach. The researchers first apply factor analysis to identify the underlying dimensions and then they identify the strategic group using cluster analysis. As a result, multidimensional scaling (MDS) has been widely adopted to overcome the weaknesses of the cluster analysis such as the inconsistency of the factors that emerge and the overlooking of the time factor (DeSarbo and Grewal 2007).
To benchmark our proposed method against relevant studies, we summarize previous approaches for mapping and analyzing competitive market structures in Table 1. Previous studies have depended on data captured from questionnaires and surveys. For example, Cooper and Inoue (1996) apply archival data (questionnaires collected by Rogers National Research) to determine the preferences of different customer segments while DeSarbo and Grewal (2007) use survey data to investigate purchase intentions for vehicles through an asymmetric MDS approach.
Table 1. Comparison of Studies Analyzing and Mapping Competitive Market Structure.
With the development of information technologies, service companies today are paying more attention to understanding their competition from a customer’s perspective given the large volume, variety, and veracity of user-generated content. J. B. Kim, Albuquerque, and Bronnenberg (2011) extracted data from Amazon.com on customer search patterns. Netzer et al. (2012) used user-generated textual data from an online automobile forum to identify competitive market structures. Du, Hu, and Damangir (2015) combined sales data from Automotive News with search trends from Google Trends to illustrate evolving customer preferences. Nam, Joshi, and Kannan (2017) aggregated textual data from a social tagging platform to identify user-generated social tags. Additionally, the development of competitor sets is related to a subset of product attributes. Studies show that product attributes beyond marketers’ control can change customers’ buying decisions (Baum and Lant 2003; Du, Hu, and Damangir 2015; J. B. Kim, Albuquerque, and Bronnenberg 2011).
Although previous approaches for identifying competitors have their merits, there are some challenges related to basic assumptions, data availability, and the visualization of large product categories. First of all, previous studies such as Cooper and Inoue (1996) and DeSarbo and Grewal (2007) collected their data through questionnaires and surveys. However, these data limit the potential to study consumer durables in large markets involving thousands of products. For example, customers are not likely to buy durable goods (e.g., vehicles or household appliances) very often. Therefore, these approaches are bounded by the cognitive capacity of customers. According to Ringel and Skiera (2016), even when studying just a handful of alternative products that customers tend to consider at the same time, it is questionable whether respondents can appropriately recall previous buying decisions or predict future purchase intent. Also, questionnaires and surveys tend to be time-consuming to complete and costly to administer and cannot be used to indicate real-time customer behaviors (J. B. Kim, Albuquerque, and Bronnenberg 2011; Nam, Joshi, and Kannan 2017; Ringel and Skiera 2016). Besides, the models developed by Cooper and Inoue (1996) and Du, Hu, and Damangir (2015) were based on a number of mathematical assumptions and there were ambiguities regarding consumer search intentions. Thus, the methods and results may not be fully applicable to companies in practice.
For a market involving a small number of products, it is relatively simple to demonstrate the competitive market structure by presenting dots on an XY graph, where each dot indicates a different product. However, as the number of products increases, the graphical presentation rapidly becomes a dense clump of dots, making it difficult to interpret the results (J. B. Kim, Albuquerque, and Bronnenberg 2011; Ringel and Skiera 2016). Although an additional dimension can be applied to mitigate this effect (DeSarbo and Grewal 2007), this should be avoided wherever possible because it tends to be difficult to check and explain the results (Ringel and Skiera 2016). Meanwhile, the selection of similarity measures can be ambiguous but can play an important part in the analysis. Appendix E shows some examples of the MDS maps generated using different similarity measures (we further explain this in Model Evaluation section).
Furthermore, MDS techniques are especially sensitive to the size of the data set being analyzed (Moore and Holbrook 1982; Ringel and Skiera 2016). It is inherent to the technique that the accuracy of data positions deteriorates when the data set becomes large (Buja et al. 2008). Issues such as the circular bending effect are common in MDS analysis (Carroll and Arabie 1980). This can result in inaccurate identification of competitive structures and, in particular, competitive relationships can be shown to be tighter than they actually are (Diaconis, Goel, and Holmes 2008; Moore and Holbrook 1982; Ringel and Skiera 2016). Nonetheless, our proposed method enables companies to identify their competitors effectively even for large product categories—that is, categories containing over 6,000 products. A comprehensive competitive map is created which enables companies to conduct real-time analysis of their competitor sets with consideration of specific strengths and weaknesses.
Research Settings: Customer Hotel Selection via OTAs
Like many other labor-intensive service industries, the hotel industry is under increasing pressure (e.g., to lower its costs and offer more high-quality services) and is highly concerned with competitor identification (Brown and Dev 2000; J. Y. Kim and Canina 2011; Mohammed, Guillet, and Law 2014). Competitor identification is, therefore, a vital initial step in market evaluation, service improvement, and strategy development (J. Y. Kim and Canina 2011). As information search is often customers’ initial step, and one at which companies can affect their decision making, it is important to understand how online customers select hotels, especially in the era of big data. According to A. C. C. Lu, Gursoy, and Lu (2016), customers today want to compare products on different attributes before making their decisions. While a tremendous amount of external information is available to customers, they tend to use a small number of hotel attributes in their prepurchase information search. Previous studies have found that the importance customers attach to particular types of information in their prepurchase search depends on, for example, situational factors (e.g., risk perceptions and previous experience), product characteristics (e.g., type of trip and destination type), decision complexity (e.g., number of alternatives), and consumer characteristics (e.g., educational level and culture; A. C. C. Lu, Gursoy, and Lu 2016; Tan et al. 2018).
The information search process is quite different through OTAs. According to the 2018 Chinese Travel Consumer Report, over 77% of hotel bookings in China are made through OTA websites, and this figure increases to 81% for bookings made on a phone app. In the present study of hotel selection from the customers’ perspective, data were obtained from Ctrip (www.Ctrip.com), a leading OTA that provides flight tickets, hotel reservations, and tourist resort products in China. There are two main reasons for using Ctrip.com. First, according to Shao and Kenney (2018), Ctrip has become one of the largest and fastest-growing OTAs. It attracts over 135 million users in the Chinese market, and over the period 2017–2018 it had a compound growth rate of 25%. Ctrip’s 2018 annual report shows that its net income had reached US$4.5 billion for the full year of 2018, a 16% rise year on year. Second, unlike the data from other platforms, data generated from Ctrip.com can be considered “open” (Ctrip 2017), and researchers and organizations have used data from Ctrip.com to monitor and analyze challenging issues in diverse fields (Leung, Law, and Lee 2011; Shao and Kenney 2018; Ye, Law, and Gu 2009).
OTAs make the search process simple and effective. To access hotel information from OTAs, customers are usually required to enter some basic information such as their destinations and dates of check-in and check-out. To understand customers’ requirements as well as avoid information overload, leading OTAs filter the hotels for customers based on certain predefined criteria, such as star rating, price range, and location. Star rating has been identified as the most important selection criterion for customers when they select a hotel via an OTA. On Ctrip.com, customers need to specify a hotel star rating (i.e., from two to five stars) before doing their initial search (as shown in Appendix D). Notably, the 2017 Ctrip Hotel White Paper shows that over 76% of the hotel searches on Ctrip.com were associated with a star rating (while 11.61% of the searches were price-related). The report also identifies a rapidly increasing demand for highly rated hotels when customers were searching Ctrip’s listed hotels by star rating. This, in turn, indicates that hotels listed on Ctrip.com are more likely to compete with each other within the same star rating.
Although we acknowledge that a variety of factors can affect potential customers’ search process, given that the data were retrieved from Ctrip.com, this study applies star rating as a primary filtering criterion. In this regard, the list of hotels returned from a customer search can be regarded as a common set of hotels. Within the common set of hotels, customers are presented with several types of important information, such as a brief description of the hotel, lowest price, customer ratings, number of reviews, and customers’ recommendation rate. Based on the information provided, customers evaluate the alternatives to form a consideration set—a set of preferred hotels to minimize the risk related to their selection (Mankad et al. 2016; W. Wang, Yi, and Dai 2018). Studies find that customers increasingly rely on online peer-to-peer reviews in their prepurchase evaluation of the hotels within their consideration sets (Filieri et al. 2018; K. Lu and Elwalda 2016). The final decision of a customer is from their consideration set, which derives in turn from the common set of hotels offered by the OTA (Pan, Zhang, and Law 2013).
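The two-stage narrowing described above (common set, then consideration set) can be sketched in a few lines of Python. This is only an illustration: the `Hotel` fields, the sample hotels, and the shortlist rule in `consideration_set` are assumptions introduced here, and real customers rely on richer heuristics than a rating cutoff.

```python
from dataclasses import dataclass


@dataclass
class Hotel:
    name: str
    star: int        # star rating used as the primary search filter (2 to 5)
    rating: float    # overall customer rating (1 to 5)
    n_reviews: int   # number of online reviews


def common_set(hotels, star):
    """Hotels returned by a search filtered on a single star rating."""
    return [h for h in hotels if h.star == star]


def consideration_set(candidates, min_rating, size):
    """Assumed shortlist rule: the `size` best-rated hotels at or above `min_rating`."""
    eligible = [h for h in candidates if h.rating >= min_rating]
    return sorted(eligible, key=lambda h: h.rating, reverse=True)[:size]


# Hypothetical listing data.
hotels = [
    Hotel("Alpha", 4, 4.6, 1500),
    Hotel("Beta", 4, 4.2, 800),
    Hotel("Gamma", 5, 4.8, 2300),
    Hotel("Delta", 4, 3.9, 400),
    Hotel("Epsilon", 4, 4.4, 950),
]

four_star = common_set(hotels, star=4)
shortlist = consideration_set(four_star, min_rating=4.0, size=2)
print([h.name for h in four_star])
print([h.name for h in shortlist])
```

Under this sketch, the five-star hotel never enters the four-star shortlist, which mirrors the assumption that hotels compete mainly within the same star rating.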
Methodological Framework
Online review data are now playing an important role in every service industry. They offer an understanding of customer preferences and allow an assessment of a company’s reputation (Holloway and Beatty 2003; L. Wu et al. 2016). However, the use of online review data to identify competitors has been overlooked. In this study, we use the hotel industry as an example and present an analytical framework based on a set of machine-learning techniques that identifies a service provider’s competitors. The framework further recognizes the relative importance of different service attributes affecting customers’ decision making in various market segments. We demonstrate the applicability of the proposed analytical framework on a sample of over 8 million customer reviews of 6,409 hotels in 50 Chinese cities, taken from Ctrip.com. Given that service review data are diverse in format, the underlying analytical framework can help the service provider to identify its competitors in the online battlefield more comprehensively and cost-effectively.
Figure 1 illustrates the overall process of the framework, which comprises three main steps. The first step is to collect all the available data, and Step 1: Data Collection and Exploratory Analysis section explains how both structured data (i.e., customer review ratings) and unstructured data (i.e., customer review text comments) are extracted and used. The second step (Step 2: The Improved

The analytical framework based on the improved
Step 1: Data Collection and Exploratory Analysis
In this study, we used a data crawler and downloaded all available hotel data from the Ctrip.com website for 50 key tourist cities designated by the China National Tourism Bureau (2016). To ensure the consistency of the data, we consider only the most popular hotels, that is, those on the first 10 pages for each city. It is important to note that the number of hotels listed on each page of Ctrip is fixed at 25 and is not affected by screen size. This gave a total of 12,500 hotels. The hotels with fewer than 100 online reviews and with blank reviews, duplicate hotels, and those lacking complete information (i.e., the number of rooms, price, and recommendation rate) were excluded to improve the validity and reliability of the data. The final data set contains 6,409 hotels with 8,374,102 online reviews posted between January 1, 2016, and December 30, 2016.
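The sampling arithmetic and exclusion rules above can be sketched as follows. The record layout (`hotel_id`, `reviews`, `rooms`, `price`, `recommendation_rate`) and the function name `clean_hotels` are hypothetical, and the handling of blank reviews (discarding them before counting) is an assumption about how the rule was applied, not a description of the crawler actually used.

```python
CITIES, PAGES_PER_CITY, HOTELS_PER_PAGE = 50, 10, 25
CANDIDATE_HOTELS = CITIES * PAGES_PER_CITY * HOTELS_PER_PAGE  # 12,500 hotels

MIN_REVIEWS = 100
REQUIRED_FIELDS = ("rooms", "price", "recommendation_rate")


def clean_hotels(records):
    """Apply the exclusion rules to a list of hotel dicts."""
    seen_ids = set()
    kept = []
    for rec in records:
        if rec["hotel_id"] in seen_ids:
            continue  # duplicate listing
        seen_ids.add(rec["hotel_id"])
        # Assumption: blank review texts are discarded before counting.
        reviews = [text for text in rec["reviews"] if text.strip()]
        if len(reviews) < MIN_REVIEWS:
            continue  # too few usable reviews
        if any(rec.get(field) is None for field in REQUIRED_FIELDS):
            continue  # incomplete hotel information
        kept.append({**rec, "reviews": reviews})
    return kept


# Hypothetical records covering each exclusion rule.
ok = {"hotel_id": 1, "reviews": ["good stay"] * 120,
      "rooms": 80, "price": 350.0, "recommendation_rate": 0.95}
records = [
    ok,
    dict(ok),                                        # duplicate of hotel 1
    {**ok, "hotel_id": 2, "reviews": ["fine"] * 40}, # fewer than 100 reviews
    {**ok, "hotel_id": 3, "price": None},            # missing price
    {**ok, "hotel_id": 4, "reviews": ["nice"] * 100 + ["", "  "]},
]
cleaned = clean_hotels(records)
print(CANDIDATE_HOTELS, [rec["hotel_id"] for rec in cleaned])
```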
As shown in Table 2, for each hotel, we collect three types of structured data, namely, customer review data, hotel description data, and hotel search ranking. Customer review data comprise the number of customer reviews (used as a measure of customer engagement), recommendation rate, the overall customer rating, and a four-dimensional rating of hotel quality (i.e., ratings of the hotel’s location convenience, staff service, facilities, and cleanliness). The hotel description data include the hotel’s star rating (star), the number of hotel rooms (rooms), and the price of a standard room (price), where a standard room is the primary type offered by each hotel, which is also usually the cheapest. The hotel search ranking (ranking) is included, filtered by Ctrip’s hotel star ratings: two-star and below (economy), three-star (comfortable), four-star (high end), and five-star (luxury). Appendix A shows an example of a customer review on Ctrip.com. Reviews can be posted only by users who have at least reserved a hotel on the website, and they all give a summary rating, which can range from one to five. Other than the structured data, we also collect the unstructured textual comments posted by customers to conduct the in-depth text analysis for the identification of key service attributes.
Table 2. Explanation of Structured Variables.
As discussed in Online Reviews for Value Co-Creation and Service Improvement section, customer-company interactions have long been considered an important resource for service providers in value co-creation and service improvement (Brodie et al. 2011). This phenomenon has been further enhanced by the digitization and development of information communication technologies. Traditionally, from the perspective of economic theory, price is treated as a primary strategic variable for hotels, especially in the short term (Weatherford and Bodily 1992), and the intensity of price competition increases when more rooms of similar quality are traded in a relatively small area (Choi 1991). However, online hotel competition today is strongly tied to the OTAs’ algorithms, and customer engagement (i.e., the number of reviews) has become the most important factor that the selected OTA (e.g., Ctrip.com) uses to gauge a hotel’s popularity and to determine, through its customer recommendation system, where the hotel is ranked in the search results.
Studies show that customer engagement in the online platform reflects the popularity of service providers and can influence as much as 20%–50% of online purchase decisions (Kumar and Pansari 2016; Thakur 2018). Customers are likely to log on to online platforms like Ctrip.com to check reviews as part of their service evaluation (Shao and Kenney 2018; Ye, Law, and Gu 2009). On the one hand, research suggests that online reviews can affect the business of service providers on multisided platforms such as online marketplaces (Gur and Greckhamer 2019; Jin, Ji, and Gu 2016). On the other hand, by posting online reviews, customers can generate important social value within the community (Kumar and Pansari 2016; Thakur 2018). Therefore, this study considers the act of posting online reviews as one of the most influential expressions of customer engagement and takes the total number of customer reviews of each hotel in 2016 as a proxy for online customer engagement. Table 3 displays descriptive statistics for the variables used in the study, and we take customer engagement as the dependent variable in our model.
Table 3. Descriptive Statistics for Variables.
Step 2: The Improved kNN Model
In order to measure how similar hotels are, we constructed an improved
To enhance the efficiency of competitor identification from a large-scale online review data, an improved
Given a training data set
where
Sorting the distances of each subset in ascending order, we can get the nearest neighbor set for the test hotel
In the process of calculating the predicted value
After getting the
The key advantages of the improved
This study applies two approaches to evaluate the reliability and validity of the improved
Then, this study weights each hotel attribute differently to reduce the prediction bias of the improved
Step 3: LDA Model
The process of competitor identification uses the quantitative hotel attribute information from the customer reviews and then combines the hotel description data and search ranking information to analyze the competitive relationships among hotels. Although such an analytical method provides managers with an effective way to scan the market for competitors, it overlooks the textual information in customer reviews (i.e., the unstructured data) and in particular the rich information on hotel attributes. According to Mankad et al. (2016), the textual content of customer reviews has more customer insight than the quantitative hotel attribute rating. Therefore, we take the unstructured data in customer reviews as the object and use the LDA model to extract customer insights from the textual comments. The LDA model is a powerful and widely used topic-modeling algorithm (Blei 2012; Blei, Ng, and Jordan 2003). It constructs a three-layer Bayesian structure of documents, topics, and key words and regards documents as a probability distribution of implicit topics and topics as a probability distribution of key words. In addition, the distribution of key words for different topics varies, so all reviews can be viewed as consisting of two probability distributions:
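In the notation conventionally used for LDA, the two distributions can be written as follows. The symbols are our own labels and are not taken from the original text: θ_d is the topic distribution of review d, φ_k is the key word distribution of topic k, K is the number of topics, and V is the vocabulary size. The last expression shows how the two distributions combine to give the probability of key word w in review d.

```latex
\theta_d = \bigl(p(z = 1 \mid d), \dots, p(z = K \mid d)\bigr), \qquad
\phi_k = \bigl(p(w = 1 \mid z = k), \dots, p(w = V \mid z = k)\bigr),

p(w \mid d) = \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d)
            = \sum_{k=1}^{K} \phi_{k,w}\, \theta_{d,k}.
```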
For a given set of text data, LDA uses a probabilistic framework to infer the set of hidden topics from the customer reviews and decomposes each review into a mixture of these topics with different probabilities (Blei 2012; Blei, Ng, and Jordan 2003). The text information from customer reviews is clustered into different topics, and the attribute key word vector of the data is constructed from the topics. The focal hotel can then benchmark its service against that of competitors and pay more attention to the relevant topics.
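The decomposition described above can be illustrated with a minimal sketch. It assumes collapsed Gibbs sampling as the inference method, which is one standard way to fit LDA and is not necessarily the procedure used in this study. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy reviews are all hypothetical.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not the study's code).

    docs is a list of tokenized reviews. Returns (theta, phi): theta[d][k] is
    the share of topic k in review d, phi[k][w] the probability of word w in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]          # topic counts per review
    topic_word = [Counter() for _ in range(n_topics)]   # word counts per topic
    topic_total = [0] * n_topics                        # token counts per topic
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic, up to a constant.
                probs = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=probs)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + n_words * beta) for w in vocab}
        for k in range(n_topics)
    ]
    return theta, phi


# Hypothetical tokenized reviews, not taken from the Ctrip.com data.
reviews = [
    ["metro", "station", "location", "convenient"],
    ["location", "metro", "close", "station"],
    ["breakfast", "pool", "gym", "breakfast"],
    ["pool", "gym", "spa", "breakfast"],
]
theta, phi = lda_gibbs(reviews, n_topics=2)
for k, dist in enumerate(phi):
    top_words = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top_words}")
```

Each row of `theta` is the topic mixture of one review, and each entry of `phi` is the key word distribution of one topic, which corresponds to the attribute key word vectors mentioned above.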
Results
We conduct an in-depth analysis of the data (including hotel description data, hotel search rankings, and review comments) and empirically test how well the analytical framework can identify key competitors and determine the importance of particular hotel attributes in different market segments. The outputs can be used to identify the focal hotel’s strengths and weaknesses, with visual representation of the results, all of which support more informed strategic marketing decisions. Moreover, this study also takes an unstructured view to harvest customer comments from the competitors’ online reviews. It allows managers to identify “hot topics” that capture users’ perceptions of the hotel and compare the “hot topics” among competitor hotels to make an appropriate market response within specific market segments.
Identifying the Importance of Hotel Attributes
After the data collection, all variables are normalized to prevent those with a high variance from dominating those with a lower one. The first step of the improved

The importance of hotel attributes for different hotel star ratings.
The results show that rooms is the most important attribute for all hotels, from two-star to five-star, with weight values of 0.33, 0.24, 0.28, and 0.30, respectively. In terms of other attributes, the outcomes are quite different for hotels with different star ratings. For two-star hotels, ranking (0.28) is the second most important attribute and has a more important influence on customer engagement than for other hotels, which is also true of price (0.17). Compared with two-star hotels, the weight of ranking (0.21) for three-star hotels is reduced but is still second in importance. Price (0.14) is third, followed by location convenience (0.09), recommendation (0.08), staff service (0.07), and customer rating (0.07). For four-star hotels, the weights of ranking (0.16) and price (0.10) are lower than for two- and three-star hotels. They both become less important in this segment. Additionally, the weight of location convenience (0.12) is higher, and location becomes the third most important attribute, followed by a cluster of three attributes, recommendation (0.07), customer rating (0.08), and staff service (0.08). For five-star hotels, location convenience (0.18) is the second most important attribute and is higher than for hotels with other star ratings. Note that ranking (0.12) and price (0.07) are less important for five-star hotels than they are for other hotels. Nevertheless, ranking is still the third most important attribute for five-star hotels. It is followed, in order, by recommendation (0.11), staff service (0.09), and customer rating (0.08), and the remaining attributes are cleanliness (0.03) and facilities (0.02).
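To illustrate how such attribute weights could enter a similarity calculation, the sketch below normalizes each attribute and ranks hotels by a weighted Euclidean distance to a focal hotel. This is a plain weighted nearest-neighbor search under our own assumptions and does not reproduce the improved model's actual distance measure or clustering step. The weights are the five-star values reported above. The hotel names, attribute values, and function names are hypothetical.

```python
import math

# Attribute weights reported above for five-star hotels (they sum to 1.00).
WEIGHTS_FIVE_STAR = {
    "rooms": 0.30, "location": 0.18, "ranking": 0.12, "recommendation": 0.11,
    "staff_service": 0.09, "customer_rating": 0.08, "price": 0.07,
    "cleanliness": 0.03, "facilities": 0.02,
}

# Hypothetical hotels, for illustration only.
HOTELS = {
    "Focal hotel": {"rooms": 400, "location": 4.8, "ranking": 10, "recommendation": 98,
                    "staff_service": 4.8, "customer_rating": 4.8, "price": 1200,
                    "cleanliness": 4.8, "facilities": 4.7},
    "Hotel B": {"rooms": 410, "location": 4.7, "ranking": 12, "recommendation": 97,
                "staff_service": 4.7, "customer_rating": 4.7, "price": 1150,
                "cleanliness": 4.8, "facilities": 4.6},
    "Hotel C": {"rooms": 150, "location": 4.0, "ranking": 80, "recommendation": 90,
                "staff_service": 4.2, "customer_rating": 4.2, "price": 700,
                "cleanliness": 4.5, "facilities": 4.1},
    "Hotel D": {"rooms": 380, "location": 4.6, "ranking": 20, "recommendation": 96,
                "staff_service": 4.6, "customer_rating": 4.6, "price": 1100,
                "cleanliness": 4.7, "facilities": 4.5},
}


def min_max_normalize(hotels, attributes):
    """Rescale every attribute to [0, 1] so that no attribute dominates by scale."""
    low = {a: min(h[a] for h in hotels.values()) for a in attributes}
    high = {a: max(h[a] for h in hotels.values()) for a in attributes}
    return {
        name: {
            a: (h[a] - low[a]) / (high[a] - low[a]) if high[a] > low[a] else 0.0
            for a in attributes
        }
        for name, h in hotels.items()
    }


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance (an assumed form, not the study's formula)."""
    return math.sqrt(sum(w * (x[a] - y[a]) ** 2 for a, w in weights.items()))


def nearest_competitors(focal, hotels, weights, k):
    """Return the k hotels closest to the focal hotel in weighted attribute space."""
    scaled = min_max_normalize(hotels, list(weights))
    others = [name for name in scaled if name != focal]
    others.sort(key=lambda name: weighted_distance(scaled[focal], scaled[name], weights))
    return others[:k]


competitors = nearest_competitors("Focal hotel", HOTELS, WEIGHTS_FIVE_STAR, k=2)
print(competitors)
```

Because rooms and location convenience carry the largest weights in this segment, differences on those two attributes move a hotel out of the focal hotel's neighborhood faster than equally sized differences on cleanliness or facilities.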
Identifying the Competitor Set
The second step is to divide the training data set into four clusters based on the weighted
Generally, when a customer inputs the name of a focal hotel in the search box on Ctrip.com, other hotels will additionally appear among the search results, as shown in Appendix D. These additional hotels are recommended according to the search algorithm of Ctrip.com. Moreover, customers tend to include the top-ranked hotels recommended by an OTA in their consideration sets for a final selection decision (Chen and Yao 2016). Consequently, it is of interest to examine both customers’ and the OTA’s competitor sets (i.e., competitors identified from the two different perspectives) and to compare “matches” and “mismatches.” The purpose of this comparison is to evaluate the effectiveness of the managerial competitor identification model proposed in this study (based on the customer perspective) and to find the reason for any “mismatch.”
We randomly take a two-star hotel, Home Inn Guangzhou, and a five-star hotel, the Westin Guangzhou, as two focal hotels, and identify their competitor sets using the proposed model and the OTA recommendations. In Figure 3, focal hotels are highlighted in red, and the hotels within red dotted boxes are competitors that are identified (matched) from both the

Identifying the competitor set with different attributes. (A) Competitor sets of Home Inn Guangzhou based on the improved
In addition, we note that Home Inn Guangzhou has advantages on the customer rating, location convenience, staff service, cleanliness, and recommendation attributes, and the room price is at a medium to high level in Figure 3A. Although Home Inn Guangzhou has these advantages compared with its competitors, it is not the most popular hotel in the market. A practical recommendation for the hotel managers of Home Inn Guangzhou would be to prioritize its investment in its search ranking and facilities to achieve more customer engagement. It also can be noted that the popular nearby hotel sets recommended by the OTA include three-star as well as two-star hotels. Thus, we find that the OTA has a tendency to cross-sell other star-rated hotels to customers. In Figure 3C, our approach identifies that staff service is the Westin Guangzhou’s distinct advantage, and the location and customer rating attributes are superior, but a few competitors have similar characteristics to the focal hotel. The facilities and recommendation attributes and customer engagement are at a medium to high level, while the rest are at or below the average level of competitors. We conclude that the Westin Guangzhou has better customer engagement than its competitors because of its competitive advantages in staff service, location convenience, and customer rating but needs to address weaknesses concerning online search ranking, room cleanliness, and pricing.
Predicted Customer Engagement
After

Predicting hotel customer engagement for different hotel star ratings. (A) Two-star hotels. (B) Three-star hotels. (C) Four-star hotels. (D) Five-star hotels.
Model Evaluation
We evaluate the performance of the improved
Comparison of the Improved
To benchmark our improved
Robustness Check
After evaluating the model, we perform a check to ascertain that the stability of the improved
One-Way Analysis of Variance of 2016 and 2018 Results for the Different Hotel Star Rating.
As a further check on robustness, we pick 30 hotels at random and plot the 2016 and 2018 predicted values in Appendix F. The plots are similar, which again indicates that the improved
Unpacking Customer Reviews
We used a data crawler and downloaded all customer reviews of the Westin Guangzhou, and its nine competitors identified from the improved
The “perplexity” value was used as the benchmark to determine the number of topics (Blei, Ng, and Jordan 2003). The smaller the perplexity value, the better the model fits the data for a given number of topics. Consistent with the study of Hoffman, Bach, and Blei (2010), the perplexity value was evaluated using five-fold cross-validation, and the results suggest that five topics are the most appropriate for the LDA model used in this study. The five topics were interpreted as location, amenities, value, experience, and transaction. Specifically, location is the place where the hotel is situated and whether it is convenient for customers; amenities indicates the useful services and features provided when staying at the hotel; value is associated with the customer perceived value for money after or during the hotel stay; experience mainly refers to the overall experience of the customer’s stay; transaction is mostly about transactional behaviors and the mechanics of the customer’s stay (it mostly appears during check-in or check-out and/or before customers arrive at the hotel). These topics capture most of the textual information in customer reviews. However, these topics are destination specific, in that they might not be the same had our data been collected from a different set of hotels, particularly for location (Topic 1) and experience (Topic 4). Each topic contains different attribute key words, each with a particular probability of belonging to that topic. Table 6 lists the 10 attribute key words that were most likely to appear in each topic, in descending order.
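For reference, the perplexity criterion used above to choose the number of topics is conventionally defined on a held-out set of M reviews, following Blei, Ng, and Jordan (2003). The symbols are the standard ones and are not taken from the original text: w_d denotes the words of review d, and N_d is its length. Under five-fold cross-validation, this quantity would presumably be averaged over the five held-out folds for each candidate number of topics.

```latex
\mathrm{perplexity}(D_{\text{test}})
  = \exp\!\left\{ - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right\}
```

A lower value means the fitted model assigns a higher likelihood to reviews it has not seen.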
Table 6. Most Likely Words in Each Topic.
The strength of each topic can be computed by the LDA model, and Table 7 compares topic strengths for the Westin Guangzhou with those of its nine competitors identified from the improved
Table 7. The Strengths of the Five Topics Across 10 Competing Hotels.
Based on the topic strengths of different hotels, we conducted an additional analysis to classify the sentiments of the customer reviews and thus better understand each identified topic. We used the overall ratings as a proxy for the “emotion recognition” of the review comments (Liu 2006; Ye, Law, and Gu 2009): comments accompanied by a rating above the overall average online review score were classified as positive, comments accompanied by a rating below three were assumed to be negative, and the rest were designated neutral. As neutral comments presumably do not significantly drive customer behavior (Liu 2006), this study considers only positive and negative comments to analyze the preferences of customers. Appendix G shows two examples (the Westin Guangzhou Hotel and its major competitor, the Sofitel Guangzhou Sunrich Hotel) of co-occurrence networks generated using this approach. The size of each node represents the frequency of the key word, and the line thickness represents how often a particular pair of key words occurred in the same comment.
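The rating-based labeling and the co-occurrence counts behind such networks can be sketched as follows. The sketch assumes that each comment carries its own one-to-five rating and that node frequency is the number of comments containing a key word. Both are our reading of the procedure, and the function names and toy comments are hypothetical.

```python
from collections import Counter
from itertools import combinations


def label_by_rating(rating, overall_average, negative_below=3):
    """Use the rating as a sentiment proxy: positive, negative, or neutral."""
    if rating > overall_average:
        return "positive"
    if rating < negative_below:
        return "negative"
    return "neutral"


def cooccurrence(comments, keywords):
    """Count key word frequency (node size) and pair frequency (edge thickness)."""
    node_freq = Counter()
    edge_freq = Counter()
    for tokens in comments:
        present = sorted(set(tokens) & keywords)  # each key word once per comment
        node_freq.update(present)
        edge_freq.update(combinations(present, 2))
    return node_freq, edge_freq


# Hypothetical (rating, tokenized comment) pairs, for illustration only.
rated_comments = [
    (5, ["staff", "friendly", "location", "great"]),
    (5, ["location", "great", "metro"]),
    (2, ["staff", "slow", "checkin"]),
    (4, ["breakfast", "ok"]),
]
overall_average = sum(r for r, _ in rated_comments) / len(rated_comments)
labels = [label_by_rating(r, overall_average) for r, _ in rated_comments]

positive = [tokens for (r, tokens), lab in zip(rated_comments, labels) if lab == "positive"]
nodes, edges = cooccurrence(positive, {"staff", "location", "great", "metro"})
print(labels)
print(nodes.most_common())
print(edges.most_common())
```

Running the same counts separately on the positive and negative subsets gives one network per sentiment, which is how two hotels could be compared topic by topic.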
Discussion and Implications
Based on an analysis of over 8 million customer reviews extracted from Ctrip.com, one of the world’s largest hotel OTAs, the calculated weights for different attributes reveal customers’ preferences regarding hotel selection. “Rooms” is the most important attribute affecting customer engagement across all star ratings. The number of hotel rooms is closely related to the star rating, in that four- and five-star hotels have significantly more rooms than two- and three-star hotels. This indicates that hotels with more rooms are likely to receive more reviews, which in turn leads to more bookings being made. This finding concurs with that of Phillips et al. (2015). It is important to note that once hotels are built, the rooms attribute is fixed. However, it is not reasonable to ignore this attribute, especially for a hotel chain, as managers should consider it in their site selection and procurement of hotels (Song and Ko 2017).
The findings show that the importance of particular hotel attributes varies across different hotel segments according to hotel star ratings. For two-star hotels, ranking is the second most important attribute after rooms. As two-star hotels account for a large proportion of the total number of city hotels, their higher ranking in the search results and higher number of hits per hotel make it easier for them to be included in customers’ consideration sets (Chen and Yao 2016). Therefore, the managers of budget hotels should have a strategy to optimize their return in a search through an OTA. Also, the results indicate that in this market segment, price is of concern to customers, more so than location, but this finding is the opposite of that reported by Mohammed, Guillet, and Law (2014). Similar to two-star hotels, the ranking attribute is important for three-star hotels. The customers are likely to have greater expectations of a three-star hotel, for instance, in terms of staff service, facilities, and cleanliness. Regarding the customers of four-star hotels, hotel location (close to scenic resorts, a commercial district, and a public transport hub) becomes more important than the price when they select a hotel. Also, customers pay more attention to recommendation, customer rating, and staff service.
For five-star hotels, although location is only the second most important attribute, it has a higher weight than for hotels of other star ratings. The price and ranking attributes are given lower weights than for other hotels. On the one hand, this suggests that customers are willing to spend more on a convenient location. On the other hand, Pavlou and Dimoka (2006) point out that the information on these luxury hotels can be easily found on OTAs’ websites, as there are a relatively small number of them in a given city. We also note that the five-star hotels have a lower weight on staff service, facilities, and cleanliness than on location convenience, which is counter to the findings of Nasution and Mavondo (2008). A possible explanation is that the customers take it for granted that a luxury hotel will offer high-quality services and facilities, and so they pay more attention to hotel location instead.
Apart from important attributes identified among different hotel segments, this study compared the list of competitors identified from a customer perspective with the list of the hotels recommended by the OTA (i.e., Ctrip.com) and found that the two lists of hotels do not completely match. Thus, there may be inconsistencies between the consumer perspective and OTAs’ interests. Additionally, although the proposed
Implications for Research
This study extends the application of online reviews in service research. Although other recent studies have given attention to competition analysis, service research has mostly involved analysis of survey and archival data (Wieringa and Verhoef 2007; J. Wu and Olk 2014), and the findings are limited by the sources of data (such as cross-sectional data and small sample sizes) and simplistic approaches to the analysis (such as the use of ordinary least squares regression models). As a result, the conclusions usually are neither reliable nor robust (Gao et al. 2018). In contrast, this study extends the literature by proposing and verifying an analytical framework based on a set of machine-learning techniques that has rarely been applied in the competitive environment of service industries (Gur and Greckhamer 2019). We proposed an improved
Moreover, the view that approximates the customer perspective on competitors can reduce managerial “blind spots,” shortsightedness, and competition asymmetry (Baum and Lant 2003), which is consistent with the study of Li and Netessine (2012). While previous studies have identified that attributes such as location, company size, price, and service are often the main factors that define competitors in the hotel industry (J. Y. Kim and Canina 2011), this study indicates that the importance of these attributes varies with hotel star ratings. Last but not least, according to the analytical framework, the market environment can be displayed graphically. The proposed approach can also be adopted in different service industries to determine the perceived quality of services and develop an effective strategy for service improvement.
Implications for Practice
Our findings have important implications for service managers, online consumers, and OTAs in harvesting online reviews to improve their service performance and decision making.
First of all, the proposed analytical framework can help service managers to gain a better understanding of their key competitors as well as customers to make appropriate market responses in a timely manner. Online reviews contain vast amounts of information and reflect customers’ demand preferences for hotels. The improved
The developed analytical framework also provides clear managerial implications for online consumers and OTAs by not only illustrating an effective approach but also producing several visual analytics as examples to follow. In particular, the developed analytical framework consists of competitor set graphs (as shown in Figure 3), LDA topic modeling (presented in Table 6) together with the key word co-occurrence networks (in Appendix G). These could be developed into a software application. Such an app would help online consumers to conduct content analysis and offer valuable insights by harvesting a large number of reviews from different platforms to support their hotel selection. Furthermore, the analytical framework developed in this study can be applied within a broad range of fields to support OTAs’ information management, processing, and interpretation. For example, the analytical approach can be adopted by OTAs to improve their analytics. In this way, it can further enhance their operational practices to offer better search results.
Conclusions and Future Research Directions
The purpose of this study is to provide insights into the procedures that could be used by service managers to identify competitors and recognize the relative importance of different product attributes based on online customer reviews. We propose an analytical framework based on a set of machine-learning techniques, including an improved
First of all, this study identifies competitors from the customer perspective. However, as competitors can be determined from different perspectives, it might be difficult for managers to agree with the results generated from our proposed framework. Thus, further research should include other perspectives (e.g., the managers’ perspective) to identify other competitor sets, and then compare these, so as to formulate more accurate marketing strategies. Secondly, this study could be extended by segmenting the market in ways other than star ratings, for example, by hotel brand or type (e.g., business and leisure). In this way, future research could more systematically analyze competitors from multiple perspectives for multiple types of market segmentation. Finally, we use hotel data collected from Ctrip.com, which is one of the largest OTAs in China. But taking data from just one source may make the results prone to bias. Future studies can be conducted to verify the analytical framework using data from different sources. Given the dramatic rise of peer-to-peer services (e.g., Airbnb and Flipkey.com) in the hospitality marketplace, the developed framework should be used to investigate the impact of the sharing economy by comparing hotels at a specific destination, available via OTAs (e.g., Expedia and Ctrip.com), with accommodation available through peer-to-peer platforms. This would be of particular interest, as studies indicate that peer-to-peer (sharing) platforms offer a broader range of products and services than traditional OTAs (M. Cheng 2016). Our proposed analytical framework could be used to study the sharing economy and determine the potential changes to the customer experience through the use of different peer-to-peer services.
Supplemental Material
Supplemental Material, Executive_Summary_JSR for Harvesting Online Reviews to Identify the Competitor Set in a Service Business: Evidence From the Hotel Industry by Fei Ye, Qian Xia, Minhao Zhang, Yuanzhu Zhan and Yina Li in Journal of Service Research