Abstract
This Managerial Perspective considers the use of ChatGPT by hotels as a tool for replying to their property’s online consumer-generated media postings. As readers would certainly know, ChatGPT is a free artificial intelligence software program designed to assist users with tasks requiring intelligence to perform. In this brief study, test participants were provided actual posted online TripAdvisor review comments along with the responses to these that had been posted by hotel management as well as a second response created by ChatGPT, with the request that the participant indicate which response was real and which was produced by the computer. A second request asked the participant to indicate which they felt was “better.” The results suggest that ChatGPT does a relatively effective job of mimicking authentic responses. These findings, and the discussion of relevant theory, will be useful for hospitality industry managers and academics as they consider the future use of artificial intelligence software for performing this important albeit often tedious task.
Introduction
ChatGPT, the artificial intelligence [AI] based website introduced to the public at the end of 2022 by the technology company OpenAI, has become near-universally familiar in an exceedingly short period of time. From the developers of the software, we share the following: “OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity (OpenAI, 2022b).” The company further explains that it had “trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests” (OpenAI, 2022a).
The following presents the findings of a brief study that had as its focus the managerial responses to reviews that most hotels post in response to their consumer-generated media [CGM] review postings on sites such as TripAdvisor, Booking.com, and so on. While issues related to CGM have been extensively studied in the hospitality and tourism literature (e.g., LaTour & Brant, 2022; Litvin et al., 2018; Litvin & Sobel, 2019), we were unable to identify any published academic research that looked at how AI is being used, or could be used, to generate CGM responses. It is our expectation that AI, with the introduction of ChatGPT and its competitors, will be a tool too enticing for hoteliers to not at least consider as a replacement for humans when doing the important but tedious task of responding to CGM reviews. But should hotels remove the human touch and instead rely upon a computer program, smart as it seems to be, to perform this work? Before making such a transition, there are issues that should be considered, several of which this study explores.
Consumer-Generated Media Managerial Responses and AI
It has become the trend for hotels to respond to virtually all postings on consumer review sites—thanking reviewers for their positive comments and apologizing to those posting negative reviews (Anderson & Han, 2016). An article published a decade ago by Litvin and Hoffman (2012) first addressed the question of whether it was a good investment by hotels to craft and respond to online reviews. At the time, few hotels did so. Their research determined, however, that shoppers scanning through hotel reviews to help select a property for an upcoming trip—the audience in fact being targeted, as the original review poster rarely returns to the site to see the response—reacted favorably to management responses. Considerable subsequent research has confirmed this initial view. Mathwick and Mosteller (2017) see management responses as a strong expression of customer engagement, allowing companies to stay connected with their customers. Similarly, Wei et al. (2013) revealed that management responses to positive reviews were interpreted by consumers as customer appreciation actions and that responses to negative reviews were perceived as evidence of a concern for managing customer relationship, protecting online reputation, and enhancing service recovery. Further corroborating these findings, Zhang et al. (2020) demonstrated the effectiveness of personalized management responses, whether to positive or negative reviews, in improving hotels’ online ratings. Perhaps of most significance, in research that elevated the discussion from conceptual to one based upon operational performance, Xie et al. (2017) determined that the volume of management responses significantly and positively correlated with RevPAR across all hotel classes. Given the above, the now ubiquitous policy of hotels responding to online reviews would seem most likely to remain the norm.
Complementing the abovementioned discussion, it is important to provide a brief overview of how AI has evolved and how this could lead to its use as a CGM tool. As noted by Jakesch et al. (2019, p. 1), “What used to be Computer-Mediated Communication (CMC) is turning into AI-Mediated Communication (AI-MC); interpersonal communication not simply transmitted by technology but augmented—or even generated—by algorithms to achieve specific communicative or relational outcomes.” These platforms, however, commented Rapp et al. (2021) rely solely on written words and do not utilize other aspects of communication, such as speech intonation and facial expression. But now we have ChatGPT, which seemingly differs from other text-based chatbots, per both Abdullah et al. (2022) and Shen et al. (2023), for its ability to recognize context and respond to requests in a coherent, human-like, and conversational way that seems more natural than robotic.
Before ChatGPT, most chatbots were automata used for tasks such as web crawling and notifications, all of which have limited interactivity with the user (Grudin & Jacques, 2019). In a review of 10 years of research related to text-based chatbots, conversational issues emerged as a recurring theme, with errors and biases quite common (Rapp et al., 2021). But today, with ChatGPT’s ability to converse—as will be illustrated below—it seems reasonable to anticipate, as AI software continues to mature and gain sophistication, that it will become increasingly difficult for individuals to differentiate between a machine-generated and a human response, thus extending its potential applications and reach. As such, the following, perhaps obvious, questions have emerged: Could not a hotel’s CGM responses be effectively drafted utilizing the AI platform? And would it not then be a cost-effective idea to replace current employee time and effort with ChatGPT? Before making such a determination; however, the issue of how consumers will react when they cannot be sure that what they are reading is human-generated needs to be considered. Such a concern leads to the issue of authenticity, as discussed below.
Authenticity
The concept of authenticity is an important theme of our study. From a neurological perspective, Huang et al. (2011) determined that the human brain responds differently when viewing original, ergo authentic, artwork, versus when viewing work that the person has been told is a copy of the original. No matter how accurate the replication, participants’ brains reacted differently during the viewings. Authenticity, or at least one’s perception of authenticity, was critical to the way the artwork was viewed. From a tourism perspective, the perception of destination authenticity similarly matters, as individuals convert their perception of authenticity into “higher value” (Carroll, 2015), rating destinations they consider to be genuine, honest, and true more positively than those seen to be less authentic. Specifically considering CGM postings, the general sense of the literature similarly suggests that authenticity matters. As noted by Shulga (2021), the receiver’s perception of the authenticity and trustworthiness of consumer reviews positively affects their interpretation of the message. An interesting comment by Steiner and Reisinger (2006, p. 300) suggests that authenticity “is part of a long philosophical tradition concerned with what it means to be human.” This would lead one to question whether AI-produced commentary, decidedly not “human,” can ever be seen as authentic? Determining this is increasingly important, for, per Kim and Baker (2017, p. 312), “authenticity has overtaken quality as the prevailing purchase criterion.” Thus, the acceptance of ChatGPT postings as authentic is clearly a topic deserving of exploration, for it is apparent that hoteliers and others must understand and consider the issue of perceived authenticity of AI-produced review responses before “delegating” the response task to the emerging technology.
The research described below considers both the quality and the perceived authenticity of ChatGPT-generated review responses by having participants take an emailed “quiz” that asked them to compare ChatGPT’s responses to CGM hotel review postings with those written by hotel management. Specifically, the “quiz,” after first sharing each TripAdvisor posting, asked participants to identify which of the two provided responses was the human-generated, versus AI-generated, response. This was followed by a second question that asked their opinion as to which of these two they felt was “better.” Given the nascent nature of the software, we hypothesized that people would successfully identify the human-generated review responses. We also hypothesized they would determine the humans’ responses to be superior. But before creating our “quiz,” we wanted to know if ChatGPT was able to assist with our study.
Exploration and Application of the Software for This Study
Could ChatGPT craft credible review responses without having innate knowledge of the property being reviewed on TripAdvisor? The software, which as noted above is quite conversational, was thus asked: “Can you read TripAdvisor reviews which I will provide and create management responses for these reviews?” ChatGPT responded: Yes, I can read TripAdvisor reviews and create management responses. . .First, I would read the review and understand the main points that are being made. Next, I would analyze the review, paying attention to any specific issues or concerns that are mentioned. Based on this analysis, I would generate a response that addresses the points raised in the review and respond to any issues or concerns that were mentioned.
It was clear from the above response that the program had understood the enquiry and was ready and able to take on the requested task. As such, hotel reviews posted on TripAdvisor for five different hotel properties were selected for testing. To select appropriate reviews and responses for testing, the authors read through the TripAdvisor reviews of multiple hotels located in the same Southern U.S. city and selected five, each from a different hotel, for testing. These ranged from one to five-star reviews. Capturing reviews and responses from different hotels provided a range of response styles and the varied stars awarded by reviewers allowed testing of responses to both satisfied and dissatisfied guests.
The five selected reviews were then provided to ChatGPT, asking that managerial responses to these be drafted. Within seconds, the computer provided the five responses. These were, however, quite redundant, with many of the same phrases and comments repeated in each response. As such, entered into the system was the clearly non-technical request: “The responses are getting too redundant. Can you mix them up a bit?” The computer responded: I apologize if the responses I provided seemed repetitive to you. As a language model, I do not have personal experiences or specific knowledge, so I rely on the information provided to me to generate responses. If you would like me to mix up the responses, I can try to vary the language and structure of the responses to make them less repetitive.
We then replied: “Yes, please vary the language and the structure of the responses.” To which ChatGPT replied: “Certainly, I will do my best to vary the language and structure of the responses while still addressing the specific issues and concerns raised in the reviews.” With our reviews and review responses then in hand, data collection was initiated. The approach and results of that effort are discussed below.
Please note that we have appended the “quiz” to allow readers to determine for themselves whether they were able to distinguish the human versus ChatGPT responses, to allow readers to evaluate which they felt to be the better of the two, and to allow comparison of their results with those of our participants.
Test Method
The approach that guided the above was a modified Turing Test, described by Mikulak-Klucznik et al. (2020) as an effective method for the study of AI when conducting a “human versus machine” comparison study. Mikulak-Klucznik et al. (2020) used the method, first suggested by Turing and Haugeland (1950), to compare then-nascent computer intellect with human intellect when seeking to determine a panel of expert chemists’ ability to differentiate between diagrams reflecting the synthesis of complex natural products crafted by the computer versus those of actual chemists. The Mikulak-Klucznik et al. (2020) test found their experts unable to make the distinction. The authors then asked their participants to judge the comparative “synthetic elegance” of the two diagrams, with the machine-designed pathways found to be “slightly more elegant.” (In the current study, instead of “elegant,” as noted above, participants were asked which was “better.”) Another example of the use of a modified Turing Test was employed by Candello et al. (2017) to learn if their study’s participants were able to determine whether the financial advice they were being asked to evaluate was crafted by machine or human. Once again, humans were unable to correctly distinguish between computer-generated and human-generated work.
Study Participants
The non-scientific data collection method employed was a function of convenience and efficacy. Responding to our participation request was neither quick nor easy. As would have been noted by readers who took the appended “quiz,” providing thoughtful responses required significant reading time and thought, well beyond, we were concerned, what we could expect from a random-sample survey-panel completing a typical Likert-type-scale survey instrument. As such, approximately 100 participation invitations were sent to known recipients of the lead author, asking via email if they would spend some time reading the appended set of reviews and responses, and to then provide their evaluative “personal expert opinion” as to which of the two sets of review responses for each of the five review postings was human versus computer-generated and which of these two was “better.” No promise of anonymity was provided, and with the email responses, it was obvious to the respondent that we would know both who had responded and what their responses had been. No identifying information, however, is shared in this paper. Received were a total of 94 responses, 93 of which were usable (one attached set of answers would not open). These respondents were generally well-educated (83% had at least a university degree), skewed female (68%), and had a median age of 32 years (range 18 to 75), providing a participant pool adequate to accommodate the issue being studied.
Survey Format
The instrument’s five sets of hotel reviews, in terms of star ratings, were randomly ordered. Similarly, determining which managerial response was presented first, the actual response posted by the property or the response generated by ChatGPT was guided by a random number generator. In addition to capturing their responses to these 10 evaluative questions and the three questions used to capture the limited demographics presented above, five complementary questions regarding the participant’s CGM usage constituted the participation request.
Test Results
As noted, to provide some general background regarding the participant’s CGM usage and attitude, five inquiries were made. When asked how often they consulted TripAdvisor or other CGM when planning a trip—with response options that ranged from “1=never to 5=always”—their mean response of 3.0 (SD=1.3) was exactly in the middle. A significantly greater usage of CGM was expected. A second question asked the importance of prior guests’ posted reviews when making travel decisions—with response options that ranged from “1=not important to 5=very important.” This question elicited a mean response of 3.8 (SD=1.2). Thus, while CGM reviews were not used as often as expected, they were seen as relatively important across the sample. The next query asked how often the respondent reads a hotel’s responses to its CGM reviews—with response options that ranged from “1=never to 5=always.” The mean response was a moderately positive 3.5 (SD=1.4). Participants were then asked how much influence a hotel’s responses to review comments had on their evaluation of the hotel—with response options that ranged from “1=no influence to 5=great influence.” The mean response of 3.0 (SD=1.1) suggests these do not seem to be critically important. Finally, however, when asked if the non-response of a hotel to online reviews would be a concern—with response options that ranged from ‘1=no, not at all to 5=yes, it would influence me greatly’—a positive lean was noted, with a mean response of 3.2 (SD=1.3). Summarizing the above, it seems that CGM is generally used by the participants, though not necessarily relied upon. And while management’s responses to posted reviews are not particularly influential, the absence of such reviews is seen as a negative for the property. Given these findings, and in line with the previously discussed literature, it would thus seem important that hotel management continue to respond to their property’s review postings. But, would handing over the job to ChatGPT or another AI software be effective? Based upon the following results, this is a question certainly worthy of consideration.
“Quiz” Responses
The “quiz” test results were interesting and informative. As reflected in Table 1, it was determined that ChatGPT was not particularly effective at fooling people into believing it was real, at least when presented side-by-side with a human response, as was the format for our Turing Test. Participants identified the “real” versus the AI response correctly for four of the five sets of review responses. Collectively, of a possible 465 “votes” (93 participants x 5 sets of review responses), participants correctly identified 67% of the reviews as “real” versus machine-generated. These test results, consistent with our hypothesis, suggest that ChatGPT has some learning to do before it is able to provide responses that come across as “human.”
“Quiz” Answers and Statistics.
Note. The datapoints without an asterisk had a Binomial test difference of p > .05, and thus for these, no clear response preference majority was identified.
Binomial test difference statistically significant at p < .01 (e.g., for the five-star hotel rating, the 77% of participants who correctly identified the human response as “B,” and the 74% of participants who indicated that “B” was the better response, reflected the views of a statistically significant majority of participants who had taken the “quiz.”).
When participants were then asked which review response they felt was the better of the two, the results were less clear. For two of the five review responses, the human response was most often selected as best. For two others, (one dramatically so) the ChatGPT response was preferred. The fifth set of responses was statistically undifferentiated. Collectively, looking at all 465 opinions expressed, there was a near even split between those who preferred the human versus the AI responses, with a slight (54% versus 46%), albeit statistically undifferentiated, skew favoring the responses generated by ChatGPT.
We then revisited the dataset to consider the impact of presumed authenticity. For this analysis, we did not consider whether the participant had correctly identified the response creator as either human or computer. Instead, we wanted to learn if the response option they had identified as human-crafted was also the one they felt to be the better of the two. The results were clear, with 68% of the responses selected as “real” also selected as the better response.
Discussion
The computer responses to the tested review postings provide a sense of the quality of work ChatGPT was able to produce with minimal effort by the user. As noted earlier, the quality of these improved significantly from an initial set of responses following a simple and straightforward secondary prompt seeking less redundancy in content and style. With additional “training,” the computer’s output, it seems likely, would have improved still further.
As the test results revealed, it was apparent that users of CGM were often unable to tell whether the responses they were reading were computer-generated. And even when they did, would they care? As noted by Kelly et al. (2022), when AI gets so good that one cannot tell a computer-generated response from a human response, acceptance becomes an “involuntary action.” Thus, once again the question: Should hotels and other service providers transition to ChatGPT to craft their responses? ChatGPT was asked what it thought, with our query receiving the following response: On the one hand, AI systems can provide a convenient and efficient way for hotels. . .to communicate with guests and handle a large volume of inquiries. They can also provide a consistent and standardized level of service, which may be difficult to achieve with human employees. On the other hand, there are ethical considerations related to the potential for job displacement and the need for transparency about the use of AI. . .It is important for hotels and other hospitality businesses to consider the potential impacts on their employees and to implement any changes in a responsible and transparent manner.
Another consideration relates back to the previously discussed issue of authenticity. However, what is “authentic?” Can AI be “taught” to value morals that align with those of the organization that it represents, or will the computer’s values be generic, based upon its broad foundation of knowledge? This will be important to address as we go forward, for, as reflected in the above findings, participants clearly preferred those responses they felt had been generated by humans, i.e., those they felt reflected the authentic sentiment of the hotel’s management. Such a finding is certainly consistent with the previously described artwork experiment conducted by Huang et al. (2011). Moore et al. (2021, p. 9) however note that the concept of authenticity “is a dynamic and evolving expression.” Will the future reveal that people will simply not care whether the review responses they are reading were generated by human or machine? Will the entire concept of authenticity, as this relates to AI, become meaningless? These questions will be important to consider as we move forward with the ever-advancing and improving technology that ChatGPT and its competitors provide, or perhaps foist upon, humanity.
One final time, we asked for Chat GPT’s view on the topic. The response received follows: It is important for hotels and other hospitality businesses to be transparent with guests about the use of AI and to ensure that guests are aware when they are interacting with a chatbot or other AI system. This can help to build trust and ensure that guests feel comfortable and well-informed about the services they are receiving.
There is no reason to disagree with the above computer-generated response. And Moore et al. (2021) may well be correct that in the future it simply may not matter. But for the present, as reflected in our “quiz” results, it seems important that CGM response readers believe what they are reading is “authentic” as people clearly prefer to be reading words they at least believe to have been crafted by a human and not a machine.
We feel that what was learned herein is important and foundational as we move into a new AI reality. While we recognize that this study has limitations, it is hoped that it can serve as a launching point for future research into what is sure to be a very important topic.
Limitations of the Study
The limited size of the sample, the fact that all participants were from the USA, and that these were not selected randomly, are each limitations that point to the need for additional testing. Further, we have no doubt that the technology will continue to evolve and improve—if being more “human-like” is deemed an improvement. In fact, as noted earlier, when constructing our test, we instructed the computer only once as to how we felt its responses could be improved. Had we further “taught” it how to construct a better response, we do not know how much better these might have become and how this would have affected our “quiz” results. If in the future hotel managers delegate this responsibility to the computer, we suspect they will invest significant effort in refining the computer’s output. The result could be quite impressive.
Furthermore, from a testing perspective, a different survey design may be considered in a follow-up study. As noted, a modified Turing Test was employed herein, presenting the human and computer responses side-by-side. This is a standard, well-accepted, and cost-effective research method. However, a replication study that has subjects view and evaluate one or the other of the responses independently would perhaps provide additional independence to their evaluations.
Future Studies
Beyond the aforementioned issues, there are additional future considerations that should be addressed. Most basically, we need to ask whether we should be using “intelligent” computers to do the work that human employees are now producing. The price of ChatGPT (today, free) is hard to ignore and it was clear that our research participants were generally impressed with the quality of the ChatGPT output. But is AI to be the way in our future? This basic question leads to the following future research topics that could be pursued:
Management responses to posted CGM reviews help to convey a sense of sincerity and honesty (Wei et al., 2013). If management responses are generated by text-based chatbots, will consumers perceive them to be sincere and honest? As noted by Fares (2023), while ChatGPT is highly conversational, “it is important to note that the tool is only human-like.” Will this lack of true human emotion render AI tools such as ChatGPT less effective than one might currently expect? Our research did not specifically ascertain the participant’s view re the sincerity of the responses. Future researchers would be wise to more directly measure this important concept.
Particularly for negative reviews or service failures, management responses can help to rebuild the service provider’s reputation, trustworthiness and caring image (Sparks et al., 2016). However, and contrary to the previous comment, will the future “humanness” and learned ability of text-based chatbots to express empathy actually end up encumbering service recovery? This question has been prompted by the Uncanny Valley Theory (Mori et al., 2012), which proposes that too much humanness by a computer engenders “eerie sensations,” which users find to be uncomfortable, and which could potentially render service recovery ineffective. Future research should determine how the Uncanny Valley Theory may apply to CGM.
Earlier in this paper the topic of authenticity was considered. However, can ChatGPT ever truly learn to convey the “Moral Authenticity” (Lehman et al., 2014) of the organization it represents, or will its values be generic, based upon its broad foundation of knowledge? Such questions will indeed be important to consider as we move forward with the still nascent but ever advancing and improving technology.
Conclusion
We feel that much has been learned in this study, but much more needs to be considered as the technology continues to advance. It was clear from our findings that ChatGPT’s ability to mimic the efforts of humans when drafting CGM review responses is not far behind. But we also know, and see in our results, that humans have a bias in favor of the work of humans versus machines. But that is today. What will the future bring? Choi et al. (2017) noted that posting fake consumer reviews on TripAdvisor was both “unethical” and “morally unacceptable.” Undoubtedly, this is true. But will, in the future, those reading a computer-generated response to a CGM review be concerned that what they are reading, fake as it may be, was not the creation of a fellow human? Only time will tell, for we must accept the fact that reality is but a fluid concept (Lehman et al., 2014), based upon looking “backward,” comparing today’s reality with an idealized version of the past. Hopefully, this early research can help us to better understand the present, and better manage that future.
As a final comment, it is interesting to note that Turing and Haugeland (1950) predicted it would take “about fifty years” before computers were sufficiently smart enough to pose as humans, likely envisioning HAL, the Heuristically programmed ALgorithmic (thus HAL) computer from the 1968 movie 2001: A Space Odyssey. These authors may have been off by a couple of decades, but with the introduction of ChatGPT, it certainly seems that HAL has arrived!
Footnotes
Appendix
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, or publication of this article.
