Abstract
An exploratory analysis of verbal protocols from a think‐aloud newsvendor experiment provided deeper insights into the decision‐making process, enabling us to formulate a number of questions that are worth answering in future research. In a think‐aloud experiment, subjects verbalize their cognitions while performing a task; responses are then recorded, transcribed, and analyzed. A majority of the subjects struggled with the abstractness of the business setting and were keen to know information on the product type, industry setting, decisions taken in the past, competitor's situation, etc. A large portion of the participants correctly identified the overage and underage costs, but failed to convert that information into the optimal order quantity. Finally, the bias in the order quantity was significantly influenced by the specific type of risk (overage or underage) that was identified closer to the decision, alluding to the presence of a recency effect. As a first application of verbal protocol analysis to inventory decision making, this study gives us an opportunity to highlight the strengths and weaknesses of this research methodology.
1. Introduction
After many years of focusing on rigorous mathematical analysis, researchers in operations management have only recently started studying the behavioral aspects of the decision‐making process. Amaral and Tsay (2009) is a recent example of this growing stream of research. While it should not have been a big surprise, it was observed that the decisions made by real people were substantially different from the optimal decisions predicted solely based on mathematics. These observations resulted in a significant change in the way people approached research in operations management. Boudreau et al. (2003) and Bendoly et al. (2006) provide detailed reviews of experimental and behavioral research in operations management. The newsvendor problem, which deals with stocking level decisions in the presence of uncertainty and costs associated with overstocking and understocking, has been a popular domain for behavioral research.
Schweitzer and Cachon (2000), studying end‐point decisions over 30 periods, reported that there was significant anchoring around the mean of the demand distribution and that the subjects always chose a stocking level that was between the mean demand and the optimal quantity. Bolton and Katok (2008) confirmed this phenomenon in experiments over a longer time horizon and further demonstrated that experience and feedback could lessen the anchoring effect of the mean demand. We investigate newsvendor decision making further, attempting to find out more about the thought processes in play as these decisions are made. We use audio recordings and protocol analysis of the resulting transcripts to identify reasons for the reported biases in newsvendor decisions. Our primary objective is to illustrate the use of verbal protocol analysis methodology for exploratory theory development around the newsvendor‐ordering anomaly.
Based on our experience in teaching and research, we broadly divide the newsvendor decision‐making process into the following three stages: (i) the information gathering phase, (ii) the analysis phase, and (iii) the final decision phase. Existing experimental research in newsvendor decisions has primarily focused on the results from the final decision phase. Experiments using computers were conducted en masse and were not designed to capture and analyze data from the information gathering and analysis phases. Our exploratory study audiotapes one‐on‐one interactions between subjects and the experimenter, followed by verbal protocol analysis to map the process the subjects go through while making their decisions. The observations reported here, while only directional due to the limitations of the current experimental design, enable us to formulate a number of questions worth pursuing in future research.
Before presenting the details of the experiment, the method, and the results, we summarize our main findings. We observed that subjects tended to focus on the basic information relevant to the decision and did not seek some of the advanced information that would have enabled a better decision. The participants had difficulty dealing with the abstractness of the task at hand and were very keen to obtain information that they could possibly use to anchor their decisions. For example, they asked for information related to product and industry settings, decisions from the past, competitors' actions, and the vendor situation. In the absence of that information, they anchored their decisions around the mean demand, the only significant piece of information available to them. Most of the subjects correctly identified the precise overage and underage costs, but failed to convert that information effectively into the optimal order quantity. This suggests that the percentile calculation (as described on p. 522 of Collier and Evans 2007) is not as intuitive as the operations management community perceives it to be. The risk identified closer to the decision played a major role in the bias of the order quantity chosen by the subjects. Close to the decision, if the subject was focused on the risk associated with excess inventories, then his/her order quantity was lower. On the other hand, if the subject was focused on the risk associated with unsatisfied demand, then his/her order quantity was higher. We position these findings purely as exploratory with the intention that future well‐structured and focused studies will ascertain their validity.
The rest of this paper is organized as follows: The next section introduces the Verbal Protocol Analysis (VPA) methodology and details its strengths and weaknesses. We follow that up with a description of our experimental setup. We then present the results from a protocol analysis of the transcripts. The subsequent section describes the limitations of our study and how they can be alleviated in future studies. We conclude the paper with a summary discussion.
2. VPA Methodology
VPA is one of two (see Ford et al. 1989) methods commonly used to map the cognitive processes in decision making. It can be performed either concurrently (i.e., during the performance of a task) or retrospectively (i.e., after the task has been completed). Under both approaches, the participants are urged to verbalize the thought processes underlying their decisions and the resulting audio tapes are transcribed, segmented, and encoded to obtain a trace of the decision‐making process. This approach has been widely applied in the fields of psychology, education, and cognitive science (Ericsson and Simon 1993). Estrada et al. (1997) and Isen et al. (1991) applied it in medical decision making while Isen and Means (1983) used it to study the impact of affect on decision‐making strategy. Despite its investigative power and popularity, VPA methodology has been criticized (Nisbett and Wilson 1977) for its apparent incompleteness and interference. It is perceived that no verbalization can capture all the thoughts that a decision maker's mind goes through, eventually skewing the observations made by the researchers. At the same time, the act of verbalization and the associated scrutiny may compel the decision makers to change the way they perform the task, leading to inaccurate conclusions. Ericsson and Simon (1993) proposed verbal protocol collection methods and analysis procedures to overcome these confounding issues.
While it has never been applied to inventory decision making, VPA studies in the operations management context do exist. Crawford et al. (1999) investigated the work of industrial schedulers, van Wezel and Jorna (2009) studied the shunting operations at the Netherlands Railways, and Sanderson (1996) applied it to process control and transportation. Crawford et al. (1999) acknowledge the advantages of concurrent protocols, but also recognize their infeasibility and use retrospective protocols to characterize the relationship between human schedulers, technical systems, and the work environment. Van Wezel and Jorna (2009) describe the advantages of using experienced planners as subjects instead of students who, while being easily available, tend to be novices. Using observations of shunting schedulers, they identify task structures and develop a prototype planning system tailored to meet their needs. Sanderson (1996) demonstrates SHAPA, an interactive verbal protocol analysis tool, through its application in three experiments in process control, transportation, and navigation. Bainbridge and Sanderson (1995) provide a comprehensive description of VPA applications in operations management. With these studies as the background, we adapt verbal protocol analysis to a newsvendor decision‐making experiment.
3. Experiment Design
Our experiment was designed to observe how newsvendor subjects decide order quantities when the demand is random and there are costs associated with ordering too much or too little. To enable verbal protocolling, we designed the experiment so that only the most basic information is initially given to the subjects and they are expected to seek additional pieces of information that could play a role in their decision. This starts the process of verbalization, and they continue to do so during their decision‐making process. Participants were first presented with a sheet that contained the details of the task, the business setting, and the method of the experiment.
3.1. Task Instructions
For the upcoming selling season you have to decide the stocking level of this product in order to appropriately meet the demand for that product. Your company is very successful in the market segments it participates in and has invested significant amounts of money in technology in order to remain one of the leaders in the business. As a result of these technological investments, a vast amount of information is available to everyone in the company. You are allowed to ask the experimenter for information that you think could help you, but you need to be specific. Simply ask the experimenter for the information and if that information is available, it will be given to you. If it is not available, you will be informed of that as well.
Once the subject has reviewed the instructions, he/she signals the experimenter that the study can be started. The experimenter makes sure that the recording equipment is switched on and presents the subject with a sheet to record his/her decision. To enable the subject to start thinking aloud and also to refresh the subject's memory of the task details, the experimenter reads aloud a shorter version of the task instructions. The subject is then asked to perform the task of determining the order quantity while thinking aloud. As and when questions arise about the information required to make the decision, the subject can ask the experimenter to provide the information. While it is difficult to anticipate a priori all the information the subjects could ask for, we identified, based on our teaching and research experience, 10 pieces of information (the subjects were not informed of this number), detailed in Table 1, that could be the most likely candidates. We identified a number of key words and representative questions and trained the experimenter to use them in figuring out which piece of information to provide to the subject. This aspect of the experiment was designed along the lines of Isen et al. (1991).
3.2. Participants
Twenty‐one second year MBA students who had expressed specific interest in operations participated in this study. They had been previously exposed to the newsvendor model in the classroom setting, but it is conceivable that they might have forgotten the mathematical details. In addition, having been involved in a course that had a number of factory visits and guest speakers from the industry, these students should have been in a position to ask the right questions. It is worth noting that the number of subjects used in this study is similar to or slightly larger than the number of subjects used in earlier decision‐making studies using verbal protocol analysis. Isenberg (1986) used 15 subjects and Ball et al. (1998) used 20. As in Isen et al. (1991), the incentive for participation was a flat (i.e., not impacted by the subject's performance) US$15 payment.
3.3. Discussion of our Experimental Setting
Our experimental setting differs from the previously conducted newsvendor experiments in many ways. Ours was a one‐time decision‐making situation whereas the earlier experiments studied these decisions in a repeated environment. In spite of this difference, we were pleased to observe the pattern of anchoring and insufficient adjustment reported in earlier literature. Based on the analysis of the order quantities chosen by our subjects, we are very confident in concluding that they behaved similarly to the subjects in earlier studies. We, however, now have information on how these subjects reached these decisions.
Our experiment also differs in the manner in which information was presented. In the previous experiments, all the relevant information (such as costs, demand distribution) was given en masse to the subjects. In our setting, only the most basic information (i.e., demand forecast) was initially told to the subjects. They had to obtain the rest of the information by specifically asking for it. This enabled us to determine which pieces of information they were able to recognize as being pertinent to this decision. As a result of the different subsets of information obtained by the subjects, each one would have a different decision that would be optimal for them. We evaluated the effectiveness of a subject's decision by comparing it with the optimal solution dictated by the set of information he/she possessed.
The previous experiments, with their emphasis on the end decisions, were conducted in a computer laboratory setting. Our experiment, due to its emphasis on interaction and audio recording, was conducted individually in a small room with a one‐on‐one interface between the subject and an experimenter. The experimenter was not an expert on inventory control, but was trained (supported by the use of key words and representative questions) to understand what the subject could be asking for. As a result, there was always a chance that the experimenter misunderstood the information requests made by the subjects. We are glad to report that our protocol analysis of the transcripts did not identify any such errors.
The interaction between the subjects and the experimenter was not designed to be a conversation and the subjects were made aware of this. For any information requested by the subject, the experimenter either: (i) acknowledged the availability of information and provided it, (ii) informed the subject that the information requested was not available, or (iii) reminded the subject that the request had to be specific. The idea behind designing the study in this manner was to simulate the availability of an information database (and not a domain expert) that the subjects could use to obtain any information that they felt could help them in their decision.
4. Results from Protocol Analysis
We transcribed each audio recording onto paper and performed a detailed protocol analysis of these transcripts. We first describe the methodology we used to analyze the verbal protocols. After that, we present the results of our analysis that can be broadly divided into four categories. We start with the information gathering efforts of the subjects. Initially we focus on the information that was available in the experiment and later we describe the information that was extraneous to the experiment. After that, we focus on the mechanics of the inventory control decision and how the subjects approached it. Then we focus on the specific risks to which the subjects paid attention and how that impacted their order quantity decisions.
4.1. Coding Process for the Verbal Protocols
We first parsed each protocol into thought fragments (segments of one or two sentences each) and every one of these fragments was classified into one of the following categories: (i) Task clarification; (ii) Information gathering; (iii) Statement of logic; (iv) Numerical calculation; (v) Statement of feeling such as frustration, excitement; and (vi) Finalizing the decision. Since these categories are clearly distinct, there was no subjectivity associated with them and the thought fragments could be easily and objectively categorized. Our analysis is mainly focused on categories (ii), (iii), (iv), and (vi) to gain new insights into the thought process behind the newsvendor decisions.
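The coding scheme above can be pictured as a small data structure. The following is only an illustrative sketch: the coding in this study was done manually, and the names `Category`, `Fragment`, and `category_counts`, as well as the example fragments, are hypothetical and do not come from the transcripts.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The six coding categories (i)-(vi) listed in the text."""
    TASK_CLARIFICATION = "i"
    INFORMATION_GATHERING = "ii"
    STATEMENT_OF_LOGIC = "iii"
    NUMERICAL_CALCULATION = "iv"
    STATEMENT_OF_FEELING = "v"
    FINALIZING_DECISION = "vi"


@dataclass(frozen=True)
class Fragment:
    """One thought fragment (one or two sentences) with its assigned code."""
    subject_id: int
    text: str
    category: Category


def category_counts(fragments):
    """Count how many coded fragments fall into each category."""
    return Counter(f.category for f in fragments)


# Invented fragments, for illustration only (not taken from the transcripts).
example = [
    Fragment(1, "What is the selling price?", Category.INFORMATION_GATHERING),
    Fragment(1, "If I order too many, I am stuck with stock.", Category.STATEMENT_OF_LOGIC),
    Fragment(1, "I will go with 10,000.", Category.FINALIZING_DECISION),
]
counts = category_counts(example)
```

Keeping the subject identifier on every fragment makes it possible to trace each subject's sequence of fragments, which is the kind of ordering information used later when relating risk identification to the order quantity.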
4.2. Seeking Information that was Available
Ten pieces of information were available to the participants. Table 2 shows which information was sought by each subject. An “×” in a cell indicates that the subject sought and acquired that piece of information. Notice that most subjects asked for and obtained pieces 1–5 only. The other five pieces of information were requested by only 2.4 subjects on average (ranging from 1 to 4). Only two pieces of information, namely purchase cost and selling price, were acquired by all the subjects. The demand distribution (including the minimum and maximum) and the salvage value information were sought by two‐thirds of the subjects.
This leads us to conclude that most of the subjects were able to recognize the major factors (costs and demand uncertainty) that play a role in the newsvendor inventory decision, but failed to recognize the importance of more advanced (or non‐trivial) information that would have significantly influenced their decision. However, it is also interesting to see that five subjects (almost 25%) did not ask for demand distribution information. In the absence of that information, they had no choice but to anchor their decision to the mean demand.
4.3. Seeking Information that Was Not Available
Next, we focus on the efforts of our subjects in gathering information that was not available in the experiment. It should not be a surprise that during the study the subjects asked a large number of questions for which the experimental design did not have the information. While it is not clear whether having the information would have changed the way the subjects made their decisions, it is important to analyze those questions as well, since that could provide additional insights into the subjects' thought processes.
Table 3 contains the list of questions asked by the subjects and the frequency with which they were asked. Questions seemed to focus on different aspects of the business setting such as the product characteristics, where its utility came from, what other products the company manufactured, whether the product was perishable, etc. Clearly, the subjects seem interested in creating a business environment for their decision‐making situation. It is striking that more than two‐thirds of the participants asked about the decisions made in the past, decisions made by the competitors, decisions made by the vendors, etc. Perhaps they were apprehensive about making a decision with sole responsibility and were looking for other sources to justify their decision. That is, they may have thought that, if someone else such as the previous decision maker or a competitor made a decision similar to the one they were considering, they would be on firmer ground.
4.4. Order Quantities Chosen
Table 4 contains a summary of the order quantities selected and the time taken by each of the 21 participants in the study. These order quantities covered the whole range of the demand. That is, the lowest chosen value was zero and the highest chosen value was 20,000. Because of the availability of the transcripts, we can explore more about how the participants made their decisions. Order quantities may be classified into four different categories: (i) mean demand used, (ii) extreme value selected, (iii) optimal value chosen, and (iv) miscellaneous values.
The optimal order quantity reported here is specific to the subset of information that the subject acquired.
Participants 13 and 21 were able to obtain information that helped them to lean toward ordering 20,000. Subject #13 asked for and received information on the price discount for ordering 20,000 units and the presence of a defect rate. He/she put those two pieces of information together and decided to order 20,000 units. Subject #21 asked for and obtained the information on the CRM strategy that guarantees that sales will be at least 5000 units. That encouraged him/her to order the full 20,000 units.
The other two subjects (#1 and #7) chose 8900 and 9000 as their order quantities. A closer examination showed that they made major mistakes in their analysis. For example, subject #1 somehow figured that the cost of stocking out is US$700 per unit and the cost of overstocking is US$800 per unit. Based on these costs, he/she estimated that the order quantity should be below the mean and decided to order 8900 units. Subject #7 gathered only the information on selling price and purchasing cost, and using that, he/she correctly calculated that he/she would need to sell 3333 units to break even if he/she purchased 10,000 units. Not knowing where to go from there, he/she said that “to be conservative,” he/she decided to order 9000 units.
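The arithmetic behind these two cases can be written out as a short worked check. The symbols $p$ (unit selling price), $c$ (unit purchase cost), $Q$ (units purchased), $n$ (units sold), $c_u$ (underage cost), and $c_o$ (overage cost) are notation introduced here for illustration. The ratio $c/p \approx 1/3$ is inferred from the reported break-even figure and is not a value stated in the experiment materials.

```latex
% Subject #7: break-even sales n for a purchase of Q = 10{,}000 units
p\,n = c\,Q \;\Longrightarrow\; n = \frac{c}{p}\,Q,
\qquad n \approx 3333,\; Q = 10{,}000 \;\Longrightarrow\; \frac{c}{p} \approx \frac{1}{3}

% Subject #1: critical ratio implied by the (mistaken) costs c_u = 700, c_o = 800
\frac{c_u}{c_u + c_o} = \frac{700}{700 + 800} \approx 0.47 < 0.5
```

Under the additional assumption that demand is symmetric about its mean, a critical ratio below 0.5 places the implied order quantity below the mean, which agrees in direction with subject #1's reasoning, even though the cost figures themselves were mistaken.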

Most of the subjects (19 out of 21) had reasonable logic in determining the order quantity. Only the remaining two (about 10%) subjects made analytical or logical mistakes that prevented them from reaching a reasonable decision. However, for most of the subjects, the reasoning was not rigorous enough to lead them to the optimal decision. Ten out of the 21 subjects used a parameter (minimum, mean, or maximum) of the demand distribution as their order quantity. Five of the 21 subjects chose 15,000, which would be the optimal order quantity if using only the basic (pieces 1–5) information that was available to them. This subset of people fully understood the newsvendor model, remembered it, and was able to use it. Four participants chose values of 11,000 or 12,000, recognizing that they should choose a quantity above the mean, but could not figure out how far above the mean they should be. There were six people who correctly computed the overage and underage costs and were aware of the demand distribution. They were unable to convert that information correctly into the optimal order quantity, however. This indicates that this computation may be the hardest part in the newsvendor calculation. The people who seemed to be unsure (and chose a value close to the mean demand) of their solution tended to ask more questions for which the information was not available in the experiment.
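The step that subjects found hardest, converting overage and underage costs into an order quantity, can be sketched as follows. This is a minimal illustration and not the calculation used to produce the reported optima. It assumes that demand is uniformly distributed between the minimum and maximum, that the underage cost is price minus purchase cost, and that the overage cost is purchase cost minus salvage value. The price, cost, and salvage figures are hypothetical and were chosen only so that the critical ratio equals 0.75; the function names are likewise illustrative.

```python
def critical_ratio(price, cost, salvage=0.0):
    """Return cu / (cu + co), with cu = price - cost and co = cost - salvage."""
    underage = price - cost      # margin lost per unit of unmet demand
    overage = cost - salvage     # loss per unsold unit
    if underage <= 0 or overage <= 0:
        raise ValueError("expected salvage < cost < price")
    return underage / (underage + overage)


def optimal_order_uniform(low, high, ratio):
    """Quantile of a Uniform(low, high) demand at the given critical ratio."""
    return low + ratio * (high - low)


# Hypothetical inputs chosen so the ratio is 0.75; not the experiment's values.
ratio = critical_ratio(price=10.0, cost=4.0, salvage=2.0)
# Demand range 0 to 20,000 as in the study; the uniform shape is an assumption.
order = optimal_order_uniform(0.0, 20_000.0, ratio)
print(ratio, order)  # prints: 0.75 15000.0
```

With a critical ratio of 0.75 and demand spread evenly over 0 to 20,000, the sketch returns 15,000, the quantity reported above as optimal for subjects holding only the basic information. Subjects who chose 11,000 or 12,000 moved in the right direction from the mean but stopped well short of this percentile.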
4.5. Time Taken for the Decision
The time taken by the subjects to complete this study varied from 529 to 1219 seconds. Not surprisingly, the subjects who spent more time asked more questions that were included in the experiment and also asked more questions that were not included in the experiment. We investigated whether this increase in the number of questions enabled them to make better decisions. Figure 2 contains a graph of the time taken by the subjects versus the error in their order quantity. There is no clear trend in the graph. We repeated this analysis using the percentage error as the measure of performance and there was no trend in that data either. Thus, we conclude that, while the people who took longer asked more questions, their decisions were no better.

4.6. Risk Identification Sequence and its Impact on Order Quantity
Here we focus on the sequence in which the risks were identified by each subject and how that impacted their order quantity decision. For each subject from the protocol analysis, we were able to determine whether he/she was first focused on the risk of excess inventory or the risk of unsatisfied demand. We were also able to determine the time at which they identified that risk. We attempt to relate that information to their eventual order quantity decision. Table 5 contains the details of that analysis.
Ten subjects first identified the overage risk (the risk associated with excess inventory), while the other 11 subjects first identified the underage risk (the risk associated with unsatisfied demand). For the participants who identified the overage risk first and the underage risk later, the average order quantity was 13,700 units. On the other hand, for those who identified the underage risk first and the overage risk later, the average order quantity was 11,173 units. This trend was also present in the bias (distance between the subject's decision and his/her optimal solution) of the order quantity. For the subjects who identified the underage risk closer to the decision, the order quantity was on average 800 units below the optimal value. On the other hand, for those who identified the overage risk closer to the decision, the order quantity was on average 2000 units below the optimal value. This difference, illustrated in Figure 3, indicates the presence of a recency effect, which is known to exist (see Table 1 in Hogarth and Einhorn 1992) in complex tasks with end‐of‐sequence processing of information. The recency effect could be the driver behind the demand chasing behavior that is known (see Bolton and Katok 2008) to exist in newsvendor decisions.
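The bias comparison described above can be expressed as a small grouping computation. The sketch below is illustrative only: the records are made up and are not the study's data, the function name is hypothetical, and bias is taken here as the signed difference between a subject's order and his/her own optimal quantity.

```python
from statistics import mean


def mean_bias_by_last_risk(records):
    """Average signed bias (order minus own optimum), grouped by the risk identified last."""
    groups = {}
    for last_risk, order, optimum in records:
        groups.setdefault(last_risk, []).append(order - optimum)
    return {risk: mean(biases) for risk, biases in groups.items()}


# Made-up records (last_risk, order, subject-specific optimum); not the study's data.
records = [
    ("underage", 14_000, 15_000),
    ("underage", 15_000, 15_000),
    ("overage", 12_000, 15_000),
    ("overage", 10_000, 15_000),
]
result = mean_bias_by_last_risk(records)
```

Because each subject acquired a different subset of information, the optimum is stored per record rather than as a single constant, mirroring the way decisions were evaluated against subject-specific optima.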

5. Discussion
Based on the results observed from the protocol analysis of the subjects' thought processes while making the newsvendor decision, we gained new insights into how inventory decisions are made. This enabled us to formulate the following questions that could be studied in future research.
What role does the problem abstractness play in preventing the decision makers from choosing the optimal order quantities? If the subjects are presented with information such as past decisions and competitor decisions, would they still anchor to the mean or would they choose a different anchor? If the subjects were provided a decision support system (that computed the optimal order quantity given demand distribution, overage, and underage costs) would the subjects be able to reach the optimal decision? Is the recency effect present in the newsvendor decision and how can it be used to enable the subjects to make better decisions?
Operations management decisions are ubiquitous and critical in the modern economic environment, yet very little is known about the mental models (Senge et al. 1994) underlying these decisions. Verbal protocols and other process mapping techniques can make the thinking processes visible, enabling the development of strategies to effectively battle the biases in the newsvendor, beer game, supply contracts, and other operations management decisions.
6. Limitations of Our Study
In spite of the number of new insights that our study provided, we recognize that it suffered from some experiment design issues, and here we detail how some of these limitations can be addressed in future studies. We used students as subjects to understand a ubiquitous real world decision, whereas it would have been better to use practitioners as subjects. However, using students enabled us to perform concurrent verbal protocol analysis, which is richer (Crawford et al. 1999) than ex post verbal protocol analysis. We used the VPA method in an adaptive manner in that the subjects had to ask for and obtain new pieces of information, which poses its own challenges. What information to make available and how to avoid the subjectivity associated with providing this information is an issue that requires a closer review. In a real world environment, this problem would never arise, but there information gathering and decision making occur over many days (Crawford et al. 1999), prohibiting detailed observation. Due to the small sample size, we were able to manually analyze the verbal protocols. There is a large volume of existing knowledge (Bainbridge and Sanderson 1995) on coding of verbal protocols that should be used if the sample size is large and/or if the protocols are long. Observations from Figures 1 and 2 could suffer from issues of endogeneity as one characteristic of the subject (e.g., confidence level, intelligence) could play a role in the reported outcomes. Such endogeneity can be avoided by measuring the subjects' intelligence, confidence level, or other appropriate characteristics and evaluating their role in the experimental results. Our subjects received a flat compensation fee and, in the future, it may be necessary to devise a compensation scheme that pays subjects commensurate with their performance.
7. Conclusion
In this paper, we reported our observations on the newsvendor decision makers' thought processes, obtained via protocol analysis of audio recordings of participants verbalizing their thoughts while solving the problem. These transcripts allowed us to understand the information‐gathering efforts of our subjects and further decipher how they used that information to make their inventory decisions. Thus, this study illustrates how protocol analysis may be used to investigate the processes of operational decision making.
We first observed that most of the subjects sought only the most basic information and failed to recognize the possibility of additional, non‐trivial information that should have helped them in their decision‐making process. Further, we observed that subjects had significant trouble with the abstractness of the problem setting and asked a number of questions aimed at removing this abstractness. Once the information was available, most participants were able to compute the overage and underage costs accurately, but failed to couple that with the demand information to determine the optimal inventory level. This led us to believe that computation of the critical ratio (and the subsequent conversion of that to the inventory level) is not as intuitive as commonly perceived by the academic community.
Finally, we noted the sequence in which the two risks (overage and underage) were identified by the decision makers and then observed that this sequence had a significant impact on the order quantity chosen. Those who first identified overage risk ordered a larger amount of inventory, while those with the other sequence (underage risk followed by overage risk) ordered a smaller amount of inventory. In addition to providing several insights into the decision makers' thought processes, this study raised a number of issues that could be pursued in future research.
