Estimating the true arrival, balking, and reneging processes from censored transactional data: a simulation-based approach

Abstract

The transactional data typically collected on queueing systems are often censored: unsuccessful arrivals due to balking and entities that leave unserved due to reneging are not recorded. In fact, in many situations, the true arrival, balking, and reneging events are unobservable, making it virtually impossible to collect data on these stochastic processes, even though this information is crucial for capacity planning and process improvement decisions. The objective of this paper is to estimate the true (latent) external arrival, balking, and reneging processes in queueing systems from such censored transactional data. The estimation problem is formulated as an optimization model, and an iterative simulation-based inference approach is proposed to find appropriate input models for these stochastic processes. The proposed method is applicable to any complex queueing situation as long as it can be simulated. The problem is investigated under both known and unknown reneging distributions. Through extensive simulation experiments, general guidelines are provided for specifying the parameters of the proposed approach, namely, sample size and number of replications. The proposed approach is also validated through a real-world application in a call center, where it successfully estimates the underlying arrival, balking, and reneging distributions. Finally, to enable reproducibility and technology transfer, a working example, including all code and sample data, is made available in an open online data repository associated with this paper.

Keywords

Simulation; censored transactional data; subset selection; balking; reneging; call center

1. Introduction

Queueing systems are commonly found in various sectors including but not limited to manufacturing, healthcare (hospitals and emergency clinics), service and amusement industries (restaurants and theme parks), telecommunication, transportation, and military. In many of these systems, an entity/customer who balks at joining a queue or reneges after waiting for some time without being processed/served corresponds to a negative outcome, such as lost revenue and low quality of service. Therefore, there is a need to estimate the underlying balking and reneging dynamics.

Estimating entities’ balking behavior is a challenging problem since transactional data collected from most queueing systems only account for entities that actually entered the queue. Such data censor the part of the demand that was lost due to balking and, if used for decisions related to the design and operation of the queueing system, can lead to incorrect or sub-optimal decisions. For example, effective capacity planning and staffing decisions require an accurate estimate of demand; otherwise, they may lead to excessive resource costs or an understaffed facility. The main challenge is that, in many situations, the true (before-balk) arrival process and balking events are unobservable, making it impossible to collect data on these processes. For example, there is no way to tell whether or not a car passing by a busy drive-through restaurant is a potential customer who decides to balk due to the long line.

A related phenomenon is reneging, that is, when an entity, having joined the queue, decides to leave once her waiting time exceeds some threshold representing the maximum time she has or is willing to wait for service. While such behavior may be observable and possible to collect data on, typical transactional data collected from queueing systems often do not include reneging events, as recording them requires tracking each entity separately. This also applies to a special type of queueing system generally referred to as ticket queues,^1–3 which may or may not have a physical waiting area. Upon arrival at a ticket queue, a customer is given information about the current state of the system, namely, a number indicating their order of arrival and the number of the customer currently being served. The arriving customer then chooses whether to join the queue or balk, a decision based on the perceived queue length and associated waiting time. Because every arriving customer receives a number before making this decision, the true external arrival process of such queues can be estimated directly from transactional data. However, data on balking and reneging still may not be available. For instance, in many ticket queues, a balking or reneging event is not detected until the person’s number is called for service. The problem is that it is not clear whether the entity balked right away without entering the queue or reneged after waiting for a while (i.e., there is no information on the balking threshold (BT) or waiting threshold (WT)).

It is worth noting that individuals themselves may not be fully aware of their own balking or reneging thresholds/behavior with respect to different queueing systems. Even for the same type of system, an individual’s threshold may vary from visit to visit depending on the need and situation. For example, people are willing to wait longer to have their driver’s license renewed if their current license expires on the same day. Moreover, people often rely on their perception at the moment rather than their exact position in line or a precise estimate of the amount of time they are willing to wait before reneging.

In short, collecting data on and estimating the true arrival, balking, and reneging processes is not a trivial task, to say the least. This paper proposes a method to estimate these important processes. We consider queueing systems where entities balk depending on the queue length at the time of arrival and their personal BT, and renege when their WT is exceeded. Given only observed transactional data on after-balk arrivals, queue length (without tracking individual entities), and service times, the goal is to reconstruct the model by estimating the distributions of the true before-balk entity inter-arrival time (IAT) and of the balking and reneging thresholds that would lead to dynamics similar to those observed. The paper introduces a new random variable to characterize the effective arrival process under any realized queue length; this variable can be evaluated from real-world transactional data and can also be simulated. We formulate a conceptual optimization model, where the objective is to find appropriate arrival, balking, and reneging input distributions for the simulation model of the queueing system under study such that the difference between the observed and simulated values of this random variable is minimized. An iterative method combining discrete-event simulation and subset selection is then developed to solve the optimization problem and identify the input distributions for entity IAT and balking and reneging thresholds that perform best (statistically) among a set of candidate distributions. The performance of the proposed approach is tested for both unknown and known reneging distributions. Through extensive simulation experiments, general guidelines for specifying the parameters of the proposed method are provided. An application using real-world transactional data from a call center is presented to validate and illustrate the efficacy of the proposed method. 
A complete working example, including all code and a sample dataset, is provided in an open online Mendeley Data repository associated with this paper.⁴

The remainder of the paper is organized as follows. Section 2 provides a critical review of the related literature. Section 3 presents the definitions, the formulation of the estimation problem as an optimization model, and the proposed simulation-based solution method. Section 4 illustrates the applicability of, and justification for, the proposed solution approach for known and unknown reneging distributions in a hypothetical yet complex queueing situation. Section 5 presents the results of extensive computational experiments and provides additional insights on how to configure the parameters of the proposed method. Section 6 provides a validation study using real-world transactional data from a call center. Finally, section 7 presents the conclusions and potential future extensions.

2. Related literature

This section provides a critical review of the two primary related streams of research, namely, queueing inference and simulation input modeling and calibration, with the goal of identifying the existing gaps and delineating the contributions of this work to each research stream.

2.1. Queueing inference

Studies on queueing with balking and/or reneging date back to the 1950s.^5,6 The majority of traditional queueing studies aim at (1) finding steady-state or transient solutions for different types of queues with impatient customers^7–10 and (2) estimating the probability that the total number of customers in the system reaches a certain high level/threshold during a busy cycle of the system.¹¹

The most important issue with these models is that they generally assume that the true arrival, balking, and reneging processes (or at least their mean and variance) are known, which is not the case in many real-world situations. This has given rise to a stream of research on queue inference that deals with indirect inference of unobservable or missing queueing statistics based on some observed output or transactional data.

Perhaps one of the most influential works in the queueing inference literature is the Queue Inference Engine.¹² This work and its extensions^13–15 primarily focus on using transactional data to estimate the performance of a queueing system directly, rather than reconstructing the model by estimating the constituent input models that resulted in the observed dynamics. The other subclass of the queueing inference literature, which is most relevant to this paper, focuses on inferring or estimating the constituent probability distributions of a queueing model based on its output data, such as queue length or waiting time data, collected either continuously or at discrete points in time. There are three main limitations associated with these approaches:

(1) The majority of these studies assume that the inferred distributions come from a parametric family and use maximum-likelihood estimators to estimate the parameters of the assumed distribution.^16–21

(2) Studies on nonparametric inference, on the other hand, are limited to simple queueing structures (e.g., M/G/1 or M/G/$\infty$) and simple priority rules (e.g., first-come-first-served or last-come-first-served), mainly to enable the derivation of closed-form solutions or analytical approximations.^22–26

(3) Even for simple queueing systems, existing work on estimating the true arrival process and balking behavior requires some non-censored data to develop an initial estimate of the inferred distributions.²² However, such data are often unavailable (as discussed before), further limiting the applicability of these approaches.

These limitations stem primarily from the need for analytical tractability; hence, there is a lack of approaches applicable to more complex queueing situations that involve the simultaneous presence of balking and reneging in conjunction with other factors that make the inference problem analytically intractable, such as blocking, nonstandard queue priority rules, and nonstationary stochastic processes. The simulation-based inference approach proposed in this paper contributes to this stream of research in three ways: (1) it does not require simplistic queueing structures and is applicable to any complex situation that can be simulated; (2) it is applicable to any underlying distribution, parametric or not; and (3) it extracts useful information from censored data by using a new random variable that is completely observable regardless of the queueing setting. Therefore, the proposed method is applicable even if no or insufficient non-censored data are available.

2.2. Simulation input modeling and calibration

Stochastic simulation is a common tool for modeling queueing systems in various contexts such as manufacturing,²⁷ healthcare,²⁸ military,²⁹ supply chain,³⁰ and marketing,³¹ to name a few. Stochastic simulation involves a set of input models, mostly in the form of probability distributions, that are used to generate random outputs for subsequent performance analysis. Input models are developed based on data collected on input processes, expert knowledge, or physical justifications. Perhaps the most common approach involves fitting probability distributions to the collected data and assessing the fit via goodness-of-fit tests, which are discussed in any introductory reference on simulation input modeling and analysis.³² When very large amounts of data are available, such statistical tests become sensitive to even minute deviations, leading to the rejection of almost all candidate distributions. In such cases, input variates can be sampled directly from the real-world data.³³ These common input modeling techniques, however, are not applicable when estimating the true arrival, balking, and reneging distributions, as these processes are unobservable and any available data would be censored.

Calibration, on the other hand, can be defined as the process of inferring an input model from output data by iteratively comparing simulation reports and real-world output data and refining the input models and/or model specifications as necessary.^34,35 Common methods for performing these comparisons include statistical tests such as two-sample mean-difference tests³⁶ and the Schruben-Turing test,³⁷ which are generally insufficient to solve the problem considered in this paper, even for very simple situations. For instance, consider estimating the after-balk arrival process into a single server based on observations of the average server utilization as the observable output. If a simple mean-difference test is used, almost any arrival distribution with the correct mean would likely pass the test and be deemed appropriate, as it would result in a similar mean utilization. Due to this limitation of common calibration methods, inferring appropriate input models from output data is recognized as an important strategic problem in the simulation field.³⁸

Recently, a few simulation studies have tackled the inference problem. Two papers^39,40 propose optimization-based simulation techniques to infer service times in emergency departments from incomplete data, where only the starting time of service at each stage is recorded; hence, the time difference between the service start times at stages $i - 1$ and $i$ includes both the service time at stage $i - 1$ and the waiting time in the queue for stage $i$. Both papers first assume that the unobserved service times follow a particular probability distribution family. Then, they search for an appropriate parameter configuration for the assumed service time distribution such that the discrepancy between the observed and simulated time differences generated by a simulation model of the emergency department is minimized. Both papers define this discrepancy as a weighted average of the absolute relative errors of the estimators for the means, standard deviations, and proportions of observations falling in different time intervals. Moreover, both papers use metaheuristics, namely simulated annealing and genetic algorithms, to perform the search. In another paper,⁴¹ a simulation-based method is proposed to estimate the true demand for bikes and docks in bike-sharing systems in the presence of censoring. However, the method is not directly applicable to queueing systems, since the operation of a bike station is fundamentally different from that of a typical queueing system, for two main reasons. First, the notion of processing time (PT) or a fixed capacity does not apply to a bike station, since the number of bikes and docks (i.e., resources) changes with stochastic pickup and drop-off events, and bike check-in and check-out times (service times) are often negligible. 
Second, customers or arriving bikes generally do not wait in a queue if there is no bike or dock available at the station; hence, the nature of the balking and reneging dynamics is also fundamentally different.

This paper contributes to this stream of research in three ways: (1) to the best of the author’s knowledge, this is the first paper to propose a simulation-based inference method for simultaneous estimation of the true arrival, balking, and reneging processes in queueing systems; (2) the proposed approach does not require a certain probability distribution family, is not limited to a specific type of censoring or application (e.g., emergency departments or bike-sharing systems), and is applicable to any queueing system of arbitrary complexity as long as it can be modeled via simulation; and (3) it proposes a comprehensive quantile-based measure to evaluate the discrepancy between the observed and simulated data (as opposed to only comparing summary statistics such as the mean or standard deviation). This enables detecting and filtering out incorrect potential solutions that may lead to summary statistics similar to the observed ones.

3. Methodology: definitions, conceptual formulation, and solution approach

This section provides the definition of related random variables, formulation of the estimation problem as an optimization model, and the proposed simulation-based solution method.

3.1. Definition of random variables

The following random variables are defined and used extensively throughout the paper. Figure 1 presents an illustrative example with one or more realizations of these random variables for a station with two parallel servers:

IAT models the true (before-balk) IAT of entities (i.e., the actual external arrival process). In the presence of balking, the IAT distribution is generally unobservable and cannot be directly estimated from historical data (say, via goodness-of-fit tests). IAT is one of the random variables that we strive to estimate here.

BT represents the minimum queue length at which an entity balks without entering the queue. The BT distribution is generally unknown and unobservable, prohibiting direct estimation from data. BT is another random variable to be estimated here.

WT represents the maximum amount of time an entity is willing to wait in the queue before reneging without being served. WT may or may not be observable. If each individual entity’s entrance and reneging times are recorded separately, then the WT distribution can be estimated directly from data, although in many cases such data are not readily available and require additional data collection. However, if only the length of the queue is tracked, then WT cannot be estimated from data. In this paper, we investigate both known and unknown WT cases.

Successful IAT (SIAT) represents the after-balk IAT of entities into the queueing system. Considering balks as failed arrivals, SIAT is the elapsed time between the arrivals of two consecutive entities that actually enter the queue (hence the term successful). Note that one or more balks may occur between two successful arrivals (i.e., SIAT censors the true IAT), but this information is not available, as balks are not observable.

Valid SIAT (VSIAT) is a special case of SIAT and represents the elapsed time between two consecutive successful arrivals into the system if and only if at least one server is available after the first arrival, so that it is guaranteed that no balking occurred between the two arrivals. These are valid observations of the true IAT and can be used directly to estimate the actual external arrival process. The main issue with VSIAT is that, during the busy intervals of interest here, the arrival rate exceeds the processing rate and there is almost always a queue, so there will be too few or no VSIAT observations available.

Effective IAT (EIAT) is another special case of SIAT that represents the elapsed time between two consecutive successful arrivals into the station if and only if there was no completion or reneging event between the two arrivals, i.e., the number in queue (NIQ) does not change between the two successful arrivals (regardless of whether any balks occur). Therefore, these values represent the realized or effective arrival process for the corresponding NIQ value. Given a fixed IAT distribution, the EIAT distribution varies with NIQ, since entities’ balking decisions depend on the NIQ at the time of their arrival; hence, we use the notation EIAT(NIQ) to indicate this dependency. It is important to note that EIAT(NIQ) is completely observable from queue length data, the times of successful arrivals, and service completion times, which are available or can be easily collected for most queueing systems. EIAT(NIQ) is a new random variable defined in this paper and will be used for simulation-based queueing inference, as discussed in the following subsection.

PT models the service/processing time at the station with external arrivals, which can be directly estimated from data.

Figure 1.

A sample scenario for a station with two parallel servers (c = 2, where c denotes the processing capacity). The time origin is shifted to the arrival of entity 2. The initial state of the system is shown at the top of the figure. $A_i$, $C_i$, and $R_i$ represent the arrival, service completion, and reneging events for entity $i$. EIAT(j) represents an observation of EIAT(NIQ) for NIQ = j. The elapsed time between the arrivals of entities 0 and 1 represents a VSIAT observation (not shown in the figure).
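As an illustration of how SIAT, VSIAT, and EIAT(NIQ) can be extracted from transactional data, the following Python sketch scans a time-ordered event log. The log format (tuples of time and event kind, with "A" for a successful arrival, "C" for a service completion, and "R" for a reneging event), the function name `extract_iat_samples`, and the parameters `c` (number of servers) and `n0` (initial number in the system) are assumptions made for illustration. Each EIAT observation is indexed by the NIQ immediately after the first of the two arrivals, which, by definition, stays unchanged until the second arrival.

```python
from collections import defaultdict


def extract_iat_samples(events, c, n0=0):
    """Extract SIAT, VSIAT, and EIAT(NIQ) observations from a transaction log.

    Hypothetical log format: a time-ordered list of (time, kind) tuples, where
    kind is "A" (successful arrival), "C" (service completion), or "R"
    (reneging from the queue). Balks never appear because they are unobservable.
    c is the number of parallel servers and n0 the number in system at time 0.
    """
    n = n0                    # current number in system (in service + in queue)
    siat, vsiat = [], []
    eiat = defaultdict(list)  # NIQ level -> list of EIAT observations
    last_t = None             # time of the previous successful arrival
    idle_after = False        # was a server idle right after that arrival?
    niq_after = 0             # NIQ right after that arrival
    changed = False           # any completion/reneging since that arrival?

    for t, kind in events:
        if kind == "A":
            if last_t is not None:
                gap = t - last_t
                siat.append(gap)                 # every gap is a SIAT
                if idle_after:
                    vsiat.append(gap)            # no balk possible: valid SIAT
                if not changed:
                    eiat[niq_after].append(gap)  # NIQ constant: effective IAT
            n += 1
            last_t, changed = t, False
            idle_after = n < c
            niq_after = max(0, n - c)
        elif kind in ("C", "R"):
            n -= 1
            if n < 0:
                raise ValueError("log implies a negative number in system")
            changed = True
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return siat, vsiat, dict(eiat)
```

Under this convention, a gap following an arrival that left a server idle counts both as a VSIAT and as an EIAT(0) observation, and any completion between two arrivals disqualifies the gap as an EIAT, even when the queue was empty, in line with the definition above.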

3.2. Conceptual formulation of estimating arrival, balking, and reneging processes as an optimization problem

Consider a station with external arrivals in a queueing network of arbitrary complexity. Entities’ BT and WT random variates have probability distributions $P_{BT}$ and $P_{WT}$, and the true IAT random variate has a probability distribution $P_{IAT}$ over a time interval $T$. For a nonstationary (time-varying) arrival process, this is equivalent to assuming that the arrival process can be approximated by a piecewise-constant rate function, i.e., by considering intervals (say, hourly) within which the arrival process follows a stationary stochastic process with independent and identically distributed (IID) IATs, as shown in Figure 2. This is a common approach in the simulation literature for modeling and generating nonstationary stochastic processes.⁴³ Several tools and approaches have been proposed to help identify an appropriate piecewise-constant rate function, e.g., change-point analysis⁴⁴ and visual assessment methods.^32,42,45 Here, we deal with a situation where these intervals are already determined and the goal is to estimate the true (unobservable) $P_{IAT}$, $P_{BT}$, and $P_{WT}$ distributions for a particular interval $T$ during which the configuration and logic of the queueing network (e.g., number of servers, PTs, routing logic) remain unchanged. Our focus is primarily on busy intervals where the arrival rate exceeds the processing rate and balking occurs frequently, as there is almost always a queue. Consequently, there will be too few or no VSIAT observations available to directly estimate $P_{IAT}$.

Figure 2.

A piecewise-constant rate function with hourly intervals for approximating the underlying true nonstationary arrival rate represented by the continuous solid line. The HistoRIA tool⁴² is used to generate the piecewise-constant rates as well as the histogram of inter-arrival times within each 1-hour block.
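The piecewise-constant approximation can be illustrated with a short Python sketch that generates arrival times over consecutive intervals, each with its own constant rate. Purely for illustration, it assumes exponential IATs within each interval (i.e., a Poisson process with a piecewise-constant rate); the framework described here allows any stationary IID IAT distribution per interval. The function name and its parameters are hypothetical.

```python
import random


def piecewise_exponential_arrivals(rates, interval_len, seed=None):
    """Arrival times under a piecewise-constant rate function (sketch).

    rates[j] is the assumed arrival rate on [j * interval_len,
    (j + 1) * interval_len). Within an interval, IATs are IID exponential
    with that rate, i.e., a Poisson process on each interval.
    """
    rng = random.Random(seed)
    times = []
    for j, rate in enumerate(rates):
        start, end = j * interval_len, (j + 1) * interval_len
        if rate <= 0:
            continue  # zero-rate interval: no arrivals
        # Memorylessness lets us restart the exponential clock at `start`.
        t = start + rng.expovariate(rate)
        while t < end:
            times.append(t)
            t += rng.expovariate(rate)
    return times
```

Restarting the clock at each interval boundary is exact only because of the memoryless property of the exponential distribution; with non-exponential IATs, the same construction would be an approximation.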

Let $IAT = (IAT_1, IAT_2, \ldots, IAT_a)$ be an IID sequence of input variates, each distributed according to $P_{IAT}$ over the interval $T$. Similarly, let $BT = (BT_1, BT_2, \ldots, BT_b)$ and $WT = (WT_1, WT_2, \ldots, WT_w)$ be IID sequences of input variates over the same interval, distributed according to $P_{BT}$ and $P_{WT}$, respectively. We let the function $g(\cdot)$ be the mapping from the input sequences IAT, BT, and WT to the output sequences, each representing the effective IAT for a particular NIQ value $NIQ_i$, i.e., $EIAT(NIQ_i) = (EIAT_1(NIQ_i), EIAT_2(NIQ_i), \ldots, EIAT_{s_i}(NIQ_i))$, where $s_i$ is the sample size, i.e., the number of observations available on $EIAT(NIQ_i)$. The function $g(\cdot)$ depends on every aspect of the configuration of the queueing network that affects the process at this station, e.g., blocking, machine failures, rework, and routing logic. Therefore, the complexity of $g(\cdot)$ depends on the complexity of the system at hand. We assume that $g(\cdot)$ is a simulable map, i.e., we can use simulation to evaluate the outputs $EIAT(NIQ_i)$ given any set of input distributions.
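To make the notion of a simulable map concrete, the following Python sketch plays the role of $\hat{g}(\cdot)$ for the simplest possible case: a single first-come-first-served station with $c$ parallel servers and no blocking or routing, where an arriving entity balks if the current NIQ is at least its BT and a queued entity reneges once its time in queue reaches its WT. The function name, the callable-based interface, and the event-log format are assumptions for illustration and are not taken from the paper’s repository. Balks are returned only as a count, since they would not appear in real transactional data.

```python
import heapq
import random
from collections import deque


def simulate_station(iat, bt, wt, pt, c, horizon, seed=None):
    """Hypothetical stand-in for g-hat: one FCFS station with c servers.

    iat, bt, wt, pt are callables taking a random.Random and returning one
    variate (inter-arrival time, balking threshold, waiting threshold,
    processing time). An arrival balks if NIQ >= its BT; a queued entity
    reneges once its time in queue reaches its WT. Returns the observable
    log [(time, kind)] with kind "A", "C", or "R", and the number of balks.
    """
    rng = random.Random(seed)
    heap, seq = [], 0  # pending events: (time, tie-breaker, kind, entity id)

    def schedule(t, kind, eid=None):
        nonlocal seq
        heapq.heappush(heap, (t, seq, kind, eid))
        seq += 1

    queue = deque()  # ids of waiting entities in FCFS order
    busy, balks, next_id, log = 0, 0, 0, []
    schedule(iat(rng), "arrive")
    while heap:
        t, _, kind, eid = heapq.heappop(heap)
        if t > horizon:
            break
        if kind == "arrive":
            schedule(t + iat(rng), "arrive")
            if len(queue) >= bt(rng):  # balk: never appears in the log
                balks += 1
                continue
            log.append((t, "A"))
            if busy < c:  # an idle server starts service immediately
                busy += 1
                schedule(t + pt(rng), "complete")
            else:  # join the queue and arm the reneging clock
                queue.append(next_id)
                schedule(t + wt(rng), "renege", next_id)
                next_id += 1
        elif kind == "complete":
            log.append((t, "C"))
            if queue:  # the head of the queue starts service
                queue.popleft()
                schedule(t + pt(rng), "complete")
            else:
                busy -= 1
        elif kind == "renege" and eid in queue:  # ignore if already served
            queue.remove(eid)
            log.append((t, "R"))
    return log, balks
```

Passing the returned log to an EIAT extraction routine would yield simulated $\hat{EIAT}(NIQ_i)$ samples; for a more complex network, this function would be replaced by a model of the full system logic.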

Recall that the IAT and BT sequences are not observable and that WT may or may not be specifiable, depending on the data collection process used. $EIAT(NIQ_i)$, however, can be fully observed via data, as discussed in section 3.1. In the general case, the demand estimation problem is to calibrate the input models $\hat{IAT}$, $\hat{BT}$, and $\hat{WT}$ in the simulation model of the queueing system under study such that the resulting simulated $\hat{EIAT}(NIQ_i)$ sequences are statistically similar to the observed real-world $EIAT(NIQ_i)$ sequences (denoted by $EIAT^0(NIQ_i)$) resulting from the true unknown IAT, BT, and WT. On an abstract level, the estimation problem can be formulated as the following optimization model:

$$\text{Minimize} \quad \left\lVert \begin{bmatrix} F(\hat{EIAT}(NIQ_1)) \\ F(\hat{EIAT}(NIQ_2)) \\ \vdots \\ F(\hat{EIAT}(NIQ_N)) \end{bmatrix} - \begin{bmatrix} F(EIAT^0(NIQ_1)) \\ F(EIAT^0(NIQ_2)) \\ \vdots \\ F(EIAT^0(NIQ_N)) \end{bmatrix} \right\rVert \tag{1}$$

subject to:

$$\hat{EIAT}(NIQ_i) = \hat{g}(\hat{IAT}, \hat{BT}, \hat{WT}, \ldots) \quad \forall i = 1, \ldots, N \tag{2}$$

$$\hat{IAT} \in S_{IAT} \tag{3}$$

$$\hat{BT} \in S_{BT} \tag{4}$$

$$\hat{WT} \in S_{WT} \tag{5}$$

where $F (X)$ denotes the empirical distribution of random variable $X$ , hence the objective function given by Equation (1) represents (conceptually) the difference between the empirical distributions for $\hat{EIAT} (NI Q_{i})$ and EIAT⁰( $NI Q_{i}$ ) for all $i = 1, . . ., N$ , that is, the overall difference between the simulated and observed EIAT over a set of $N$ realized NIQ values.

Various measures can be used to evaluate the discrepancy between two empirical distributions, such as the difference between the means or variances. In this paper, for each $NI Q_{i}$ value, we compare multiple quantiles of the observed and simulated EIAT( $NI Q_{i}$ ) (from 0.10 to 0.90 in 0.10 increments) for a more comprehensive comparison. We use the mean absolute percentage error (MAPE) of the differences between quantiles of the two distributions to evaluate $F (EIA T^{0} (NI Q_{i})) - F (\hat{EIAT} (NI Q_{i}))$ for each $NI Q_{i}$ in the objective function. Section 4.3 illustrates the reason behind using EIAT( $NI Q_{i}$ ), i.e., why we distinguish between different observed NIQ levels, as well as an aggregate measure of discrepancy used in this paper to identify appropriate input distributions.
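A minimal Python sketch of this quantile-based discrepancy for a single $NIQ_i$ level follows. It computes the nine deciles of each sample with `statistics.quantiles` (default "exclusive" method) and returns $\text{MAPE} = \frac{100}{9}\sum_{q \in \{0.1, \ldots, 0.9\}} \left| \frac{Q^0_q - \hat{Q}_q}{Q^0_q} \right|$, where $Q^0_q$ and $\hat{Q}_q$ denote the observed and simulated $q$ quantiles. Using the observed quantiles as the denominator and this particular quantile estimator are our assumptions; the exact choices in the paper’s implementation may differ.

```python
import statistics


def quantile_mape(observed, simulated):
    """MAPE (%) between the 0.1, 0.2, ..., 0.9 quantiles of two samples.

    Assumption: the observed quantiles are the reference (denominator) and
    are nonzero; statistics.quantiles' default 'exclusive' method is used.
    """
    q_obs = statistics.quantiles(observed, n=10)   # 9 cut points: deciles
    q_sim = statistics.quantiles(simulated, n=10)
    errors = [abs(o - s) / abs(o) for o, s in zip(q_obs, q_sim)]
    return 100.0 * statistics.fmean(errors)
```

Since EIAT values are positive durations, the observed quantiles are nonzero in practice. Comparing nine quantiles rather than only the mean is what lets this measure separate candidate inputs that reproduce the average EIAT but not its spread.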

In the constraint given by Equation (2), $\hat{g}(\cdot)$ represents the simulated version of the actual system logic $g(\cdot)$, and $\hat{IAT}$, $\hat{BT}$, and $\hat{WT}$ represent the input distributions used in the simulation model for generating entity IATs, BTs, and WTs, respectively. The decision variables are the input distributions $\hat{IAT}$, $\hat{BT}$, and $\hat{WT}$ to be used in the simulation model, where $S_{IAT}$, $S_{BT}$, and $S_{WT}$ in the constraints given by Equations (3)–(5) represent the discrete sets of candidate distributions for these input models. The idea is that the $\hat{IAT}$, $\hat{BT}$, and $\hat{WT}$ input distributions that minimize the objective function also provide appropriate models for the true unknown IAT, BT, and WT distributions that produced $[F(EIAT^0(NIQ_1)), F(EIAT^0(NIQ_2)), \ldots, F(EIAT^0(NIQ_N))]$ in the first place. Note that the decision variables appear in the objective function, since the vector $[F(\hat{EIAT}(NIQ_1)), F(\hat{EIAT}(NIQ_2)), \ldots, F(\hat{EIAT}(NIQ_N))]$ depends on the choice of the input distributions $\hat{IAT}$, $\hat{BT}$, and $\hat{WT}$, as indicated by the constraint given by Equation (2).

3.3. The proposed simulation-based inference method

Figure 3 summarizes the general steps of the proposed iterative approach for solving the optimization model in section 3.2. In each iteration, a set of hypothesized distributions in the solution spaces $S_{IAT}$, $S_{BT}$, and $S_{WT}$ is evaluated (through simulation and quantile-based comparison with real-world observations), and the best candidate(s) among the hypothesized scenarios are identified (via subset selection). By iterating through this process, the search of the solution space continues until the recommended distributions are deemed appropriate for modeling the true arrival, balking, and reneging processes, based on the desired maximum difference between the simulated and observed EIAT values, i.e., the objective function value in Equation (1). A complete working example, including all code and a sample dataset, is available in a Mendeley Data repository.⁴ The general steps of the solution method can be summarized as follows:

Step 1 involves extracting the $EIAT(NIQ_i)$ observations associated with the different observed queue lengths $NIQ_i$ from the transactional data. The total EIAT sample size for the observed (real-world) data is denoted by $s_{observed}$. Note that the sample size for each $NIQ_i$ realization depends on how frequently the system is in that state when a successful arrival occurs (see section 4.3 for more details). Only those $NIQ_i$ values for which sufficient real-world data are available are included in the analysis. In all of the experiments performed in this paper, we use 100 as the minimum sample size for $EIAT(NIQ_i)$, $\forall i$, to ensure that extreme quantiles such as 0.10 and 0.90 can be estimated reasonably accurately.

Step 2 involves hypothesizing a set of distributions that can potentially be good models for IAT, BT, and WT. As in general simulation input modeling, the goal is not to find the exact distribution family and/or parameters, but rather to find appropriate candidates that provide a reasonable fit. If available, VSIAT observations can be used to hypothesize the IAT distribution. If no or too few VSIAT observations can be extracted from the transactional data, then expert knowledge or physical justifications could be used (e.g., the exponential distribution is generally found to be a good model for IATs in many applications). Alternatively, data from less busy intervals, or even SIAT observations corresponding to very small NIQ levels where balking is unlikely (i.e., there is little censoring), can also be used to hypothesize the distribution family. Hypothesizing distribution families and estimating their parameters are well-established topics in simulation input modeling and analysis; hence, an extensive discussion of this step is omitted for the sake of conciseness.

Step 3 involves running $n$ replications of the simulation model for each hypothesized scenario, where each replication provides a sample of simulated $EIAT(NIQ_i)$ values to be compared with the corresponding real-world EIAT observations extracted in step 1. The sample size for simulated EIAT values is denoted by $s_{simulated}$. Note that the sample size for each $NIQ_i$ level varies with the probability of the system being in that state (see section 4.3). Only those $NIQ_i$ values for which sufficient simulated data are available are used for comparison with the respective observed sample. Here, we use 100 as the minimum sample size for simulated $EIAT(NIQ_i)$, $\forall i$.

Step 4 involves comparing the simulated and observed EIAT($NIQ_{i}$) for all $i$ that meet the minimum sample size requirement in both samples. The MAPE between corresponding quantiles (from 0.10 to 0.90 in increments of 0.10) is used to measure the difference between the two empirical distributions, under the assumption that the smaller these differences, the better the hypothesized scenario models the true unknown input distributions. The comparison is performed separately for the simulated data from each simulation replication; hence, this step yields $n$ IID observations of the performance of each hypothesized scenario under each $NIQ_{i}$ included in the analysis, which are then used in subset selection.
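The quantile comparison in step 4 can be sketched with the Python standard library. Using `statistics.quantiles` with the inclusive (linear-interpolation) method, and taking the observed quantiles as the denominator of the percentage error, are assumptions; the paper does not specify its quantile estimator.

```python
import statistics

DECILES = 10  # statistics.quantiles(..., n=10) returns the 0.10, ..., 0.90 cut points


def quantile_mape(simulated, observed):
    """MAPE (%) between the 0.10, 0.20, ..., 0.90 quantiles of two EIAT samples.

    Assumption: percentage errors are taken relative to the observed quantiles.
    """
    q_sim = statistics.quantiles(simulated, n=DECILES, method="inclusive")
    q_obs = statistics.quantiles(observed, n=DECILES, method="inclusive")
    return 100 * statistics.fmean(abs(s - o) / o for s, o in zip(q_sim, q_obs))


observed = [float(x) for x in range(1, 101)]                  # toy sample
print(quantile_mape([1.1 * x for x in observed], observed))   # about 10.0
```

Applying `quantile_mape` to the simulated sample of each of the $n$ replications at a given NIQ level yields the $n$ IID performance observations that are passed to subset selection.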

In step 5, a common subset selection procedure⁴⁶ is used to identify a subset of the hypothesized scenarios that contains the best-performing scenario with probability $\geq 1 - α$. Algorithm 1 summarizes the subset selection procedure. We use $α = 0.05$ in all experiments. At the end of this step, for each $NIQ_{i}$ included in the analysis, we know which hypothesized scenario(s) belong to the potentially good subset in terms of their MAPE values. Because a hypothesized scenario may be in the recommended subset for only some $NIQ_{i}$ values, we use the number of times that a scenario is included in the potentially good subset as its overall performance measure.

Finally, in step 6, depending on the outcome and the desired level of accuracy, the list of hypothesized scenarios can be revised to be used in the next iteration of the proposed framework. This can enable convergence to an appropriate solution through an iterative process. The following section illustrates situations where such revisions/iterations may be needed.

Figure 3.

The proposed simulation-based queueing inference approach.

Algorithm 1: The subset selection procedure
Step 1. For each hypothesized distribution $i$ ($i = 1, 2, \ldots, k$), sample $n_{i}$ observations of the performance measure (MAPE or the number of statistically significant differences in percentiles). Let $X_{ij}$ denote the performance of hypothesized distribution $i$ in replication $j$ ($j = 1, 2, \ldots, n_{i}$), where the $X_{ij}$ are IID $N(μ_{i}, σ_{i}^{2})$ random variables.
Step 2. Let $W_{il} = \left(\frac{t_{i}^{2} S_{i}^{2}}{n_{i}} + \frac{t_{l}^{2} S_{l}^{2}}{n_{l}}\right)^{1/2}, \forall i \neq l$, where $t_{i} = t_{(1 - α)^{1/(k - 1)},\, n_{i} - 1}$, $t_{β, ν}$ is the $β$ quantile of the $t$ distribution with $ν$ degrees of freedom, $S_{i}^{2}$ denotes the sample variance for hypothesized distribution $i$, and $α$ is the significance level of the procedure.
Step 3. Set $I = \{i : 1 \leq i \leq k \text{ and } \bar{X}_{i} \geq \bar{X}_{l} - W_{il}, \forall l \neq i\}$, where $\bar{X}_{i}$ is the sample mean for hypothesized distribution $i$. (This form assumes that larger values are better; for a smaller-is-better measure such as MAPE, the condition becomes $\bar{X}_{i} \leq \bar{X}_{l} + W_{il}$.)
Step 4. Return $I$ as the potentially good subset of the hypothesized distributions.

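A sketch of Algorithm 1 in Python, assuming SciPy is available for the $t$ quantile. It uses the square root of the bracketed sum as the half-width $W_{il}$, as in the standard unequal-variance subset selection procedure, and because MAPE is a smaller-is-better measure it keeps scenario $i$ by default when $\bar{X}_{i} \leq \bar{X}_{l} + W_{il}$ for all $l \neq i$; `minimize=False` gives the larger-is-better form written in Step 3. The function name is hypothetical and indices are 0-based.

```python
import statistics
from scipy.stats import t


def subset_selection(samples, alpha=0.05, minimize=True):
    """Return 0-based indices of the potentially good subset (Algorithm 1).

    samples[i] holds the n_i IID performance observations (e.g., MAPE values
    from n_i replications) of hypothesized scenario i. Under normality, the
    subset contains the best scenario with probability >= 1 - alpha.
    """
    k = len(samples)
    if k == 1:
        return [0]
    means = [statistics.fmean(x) for x in samples]
    var = [statistics.variance(x) for x in samples]
    n = [len(x) for x in samples]
    tq = [t.ppf((1 - alpha) ** (1 / (k - 1)), ni - 1) for ni in n]

    def w(i, l):
        # Half-width W_il: square root of the sum of the two scaled variance terms
        return (tq[i] ** 2 * var[i] / n[i] + tq[l] ** 2 * var[l] / n[l]) ** 0.5

    subset = []
    for i in range(k):
        others = [l for l in range(k) if l != i]
        if minimize:   # smaller is better, e.g., MAPE
            keep = all(means[i] <= means[l] + w(i, l) for l in others)
        else:          # larger is better, as in Step 3 of Algorithm 1
            keep = all(means[i] >= means[l] - w(i, l) for l in others)
        if keep:
            subset.append(i)
    return subset


# Toy MAPE observations for three scenarios (illustrative, not from the paper)
mapes = [[5.0, 5.1, 4.9, 5.0, 5.05],
         [10.0, 10.1, 9.9, 10.0, 10.05],
         [5.1, 5.2, 5.0, 5.1, 5.15]]
print(subset_selection(mapes))   # [0, 2]
```

Scenarios 0 and 2 differ by less than the half-width, so neither can be excluded, while scenario 1 is clearly inferior.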

4. Illustration of the proposed method for a hypothetical queueing system

This section illustrates the applicability of the proposed solution method for solving the estimation problem for a hypothetical queueing network. In section 4.3, we use this queueing example to also illustrate the motivation behind the choice of the objective function in the optimization formulation presented in section 3.

4.1. The queueing network and discrete event simulation model

We consider the queueing network in Figure 4 for the computational experiments discussed here and in section 5. Entities balk if the queue length at Server 1 is greater than or equal to their BT, and renege once their WT is exceeded. This queueing network represents a fairly complex queueing situation that is beyond analytical tractability: the successful after-balk arrivals into the first station depend not only on the balking and reneging behavior of entities, but also on the finite queue at Server 2, which causes blocking at the first station (i.e., when the queue of Server 2 is full, Server 1 cannot pass on a processed entity to Server 2 and hence cannot start processing a new entity). This blocking effect is itself a function of non-exponential service times, the rework process, and a combination of probabilistic and conditional routing in subsequent stations. Because of these complexities, the logic $g(\cdot)$ cannot be expressed analytically in closed form, but discrete event simulation can be used to evaluate the outputs EIAT($NIQ_{i}$) for any input distributions of the (unknown) before-balk external entity IAT, BT, and WT, and hence to evaluate Equation (2).

Figure 4.

The queueing network. Queue priority rules are all FIFO and processing times are in minutes. FIFO: first-in-first-out.

Recall from section 3.2 that the nonstationary external arrival process is modeled by a piecewise-constant rate function and that our goal is to make estimates for a particular busy interval. Each simulation run/replication corresponds to the operation of the system over multiple realizations of the time interval of interest. The initial conditions for each realization of the interval are sampled randomly according to the historical data. To control the sample size of the simulated data, the run terminates when $s_{simulated}$ observations of EIAT are obtained. The simulation model represents $\hat{g}(\cdot)$ in the above formulation and is implemented in the Simio software.³³

4.2. Solving the inference problem for the queueing network

Here, we show how the proposed approach can be used to solve the inference problem for the queueing network in Figure 4 for the case of known and unknown WT distribution.

4.2.1. Known reneging, unknown arrival, and balking processes

If the transactional data track each individual entity that joins the queue, then the reneging behavior (i.e., the WT distribution) can be estimated directly from the data, and the problem reduces to estimating the true arrival process and balking behavior. For the above queueing network, suppose WT ~ triangular(6 min, 9 min, 15 min) can be estimated from data, and that the true unknown arrival and balking processes are IAT ~ exponential(1.3 min) and BT ~ triangular(3,6,9), respectively. BT is a discrete random variable because it is compared with the integer-valued NIQ; the continuous triangular distribution is used only to simplify its representation. The continuous triangular(3,6,9), rounded up to the next integer, corresponds to the following discrete probability distribution:

$$P(BT = x) = \begin{cases} \frac{1}{18}, & x = 4, \\ \frac{1}{6}, & x = 5, \\ \frac{5}{18}, & x = 6, \\ \frac{5}{18}, & x = 7, \\ \frac{1}{6}, & x = 8, \\ \frac{1}{18}, & x = 9, \end{cases}$$
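The discrete probabilities of BT can be checked by differencing the continuous triangular CDF at consecutive integers, i.e., $P(BT = x) = F(x) - F(x - 1)$. The sketch below assumes BT is the continuous value rounded up; the function names are hypothetical.

```python
import math


def tri_cdf(x, low, mode, high):
    """CDF of a continuous triangular(low, mode, high) distribution."""
    if x <= low:
        return 0.0
    if x <= mode:
        return (x - low) ** 2 / ((high - low) * (mode - low))
    if x < high:
        return 1 - (high - x) ** 2 / ((high - low) * (high - mode))
    return 1.0


def discretize_bt(low, mode, high):
    """P(BT = x) when BT is the continuous triangular value rounded up."""
    return {x: tri_cdf(x, low, mode, high) - tri_cdf(x - 1, low, mode, high)
            for x in range(math.floor(low) + 1, math.ceil(high) + 1)}


pmf = discretize_bt(3, 6, 9)
print({x: round(18 * p) for x, p in pmf.items()})  # {4: 1, 5: 3, 6: 5, 7: 5, 8: 3, 9: 1}
```

Multiplying by 18 recovers the numerators 1, 3, 5, 5, 3, 1, matching the probabilities listed above.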

where $P(BT = x)$ is the probability of an entity having a BT of $x$. From Figure 4, the mean PT at Server 1 is 1.75 min, hence we are dealing with a busy period in which the true arrival rate exceeds the processing rate, and we expect significant censoring due to balking. We first run the simulation model of the queueing system using an arbitrary random seed $r$. The run ends when $s = 5000$ observations of EIAT($NIQ_{i}$) are generated. These are treated as the observed EIAT($NIQ_{i}$) values, i.e., the outcome of step 1. We begin the first iteration by hypothesizing 15 possible scenarios for the IAT and BT distributions, as shown in Table 1. In step 3, we perform 20 independent replications for each scenario using 20 random seeds different from the seed $r$ used to generate the observed values. Each replication generates $s$ simulated EIAT($NIQ_{i}$) values. In other words, $s_{simulated} = s_{observed} = s = 5000$.
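As a quick check of the busy-period claim, assuming Server 1 has a single processing unit (as the direct comparison of mean IAT and mean PT implies), the external arrival rate $λ$ and the processing rate $μ$ are

$$λ = \frac{1}{1.3} \approx 0.77 \text{ per min} > μ = \frac{1}{1.75} \approx 0.57 \text{ per min}, \qquad \frac{λ}{μ} = \frac{1.75}{1.3} \approx 1.35,$$

so without balking and reneging the queue at Server 1 would tend to grow throughout the interval.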

Table 1.

Hypothesized and recommended scenarios for two iterations of the proposed method for the case of known WT. The highlighted cells represent the best performing candidate solutions.

Iteration	Scenario number	Hypothesized scenario (IAT, BT)	Avg MAPE over all NIQ	Selection count based on MAPE
1	1	Exp(1.1), Tri(2,6,10)	7.89	6
	2	Exp(1.1), Tri(3,6,9)	8.38	4
	3	Exp(1.1), Tri(4,6,8)	9.32	1
	4	Exp(1.2), Tri(2,6,10)	8.50	6
	5	Exp(1.2), Tri(3,6,9)	7.50	7
	6	Exp(1.2), Tri(4,6,8)	8.00	5
	7	Exp(1.3), Tri(2,6,10)	8.49	8
	8	Exp(1.3), Tri(3,6,9)	7.92	8
	9	Exp(1.3), Tri(4,6,8)	8.22	8
	10	Exp(1.4), Tri(2,6,10)	8.25	8
	11	Exp(1.4), Tri(3,6,9)	8.43	7
	12	Exp(1.4), Tri(4,6,8)	7.48	8
	13	Exp(1.5), Tri(2,6,10)	8.54	6
	14	Exp(1.5), Tri(3,6,9)	8.56	7
	15	Exp(1.5), Tri(4,6,8)	7.43	6
2	7	Exp(1.3), Tri(2,6,10)	8.68	5
	8	Exp(1.3), Tri(3,6,9)	8.02	8
	9	Exp(1.3), Tri(4,6,8)	8.23	6
	10	Exp(1.4), Tri(2,6,10)	8.35	5
	12	Exp(1.4), Tri(4,6,8)	7.80	8

The first and second iterations use 20 and 50 replications, respectively. The NIQ levels included in the analysis are 0, 1, 2, …, 7; therefore, the maximum possible selection count is 8. WT: waiting threshold; IAT: inter-arrival time; BT: balking threshold; MAPE: mean absolute percentage error; NIQ: number in queue.

The percentiles of the observed and simulated samples are compared in step 4 to obtain 20 observations of the performance (i.e., MAPE) of each scenario under every $NIQ_{i}$ included in the analysis. The subset selection algorithm then uses these observations to determine the potentially good scenarios under each $NIQ_{i}$. The total number of times a hypothesized scenario is selected by subset selection, hereafter referred to as its selection count, is used as an aggregate measure of performance for that scenario. The results of the first iteration are summarized in Table 1, where the highlighted cells indicate the recommended scenarios based on their selection count.
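Aggregating the per-NIQ subsets into selection counts is simple bookkeeping. The sketch below assumes the subsets have already been computed for each NIQ level, treats the scenarios with the highest count as recommended (as in the highlighted rows of Table 1), and uses made-up toy subsets that are not the results in Table 1; the function names are hypothetical.

```python
from collections import Counter


def selection_counts(subsets_by_niq):
    """Count how often each scenario appears in the potentially good subset.

    subsets_by_niq maps each NIQ level in the analysis to the scenario
    numbers returned by subset selection for that level.
    """
    counts = Counter()
    for subset in subsets_by_niq.values():
        counts.update(subset)
    return counts


def recommended(counts):
    """Scenarios that attain the highest selection count."""
    best = max(counts.values())
    return sorted(s for s, c in counts.items() if c == best)


toy = {0: [7, 8, 12], 1: [8, 12], 2: [8, 9, 12], 3: [7, 8]}  # illustrative only
counts = selection_counts(toy)
print(counts[8], counts[12], recommended(counts))  # 4 3 [8]
```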

In the second iteration, we increase the number of replications to 50 and focus only on the five scenarios recommended in iteration 1 (scenarios 7, 8, 9, 10, and 12), expecting this to further narrow down the list of recommended scenarios. The recommendation now contains only two scenarios (8 and 12) and includes the correct estimate (scenario 8). One could perform another iteration with a higher number of replications or a larger sample size to select the best of the two scenarios recommended in iteration 2. We assess the effect of these parameters in section 5.

4.2.2. Unknown reneging, arrival, and balking processes

If the transactional data only provide the queue lengths without tracking individual entities, then WT cannot be estimated from data and needs to be inferred along with IAT and BT distributions. In this case, step 2 of the method involves hypothesizing all three distributions. We use the same configuration for the true (unknown) distributions and follow a similar general design for the experiments. The hypothesized and recommended scenarios for the first iteration are summarized in Table 2. In the second iteration, we narrow down the list of candidates from 45 to the 12 recommended scenarios from iteration 1 and increase the number of simulation replications from 20 to 50. Table 3 provides the results for iteration 2. Once again, the correct estimate (scenario 23) is among the two candidates recommended after two iterations.
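The 45 candidates in Table 2 form a full factorial grid over the hypothesized IAT, BT, and WT parameters. The sketch below (names hypothetical) reproduces the table's numbering, with WT varying slowest and BT fastest; passing a single WT such as `[(6, 9, 15)]` gives the 15-scenario numbering of Table 1.

```python
from itertools import product

IAT_MEANS = [1.1, 1.2, 1.3, 1.4, 1.5]              # Exp(mean), min
BT_PARAMS = [(2, 6, 10), (3, 6, 9), (4, 6, 8)]     # Tri(min, mode, max)
WT_PARAMS = [(5, 8, 14), (6, 9, 15), (7, 10, 16)]  # Tri(min, mode, max), min


def scenario_grid(iat_means, bt_params, wt_params):
    """Number scenarios 1, 2, ... with WT varying slowest and BT fastest."""
    combos = product(wt_params, iat_means, bt_params)
    return {num: {"IAT": iat, "BT": bt, "WT": wt}
            for num, (wt, iat, bt) in enumerate(combos, start=1)}


grid = scenario_grid(IAT_MEANS, BT_PARAMS, WT_PARAMS)
print(len(grid), grid[23])  # 45 {'IAT': 1.3, 'BT': (3, 6, 9), 'WT': (6, 9, 15)}
# Iteration 2 keeps only the scenarios recommended in iteration 1
iteration2 = {k: grid[k] for k in (22, 23, 24, 25, 26, 27, 37, 38, 39, 41, 42, 44)}
```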

Table 2.

Hypothesized and recommended scenarios in iteration 1 for the case of unknown WT. The highlighted cells represent the best performing candidate solutions.

Iteration	Scenario number	Hypothesized scenario (IAT, BT, WT)	Avg MAPE over all NIQ	Selection count based on MAPE
1	1	Exp(1.1), Tri(2,6,10), Tri(5,8,14)	9.64	4
	2	Exp(1.1), Tri(3,6,9), Tri(5,8,14)	9.83	2
	3	Exp(1.1), Tri(4,6,8), Tri(5,8,14)	9.74	3
	4	Exp(1.2), Tri(2,6,10), Tri(5,8,14)	9.40	7
	5	Exp(1.2), Tri(3,6,9), Tri(5,8,14)	9.66	6
	6	Exp(1.2), Tri(4,6,8), Tri(5,8,14)	9.88	4
	7	Exp(1.3), Tri(2,6,10), Tri(5,8,14)	8.43	7
	8	Exp(1.3), Tri(3,6,9), Tri(5,8,14)	8.51	7
	9	Exp(1.3), Tri(4,6,8), Tri(5,8,14)	8.72	6
	10	Exp(1.4), Tri(2,6,10), Tri(5,8,14)	8.57	7
	11	Exp(1.4), Tri(3,6,9), Tri(5,8,14)	8.83	7
	12	Exp(1.4), Tri(4,6,8), Tri(5,8,14)	7.91	5
	13	Exp(1.5), Tri(2,6,10), Tri(5,8,14)	10.97	7
	14	Exp(1.5), Tri(3,6,9), Tri(5,8,14)	7.83	7
	15	Exp(1.5), Tri(4,6,8), Tri(5,8,14)	7.34	6
	16	Exp(1.1), Tri(2,6,10), Tri(6,9,15)	7.89	6
	17	Exp(1.1), Tri(3,6,9), Tri(6,9,15)	8.38	3
	18	Exp(1.1), Tri(4,6,8), Tri(6,9,15)	9.32	1
	19	Exp(1.2), Tri(2,6,10), Tri(6,9,15)	8.50	7
	20	Exp(1.2), Tri(3,6,9), Tri(6,9,15)	7.50	7
	21	Exp(1.2), Tri(4,6,8), Tri(6,9,15)	8.00	6
	22	Exp(1.3), Tri(2,6,10), Tri(6,9,15)	8.49	8
	23	Exp(1.3), Tri(3,6,9), Tri(6,9,15)	7.92	8
	24	Exp(1.3), Tri(4,6,8), Tri(6,9,15)	8.22	8
	25	Exp(1.4), Tri(2,6,10), Tri(6,9,15)	8.25	8
	26	Exp(1.4), Tri(3,6,9), Tri(6,9,15)	8.43	8
	27	Exp(1.4), Tri(4,6,8), Tri(6,9,15)	7.48	8
	28	Exp(1.5), Tri(2,6,10), Tri(6,9,15)	8.54	6
	29	Exp(1.5), Tri(3,6,9), Tri(6,9,15)	8.56	7
	30	Exp(1.5), Tri(4,6,8), Tri(6,9,15)	7.43	7
	31	Exp(1.1), Tri(2,6,10), Tri(7,10,16)	7.39	5
	32	Exp(1.1), Tri(3,6,9), Tri(7,10,16)	7.69	5
	33	Exp(1.1), Tri(4,6,8), Tri(7,10,16)	8.47	4
	34	Exp(1.2), Tri(2,6,10), Tri(7,10,16)	7.41	7
	35	Exp(1.2), Tri(3,6,9), Tri(7,10,16)	6.91	7
	36	Exp(1.2), Tri(4,6,8), Tri(7,10,16)	6.77	7
	37	Exp(1.3), Tri(2,6,10), Tri(7,10,16)	8.17	8
	38	Exp(1.3), Tri(3,6,9), Tri(7,10,16)	7.94	8
	39	Exp(1.3), Tri(4,6,8), Tri(7,10,16)	8.50	8
	40	Exp(1.4), Tri(2,6,10), Tri(7,10,16)	8.32	7
	41	Exp(1.4), Tri(3,6,9), Tri(7,10,16)	8.09	8
	42	Exp(1.4), Tri(4,6,8), Tri(7,10,16)	8.00	8
	43	Exp(1.5), Tri(2,6,10), Tri(7,10,16)	8.92	6
	44	Exp(1.5), Tri(3,6,9), Tri(7,10,16)	8.71	8
	45	Exp(1.5), Tri(4,6,8), Tri(7,10,16)	7.67	5

The first iteration uses 20 replications. The NIQ levels included in the analysis are 0,1,2, …, 7. Therefore, the maximum possible selection count is 8. WT: waiting threshold; IAT: inter-arrival time; BT: balking threshold; MAPE: mean absolute percentage error; NIQ: number in queue.

Table 3.

Hypothesized and recommended scenarios in iteration 2 for the case of unknown WT. The highlighted cells represent the best performing candidate solutions.

Iteration	Scenario number	Hypothesized scenario (IAT, BT, WT)	Avg MAPE over all NIQ	Selection count based on MAPE
2	22	Exp(1.3), Tri(2,6,10), Tri(6,9,15)	8.68	5
	23	Exp(1.3), Tri(3,6,9), Tri(6,9,15)	8.02	8
	24	Exp(1.3), Tri(4,6,8), Tri(6,9,15)	8.24	5
	25	Exp(1.4), Tri(2,6,10), Tri(6,9,15)	8.35	5
	26	Exp(1.4), Tri(3,6,9), Tri(6,9,15)	8.10	7
	27	Exp(1.4), Tri(4,6,8), Tri(6,9,15)	7.80	6
	37	Exp(1.3), Tri(2,6,10), Tri(7,10,16)	8.28	4
	38	Exp(1.3), Tri(3,6,9), Tri(7,10,16)	7.84	8
	39	Exp(1.3), Tri(4,6,8), Tri(7,10,16)	8.28	5
	41	Exp(1.4), Tri(3,6,9), Tri(7,10,16)	8.00	6
	42	Exp(1.4), Tri(4,6,8), Tri(7,10,16)	8.21	7
	44	Exp(1.5), Tri(3,6,9), Tri(7,10,16)	8.53	4

The second iteration uses 50 replications. The NIQ levels included in the analysis are 0,1,2, …, 7. Therefore, the maximum possible selection count is 8. WT: waiting threshold; NIQ: number in queue; IAT: inter-arrival time; BT: balking threshold; MAPE: mean absolute percentage error.

4.3. Motivation behind using EIAT(NIQ)

The effective arrival process depends on NIQ, since entities decide whether to balk based on NIQ at the time of arrival. More specifically, both the frequency (sample size) and the distribution of EIAT observations vary across NIQ values. Since we compare quantiles of the simulated and observed samples, we only consider those NIQ levels for which the sample is large enough to estimate extreme quantiles such as 0.10 and 0.90. In the above examples, we excluded NIQ = 8 from the analysis because of its small sample size ($< 100$), as shown in Figure 5(a).

Figure 5.

Frequency and distribution of EIAT observations for different NIQ values based on a total of 5000 EIAT observations collected from the queueing network in Figure 4. (a) Count (sample size) of EIAT observations for different NIQ values. (b) Cumulative distribution function of EIAT observations for NIQ = 3, 5, and 7.

Figure 5(b) shows how the distribution of EIAT varies with the NIQ level. An appropriate model for the true unknown IAT, BT, and WT distributions is one that produces similar dynamics at all NIQ levels, not just some of them. Therefore, comparisons need to be performed under all NIQ values with a sufficiently large sample size. To further illustrate this point, consider Figure 6, which shows the cumulative distribution functions of a triangular and a uniform BT. Given an external arrival process, the two scenarios would result in statistically similar effective arrival processes for $NIQ \leq 3$ (no balking) and for $NIQ = 6$, where 50% of entities balk in both cases. Note that all entities balk if $NIQ = 9$, so there will be no observations of EIAT(NIQ = 9). By using the proposed selection count measure over all NIQ values between 0 and 8 (given sufficient sample size), we increase the chance of detecting such differences even though the two scenarios lead to statistically similar results for some NIQ values. Therefore, selection count provides an effective overall measure of the performance of the hypothesized scenarios.
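The comparison behind Figure 6 can be reproduced numerically with exact fractions. This sketch assumes entities balk when NIQ ≥ BT (consistent with the statements that nobody balks for NIQ ≤ 3 and everybody balks at NIQ = 9), and it assumes the uniform alternative is a discrete uniform BT on {4, ..., 9}, i.e., Uniform(3, 9) rounded up; the exact uniform distribution in Figure 6 is not stated in the text.

```python
from fractions import Fraction

# P(BT = x) for BT ~ triangular(3, 6, 9) rounded up (the pmf given in section 4.2.1)
TRI_PMF = {4: Fraction(1, 18), 5: Fraction(1, 6), 6: Fraction(5, 18),
           7: Fraction(5, 18), 8: Fraction(1, 6), 9: Fraction(1, 18)}
# Assumed uniform alternative: BT equally likely to be 4, ..., 9
UNI_PMF = {x: Fraction(1, 6) for x in range(4, 10)}


def balk_probability(pmf, niq):
    """P(an arriving entity balks | NIQ = niq) = P(BT <= niq)."""
    return sum((p for x, p in pmf.items() if x <= niq), Fraction(0))


for q in range(10):
    tri, uni = balk_probability(TRI_PMF, q), balk_probability(UNI_PMF, q)
    print(q, tri, uni, "same" if tri == uni else "differs")
```

The two balking probabilities coincide for NIQ ≤ 3, NIQ = 6, and NIQ = 9 but differ at NIQ = 4, 5, 7, and 8, which is why aggregating over all admissible NIQ levels helps separate the two scenarios.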

Figure 6.

The cumulative distribution function for a triangular and uniform balking threshold.

5. Computational experiments and results

The above results suggest that, given a sufficient sample size and number of simulation replications, the proposed method has the statistical power to detect the correct scenario if it is among the hypothesized candidates. In this section, we perform additional experiments with the same queueing network to further investigate how the estimated performance of the correct scenario changes with the number of replications ($n$), the sample size for simulated EIAT ($s_{simulated}$), and the sample size for observed EIAT ($s_{observed}$). Table 4 summarizes the design of these experiments. Experiments with other queueing network structures of different complexity levels yielded similar general findings; however, space limitations preclude presenting the results for all of these systems.

Table 4.

Design and parameter choices for simulation experiments.

Parameter	Value/range
True inter-arrival time (IAT)	Exp(1.3) min
True balking threshold (BT)	Tri(3,6,9)
True waiting/reneging threshold (WT)	Tri(6,9,15) min
Number of simulation replications	10–500 in increments of 10
Sample size for simulated EIAT ( $s_{simulated}$ )	500–10,000 in increments of 500
Sample size for observed EIAT ( $s_{observed}$ )	500–10,000 in increments of 500
Percentiles for comparison	10th to 90th in increments of 10%
Confidence level for subset selection	95%

EIAT: effective inter-arrival time.

5.1. The effect of the number of simulation replications ( $n$ )

In both cases solved in section 4, we increased the number of replications in the second iteration of the proposed method to obtain higher statistical power. Figure 7 shows the effect of the number of simulation replications on the performance of the correct scenario. While running additional replications has virtually no effect on the average MAPE over all NIQ values, it significantly decreases the standard error of the overall MAPE, $\frac{σ_{MAPE}}{\sqrt{n}}$. This results in narrower confidence intervals for the estimated performance, which increases the statistical power of subset selection to detect and exclude inferior scenarios. As a result, the choice of the number of replications depends on two factors: (1) the analyst’s desired standard error and (2) the available computational resources/time.
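One common way to act on the first factor, not prescribed by the paper, is a pilot run: estimate $σ_{MAPE}$ from a few replications and solve $σ_{MAPE}/\sqrt{n} \leq$ target for $n$. A minimal sketch, with a hypothetical function name and made-up pilot values:

```python
import math
import statistics


def replications_needed(pilot_mapes, target_se):
    """Smallest n such that the estimated standard error sigma_MAPE / sqrt(n)
    does not exceed target_se, using sigma_MAPE estimated from a pilot run."""
    sigma = statistics.stdev(pilot_mapes)
    return math.ceil((sigma / target_se) ** 2)


# Hypothetical pilot: three replications with overall MAPE 6, 8, 10 (sigma = 2)
print(replications_needed([6.0, 8.0, 10.0], target_se=0.5))  # 16
```

Since the required $n$ grows with the inverse square of the target, halving the standard error roughly quadruples the number of replications, which is where the computational budget (the second factor) becomes binding.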

Figure 7.

The effect of number of simulation replications on the estimate of the performance of the correct scenario. We use $s_{observed} = s_{simulated} = 5000$ in these experiments. The orange line shows the mean overall MAPE.

5.2. The effect of sample size for the simulated EIAT values ( $s_{simulated}$ )

For the examples in section 4, one could also increase the size of the simulated EIAT sample ($s_{simulated}$) in the second iteration of the proposed method to obtain better accuracy for the correct scenario. Here, we further study the effect of this parameter. Figure 8 shows the effect of the ratio of the simulated to observed EIAT sample sizes, $\frac{s_{simulated}}{s_{observed}}$, on the performance of the correct scenario. Given a fixed $s_{observed}$, as $\frac{s_{simulated}}{s_{observed}}$ increases, both the average overall MAPE (indicated by the orange line) and the variability in the overall MAPE (indicated by the box plots) decrease. However, no statistically significant improvement is detected for $\frac{s_{simulated}}{s_{observed}} > 1.0$. An appropriate sample size for the simulated values depends on the complexity of the simulation model and the computational time required to run each replication until $s_{simulated}$ observations are obtained. For the queueing system under study, a replication with $s_{simulated} = 10{,}000$ takes about 40 s to run on a typical laptop with a 2.60 GHz Core i7 CPU and 16 GB of memory, so computational time was not a binding constraint. Based on these observations, we recommend setting $s_{simulated}$ at least equal to $s_{observed}$ for complex models with long run times. For models that run faster, an $s_{simulated} > s_{observed}$ can be used as long as computational limits are not exceeded.

Figure 8.

Average MAPE over all NIQ values for the correct scenario as a function of $\frac{s_{simulated}}{s_{observed}}$ . In these experiments, we set $s_{observed} = 5000$ and vary the size of the simulated samples from 500 to 10,000 in increments of 500.

5.3. The effect of sample size for the observed real-world EIAT values $(s_{observed})$

Unlike the number of replications and $s_{simulated}$, which are under our control (within the available computational capacity), $s_{observed}$ is often beyond our control, as we are limited to the amount of data available. However, for comprehensiveness, and to provide guidance for data collection whenever possible, we further analyze the effect of the sample size of the observed real-world data ($s_{observed}$) on the performance of the correct scenario, as summarized in Figure 9. In all of these experiments, we set $s_{simulated} = s_{observed}$ following the above general guidelines.

Figure 9.

MAPE under each NIQ value for the correct scenario as a function of sample size when $s_{simulated} = s_{observed}$ .

For all NIQ values, both the variability and the average MAPE decrease as the sample size increases, as expected. For extreme NIQ values (in this case 0 and 7), the probability of the system being in these states is relatively small, so the sample sizes of both the observed and simulated EIAT($NIQ_{i}$) are smaller than for other NIQ values; hence, we observe higher variability and MAPE at these extreme NIQ values. Moreover, for small sample sizes (say, less than 4000), the minimum sample size requirement of 100 is not met for NIQ = 0 and 7, so no box plot is provided for these cases, as they are excluded from the analysis and are not used in comparisons. Therefore, a larger sample size increases the chance that more NIQ levels are included in the analysis, which, as discussed in section 4.3, enhances the proposed method’s ability to detect deviations of the hypothesized scenarios from the true (unknown) distributions being estimated.

6. Real-world application and validation: a call center

A call center is a special case in which the true arrival, balking, and reneging processes are observable, which makes it suitable for validation. Here, we use real-world transactional data from a bank’s call center to validate the efficacy of the proposed approach. The data are available at http://iew3.technion.ac.il/serveng/callcenterdata/ and include about 1,200,000 calls over a period of 1 year.

Figure 10 shows the basic operation of the call center. Calls first go through the automated voice response unit (VRU), where customers receive recorded information, e.g., store locations/hours and account balances. There are 65 VRU lines, and busy signals (lost demand) at this stage are extremely rare. Roughly 65% of customers complete their service via the VRU and leave the system. The remaining 35% request to speak with a human operator, and service begins immediately if an operator is available. Otherwise, customers join a queue and are served in first-in-first-out order of their arrival time at the queue. Waiting customers periodically receive information about their position in the queue and the amount of time that the first person in line has been waiting. The announcement repeats every 60 s, with music, news, and commercials in between.

Figure 10.

The process flow for the call center under study.

In our analysis, we treat the input process into the operator queue as the external arrival process, that is, the customers that request to speak to an operator after exiting the VRU. The data provide the time stamps associated with all possible events. For every call, we have the time of arrival at the queue or start of service (if there is no queue), the time exiting the queue, and the manner in which it exited the queue (started service or abandoned), and if served, the starting and ending time of service. There were several operational changes during that year, all occurring before November. Therefore, for consistency purposes, we only focus on non-holiday weekdays in November and December during the morning peak hours from 09:00 to 11:00 h.

We treat abandonments with a waiting time of 15 s or less as balks, representing two groups of customers: (1) those who are not willing to wait at all and balk right away after hearing “please wait” and realizing that they need to join a queue (which takes a few seconds), and (2) those who may be willing to wait but balk once they are notified of their position in the queue because it exceeds their perceived BT. This assumption is consistent with other studies that used this dataset (see ⁴⁷ and references therein). In the intervals under consideration, there were a total of 13,468 arrivals (calls needing operator assistance), of which 2625 balked. Therefore, a demand estimate based on the censored data from successful arrivals would underestimate the true demand rate by $\frac{2625}{13468} \approx 19.5\%$. In addition, a total of 4952 EIAT observations can be extracted from the after-balk data; hence, $s_{observed} = 4952$.
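The 15-s balk rule and the censoring arithmetic can be sketched as follows. The cutoff is the paper's; the outcome labels `"served"` and `"abandoned"` and the function names are hypothetical and do not reflect the dataset's actual field names.

```python
BALK_CUTOFF_S = 15  # abandonments after at most 15 s in queue are treated as balks


def classify_exit(outcome, wait_s, cutoff=BALK_CUTOFF_S):
    """Label a call that reached the operator queue (labels are hypothetical)."""
    if outcome == "served":
        return "served"
    # outcome == "abandoned": short waits count as balks, longer ones as reneges
    return "balked" if wait_s <= cutoff else "reneged"


def censoring_fraction(n_arrivals, n_balks):
    """Share of the true demand missing from the after-balk (successful) data."""
    return n_balks / n_arrivals


print(round(100 * censoring_fraction(13468, 2625), 1))  # 19.5
```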

The true IAT, BT, and WT can be estimated from the data. The IAT is found to be exponentially distributed with mean 0.389 min, the WT is estimated to be exponential with a mean of 1.744 min, and the BT is estimated to follow triangular(0,12,24). The details of data analysis and goodness-of-fit tests are omitted for brevity. The reader is referred to any general discussion on simulation input analysis and distribution fitting,³² and to detailed statistical analysis of this call center’s dataset available in the literature.⁴⁷

In this analysis, we consider the case of known reneging and use the proposed method to estimate the true IAT and BT distributions (assumed to be unknown). Eight agents were available during the intervals under study, so we use a simulation model of a server with eight parallel processing stations, where customers balk based on the observed NIQ at the time of arrival and renege once their WT is exceeded. We set $s_{simulated} = s_{observed} = 4952$. Table 5 summarizes the results for two iterations of the proposed method. In iteration 1, we use 16 hypothesized scenarios and 50 simulation replications. The list of hypothesized candidates is then refined for the second iteration. Since all scenarios recommended in iteration 1 have an IAT mean of either 0.30 or 0.35 min, in iteration 2 we vary the mean of the hypothesized IAT distributions from 0.30 to 0.40 in increments of 0.01. The only recommended scenario in iteration 2 is the correct set of estimates, Exp(0.39) and Tri(0,12,24), validating the efficacy of the proposed method in a real-world situation.
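To make the structure of such a model concrete, here is a minimal discrete-event sketch of a FIFO multi-server queue with balking (an arrival balks if NIQ ≥ BT) and reneging, which runs until $s_{simulated}$ EIAT observations are collected. It is not the authors’ Simio model: the exponential service times, the `service_mean` value in the usage lines, the rounding-up of a triangular BT, and labelling each EIAT by the NIQ seen by the arrival that opens the interval are all assumptions, and `simulate_eiat` is a hypothetical name.

```python
import heapq
import math
import random
from collections import defaultdict


def simulate_eiat(iat_mean, draw_bt, draw_wt, service_mean, servers=8,
                  s_simulated=5000, seed=0):
    """Simulate a FIFO multi-server queue with balking and reneging.

    Returns {NIQ: [EIAT, ...]} once s_simulated EIATs have been collected.
    draw_bt(rng) -> integer balking threshold; draw_wt(rng) -> waiting threshold.
    Assumption: exponential service times with mean service_mean.
    """
    rng = random.Random(seed)
    events, seq = [], 0          # heap of (time, seq, kind, entity id)
    busy = 0                     # number of busy servers
    waiting = {}                 # entity id -> None; insertion order is FIFO order
    next_id = 0
    last = None                  # (time, NIQ seen) of previous successful arrival
    eiat, n_obs = defaultdict(list), 0

    def schedule(time, kind, ent=None):
        nonlocal seq
        heapq.heappush(events, (time, seq, kind, ent))
        seq += 1

    schedule(rng.expovariate(1 / iat_mean), "arrival")
    while n_obs < s_simulated:
        now, _, kind, ent = heapq.heappop(events)
        if kind == "arrival":
            schedule(now + rng.expovariate(1 / iat_mean), "arrival")
            niq = len(waiting)
            if niq >= draw_bt(rng):
                continue                              # balk: never recorded
            if last is not None:                      # successful arrival -> one EIAT
                eiat[last[1]].append(now - last[0])   # labelled by NIQ at interval start
                n_obs += 1
            last = (now, niq)
            if busy < servers:
                busy += 1
                schedule(now + rng.expovariate(1 / service_mean), "departure")
            else:
                waiting[next_id] = None
                schedule(now + draw_wt(rng), "renege", next_id)
                next_id += 1
        elif kind == "renege":
            waiting.pop(ent, None)                    # no-op if already in service
        elif waiting:                                 # departure: serve next in line
            del waiting[next(iter(waiting))]
            schedule(now + rng.expovariate(1 / service_mean), "departure")
        else:                                         # departure with empty queue
            busy -= 1
    return dict(eiat)


# Usage with the call-center estimates; service_mean=3.0 is a made-up value.
if __name__ == "__main__":
    out = simulate_eiat(
        iat_mean=0.389,
        draw_bt=lambda r: math.ceil(r.triangular(0, 24, 12)),  # args: low, high, mode
        draw_wt=lambda r: r.expovariate(1 / 1.744),
        service_mean=3.0, s_simulated=4952, seed=1)
    print({niq: len(v) for niq, v in sorted(out.items())})
```

Reneging is handled with explicit events so that NIQ always counts only customers who are still waiting; note that `random.triangular` takes its arguments in the order low, high, mode.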

Table 5.

Hypothesized and recommended scenarios for the real-world call center application. The highlighted cells represent the best performing candidate solutions.

Iteration	Hypothesized scenario (IAT, BT)	Avg MAPE over all NIQ	Selection count based on MAPE
1	Exp(0.30), Tri(0,12,24)	15.97	4
	Exp(0.35), Tri(0,12,24)	16.22	4
	Exp(0.40), Tri(0,12,24)	16.29	3
	Exp(0.45), Tri(0,12,24)	16.68	3
	Exp(0.30), Tri(0,7,14)	17.35	3
	Exp(0.35), Tri(0,7,14)	17.28	3
	Exp(0.40), Tri(0,7,14)	15.67	3
	Exp(0.45), Tri(0,7,14)	19.26	2
	Exp(0.30), Uni(0,24)	16.15	2
	Exp(0.35), Uni(0,24)	15.24	4
	Exp(0.40), Uni(0,24)	15.46	3
	Exp(0.45), Uni(0,24)	18.79	2
	Exp(0.30), Uni(0,14)	16.54	4
	Exp(0.35), Uni(0,14)	16.06	2
	Exp(0.40), Uni(0,14)	18.61	2
	Exp(0.45), Uni(0,14)	23.24	1
2	Exp(0.30), Tri(0,12,24)	16.11	5
	Exp(0.31), Tri(0,12,24)	15.76	6
	Exp(0.32), Tri(0,12,24)	15.75	5
	Exp(0.33), Tri(0,12,24)	15.64	4
	Exp(0.34), Tri(0,12,24)	15.67	3
	Exp(0.35), Tri(0,12,24)	16.02	3
	Exp(0.36), Tri(0,12,24)	16.97	5
	Exp(0.37), Tri(0,12,24)	15.18	3
	Exp(0.38), Tri(0,12,24)	15.32	4
	Exp(0.39), Tri(0,12,24)	14.80	7
	Exp(0.40), Tri(0,12,24)	15.73	2
	Exp(0.35), Uni(0,24)	15.16	3
	Exp(0.30), Uni(0,14)	16.63	3

The first and second iterations use 50 and 75 replications, respectively. IAT: inter-arrival time; BT: balking threshold; MAPE: mean absolute percentage error; NIQ: number in queue.

7. Conclusion

The paper formulates the estimation of the true arrival, balking, and reneging processes in a queueing system as an optimization model and proposes an iterative simulation-based inference approach that integrates quantile-based measures and subset selection to detect appropriate probability distributions for modeling these processes. The method is applicable to any complex queueing situation that can be simulated, with either known or unknown reneging distributions. Extensive simulation experiments are used to develop general guidelines for specifying the parameters of the proposed approach, namely sample size and number of replications, given limited computational resources/time. The method is also validated via a real-world application in a call center, where the true arrivals, balking, and reneging events are observable.

There are two aspects of the proposed method that would benefit from further clarification:

Possibility of multiple solutions: There may be more than one set of IAT, balking, and reneging distributions that results in similar dynamics, and the proposed method does not guarantee returning the exact answer. However, this limitation is not unique to the method proposed here. Even the simple distribution fitting commonly used in simulation input analysis does not provide such guarantees. Goodness-of-fit tests often fail to reject more than one distribution, requiring the analyst to consider other factors (such as empirical justification) before choosing among the candidate distributions not rejected by the statistical test. For the same reason, the goal of input analysis is never to identify the right distribution, but to help select an appropriate distribution that provides a reasonable model for the data. The method presented in this paper has a similar goal.

Computational time: For each hypothesized scenario, the proposed method requires running enough simulations to obtain a sufficient sample size of simulated data, and then comparing multiple percentiles of EIAT(NIQ) across different NIQ values. This process may become computationally expensive depending on the complexity of the simulation model, the number of hypothesized scenarios, and the desired sample sizes. Note, however, that the problem addressed here does not involve real-time decision-making. Instead, we aim to estimate balking and reneging thresholds, which are behavioral characteristics of individuals and are independent of real-time system performance. Computational overhead is therefore not a primary concern in this paper, since it makes no meaningful practical difference whether these estimates are obtained in a few seconds or a few minutes.
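To make the percentile comparison concrete, the following is a minimal sketch, not the implementation used in this paper. It assumes that each observed or simulated record is a pair (NIQ, EIAT), where EIAT is taken to be the inter-arrival time of an arrival that was actually recorded and NIQ is the number in queue at that moment. For each NIQ level it computes a set of EIAT percentiles and summarizes the observed-versus-simulated discrepancy with MAPE. The function names `percentiles_by_niq` and `mape_by_niq`, the chosen percentile levels, and the equal weighting across NIQ levels are all hypothetical illustration choices.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical percentile levels to compare (the paper's exact choice may differ).
PERCENTS = (10, 25, 50, 75, 90)


def percentiles_by_niq(pairs, percents=PERCENTS):
    """Group (niq, eiat) pairs by NIQ and return {niq: [percentile values]}.

    Assumption: each pair holds the number in queue seen by a recorded
    arrival and the effective inter-arrival time (EIAT) preceding it.
    """
    groups = defaultdict(list)
    for niq, eiat in pairs:
        groups[niq].append(eiat)
    result = {}
    for niq, values in groups.items():
        if len(values) < 2:
            continue  # skip NIQ levels with too little data to interpolate
        cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
        result[niq] = [cuts[p - 1] for p in percents]  # p-th percentile
    return result


def mape_by_niq(observed_pairs, simulated_pairs, percents=PERCENTS):
    """MAPE (in percent) between observed and simulated EIAT percentiles,
    averaged over all percentiles and all NIQ levels present in both samples."""
    obs = percentiles_by_niq(observed_pairs, percents)
    sim = percentiles_by_niq(simulated_pairs, percents)
    common = sorted(set(obs) & set(sim))
    if not common:
        raise ValueError("no NIQ level has enough data in both samples")
    errors = []
    for niq in common:
        for o, s in zip(obs[niq], sim[niq]):
            # Assumes observed percentiles are nonzero (EIATs are positive times).
            errors.append(abs(o - s) / abs(o))
    return 100.0 * sum(errors) / len(errors)
```

Because every hypothesized scenario needs its own simulated EIAT sample and its own percentile comparison, the cost of this step grows with both the number of scenarios and the size of each simulated sample, which is the overhead discussed above.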

As an interesting direction for future research, one could investigate the applicability of the proposed method to infinite-variance IATs, such as those commonly found in telecommunication, computer network, traffic, and finance applications. The proposed method can also be extended to include indifference-zone analysis, which requires additional simulation replications of the competitive candidates returned by subset selection. Such procedures guarantee that the single best candidate is selected with probability greater than or equal to the confidence level whenever the best hypothesized scenario is at least a user-specified amount better than the other candidates. Care must be taken when using such methods instead of, or in addition to, subset selection. In particular, indifference-zone analysis should not be used unless enough iterations of the proposed method have been performed that the list of hypothesized distributions is likely to include high-quality candidates. Otherwise, indifference-zone analysis may simply select the best of a set of poor candidates.
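For readers less familiar with subset selection, the sketch below shows one standard screening rule based on pairwise t-type comparisons of sample means. It is offered as an illustration under stated assumptions and is not necessarily the exact procedure used in this paper. It assumes k hypothesized scenarios, each with the same number n of replication outputs of a smaller-is-better measure (for example, MAPE). Scenario i is kept if its sample mean is not larger than every other scenario's mean plus a noise allowance w_ij = t·sqrt((S_i² + S_j²)/n). The function name `subset_select` is hypothetical, and the critical value `t_crit` is supplied by the caller; one common choice is the (1 − α)^(1/(k − 1)) quantile of a t distribution with n − 1 degrees of freedom, which can be obtained, for instance, with `scipy.stats.t.ppf`.

```python
from math import sqrt
from statistics import mean, variance


def subset_select(outputs, t_crit):
    """Return indices of scenarios retained by a t-based screening rule.

    outputs: list of k lists, each holding n replication outputs of one
             hypothesized scenario (smaller is better, e.g. MAPE).
    t_crit:  critical value supplied by the caller (assumption, see text).
    """
    k = len(outputs)
    if k < 2:
        raise ValueError("need at least two scenarios")
    n = len(outputs[0])
    if n < 2 or any(len(o) != n for o in outputs):
        raise ValueError("each scenario needs the same number n >= 2 of replications")
    means = [mean(o) for o in outputs]
    variances = [variance(o) for o in outputs]  # sample variances S_i^2
    keep = []
    for i in range(k):
        retained = True
        for j in range(k):
            if j == i:
                continue
            w_ij = t_crit * sqrt((variances[i] + variances[j]) / n)
            if means[i] > means[j] + w_ij:  # i is clearly worse than j
                retained = False
                break
        if retained:
            keep.append(i)
    return keep
```

A larger `t_crit` makes the rule more conservative, so more scenarios are retained. The scenario with the smallest sample mean is never discarded, because its mean cannot exceed any other mean plus a nonnegative allowance; an indifference-zone procedure would then spend additional replications only on the retained set.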

It is hoped that the proposed method and its extensions will help businesses better estimate balking and reneging dynamics and facilitate effective decision-making to minimize unsatisfied demand and improve service quality.

Footnotes

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

ORCID iD

Ashkan Negahban

Data accessibility statement

A working example, including all code and a sample dataset, is available in an open online Mendeley Data repository associated with this paper.⁴ The files associated with the repository are licensed under a Creative Commons Attribution 4.0 International license. You can share, copy, and modify the files so long as you give appropriate credit to the original author, provide a link to the CC BY license, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset.

Author biography

Ashkan Negahban is an Assistant Professor of Engineering Management at the School of Graduate Professional Studies at The Pennsylvania State University (USA). He received his PhD and Master’s degrees from Auburn University (USA) and his BS from the University of Tehran (all in Industrial and Systems Engineering). His research involves stochastic simulation methods, primarily agent-based and discrete-event simulation. His email and web addresses are anegahban@psu.edu and http://ashkannegahban.com.

References

1. Gao. Service performance analysis and improvement for a ticket queue with balking customers. Manag Sci 2007; 53: 971–990.

2. Jennings, Pender. Comparisons of ticket and standard queues. Queueing Syst 2016; 84: 145–202.

3. Guo, Zipkin. The effects of the availability of waiting-time information on a balking queue. Eur J Oper Res 2009; 198: 199–209.

4. Negahban. Simulation-based queue inference: a working example including codes and sample data. Mendeley Data, 2019, DOI: 10.17632/ddn55j7shv.1.

5. Haight FA. Queueing with balking. Biometrika 1957; 44: 360–369.

6. Haight FA. Queueing with reneging. Metrika 1959; 2: 186–197.

7. Wang, Jiang. Queueing system with impatient customers: a review. In: Proceedings of 2010 IEEE international conference on service operations and logistics, and informatics, QingDao, China, 15–17 July 2010, pp. 82–87. New York: IEEE.

8. Natvig. On the transient state probabilities for a queueing model where potential customers are discouraged by queue length. J Appl Probab 1974; 11: 345–354.

9. Perel, Yechiali. Queues with slow servers and impatient customers. Eur J Oper Res 2010; 201: 247–258.

10. Al-Seedy, El-Sherbiny, El-Shehawy, et al. Transient solution of the M/M/c queue with balking and reneging. Comput Math Appl 2009; 57: 1280–1285.

11. Buijsrogge, de Boer, Scheinhardt WRW. Importance sampling for Markovian tandem queues using subsolutions: exploring the possibilities. Simulation 2021; 97: 849–866.

12. Larson RC. The queue inference engine: deducing queue statistics from transactional data. Manag Sci 1990; 36: 586–601.

13. Mandelbaum, Zeltyn. Estimating characteristics of queueing networks using transactional data. Queueing Syst 1998; 29: 75–127.

14. Jang, Suh, Liu. A new procedure to estimate waiting time in GI/G/2 system by server observation. Comp Oper Res 2001; 28: 597–611.

15. Frey, Kaplan EH. Queue inference from periodic reporting data. Oper Res Lett 2010; 38: 420–426.

16. Basawa, Bhat, Lund. Maximum likelihood estimation for single server queues from waiting time data. Queueing Syst 1996; 24: 155–167.

17. Pickands, Stine RA. Estimation for an M/G/∞ queue with incomplete information. Biometrika 1997; 84: 295–308.

18. Ross, Taimre, Pollett PK. Estimation for queues from queue length data. Queueing Syst 2007; 55: 131–138.

19. Basawa, Bhat, Zhou. Parameter estimation using partial information with applications to queueing and related models. Stat Probab Lett 2008; 78: 1375–1383.

20. Heckmüller, Wolfinger BE. Reconstructing arrival processes to discrete queueing systems by inverse load transformation. Simulation 2011; 87: 1033–1047.

21. Whitt. Fitting birth-and-death queueing models to data. Stat Probab Lett 2012; 82: 998–1004.

22. Jones LK. Inferring balking behavior from transactional data. Oper Res 1999; 47: 778–784.

23. Bingham, Pitts SM. Nonparametric inference from M/G/1 busy periods. Stoch Model 1999; 15: 247–272.

24. Hall, Park. Nonparametric inference about service time distribution from indirect measurements. J R Statist Soc 2004; 66: 861–875.

25. Kim, Park. New approaches for inference of unobservable queues. In: Proceedings of the 2008 winter simulation conference, Miami, FL, 7–10 December 2008, pp. 2820–2825. New York: IEEE.

26. Park, Kim, Willemain TR. Analysis of an unobservable queue using arrival and departure times. Comp Indus Eng 2011; 61: 842–847.

27. Negahban, Smith JS. Simulation for manufacturing system design and operation: literature review and analysis. J Manuf Syst 2014; 33: 241–261.

28. Mielczarek, Uzialko-Mydlikowska. Application of computer simulation modeling in the health care sector: a survey. Simulation 2012; 88: 197–216.

29. Naseer, Eldabi, Jahangirian. Cross-sector analysis of simulation methods: a survey of defense and healthcare. Transf Gov People Process Policy 2009; 3: 181–189.

30. Oliveira, Jin, Lima, et al. The role of simulation and optimization methods in supply chain risk management: performance and review standpoints. Simul Model Pract Theory 2019; 92: 17–44.

31. Negahban, Yilmaz. Agent-based simulation applications in marketing research: an integrated review. J Simul 2014; 8: 129–142.

32. Vincent. Input data analysis. In: Banks (ed.) Handbook of simulation. Hoboken, NJ: John Wiley & Sons, Inc., 1998, pp. 55–90.

33. Smith, Sturrock, Kelton WD. Simio and simulation: modeling, analysis, applications. 5th ed. Pittsburgh, PA: Simio LLC, 2018.

34. Sargent. Verification and validation of simulation models. In: Proceedings of the 2005 winter simulation conference, Orlando, FL, 4–7 December 2005, pp. 130–143. Piscataway, NJ: IEEE.

35. Lin, Qian, et al. An evidence theory-based validation method for models with multivariate outputs and uncertainty. Simulation 2021; 97: 821–834.

36. Balci, Sargent. Some examples of simulation model validation using hypothesis testing. In: Proceedings of the 1982 winter simulation conference, San Diego, CA, 6–8 December 1982, pp. 621–629. Piscataway, NJ: IEEE.

37. Schruben LW. Establishing the credibility of simulations. Simulation 1980; 34: 101–105.

38. Nelson BL. Some tactical problems in digital simulation for the next 10 years. J Simul 2016; 10: 2–11.

39. Kuo, Rado, Lupia, et al. Improving the efficiency of a hospital emergency department: a simulation study with indirectly imputed service-time distributions. Flex Serv Manuf J 2016; 28: 120–147.

40. Guo, Goldsman, Tsui, et al. Using simulation and optimisation to characterise durations of emergency department service times with incomplete data. Int J Prod Res 2016; 54: 6494–6511.

41. Negahban. Simulation-based estimation of the real demand in bike-sharing systems in the presence of censoring. Eur J Oper Res 2019; 277: 317–332.

42. Ansari, Negahban, Megahed, et al. HistoRIA: a new tool for simulation input analysis. In: Proceedings of the 2014 winter simulation conference, Savannah, GA, 7–10 December 2014, pp. 2702–2713. New York: IEEE.

43. Morgan, Titman, Worthington, et al. Input uncertainty quantification for simulation models with piecewise-constant non-stationary Poisson arrival processes. In: Proceedings of the 2016 winter simulation conference, Washington, DC, 11–14 December 2016, pp. 370–381. New York: IEEE.

44. Chen, Gupta AK. Parametric statistical change point analysis: with applications to genetics, medicine, and finance. 2nd ed. New York: Springer Science & Business Media, 2011.

45. Negahban, Ansari, Smith. ADD-MORE: automated dynamic display of measures of risk and error. In: Proceedings of the 2016 winter simulation conference, Washington, DC, 11–14 December 2016, pp. 977–988. Piscataway, NJ: IEEE.

46. Boesel, Nelson, Kim SH. Using ranking and selection to “clean up” after simulation optimization. Oper Res 2003; 51: 814–825.

47. Brown, Gans, Mandelbaum, et al. Statistical analysis of a telephone call center. J Am Stat Assoc 2005; 100: 36–50.
