Offline Learning and Optimization for Multi-Product Inventory Management With Stockout-Based Substitution

Abstract

We deal with a retailer’s multi-product inventory system where customers randomly seek acceptable substitutes if their initial requests are not satisfied. Unsuccessful quests for substitutes would result in lost sales. Motivated by a consulting project as well as other real practices, we also choose to deal with further challenges posed by initially unknown base demand distributions and substitution probabilities; moreover, we aim to develop an offline learning method that (i) takes advantage of given data derived from bygone decisions rather than of any learning-while-doing opportunity, (ii) makes inferences about base-demand distributions and pairwise substitution probabilities without knowing whether there have been demand arrivals after inventory depletions, and (iii) is unhindered by the lack of knowledge about the assortments faced by customers and their purchase-or-no-purchase decisions at their individual arrivals. To address these challenges, we propose an innovative approach based on the Kaplan-Meier estimator that circumvents unrealistic data requirements. Our substitution probability estimates employ carefully designed weighting schemes to facilitate rigorous theoretical analysis through tools such as the Cauchy-Schwarz inequality. Both one-time substitution scenarios and more complex Markov-chain substitution patterns would be accommodated. Using large deviation tools, we establish provably optimal convergence rates of our estimates on top of consistency. The precision in parameter estimates would translate into accuracy in replenishment decisions. For inventory management, we take advantage of a submodularity property to obtain an exact algorithm for the two-product case and a good heuristic for the general multi-product problem. Computational studies based on simulated and actual data confirm the merits of our approach.

Keywords

Multi-Product Inventory; Substitution; Censored Data; Demand Learning

1. Introduction

Lost sales figure prominently in retailers’ day-to-day operations. Approximately $984B out of the world’s $28.64T retail turnover is lost to stockouts (Buzek, 2018; eMarketer, 2022). Among them, as much as 47% are attributed to inadequacies in store-level forecasting and replenishment (Gruen and Corsten, 2007). On the flip side, stockouts are not always detrimental. They may prevent spoilage and induce substitution demands for more profitable products (Zhang et al., 2021).

Within this context, we investigate learning methods of demand patterns and substitution behaviors, with some attention also paid to the optimization portion after parameter learning. Besides censoring, which has long plagued demand learning involving lost sales, we face the additional challenge that our data are intertwined with the concerned retailer’s past replenishment activities, which are beyond our control. In addition, our applications require us to estimate all pair-wise substitution probabilities directly instead of assessing the potentially much fewer behind-the-scenes parameters that prop up the substitution behaviors.

1.1. Motivation by a Large European Chain

A European grocery chain has several hundred stores that are supplied with various breads from ten production facilities. Every store in the chain places and receives orders for a variety of fresh bread types before its daily opening. Any remaining stock at the end of a day is scrapped.

The store faces inaccurate demand information for each stock-keeping unit (SKU), while the stockout of one SKU induces the censoring of its own demand and substitution requests being made to one or more other SKUs. Further complicating the problem is that substitution probabilities are unknown and must be estimated along with the original base demand per SKU. One might think that a positive stock level at the end of the day would allow the true demand to be captured by that day’s sales records; however, some of the sales for that SKU might originate from other products that have run out of stock. In other words, observed sales are not true demand: the stockout of one product causes not only the censoring of its own demand but also a distortion of the demands for its substitutable products.

To better monitor the substitution-induced demand generated by stockouts, the company split the business period into hourly blocks and recorded the stores’ inventory levels at the beginning of each block. It gave us one year’s worth of historical sales and stock data at one of its stores, and then asked us to make forecasts, identify substitutions, and suggest replenishment quantities for the stores.

Figure 1 illustrates the store’s demand pattern. It involves $12$ products represented by vertices, whose sizes reflect average historical sales. The substitution structure is built on a small-scale customer survey conducted by the chain. Now, an arc from $j$ to $i$ represents the knowledge that upon stockout of product $j$ , customers originally seeking this product may instead attempt to purchase product $i$ as a substitute. The number associated with this arc is the estimated frequency of customers who make this attempt upon stockout. The absence of an arc between two products indicates statistically insignificant substitution attempts.

Figure 1.

A rough pattern of product substitutions.

In Figure 1, substitutions are limited to product clusters, with the largest involving seven products. For instance, a shortage of Value multigrain bloomer would call for the help of Value wheat bloomer, while no other shortage would affect this product. We may observe two key patterns: (1) not all pairs of products have substitution relationships; and (2) the substitution frequencies are not correlated with the concerned products’ average historical sales. These prompt us to build a model wherein each product has a base demand and when a stockout occurs, customers seek out alternatives following certain probabilities. This probability-based substitution approach is well-known in inventory management; see, for example, Netessine and Rudi (2003), Nagarajan and Rajagopalan (2008), Zhang et al. (2021), and Chen and Chao (2020).

We recognize that discrete choice models are frequently used for multi-product operations management. They typically require detailed sales data that cover (1) assortments faced by customers at their arrivals and (2) customers’ no-purchase decisions; see, for example, Şimşek and Topaloglu (2018) and Chen et al. (2022). Unfortunately, such detailed transaction-level data are unavailable to the European grocery chain. Therefore, estimations based on choice models are unsuitable for the current problem, as well as other problems with only aggregate sales data.

Some works like Newman et al. (2014) and Abdallah and Vulcano (2021) made efforts to circumvent the need for observing no-purchase decisions. Still, they had to assume knowledge of assortments at various transactions, special demand arrival patterns like Poisson, and parametric substitution behaviors like multinomial logit (MNL). The latter works rely on the independence of irrelevant alternatives (IIA), which is violated by our European chain as shown in Figure 1. For instance, a shortage of Value multigrain bloomer would call for the help of Value wheat bloomer, which is not affected by the shortage of any other product.

In contrast, using merely inventory-level records at pre-fixed time points, our method can estimate base-demand distributions and substitution probabilities without being constrained by any pre-set conditions. The added flexibility renders it applicable to many real-world scenarios where available data are not ideal.

1.2. Our Model and Approach

For ease of exposition, we start our research with two products. In this setting, a retailer decides period-wise (e.g., daily) replenishment quantities for two substitutable products. Demands for the two products arrive in fashions initially unknown to the retailer. A request for one of the products, if unsatisfied, translates into a request for the other product at a probability that is again initially unknown.

To gain information about the demands, the retailer tallies sales and inventory levels at certain checkpoints during the period—any two consecutive checkpoints help mark out a block (e.g., an hour). The early blocks of a period are rarely affected by stockouts and they therefore aid the learning of pre-substitution base demand distributions. Meanwhile, stockouts and substitutions are more rampant in later blocks which could help us to better assess substitution probabilities.

Block-wise sales and inventory data are fed to a Kaplan–Meier (KM)-type estimator to generate demand estimates in the face of censoring enforced so strictly that even knowledge about whether there has been any demand arrival after the depletion of an inventory is not available. We then estimate substitution probabilities by taking ratios between KM-estimated substitution realizations and estimated substitution requests. Certain carefully designed weights are applied to the ratio-forming formulae to help the estimates’ convergence. By regret in this learning-then-optimization regime, we mean the difference between the optimal per-period profit achievable under true problem parameters and that achievable under the estimated ones.

Using large-deviation tools and information-theoretic techniques, we verify that our learning method attains tight regret bounds. We also work on the effective computation of optimal replenishment levels under known problem parameters, exploiting some innate submodularity. All our results are extendable to the multi-product case, under the implicit assumption that each customer encountering a stockout attempts substitution just once. Such one-time substitutions have been widely adopted in inventory management; see, for example, Netessine and Rudi (2003) and Nagarajan and Rajagopalan (2008).

We also consider the extension to a Markov-chain substitution model, allowing customers to attempt multiple substitutions in a Markov chain fashion. Even though discrete choice models are unsuitable for our learning problem as has already been discussed, we draw inspiration from the Markov-chain choice model of Blanchet et al. (2016) and Şimşek and Topaloglu (2018). While having fewer spelled-out coefficients, the regret bound that we can achieve for multi-product Markov-chain substitution has nevertheless the same order of time growth as that for one-time substitution.

1.3. Contributions and Key Results

This work is inspired by the industry’s need to understand problem parameters ranging from base demand distributions to pair-wise substitution probabilities. Worthy of note are three features:

(i) The learning must work for a given historical data set, which may be devoid of any opportunity to place exploration-intended orders. This distinguishes us from Chen and Chao (2020), who also estimated both demand distributions and substitution probabilities, albeit with the freedom to manipulate orders for active learning purposes. Incidentally, to the best of our knowledge, this work is the only one other than ours that deals simultaneously with the learning of substitution probabilities and operations optimization involving demand learning. While Chen and Chao (2020) took on learning-while-doing challenges, we consider a complementary situation where replenishment decisions reflected in the data are already bygone.

(ii) We do not insist on knowing whether there have been demand arrivals after the occurrences of stockouts; hence, our treatment of censoring is in a sense more thorough. Our KM estimation has made improvements on that of Huh et al. (2011), who still required what we call the strong censoring indicators on post-depletion demand arrivals. Although insignificant for a continuous-demand case, not needing any post-depletion information could help greatly expand our method’s applicability when demands are discrete.

(iii) Our data form is not compatible with estimations based on choice models. Such methods typically require information on both assortments at various purchase occasions and customers’ purchase-or-no-purchase decisions. Our aggregate inventory-sales data do not hold such information. This propels our approach to be fundamentally different from existing estimation methods for choice models. This being said, we do draw inspiration from the state-of-the-art Markov-chain choice model of Blanchet et al. (2016) in our extension to a Markov-chain substitution model.

Also notable is the fact that our data are granular at block-wise levels. Although Jain et al. (2014) estimated demand distributions from block-wise demand, they did not provide the means to extract substitution probabilities from the block-wise estimates. This turns out to be a nontrivial task.

Performance guarantees

Our learning approach leads to base demand distributions and substitution probabilities that converge to their true values. Using large deviation theory, we can derive exponential convergence rates of the demand-related parameters; see Theorem 1. This result then extends to an $O (\sqrt{\log T / T})$ bound on the regret; see Theorem 2 for the one-time substitution model and Theorem 4 for the Markov-chain case. Due to a known $Ω (1 / \sqrt{T})$ lower bound, this result is tight up to a logarithmic factor.

Submodularity and optimization

Beyond the known submodularity of the profit function under proportional substitution (Netessine and Rudi, 2003; Zhang et al., 2021), we achieve the same property for stochastic substitution; see Theorem 3. Maximization of a submodular objective function remains NP-hard (Bian et al., 2017: Proposition 5). Nevertheless, a double greedy heuristic exists for proportional substitution (Zhang et al., 2021), which is polynomial in the number of products. This heuristic is also applicable to our multi-product problem with probabilistic substitutions.

2. Literature Review

This work builds upon existing research in optimal inventory management for multiple mutually substitutable products. A common model in this field assumes that when a stockout occurs for a product, customers may choose another product as a substitute based on known exogenous probabilities or proportions. Numerous studies have focused on characterizing the structural properties of optimal inventory policies when both the demand distribution and customers’ substitution probabilities are known to the retailer.

Nagarajan and Rajagopalan (2008) investigated optimal inventory policies for substitutable products and introduced a partially decoupled base-stock policy. Schlapp and Fleischmann (2018) analyzed a multi-product model with capacity constraints, focusing on the effects of capacity limits and substitution preferences. Additionally, Zhang et al. (2021) formulated a multi-product inventory problem as an integer program and derived necessary optimality conditions.

Instead of the management aspect, our research concentrates on learning the demand distributions and substitution probabilities directly from the retailer’s historical sales data. We develop an effective offline learning algorithm and demonstrate its theoretical convergence properties. By enabling data-driven inventory decisions, this approach provides a practical framework for retailers to determine optimal inventory levels based on past sales records.

Many works utilize choice models to capture substitution effects. These include MNL (Vulcano et al., 2012), nested logit (Kök and Xu, 2011), and the Markov-chain choice model (Blanchet et al., 2016). Estimations of such models’ parameters usually require knowledge on real-time assortment evolutions and customers’ no-purchase decisions. Such information is not derivable from the aggregate inventory-sales data provided to us. We are thus compelled to come up with fundamentally different approaches than those based on choice modeling.

When retailers only have transaction data instead of complete information about the demand and substitution structures while the data themselves are likely incomplete, many researchers have used the expectation–maximization (EM) method to recover the true demand-related information; see Anupindi et al. (1998). Apart from EM, competing approaches exist in the forms of likelihood estimation (Newman et al., 2014) and Markov chain Monte Carlo methods (Musalem et al., 2010). Unlike these approaches, we make adaptations to the renowned KM estimator. These enable us to estimate demand distributions without any prior assumptions, widening the applicability of our approach.

The KM estimator and its product-limit form were proposed by Kaplan and Meier (1958). They are widely used in survival analyses and medical trials. Foldes and Rejto (1981) showed the convergence of the KM estimator under random independent censoring. We use a similar method to estimate demand distributions. In our current situation, the conventional assumptions of independence are no longer realistic. Our theoretical derivations benefit from Huh et al. (2011), who allowed censoring variables to depend on past observations as well. However, we have no use for the strong censoring indicators that they needed, which essentially tell whether more demand requests arrive after inventories have been emptied.

We also contribute to data-driven inventory control. For a single-product case, Levi et al. (2007) used uncensored empirical distributions to approximate true distributions and gave performance bounds. Huh and Rusmevichientong (2009) dealt with lost sales and demand censoring using a gradient-descent algorithm. Besbes and Muharremoglu (2013) designed an exploration-exploitation algorithm with a provable regret bound. For inventory management with lead time, Zhang et al. (2020) leveraged the convexity of the profit function and developed an online learning algorithm, while Lyu et al. (2024) integrated the upper confidence bound algorithm with the KM estimator. In multi-product settings, Shi et al. (2016) considered warehouse capacity constraints and applied online convex optimization techniques to the learning of inventory levels from data. To this growing body of literature, we add our approach to stockout-based substitutions involving demand learning from aggregate inventory sales data.

In their learning-while-doing study of a multi-product inventory control problem, Chen and Chao (2020) made the same stochastic-substitution assumption as ours. Having its own challenges notwithstanding, this earlier work possesses a unique advantage over ours: the ability to manipulate replenishment quantities for the purpose of data collection. Indeed, Chen and Chao (2020) took this advantage to its fullest extent by using their exploration-then-exploitation stages that grow in length in a doubly exponential fashion. Within each exploration phase, they cleverly inflated replenishments. Moreover, they cycled through the products, deliberately under-supplying one product at a time to extract information on substitution activities. Their analysis was also assisted by the assumption that cost- and demand-related parameters would together allow an $ϵ$ -wide error tolerance within which demand estimations can guarantee exact replenishment decisions.

In contrast, we take a vastly different approach to our past and hence no-longer-manipulable data in order to extract useful information for profit optimization purposes. The fact that the data come in a block-wise fashion has helped us tremendously. There are ample opportunities for high inventory levels to dampen the censoring effects and for low inventory levels to reveal the substitution effects. For the former, we enlist the help of KM estimation instead of simple averages of sales data. Our assumptions on demand-related parameters, though considerably strong, are not interwoven with those regarding cost-related parameters. Consequently, the derivation of our performance bounds relies on intricate arguments afforded by close scrutiny of the profit function itself. Aside from demand learning, we derive the profit function’s submodularity and make efforts to efficiently solve the optimization problem that may have multiple local optima.

3. Problem Formulation

For the ease of presentation, let us start with two products. Our retailer sells two substitutable products with given sales prices and unit costs. Any customer facing stockout independently decides whether or not to seek substitution from the other product. Any still-unmet demand is lost. The retailer begins without prior knowledge about demand distributions or substitution probabilities. However, historical stock levels and sales data for the two products are available.

Let us use the term period to denote a replenishment cycle. It is further decomposed into business selling blocks, much as a day can be divided into hours. Each product is allowed one single inbound delivery at the beginning of every period. The retailer can observe the stock level at the beginning of each block and make deductions about block-wise sales levels. He can neither observe unmet demand nor tell whether a sales event emanates from an originally intended purchase or a substitution attempt.

The retailer sells a set of $I = 2$ products $i = 1, 2$ . Let there be $K$ selling blocks in a given period $t$ . All random variables without further specifications are assumed to be independent of one another. Let $D_{i, k}^{t}$ be the random base demand for product $i$ at block $k$ of period $t$ . When a capital letter is used for a random variable, we shall use its lower case to stand for the random variable’s realization. The base demand stems from the customers’ original purchase intents prior to observing if this product is available from stock. We assume that a product’s base demand levels in the same selling block are identically distributed among periods: for any fixed $i$ and $k$ , the $D_{i, k}^{t}$ levels across different $t$ ’s share some common cumulative distribution function (CDF) $F_{i, k}$ and probability mass function $f_{i, k}$ on some finite support $\{0, 1, \dots, {\bar{d}}_{i, k}\}$ , where ${\bar{d}}_{i, k} \geq 1$ . Then, we can treat ${\bar{d}}_{i} \equiv \sum_{k = 1}^{K} {\bar{d}}_{i, k}$ as an upper bound for product $i$ ’s period-wise demand. This finite upper bound is mainly for analytical convenience and will be used in our consistency analysis later.

Although the exact demand upper bounds may not be precisely known, firms usually possess reasonable prior knowledge of their likely magnitudes. For instance, brick-and-mortar retailers often estimate customer populations and preferences through field surveys or market research that provide practical benchmarks for inventory decisions. Hence, firms likely maintain working estimates of plausible demand upper bounds. Alternatively, firms may glean information about the demand upper bounds from historical sales data.

We also let fixed substitution probabilities $p_{j i} \in [0, 1]$ guide demand transfers: when a base demand arrival for product $j$ is not satisfied, it is with this probability that the request is transferred to product $i$ . Each $p_{j i}$ would remain constant across blocks and periods. This tacit requirement may be viewed as representing a stable, long-run substitution tendency between product pairs.

We acknowledge that in more realistic settings, substitution behavior may vary across customer segments, product attributes, or temporal factors such as promotions and seasonality. Allowing substitution probabilities to evolve with these contextual variables would better capture such behavioral heterogeneity. However, we adopt fixed substitution probabilities here for theoretical tractability and to isolate the effects of censored observations on learning and optimization performance. Extending the framework to accommodate feature-based or segment-aware substitution models represents an important direction for future research, which we further discuss in the conclusion.

Unlike in conventional models, neither any $F_{i, 1}$ , $F_{i, 2}$ , $\dots$ , $F_{i, K}$ nor any $p_{j i}$ is known to the retailer. For each period $t$ , selling block $k$ , and product pair $(j, i)$ , we introduce an infinite sequence of random variables $B_{j i, k}^{t} (1)$ , $B_{j i, k}^{t} (2)$ , $\dots$ that are independent and identically distributed in a Bernoulli fashion with parameter $p_{j i}$ . When the $D_{i, k}^{t}$ ’s are realized as $d_{i, k}^{t}$ ’s, the $B_{j i, k}^{t} (m)$ ’s as $b_{j i, k}^{t} (m)$ ’s, and the starting inventory levels $O_{i, k}^{t}$ as $o_{i, k}^{t}$ ’s, the stockout number for the product- $j$ base demand would be $(d_{j, k}^{t} - o_{j, k}^{t})^{+}$ ; consequently, the substitution-induced demand from product $j$ to $i$ would be $W (d_{j, k}^{t}, o_{j, k}^{t} | b_{j i, k}^{t} (1), b_{j i, k}^{t} (2), \dots)$ , where

W (d, o | b (1), b (2), \dots) \equiv \sum_{m = 1}^{(d - o)^{+}} b (m) .

(1)

In other words, for each of the $(d_{j, k}^{t} - o_{j, k}^{t})^{+}$ occasions where an arriving customer is not satisfied by product $j$ ’s inventory, there is a $p_{j i}$ chance that the demand is transferred to product $i$ ; moreover, the transfers are independent among different incidences. Due to assumptions on the $B_{j i, k}^{t} (m)$ ’s and (1),

\begin{aligned} E [W (D_{j, k}^{t}, O_{j, k}^{t} | B_{j i, k}^{t} (1), B_{j i, k}^{t} (2), \dots)] \\ = p_{j i} \cdot E [(D_{j, k}^{t} - O_{j, k}^{t})^{+}] . \end{aligned}

(2)
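
The step behind (2) can be spelled out by conditioning on the demand and inventory levels. The lines below are a sketch that assumes, in line with the independence convention stated earlier, that the $B_{j i, k}^{t} (m)$ ’s are independent of $(D_{j, k}^{t}, O_{j, k}^{t})$ ; the shorthand $W$ for $W (D_{j, k}^{t}, O_{j, k}^{t} | B_{j i, k}^{t} (1), B_{j i, k}^{t} (2), \dots)$ is introduced only for this illustration.

```latex
\begin{aligned}
E \big[ W \,\big|\, D_{j, k}^{t} = d, \; O_{j, k}^{t} = o \big]
  &= \sum_{m = 1}^{(d - o)^{+}} E \big[ B_{j i, k}^{t} (m) \big]
   = p_{j i} \cdot (d - o)^{+} , \\
E [W]
  &= E \big[ p_{j i} \cdot (D_{j, k}^{t} - O_{j, k}^{t})^{+} \big]
   = p_{j i} \cdot E \big[ (D_{j, k}^{t} - O_{j, k}^{t})^{+} \big] .
\end{aligned}
```

The first line uses (1) together with the Bernoulli mean $p_{j i}$ ; the second takes expectations over $(D_{j, k}^{t}, O_{j, k}^{t})$ .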

Also following (1), the total composite demand for product $i$ would be the sum of the base and substitution-induced demands:

d_{i, k}^{t} + W (d_{j, k}^{t}, o_{j, k}^{t} | b_{j i, k}^{t} (1), b_{j i, k}^{t} (2), \dots) .

Our retailer lacks the ability to see demand directly. Instead, he can observe each resulting sales level, which is the minimum of the order-up-to and composite demand levels:

\begin{aligned} z_{i, k}^{t} \equiv \min \{o_{i, k}^{t}, \; d_{i, k}^{t} + W (d_{j, k}^{t}, o_{j, k}^{t} | b_{j i, k}^{t} (1), b_{j i, k}^{t} (2), \dots)\} . \end{aligned}

(3)

While it is the retailer who decides on the order-up-to levels $o_{i, 1}^{t}$ , the ensuing levels $o_{i, 2}^{t}, o_{i, 3}^{t}, \dots, o_{i, K}^{t}$ would just follow the dynamics that for $k = 1, 2, \dots, K - 1$ ,

o_{i, k + 1}^{t} = o_{i, k}^{t} - z_{i, k}^{t} .

(4)

Given (4), the $o_{i, k}^{t}$ information is redundant for $k = 2, 3, \dots, K$ . Nevertheless, we include them in our data set for convenience. Our description makes it appear as if substitutions were realized at the end of each selling block. As we explain in Appendix B of the E-Companion, the timing is largely immaterial within the period when there is exactly one inbound delivery per period. In addition, given that replenishment occurs only at the beginning of each period but not at each selling block, we always have $o_{i, 1}^{t} \geq o_{i, 2}^{t} \geq \dots \geq o_{i, K}^{t} \geq 0$ .
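
To make the block-wise dynamics concrete, the following is a minimal simulation sketch of one period under (1), (3), and (4). The function name `simulate_period`, the zero-based product indices, and the way demand samplers are passed in are assumptions made for illustration only; they are not part of the model description.

```python
import random


def simulate_period(o_start, demand_samplers, p, K, rng):
    """Simulate one period for two products (indexed 0 and 1).

    o_start: order-up-to levels at the start of block 1.
    demand_samplers: demand_samplers[i][k](rng) draws the base demand d_{i,k}.
    p: p[j][i] is the chance that an unmet product-j request moves to i.
    Returns the per-block starting inventories and sales.
    """
    o = list(o_start)
    inventories, sales = [], []
    for k in range(K):
        d = [demand_samplers[i][k](rng) for i in range(2)]
        # Equation (1): each of the (d_j - o_j)^+ unmet requests transfers
        # to the other product independently with probability p[j][i].
        w = [0, 0]
        for j in range(2):
            i = 1 - j
            shortfall = max(d[j] - o[j], 0)
            w[i] = sum(rng.random() < p[j][i] for _ in range(shortfall))
        # Equation (3): sales are capped by the block's starting inventory.
        z = [min(o[i], d[i] + w[i]) for i in range(2)]
        inventories.append(list(o))
        sales.append(z)
        # Equation (4): the next block starts with what is left.
        o = [o[i] - z[i] for i in range(2)]
    return inventories, sales
```

Only `inventories` and `sales` would reach the retailer; the drawn demands `d` and transfers `w` stay hidden, which is the censoring that the learning procedure has to work around.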

For some $T$ periods, we have the observed data set

\begin{aligned} S_{T} \equiv \{o_{i, k}^{t}, z_{i, k}^{t} : t = 1, 2, \dots, T, \; k = 1, 2, \dots, K, \; i = 1, 2\}, \end{aligned}

(5)

where the sales levels $z_{i, k}^{t}$ are related to the starting inventory levels $o_{i, k}^{t}$ in the fashion of (4). Recall the two purposes of this article. On the one hand, we seek to use the $T$ periods’ worth of data to reconstruct the underlying demand CDF’s $F_{i, k}$ (or the equivalent distributions $f_{i, k}$ ) and the underlying substitution probabilities $p_{j i}$ as precisely as possible. On the other hand, given the underlying parameters of the problem, we seek to manage the period-wise order-up-to activities as efficiently as possible.

Formally, we name the $d_{i, k}^{t}$ -portion of the demand for product $i$ the base demand, the $W (d_{j, k}^{t}, o_{j, k}^{t} ∣ b_{j i, k}^{t} (1), b_{j i, k}^{t} (2), \dots)$ -portion the substitution-induced demand, and the sum of the two the composite demand. Even though distinguishing between base and composite demands may seem difficult, we manage to use all the information in the data set for demand learning. When $o_{j, k}^{t}$ is at or above product $j$ ’s demand upper bound ${\bar{d}}_{j, k}$ in block $k$ , no product- $j$ stockout can occur, and we can attribute the sales level of product $i = 3 - j$ , whether censored or not, solely to its own base demand; this aids in estimating the CDF $F_{i, k}$ ; refer to (7) and (13) later. The opposite case where $o_{j, k}^{t} = 0, \dots, {\bar{d}}_{j, k} - 1$ allows us to estimate the composite demand; refer to (7) and (14) later. A combination would help the learning of substitution probabilities $p_{j i}$ ; refer to (8) to (12) later.
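
The case distinction above can be pictured as a partition of the block- $k$ observations of product $i$ according to the other product's starting inventory. The sketch below is only an illustration: the record layout (`period["o"][product][block]`, `period["z"][product][block]`), the function name `split_observations`, and the zero-based indices are hypothetical, and the exact constructions referred to as (13) and (14) may differ in detail.

```python
from collections import defaultdict


def split_observations(data, i, k, d_bar_j):
    """Group block-k observations of product i by the other product's stock.

    data: list of per-period dicts holding data[t]["o"][prod][block] and
          data[t]["z"][prod][block] (a hypothetical layout for S_T).
    d_bar_j: assumed upper bound on product j's block-k base demand.
    Returns (base_pairs, composite_pairs); composite_pairs[o] collects the
    pairs seen when product j started block k with o units on hand.
    """
    j = 1 - i
    base_pairs = []
    composite_pairs = defaultdict(list)
    for period in data:
        o_i, z_i = period["o"][i][k], period["z"][i][k]
        o_j = period["o"][j][k]
        # (sales, indicator that at least one unit of product i is left)
        pair = (z_i, 1 if o_i - z_i >= 1 else 0)
        if o_j >= d_bar_j:
            # Product j cannot stock out, so z_i reflects base demand only.
            base_pairs.append(pair)
        else:
            # Product j may stock out, so z_i reflects composite demand.
            composite_pairs[o_j].append(pair)
    return base_pairs, dict(composite_pairs)
```

Each group of pairs can then be treated as a censored sample of the corresponding base or composite demand.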

Owing to lost sales and the resulting censoring, we can only access the sales data $z_{i, k}^{t}$ of (3), which corresponds to weak censoring. The latter refers to the non-strict inequality

d_{i, k}^{t} + W (d_{j, k}^{t}, o_{j, k}^{t} ∣ b_{j i, k}^{t} (1), b_{j i, k}^{t} (2), \dots) \geq z_{i, k}^{t} .

By contrast, the strict inequality used by Huh et al. (2011) lets out a strong signal for censoring.

4. Our KM-Based Learning Procedure

We first introduce the original KM estimator and then present our KM-based estimation.

4.1. Fundamentals of KM Estimation

The KM estimator was introduced six decades ago by Kaplan and Meier (1958) for constructing empirical CDFs from censored data sets.

Let $\{D_{t}\}_{t = 1}^{T}$ be a sequence of independent and identically distributed random variables. Under a sequence $\{O_{t}\}_{t = 1}^{T}$ of censoring variables, let

\begin{aligned} Y_{t} \equiv \min \{D_{t}, O_{t}\}, \quad δ_{t} \equiv 1 (D_{t} \leq O_{t} - 1) = 1 (Y_{t} \leq O_{t} - 1), \\ \forall t = 1, \dots, T . \end{aligned}

(6)

Think of each $D_{t}$ as a demand quantity, each $O_{t}$ as an inventory level, each $Y_{t}$ as a sales quantity, and each $δ_{t}$ as an indicator of whether or not inventory strictly exceeds demand. An alternative expression for $δ_{t}$ is $1 (O_{t} - Y_{t} \geq 1)$ , which expresses that at least one item is left in inventory after demand arrival. In any event, this weak signal $δ_{t}$ is more realistic than its stronger counterpart ${\tilde{δ}}_{t} \equiv 1 (D_{t} \leq O_{t})$ , the telling of whose being zero or one requires knowing whether there has been any demand arrival after the depletion of the inventory. In our censored environment where the $O_{t}$ ’s and $Y_{t}$ ’s are available but the $D_{t}$ ’s are not, the weak signals can be assembled from the observables $O_{t}$ and $Y_{t}$ while the strong ones cannot.
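
A small sketch may help contrast the two signals. The helper names `weak_signal` and `strong_signal` and the two demand values are assumptions made for illustration; the point is that two days with identical observables can carry different strong signals.

```python
def weak_signal(o_t, y_t):
    """delta_t = 1(Y_t <= O_t - 1): at least one unit is left after sales."""
    return 1 if o_t - y_t >= 1 else 0


def strong_signal(d_t, o_t):
    """tilde-delta_t = 1(D_t <= O_t); needs the unobserved demand d_t."""
    return 1 if d_t <= o_t else 0


# Two hypothetical days with inventory 3: demand 3 versus demand 5.
inventory = 3
records = []
for demand in (3, 5):
    sales = min(demand, inventory)  # Y_t of (6)
    records.append(
        (sales, weak_signal(inventory, sales), strong_signal(demand, inventory))
    )
```

On both days the retailer sees inventory 3 and sales 3, so the weak signal coincides; the strong signal differs because it depends on whether further requests arrived after depletion.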

We may sort the observations $\{(Y_{1}, δ_{1}), \dots, (Y_{T}, δ_{T})\}$ in ascending order of the $Y_{t}$ ’s and, among observations sharing the same $Y_{t}$ level, in ascending order of the $δ_{t}$ ’s as well. This way, we obtain an information poset $ι \equiv \{(Y_{(1)}, δ_{(1)}), \dots, (Y_{(T)}, δ_{(T)})\}$ , in which $Y_{(t)} \leq Y_{(t + 1)}$ and any $Y_{(t)} = Y_{(t + 1)}$ is associated with $δ_{(t)} \leq δ_{(t + 1)}$ . Not only are our indicators weaker versions of the ${\tilde{δ}}_{t} \equiv 1 (D_{t} \leq O_{t})$ of Huh et al. (2011), but they are also ordered in the reverse manner relative to that earlier work. Our version of the KM estimator for the $D_{t}$ ’s refers to the following complementary CDF (CCDF):

\begin{aligned} {\hat{F}}^{∁} (d, ι) \equiv \prod_{t : Y_{(t)} \leq d} \left( \frac{T - t}{T + 1 - t} \right)^{δ_{(t)}}, \quad \text{for any } d = 0, 1, \dots, Y_{(T)} . \end{aligned}

(7)

To ensure that $1 - {\hat{F}}^{∁} (d, ι)$ is a proper CDF, let ${\hat{F}}^{∁} (d, ι) = 0$ for any $d \geq Y_{(T)} + 1$ . The thus-obtained CCDF allows us to compute (i) the empirical CDF $\hat{P} (D \leq d) = 1 - {\hat{F}}^{∁} (d, ι)$ and (ii) the empirical probability mass function (PMF) $\hat{P} (D = d) = {\hat{F}}^{∁} (d - 1, ι) - {\hat{F}}^{∁} (d, ι)$ , where ${\hat{F}}^{∁} (- 1, ι)$ is understood as one.

When every $δ_{(t)} = 1$ and hence every $Y_{(t)}$ is a true reflection of the underlying random variable $D$ , (7) would just give us an ordinary empirical distribution where each observation contributes a $1 / T$ weight. Presences of $δ_{(t)} = 0$ , however, would make it likelier that the true $D$ be above the observed $Y_{(t)}$ ; indeed, (7) helps them to boost relevant CCDF terms. Note $O_{t} = 0$ would cause $(Y_{t}, δ_{t}) = (0, 0)$ via (6), rendering the pair uninformative. Thus, the “effective” data amount in $ι$ might be regarded as $T^{'} \equiv \sum_{t = 1}^{T} 1 (O_{t} \geq 1)$ . Indeed, our technical result Lemma B.3 shows that both $T$ and $T^{'}$ could loom large in convergences of our estimators. Due to (7), however, retaining uninformative data points would not affect the actual estimator.

For an example, suppose $ι = {(0, 0), (1, 1), (2, 0), (2, 1), (3, 1)}$ and hence $T = 5$ and $Y_{(T)} = 3$ . Then, we may use (7) to obtain ${\hat{F}}^{∁} (0, ι) = [(5 - 1) / (5 + 1 - 1)]^{0} = 1$ , ${\hat{F}}^{∁} (1, ι) = 1 \times [(5 - 2) / (5 + 1 - 2)]^{1} = 3 / 4$ , ${\hat{F}}^{∁} (2, ι) = (3 / 4) \times [(5 - 3) / (5 + 1 - 3)]^{0} \times [(5 - 4) / (5 + 1 - 4)]^{1} = 3 / 8$ , ${\hat{F}}^{∁} (3, ι) = (3 / 8) \times [(5 - 5) / (5 + 1 - 5)]^{1} = 0$ , and ${\hat{F}}^{∁} (4, ι) = {\hat{F}}^{∁} (5, ι) = \dots = 0$ . Thus, our estimate for the random $D$ is such that $\hat{P} (D = 0) = 1 - 1 = 0$ , $\hat{P} (D = 1) = 1 - 3 / 4 = 1 / 4$ , $\hat{P} (D = 2) = 3 / 4 - 3 / 8 = 3 / 8$ , $\hat{P} (D = 3) = 3 / 8 - 0 = 3 / 8$ , and $\hat{P} (D = 4) = \hat{P} (D = 5) = \dots = 0$ . Among the $T = 5$ pairs of $(Y_{t}, δ_{t})$ , the first $(0, 0)$ behaves as if it did not exist; effectively, $ι$ may be understood as only containing $T^{'} = 4$ useful pairs.
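The worked example above can be replayed in a few lines. The following is only a minimal sketch of (7), assuming a non-empty list of integer-valued pairs; the function names `km_ccdf` and `km_pmf` are ours and purely illustrative, and exact fractions are used so that the values can be compared with the text.

```python
from fractions import Fraction


def km_ccdf(pairs):
    """Evaluate (7) at d = 0, ..., Y_(T) for a non-empty list of (Y_t, delta_t) pairs."""
    # Tuples sort by Y first and by delta second, so censored pairs
    # (delta = 0) precede uncensored ones at the same Y level.
    ordered = sorted(pairs)
    T = len(ordered)
    ccdf = []
    running = Fraction(1)
    pos = 0  # how many sorted pairs have been absorbed so far
    for d in range(ordered[-1][0] + 1):
        while pos < T and ordered[pos][0] <= d:
            t = pos + 1  # 1-based rank used in (7)
            if ordered[pos][1] == 1:
                running *= Fraction(T - t, T + 1 - t)
            pos += 1
        ccdf.append(running)
    return ccdf


def km_pmf(ccdf):
    """PMF on 0, ..., Y_(T) + 1, using F^c(-1) = 1 and F^c(Y_(T) + 1) = 0."""
    padded = [Fraction(1)] + list(ccdf) + [Fraction(0)]
    return [padded[d] - padded[d + 1] for d in range(len(padded) - 1)]


iota = [(0, 0), (1, 1), (2, 0), (2, 1), (3, 1)]
ccdf = km_ccdf(iota)
pmf = km_pmf(ccdf)
```

On the example poset, `ccdf` holds the values $1, 3/4, 3/8, 0$ and `pmf` starts with $0, 1/4, 3/8, 3/8$, in line with the text. Dropping the uninformative pair $(0, 0)$ leaves `ccdf` unchanged.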

When $ι$ is derived from specific data in $S_{T}$ for a given product $i$ and block $k$ , we utilize (7) to estimate both the base and composite demand distributions. In the following subsection, we detail how these estimated demand distributions are subsequently employed for the estimation of substitution probabilities.

4.2. Estimation of the Substitution Probabilities

Let us follow the convention of Section 4.1 to express expectations with respect to the KM-estimated distributions using $\hat{E}$ . Owing to (1) and (2), it is natural to estimate the substitution probability $p_{j i}$ using the following ratios involving estimated expectations:

\frac{\hat{E} [W (D_{j, k}^{t}, o ∣ B_{j i, k}^{t} (1), \dots)]}{\hat{E} [(D_{j, k}^{t} - o)^{+}]} = \frac{\hat{E} [D_{i (j), k}^{t} (o)] - \hat{E} [D_{i, k}^{t}]}{\hat{E} [(D_{j, k}^{t} - o)^{+}]} .

(8)

In (8), we have used $D_{i (j), k}^{t} (o) \equiv D_{i, k}^{t} + W (D_{j, k}^{t}, o ∣ B_{j i, k}^{t} (1), B_{j i, k}^{t} (2), \dots)$ as the composite demand. Here, $o$ is the realized product-$j$ inventory level in period $t$ and block $k$.

According to (8), the estimation accuracy for $p_{j i}$ relies on the two terms in the numerator and the one term in the denominator. Corresponding to the first term in the numerator, let

ψ_{i (j), k} (o) \equiv \frac{1}{T} \sum_{t = 1}^{T} 1 (o_{i, k}^{t} \geq 1 and o_{j, k}^{t} = o)

(9)

be the observed fraction of $o$-occurrences among all $T$ data points. Note in (8), $o$ can be any integer between $0$ and ${\bar{d}}_{j, k} - 1$. We use $o_{i, k}^{t} \geq 1$ to ensure at least a partial observability of demand arrivals. In the current two-product setting, $j$ in many subscripts may seem redundant in the presence of $i$ because it can only be $3 - i$. However, we shall retain them to be consistent with genuinely multi-product cases.

To emulate the two terms regarding base demands in (8), also let

ς_{i, k} \equiv \frac{1}{T} \sum_{t = 1}^{T} 1 (o_{i, k}^{t} \geq 1 and o_{j, k}^{t} \geq {\bar{d}}_{j, k}),

(10)

which is the observed fraction of non-stockout occurrences among the $T$ data points where demand coming to product $i$ can be partially observed and demand coming to $j$ has not depleted its inventory. In those periods, whatever demand observations made for $i$ are ascribable to its base demand. We have not involved $j$ in our notation for (10) because in the general case it would expand to all products other than $i$.

Following (2), we shall weigh (8)’s terms using the $(ψ_{i (j), k} (o), ς_{i, k}, ς_{j, k})$ ’s. To assemble an estimate of the substitution probability for a whole period from those for various blocks, we use the weights

ω_{j i, k} \equiv min {\sqrt{ς_{j, k}}, \sqrt{ς_{i, k}}} .

(11)

Within each block $k$ at each $o$-level, it is natural to use the weight $ψ_{i (j), k} (o)$. In the end, we let

\begin{aligned} {\hat{p}}_{j i} & \equiv (\frac{\sum_{k = 1}^{K} ω_{j i, k} \sum_{o = 0}^{{\bar{d}}_{j, k} - 1} ψ_{i (j), k} (o) (\hat{E} [D_{i (j), k}^{t} (o)] - \hat{E} [D_{i, k}^{t}])}{\sum_{k = 1}^{K} ω_{j i, k} \sum_{o = 0}^{{\bar{d}}_{j, k} - 1} ψ_{i (j), k} (o) \hat{E} [(D_{j, k}^{t} - o)^{+}]}) \\ \lor 0 \land 1. \end{aligned}

(12)

On the rare occasion when the denominator is zero, we treat the ratio as zero. Our overarching idea in designing (12) is to give more weight to terms that can be more accurately estimated. This would give us some advantages in the use of the Cauchy–Schwarz inequality when bounding the errors caused by (12). Specifically, as indicated by (9), a greater $ψ_{i (j), k} (o)$ would mean higher frequencies of the substitution demand from product $j$ to product $i$. Similarly, based on definitions (10) and (11), a greater $ω_{j i, k}$ would mean more frequent revelations of (i) block-$k$ base demand for product $j$ and (ii) block-$k$ base demand for product $i$. As will be reflected in the proof of Theorem 1, the various improved frequencies would entail more accuracy in estimations of terms that appear in (8). This is our rationale for the weighting scheme used in (12).
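The weighting scheme of (9) to (12) can be sketched as follows. This is an illustrative outline only: the function names and the dictionary layout of `blocks` are our own assumptions, the KM-based expectations are taken as given inputs, and both sums over $o$ are assumed to run over the same levels $0, \dots, {\bar{d}}_{j, k} - 1$.

```python
import math


def psi(o_i, o_j, o):
    """(9): fraction of periods with o_i >= 1 and o_j == o (one block)."""
    return sum(1 for a, b in zip(o_i, o_j) if a >= 1 and b == o) / len(o_i)


def varsigma(o_i, o_j, dbar_j):
    """(10): fraction of periods with o_i >= 1 and o_j >= dbar_j (one block)."""
    return sum(1 for a, b in zip(o_i, o_j) if a >= 1 and b >= dbar_j) / len(o_i)


def omega(varsigma_j, varsigma_i):
    """(11): block weight built from the two non-stockout fractions."""
    return min(math.sqrt(varsigma_j), math.sqrt(varsigma_i))


def p_hat(blocks):
    """(12): weighted ratio clipped to [0, 1].

    Each entry of `blocks` is a dict (hypothetical layout) with
      'omega'    : block weight from (11),
      'psi'      : list of psi(o) for o = 0, ..., dbar_j - 1,
      'e_comp'   : list of estimated E[D_{i(j),k}(o)] for the same o,
      'e_base_i' : estimated E[D_{i,k}],
      'e_lost_j' : list of estimated E[(D_{j,k} - o)^+] for the same o.
    """
    num = 0.0
    den = 0.0
    for blk in blocks:
        for o, weight in enumerate(blk["psi"]):
            num += blk["omega"] * weight * (blk["e_comp"][o] - blk["e_base_i"])
            den += blk["omega"] * weight * blk["e_lost_j"][o]
    if den == 0:
        return 0.0  # convention stated after (12)
    return min(1.0, max(0.0, num / den))
```

Levels $o$ that were rarely observed enter with a small `psi` value and therefore contribute little to either sum, which is the intent described above.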

4.3. Our Learning Procedure in Detail

In detail, our procedure works as follows:

INPUT. A data set $S_{T}$ of (5) consisting of block-wise sales observations ${z_{1, 1}^{t}, z_{2, 1}^{t}, \dots, z_{1, K}^{t}, z_{2, K}^{t}}_{t = 1}^{T}$ , and the corresponding starting block-wise stock levels ${o_{1, 1}^{t}, o_{2, 1}^{t}, \dots, o_{1, K}^{t}, o_{2, K}^{t}}_{t = 1}^{T}$ , which also obey the consistency requirements of (4).

OUTPUT. The estimated block-wise base demand distributions ${\hat{f}}_{i, k}$ for each product $i$ and block $k$ and the estimated substitution probabilities ${\hat{p}}_{j i}$ from product $j$ to product $i$ .

Step 1. Form information posets for KM estimation.

We use $y_{i, k}^{t}$ for the observation made for base demand after censoring and $δ_{i, k}^{t}$ for the corresponding censoring indicator in the selling block $k$ :

\begin{aligned} y_{i, k}^{t} & = z_{i, k}^{t} \times 1 (o_{i, k}^{t} \geq 1 and o_{j, k}^{t} \geq {\bar{d}}_{j, k}), \\ δ_{i, k}^{t} & = 1 (z_{i, k}^{t} \leq o_{i, k}^{t} - 1 and o_{j, k}^{t} \geq {\bar{d}}_{j, k}) . \end{aligned}

(13)

Recall that ${\bar{d}}_{j, k}$ is the known upper bound for product $j$’s base demand at selling block $k$. By (13), $y_{i, k}^{t}$ is product $i$’s sales quantity $z_{i, k}^{t}$ when its own inventory level $o_{i, k}^{t}$ is not zero and the other product’s inventory level $o_{j, k}^{t}$ is more than adequate to handle its own demand. Here, $z_{i, k}^{t}$ at least records partial demand and is not distorted by any substitution from product $j$. Besides, $δ_{i, k}^{t}$ indicates whether or not $y_{i, k}^{t} = z_{i, k}^{t}$ is a true base demand for product $i$. Note (13)’s definition of $(y_{i, k}^{t}, δ_{i, k}^{t})$ conforms with (6) in that $y_{i, k}^{t} = \min \{d_{i, k}^{t}, o_{i, k}^{t} \times 1 (o_{i, k}^{t} \geq 1 \text{ and } o_{j, k}^{t} \geq {\bar{d}}_{j, k})\}$ and $δ_{i, k}^{t} = 1 (d_{i, k}^{t} \leq o_{i, k}^{t} \times 1 (o_{i, k}^{t} \geq 1 \text{ and } o_{j, k}^{t} \geq {\bar{d}}_{j, k}) - 1)$; see Appendix A of the E-Companion.

Moreover, we use $y_{i (j), k}^{t} (o)$ for $o = 0, 1, \dots, {\bar{d}}_{j, k} - 1$ to denote the observation for the composite demand $d_{i (j), k}^{t} (o)$ with the starting inventory level $o_{j, k}^{t} = o$ of the other product $j$ after censoring and use $δ_{i (j), k}^{t} (o)$ to denote the corresponding censoring indicator in the selling block $k$ :

\begin{aligned} y_{i (j), k}^{t} (o) & = z_{i, k}^{t} \times 1 (o_{i, k}^{t} \geq 1 and o_{j, k}^{t} = o), \\ δ_{i (j), k}^{t} (o) & = 1 (z_{i, k}^{t} \leq o_{i, k}^{t} - 1 and o_{j, k}^{t} = o) . \end{aligned}

(14)

By (14), $y_{i (j), k}^{t} (o)$ is product $i$’s sales quantity $z_{i, k}^{t}$ when its own inventory level $o_{i, k}^{t}$ is not zero and the other product’s inventory level $o_{j, k}^{t}$ is at a specific level $o$. Here, $z_{i, k}^{t}$ records the composite demand consisting of the base demand of product $i$ and the substitution demand from product $j$ at a specific product-$j$ inventory level $o$. Besides, $δ_{i (j), k}^{t} (o)$ indicates whether or not $y_{i (j), k}^{t} (o) = z_{i, k}^{t}$ is a true composite demand for product $i$. Note (14)’s definition of $(y_{i (j), k}^{t} (o), δ_{i (j), k}^{t} (o))$ conforms with (6) in that $y_{i (j), k}^{t} (o) = \min \{d_{i (j), k}^{t} (o), o_{i, k}^{t} \times 1 (o_{i, k}^{t} \geq 1 \text{ and } o_{j, k}^{t} = o)\}$ and $δ_{i (j), k}^{t} (o) = 1 (d_{i (j), k}^{t} (o) \leq o_{i, k}^{t} \times 1 (o_{i, k}^{t} \geq 1 \text{ and } o_{j, k}^{t} = o) - 1)$. Details can be found in Appendix A of the E-Companion.

Now, we have posets for each product $i$ that contain the observations for the base demand, as well as for the composite demand in each block $k$ :

\begin{aligned} ι_{i, k} = {(y_{i, k}^{t}, δ_{i, k}^{t})}_{t = 1}^{T}, ι_{i (j), k} (o) = {(y_{i (j), k}^{t} (o), δ_{i (j), k}^{t} (o))}_{t = 1}^{T}, \\ o = 0, \dots, {\bar{d}}_{j, k} - 1. \end{aligned}

(15)

Through (13) to (15), the original $S_{T}$ of (5) can help us generate the various posets $ι_{i, k}$ and $ι_{i (j), k} (o)$. By (7), these would lead to the various ${\hat{F}}^{∁} (d, ι_{i, k})$’s and ${\hat{F}}^{∁} (d, ι_{i (j), k} (o))$’s that will soon prove useful.
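Step 1 can be illustrated with a short sketch of (13) to (15) for one product $i$ and one block $k$. The function names and the flat-list data layout below are assumptions made for illustration; the inputs are the block's sales `z_i`, the two starting stock levels `o_i` and `o_j`, and the known bound `dbar_j`.

```python
def base_pair(z_i, o_i, o_j, dbar_j):
    """(13): observation and censoring indicator for the base demand."""
    other_ample = o_j >= dbar_j  # product j cannot stock out in this block
    y = z_i if (o_i >= 1 and other_ample) else 0
    delta = 1 if (z_i <= o_i - 1 and other_ample) else 0
    return (y, delta)


def composite_pair(z_i, o_i, o_j, o):
    """(14): observation and indicator for the composite demand at level o."""
    at_level = o_j == o
    y = z_i if (o_i >= 1 and at_level) else 0
    delta = 1 if (z_i <= o_i - 1 and at_level) else 0
    return (y, delta)


def build_posets(z_i, o_i, o_j, dbar_j):
    """(15): sorted base poset plus one sorted composite poset per level o."""
    periods = list(zip(z_i, o_i, o_j))
    base = sorted(base_pair(z, a, b, dbar_j) for z, a, b in periods)
    composite = {
        o: sorted(composite_pair(z, a, b, o) for z, a, b in periods)
        for o in range(dbar_j)  # o = 0, ..., dbar_j - 1
    }
    return base, composite


# Four hypothetical periods of one block; sales never exceed stock.
z_i = [1, 2, 0, 3]
o_i = [3, 2, 0, 3]
o_j = [2, 2, 1, 0]
base, composite = build_posets(z_i, o_i, o_j, dbar_j=2)
```

Periods that do not qualify for a given poset collapse to the uninformative pair $(0, 0)$, so every poset keeps exactly $T$ entries, as (15) requires.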

Step 2. Formulate block-wise base demand distributions.

Using (7), we compute the KM-estimated CCDF ${\hat{F}}^{∁} (d, ι_{i, k})$ for each product $i$ in each selling block $k$ on the basis of the corresponding poset $ι_{i, k}$ , for $d = 0, 1, \dots, {\bar{d}}_{i, k} - 1$ . Based on the CCDFs, we compute the block-wise demand probability mass functions:

{\hat{f}}_{i, k} (d) = {\hat{F}}^{∁} (d - 1, ι_{i, k}) - {\hat{F}}^{∁} (d, ι_{i, k}), for d = 0, 1, \dots .

(16)

Step 3. Estimate substitution probabilities.

We calculate the expectations for the block-wise base and composite demands using the CCDFs estimated from the posets, for $i = 1, 2$ , $k = 1, \dots, K$ , and $o = 0, 1, \dots, {\bar{d}}_{j, k} - 1$ :

\begin{aligned} \hat{E} [(D_{j, k}^{t} - o)^{+}] = \sum_{d = o}^{{\bar{d}}_{j, k} - 1} {\hat{F}}^{∁} (d, ι_{j, k}) \\ (base demand and lost sales), \\ \hat{E} [D_{i (j), k}^{t} (o)] = \sum_{d = 0}^{{\bar{d}}_{i, k} + {\bar{d}}_{j, k} - 1} {\hat{F}}^{∁} (d, ι_{i (j), k} (o)) \\ (composite demand) . \end{aligned}

(17)

All we need for ${\hat{p}}_{j i}$ of (12), including $\hat{E} [D_{i, k}^{t}] \equiv \hat{E} [(D_{i, k}^{t} - 0)^{+}]$, is supplied by (17).
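The sums in (17) rest on the tail-sum identity $E [(D - o)^{+}] = \sum_{d \geq o} P (D > d)$ for nonnegative integer $D$. A minimal sketch follows, assuming the CCDF is stored as a list whose entry $d$ is ${\hat{F}}^{∁} (d)$ and is zero beyond the end of the list; the helper names are ours, and the numbers reuse the example of Section 4.1.

```python
from fractions import Fraction


def expected_excess(ccdf, o=0):
    """Sum of F^c(d) over d >= o, i.e. the estimated E[(D - o)^+] as in (17)."""
    return sum(ccdf[o:], Fraction(0))


def excess_from_pmf(pmf, o=0):
    """Direct evaluation of E[(D - o)^+] from a PMF, used only as a cross-check."""
    return sum(max(d - o, 0) * mass for d, mass in enumerate(pmf))


# CCDF and PMF taken from the worked example of Section 4.1.
ccdf = [Fraction(1), Fraction(3, 4), Fraction(3, 8), Fraction(0)]
pmf = [Fraction(0), Fraction(1, 4), Fraction(3, 8), Fraction(3, 8)]

mean_demand = expected_excess(ccdf)       # o = 0 gives the estimated E[D]
lost_sales_at_1 = expected_excess(ccdf, 1)
```

Both routes agree on the example: the estimated mean is $17 / 8$ and the expected excess over $o = 1$ is $9 / 8$.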

In Step 1, the complexity for the base demand data collection is $O (K T)$ , and that for the composite demand collection is $O ({max}_{i = 1}^{2} {\bar{d}}_{i} T)$ ; in Step 2, the complexity for sorting is $O (T)$ , and that for the subsequent KM estimation is $O ({max}_{i = 1}^{2} {\bar{d}}_{i} T)$ ; meanwhile, the complexity of Step 3 is $O ({max}_{i = 1}^{2} {\bar{d}}_{i})$ . Thus, the total complexity so far accounted for is $O ({max}_{i = 1}^{2} {\bar{d}}_{i} T)$ . A convolution in the fashion of (20), introduced in Section 5.2, is also needed before the retailer can optimize his operations. When this is also taken into account, the time complexity of the entire learning procedure would amount to $O ({max}_{i = 1}^{2} {\bar{d}}_{i} (T + {max}_{i = 1}^{2} {\bar{d}}_{i}))$ .

5. Theoretical Results

Under reasonable conditions, we prove the convergence of the KM-based estimates for both base demand distributions and substitution probabilities. This result leads us to a regret bound concerning the replenishment problem. The bound is shown to be tight. Submodularity, in turn, enables a greedy algorithm.

5.1. Convergence of Estimators

We first make an assumption on the random nature of historical replenishment levels ${O_{1, 1}^{t}, O_{2, 1}^{t}}_{t = 1}^{T}$ . It reflects that decisions made by the retailer can depend on past information only. A similar assumption is made by Huh et al. (2011: Lemma 1).

Assumption 1
The order-up-to level $O_{i, 1}^{t}$ for each product $i$ at the beginning of each period $t$ is adapted to the filtration $σ (F_{t - 1} \cup σ (ε_{i}^{t}))$ , with
$\begin{aligned} F_{t - 1} & \equiv σ (G_{1} \cup G_{2} \cup \dots \cup G_{t - 1}), \\ where G_{s} \equiv σ (O_{1, 1}^{s}, O_{2, 1}^{s}, D_{1, 1}^{s}, D_{2, 1}^{s}, \dots, O_{1, K}^{s}, O_{2, K}^{s}, \\ D_{1, K}^{s}, D_{2, K}^{s}), s = 1, \dots, t - 1. \end{aligned}$
Here, each $G_{s}$ is a collection of sets that help reveal the demand and replenishment information for period $s$ , and the $ε_{i}^{t}$ ’s are independent random variables that are independent of $F_{t - 1}$ . Furthermore, when $σ (\cdot)$ applies to a collection of sets, it generates the smallest $σ$ -field that covers that collection; whereas, when it applies to a mapping from the sample space to a Euclidean space, it generates the smallest $σ$ -field that renders that mapping a random vector in that space.

The filtration $F_{t - 1}$ in Assumption 1 merely imposes an information upper bound. Our retailer can access past sales data only if they are treatable as functions of the demand data to which the retailer has no direct access, and of his own past decisions. The random variable $ε_{i}^{t}$ summarizes factors other than past replenishment and sales data, which can even include coin flips that a retailer may use to make up his mind. It is worth noting that conditioned on the filtration $F_{t - 1}$ , the order-up-to level $O_{i, 1}^{t}$ for product $i$ at the beginning of each period $t$ is independent of that for the other product. The extra freedom afforded by the random perturbations $ε_{i}^{t}$ allows the retailer’s replenishment behaviors to be not completely interpretable by past data. For instance, the retailer may be engaged in cost reduction campaigns or faced with random yields.

We also impose an assumption about the block-wise demand levels.
Assumption 2
For any product $i$ , period $t$ , and selling block $k$ ,
$0 < {\underline{α}}_{i, k} \leq P (D_{i, k}^{t} = {\bar{d}}_{i, k}) \leq P (D_{i, k}^{t} \geq 1) < 1,$
where ${\underline{α}}_{i, k}$ is a strictly positive constant.

In Assumption 2, $P (D_{i, k}^{t} = {\bar{d}}_{i, k}) \geq {\underline{α}}_{i, k} > 0$ means that the base demand upper bound is reachable. Note this is almost automatic from the existence of the demand upper bounds ${\bar{d}}_{i, k}$ —if this were not true for a certain $(i, k)$ , we could always lower the current ${\bar{d}}_{i, k}$ . The second inequality on the same line of the assumption is always true, and $P (D_{i, k}^{t} \geq 1) < 1$ means there is a chance for a zero demand level. Each $D_{i, k}^{t}$ is only a block-wise demand level, which tends to be much lower than the period-wise $D_{i}^{t} \equiv \sum_{h = 1}^{K} D_{i, h}^{t}$ . Hence, our block-wise requirement is weaker than having a positive probability for the period-wise demand to be zero.

Recall that the retailer can only control the first block’s starting inventory levels $O_{i, 1}^{t}$ , whereas levels of other blocks are subject to the fluctuations of demand arrivals. We next impose some conditions on them.
Assumption 3
For some strictly positive constants ${\underline{ϑ}}_{1}$ and ${\underline{ϑ}}_{2}$ , the order-up-to level $O_{i, 1}^{t}$ of any product $i$ at the beginning of any period $t$ satisfies
$\begin{aligned} P (O_{i, 1}^{t} \geq {max}_{k = 1}^{K} {{\bar{d}}_{1, k} + {\bar{d}}_{2, k}} | F_{t - 1}) \geq {\underline{ϑ}}_{1}, \\ P (O_{i, 1}^{t} \leq \sum_{k = 1}^{K - 1} {\bar{d}}_{i, k} | F_{t - 1}) \geq {\underline{ϑ}}_{2}; \end{aligned}$
either inequality is one between an $F_{t - 1}$ -measurable random variable and a constant which can still be understood as a random variable that is measurable under any $σ$ -field.

Assumption 3 imposes bounds on the retailer’s order-up-to levels in period $t$ that are irrespective of his experiences in periods $1$ to $t - 1$ . In it, all we need is the strict positivity of ${\underline{ϑ}}_{1}$ and ${\underline{ϑ}}_{2}$ that are independent of sample paths. First, the bounds are reasonable. Note that we relate to the maximum block-wise total demand level ${max}_{k = 1}^{K} {{\bar{d}}_{1, k} + {\bar{d}}_{2, k}}$ instead of the sum $\sum_{k = 1}^{K} {{\bar{d}}_{1, k} + {\bar{d}}_{2, k}}$ , while there is no additional replenishment during the period after an initial block-1 delivery. In addition, each ${\bar{d}}_{i, k}$ tends to be small as one period’s demand is divided into $K$ block-wise demands. Therefore, the first half of the assumption does not amount to too much. As for the second half, $\sum_{k = 1}^{K - 1} {\bar{d}}_{i, k}$ is likely already at a high percentile of one period’s random demand because only the last block’s demand upper bound has been left out of the sum.

Besides, as long as the unit sales prices are not outrageously higher than the unit ordering costs, a back-of-the-envelope calculation using the newsvendor formula indicates that order-up-to levels should not be too high. Second, even if the inequalities inside the probabilities point to unwise decisions, we should still give a real-life retailer some allowance for occasional mistakes.

While Assumption 1 is impossible to verify numerically, we have provided partial validations in Section 7.2 for Assumptions 2 and 3 using real data from the European grocery chain.

We are now in a position for an intermediate result.
Proposition 1
Based on Assumptions 1 to 3, we can define strictly positive constants $\underline{θ}$ , $\underline{γ}$ , and $\underline{ζ}$ to facilitate the following bounds: (a)
for any period $t = 1, \dots, T$ and block $k = 1, \dots, K$ ,
$P (⋂_{i = 1}^{2} {O_{i, k}^{t} \geq {\bar{d}}_{i, k}} | F_{t - 1}) \geq \underline{θ};$

(b)
for any product $i = 1, 2$ , $j \neq i$ , period $t = 1, \dots, T$ , and the last block $K$ ,
$P (O_{i, K}^{t} \geq 1 and O_{j, K}^{t} \leq {\bar{d}}_{j, K} - 1 ∣ F_{t - 1}) \geq \underline{γ};$

(c)
for any product $i = 1, 2$ , $j \neq i$ , period $t = 1, \dots, T$ , block $k = 1, \dots, K$ , and $o = 0, 1, \dots$ ,
$\begin{aligned} P (O_{i, k}^{t} \geq {\bar{d}}_{1, k} + {\bar{d}}_{2, k} ∣ F_{t - 1} and O_{i, k}^{t} \geq 1 and O_{j, k}^{t} = o) \\ \geq \underline{ζ} . \end{aligned}$

In Proposition 1, part $(a)$ guarantees some uncensored observation of either product’s base demand; part $(b)$ guarantees some substitutions in at least the last block; the condition listed in part $(c)$ is met due to part $(b)$ ; under this condition, which indicates the occurrence of some substitution, there is some probability of a complete observation for the composite demand. Our subsequent consistency results are derived under Proposition 1 stated with the true demand upper bounds ${\bar{d}}_{i, k}$ . In practice, however, the firm may adopt upper bounds ${\tilde{d}}_{i, k}$ that are slightly different from ${\bar{d}}_{i, k}$ . Note that overestimated upper bounds may even violate Assumption 2. But this need not matter—the assumptions are sufficient but not necessary for Proposition 1. Once the proposition’s parts (a) to (c) are in place, our ensuing derivations can be carried out.

The proposition is itself sufficient but not exactly necessary. In Sections 7.1.1 and 7.1.2, we shall deliberately inflate the demand upper bounds and approximate them using different sales quantiles, respectively. The results show that our estimation-optimization framework is robust under mis-specifications.

Let us now derive the convergence of our estimates.
Theorem 1 (Convergence of Demand Distributions and Substitution Probabilities)

The following statements are true:

(Convergence of the demand distribution) For any $ϵ > 0$ , $i = 1, 2$ , and $k = 1, 2, \dots, K$ ,

P ({max}_{d = 0}^{{\bar{d}}_{i, k}} | {\hat{f}}_{i, k} (d) - f_{i, k} (d) | \geq ϵ) \leq {\bar{A}}_{i, k}^{them 1} \exp (- \frac{T ϵ^{2}}{{\bar{B}}_{i, k}^{them 1}}),

where each ${\hat{f}}_{i, k} (d)$ is the empirical probability derived in steps (13) to (16), $T$ is the number of periods, ${\bar{A}}_{i, k}^{them 1} = 16 {\bar{d}}_{i, k}$, and ${\bar{B}}_{i, k}^{them 1} = 32 (2 {\bar{d}}_{i, k} + 1)^{2} / ({\underline{θ}}^{2} {\underline{α}}_{i, k}^{2})$, with ${\underline{α}}_{i, k}$ being the strictly positive constant defined at Assumption 2 and $\underline{θ}$ the strictly positive constant from Proposition 1(a).

(Convergence of the substitution probability) For any $0 < ϵ < 1 / 2$ and $i = 1, 2$ , $j \neq i$ ,

P (| {\hat{p}}_{j i} - p_{j i} | \geq ϵ) \leq {\bar{C}}_{j i}^{them 1} \exp (- \frac{T ϵ^{2}}{{\bar{D}}_{j i}^{them 1}}),

where ${\hat{p}}_{j i}$ is the estimated substitution probability reached through (9) to (12), ${\bar{C}}_{j i}^{them 1} = 16 {\bar{d}}_{i} + 8 {\bar{d}}_{j} + \sum_{k = 1}^{K} {\bar{d}}_{j, k} (8 {\bar{d}}_{1, k} + 8 {\bar{d}}_{2, k})$, ${\bar{D}}_{j i}^{them 1} = 512 ({\bar{d}}_{1} + {\bar{d}}_{2})^{2} \max_{k = 1}^{K} \{{\bar{d}}_{j, k} (2 {\bar{d}}_{1, k} + 2 {\bar{d}}_{2, k} + 1)^{2}\} / ({\underline{θ}}^{2} \underline{γ} \underline{ζ} {\underline{α}}_{i}^{2})^{2}$, and ${\underline{α}}_{i} = \min_{k = 1}^{K} {\underline{α}}_{i, k}$ with ${\underline{α}}_{i, k}$ defined in Assumption 2 and $\underline{θ}$, $\underline{γ}$, and $\underline{ζ}$ being the strictly positive constants from Proposition 1.

We now provide a sketch of Theorem 1’s proof. To establish the convergence of the estimated demand distribution, let us focus on the KM-estimated CCDF, ${\hat{F}}^{∁} (d, ι)$ , as defined at (7). For notational convenience, we simplify ${\hat{F}}^{∁} (d, ι)$ to ${\hat{F}}^{∁} (d)$ for a generic poset $ι \equiv {(Y_{(1)}, δ_{(1)}), \dots, (Y_{(T)}, δ_{(T)})}$ . The latter may be obtained from the censored observations ${(Y_{1}, δ_{1}), \dots, (Y_{T}, δ_{T})}$ as described in Section 4.1.

We rely on the decomposition that

\begin{aligned} & P (\max_{d} | {\hat{F}}^{∁} (d) - F^{∁} (d) | \geq ϵ) \\ & = P (\max_{d} | {\hat{F}}^{∁} (0) \prod_{s = 1}^{d} \frac{{\hat{F}}^{∁} (s)}{{\hat{F}}^{∁} (s - 1)} - F^{∁} (0) \prod_{s = 1}^{d} \frac{F^{∁} (s)}{F^{∁} (s - 1)} | \geq ϵ) \\ & \leq P (| {\hat{F}}^{∁} (0) - F^{∁} (0) | + \max_{d} \sum_{s = 1}^{d} | \frac{{\hat{F}}^{∁} (s)}{{\hat{F}}^{∁} (s - 1)} - \frac{F^{∁} (s)}{F^{∁} (s - 1)} | \geq ϵ) \\ & \leq P (| {\hat{F}}^{∁} (0) - F^{∁} (0) | \geq \frac{ϵ}{\bar{d}}) + \sum_{s = 1}^{\bar{d} - 1} P (| \frac{{\hat{F}}^{∁} (s)}{{\hat{F}}^{∁} (s - 1)} - \frac{F^{∁} (s)}{F^{∁} (s - 1)} | \geq \frac{ϵ}{\bar{d}}), \end{aligned}

where the last two inequalities hold since $| a_{1} a_{2} - b_{1} b_{2} | \leq | a_{1} - b_{1} | + | a_{2} - b_{2} |$ for any $a_{1}, a_{2}, b_{1}, b_{2} \in [0, 1]$. For each $s$, we note the identities $F^{∁} (s) / F^{∁} (s - 1) = P (D \geq s + 1) / P (D \geq s)$ and

\begin{aligned} \frac{{\hat{F}}^{∁} (s)}{{\hat{F}}^{∁} (s - 1)} & = \prod_{t : Y_{(t)} = s} {(\frac{T - t}{T + 1 - t})}^{δ_{(t)}} \\ & = \frac{T - \sum_{t = 1}^{T} 1 (Y_{(t)} \leq s)}{T - \sum_{t = 1}^{T} 1 (Y_{(t)} \leq s - 1) - \sum_{t = 1}^{T} 1 (Y_{(t)} = s, O_{(t)} \leq s)} \\ & = \frac{\sum_{t = 1}^{T} 1 (D_{t} \geq s + 1, O_{t} \geq s + 1)}{\sum_{t = 1}^{T} 1 (D_{t} \geq s, O_{t} \geq s + 1)}, \end{aligned}

where detailed derivations could be found later at Lemma B.2. With these expressions in place, we can then apply appropriate concentration inequalities to establish the various exponential probability bounds.

When it comes to the substitution probabilities, we note from (2) and (12) that both the true and estimated substitution probabilities are in a fractional form. Writing the true and estimated substitution probabilities as $p_{j i} = N_{j i} / M_{j i}$ and ${\hat{p}}_{j i} = {\hat{N}}_{j i} / {\hat{M}}_{j i}$ , respectively, we can arrive at the decomposition

| p_{j i} - {\hat{p}}_{j i} | \leq \frac{| N_{j i} - {\hat{N}}_{j i} |}{M_{j i}} + \frac{| N_{j i} (M_{j i} - {\hat{M}}_{j i}) |}{M_{j i} {\hat{M}}_{j i}} .

Each term, such as $| N_{j i} - {\hat{N}}_{j i} |$, is a weighted sum of KM estimates. By the first part of this theorem, every summand is exponentially concentrated. Applying the Cauchy–Schwarz inequality with suitably chosen weights to tighten the bound, we then obtain the desired exponential bounds.

All detailed proofs for this section are in Appendix B of the E-Companion. We not only prove the convergence of KM estimators as was done by Huh et al. (2011) but also identify convergence rates for the demand distributions and substitution probabilities. The $\bar{A} \exp (- T ϵ^{2} / \bar{B})$ -nature of both right-hand sides conveys the identical message of exponential convergences. For conciseness, we do not spell out the estimated parameters’ dependencies on the data set $S_{T}$ . More explicit notations might produce the likes of ${\hat{f}}_{i, k} (d ∣ S_{T})$ and ${\hat{p}}_{j i} (S_{T})$ .
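To get a feel for the $\bar{A} \exp (- T ϵ^{2} / \bar{B})$ form, one may invert it: the bound falls below a target level $δ$ once $T \geq (\bar{B} / ϵ^{2}) \log (\bar{A} / δ)$. The sketch below does this arithmetic for the demand-distribution part of Theorem 1. The function names are ours, and the numerical values of $\underline{θ}$ and ${\underline{α}}_{i, k}$ are hypothetical placeholders rather than quantities reported in the paper.

```python
import math


def demand_constants(dbar, theta, alpha):
    """Constants A and B of Theorem 1's first statement for one (i, k)."""
    A = 16 * dbar
    B = 32 * (2 * dbar + 1) ** 2 / (theta ** 2 * alpha ** 2)
    return A, B


def tail_bound(A, B, eps, T):
    """Right-hand side A * exp(-T * eps^2 / B) of the exponential bound."""
    return A * math.exp(-T * eps ** 2 / B)


def periods_needed(A, B, eps, delta):
    """Smallest integer T for which the bound is at most delta."""
    if A <= delta:
        return 0
    return math.ceil(B / eps ** 2 * math.log(A / delta))


# Hypothetical inputs: dbar = 3, theta = 0.5, alpha = 0.2.
A, B = demand_constants(dbar=3, theta=0.5, alpha=0.2)
T_required = periods_needed(A, B, eps=0.1, delta=0.05)
```

Because the constants are worst-case, the resulting `T_required` tends to be very conservative; the point of the sketch is the scaling in $1 / ϵ^{2}$ and $\log (1 / δ)$, not the absolute number.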

5.2. Convergence of the Profit Function

With the estimated demand distributions and substitution probabilities given, our retailer can determine optimal replenishment quantities. To describe the problem that he faces, let $D_{i}$ denote the period-wise base demand for product $i$ , and let $B_{j i} (1), B_{j i} (2), \dots$ be the Bernoulli random variables that trigger the various $j$ -to- $i$ substitution requests in the involved period. Given the unit selling prices $s_{i}$ , variable ordering costs $c_{i}$ , and unit salvage values $v_{i}$ where $s_{i} > v_{i}$ , the retailer faces the following expected profit function:

\begin{aligned} Π (q_{1}, q_{2}) & \equiv \sum_{i = 1}^{2} {s_{i} E [min {q_{i}, D_{i} \\ + W (D_{j}, q_{j} ∣ B_{j i} (1), B_{j i} (2), \dots)}] - c_{i} q_{i} \\ + v_{i} (q_{i} - E [min {q_{i}, D_{i} \\ + W (D_{j}, q_{j} ∣ B_{j i} (1), B_{j i} (2), \dots)}])}, \end{aligned}

(18)

where $W (d, q | b (1), b (2), \dots)$ has been defined at (1).

This may appear as a one-shot problem. However, with each salvage value $v_{i}$ understood as $c_{i} - h_{i}$ , the difference between the variable ordering cost $c_{i}$ and the unit holding cost rate $h_{i}$ , (18) can help find the best order-up-to levels for a retailer that operates in the long run. Moreover, substitutions may appear to be processed at the end of each period. In actuality, this is not necessary. Both points are addressed in Appendix B of the E-Companion. The optimal objective and optimal order-up-to levels can be expressed as follows:

\begin{aligned} \begin{aligned} Π^{*} & = {max}_{q_{1}, q_{2} = 0}^{{\bar{d}}_{1} + {\bar{d}}_{2}} Π (q_{1}, q_{2}), \\ q^{*} & \equiv (q_{1}^{*}, q_{2}^{*}) \in {\arg \max}_{q_{1}, q_{2} = 0}^{{\bar{d}}_{1} + {\bar{d}}_{2}} Π (q_{1}, q_{2}) . \end{aligned} \end{aligned}

(19)

Due to the peculiar natures of a Markov decision process as implied by (18) and the fact from (19) that $q^{*} \geq 0$ , we can further argue in Appendix B of the E-Companion that the sufficient conditions of Heyman and Sobel (2004: Chapter 3) are satisfied. Then, repeatedly ordering up to $q^{*}$ would be the best policy for any $T$ periods if the initial stock levels are below $q^{*}$ .
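A brute-force evaluation of (18) and (19) can be sketched as follows. Since (1) is not restated in this section, the sketch assumes that $W (d, q ∣ b (1), b (2), \dots)$ counts the successes among the first $(d - q)^{+}$ Bernoulli variables, so that it is binomial with $(d - q)^{+}$ trials and success probability $p_{j i}$; this reading, the function names, and the 0-based product indexing are our assumptions.

```python
from math import comb


def expected_sales(q_i, q_j, f_i, f_j, p_ji):
    """E[min{q_i, D_i + W}] with W ~ Binomial((D_j - q_j)^+, p_ji) (assumed form of (1))."""
    total = 0.0
    for d_i, mass_i in enumerate(f_i):
        for d_j, mass_j in enumerate(f_j):
            n = max(d_j - q_j, 0)  # unmet product-j demand seeking a substitute
            for w in range(n + 1):
                mass_w = comb(n, w) * p_ji ** w * (1 - p_ji) ** (n - w)
                total += mass_i * mass_j * mass_w * min(q_i, d_i + w)
    return total


def profit(q, f, p, s, c, v):
    """Expected profit (18); p[j][i] is the j-to-i substitution probability."""
    value = 0.0
    for i in (0, 1):
        j = 1 - i
        sales = expected_sales(q[i], q[j], f[i], f[j], p[j][i])
        value += s[i] * sales - c[i] * q[i] + v[i] * (q[i] - sales)
    return value


def best_order(f, p, s, c, v):
    """Exhaustive search of (19) over 0, ..., dbar_1 + dbar_2 for each product."""
    q_max = (len(f[0]) - 1) + (len(f[1]) - 1)
    grid = [(q1, q2) for q1 in range(q_max + 1) for q2 in range(q_max + 1)]
    return max(grid, key=lambda q: profit(q, f, p, s, c, v))


# Hypothetical data: identical products with demand on {0, 1, 2}.
f = ([0.25, 0.5, 0.25], [0.25, 0.5, 0.25])
no_substitution = [[0.0, 0.0], [0.0, 0.0]]
q_star = best_order(f, no_substitution, s=(10, 10), c=(4, 4), v=(0, 0))
```

With substitution switched off, the problem separates into two newsvendor problems, which gives a simple sanity check on the sketch. The same functions can be fed the estimated ${\hat{f}}_{i}$ and ${\hat{p}}_{j i}$ to evaluate a data-driven counterpart of (18).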

Since we can only access the data set $S_{T}$ given at (5) instead of the true problem parameters, we need to approximate the expected profit function by the $\hat{Π} (q_{1}, q_{2} | S_{T})$ built on demand parameters estimated from $S_{T}$ . To obtain period-wise base demand estimates ${\hat{f}}_{i}$ , we take advantage of the independence among block-wise demands to arrive at the convolution

{\hat{f}}_{i} \equiv {\hat{f}}_{i, 1} \otimes {\hat{f}}_{i, 2} \otimes \dots \otimes {\hat{f}}_{i, K},

(20)

where each ${\hat{f}}_{i, k}$ is derived at (16) and any $[f \otimes f^{'}] (d)$ is equal to $\sum_{d^{'} = 0}^{d} f (d^{'}) f^{'} (d - d^{'})$. As for substitution probabilities, the ${\hat{p}}_{j i}$’s of (12) can be directly used. We therefore face the following post-learning problem:

\begin{aligned} \begin{aligned} {\hat{Π}}^{*} (S_{T}) & = {max}_{q_{1}, q_{2} = 0}^{{\bar{d}}_{1} + {\bar{d}}_{2}} \hat{Π} (q_{1}, q_{2} ∣ S_{T}), \\ {\hat{q}}^{*} (S_{T}) & \equiv ({\hat{q}}_{1}^{*} (S_{T}), {\hat{q}}_{2}^{*} (S_{T})) \in {\arg \max}_{q_{1}, q_{2} = 0}^{{\bar{d}}_{1} + {\bar{d}}_{2}} \hat{Π} (q_{1}, q_{2} ∣ S_{T}) . \end{aligned} \end{aligned}

(21)
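The convolution (20) that feeds the post-learning problem is straightforward to compute. A minimal sketch, assuming each block-wise PMF is stored as a list indexed by the demand level; the function names are illustrative only.

```python
def convolve(f, g):
    """[f (x) g](d) = sum over d' of f(d') * g(d - d') for PMFs on 0, 1, 2, ..."""
    out = [0.0] * (len(f) + len(g) - 1)
    for a, mass_a in enumerate(f):
        for b, mass_b in enumerate(g):
            out[a + b] += mass_a * mass_b
    return out


def period_pmf(block_pmfs):
    """(20): fold the K block-wise PMFs into one period-wise PMF."""
    result = [1.0]  # point mass at zero, the identity for convolution
    for pmf in block_pmfs:
        result = convolve(result, pmf)
    return result


# Two hypothetical blocks, each with demand 0 or 1 with equal probability.
f_hat = period_pmf([[0.5, 0.5], [0.5, 0.5]])
```

The nested loops cost on the order of the product of the two supports per convolution, which is consistent with the extra ${max}_{i} {\bar{d}}_{i}$ factor mentioned in the complexity discussion of Section 4.3.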

Theorem 1 illustrates that the accuracy of our KM-based estimates for demand-related parameters grows with the data size $T$ . The increasing closeness between the true and estimated parameters naturally translates into that between expected profit functions achievable by the two sets of parameters through (19) and (21).

Proposition 2 (Profit Function Convergence)

Given any $T$ , order-up-to levels $q_{1}$ and $q_{2}$ , and $δ \in (0, 1 / 4]$ , there is a probability of at least $1 - 4 δ$ for

\begin{aligned} {max}_{q_{1}, q_{2} = 0}^{{\bar{d}}_{1} + {\bar{d}}_{2}} | Π (q_{1}, q_{2}) - \hat{Π} (q_{1}, q_{2} ∣ S_{T}) | \\ \leq {\bar{A}}^{prop 2} \sqrt{\frac{1}{T} \log (\frac{{\bar{B}}^{prop 2}}{δ})}, \end{aligned}

where ${\bar{A}}^{prop 2} = \sum_{i = 1}^{2} {\bar{C}}^{prop 2} ({\bar{D}}_{i}^{prop 2} + {\bar{E}}_{i}^{prop 2})$, ${\bar{B}}^{prop 2} = 16 ({\bar{d}}_{i} + {\bar{d}}_{j}) + \sum_{k = 1}^{K} {\bar{d}}_{j, k} (8 {\bar{d}}_{1, k} + 8 {\bar{d}}_{2, k})$, ${\bar{C}}^{prop 2} = 2 ({\bar{d}}_{1} + {\bar{d}}_{2})$, and for $i = 1, 2$,

\begin{aligned} \begin{aligned} {\bar{D}}_{i}^{prop 2} & = \frac{8 (s_{i} - v_{i}) {\bar{d}}_{i} \sqrt{2 {max}_{k = 1}^{K} {{\bar{d}}_{j, k} (2 {\bar{d}}_{1, k} + 2 {\bar{d}}_{2, k} + 1)^{2}}}}{{\underline{θ}}^{2} \underline{γ} \underline{ζ} {\underline{α}}_{i}^{2}}, \\ {\bar{E}}_{i}^{prop 2} & = \frac{4 \sqrt{2} (s_{i} - v_{i}) {\bar{d}}_{i} (\prod_{k = 1}^{K} {\bar{d}}_{i, k}) (2 {\bar{d}}_{i} + K)}{\underline{θ} {\underline{α}}_{i}} . \end{aligned} \end{aligned}

By Proposition 2, our understanding of the underlying problem would tend to be more accurate when the size of the data set grows. On top of this convergence involving identical replenishment levels, we can still demonstrate a further asymptotic proximity between the optimal values, which are presumably achieved at different replenishment levels. That is, we can show the convergence of ${\hat{Π}}^{*} (S_{T}) \equiv \hat{Π} ({\hat{q}}_{1}^{*} (S_{T}), {\hat{q}}_{2}^{*} (S_{T}) ∣ S_{T})$ of (21) to $Π^{*} \equiv Π (q_{1}^{*}, q_{2}^{*})$ of (19). The difference between the two is commonly called the “in-sample performance” gap.

Corollary 1 (In-Sample Performance Bound)

Given any number of observations $T$ and $δ \in (0, 1 / 4]$ , there is a probability above $1 - 4 δ$ for

| Π^{*} - {\hat{Π}}^{*} (S_{T}) | \leq {\bar{A}}^{prop 2} \sqrt{\frac{1}{T} \log (\frac{{\bar{B}}^{prop 2}}{δ})},

where ${\bar{A}}^{prop 2}$ and ${\bar{B}}^{prop 2}$ have been specified in Proposition 2.

To appreciate Proposition 2 and Corollary 1, imagine that we face a fixed $δ > 0$ , no matter how small it is. Then, given that any $\log (C / δ)$ term is always associated with $1 / T$ , which goes down to $0^{+}$ as $T ⟶ + \infty$ , the result indicates that with a near-one probability $1 - 4 δ$ , the profit fluctuation approaches $0^{+}$ when the data size $T$ grows up to $+ \infty$ .

Ultimately, we are interested in the plugging of a solution that is optimal to the data-derived problem back into the true problem. With the two technical results ready, we are now in a position to bound the accuracy of this data-dependent approximation:

Π^{*} - Π ({\hat{q}}_{1}^{*} (S_{T}), {\hat{q}}_{2}^{*} (S_{T})) \equiv Π (q_{1}^{*}, q_{2}^{*}) - Π ({\hat{q}}_{1}^{*} (S_{T}), {\hat{q}}_{2}^{*} (S_{T})) .

Going beyond the probably approximately correct (PAC) guarantees of Proposition 2 and Corollary 1, we can even state the result in expectation.

Theorem 2 (Regret Bound for the Two-Product Case)

There exists a positive problem-related constant ${\bar{A}}^{them 2}$ such that

Π^{*} - E [Π ({\hat{q}}_{1}^{*} (S_{T}), {\hat{q}}_{2}^{*} (S_{T}))] \leq {\bar{A}}^{them 2} \sqrt{\frac{\log T}{T}},

where ${\bar{A}}^{them 2} \equiv {\bar{A}}^{prop 2} + 2 {\bar{A}}^{prop 2} \sqrt{\log ({\bar{B}}^{prop 2})} + 8 \max \{s_{1}, s_{2}\} ({\bar{d}}_{1} + {\bar{d}}_{2})$, with all the involved parameters already specified in Proposition 2.

The proof of Theorem 2 relies on the following decomposition:

\begin{aligned} Π^{*} - Π ({\hat{q}}_{1}^{*} (S_{T}), {\hat{q}}_{2}^{*} (S_{T})) \\ = Π^{*} - {\hat{Π}}^{*} (S_{T}) + {\hat{Π}}^{*} (S_{T}) - Π ({\hat{q}}_{1}^{*} (S_{T}), {\hat{q}}_{2}^{*} (S_{T})) \\ \leq | Π^{*} - {\hat{Π}}^{*} (S_{T}) | + | {\hat{Π}}^{*} (S_{T}) - Π ({\hat{q}}_{1}^{*} (S_{T}), {\hat{q}}_{2}^{*} (S_{T})) |, \end{aligned}

(22)

where the first and second terms have been bounded by Corollary 1 and Proposition 2, respectively. Taking into account the bounded profits on any sample path, we then identify constants $\bar{A}$ , $\bar{B}$ , and $\bar{C}$ such that

\begin{aligned} E [Π^{*} - Π ({\hat{q}}_{1}^{*} (S_{T}), {\hat{q}}_{2}^{*} (S_{T}))] \\ \leq \bar{A} \sqrt{\frac{1}{T} \log (\frac{1}{δ})} + \bar{B} \sqrt{\frac{1}{T}} + \bar{C} δ . \end{aligned}

(23)

The $\sqrt{\log T / T}$ -sized bound in Theorem 2 can be achieved by setting $δ = (1 / \sqrt{T}) \land (1 / 4)$ . It may appear worse than Chen and Chao’s (2020) $\log T / T$ -sized one. However, the latter is achieved under their $ϵ$ -related Assumption 1(i), which requires certain coordination of not only demand- but also cost-related parameters. Such a $\log T$ -versus- $\sqrt{T}$ discrepancy appears in other literature; see, for example, Broder and Rusmevichientong (2012).
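
To see where the rate comes from, one can substitute the stated choice of $δ$ into the right-hand side of (23). The worked step below is only a sketch: it assumes $T \geq 16$ , so that $δ = 1 / \sqrt{T}$ and $\log T \geq 1$ , and it treats $\bar{A}$ , $\bar{B}$ , and $\bar{C}$ as the generic constants of (23) rather than the specific coefficient of Theorem 2.

\begin{aligned} \bar{A} \sqrt{\frac{1}{T} \log (\frac{1}{δ})} + \bar{B} \sqrt{\frac{1}{T}} + \bar{C} δ & = \bar{A} \sqrt{\frac{\log \sqrt{T}}{T}} + \frac{\bar{B} + \bar{C}}{\sqrt{T}} \\ & = \frac{\bar{A}}{\sqrt{2}} \sqrt{\frac{\log T}{T}} + \frac{\bar{B} + \bar{C}}{\sqrt{T}} \\ & \leq (\frac{\bar{A}}{\sqrt{2}} + \bar{B} + \bar{C}) \sqrt{\frac{\log T}{T}} . \end{aligned}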

Indeed, even without multiple products, substitution, or censoring, the regret lower bound is already $Ω (1 / \sqrt{T})$ ; see, for example, Proposition 1 of Zhang et al. (2020) for the learning-while-doing setting and below for the current estimation-and-optimization setting.

When there is one product type, (18) could be simplified into

Π_{f} (q) = s E_{f} [min {q, D}] - c q + v (q - E_{f} [min {q, D}]),

(24)

where the subscript $f$ , only to be used in this portion, is here to emphasize the underlying single-product demand distribution $f$ . As (24) is but the classic newsvendor problem, one optimal solution $q_{f}^{*}$ is known to be the smallest $q$ satisfying $F_{f} (q) \geq (s - c) / (s - v)$ . In the current single-product case, (5)’s data set $S_{T}$ is merely $(d_{1}, \dots, d_{T})$ . Any ordering policy $\tilde{q}$ is a function of $(d_{1}, \dots, d_{T})$ . Thus, the regret incurred at any underlying distribution $f$ after $T$ periods’ learning could be written as $Π_{f}^{*} - E_{f^{T}} [Π_{f} (\tilde{q} (S_{T}))]$ , where $f^{T}$ stands for the product-form probability distribution responsible for the generation of $S_{T} \equiv (d_{1}, \dots, d_{T})$ .
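
The single-product objective (24) and its critical-fractile solution can be illustrated with a short sketch. The function names and the representation of $f$ as a list of probabilities on $0, 1, \dots$ are assumptions made for this illustration only.

```python
from typing import Sequence


def expected_profit(q: int, pmf: Sequence[float], s: float, c: float, v: float) -> float:
    """Evaluate (24): s E[min(q, D)] - c q + v (q - E[min(q, D)])."""
    expected_sales = sum(mass * min(q, d) for d, mass in enumerate(pmf))
    return s * expected_sales - c * q + v * (q - expected_sales)


def newsvendor_quantity(pmf: Sequence[float], s: float, c: float, v: float) -> int:
    """Smallest q with F(q) >= (s - c) / (s - v), for a pmf on {0, ..., len(pmf) - 1}."""
    ratio = (s - c) / (s - v)
    cdf = 0.0
    for q, mass in enumerate(pmf):
        cdf += mass
        if cdf >= ratio - 1e-12:  # small slack against floating-point rounding
            return q
    return len(pmf) - 1


# Illustrative numbers only: demand uniform on {0, ..., 4}, s = 10, c = 3, v = 1.
uniform_pmf = [0.2] * 5
q_star = newsvendor_quantity(uniform_pmf, s=10.0, c=3.0, v=1.0)
```

Here the critical ratio is $7 / 9$ , and a brute-force comparison of `expected_profit` over all order quantities can be used to confirm the returned level.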

Suppose $s > c > v$ . Then, for any policy $\tilde{q}$ and any $T \geq \underline{T} \equiv [4 \cdot (s - v)^{2} / (s - c)^{2}] \lor [(s - v)^{2} / (c - v)^{2}]$ , there would be some demand distribution $f$ such that

\begin{aligned} Π_{f}^{*} - E_{f^{T}} [Π_{f} (\tilde{q} (S_{T}))] \\ \geq \frac{s - v}{4} \cdot \exp (- \frac{8 \cdot (s - v)^{2}}{(s - c) \cdot (c - v)}) \cdot \frac{1}{\sqrt{T}}; \end{aligned}

(25)

see more details in Appendix B of the E-Companion. Note that this offline-setting one-shot regret is consistent with the time average of the cumulative regret for the online problem of Besbes and Muharremoglu (2013). In view of (25), our upper bound in Theorem 2 is tight up to a logarithmic term.

5.3. Submodularity of the Profit Function

For the profit maximization at a given parameter set in the fashion of (19), a common approach is to exploit the submodularity of the joint profit function (Netessine and Rudi, 2003; Zhang et al., 2021). For our setting, this property reflects diminishing marginal returns achievable by one product’s supply when another product becomes more abundant: $Π (q_{1}, q_{2} + 1) - Π (q_{1}, q_{2}) \geq Π (q_{1} + 1, q_{2} + 1) - Π (q_{1} + 1, q_{2})$ . We plan to follow this strategy as well. However, our stochastic rather than the formerly-studied proportional substitution warrants extra care. The result is shown in the following.

Theorem 3 (Submodularity)

The expected profit function $Π (\cdot, \cdot)$ defined at (18) is submodular on the lattice $[0, {\bar{d}}_{1} + {\bar{d}}_{2}]^{2}$ .
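
The inequality defining submodularity can be checked numerically on a small grid. The sketch below is a generic brute-force checker, not part of the proof of Theorem 3; the function name and the two toy objectives are hypothetical stand-ins for $Π (\cdot, \cdot)$ .

```python
from itertools import product
from typing import Callable


def is_submodular_2d(profit: Callable[[int, int], float],
                     q1_max: int, q2_max: int, tol: float = 1e-9) -> bool:
    """Check profit(q1, q2+1) - profit(q1, q2) >= profit(q1+1, q2+1) - profit(q1+1, q2)
    for every grid point with 0 <= q1 < q1_max and 0 <= q2 < q2_max."""
    for q1, q2 in product(range(q1_max), range(q2_max)):
        gain_when_scarce = profit(q1, q2 + 1) - profit(q1, q2)
        gain_when_abundant = profit(q1 + 1, q2 + 1) - profit(q1 + 1, q2)
        if gain_when_scarce < gain_when_abundant - tol:
            return False
    return True


# Toy objectives (assumptions for illustration): pooled sales capped at 5 units
# are submodular, whereas the product q1 * q2 is not.
pooled_ok = is_submodular_2d(lambda a, b: float(min(a + b, 5)), 6, 6)
product_ok = is_submodular_2d(lambda a, b: float(a * b), 3, 3)
```

When the profit is only available through simulation, such a check would be subject to sampling noise, so the tolerance `tol` would have to be chosen accordingly.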

Even with Theorem 3, the discrete maximization problem remains difficult, because maximizing a general submodular objective function is NP-hard. In particular, we find our objective $Π (\cdot, \cdot)$ of (18) to possess neither L $^{♮}$ - nor M $^{♮}$ -concavity, two properties that, according to Murota (1998) and Chen and Li (2021), could otherwise help usher in efficient solution algorithms. The counterexamples have been provided in Appendix B of the E-Companion.

Fortunately, a double greedy heuristic, which optimizes in one dimension while keeping the other dimensions fixed, has turned out to be efficient in approximately maximizing a submodular function in a bounded feasible region; see, for example, Bian et al. (2017) and Zhang et al. (2021). For the current problem involving two discrete decision variables and a rectangular feasible region, the double-greedy idea can be embedded in an algorithm that achieves exact optimality at the expense of a longer running time. This modified double greedy algorithm is also detailed in Appendix B of the E-Companion. It can be shown to terminate at a global maximizer.

6. Extensions to the Multi-Product Case

We now consider the general model with multi-product substitutions. Although more complicated, the learning procedure here is an extension of its two-product counterpart.

6.1. The One-Time Substitution Model

We begin by considering a one-time substitution model where customers attempt substitution among some $I$ products at most once. We still let $D_{i, k}^{t}$ be the random base demand for product $i$ at block $k$ of period $t$ ; all these levels are independent; for the same $i$ and $k$ , the $D_{i, k}^{t}$ levels across different $t$ ’s continue to share some common CDF $F_{i, k}$ and probability mass function $f_{i, k}$ on some finite support $\{0, 1, \dots, {\bar{d}}_{i, k}\}$ for some ${\bar{d}}_{i, k} \geq 1$ . Again, we treat ${\bar{d}}_{i} \equiv \sum_{k = 1}^{K} {\bar{d}}_{i, k}$ as an upper bound for product $i$ ’s period-wise demand. There remains a block-independent probability $p_{j i}$ for a customer to turn to product $i$ when her initial request for product $j$ is unfulfilled. Certainly, $\sum_{i \neq j} p_{j i} \leq 1$ for any $j = 1, 2, \dots, I$ .

Let us define independent random variables $C_{j, k}^{t} (1), C_{j, k}^{t} (2), \dots$ with each $C_{j, k}^{t} (m)$ being Bernoulli (that is, single-trial binomial) with success probability $\sum_{i \neq j} p_{j i} \leq 1$ . Also, let $E_{j, k}^{t} (1), E_{j, k}^{t} (2), \dots$ be independent single-trial multinomial random variables so that each $E_{j, k}^{t} (m)$ equals any given $i \neq j$ with probability $p_{j i} / \sum_{i^{'} \neq j} p_{j i^{'}}$ . The previously used symbol $B_{j i, k}^{t} (m)$ should now stand for $C_{j, k}^{t} (m) \cdot 1 (E_{j, k}^{t} (m) = i)$ . Each new $B_{j i, k}^{t} (m)$ indicates that in period $t$ and block $k$ , product $j$ ’s “ $m$ -th” demand shortage generates a substitution attempt that targets product $i$ . Although the $B_{j i, k}^{t} (m)$ ’s across different $i$ ’s for $I \geq 3$ are no longer independent because $\sum_{i \neq j} B_{j i, k}^{t} (m) = C_{j, k}^{t} (m)$ , they remain i.i.d. for different $m$ ’s. This independence in $m$ would help us to derive the profit submodularity in Theorem 3 $^{'}$ later. Generalizing on (3), the retailer can observe sales levels
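
The two-stage construction of $B_{j i, k}^{t} (m)$ can be mimicked in a few lines. This is a minimal sketch in which the function names and the dictionary representation of the $p_{j i}$ ’s are assumptions of the illustration: the first draw plays the role of $C_{j, k}^{t} (m)$ and the second that of $E_{j, k}^{t} (m)$ .

```python
import random
from typing import Dict, Optional


def draw_substitution_target(p_j: Dict[int, float], rng: random.Random) -> Optional[int]:
    """Return the product targeted by one unmet request for product j,
    or None if no substitution is attempted (a lost sale).

    p_j maps every i != j to p_ji; the values must sum to at most 1.
    """
    targets = list(p_j)
    total = sum(p_j.values())
    attempt = 1 if rng.random() < total else 0      # plays the role of C(m)
    if attempt == 0:
        return None
    u = rng.random() * total                        # plays the role of E(m)
    acc = 0.0
    for i in targets:
        acc += p_j[i]
        if u < acc:
            return i
    return targets[-1]                              # guard against rounding


def substitution_indicator(target: Optional[int], i: int) -> int:
    """B_ji(m) = C(m) * 1(E(m) = i), expressed through the drawn target."""
    return 1 if target == i else 0


rng = random.Random(0)
draws = [draw_substitution_target({1: 0.5, 2: 0.25}, rng) for _ in range(20000)]
```

For each draw, the indicators over all $i \neq j$ sum to one exactly when a substitution attempt takes place, which mirrors the identity $\sum_{i \neq j} B_{j i, k}^{t} (m) = C_{j, k}^{t} (m)$ .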

z_{i, k}^{t} \equiv min {o_{i, k}^{t}, d_{i, k}^{t} + \sum_{j \neq i} W (d_{j, k}^{t}, o_{j, k}^{t} | b_{j i, k}^{t} (1), \dots)} .

(26)

For the inventory dynamic, (4) still holds true.

Under unit selling prices $s_{i}$ , variable ordering costs $c_{i}$ , and unit salvage values $v_{i}$ , the retailer faces the following expected profit function of his order-up-to levels $q_{1}, \dots, q_{I}$ :

\begin{aligned} Π (q_{1}, \dots, q_{I}) \\ \equiv \sum_{i = 1}^{I} {s_{i} E [min {q_{i}, D_{i} + \sum_{j \neq i} W (D_{j}, q_{j} ∣ B_{j i} (1), \dots)}] \\ - c_{i} q_{i} + v_{i} (q_{i} - E [min {q_{i}, D_{i} \\ + \sum_{j \neq i} W (D_{j}, q_{j} ∣ B_{j i} (1), \dots)}])}, \end{aligned}

(27)

which is a multi-product version of (18).
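
A Monte Carlo evaluation of (27) can be sketched as follows. The sketch assumes that $W (d, o | b (1), \dots)$ counts the substitution attempts generated by the $(d - o)^{+}$ unmet units, in line with the interpretation of $B_{j i} (m)$ given above; the precise definition of $W$ appears earlier in the paper and is not reproduced here. The function name, the demand sampler, and all numbers are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def simulate_profit(q: Sequence[int],
                    sample_demand: Callable[[random.Random], List[int]],
                    p: Sequence[Sequence[float]],
                    s: Sequence[float], c: Sequence[float], v: Sequence[float],
                    n_paths: int = 1000, seed: int = 0) -> float:
    """Sample-average estimate of (27) under one-time substitution.

    p[j][i] is the probability that an unmet request for j targets i (p[j][j] = 0).
    """
    rng = random.Random(seed)
    n = len(q)
    total = 0.0
    for _path in range(n_paths):
        d = sample_demand(rng)
        extra = [0] * n                     # substitution demand received per product
        for j in range(n):
            for _unit in range(max(d[j] - q[j], 0)):   # one draw per unmet unit of j
                u, acc = rng.random(), 0.0
                for i in range(n):
                    if i == j:
                        continue
                    acc += p[j][i]
                    if u < acc:
                        extra[i] += 1
                        break
        for i in range(n):
            sales = min(q[i], d[i] + extra[i])
            total += s[i] * sales - c[i] * q[i] + v[i] * (q[i] - sales)
    return total / n_paths


# Deterministic toy case: product 0 is short by two units, all of which move to product 1.
toy = simulate_profit(q=[3, 4], sample_demand=lambda rng: [5, 0],
                      p=[[0.0, 1.0], [0.0, 0.0]],
                      s=[2.0, 2.0], c=[1.0, 1.0], v=[0.0, 0.0], n_paths=10)
```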

For the current case, the data set $S_{T}$ could still be given in the fashion of (5) albeit with $i = 1, 2$ adapted to $i = 1, 2, \dots, I$ . The following parts of $S_{T}$ can be exploited for our learning purposes:

when all other products $j \neq i$ for a particular $i$ satisfy $o_{j, k}^{t} \geq {\bar{d}}_{j, k}$ , we are certain that product $i$ faces the base demand only, thus allowing us to devise an estimate on the base demand CDFs $F_{i, k}$ ;

when $o_{j, k}^{t} = 0, \dots, {\bar{d}}_{j, k} - 1$ and yet $o_{l, k}^{t} \geq {\bar{d}}_{l, k}$ for $l \neq i, j$ , we can estimate the composite demand for product $i$ whose substitution-induced portion is solely traceable to the positivity of $p_{j i}$ .

We now have the following multi-product counterpart to (13) and (14):

\begin{aligned} y_{i, k}^{t} = & z_{i, k}^{t} \times 1 (o_{i, k}^{t} \geq 1 and o_{l, k}^{t} \geq {\bar{d}}_{l, k}, \forall l \neq i), \\ δ_{i, k}^{t} = & 1 (z_{i, k}^{t} \leq o_{i, k}^{t} - 1 and o_{l, k}^{t} \geq {\bar{d}}_{l, k}, \forall l \neq i); \end{aligned}

(28)

for $o = 0, \dots, {\bar{d}}_{j, k} - 1$ ,

\begin{aligned} y_{i (j), k}^{t} (o) = & z_{i, k}^{t} \times 1 (o_{i, k}^{t} \geq 1, o_{j, k}^{t} = o, and o_{l, k}^{t} \geq {\bar{d}}_{l, k}, \forall l \neq i, j), \\ δ_{i (j), k}^{t} (o) = & 1 (z_{i, k}^{t} \leq o_{i, k}^{t} - 1, o_{j, k}^{t} = o, and o_{l, k}^{t} \geq {\bar{d}}_{l, k}, \forall l \neq i, j) . \end{aligned}

(29)
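
The indicator constructions in (28) and (29) translate directly into code. The following is a minimal sketch for a single block of a single period; the function names and the list-based data layout are assumptions of the illustration.

```python
from typing import Sequence, Tuple


def base_demand_observation(z: Sequence[int], o: Sequence[int],
                            d_bar: Sequence[int], i: int) -> Tuple[int, int]:
    """(y, delta) of (28) for product i.

    z, o, d_bar hold per-product sales, starting inventories, and demand
    upper bounds for the block under consideration.
    """
    others_ample = all(o[l] >= d_bar[l] for l in range(len(o)) if l != i)
    y = z[i] if (o[i] >= 1 and others_ample) else 0
    delta = 1 if (z[i] <= o[i] - 1 and others_ample) else 0
    return y, delta


def composite_demand_observation(z: Sequence[int], o: Sequence[int],
                                 d_bar: Sequence[int], i: int, j: int,
                                 level: int) -> Tuple[int, int]:
    """(y, delta) of (29): product j starts at inventory `level`,
    while every product l other than i and j is amply stocked."""
    rest_ample = all(o[l] >= d_bar[l] for l in range(len(o)) if l not in (i, j))
    qualifies = (o[j] == level) and rest_ample
    y = z[i] if (o[i] >= 1 and qualifies) else 0
    delta = 1 if (z[i] <= o[i] - 1 and qualifies) else 0
    return y, delta


# Three products with illustrative numbers only.
base_obs = base_demand_observation(z=[2, 1, 0], o=[4, 3, 5], d_bar=[3, 3, 4], i=0)
comp_obs = composite_demand_observation(z=[4, 1, 0], o=[4, 1, 5], d_bar=[3, 3, 4],
                                        i=0, j=1, level=1)
```

In the second call, product 0 sells out, so the observation is recorded with $δ = 0$ , that is, as a censored one.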

These variables form posets that are fed into KM-type estimations for the CCDF’s of both base and composite demands in a fashion similar to the two-product case.
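
For readers unfamiliar with KM-type estimation, a textbook-style product-limit computation on such $(y, δ)$ pairs is sketched below. It is not the paper's own estimator, which is defined in an earlier section and may differ in details. The sketch assumes the convention implied by (28): an observation with $δ = 0$ and value $y$ only reveals that demand is at least $y$ , so it is excluded from the risk set at level $y$ itself.

```python
from typing import List, Sequence


def km_ccdf(y: Sequence[int], delta: Sequence[int], d_max: int) -> List[float]:
    """Return product-limit estimates of [P(D > 0), ..., P(D > d_max)].

    delta[t] = 1 means demand was fully observed and equals y[t];
    delta[t] = 0 means a stockout, so demand is only known to be >= y[t].
    """
    ccdf: List[float] = []
    survival = 1.0
    for x in range(d_max + 1):
        events = sum(1 for yt, dt in zip(y, delta) if yt == x and dt == 1)
        above = sum(1 for yt in y if yt > x)
        at_risk = events + above            # stockouts at level x carry no information on D = x
        if at_risk > 0:
            survival *= 1.0 - events / at_risk
        ccdf.append(survival)
    return ccdf


# Illustrative data: the second observation is a stockout at inventory level 2.
estimate = km_ccdf(y=[1, 2, 2, 3], delta=[1, 0, 1, 1], d_max=3)
```

Under this convention, non-qualifying observations, which (28) records as $y = 0$ and $δ = 0$ , leave the estimate unchanged.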

The CCDF’s are then used to compute expectations for the estimations of substitution probabilities in a fashion similar to our earlier approach from (9) to (12). The only differences lie in the definitions of involved coefficients. Corresponding to (9) and (10), we now have

\begin{aligned} ψ_{i (j), k} (o) & \equiv \frac{1}{T} \sum_{t = 1}^{T} 1 (o_{i, k}^{t} \geq 1, o_{j, k}^{t} = o, \\ and o_{l, k}^{t} \geq {\bar{d}}_{l, k}, \forall l \neq i, j), \\ ς_{i, k} \equiv & \frac{1}{T} \sum_{t = 1}^{T} 1 (o_{i, k}^{t} \geq 1, and o_{l, k}^{t} \geq {\bar{d}}_{l, k}, \forall l \neq i) . \end{aligned}

(30)

These coefficients are then used again in (11)’s definition of $ω_{j i, k}$ and (12)’s of ${\hat{p}}_{j i}$ . The complexity of this learning procedure can be derived similarly to that for the earlier two-product special case of Section 4.3. It is $O (I {max}_{i = 1}^{I} {\bar{d}}_{i} (I T + {max}_{j = 1}^{I} {\bar{d}}_{j}))$ for the current multi-product setting.

For the profit maximization problem itself, definitions similar to those in (19) and (21) including $Π^{*}$ , the $q_{i}^{*}$ ’s, ${\hat{Π}}^{*} (S_{T})$ , and the ${\hat{q}}_{i}^{*} (S_{T})$ ’s, can easily be extended to the current multi-product case. Our analysis relies on multi-product versions of Assumptions 1 to 3. The new Assumptions 1 $^{'}$ and 2 $^{'}$ are straightforward generalizations of their respective non-primed versions. Although still a generalization of its former self, the new version of Assumption 3 would look slightly different.

Assumption 3 $^{'}$ . For some strictly positive constants ${\underline{ϑ}}_{1}$ and ${\underline{ϑ}}_{2}$ , the order-up-to level $O_{i, 1}^{t}$ of any product $i$ at the beginning of any period $t$ satisfies

\begin{aligned} P (O_{i, 1}^{t} \geq max_{j \neq i} {max}_{k = 1}^{K} {{\bar{d}}_{i, k} + {\bar{d}}_{j, k}} | F_{t - 1}) \geq {\underline{ϑ}}_{1}, \\ P (O_{i, 1}^{t} \leq \sum_{k = 1}^{K - 1} {\bar{d}}_{i, k} | F_{t - 1}) \geq {\underline{ϑ}}_{2}, \end{aligned}

where the inequalities above should be understood in a sample-wise fashion.

The reason for “ $max_{j \neq i}$ ” to appear in Assumption 3 $^{'}$ is that product $i$ has to occasionally have ample inventory to satisfy any one other product $j$ ’s substitution requests. Again, the current requirement concerning the ${max}_{k = 1}^{K} {{\bar{d}}_{i, k} + {\bar{d}}_{j, k}}$ ’s is weaker than that concerning the $\sum_{k = 1}^{K} {{\bar{d}}_{i, k} + {\bar{d}}_{j, k}}$ ’s. Our seven-product tests to be presented in Section 7 would partially testify to the validity of Assumptions 2 $^{'}$ and 3 $^{'}$ .

Under Assumptions 1 $^{'}$ to 3 $^{'}$ , all of the theoretical results for the two-product case can be extended to the multi-product problem under one-time substitution. For the learning part, all convergences of demand distributions, substitution probabilities, and optimal profits remain at the same respective rates over the horizon length $T$ of data. Some coefficients’ exponential growths with respect to the product number $I$ are unavoidable: every product has to face the surplus demand from the other $I - 1$ products. For the sake of brevity, we state these results, without presenting their proofs, in Appendix C of the E-Companion.

For the static optimization problem under one-time substitution, we also have the multi-product version of Theorem 3 regarding the submodularity of the objective function.

Theorem 3 $^{'}$ (Submodularity for Multi-Product Case). The expected profit function $Π (\cdot, \dots, \cdot)$ defined at (27) is submodular on the lattice $[0, \sum_{i = 1}^{I} {\bar{d}}_{i}]^{I}$ .

However, the derivation for the multi-product case is considerably more involved than that for its two-product counterpart. We present the details in Appendix E of the E-Companion. To tackle the resulting combinatorial replenishment problem, we adopt a well-established double-greedy heuristic for maximizing submodular functions; see, for example, Bian et al. (2017) and Zhang et al. (2021). In Appendix E of the E-Companion, we describe this heuristic, which iteratively constructs two complementary solutions and merges them through probabilistic updates to achieve a guaranteed constant-factor approximation. We also conduct numerical validations of this heuristic to examine its approximation quality and sensitivity to estimation errors.

6.2. The Markov-Chain Substitution Model

We also consider a Markov-chain substitution model, where customers may attempt multiple substitutions until they either find substitutes or give up of their own volition. Our algorithm would remain effective while achieving a similar regret rate, albeit with a less explicit coefficient. Markov-chain choice models as studied by Blanchet et al. (2016) and Şimşek and Topaloglu (2018) were initially not intended for our case with a paucity of data. Nevertheless, we can incorporate into our model the idea of demand shifting according to a homogeneous Markov chain structure.

As in the one-time substitution model, each product at each block has a base demand, and the substitution probabilities are block-independent. If product $i$ is available, a customer will purchase it and then leave the system. If it is out of stock, the customer either switches to another product $j \neq i$ with probability $p_{i j}$ or, with probability $1 - \sum_{j \neq i} p_{i j}$ , leaves the system without making any purchase. Upon switching to a product $j$ , her request is treated as a base demand for that product. If $j$ is also out of stock, the customer will switch again according to the transition probabilities $p_{j k}$ for $k \neq j$ . This process will go on until the customer either successfully purchases a product or exits the system without purchasing. For this to work, we would naturally require that $\sum_{j \neq i} p_{i j} < 1$ for every $i$ . This is compatible with our European-chain example as shown in Figure 1. It is also commonly adopted for Markov-chain choice models; see, for example, Blanchet et al. (2016).
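
The customer journey just described can be simulated directly. The sketch below is an illustration under assumed names and an assumed matrix layout ( `p[i][j]` for $p_{i j}$ , with zero diagonal); it relies on the requirement $\sum_{j \neq i} p_{i j} < 1$ to terminate with probability one.

```python
import random
from typing import Optional, Sequence, Set


def customer_outcome(first_choice: int, available: Set[int],
                     p: Sequence[Sequence[float]],
                     rng: random.Random) -> Optional[int]:
    """Follow one customer until she buys an available product (returned)
    or leaves without purchasing (None)."""
    current = first_choice
    while current not in available:
        u, acc, nxt = rng.random(), 0.0, None
        for j, prob in enumerate(p[current]):
            if j == current:
                continue
            acc += prob
            if u < acc:
                nxt = j
                break
        if nxt is None:          # falls in the remaining mass 1 - sum_j p[current][j]
            return None
        current = nxt
    return current


# Illustrative three-product instance: only product 2 is in stock.
p_toy = [[0.0, 0.5, 0.3],
         [0.2, 0.0, 0.4],
         [0.0, 0.0, 0.0]]
rng = random.Random(1)
outcomes = [customer_outcome(0, {2}, p_toy, rng) for _ in range(20000)]
share_buying_2 = outcomes.count(2) / len(outcomes)
```

For this instance, solving the two first-step equations by hand gives a purchase probability of $5 / 9$ for a customer who initially wants product 0, which the simulated share should approximate.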

For the multi-product algorithm, recall that (28) estimates base-demand distributions using data in which the starting inventories of all products exceed their respective upper bounds. To brace for the impacts of Markov-chain substitution, we need to modify (29) to

\begin{aligned} y_{i (j), k}^{t} (o) = z_{i, k}^{t} \times 1 (o_{j, k}^{t} = o, o_{i, k}^{t} \geq 1, and o_{l, k}^{t} \geq {\bar{d}}_{l, k} + {\bar{d}}_{j, k} - o, \\ \forall l \neq i, j), \\ δ_{i (j), k}^{t} (o) = 1 (z_{i, k}^{t} \leq o_{i, k}^{t} - 1, o_{j, k}^{t} = o, and o_{l, k}^{t} \geq {\bar{d}}_{l, k} + {\bar{d}}_{j, k} - o, \\ \forall l \neq i, j), \end{aligned}

for $o = 0, \dots, {\bar{d}}_{j, k} - 1$ . This would guarantee that the observed demand for product $i$ has covered both the base demand and the one-time substitution demand from product $j$ , as all other products maintain sufficient inventories to cover their own base demand and any substitution demand from product $j$ . Concerning the estimation for substitution probabilities, we need to modify $ψ_{i (j), k} (o)$ in (30) to

\frac{1}{T} \sum_{t = 1}^{T} 1 (o_{i, k}^{t} \geq 1, o_{j, k}^{t} = o, o_{l, k}^{t} \geq {\bar{d}}_{l, k} + {\bar{d}}_{j, k} - o, \forall l \neq i, j) .

Unfortunately, the profit function $Π (q_{1}, \dots, q_{I})$ would depend on underlying parameters made up of the base demand distributions and substitution probabilities in a more complex way under Markov-chain substitution than that revealed by (27) under one-time substitution. Indeed, we doubt the possibility of ever being able to express the profit as an explicit function of the parameters in the new scenario. Such dependencies may only be evaluated through simulations. Thus, the multi-product extension of Theorem 2 remains elusive so far. Fortunately, alternative techniques can help us work out a bound with the same $T$ -trend.

Let $I$ and $J$ be the sets of all products and available products, respectively. We find it essential to work out the $ϱ_{i k} (J)$ ’s, each standing for the eventual probability that a customer originally intending to purchase product $i$ ends up purchasing one available product $k \in J \subseteq I$ after potentially multiple rounds of substitution attempts. According to Blanchet et al. (2016) and Şimşek and Topaloglu (2018), these probabilities satisfy the linear system $ϱ_{i k} (J) = p_{i k} + \sum_{l \in I ∖ J} ϱ_{i l} (J) \cdot p_{l k}$ . Let $ϱ_{K_{1}, K_{2}} (J) = (ϱ_{i k} (J))_{i \in K_{1}, k \in K_{2}}$ for any $K_{1}, K_{2} \subseteq I$ . Note that we have assumed $\sum_{k \in I ∖ {i}} p_{i k} < 1$ . By Şimşek and Topaloglu (2018), there would be a unique $ϱ_{i, I ∖ J} (J)$ that solves

ϱ_{i, I ∖ J} (J) = p_{i, I ∖ J} (I - p_{I ∖ J, I ∖ J})^{- 1},

(31)

where the $I$ in $(I - p_{I ∖ J, I ∖ J})$ denotes the identity matrix of an appropriate dimension rather than the product set. Leveraging (31), we can arrive at

ϱ_{i, J} (J) = p_{i, J} + p_{i, I ∖ J} (I - p_{I ∖ J, I ∖ J})^{- 1} p_{I ∖ J, J} .

(32)
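
Formulas (31) and (32) can be evaluated without an explicit matrix inverse by iterating the linear system until it settles. The sketch below is an illustration only; the function name, the list-of-lists layout `p[i][k]` with zero diagonal, and the stopping rule are assumptions. Convergence of the iteration relies on every row of $p$ summing to strictly less than one.

```python
from typing import Dict, Sequence, Set


def eventual_purchase_probabilities(i: int, available: Set[int],
                                    p: Sequence[Sequence[float]],
                                    tol: float = 1e-12,
                                    max_iter: int = 10000) -> Dict[int, float]:
    """Return {k: rho_ik(J)} for k in J = `available`, given an unavailable first choice i.

    Step 1 iterates rho_{i, out} = p_{i, out} + rho_{i, out} p_{out, out},
    whose fixed point is (31); step 2 applies (32).
    """
    n = len(p)
    out = [l for l in range(n) if l not in available]
    visits = {l: p[i][l] for l in out}
    for _ in range(max_iter):
        new = {l: p[i][l] + sum(visits[m] * p[m][l] for m in out) for l in out}
        change = max((abs(new[l] - visits[l]) for l in out), default=0.0)
        visits = new
        if change < tol:
            break
    return {k: p[i][k] + sum(visits[m] * p[m][k] for m in out)
            for k in sorted(available)}


# Illustrative three-product instance: products 0 and 1 are out of stock.
p_toy = [[0.0, 0.5, 0.3],
         [0.2, 0.0, 0.4],
         [0.0, 0.0, 0.0]]
rho = eventual_purchase_probabilities(0, {2}, p_toy)
```

For this instance, the first-step equations $a = 0.3 + 0.5 b$ and $b = 0.4 + 0.2 a$ give $a = 5 / 9$ , which is the value the routine should return for product 2.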

It is by further exploiting (32) that we have come to the following regret bound; see details in Appendix C of the E-Companion.

Theorem 4 (Regret Bound Under Markov-Chain Substitution)

Under the Markov-chain substitution model, there exists a positive problem-related constant ${\bar{A}}^{them 4}$ such that

Π^{*} - E [Π ({\hat{q}}_{1}^{*} (S_{T}), \dots, {\hat{q}}_{I}^{*} (S_{T}))] \leq {\bar{A}}^{them 4} \sqrt{\frac{\log T}{T}} .

Though not presented due to space limitations, we have multi-product counterparts to Proposition 1, Theorem 1, Proposition 2, Corollary 1, and Theorem 2. The multi-product version of the coefficient ${\bar{A}}^{them 2}$ can be expressed explicitly, with its exponential dependencies on $I$ and $K$ clearly demonstrated. This is unfortunately not the case with the current coefficient ${\bar{A}}^{them 4}$ , even though Theorem 4 has the same $T$ -rate as the multi-product version of Theorem 2. For the current Markov-chain substitution, $Π$ ’s dependencies on the underlying demand-related parameters are much more involved than those expressible by (18) or its multi-product counterpart (27). In our derivation, intricacies about Markov-chain substitution such as those reflected in (32) have to be dealt with. When attempting to optimize $Π (\cdot)$ , we can use Monte Carlo simulation to circumvent the lack of a closed form. A greedy method iteratively visiting the various coordinates standing for products is found to be competitive against exhaustive search. See details in Appendix E of the E-Companion.
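
A coordinate-wise greedy search of the kind mentioned above might look as follows. This is a generic sketch and not the procedure of Appendix E: the function name, the stopping rule, and the toy objective are assumptions, and no global-optimality guarantee is claimed. With a simulated objective, one would typically fix the random seed inside `objective` so that candidate order levels are compared on the same sample paths.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def coordinate_greedy(objective: Callable[[List[int]], float],
                      upper: Sequence[int],
                      start: Optional[Sequence[int]] = None,
                      max_sweeps: int = 100) -> Tuple[List[int], float]:
    """Visit products one at a time, moving each order level to its best value
    while the others stay fixed; stop when a full sweep brings no improvement."""
    q = list(start) if start is not None else [0] * len(upper)
    best = objective(q)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(upper)):
            for candidate in range(upper[i] + 1):
                trial = q[:i] + [candidate] + q[i + 1:]
                value = objective(trial)
                if value > best + 1e-12:
                    q, best, improved = trial, value, True
        if not improved:
            break
    return q, best


# Toy separable objective (an assumption for illustration), maximized at (3, 2).
q_found, value_found = coordinate_greedy(
    lambda q: -float((q[0] - 3) ** 2 + (q[1] - 2) ** 2), upper=[5, 5])
```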

7. Numerical Studies

Our numerical studies use both synthetic and real data. Our real data come from a central bakery serving multiple stores of a European grocery chain. These data also help to partially verify our assumptions.

7.1. Synthetic-Data Experiments

We conduct numerical studies to evaluate the performance of our KM-based learning algorithm under different numbers of products. We also investigate settings in which the true demand upper bounds are unknown. In Appendix D of the E-Companion, we compare our algorithm with the online method of Chen and Chao (2020) and examine the robustness of our approach. In these synthetic-data experiments, block-wise demands within each sixteen-block period follow bounded Poisson distributions. Details are provided in Appendix E of the E-Companion.

7.1.1. Experiments With Different Product Numbers

Here, we evaluate the performance of our algorithm in both two-product and twenty-product settings. Being measured are two relative errors corresponding to Corollary 1 and Theorem 2, respectively:

\begin{aligned} In-Sample Gap: | Π^{*} - {\hat{Π}}^{*} (S_{T}) | / Π^{*}, \\ Regret: (Π^{*} - Π ({\hat{q}}_{1}^{*} (S_{T}), {\hat{q}}_{2}^{*} (S_{T}))) / Π^{*} . \end{aligned}

Note that the first measures the profit-wise difference between the model perceived from the data and the ground truth, whereas the second measures the quality of the decisions reached for the perceived model when they are applied back to the actual one. The test results for the various cases are presented in Table 1. The sub-cases a, b, c, and d refer to different ways in which substitution probabilities are assumed.

Table 1.
Synthetic numerical example for the in-sample gap and regret.

Exp.	Number of products		$T = 500$	$T = 1000$	$T = 2000$	$T = 5000$	$T = 10, 000$
1.a	2	In-sample gap	1.42%	1.07%	0.52%	0.21%	0.03‱
		Regret	1.23%	0.99%	0.61%	0.02%	0.00%
1.b	2	In-sample gap	1.46%	1.03%	0.34%	0.07%	0.01‱
		Regret	1.19%	0.53%	0.09%	0.02%	0.00%
1.c	20	In-sample gap	12.11%	9.73%	5.89%	2.93%	1.13%
		Regret	11.78%	9.82%	6.23%	3.27%	1.24%
1.d	20	In-sample gap	13.73%	9.32%	6.93%	2.96%	1.06%
		Regret	13.14%	10.77%	6.39%	3.91%	1.43%

From Table 1, we observe that the relative errors decrease steadily across all settings. In the two-product case, our estimates converge rapidly to the ground truth, with the in-sample gap and regret both falling below 1.5% by $T = 500$ . This confirms the convergence behavior predicted by Corollary 1 and Theorem 2. These fast convergences are achieved with $K = 16$ blocks, suggesting that the theoretical exponential-in-K worst case is unlikely to pose practical problems.

When $I = 20$ , it would take $T = 2000$ for the relative regret to get below $7 %$ and $T = 10, 000$ for it to drop to around $1 %$ . These convergences corroborate Corollary 1 $^{'}$ and Theorem 2 $^{'}$ . On the other hand, $I$ indeed slows down these processes more severely than $K$ does. As we have shown in Figure 1, in real practice, the products can often be grouped into small clusters. Since learning procedures can be done one cluster at a time, a large product number in the beginning might not be as serious a problem.

Moreover, the empirical sensitivities of regret with respect to $K$ and $I$ are far milder than what the theoretical bounds could conservatively guarantee. Although the theoretical constant ${\bar{A}}^{them 2^{'}}$ can exceed $10^{56}$ when $K = 16$ and $I = 20$ , our numerical results indicate that an effective constant below $3 \times 10^{3}$ suffices in practice, implying that the theoretical worst-case bound might have much room for improvement.

We also examine the computational scalability of our methods. The learning procedure for the multi-product case has a polynomial complexity of $O (I max_{i} {\bar{d}}_{i} (I T + max_{i} {\bar{d}}_{i}))$ as shown in Section 6.1. In our experiments, the actual computation times scale approximately linearly with $T$ : for the two-product case, from 0.3 seconds ( $T = 500$ ) to 6 seconds ( $T = 10, 000$ ), and for the twenty-product case, from 40 to about 830 seconds. These results demonstrate that even in large-scale multi-product settings, the proposed estimation and optimization procedures remain computationally efficient and readily implementable in practice.

7.1.2. Experiments With Unknown Demand Upper Bounds

In this experiment, we evaluate the robustness of our KM-based learning algorithm when the true demand upper bounds are not directly observable. Such cases frequently arise in practice because retailers can only observe realized sales, which are often censored by inventory availability. To approximate these unobservable upper bounds, we use different quantiles of the observed sales as proxies for demand upper bounds and then examine how the choices of quantiles affect algorithmic performances.

For each product and block, we take the 50%, 60%, 70%, 80%, 90%, 95%, and 99% quantiles of observed sales as candidate bounds. The algorithm is then executed using these quantiles. The resulting regret levels across different period lengths $T$ are reported in Table 2.

Table 2.
Regrets when using quantiles of sales as demand upper bounds.

Applied quantiles of sales	$T = 500$	$T = 1000$	$T = 2000$	$T = 5000$	$T = 10, 000$
50%	40.41%	38.69%	37.13%	38.07%	37.25%
60%	38.07%	38.29%	38.07%	36.96%	38.12%
70%	10.36%	9.87%	9.78%	8.97%	8.96%
80%	10.13%	10.13%	10.03%	9.87%	9.87%
90%	6.45%	4.30%	2.33%	2.30%	1.75%
95%	5.28%	3.55%	2.05%	1.09%	0.72%
99%	5.59%	3.67%	1.96%	0.77%	0.61%
True upper bounds	5.17%	3.34%	1.99%	0.87%	0.63%

The results in Table 2 indicate that the 95%-quantile of sales provides a strong balance between robustness and accuracy. Lower quantiles (e.g., 60%–80%) tend to underestimate the true upper bounds, leading to premature censoring and thus higher regrets. Conversely, using overly high quantiles, such as 99%, yields minimal additional improvements at the expense of unnecessary computational burdens. The 95%-quantile consistently delivers low regrets, around 1% for $T = 10, 000$ , and its performance is nearly identical to that obtained by imposing true demand upper bounds.
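
The quantile proxy used in this experiment can be computed with a few lines. The sketch below assumes one particular empirical-quantile convention (the smallest observed value whose empirical CDF reaches the requested level); the convention actually used in our experiments is not restated here, and the function name and sample values are hypothetical.

```python
import math
from typing import Sequence


def sales_quantile_bound(sales: Sequence[int], level: float) -> int:
    """Smallest observed sales value whose empirical CDF is at least `level`,
    to be used as a stand-in for an unknown demand upper bound."""
    ordered = sorted(sales)
    # The small slack protects against products such as 0.95 * n landing just above an integer.
    rank = max(math.ceil(level * len(ordered) - 1e-9), 1)
    return ordered[rank - 1]


# Illustrative block-level sales for one product.
observed_sales = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
bound_at_75 = sales_quantile_bound(observed_sales, 0.75)
bound_at_99 = sales_quantile_bound(observed_sales, 0.99)
```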

7.2. Real Data Experiments

This numerical study is based on a European grocer with hundreds of stores, ranging from small convenience stores in city centers to superstores at suburban locations. We focus on the product category of pre-baked fresh bread, as it dominates store-level waste costs. While representing 40% of baked goods revenue for our selected stores, it represents over 90% of baked goods spoilage, measured in euros.

Each store receives deliveries of pre-baked fresh goods from a local bakery, which does one milk run to multiple stores every day before the stores open. These deliveries are contractually agreed, and do not affect the order quantity decision. On any given day each grocer may request up to 30 fresh pre-baked SKUs, which can be segmented by price (premium or economy), and independently by grain (plain white vs. wholegrain), leading to potential substitutions within each segment.

Store managers at the grocer place orders toward the end of each business day, while production at the bakery takes place overnight and all deliveries occur before the stores open. The stock at a grocer is then sold over a business day ( $K = 16$ hours). As any leftover stock is written off, there is no salvage value.

We carry out partial validations of Assumptions 2 and 3; indeed, our seven-product data set would enable those of their multi-product counterparts, Assumptions 2 $^{'}$ and 3 $^{'}$ . After learning relevant proportions from our 343-day-long data, we find it safe to use any value below 15% as the ${\underline{α}}_{i, k}$ ’s in Assumptions 2 and 2 $^{'}$ .

Assumptions 3 and 3 $^{'}$ are, on the other hand, more difficult to check empirically because they impose conditions on almost every sample path. Nevertheless, we have managed to validate them partially. The entire data set can be kept whole or divided into two, four, or eight sections. We can then estimate the two parameters ${\underline{ϑ}}_{1}$ and ${\underline{ϑ}}_{2}$ for every section. Our ultimate estimates for the various division schemes would come from taking respective minimums of these section-wise estimates.
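
The section-wise procedure can be sketched as follows. The sketch uses plain frequencies within contiguous sections as a proxy for the conditional probabilities in Assumptions 3 and 3 $^{'}$ ; the function name, the way sections are cut, and the flag sequence are assumptions of the illustration.

```python
from typing import List, Sequence


def sectionwise_min_frequency(flags: Sequence[int], n_sections: int) -> float:
    """Split a chronological 0/1 sequence into `n_sections` contiguous sections
    and return the smallest section-wise frequency of ones.

    flags[t] = 1 if the period-t opening inventory meets the threshold of interest.
    """
    n = len(flags)
    cuts = [round(k * n / n_sections) for k in range(n_sections + 1)]
    frequencies: List[float] = []
    for a, b in zip(cuts[:-1], cuts[1:]):
        if b > a:                                   # skip empty sections
            frequencies.append(sum(flags[a:b]) / (b - a))
    return min(frequencies)


# Illustrative eight-period flag sequence.
toy_flags = [1, 1, 0, 0, 1, 0, 1, 1]
estimates = {m: sectionwise_min_frequency(toy_flags, m) for m in (1, 2, 4)}
```

Finer divisions can only lower or preserve the minimum when sections are nested, which is why the estimates tend to decrease as the number of sections grows.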

Our estimates for ${\underline{ϑ}}_{1}$ under the four division schemes are, respectively, $8.3 %$ , $4.7 %$ , $2.1 %$ , and $1.6 %$ . Therefore, it seems reasonable to assume a ${\underline{ϑ}}_{1}$ that is above $1 %$ for the current case. Our estimates for ${\underline{ϑ}}_{2}$ under the four division schemes are, respectively, $100 %$ , $100 %$ , $100 %$ , and $98.6 %$ . Therefore, it seems reasonable to assume a ${\underline{ϑ}}_{2}$ that is above $90 %$ .

We then compare our KM-based learning algorithm against the main benchmarks. There are two popular methods for validating a policy against a data set: In-Sample Performance and Out-of-Sample Performance.

For In-Sample Performance, we use an entire data set $S_{T}$ to make demand-distribution and substitution-probability estimates. After reaching the optimal order-up-to levels ${\hat{q}}^{*} (S_{T})$ using the estimated parameters, we obtain the expected profit ${\hat{Π}}^{*} (S_{T}) \equiv \hat{Π} ({\hat{q}}^{*} (S_{T}) ∣ S_{T})$ , where the profit function $\hat{Π} (\cdot | S_{T})$ is data-based as well. The defining feature here is that the data used for training is also used for testing.

For Out-of-Sample Performance, we split the data set $S_{T}$ into some $T^{'}$ -long $S_{Train}$ for training and the remaining $(T - T^{'})$ -long $S_{Test}$ for testing. The days need not be consecutive. We use the training set to obtain problem parameters and their corresponding product-wise order-up-to levels ${\hat{q}}_{i}^{*} (S_{Train})$ . However, the latter are not tested on the former. Instead, we use the testing set to generate a new cohort of problem parameters, which are further used to calculate the testing-generated profit function $\hat{Π} (\cdot ∣ S_{Test})$ . It is to this testing-generated problem that we apply the training-generated decisions in order to achieve a profit $\hat{Π} ({\hat{q}}^{*} (S_{Train}) ∣ S_{Test})$ . This approach also tests for the presence of over-fitting.

7.2.1. Methods Considered

We compare the following learning methods to be applied on real data sets.

In sample average approximation (SAA) without substitution, we use sample averages of daily demand data while pretending there were no substitutions in order to construct empirical demand distributions. For optimization, we treat the multi-product problem as one where replenishment quantities for different products are set separately.

In the KM-based learning approach without substitution, we ignore substitution effects by forcing $p_{j i} = 0$ and estimate the demand distributions with hourly sales data.

In the KM-based learning approach with substitution effects, we take substitutions into account in both our learning and optimization.

7.2.2. Experimental Results

We carry out experiments for two-, three-, and seven-product scenarios using real data. For the two-product case, we run experiments for two stores. For Store #1, we have $T = 371$ days’ worth of sales data for two substitutable products with the sales prices $s_{1} = 1.68$ and $s_{2} = 1.83$ , as well as unit costs $c_{1} = 1.09$ and $c_{2} = 1.32$ . For Store #2, we have $T = 394$ days’ worth of sales data for two products with sales prices $s_{1} = 1.50$ and $s_{2} = 1.06$ , as well as unit costs $c_{1} = 1.83$ and $c_{2} = 1.32$ . For both stores, we use all three learning methods to produce their in-sample and out-of-sample performances. For the latter, we use $T^{'} = 248$ for training and $T - T^{'} = 123$ for testing in Store #1; also, we use $T^{'} = 263$ for training and $T - T^{'} = 131$ for testing in Store #2. We report the experiment results of three- and seven-product scenarios in Appendix D.3 of the E-Companion.

The resulting profits for the two-product test are listed in Table 3. Besides three columns corresponding to the three methods, we reserve one column for each store’s performance resulting from its actual practice.

Table 3.
Profits under different policies for the two-product case.

Store	Performance	Actual practice	SAA without substitution	KM without substitution	KM with substitution
1	In-sample	11.95	14.81	14.97	15.22
1	Out-of-sample	14.64	15.12	15.67	16.28
2	In-sample	6.51	8.47	8.69	8.89
2	Out-of-sample	7.52	8.27	8.34	8.88

SAA = sample average approximation; KM = Kaplan–Meier.

From Table 3, we see that in terms of in-sample performance, our method (KM with substitution) clearly outperforms the actual practice for both stores (15.22 versus 11.95 for Store #1 and 8.89 versus 6.51 for Store #2) and also improves on SAA without substitution (14.81 and 8.47, respectively). We also note that KM without substitution, although competitive, is inferior to our method, which takes the substitution effects into account. On the other hand, we caution that due to de-censoring, the KM estimator describes overall higher demand than the SAA, and adding substitution would mechanically boost the revenue.
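To make the comparison concrete, the relative gains of KM with substitution over each alternative can be computed directly from the entries of Table 3. The short script below is only a convenience sketch: the dictionary keys and the helper `relative_gain` are hypothetical names, and the numbers are copied from the table.

```python
# Profits copied from Table 3; keys are (store, sample type).
TABLE_3 = {
    (1, "in-sample"): {"actual": 11.95, "saa": 14.81, "km_no_sub": 14.97, "km_sub": 15.22},
    (1, "out-of-sample"): {"actual": 14.64, "saa": 15.12, "km_no_sub": 15.67, "km_sub": 16.28},
    (2, "in-sample"): {"actual": 6.51, "saa": 8.47, "km_no_sub": 8.69, "km_sub": 8.89},
    (2, "out-of-sample"): {"actual": 7.52, "saa": 8.27, "km_no_sub": 8.34, "km_sub": 8.88},
}


def relative_gain(new: float, base: float) -> float:
    """Percentage improvement of `new` over `base`."""
    return 100.0 * (new - base) / base


# Gain of KM with substitution over every other column, row by row.
gains = {
    key: {
        name: relative_gain(row["km_sub"], value)
        for name, value in row.items()
        if name != "km_sub"
    }
    for key, row in TABLE_3.items()
}

for (store, sample), row_gains in gains.items():
    summary = ", ".join(f"{name}: {gain:.1f}%" for name, gain in row_gains.items())
    print(f"Store #{store} {sample}: {summary}")
```

For Store #1 in-sample, for instance, the gain is roughly 27% over the actual practice but only about 3% over SAA without substitution, so most of the improvement over the actual practice is already captured by any of the learning methods.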

For Store #1, the out-of-sample profit is slightly higher than the in-sample one. While the latter is a product of the entire data set, the out-of-sample profit is obtained by optimizing with parameters estimated from the first two-thirds of the data and evaluating the resulting fixed decision under parameters re-estimated from the remaining one-third. In this data set, the realized demand levels during the final one-third happen to be higher than those in earlier periods, making this segment more favorable to fixed inventory decisions. A similar pattern is observed for all benchmark approaches. Thus, the higher out-of-sample performance is largely attributable to data characteristics.

8. Concluding Remarks

We have studied demand learning and replenishment optimization for a multi-product inventory management problem involving both demand censoring and stochastic substitutions. Our efforts have led to a data-driven approach for offline demand learning, the convergence of estimates and attainment of tight regret bounds, an optimization algorithm for the two-product case, and the submodularity of the multi-product case’s objective function. These accomplishments were inspired by a practical retailing problem, to which the literature has not yet provided a satisfactory solution.

There remain promising directions for future research. An efficient solution procedure for the multi-product profit-optimization problem remains elusive. It would also be worthwhile to extend our KM-based estimation method, which relies only on weak censoring indicators, to other inventory problems with complex structures. We have not treated time-varying demand structures. Learning of time-varying parameters has always been a challenging issue; see, for example, Mao et al. (2021) and Chen (2021). New techniques will be required to simultaneously deal with demand censoring and stochastic substitutions.

The block-independence feature of substitution probabilities is another property that our learning algorithm has exploited. In practice, each $p_{j i}$ may be interpreted as a long-run average that smooths over short-term fluctuations. It may also be a macroscopic manifestation of more nuanced situations where customers of different categories subject their individual substitution decisions to the influences of the various products’ attributes and contextual factors such as promotions and seasonality. We have left such microscopic details unexplored. Future research may take a closer look at these remaining issues.

Supplemental Material

Supplemental material, sj-pdf-1-pao-10.1177_10591478261424592 for Offline Learning and Optimization for Multi-Product Inventory Management With Stockout-Based Substitution by Yijie Zheng, Youhua (Frank) Chen, Carl Philip T Hedenstierna and Jian Yang in Production and Operations Management

Footnotes

Acknowledgments

The authors thank the Departmental Editor, the Senior Editor, and three anonymous reviewers for valuable comments and suggestions throughout the whole review process.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the Hong Kong Research Grants Council (RGC) General Research Fund [Grant CityU 11502122 to Chen YF and Grant PolyU 15511925 to Zheng Y].

How to cite this article

Zheng Y, Chen YF, Hedenstierna CPT and Yang J (2026) Offline Learning and Optimization for Multi-Product Inventory Management With Stockout-Based Substitution. Production and Operations Management XX(XX) 1–20.

References

1. Abdallah and Vulcano (2021) Demand estimation under the multinomial logit model from sales transaction data. Manufacturing & Service Operations Management 23(5): 1196–1216.

2. Anupindi, Dada and Gupta (1998) Estimation of consumer demand with stock-out based substitution: An application to vending machine products. Marketing Science 17(4): 406–423.

3. Besbes and Muharremoglu (2013) On implications of demand censoring in the newsvendor problem. Management Science 59(6): 1407–1424.

4. Bian, Mirzasoleiman, Buhmann, et al. (2017) Guaranteed non-convex optimization: Submodular maximization over continuous domains. In: Artificial Intelligence and Statistics, pp.111–120. PMLR.

5. Blanchet, Gallego and Goyal (2016) A Markov chain approximation to choice modeling. Operations Research 64(4): 886–905.

6. Broder and Rusmevichientong (2012) Dynamic pricing under a general parametric choice model. Operations Research 60(4): 965–980.

7. Buzek (2018) Out of stocks, out of luck: How retailers alienate customers and lose billions due to poor inventory practices. https://www.radial.com/files/2022/06/out-of-stock-solutions.pdf.

8. Chen (2021) Data-driven inventory control with shifting demand. Production and Operations Management 30(5): 1365–1385.

9. Chen and Chao (2020) Dynamic inventory control with stockout substitution and demand learning. Management Science 66(11): 5108–5127.

10. Chen (2021) Discrete convex analysis and its applications in operations: A survey. Production and Operations Management 30(6): 1904–1926.

11. Chen, Owen, Pixton, et al. (2022) A statistical learning approach to personalization in revenue management. Management Science 68(3): 1923–1937.

12. eMarketer (2022) Total retail sales worldwide from 2020 to 2025 (in trillion U.S. dollars). https://www.statista.com/statistics/443522/global-retail-sales/ (accessed 28 August 2019).

13. Foldes and Rejto (1981) Strong uniform consistency for nonparametric survival curve estimators from randomly censored data. Annals of Statistics 9(1): 122–129.

14. Gruen and Corsten (2007) A comprehensive guide to retail out-of-stock reduction in the fast-moving consumer goods industry.

15. Heyman and Sobel (2004) Stochastic Models in Operations Research: Stochastic Optimization. Mineola, NY: Dover Publications.

16. Huh, Levi, Rusmevichientong, et al. (2011) Adaptive data-driven inventory control with censored demand based on Kaplan–Meier estimator. Operations Research 59(4): 929–941.

17. Huh and Rusmevichientong (2009) A nonparametric asymptotic analysis of inventory planning with censored demand. Mathematics of Operations Research 34(1): 103–123.

18. Jain, Rudi and Wang (2014) Demand estimation and ordering under censoring: Stock-out timing is (almost) all you need. Operations Research 63(1): 134–150.

19. Kaplan and Meier (1958) Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53(282): 457–481.

20. Kök (2011) Optimal and competitive assortments with endogenous pricing under hierarchical consumer choice models. Management Science 57(9): 1546–1563.

21. Levi, Roundy and Shmoys (2007) Provably near-optimal sampling-based policies for stochastic inventory control models. Mathematics of Operations Research 32(4): 821–839.

22. Lyu, Zhang and Xin (2024) UCB-type learning algorithms with Kaplan–Meier estimator for lost-sales inventory models with lead times. Operations Research 72(4): 1317–1332.

23. Mao, Zhang, Zhu, et al. (2021) Near-optimal model-free reinforcement learning in non-stationary episodic MDPs. In: International Conference on Machine Learning, pp.7447–7458. PMLR.

24. Murota (1998) Discrete convex analysis. Mathematical Programming 83: 313–371.

25. Musalem, Olivares, Bradlow, et al. (2010) Structural estimation of the effect of out-of-stocks. Management Science 56(7): 1180–1197.

26. Nagarajan and Rajagopalan (2008) Inventory models for substitutable products: Optimal policies and heuristics. Management Science 54(8): 1453–1466.

27. Netessine and Rudi (2003) Centralized and competitive inventory models with demand substitution. Operations Research 51(2): 329–335.

28. Newman, Ferguson, Garrow, et al. (2014) Estimation of choice-based models using sales data from a single firm. Manufacturing & Service Operations Management 16(2): 184–197.

29. Schlapp and Fleischmann (2018) Multiproduct inventory management under customer substitution and capacity restrictions. Operations Research 66(3): 740–747.

30. Shi, Chen and Duenyas (2016) Nonparametric data-driven algorithms for multiproduct inventory systems with censored demand. Operations Research 64(2): 362–370.

31. Şimşek and Topaloglu (2018) An expectation-maximization algorithm to estimate the parameters of the Markov chain choice model. Operations Research 66(3): 748–760.

32. Vulcano, Van Ryzin and Ratliff (2012) Estimating primary demand for substitutable products from sales transaction data. Operations Research 60(2): 313–334.

33. Zhang, Chao and Shi (2020) Closing the gap: A learning algorithm for lost-sales inventory systems with lead times. Management Science 66(5): 1962–1980.

34. Zhang, Xie and Sarin (2021) Multiproduct newsvendor problem with customer-driven demand substitution: A stochastic integer program perspective. INFORMS Journal on Computing 33(3): 1229–1244.
