Governed agentic AI for retail baskets: A consumer world model with inventory-aware actions

Abstract

Retail recommendation systems increasingly operate as real-time decision engines that must personalize suggestions while respecting operational constraints such as inventory availability, category rules, and promotion policies. This is especially challenging in basket-based retail because transactions are set-valued and the observed checkout order is operational rather than behavioral. We study a governed agentic AI system for basket recommendation in which a Bayesian consumer world model serves as the agent’s internal state. The model maintains calibrated beliefs over latent shopping profiles and updates them online as items are observed while representing basket context in an order-invariant way. The agent then selects recommendation slates under explicit governance, combining interpretable control levers (e.g., profile-context trade-off, bounded exploration) with operational guardrails and feasibility masking (e.g., in-stock status, category, promotion eligibility). Using large-scale grocery transaction data, we evaluate a framework in both offline next-item prediction and an operations-coupled simulator with inventory and promotion dynamics. The agent achieves a higher hit rate than a nonagentic variant as well as various strong item–item baselines under a common holdout protocol. In the simulation, ranking accuracy changes little, yet the agent delivers substantial gains in revenue and inventory productivity by steering demand toward feasible complements and enabling controlled substitution when constraints bind. This highlights an operations insight: under binding feasibility, decision quality can improve through constraint-aware substitutions and inventory coupling even when conventional ranking metrics remain unchanged. Overall, the results show how basket-aware demand models can be deployed as governed, agentic policies that coordinate personalization with operational objectives.

Keywords

Inventory-Constrained Personalization; Operational Governance and Controls; Retail Decision Automation; Basket Recommendation; Agentic AI

1. Introduction

Recent advances in generative artificial intelligence (GenAI) and agentic AI are shifting personalization from better predictions to autonomous decision-making at the interface of marketing and operations. Leading digital platforms, including Amazon, Walmart, and Alibaba, are already transitioning from static recommendation rules to intelligent systems that dynamically coordinate inventory and fulfillment in real time (Jassy, 2025; Law, 2025; Malik, 2026). As companies increasingly delegate pricing and recommendation to AI systems, this shift raises urgent governance questions regarding trust, control, and objective misalignment. A purely “marketing” agent may recommend unavailable items or over-rotate toward demand at the expense of margins, while a purely “operations” system may become rigid and undermine consumer engagement. The central question is therefore not only how to personalize, but also how to design agentic decision systems that coordinate demand-shaping and operational feasibility safely and transparently. This need is particularly acute in retail recommendation, where retailers must personalize assortments and suggestions under constraints such as inventory availability, promotion rules, category balance, and margin targets. Nevertheless, much of the recommender-systems literature remains focused on offline ranking metrics, treating operational constraints as postprocessing rather than as first-class inputs to the decision policy. In this work, we seek to make the coordination between marketing and operations explicit by treating both consumer relevance and operational feasibility as first-class in the decision policy. From the perspective of operations management (OM), our core contribution is to treat recommendation as a decision-making problem under feasibility constraints, where value depends on coordinating demand with inventory and business rules rather than simply on offline predictive accuracy.

Basket-based retail further complicates the problem. Grocery and mass-merchant transactions are inherently set-valued because consumers purchase multiple items together, and the relevance of a candidate product depends not only on long-run preferences but also on what is already in the basket. Moreover, transaction data record baskets at checkout rather than the latent sequence by which intentions form, and the scan order is operational rather than behavioral. As a result, sequence-based assumptions are empirically fragile and operationally misaligned. These features pose two challenges. First, models must capture within-basket complementarities and substitutions without relying on unobserved item sequences. Second, recommendation systems must be deployable as online policies that adapt to evolving baskets under explicit governance, rather than as static rankers optimized simply for offline accuracy. Addressing these challenges requires a consumer-level representation that can be updated online and a decision layer that selects recommendations from a constrained action set under auditable controls.

We address these challenges by proposing an agentic basket recommender that combines probabilistic demand modeling and operational control through a two-layer architecture consisting of a consumer world model and a governed decision layer:

(1) A consumer world model is a compact probabilistic representation of shopper intent and within-trip coherence that supports belief-state decision-making. Concretely, our consumer world model maintains and updates a belief over a small number of interpretable latent shopping profiles (e.g., pantry stock-up, fresh meals, household care) and evaluates candidate items using an order-invariant, set-based representation of the current basket. Note that “world model” here does not refer to a language model. Rather, it is a Bayesian demand model learned from transaction sets that provides the agent’s internal state (beliefs over profiles) and predictive structure (how basket composition shifts candidate plausibility) needed for sequential decision-making under uncertainty.
(2) The second layer operationalizes the consumer world model as a sequential decision policy. As a trip unfolds, the agent updates its belief over latent profiles using the same sufficient statistics that drive offline inference and scores candidates by combining long-run preference fit with within-trip basket coherence. The key design choice is reuse. Instead of training a separate ranker or introducing ad hoc heuristics, the online policy reuses the probabilistic objects learned offline as its scoring primitives.

The agent is governed by design. In retail, recommendations are only valuable if they are feasible and controllable. Thus, recommendations must respect inventory availability, promotion eligibility, category policies, and margin or compliance guardrails. To this end, we embed governance directly into the decision rule rather than treating it as a postprocessing patch. Governance enters in two ways: the agent acts over a constrained action set induced by operational masking and business rules applied before ranking, and the policy exposes a small set of auditable levers that regulate behavior within that governed action space. In our framework, these levers include a profile-context trade-off that determines how strongly the policy emphasizes stable preferences versus within-trip coherence, and a bounded exploration rate that manages uncertainty without sacrificing control. This separation between the learned consumer world model and the governed decision layer enables monitoring and adjustment when constraints bind or conditions change, without the need for re-estimation of the underlying demand structure.

Through our empirical study, we derive operational insights as well as implications for the operations–marketing interface. When feasibility constraints bind, recommendation becomes a constrained control problem in which value is created by coordinating demand with operational feasibility rather than by improving unconstrained predictive accuracy alone. In an inventory-constrained environment, we observe that standard ranking metrics may change little, yet revenue and inventory productivity improve materially because the policy steers choices toward feasible complements and enables controlled substitution under stockouts. This highlights a theory-relevant distinction between prediction quality and decision quality in constrained systems, and motivates evaluating recommender policies with OM outcomes, such as revenue, margin, stockout/substitution, and gross margin return on inventory, in addition to offline accuracy.

This paper makes the following contributions:
We develop a basket-aware latent-profile Bayesian demand model for set-valued retail data that separates stable household preferences from within-trip basket effects, capturing complementarities and substitutions without relying on unobserved item sequences.

We introduce an order-invariant, set-based context specification that varies across candidate items (and thus affects relative rankings), avoiding the cancellation issues that arise in naive context formulations.

We show how the fitted probabilistic structure can be reused to build an agentic recommendation policy.

We embed the policy in a governed decision layer where feasibility masking and business rules are applied before ranking, and behavior is regulated via auditable levers. In an inventory-coupled simulation, we show an OM-relevant distinction between prediction quality and decision quality: economic outcomes (e.g., revenue and inventory productivity) can improve materially through constraint-aware substitution and feasibility coupling even when prediction accuracy changes little.

The remainder of the paper is organized as follows. Section 2 reviews related literature on the operations–marketing interface in retail settings, basket recommendation, world models for decision-making, and agentic AI and governance. Section 3 introduces the basket-aware latent-profile consumer world model, including its order-invariant context representation and estimation procedure. Section 4 describes the agentic decision layer, including within-trip belief updating, basket-aware scoring, and governance levers such as feasibility masking and exploration. Section 5 describes the data and evaluation design, including offline next-item prediction and an operations-coupled environment that captures inventory constraints and substitution dynamics. Section 6 concludes with implications, limitations, and directions for future research.
2. Literature review

We position our research at the intersection of four converging streams of literature to propose a unified framework for retail decision-making. First, we situate our work within the operations–marketing interface, where demand shaping and fulfillment are jointly determined under inventory constraints. Second, we draw on basket recommendation and generative modeling for retail baskets, emphasizing the distinction between sequential and set-based representations of consumer behavior. Third, we connect to the emerging world-model perspective on decision systems, and position our Bayesian demand model as a consumer world model that supports belief-state decision-making. Finally, we relate our approach to recent work on agentic AI and governance, where autonomy requires feasibility-aware action spaces and auditable managerial controls.

2.1. The operations–marketing interface in retail

Retail decisions sit at the operations–marketing interface because demand shaping and fulfillment are jointly determined by operational feasibility. A long operations tradition shows that inventory availability and stockout risk directly change realized sales, substitution patterns, and the value of demand stimulation, so policies that ignore feasibility can destroy value even when they increase nominal demand (Akçay et al., 2020; Ghosh et al., 2022). In omnichannel settings, inventory information is itself a strategic operational lever. For example, research shows that sharing reliable availability information changes consumer behavior and improves performance precisely because feasibility shapes choice (Gallino and Moreno, 2014). These results motivate a central implication for recommendation: a recommender optimized as a pure demand generator is incomplete unless the recommendation policy is coupled with what the retailer can actually fulfill.

Marketing literature has studied recommender systems and decision aids for more than two decades, emphasizing that they affect not only prediction accuracy but also consumer search and choice (Donnelly et al., 2024; Fang et al., 2026; Wan et al., 2024). Early work shows that interactive decision aids can materially change how consumers evaluate alternatives and what they ultimately select (Häubl and Trifts, 2000). In parallel, marketing research developed formal recommendation systems grounded in preference modeling and early e-commerce settings (Ansari et al., 2000). Subsequent work developed recommendation methods using purchase histories and evaluated their performance in settings where the key signal is revealed through observed transactions (Bodapati, 2008). Related research also shows how transaction data can be used to study longer-term customer outcomes such as retention and churn (Ascarza, 2018). At the same time, practical frictions such as missing feedback can limit predictive performance (Ying et al., 2006). More recent work further shows that recommender effectiveness depends not only on the underlying algorithm but also on how recommendations are explained and presented to consumers, with important effects on trust, click-through, and search behavior (Chen et al., 2024a; Gai and Klesse, 2019). Beyond the individual level, recommendation networks can reshape demand in electronic markets by changing which products become visible and connected to each other (Oestreicher-Singer and Sundararajan, 2012), and recommender systems can influence aggregate sales patterns such as product diversity (Fleder and Hosanagar, 2009; Zheng et al., 2025). This literature provides an important marketing foundation for our setting, where recommendation policies must shape demand in ways that are also operationally feasible.

As marketing decisions are increasingly delegated to AI systems, value creation depends not only on demand shaping but also on execution under operational constraints, which makes the operations–marketing interface more central (Huang and Rust, 2021; Kopalle et al., 2022). Huang and Rust (2022) argue that deployable AI in marketing is often collaborative and must be designed to coordinate with organizational and operational constraints instead of optimizing a narrow predictive objective in isolation. This perspective motivates our focus on integrating feasibility into the policy definition (Demirezen and Kumar, 2016; Xiao and Xu, 2018). In the context of recommendation, this implies decision rules that restrict and shape the action space before ranking instead of repairing infeasible recommendations after the fact.

Recent OM and operations research work has begun to operationalize personalization as a constrained decision problem rather than a pure prediction task. In personalized assortment and revenue-management settings, the objective is not only to predict preferences but also to select what to show or offer subject to business constraints (inventory, capacity, or feasibility), often with provable structure or scalable optimization methods (Chen et al., 2024b; Golrezaei et al., 2014; Kallus and Udell, 2020). Closest in spirit to our setting, this stream treats personalization as control of the decision set and its economic consequences, rather than as an offline ranking exercise. For example, Chen et al. (2024b) derive an inventory-balancing policy using fluid approximations, offering strong theoretical guarantees for online assortment under limited inventory. However, their approach relies on a parametric choice model, namely the multinomial logit, which focuses on substitution and inventory rationing and abstracts away the complex, set-based complementarities found in grocery baskets. Our work complements this stream by focusing on the generative nature of the problem. Our agent uses a world model to simulate how inventory availability interacts with latent profiles and basket context to shape demand dynamically.

In contrast, much of the recommender-systems literature still evaluates models primarily by offline accuracy and then imposes feasibility constraints as postprocessing, such as removing out-of-stock items after ranking. That design implicitly assumes that feasibility is peripheral to the policy. At the operations–marketing interface, feasibility constraints are not rare edge cases but an endogenous part of the environment that the policy should anticipate. To this end, we design our framework to follow the operations perspective. Feasibility is made first-class by defining the policy over a governed action space constructed before ranking, and by evaluating performance using operational outcomes, such as substitutions, stockouts, and inventory productivity, in addition to offline prediction metrics.
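The contrast between postprocessing and a governed action space can be illustrated with a small sketch. It is only a toy example under stated assumptions: the scores, the in-stock flags, and the function names `rank_then_filter` and `mask_then_rank` are hypothetical and do not come from the paper. The postprocessing variant shown truncates to a slate of size k and then drops infeasible items, which is one common pattern; if the filter were applied to the full ranking instead, the two functions would coincide for fixed scores.

```python
import numpy as np

def rank_then_filter(scores, in_stock, k):
    """Postprocessing: take the top-k by score, then drop infeasible items."""
    top = np.argsort(-scores, kind="stable")[:k]
    return [int(d) for d in top if in_stock[d]]

def mask_then_rank(scores, in_stock, k):
    """Governed action space: restrict to feasible items first, then rank."""
    feasible = np.flatnonzero(in_stock)
    order = feasible[np.argsort(-scores[feasible], kind="stable")]
    return [int(d) for d in order[:k]]

# Hypothetical relevance scores for five items and their stock status.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
in_stock = np.array([False, True, False, True, True])

post = rank_then_filter(scores, in_stock, k=3)  # slate shrinks after filtering
gov = mask_then_rank(scores, in_stock, k=3)     # slate stays full and feasible
print(post, gov)
```

In this toy case the postprocessed slate loses two of its three slots to stockouts, whereas masking first fills the slate with the best feasible items.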

2.2. Basket recommendation: sequences versus sets

Modern recommenders commonly represent user activity as sequences and learn next-item prediction using sequential architectures, such as recurrent neural networks and transformer-based recommenders (Kang and McAuley, 2018; Smirnova and Vasile, 2017; Sun et al., 2019). This sequence view is well aligned with clickstreams and browsing logs, where temporal order is meaningful. However, grocery and mass-merchant transactions are typically observed as checkout baskets, which means that the data record the set of items purchased, while the scan order is operational and often unrelated to preference formation. Thus, treating baskets as sequences can inject noise by hallucinating temporal dependencies that are not behaviorally grounded.

Meanwhile, recent strategy perspectives on GenAI note that text-first or token-sequence paradigms are not universally appropriate and that representations should match the data-generating process and the decision context (Feuerriegel et al., 2024). The basket retail setting reinforces this point. Because order-sensitive sequence models can introduce misleading structure, the setting points instead toward order-invariant set models that capture coherence without imposing unobserved temporal dependencies. This motivates a second literature pivot, namely modeling baskets as sets.

This distinction between sequence and set representations is also reflected in the marketing literature on recommendation and online behavior. In browsing environments where consumers interact with products over time, clickstream-based models treat the path and timing of interactions as informative signals, making order- and path-dependent representations natural (Bucklin and Sismeiro, 2003; Moe and Fader, 2004; Montgomery et al., 2004; Sun, 2025). In contrast, a complementary stream builds models and decision rules from purchase histories and basket-shopping behavior, often in large assortments where the primary empirical object is the shopping trip or basket and the key dependence lies in cross-item structure rather than a behaviorally meaningful within-basket sequence (Ansari et al., 2000; Jacobs et al., 2016, 2021). This aligns directly with grocery retail, where checkout data record sets purchased, so imposing an item order can add noise rather than information. Accordingly, our modeling choice treats the basket as an unordered set while still capturing complementarity and substitution within that set.

In machine learning, permutation-invariant set representations, such as deep sets and set transformers, provide a principled way to embed unordered collections while preserving expressiveness (Lee et al., 2019; Zaheer et al., 2017). While recent information systems literature has advanced the use of hyperbolic embeddings to model hierarchical situational contexts (Bauman et al., 2025), grocery baskets require capturing the combinatorial structure of item co-occurrence. In retail recommendation, the operational requirement is that the model capture complementarity and substitution within the set without relying on unobserved micro-sequences. Our consumer world model explicitly adopts an order-invariant context representation and uses it to explain basket coherence at the profile level, aligning the statistical object with the operational data-generating process.

Importantly, our objective is not only to fit baskets offline, but to reuse the same set-based structure online as the basket evolves. This reuse requirement is rarely emphasized in sequence-based work, where the deployed policy often differs from the training objective or relies on a separately tuned ranker. Our approach instead builds a probabilistic set-based demand model whose sufficient statistics and scoring primitives are designed to be directly consumed by an online policy.

2.3. World models for decision-making

A world model is typically defined as an internal predictive model of the environment that supports decision-making, for example, by enabling belief updates, forecasting outcomes of actions, or computing action values in partially observed settings (Sutton and Barto, 1998). In model-based reinforcement learning, learned world models are often implemented as latent-dynamics simulators, as in the World Models and Dreamer lines of work, which emphasize representation learning and long-horizon rollout for control (Ha and Schmidhuber, 2018; Hafner et al., 2019).

Our setting differs in the environment being modeled. The relevant environment is the consumer choice process under basket context and operational constraints, where the key hidden state is shopper intent rather than physical dynamics. Therefore, we align with the definition of world models as predictive internal models for decision-making, but instantiate it as a probabilistic consumer world model. Formally, we propose a Bayesian generative model learned from historical baskets that (1) represents latent intent via interpretable profiles, (2) provides an order-invariant mechanism for how the observed basket shifts the plausibility of candidates, and (3) yields sufficient statistics that can be updated online.

This construction supports the two core functions expected of a world model in decision systems. First, it provides the agent’s internal state, namely beliefs over latent intent. Second, it enables the predictive mapping from state and context to outcomes (basket-aware choice likelihood), which is sufficient to evaluate and compare candidate actions under constraints in our setting. Compared with latent-dynamics world models, we prioritize interpretability, auditable reuse, and operational controllability over expressive black-box simulation. From a marketing decision-systems viewpoint, this also aligns with the emphasis on AI as an internal, decision-supporting representation of the environment, where interpretability and managerial actionability matter alongside predictive performance (Huang and Rust, 2021). Having established this connection, we hereafter use “consumer world model” and “world model” interchangeably.

2.4. Agentic AI and governance

A second recent shift is from predictive models to agentic models, that is, systems that maintain state, take actions, and pursue objectives under constraints. In the literature on large language models, agentic behavior is operationalized via tool use, iterative reasoning loops, and simulated multi-agent environments (Park et al., 2023; Yao et al., 2022). In operations contexts, this shift creates an immediate governance problem: delegating decisions to agents raises questions of controllability, auditability, and objective misalignment, especially when constraints related to inventory, compliance, and promotions bind frequently.

A recurring theme in recent work is that the managerial challenge is no longer simply whether AI can predict, but whether it can be directed, audited, and aligned with business objectives when given autonomy (Berente et al., 2021; Huang and Rust, 2022). In retail settings, these governance questions are amplified by frequent feasibility constraints and cross-functional objectives, and collaborative AI designs explicitly recognize the need to keep operational and organizational inputs configurable and transparent (Rai, 2020). Our governed agentic layer operationalizes this view by enforcing feasibility before ranking and by exposing auditable levers that regulate how the policy trades off profile fit, basket coherence, and exploration. These design choices follow directly from governance concerns highlighted across marketing, operations, and AI safety.

Adjacent literature further strengthens why governance must be embedded in the policy rather than bolted on. In reinforcement learning and safety research, a related and common point is that high-performing policies can fail under distribution shift or when objectives are misspecified, which motivates explicit constraints and safety guardrails (Raisch and Krakowski, 2021). We operationalize governance in two ways that fit retail practice. Feasibility is first enforced by constructing a constrained action space before ranking, and then the agent exposes a small number of auditable levers that managers can tune without retraining the underlying world model. This connects agentic decision automation to the operations–marketing interface, where value arises not only from better preference estimation, but from disciplined coordination between demand shaping and operational feasibility.

3. Consumer world model: basket-aware latent profile demand model

In this section, we formally introduce the consumer world model. Here, a world model is a compact probabilistic representation of consumer intent and within-trip coherence that supports belief-state decision-making. It produces two objects to be processed by the governed decision layer: a household-level belief over latent shopping profiles and its online update rule, and a basket-aware predictive structure that scores candidate items from an order-invariant set representation of the current basket. Note again that the world model here is a Bayesian demand model learned from transaction sets rather than a language model.

Shoppers rarely add items in isolation. A weekly grocery run usually blends recurring patterns, and the items that end up being purchased together reflect those overlapping profiles. Therefore, we assume that the relevance of an item depends on the stable shopping profiles of households as well as the other items observed together in the basket. Further, empirically, retail data record baskets at checkout, not the latent sequence by which intentions are formed. The scan order at the register is operational, not behavioral. As a result, we treat each basket as an unordered set. Specifically, in our model, “context” simply means the set of items observed together, not a revealed sequence. This preserves the economic intuition of within-basket coherence while remaining faithful to what the data record. With these two assumptions, our approach handles two signals at the same time. The first is a household’s latent shopping profiles, that is, stable, interpretable patterns, such as fresh food, pantry stock-up, and household care, that summarize long-run tendencies. The second is a basket-aware context signal that captures how copresent items make some candidates more or less appropriate on this trip. For example, bread and cereal raise the plausibility of milk, glass cleaner raises the plausibility of wipes, and soda may substitute for juice. To keep this contextual signal operationally simple and scalable, we summarize baskets in a low-dimensional item-embedding space and let each latent profile respond differently to that summary.

In the following subsections, we first formalize this intuition with a generative model. In summary, consumers carry a mixture of latent profiles across trips, each realized item in a basket is claimed by one of those profiles, and the probability of including an item combines profile-specific base appeal with a compact, order-agnostic function of the items in the basket. Then, we describe how to estimate the model efficiently via a variational objective and how the learned structure feeds the agentic policy used online.

3.1. The model

We model a trip as the interaction of persistent household tendencies captured by a small number of latent shopping profiles and profile-specific coherence within the unordered set of items observed in a basket. The construction is explicitly order-invariant in that we use only which items coappear instead of the latent sequence by which they were added. We specify the model as follows:

For each household $i = 1, \dots, N$, draw a $K$-dimensional profile-mix vector

$$\theta_{i} \sim \mathrm{Dir}_{K}(\eta),$$

where $\theta_{ik}$ represents the long-run share with which household $i$ exhibits profile $k$ across trips.

On trip $t$, the observed basket is a set $B_{it} \subseteq \{1, \dots, D\}$. For each realized item $d \in B_{it}$, define its set-based context as

$$C_{it}(d) = B_{it} \setminus \{d\}.$$

This aligns with the checkout process and is order-invariant.

First, assign a latent profile to the item according to the household’s mixture:

$$z_{itd} \sim \mathrm{Mult}(\theta_{i}).$$

Conditional on $z_{itd} = k$, the model then specifies the probability that item $d$ appears in the basket as a function of the leave-one-out context set $C_{it}(d)$.

While shoppers do add items one by one, the order is empirically unobserved and potentially missing-not-at-random. Conditioning only on the set $C_{it}(d)$ respects the data the retailer actually sees (copresence) and avoids assumptions about a hidden sequence. In the next subsection, we impose a low-rank structure on the profile- and context-specific effects $\beta_{k, C}$ that preserves this invariance while scaling to realistic catalogs and yielding interpretable quantities for the online agent.
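The generative steps of this subsection can be sketched in a few lines. This is a minimal illustration, not the estimation code: the number of profiles, the Dirichlet hyperparameter value, the item ids, and the helper name `leave_one_out_context` are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3                    # number of latent profiles (illustrative value)
eta = np.full(K, 0.5)    # symmetric Dirichlet hyperparameter (assumed value)

# Household-level profile mix: theta_i ~ Dir_K(eta).
theta_i = rng.dirichlet(eta)

# An observed basket, stored as an unordered set of hypothetical item ids.
basket = frozenset({2, 5, 7, 11})

def leave_one_out_context(basket, d):
    """Return C_it(d), the basket with the focal item d removed."""
    return basket - {d}

# One latent profile assignment per realized item: z_itd ~ Mult(theta_i).
z = {d: int(rng.choice(K, p=theta_i)) for d in sorted(basket)}

# Set-based context of every realized item; no scan order is used anywhere.
contexts = {d: leave_one_out_context(basket, d) for d in basket}
```

Because the basket is a `frozenset`, any permutation of the scanned items yields identical contexts, which mirrors the order-invariance argued for in the text.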

3.2. Structured parameterization of basket effects

A literal $\beta_{k, C}$ for every profile $k$ and every possible context set $C \subseteq \{1, \dots, D\}$ is neither identifiable nor implementable. Further, retailers could neither calibrate nor govern such an object. Therefore, we impose a compact, data-driven structure that captures item–item coherence, lets profiles respond differently to the same context, and scales to realistic assortments while remaining order-invariant. To capture within-basket coherence at scale, we represent each item $d$ by a low-dimensional embedding vector $E_{d} \in \mathbb{R}^{r}$ learned from basket co-occurrence. Intuitively, $E$ provides a compact representation of item relatedness (complements/substitutes) without requiring an explicit $D \times D$ interaction matrix. We construct $E$ via a regularized low-rank factorization of an item co-occurrence matrix. Online E-Companion (EC) A.1 provides details on constructing $E$.
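As a rough illustration of what a regularized low-rank factorization of a co-occurrence matrix can look like, the sketch below builds pairwise co-occurrence counts and takes a truncated SVD with soft-thresholded singular values. The exact construction is in EC A.1 and is not reproduced here, so every specific choice (the `log1p` damping, the soft-threshold `shrink`, and the function names) is an assumption made for the example.

```python
import numpy as np

def cooccurrence_matrix(baskets, D):
    """Count how often each pair of distinct items appears in the same basket."""
    M = np.zeros((D, D))
    for basket in baskets:
        items = sorted(set(basket))
        for a in items:
            for b in items:
                if a != b:
                    M[a, b] += 1.0
    return M

def item_embeddings(M, r, shrink=0.1):
    """Rank-r item embeddings from a symmetric co-occurrence matrix.

    Assumed recipe: damp counts with log1p, take an SVD, soft-threshold the
    leading singular values (the regularization), and scale the left
    singular vectors by the square roots of the shrunken values.
    """
    U, s, _ = np.linalg.svd(np.log1p(M), full_matrices=False)
    s_reg = np.maximum(s[:r] - shrink, 0.0)
    return U[:, :r] * np.sqrt(s_reg)

# Toy transaction data over D = 4 hypothetical items.
baskets = [[0, 1, 2], [0, 1], [2, 3], [0, 1, 3]]
M = cooccurrence_matrix(baskets, D=4)
E = item_embeddings(M, r=2)   # one r-dimensional row per item
```

The point of the sketch is the shape of the output: a D-by-r matrix replaces the D-by-D interaction matrix, so storage and scoring scale with r rather than with the catalog size.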

For a focal item $d$ in basket $B_{i t}$, define the leave-one-out context set

C_{i t} (d) = B_{i t} ∖ \{d\} .

We summarize any unordered context set $C$ with the low-rank (order-invariant) embedding

s (C) = \sum_{j \in C} E_{j} \in R^{r} .

To evaluate a candidate item $d$ against context $C$, we form a candidate-dependent transform

g_{d} (C) = s (C) ⊙ E_{d},

where $⊙$ denotes the elementwise (Hadamard) product. Intuitively, $s (C)$ captures the basket composition, while $g_{d} (C)$ injects the candidate-specific geometry of item $d$ into the context representation. Therefore, the probability of observing product $d$ under profile $k$ is modeled by

\log p (d ∣ k, C_{i t} (d)) \propto u_{k d} + v_{k}^{⊤} g_{d} (C_{i t} (d)), \qquad (1)

where $u_{k d}$ captures the profile-item baseline and $v_{k} \in R^{r}$ governs how profile $k$ reacts to basket composition in the low-rank space. Because $g_{d} (\cdot)$ depends on $E_{d}$, the context term varies across candidates even when the underlying context set is fixed, and thus affects relative rankings instead of canceling under normalization.
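To make the scoring structure in (1) concrete, the following is a minimal NumPy sketch, not the estimation code used in the paper. The array names `E`, `u`, and `v`, the toy dimensions, and the random values are illustrative assumptions. The sketch computes $s (C)$, the candidate-dependent transform $g_{d} (C)$, and the normalized choice probabilities over a feasible candidate list.

```python
import numpy as np

def context_embedding(E, context):
    """s(C): sum of item embeddings over the unordered context set C."""
    idx = sorted(context)  # fixed summation order, so item order cannot matter
    return E[idx].sum(axis=0) if idx else np.zeros(E.shape[1])

def profile_logits(u, v, E, context, k):
    """u_{kd} + v_k^T (s(C) * E_d) for every item d under profile k."""
    s = context_embedding(E, context)
    g = E * s  # row d holds g_d(C), the Hadamard product of s(C) and E_d
    return u[k] + g @ v[k]

def choice_probs(u, v, E, context, k, feasible):
    """Exponentiate and normalize the logits over a feasible candidate list."""
    z = profile_logits(u, v, E, context, k)[np.asarray(feasible)]
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Toy dimensions and random parameters (assumptions for illustration only).
rng = np.random.default_rng(0)
D, r, K = 6, 3, 2
E = rng.normal(size=(D, r))
u = rng.normal(size=(K, D))
v = rng.normal(size=(K, r))

p_a = choice_probs(u, v, E, [0, 2], k=1, feasible=[1, 3, 4, 5])
p_b = choice_probs(u, v, E, [2, 0], k=1, feasible=[1, 3, 4, 5])
```

Here `p_a` and `p_b` coincide because the context enters only through the sum $s (C)$, which is the order invariance the model relies on.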

Our scoring structure also admits a practical cold-start handling mechanism that is compatible with the current model. For a new item $d$ with limited or no co-occurrence history, we can initialize its embedding $E_{d}$ using available taxonomy metadata (e.g., commodity/subcommodity) by setting $E_{d}$ to the average embedding of existing items in the same group, and initialize the profile-item baseline $u_{k d}$ by backing off to the corresponding group-level baseline. Then, the policy can score $d$ immediately via $g_{d} (C) = s (C) ⊙ E_{d}$ , while subsequent transactions update $E_{d}$ and $u_{k d}$ as co-occurrence evidence accumulates.
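The cold-start initialization described above can be sketched as follows. This is a hedged illustration: the function name `cold_start_init`, the group labels, and the use of a simple unweighted mean for the group-level baseline are assumptions, since the text does not specify how the group-level baseline is computed.

```python
import numpy as np

def cold_start_init(E, u, group_of, new_group):
    """Initialize (E_d, u_{.d}) for a new item from its taxonomy group.

    E: (D, r) item embeddings; u: (K, D) profile-item baselines;
    group_of: length-D sequence of group labels for existing items.
    """
    members = np.flatnonzero(np.asarray(group_of) == new_group)
    if members.size == 0:
        raise ValueError("no existing items in this group; another fallback is needed")
    E_new = E[members].mean(axis=0)      # average embedding of the group
    u_new = u[:, members].mean(axis=1)   # assumed group-level baseline: simple mean
    return E_new, u_new

# Hypothetical toy catalog with two groups.
E = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 5.0]])
u = np.array([[0.0, 2.0, 9.0], [4.0, 6.0, 9.0]])
E_new, u_new = cold_start_init(E, u, ["dairy", "dairy", "snacks"], "dairy")
```

In this toy case the new dairy item starts at the midpoint of the two existing dairy items in both embedding and baseline space.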

Exponentiating and normalizing over feasible candidates produces a valid multinomial. In estimation, $E$ is learned once from historical baskets, and $(u_{k d}, v_{k})$ are then fitted in a single variational expectation-maximization (EM) procedure with the rest of the model. Further, we condition on observed basket sizes and compositions and infer latent profile assignments for item occurrences within each trip. That is, given a household’s realized basket $B_{i t}$, we infer expected profile responsibilities for items in $B_{i t}$ without modeling the stopping decision or basket size. This yields a well-defined conditional likelihood for estimation while avoiding additional assumptions on trip termination. The result is a compact, operationally usable representation: retailers can tune profile weights or constrain candidate sets without retraining a dense interaction model.

3.3. Estimation summary and world-model outputs

Exact maximization of the marginal likelihood is infeasible because it integrates over household profile mixtures $θ_{i}$ and sums over all latent item-profile assignments $z_{i t d}$. We therefore estimate the basket-aware latent profile model using a standard mean-field variational EM procedure that optimizes an evidence lower bound (ELBO). Because context enters only through the low-rank embedding, the updates remain scalable for realistic catalogs. We defer the full derivations of the ELBO, factorization, and coordinate-ascent update equations to online EC A.2, and details on the EM steps to online EC A.3.

At a high level, each iteration in the estimation process consists of:

E-step. Update item-profile responsibilities and household-level profile posteriors.

M-step. Update profile-item baseline terms and profile-specific context response weights under the low-rank embedding.

The fitted model serves as the paper’s consumer world model for online decision-making. Here, the latent profile mixture plays the role of the hidden state, and the basket-aware emission in (1) specifies how observed context shifts the likelihood of candidate items. This matches the role of a world model in agentic systems, while remaining transparent and auditable because the structure is Bayesian rather than a black-box simulator.

The proposed world model yields two reusable objects that are consumed by the governed decision layer in Section 4: household-level profile posteriors (and their streaming sufficient statistics) that support within-trip belief updates, and basket-aware scoring primitives that quantify how an unordered basket context shifts candidate plausibility. Specifically, we pass forward:

Long-run household profile information: ${\hat{θ}}_{i}$ (and sufficient statistics such as $γ_{i}$ ), which initialize and regularize within-trip beliefs.

Basket-aware scoring primitives: $(u_{k d}, v_{k}, E)$ together with the order-invariant context transform, which define the candidate-dependent score contributions used online.

In Section 4, we reuse these same objects to implement real-time belief updating and feasibility-aware slate selection, rather than introducing a separately trained ranker.

4. Governed decision layer: agentic basket recommendation policy

The basket-aware latent profile model in Section 3 is intentionally offline. From historical baskets, it learns household-level profile mixtures $θ_{i}$ , profile-item affinities $u_{k d}$ , and profile-specific responses to set-based basket composition via the low-rank embedding $E$ and weights $v_{k}$ . In contrast, operational recommenders must act sequentially in the real world. They observe a changing basket, revise their view of the shopper, and form slates in real time under operational constraints.

To this end, we augment the Bayesian layer with an agentic online policy that can be viewed as a sequential decision problem with explicit governance. We index within-trip decision epochs by $s = 1, 2, \dots$ , where the agent has observed $s - 1$ items so far, and $B^{(s)}$ denotes the unordered set of these observed items. As summarized in Figure 1, at each step $s$ , the policy uses $B^{(s)}$ to update its view of the shopper, applies governance to construct a feasible set of recommendable items, scores and selects a top- $L$ slate by blending long-run preference fit with within-trip coherence, and transitions to step $s + 1$ after the realized choice updates the basket. The policy exposes a small set of auditable levers that regulate behavior under constraints without retraining the underlying world model.

Figure 1.

Decision-diagram view of the governed recommendation loop (time-unrolled).

4.1. Governed action space: feasibility before ranking

At each within-trip step $s$ , the agent observes the current unordered basket $B^{(s)}$ and must produce a recommendation slate. However, in operational retail, the agent is not free to rank the entire catalog. The set of valid recommendations is endogenously constrained by inventory availability, promotion eligibility, category and assortment policies, and margin or compliance guardrails. Therefore, we make governance first-class by defining the policy over a governed action space, a constrained action set constructed before any ranking occurs.

Let $D$ denote the full catalog. Governance induces a feasible candidate set by applying operational masking and retailer rules before ranking:

C^{(s)} = Mask (D ∖ B^{(s)}, \{I_{d}^{(s)}\}_{d \in D}, \{P_{d}^{(s)}\}_{d \in D}, R), \qquad (2)

where $\{I_{d}^{(s)}\}$ are on-hand inventories, $\{P_{d}^{(s)}\}$ are promotion/eligibility flags, and $R$ denotes retailer rules such as restricted-item policies, compliance exclusions, category balance, and diversity constraints. The mask can include hard feasibility constraints such as in-stock requirements ($I_{d}^{(s)} > 0$), promotion eligibility, or compliance exclusions, as well as structured business rules such as a maximum number of items per subcategory or brand diversity requirements. By design, the agent will never consider any $d \notin C^{(s)}$ at step $s$. Note that while we present governance primarily as hard masking in (2), the same framework supports soft feasibility via bounded, transparent score adjustments within $C^{(s)}$ (see online EC C.1). In Section 5.4, we further show how these adjustments can be used in the inventory-coupled simulation.
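A minimal sketch of the hard-masking step in (2) is given below. The function name `mask`, its argument names, and the toy catalog are hypothetical. The sketch covers only item-level hard constraints (already in basket, out of stock, excluded by rule, promotion-ineligible); slate-level rules such as subcategory caps or brand diversity would need an additional pass and are not shown.

```python
def mask(catalog, basket, inventory, promo_ok, excluded=frozenset(), require_promo=False):
    """Return the governed candidate set C^(s) as a sorted list of item ids."""
    feasible = []
    for d in catalog:
        if d in basket:                     # D \ B^(s): skip items already in the basket
            continue
        if inventory.get(d, 0) <= 0:        # in-stock requirement I_d > 0
            continue
        if d in excluded:                   # retailer rules R (e.g., compliance exclusions)
            continue
        if require_promo and not promo_ok.get(d, False):  # promotion eligibility P_d
            continue
        feasible.append(d)
    return sorted(feasible)

# Hypothetical operational state.
catalog = ["milk", "eggs", "bread", "wine", "salsa"]
inventory = {"milk": 4, "eggs": 0, "bread": 9, "wine": 2, "salsa": 1}
promo_ok = {"milk": True, "salsa": False}

c_default = mask(catalog, {"bread"}, inventory, promo_ok, excluded={"wine"})
c_promo = mask(catalog, {"bread"}, inventory, promo_ok, excluded={"wine"}, require_promo=True)
```

In this example `c_default` keeps milk and salsa, and requiring promotion eligibility narrows the set to milk alone. Ranking code downstream only ever sees these lists.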

Defining feasibility as a preranking action-space constraint offers two advantages. First, it prevents failure modes common to unconstrained recommenders, such as proposing unavailable items or violating policy constraints and then attempting to repair the slate post hoc. Second, it makes governance auditable, so changes in business rules map directly into changes in $C^{(s)}$ , separating what is allowed to be recommended from how the agent prioritizes allowed items. In the remainder of this section, the agent scores and ranks only those items in $C^{(s)}$ using the world-model primitives learned offline, and then selects a slate from this governed action space.

In summary, governance enters the policy through the definition of $C^{(s)}$ and optional bounded feasibility penalties. Subsequent belief updates and scoring rules take $C^{(s)}$ as given and operate only over feasible actions. This way, we directly embed feasibility and controllability into the agent’s decision process rather than appending them as postprocessing.

4.2. Belief state via world-model reuse

In this section, we introduce what a belief state is and how we reuse the previously fitted world model to update it. We defer the technical details to online EC B.1.

Belief state. The offline consumer world model yields a household-level profile mixture ${\hat{θ}}_{i}$ that summarizes long-run shopping tendencies across trips. During the online recommendation phase, as the basket evolves, the agent maintains a within-trip belief state $b_{i}^{(s)} \in Δ^{K - 1}$ over the $K$ latent profiles and updates this belief state over time. Intuitively, $b_{i}^{(s)}$ represents the agent’s current posterior view of “which shopping intent is active right now,” given the partial basket. At each step $s$ , the policy depends on the unordered set $B^{(s)}$ and is thus order-invariant: permutations of the same observed items yield the same recommendation (up to tie-breaking or bounded exploration). This does not imply that recommendations are identical across different intermediate baskets. The policy is state-dependent in the usual control sense because different sets $B^{(s)}$ correspond to different contexts. Whether different recommendation sequences can lead to different final baskets depends on how consumers respond to recommendations (i.e., the environment dynamics), which we do not attempt to identify causally in this paper.

Update via reuse of offline sufficient statistics. A key design choice is reuse. The online belief update uses the same variational inference logic (details are given in online EC A.2) used to fit the world model. Concretely, $b_{i}^{(s)}$ is initialized at the start of a trip from the household’s long-run mixture ${\hat{θ}}_{i}$ , and is then updated by accumulating evidence from newly observed items in $B^{(s)}$ using the same learned primitives that define the world model. As items arrive, the belief typically concentrates quickly toward the profiles most consistent with the emerging basket, which is exactly the behavior needed for sequential recommendation: early observations disambiguate intent, and later recommendations become increasingly targeted.

Stability and two-timescale interpretation. In practice, early baskets can be noisy. For example, the first observed item may be generic. To address this challenge, we apply light smoothing to stabilize updates at small $s$ and prevent overreaction to atypical, noisy items. This yields a clean two-timescale decomposition:

${\hat{θ}}_{i}$ captures cross-trip preferences (slow-moving, household-level, e.g., who this household tends to be)

$b_{i}^{(s)}$ captures within-trip intent (fast-moving, trip-level, e.g., what appears to be active given the current unordered basket)

After the trip ends, the household’s long-run parameters can be refreshed using the expected profile counts implied by the trip, providing a lightweight streaming update without revisiting historical baskets.
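As an illustration of the within-trip belief update, the sketch below applies a generic Bayes-rule update: it starts from ${\hat{θ}}_{i}$ and adds, for each observed item, its log-likelihood under each profile given the leave-one-out context. This is an assumption-laden stand-in, since the exact update is deferred to online EC B.1. In particular, the smoothing weight `alpha`, which shrinks the posterior back toward ${\hat{θ}}_{i}$, is a hypothetical form of the light smoothing mentioned above, and normalization is taken over the full catalog rather than a governed set.

```python
import numpy as np

def log_item_probs(u, v, E, context, k):
    """Log-softmax over the catalog of u_{kd} + v_k^T (s(C) * E_d)."""
    idx = sorted(context)
    s = E[idx].sum(axis=0) if idx else np.zeros(E.shape[1])
    z = u[k] + (E * s) @ v[k]
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def update_belief(theta_hat, basket, u, v, E, alpha=0.1):
    """Belief b^(s) over K profiles from the unordered basket B^(s).

    alpha in [0, 1] is a hypothetical smoothing weight toward theta_hat.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    log_b = np.log(theta_hat)              # prior: long-run household mixture
    items = sorted(basket)                 # set semantics: order is discarded
    for k in range(len(theta_hat)):
        for d in items:
            context = [j for j in items if j != d]   # leave-one-out context
            log_b[k] += log_item_probs(u, v, E, context, k)[d]
    post = np.exp(log_b - log_b.max())
    post /= post.sum()
    return (1 - alpha) * post + alpha * theta_hat

# Toy parameters (assumptions for illustration only).
rng = np.random.default_rng(1)
D, r, K = 5, 2, 3
E, u, v = rng.normal(size=(D, r)), rng.normal(size=(K, D)), rng.normal(size=(K, r))
theta_hat = np.array([0.5, 0.3, 0.2])

b1 = update_belief(theta_hat, {0, 3}, u, v, E)
b2 = update_belief(theta_hat, [3, 0], u, v, E)
b0 = update_belief(theta_hat, set(), u, v, E)
```

Because the update depends only on the set of observed items, `b1` and `b2` agree, and with an empty basket the belief `b0` equals the long-run mixture.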

Connection to governed scoring. The belief state is the sole “memory” the agent carries within a trip. Specifically, subsequent scoring and governed slate selection in Section 4.3 depend on $B^{(s)}$ , $b_{i}^{(s)}$ , and the governed candidate set $C^{(s)}$ . This separation is operationally useful: governance determines what can be recommended, while $b_{i}^{(s)}$ determines how the agent prioritizes feasible candidates by translating the fitted world model into real-time intent tracking.

4.3. Governed scoring and slate selection

At each within-trip decision epoch $s$ , the agent observes the current unordered basket $B^{(s)}$ , its belief state $b_{i}^{(s)}$ , and the operational state relevant for governance. Again, the key design choice is reuse. The online policy does not introduce a separately trained ranker or a new heuristic scoring rule. Instead, it reuses the same probabilistic primitives $(u_{k d}, v_{k}, E)$ and the set function $g_{d} (\cdot)$ learned offline by the world model, and combines them with an explicitly governed action space and a small set of auditable levers.

Candidate scoring. For any candidate item $d \notin B^{(s)}$ , the agent forms two components using the world-model objects:

{prof}_{d} (i, s) = \sum_{k = 1}^{K} b_{i k}^{(s)} u_{k d}, {ctx}_{d} (i, s) = \sum_{k = 1}^{K} b_{i k}^{(s)} v_{k}^{⊤} g_{d} (B^{(s)}),

where $g_{d} (\cdot)$ is defined in Section 3.2. The first term, profile fit, captures long-run preference fit under the current posterior over profiles. Items aligned with the currently believed profile mix receive higher base scores. The second term, basket (context) fit, captures within-trip coherence using an order-invariant, set-based summary of the basket. Context depends only on which items coappear in the current basket. We summarize the basket with the low-rank embedding and score how profile $k$ reacts to that set. We then form the composite world-model score

s_{d}^{(s)} = (1 - λ) {prof}_{d} (i, s) + λ {ctx}_{d} (i, s), λ \in [0, 1], \qquad (3)

where $λ$ is an interpretable governance lever that trades off stable preferences and within-basket coherence. In our simulations, $λ$ matches the parameter used to report sensitivity. Note that equation (3) is the online counterpart of the logit in equation (1), with $ϕ_{i}$ replaced by the within-trip posterior $b_{i}^{(s)}$ and with feasibility masking applied before normalization.
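The composite score in (3) reduces to a few matrix operations. The sketch below is illustrative only; the function name `composite_scores` and the toy inputs are assumptions. It uses the identity $\sum_{k} b_{k} v_{k}^{⊤} g_{d} = (\sum_{k} b_{k} v_{k})^{⊤} g_{d}$ to avoid a loop over profiles, and it scores every catalog item, leaving restriction to $C^{(s)}$ to the caller.

```python
import numpy as np

def composite_scores(b, u, v, E, basket, lam):
    """(1 - lam) * prof_d + lam * ctx_d for every item d in the catalog.

    b: (K,) belief; u: (K, D) baselines; v: (K, r) context weights;
    E: (D, r) embeddings; basket: iterable of observed item indices.
    """
    idx = sorted(basket)
    s = E[idx].sum(axis=0) if idx else np.zeros(E.shape[1])
    g = E * s                 # row d is g_d(B^(s))
    prof = b @ u              # sum_k b_k u_{kd}
    ctx = g @ (b @ v)         # sum_k b_k v_k^T g_d, by linearity in k
    return (1 - lam) * prof + lam * ctx

# Toy parameters (assumptions for illustration only).
rng = np.random.default_rng(2)
D, r, K = 5, 2, 3
E, u, v = rng.normal(size=(D, r)), rng.normal(size=(K, D)), rng.normal(size=(K, r))
b = np.array([0.6, 0.3, 0.1])

scores = composite_scores(b, u, v, E, {0, 2}, lam=0.3)
```

Setting `lam=0` with `b` fixed at the long-run mixture yields a score that ignores basket context entirely, which shows how the lever interpolates between the two components.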

Governed slate selection. Given the set of feasible candidates $C^{(s)}$ introduced in Section 4.1, the agent ranks only feasible candidates using the world-model score in (3). The slate of size $L$ is selected from $C^{(s)}$ using a bounded exploration mechanism: with probability $1 - ε$, return the top-$L$ items by score; otherwise, with probability $ε$, perturb a small number of tail positions with lower-ranked feasible items. The exploration rate $ε \in [0, 1]$ is a second auditable lever that regulates how aggressively the policy explores under uncertainty without sacrificing feasibility or control. Importantly, both $λ$ and $ε$ act on top of reused world-model primitives. Adjusting these levers changes policy behavior online without retraining the demand model.
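One way to realize the bounded exploration mechanism is sketched below. The function name `select_slate`, the parameter `n_swap` (how many tail positions are perturbed), and uniform sampling of replacements from the lower-ranked feasible items are all assumptions; the text specifies only that a small number of tail positions are perturbed with probability $ε$.

```python
import numpy as np

def select_slate(scores, feasible, L, eps, n_swap=1, rng=None):
    """Top-L slate from feasible items, with bounded tail exploration.

    scores: mapping (dict or array) from item id to score.
    With probability eps, the last n_swap slate positions are replaced by
    lower-ranked feasible items drawn uniformly without replacement.
    """
    rng = np.random.default_rng() if rng is None else rng
    ranked = sorted(feasible, key=lambda d: -scores[d])
    slate, rest = ranked[:L], ranked[L:]
    if rest and rng.random() < eps:
        m = min(n_swap, len(slate), len(rest))
        picks = rng.choice(len(rest), size=m, replace=False)
        for pos, j in zip(range(len(slate) - m, len(slate)), picks):
            slate[pos] = rest[int(j)]
    return slate

# Hypothetical scores over a governed candidate set.
scores = {"a": 0.9, "b": 0.7, "c": 0.5, "d": 0.3, "e": 0.1}
feasible = ["e", "c", "a", "d", "b"]

greedy = select_slate(scores, feasible, L=3, eps=0.0)
explored = select_slate(scores, feasible, L=3, eps=1.0, rng=np.random.default_rng(0))
```

Exploration here can only swap in items that are already feasible, so feasibility is preserved by construction, and the head of the slate is untouched.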

In summary, the policy at step $s$ proceeds as follows: (1) form the governed candidate set $C^{(s)}$ via masking, (2) score feasible items using reused world-model primitives and the belief state, (3) apply optional bounded rule-based adjustments within $C^{(s)}$ when operations coupling is enabled (see Section 5.4), and (4) select a slate using auditable levers $(λ, ε)$. This organization makes governance first-class, keeps the online agent tightly coupled to the learned world model, and provides a practical path to constraint-aware decision automation in retail environments.

4.4. Relation to nonagentic and classical baselines

Our agent nests several common recommenders that we evaluate under the same holdout protocol, candidate catalog, and operational masking.

Static latent-profile recommender (static). Freeze beliefs and ignore basket context by setting $b_{i}^{(s)} = {\hat{θ}}_{i}$ for all $s$ and $λ = 0$ :

s_{d}^{Static} (i) = \sum_{k = 1}^{K} {\hat{θ}}_{i k} u_{k d} .

This uses the same learned profile-item terms $u_{k d}$ as the agent but removes within-trip adaptation and set-based context.

Context-only (item–item) methods. Drop profiles and score purely from co-occurrence in the current basket $B^{(s)}$ :

PPMI/Lift: sum association scores between candidates and items in $B^{(s)}$ ; no household heterogeneity.

EASEr: closed-form linear item–item model; score $d$ by $(W x^{(s)})_{d}$ where $x^{(s)}$ is the basket indicator and $W$ is the learned symmetric weight matrix.

Item2Vec: cosine similarity between the candidate embedding and the basket embedding.

These methods use the same train split and candidate space but maintain no belief state.
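For reference, a minimal sketch of the PPMI-style baseline is given below. It uses the standard definition of positive pointwise mutual information on a binary basket-item incidence matrix; the exact variant, smoothing, and thresholds used in the experiments are not specified in the text, so the details here are assumptions.

```python
import numpy as np

def ppmi_matrix(X):
    """X: (n_baskets, D) binary incidence matrix. Returns (D, D) PPMI, zero diagonal."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    co = X.T @ X / n                      # joint co-occurrence frequencies
    p = X.mean(axis=0)                    # marginal item frequencies
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(co / np.outer(p, p))
        ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)
    np.fill_diagonal(ppmi, 0.0)           # an item does not recommend itself
    return ppmi

def ppmi_scores(ppmi, basket):
    """Score every item by its summed association with items in the basket."""
    return ppmi[:, sorted(basket)].sum(axis=1)

# Toy data: items 0 and 1 always co-occur, item 2 appears alone.
X = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
scores = ppmi_scores(ppmi_matrix(X), {0})
```

With item 0 in the basket, item 1 receives a score of $\log 2$ and item 2 receives zero. The score uses no household information, which is the limitation the latent-profile layer addresses.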

Nonagentic contextual variant. Include set-based context while freezing beliefs and disabling exploration ( $ε = 0$ ). This preserves the context term but is not agentic because $b_{i}^{(s)}$ does not update within a trip.

Our full agent combines online belief updates ( $b_{i}^{(s)}$ ), the structured, order-invariant context score in (3), and calibrated exploration via $ε$ . Improvements over static isolate the value of within-trip adaptation and context, and improvements over PPMI/EASEr/Item2Vec reflect the added value of household-level heterogeneity and Bayesian belief tracking, holding the catalog, masking rules, and evaluation protocol fixed.

4.5. Policy controls in the agentic layer

The agentic policy exposes a small set of interpretable controls that map directly to how the decision rule behaves online. These are the same objects we use in belief updating, scoring, and slate selection. They are also the auditable levers referenced in our governance definition in Section 1. A key advantage of the governed decision layer is that managers can adjust behavior online without retraining the consumer world model.

Managerial interpretation of governance levers. Table 1 summarizes the main governance levers, what each lever changes in the decision rule, and practical guidance on when to increase or decrease it. Two implications are worth noting. First, the profile-context weight $λ$ primarily controls whether recommendations are more like long-run, habit-stable personalization (low $λ$ ) or trip/coherence completion within the current basket (high $λ$ ). Second, the exploration rate $ε$ is a bounded mechanism for managing uncertainty and drift. Increasing $ε$ can accelerate learning during regime shifts such as campaigns or new-item launches, while decreasing it is preferred in tightly constrained categories where stability and compliance dominate.

Table 1.
Governance levers and managerial interpretation.

Lever	What it controls	When to increase/decrease
$λ$	Preference vs. basket coherence in scoring	Increase for trip-driven baskets (e.g., “fresh meal” baskets where complements matter); decrease for broad stock-up trips (e.g., “household restock” trips where long-run brand/size preferences matter more than cross-item pairing)
$ε$	Bounded exploration in slate selection	Increase under drift/new items (e.g., fast-changing demand during holidays or when assortment rotates); decrease in compliance- or capacity-sensitive categories (e.g., regulated categories or tight inventory caps where mistakes are costly)
Masking, penalties	Feasible action set and scarcity handling	Tighten under stockouts/compliance; relax when availability is stable

Monitoring and guardrails. Because the online agent reuses the probabilistic structure learned in Section 3, we can monitor both model-facing diagnostics (e.g., a rolling ELBO-style fit or posterior concentration) and operational key performance indicators (e.g., fill rate, stockouts, substitution frequency, margin) to detect drift or environmental changes. When triggers fire, governance is exercised by adjusting $(λ, ε)$, tightening or relaxing feasible-catalog rules, or rolling back to a previous configuration, without re-estimating the underlying world model. In Section 5.4, we further study optional, bounded inventory- and price-aware adjustments applied within the governed action space.

5. Data and empirical results

In this section, we provide an empirical evaluation of our consumer world model and the governed agentic policy built on top of it. Because our framework is designed for decision-making under feasibility constraints, we report both offline next-item accuracy on held-out baskets, which tests whether the learned latent profiles and set-based context capture consumer coherence, as well as operational outcomes in a controlled simulation, which tests whether the same scoring primitives translate into improved retail performance under stock constraints.

5.1. Data and evaluation metric

Data and preprocessing. We evaluate the model on the Dunnhumby Complete Journey dataset, a standard benchmark in grocery retail. The data span 2 years of transactions for roughly 2,500 U.S. households at a single large retailer. Each record contains a household ID, basket identifier, timestamp, item identifier, paid price, retailer discount, and a hierarchical taxonomy including department, commodity, and subcommodity. For managerial interpretability, we aggregate stock-keeping units to the commodity level. This reveals category patterns relevant to assortment and promotion (e.g., infant formula, frozen seafood, seasonal décor) and reduces sparsity. To focus on a stable demand structure, we retain the roughly 300 commodities that are consistently active across the sample period. Thus, cold-start behavior for newly introduced products is limited in this dataset and is left for future work. For each household $i$, we sort baskets chronologically and apply an 80/20 split at the basket level. We use the first 80% of trips as training data and the remaining 20% as holdout. Item embeddings are constructed using training baskets only, and holdout trips are used for evaluation only.
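The per-household chronological split can be sketched as follows. The data layout (a dictionary of timestamped baskets per household) and the rounding convention (floor of 80% of trips) are assumptions, as the text does not state how fractional cut points are handled.

```python
def chronological_split(trips_by_household, train_frac=0.8):
    """Split each household's trips by time into (train, holdout).

    trips_by_household: {household_id: [(timestamp, basket), ...]}.
    The first floor(train_frac * n) trips go to train (assumed rounding).
    """
    train, holdout = {}, {}
    for hh, trips in trips_by_household.items():
        ordered = sorted(trips, key=lambda t: t[0])   # chronological order
        cut = int(len(ordered) * train_frac)
        train[hh] = ordered[:cut]
        holdout[hh] = ordered[cut:]
    return train, holdout

# Hypothetical household with five trips in shuffled order.
trips = {"h1": [(3, {"c"}), (1, {"a"}), (2, {"b"}), (5, {"e"}), (4, {"d"})]}
train, holdout = chronological_split(trips)
```

Splitting within each household, rather than across the pooled data, keeps every household's latest trips out of training, so the holdout evaluates prediction of future behavior.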

Evaluation metric. We evaluate next-item recommendation with a basket-reveal protocol as follows.

For each holdout basket, expand into item-level “events.” Because baskets are unordered, we use a randomized permutation per basket; conditionals are set-based, so results are invariant in expectation.

At step $s$ , the policy observes the partial basket $B^{(s)}$ and produces a top- $L$ slate. Throughout, the slate size $L$ is treated as an exogenous user interface (UI)/channel parameter (we use $L = 10$ unless otherwise noted) rather than an endogenous decision that varies by household or profile.

Record a hit if the next held-out item appears in the slate.

For household $i$ , we define the hit rate as

\mathrm{Hit@10}_{i} = \frac{1}{T_{i}} \sum_{s = 1}^{T_{i}} \mathbf{1} \{ \text{true next item} \in \mathrm{slate}_{10}^{(s)} \},

where $T_{i}$ is the number of prediction steps in that household’s holdout baskets. We report the average over all households with nonempty holdout.
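The basket-reveal protocol for one household can be sketched as below. The callable `recommend(observed, L)` is a hypothetical interface standing in for any of the evaluated policies. Counting the first step, where the observed basket is empty, as a prediction step is an assumption about how $T_{i}$ is defined.

```python
import random

def hit_rate(holdout_baskets, recommend, L=10, seed=0):
    """Average hit indicator over all reveal steps of one household.

    recommend(observed, L) must return a collection of at most L item ids.
    """
    rng = random.Random(seed)
    hits, steps = 0, 0
    for basket in holdout_baskets:
        order = sorted(basket)
        rng.shuffle(order)                  # randomized permutation per basket
        for s, target in enumerate(order):
            observed = set(order[:s])       # unordered partial basket B^(s)
            slate = recommend(observed, L)
            hits += int(target in slate)
            steps += 1
    return hits / steps if steps else float("nan")

# A trivial fixed-slate recommender, for illustration only.
fixed = hit_rate([{1, 2, 3}], lambda observed, L: [1, 2], L=2)
```

With the fixed slate `[1, 2]` and the basket `{1, 2, 3}`, two of the three reveal steps are hits under any permutation. Averaging this quantity across households with nonempty holdout gives the reported metric.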

The protocol is applied identically to our proposed solution method as well as the benchmark methods. We next summarize these benchmark methods used in the evaluation.

Benchmark methods. We benchmark the performance of our proposed recommender against four baselines. Specifically, we consider the following five methods:

Agentic latent-profile recommender (Agent proposed): the Agent is the full framework proposed in Sections 3 and 4. We fit a basket-aware latent-profile model with $K = 3$ profiles and a low-rank item embedding $E$ from co-occurrence. Note that $K = 3$ is selected based on held-out performance and interpretability, and sensitivity to nearby choices of $K$ is reported below. Online, the agent maintains a belief state $b_{i}^{(s)}$ , updates it as items are observed, and scores candidates via equation (3) with a tunable profile-context trade-off $λ$ .

Static latent-profile recommender (Static): a nonagentic variant based only on the model proposed in Section 3. Product recommendations are ranked by the household’s offline mix ${\hat{θ}}_{i}$ and base item terms $u_{k d}$, with no within-trip update and no basket context. Comparing the Agent against Static isolates the incremental value of online belief updates and basket awareness.

PPMI/Lift (item–item): classical co-occurrence baseline built from the train-only basket–item incidence matrix. We score each candidate by the sum of pairwise association weights with items in the current basket. This captures set copresence but ignores household heterogeneity.

EASEr (closed-form item–item): modern item–item method that learns a symmetric weight matrix with $ℓ_{2}$ regularization in closed form; scores are linear in the current basket indicator. This is usually stronger than PPMI on sparse catalogs.

Item2Vec (embedding-based item–item): embeds the current basket by summing its item vectors and scores candidates by cosine similarity to this basket vector. This captures high-order co-occurrence beyond pairwise counts.

Choice of the number of latent profiles. We select the number of latent profiles by comparing $K$ on held-out next-item prediction. The agent’s Hit@10 is stable across $K \in \{3, 4, 5\}$, yielding 0.3408, 0.3405, and 0.3373, respectively. Because performance differences are negligible, while a smaller $K$ keeps the latent structure compact and the profile summaries easier to present, we use $K = 3$ in the main results.

5.2. Results

The Agent score in equation (3) trades off long-run preference fit and within-basket coherence through $λ$ . We calibrate $λ$ by sweeping $λ \in {0, 0.05, 0.10, 0.20, 0.30}$ and evaluating Hit@10. A higher Hit@10 indicates more frequent placement of the realized next item in the top-10 slate.

Figure 2 shows that performance improves as $λ$ increases into the $0.2$ – $0.3$ range, with the best value at $λ = 0.30$ , which we use as the default in all results below unless otherwise noted.

Figure 2.

Hit@10 under different basket-context weight $λ$ .

With $λ = 0.30$, Table 2 reports representative results for the five methods described in Section 5.1. The results show that the Agent materially improves accuracy. Relative to using the same latent structure in a nonagentic way, the Agent achieves a 0.1306 increase in Hit@10 over Static. This isolates the value of within-trip belief updating and basket-aware scoring; that is, the agent considers both who the shopper tends to be and what is currently in the cart.

Table 2.

Next-item recommendation performance (Hit@10) on the same holdout users and baskets.

Method	Hit@10	Change relative to Agent
Agent (proposed)	0.3408	–
Static	0.2102	$- 0.1306$
EASEr	0.1958	$- 0.1450$
Item2Vec	0.1740	$- 0.1668$
PPMI	0.0064	$- 0.3344$

The performance gap is driven by reusing the world model for online control. The Static variant ranks items using only the household’s offline profile mix ${\hat{θ}}_{i}$ and baseline item terms $u_{k d}$ , and thus cannot adapt to within-trip evidence. By contrast, the Agent explicitly reuses the same sufficient statistics that underlie offline inference to update its within-trip belief state $b_{i}^{(s)}$ , and the same basket-aware scoring primitives $(u_{k d}, v_{k}, E)$ and the set function $g_{d} (\cdot)$ to compute context-sensitive scores. This reuse shows up most clearly at early reveal steps. Once even a single item is observed, $b_{i}^{(s)}$ concentrates and the context term becomes informative, producing the large step-level lift reported in Figure 3. In other words, the gain is not from a different training objective, but from using the same learned probabilistic structure as a sequential policy.

Figure 3.

Step 2 Hit@10 comparison (Agent vs. Static).

Item–item methods are competitive but dominated. EASEr is the strongest classical baseline here, yet the Agent outperforms it by 0.145. This suggests that combining stable household-level profiles with a structured, profile-specific response to basket context adds predictive power beyond pure item–item similarity. Item2Vec lands in between the baseline methods. However, coarse co-occurrence alone is insufficient. PPMI/Lift performs poorly under a realistic next-item protocol and full candidate set, highlighting the brittleness of naive “people who bought $X$ also bought $Y$ ” rules. The latent-profile layer and agentic updating shape raw co-occurrence into usable decision logic.

From an operations standpoint, the uplift is achieved without deep sequence models. The core is a transparent Bayesian latent-profile layer plus a low-rank item space, which produces interpretable levers such as profiles, basket weights, $λ$ , and exploration rate. The same structure can be constrained by inventory, promotion, and category rules, which we exploit in a downstream simulation to study revenue, stockouts, and controllability under realistic constraints.

5.3. Ablations and robustness

We report diagnostics under the same order-invariant basket-reveal protocol as in Section 5.1. We focus on the first nontrivial context case (one observed item, $| B^{(s)} | = 1$ ), where within-trip belief updating and basket coherence become active.

Figure 3 reports the resulting lift of the Agent over Static once context is available. After one item is observed, Hit@10 is 0.258 under Static and 0.619 under the Agent. Because both methods share the same offline profiles and candidate space, this improvement reflects the Agent’s within-trip belief update and basket-aware context term rather than differences in training data or catalog. In line with the reuse design in Section 4, the lift indicates that the Agent converts the fitted world model into real-time belief updates and context-sensitive scoring.

5.4. Simulation with operational constraints

Offline Hit@10 is computed on held-out historical baskets under a fixed protocol and a largely unconstrained catalog, so it measures how well a model predicts what shoppers actually bought. However, in practice, realized purchases are shaped by uncertainty, which means that feasibility masking, stockouts, and substitution restrict what can be shown and what can be fulfilled. As a result, a simulation Hit@10, computed on these endogenous, constraint-shaped outcomes, can differ from offline accuracy and may compress policy differences even when revenue, margin, and inventory productivity change materially.

To this end, we complement the offline next-item evaluation with a lightweight simulator that couples recommendation to inventory and basic retail operations. We design this simulator to mirror the governed decision layer in Section 4. Specifically, feasibility is enforced before ranking by constructing a governed action space, the policy scores candidates using the reused world-model primitives, optional soft operational constraints are applied within the governed set, and a slate is selected using auditable levers.

5.4.1. Setups

Time and trip generation

Time evolves in discrete periods $τ = 1, \dots, T$ . In each period, households stochastically generate trips calibrated to their training frequencies. When a trip occurs, items are revealed sequentially within the trip using the same order-invariant staging protocol as in the offline evaluation, that is, at within-trip step $s$ , the policy observes the unordered set of the first $s - 1$ items, denoted $B^{(s)}$ , and produces a top- $L$ recommendation slate.

Operational state

The environment maintains per-item on-hand inventory $I_{τ d}$ , promotion/eligibility flags $P_{τ d}$ , and optional group constraints $G$ (e.g., at most one item per subcategory). Inventory decreases upon fulfillment and is replenished according to a simple deterministic restocking schedule.

Governed action space

At step $s$ in period $τ$ , the policy ranks only a governed candidate set obtained by masking the catalog using operational state and retailer rules according to equation (2).
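A hedged sketch of such a mask is given below. Equation (2) is not reproduced here, so the exact rule set is an assumption: the sketch filters on in-stock status, an eligibility flag, and an "at most one item per group" rule counted against the current basket. The function and argument names are hypothetical, and excluding items already in the basket is an additional illustrative choice.

```python
from typing import Dict, Iterable, List, Set


def governed_candidates(
    catalog: Iterable[str],
    on_hand: Dict[str, int],
    eligible: Dict[str, bool],
    group_of: Dict[str, str],
    basket: Set[str],
    max_per_group: int = 1,
) -> List[str]:
    """Return the feasible candidate set; ranking happens only over this set."""
    group_counts: Dict[str, int] = {}
    for item in basket:
        g = group_of.get(item)
        if g is not None:
            group_counts[g] = group_counts.get(g, 0) + 1

    feasible = []
    for d in catalog:
        if d in basket:                      # assumed: do not re-recommend basket items
            continue
        if on_hand.get(d, 0) <= 0:           # out of stock
            continue
        if not eligible.get(d, True):        # fails promotion/eligibility flag
            continue
        g = group_of.get(d)
        if g is not None and group_counts.get(g, 0) >= max_per_group:
            continue                         # group constraint already saturated
        feasible.append(d)
    return feasible


catalog = ["a", "b", "c", "d", "e"]
on_hand = {"a": 2, "b": 0, "c": 5, "d": 1, "e": 3}
eligible = {"c": False}
group_of = {"a": "dairy", "d": "dairy"}
empty_basket = governed_candidates(catalog, on_hand, eligible, group_of, set())
with_a = governed_candidates(catalog, on_hand, eligible, group_of, {"a"})
```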

Policies and world-model reuse

We evaluate the Agent, the nonagentic Static baseline, and (in ablations) classic linear baselines (EASEr, PPMI, Item2Vec). Consistent with Section 4, the Agent scores candidates using the same primitives learned offline and reused online: belief updates produce $b_{i}^{(s)}$ , and candidate scores are computed from $(u_{k d}, v_{k}, E)$ through equation (3).

Soft constraints within the governed set

To couple ranking to operations without changing the underlying demand model, we optionally apply transparent, bounded score adjustments within the governed candidate set $C_{τ}^{(s)}$ . In our implementation, we include an inventory-scarcity penalty that downweights low-stock items while keeping feasibility enforced by masking:

$$\tilde{s}_{d}^{(τ, s)} = s_{d}^{(τ, s)} - κ \left( 1 - \left( \frac{I_{τ d}}{\max_{j \in C_{τ}^{(s)}} I_{τ j}} \right)^{γ} \right), \quad d \in C_{τ}^{(s)},$$

with $κ > 0$ and $γ > 0$. We also allow a tiny price or margin nudge only when stock is healthy (e.g., $I_{τ d}$ above a threshold), with the nudge magnitude capped so it cannot dominate preference signals. All such hooks act only on scores and do not alter belief updates or the fitted world model.
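The scarcity penalty can be written out directly. The sketch below is illustrative: the function name and the example values of $κ$ and $γ$ are assumptions, and the scores are placeholders rather than outputs of the fitted world model.

```python
from typing import Dict


def scarcity_adjusted(
    scores: Dict[str, float],
    on_hand: Dict[str, int],
    kappa: float,
    gamma: float,
) -> Dict[str, float]:
    """Apply s~_d = s_d - kappa * (1 - (I_d / max_j I_j) ** gamma) over the candidate set."""
    max_stock = max(on_hand[d] for d in scores)
    if max_stock <= 0:
        # Cannot happen after masking, since every candidate is in stock.
        raise ValueError("candidate set must contain at least one in-stock item")
    return {
        d: s - kappa * (1.0 - (on_hand[d] / max_stock) ** gamma)
        for d, s in scores.items()
    }


# Two equally preferred candidates; "b" holds half the stock of "a".
adjusted = scarcity_adjusted(
    scores={"a": 1.0, "b": 1.0},
    on_hand={"a": 10, "b": 5},
    kappa=0.2,
    gamma=1.0,
)
```

The best-stocked candidate is never penalized, and the maximum penalty is bounded by $κ$, which is what keeps the adjustment from dominating preference signals.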

Slate selection with auditable controls

Given scores ${\tilde{s}}_{d}^{(τ, s)}$ over $d \in C_{τ}^{(s)}$ , the slate of size $L$ is formed as

$$\mathrm{TopL}\left( \tilde{s}_{d}^{(τ, s)};\; d \in C_{τ}^{(s)} \right) \quad \text{with } ε\text{-greedy exploration},$$

where $ε$ is the bounded exploration control and $λ$ enters the score via equation (3). This makes the simulator consistent with the governed decision layer in that feasibility and rules determine what can be recommended, while $(λ, ε)$ regulate how the policy behaves over feasible options.
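One way to realize this selection rule is sketched below. The text does not specify how $ε$-greedy exploration is applied at the slate level, so the slot-by-slot rule used here (with probability $ε$ a slot is filled by a uniformly random remaining feasible item, otherwise by the best remaining item) is an assumption, as are the function and variable names.

```python
import random
from typing import Dict, List


def select_slate(
    scores: Dict[str, float],
    slate_size: int,
    epsilon: float,
    rng: random.Random,
) -> List[str]:
    """Fill up to slate_size slots from the governed candidate set."""
    # Sort by descending score; ties are broken by item id for determinism.
    remaining = sorted(scores, key=lambda d: (-scores[d], d))
    slate: List[str] = []
    while remaining and len(slate) < slate_size:
        if rng.random() < epsilon:
            pick = rng.choice(remaining)   # explore: random feasible item
        else:
            pick = remaining[0]            # exploit: best remaining score
        remaining.remove(pick)
        slate.append(pick)
    return slate


scores = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.1}
greedy = select_slate(scores, slate_size=2, epsilon=0.0, rng=random.Random(0))
explore = select_slate(scores, slate_size=3, epsilon=1.0, rng=random.Random(0))
```

Because exploration draws only from the governed set, it can change which feasible items are shown but cannot surface an infeasible one.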

Consumer choice and fulfillment

Conditional on a trip, the consumer forms a purchase desire over the shown slate using a logit utility

$$u_{τ d} = α s_{d}^{(τ, s)} + β_{\text{prom}} \mathbf{1}\{P_{τ d}\} - β_{\text{price}} \log (1 + \text{price}_{d}),$$

and attempts to purchase the argmax draw. Because infeasible items are masked, the attempted item is always in stock at the moment of recommendation. To capture operational pressure, we define a desired item as the consumer’s top choice under the unmasked preference ranking (i.e., before feasibility filtering). When the desired item is unavailable (out of stock), the consumer substitutes the best available in-stock alternative from the recommended slate. Otherwise, the desired item is purchased. Successful fulfillment decreases inventory and books revenue and margin.
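The utility and the substitution rule can be sketched as follows. This is an illustration under stated assumptions: the coefficient values are arbitrary, the stochastic draw is omitted, and "best available" is taken to mean the highest-utility in-stock item on the slate. The function names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple


def logit_utility(
    score: float,
    on_promo: bool,
    price: float,
    alpha: float,
    beta_prom: float,
    beta_price: float,
) -> float:
    """u = alpha * s + beta_prom * 1{P} - beta_price * log(1 + price)."""
    return alpha * score + beta_prom * (1.0 if on_promo else 0.0) - beta_price * math.log1p(price)


def resolve_purchase(
    desired: str,
    slate_utility: Dict[str, float],
    on_hand: Dict[str, int],
) -> Tuple[Optional[str], bool]:
    """Return (purchased item, substituted?) for one purchase occasion."""
    if on_hand.get(desired, 0) > 0:
        return desired, False
    in_stock = {d: u for d, u in slate_utility.items() if on_hand.get(d, 0) > 0}
    if not in_stock:
        return None, False  # no feasible alternative on the slate
    return max(in_stock, key=in_stock.get), True


u = logit_utility(score=1.0, on_promo=True, price=0.0, alpha=2.0, beta_prom=0.5, beta_price=1.0)
stock = {"x": 0, "a": 3, "b": 1}
substituted = resolve_purchase("x", {"a": 1.0, "b": 2.0}, stock)
direct = resolve_purchase("a", {"a": 1.0, "b": 2.0}, stock)
```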

Configuration and metrics

We initialize on-hand levels proportional to item popularity, restock deterministically every fixed number of sessions, and apply mild nudges only above stock thresholds. We track a simulation analogue of next-item accuracy, Hit@10, defined as whether the simulated purchased item at step $s$ appears in the recommended top-10 slate, along with revenue and margin, purchases and average ticket, average desired price, substitution rate, stockout pressure, and fill rate. Specifically, we define stockout pressure as the fraction of purchase occasions where the consumer’s unconstrained top-choice item (before feasibility filtering) is unavailable, and we define substitution rate as the fraction of occasions where the ultimately fulfilled purchase differs from that top choice. Because substitution is enabled and the policy masks infeasible items from the recommendation slate, the fill rate is 100% by design. Therefore, we use stockout pressure to capture operational scarcity rather than lost sales.
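The two scarcity metrics can be computed from a log of purchase occasions, as in the sketch below. The record layout `(desired, desired_in_stock, purchased)` and the example log are hypothetical.

```python
from typing import List, Tuple

Occasion = Tuple[str, bool, str]  # (desired item, desired item in stock?, purchased item)


def operational_rates(occasions: List[Occasion]) -> Tuple[float, float]:
    """Return (stockout pressure, substitution rate) over purchase occasions."""
    n = len(occasions)
    pressure = sum(1 for _, in_stock, _ in occasions if not in_stock) / n
    substitution = sum(1 for desired, _, bought in occasions if bought != desired) / n
    return pressure, substitution


# Four occasions; the desired item is unavailable once and is substituted.
log = [
    ("a", True, "a"),
    ("b", True, "b"),
    ("c", False, "d"),
    ("a", True, "a"),
]
pressure, substitution = operational_rates(log)
```

When every stockout event is resolved by substitution and the desired item is always bought when available, the two rates coincide, as they do in this toy log.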

5.4.2. Results

We run the simulator for 10 random seeds with identical operational settings and compare policies using paired seed-by-seed differences (Agent $-$ Static). Table 3 summarizes the results including mean $Δ \pm$ pooled SD, along with $p$-values. Statistically significant paired differences are marked bold.

Table 3.
Simulation results: paired differences (Agent $-$ Static) across 10 seeds.

Outcome	Mean $Δ$	Pooled SD	$p$ -Value
Hit@10	$+ 0.002$	$0.003$	$0.182$
Revenue	$+ 5544$	$333$	$< 10^{- 11}$
Purchases	$+ 3365$	$68$	$< 10^{- 15}$
Avg. ticket	$- 0.76$	$0.05$	$< 10^{- 11}$
Avg. desired price	$- 0.82$	$0.06$	$< 10^{- 11}$
Substitution rate	$+ 3.5 %$	$0.2 %$	$< 10^{- 12}$
Stockout pressure	$+ 3.5 %$	$0.2 %$	$< 10^{- 12}$
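A paired seed-by-seed comparison of this kind can be sketched as follows. The numbers are made up for illustration and are not the simulation outputs. The sketch reports the standard deviation of the paired differences and the corresponding paired $t$ statistic, which is one common choice; the pooled SD reported in Table 3 may be defined differently.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Tuple


def paired_summary(agent: List[float], static: List[float]) -> Tuple[float, float, float, int]:
    """Return (mean difference, SD of differences, paired t statistic, degrees of freedom)."""
    diffs = [a - s for a, s in zip(agent, static)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)                 # sample SD, requires n >= 2
    t_stat = mean_diff / (sd_diff / sqrt(n))
    return mean_diff, sd_diff, t_stat, n - 1


# Illustrative per-seed outcomes for three seeds (not taken from the paper).
mean_diff, sd_diff, t_stat, dof = paired_summary(
    agent=[3.0, 5.0, 7.0],
    static=[1.0, 2.0, 3.0],
)
```

Pairing by seed removes seed-level variation that is common to both policies, so the comparison isolates the policy effect.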

Because substitution is always enabled and infeasible items are masked, the fill rate is 100% for both policies. Table 3 suggests that the Agent delivers significantly higher revenue primarily by driving more purchases of slightly cheaper baskets. Meanwhile, the simulation Hit@10 changes little because feasibility masking and substitution reshape the realized choice process and compress differences in predicting the purchased next item. The higher substitution rate and stockout pressure reflect that the Agent more often recommends high-demand items and, when their inventory constraints bind, routes to feasible substitutes rather than suppressing demand.

The key operations insight is that, under binding feasibility, maximizing an offline ranking metric like Hit@10 is not sufficient for economic outcomes. Relative to Static, the Agent increases revenue primarily by driving more purchases with slightly cheaper baskets while managing feasibility through substitution within the governed action space, leaving Hit@10 essentially unchanged. The higher substitution and stockout-pressure rates indicate that the Agent more often surfaces high-demand items, and when these bind, value is preserved through controlled substitution rather than suppressing demand.

Overall, the simulation results demonstrate how a governance-first policy organization, combined with world-model reuse and lightweight bounded operations hooks, can shift economic outcomes in inventory-constrained environments without extensive retraining.

5.5. External validity and robustness

In this subsection, we test whether the economic lift of the agentic policy is confined to particular parts of the catalog, sensitive to reasonable changes in replenishment, or an artifact of price and volume mix. Across the following checks, we find that the qualitative pattern is stable. Relative to Static, the Agent increases revenue and margin with moderate, bounded increases in stockout pressure and substitution. In addition, $Δ$ Hit@10 remains small, further indicating that operational coupling drives the gains, instead of a change in offline ranking accuracy.

Robustness to catalog scope. We first test whether the simulation lift is driven by specific parts of the catalog. We partition the catalog into three super-departments (CENTER_STORE_FOOD, FRESH, NONFOOD) and rerun the simulator with inventory and restocks restricted to one super-department at a time. This ensures that within-group feasibility drives activity and prevents cross-group spillovers. Table 4 shows positive revenue and margin deltas in all three, with modest increases in stockout and substitution rates. We also observe that $Δ$ Hit@10 is near zero or slightly negative, consistent with the main result that revenue lift is not explained by a ranking-accuracy jump. (Note that in this configuration, substitution is enabled and lost sales are not allowed, so every stockout pressure event results in a substitution. Hence $Δ$ SubstRate can equal $Δ$ SO_Pressure by construction.)

Robustness to basket size. Next, we examine whether the economic lift is concentrated in a particular size of basket. Using the transaction data, we first compute each household’s average number of items per basket and partition households into three tiers: Small (T1), Medium (T2), and Large (T3). Then, we rerun the simulation within each segment using the same operational settings as in the main specification. Table 5 shows that the Agent improves both revenue and margin in all three trip-size groups, indicating that the economic lift is not confined to a single basket-size regime. The gains are largest for medium-size trips, but remain positive for both smaller and larger trips. Meanwhile, $Δ$ Hit@10 is negative in all three segments, consistent with our broader finding that the economic lift is not driven by a simple improvement in offline ranking accuracy. In this segmented configuration, differences in stockout pressure and substitution are negligible, so the main heterogeneity appears in the magnitude of economic lift rather than in operational feasibility.

Table 4.
Agent and Static policy comparison.

	Center_Store_Food	Fresh	Nonfood
No. of items	95	82	124
Purchases fulfilled (Agent)	7,123.6	5,529.9	6,031.1
Purchases fulfilled (Static)	5,092.3	4,763.3	4,945.3
$Δ$ Hit@10	$- 0.04$ $\pm$ $0.00$	$- 0.01$ $\pm$ $0.00$	$- 0.01$ $\pm$ $0.00$
$Δ$ Revenue	$4,552.99$ $\pm$ $111.56$	$2,518.85$ $\pm$ $299.56$	$4,483.16$ $\pm$ $512.48$
$Δ$ Margin	$1,365.90$ $\pm$ $33.47$	$755.66$ $\pm$ $89.87$	$1,344.95$ $\pm$ $153.74$
$Δ$ SO_Pressure	$7.20 %$ $\pm$ $0.73 %$	$18.71 %$ $\pm$ $1.31 %$	$8.56 %$ $\pm$ $0.56 %$
$Δ$ SubstRate	$2.29 %$ $\pm$ $0.52 %$	$6.64 %$ $\pm$ $0.89 %$	$8.56 %$ $\pm$ $0.56 %$

Table 5.

Basket-size robustness across seeds (mean $\pm$ SD).

Trip-size segment	$Δ$ Hit@10	$Δ$ Revenue	$Δ$ Margin
Small (T1)	$- 0.031$ $\pm$ 0.010	1,950.00 $\pm$ 85.63	585.00 $\pm$ 25.69
Medium (T2)	$- 0.035$ $\pm$ 0.007	2,920.50 $\pm$ 111.12	876.15 $\pm$ 33.34
Large (T3)	$- 0.035$ $\pm$ 0.011	1,623.00 $\pm$ 78.43	486.90 $\pm$ 23.53

Robustness to slate size. Because retail interfaces and channels differ in how many items can be displayed in practice, we rerun the simulation with slate sizes $L \in \{5, 10, 15\}$ to test whether the Agent’s advantage depends on a particular slate size. Table 6 shows that the Agent continues to outperform the Static benchmark across all tested slate sizes. The revenue and margin lifts remain positive throughout, while $Δ$ Hit@10 changes only modestly. Moreover, stockout pressure remains moderate and declines slightly as the slate expands, suggesting that broader slates provide more flexibility to surface viable alternatives without overturning the qualitative advantage of the Agent. These results indicate that the economic gains are not an artifact of a single fixed slate size and remain stable across plausible UI and channel configurations.

Table 6.

Slate-size robustness across seeds (mean $\pm$ SD).

Slate size $L$	$Δ$ Hit@10	$Δ$ Revenue	$Δ$ SO_Pressure
5	$- 0.013$ $\pm$ 0.003	19,011.00 $\pm$ 339.97	3.65% $\pm$ 0.25%
10	$- 0.013$ $\pm$ 0.004	19,693.00 $\pm$ 432.87	2.85% $\pm$ 0.20%
15	$- 0.015$ $\pm$ 0.004	20,399.50 $\pm$ 285.16	2.42% $\pm$ 0.12%

Robustness to replenishment. We then test sensitivity to replenishment dynamics by varying the restocking cycle (every 150 vs. 188 sessions) and lot size (baseline vs. $+ 25 %$ ), holding the recommendation policy fixed.

The pattern in Table 7 suggests that lengthening the cycle raises stockout pressure and slightly erodes the revenue lift, which is operationally intuitive. It also shows that larger lots partially offset this pressure, improving revenue while managing stockouts via substitution under capacity constraints.

Robustness to exploration. Finally, we examine the exploration control $ε$ , an explicit managerial lever in the governed agent, to characterize the exploration–exploitation trade-off in both ranking metrics and economic outcomes. Table 8 reports the Agent’s performance as $ε$ varies. The results show a clear and predictable exploration–exploitation trade-off. As $ε$ increases, Hit@10 declines, consistent with greater exploration, while revenue rises monotonically. At the same time, stockout pressure also increases, indicating that the policy more often surfaces high-demand items that subsequently bind under inventory constraints. Margin and purchases increase in the same direction (not shown for brevity). We therefore interpret $ε$ as a meaningful governance lever, where managers can tune it to trade off ranking fidelity against economic lift and operational pressure. In the main specification, we retain $ε = 0.05$ as a conservative exploration setting.

Table 7.

Restocking sensitivity of the Agent relative to the Static policy.

Setting	$Δ$ Revenue	$Δ$ SO_Pressure
Every 150, qty $\times 1.00$	6,781 $\pm$ 1,148	3.37% $\pm$ 0.16%
Every 150, qty $\times 1.25$	7,039 $\pm$ 1,004	3.07% $\pm$ 0.15%
Every 188, qty $\times 1.00$	5,694 $\pm$ 615	3.91% $\pm$ 0.20%
Every 188, qty $\times 1.25$	5,823 $\pm$ 484	3.67% $\pm$ 0.22%

Table 8.

Sensitivity to the exploration control $ε$ across seeds (mean $\pm$ SD).

$ε$	Agent Hit@10	Revenue	SO_Pressure
0.00	0.134 $\pm$ 0.001	41,640.00 $\pm$ 228.10	0.0% $\pm$ 0.0%
0.05	0.127 $\pm$ 0.002	42,404.00 $\pm$ 309.52	2.8% $\pm$ 0.2%
0.20	0.116 $\pm$ 0.003	43,352.50 $\pm$ 313.75	5.6% $\pm$ 0.3%
0.30	0.108 $\pm$ 0.001	44,172.50 $\pm$ 285.56	8.1% $\pm$ 0.4%

Mechanism: inventory productivity. To rule out a pure price/volume artifact, we report gross margin, average inventory carrying cost, average units on hand, gross margin return on inventory (GMROI), and revenue per on-hand unit in Table 9.

(For clarity, we define GMROI as gross margin divided by average inventory investment at cost over the simulation horizon, that is, $GMROI = Margin / AvgInvCost$ .) Based on the results, the Agent raises revenue and margin while reducing average inventory levels, lifting both GMROI and revenue-per-on-hand. This indicates that the lift comes from better inventory productivity, that is, more transactions at slightly lower tickets with healthy substitution, rather than simply pushing higher-priced baskets.
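As a worked check of these definitions, the sketch below recomputes GMROI and revenue per on-hand unit from the seed-averaged values reported in Table 9. Taking ratios of the reported means is an approximation to averaging per-seed ratios, and the dictionary layout and function name are illustrative.

```python
from typing import Dict, Tuple

# Seed-averaged values as reported in Table 9.
table9: Dict[str, Dict[str, float]] = {
    "Agent": {"revenue": 23490.90, "margin": 7047.27, "avg_inv_cost": 34671.08, "avg_units": 13105.65},
    "Static": {"revenue": 17791.00, "margin": 5337.30, "avg_inv_cost": 36666.04, "avg_units": 14826.10},
}


def productivity(metrics: Dict[str, float]) -> Tuple[float, float]:
    """Return (GMROI, revenue per on-hand unit)."""
    gmroi = metrics["margin"] / metrics["avg_inv_cost"]
    revenue_per_unit = metrics["revenue"] / metrics["avg_units"]
    return gmroi, revenue_per_unit


agent_gmroi, agent_rpu = productivity(table9["Agent"])
static_gmroi, static_rpu = productivity(table9["Static"])
delta_gmroi = agent_gmroi - static_gmroi
delta_rpu = agent_rpu - static_rpu
```

Rounded to two decimals, these ratios match the GMROI and revenue-per-on-hand rows of Table 9: the Agent earns more margin on a smaller average inventory investment.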

Overall, these robustness checks show that the Agent’s revenue and margin gains are not driven by a single setting: they generalize across broad product groups and persist under plausible changes in replenishment frequency and lot size, exploration intensity, basket size, and slate size. Improvements in GMROI and revenue-per-on-hand point to higher inventory productivity as the main mechanism. Consistent with the governed-decision framing, the Agent tends to surface feasible, in-demand complements and enables substitution when constraints bind, yielding more completed baskets at slightly lower tickets without relying on deeper sequence models or heavy retraining.

Table 9.

Economic metrics across seeds (mean $\pm$ SD).

Metric	Agent	Static	$Δ$ (Agent $-$ Static)
Revenue	23,490.90 $\pm$ 188.20	17,791.00 $\pm$ 226.85	5,699.90 $\pm$ 264.12
Margin	7,047.27 $\pm$ 56.46	5,337.30 $\pm$ 68.06	1,709.97 $\pm$ 79.23
AvgInvCost	34,671.08 $\pm$ 65.87	36,666.04 $\pm$ 79.40	$-1,994.97$ $\pm$ 92.44
AvgUnitsOnHand	13,105.65 $\pm$ 29.96	14,826.10 $\pm$ 24.35	$-1,720.45$ $\pm$ 43.25
GMROI	0.20 $\pm$ 0.00	0.15 $\pm$ 0.00	0.06 $\pm$ 0.00
Revenue per On-Hand	1.79 $\pm$ 0.02	1.20 $\pm$ 0.02	0.59 $\pm$ 0.02

6. Discussion

In this paper, we propose a basket-aware, order-invariant decision framework that bridges offline demand modeling and online, operationally governed recommendation. We develop a latent-profile world model that separates stable household preferences from within-trip basket context, and we show how the same learned structure can be reused by an agentic layer to make sequential, feasibility-aware decisions. This design reflects practical retail operations, such as inventory constraints, substitution, and manager-tunable controls, while retaining enough flexibility to capture complementarity within baskets.

From a modeling perspective, our contribution is twofold. First, we introduce an order-invariant formulation of basket context that remains candidate-dependent, so that context meaningfully affects relative rankings without imposing an artificial checkout order on basket data. Second, we show how a shared low-rank item representation can stabilize estimation in large catalogs while serving as a common interface between offline inference, online belief updating, and real-time scoring.

From an operational standpoint, the agentic layer illustrates how probabilistic preference models can be used under explicit guardrails. The composite score decomposes into a long-run preference component and a within-trip coherence component, combined through an interpretable weight with additional controls. These controls, such as feasibility masking, bounded exploration, and scarcity-based score adjustments, allow the policy to adapt to inventory and business rules without retraining the underlying world model. Empirically, this separation helps us explain why decision quality need not appear as a large change in conventional accuracy. From our simulation, we find that economic outcomes (e.g., revenue and inventory productivity) move materially even when the hit rate changes little, because feasibility and substitution reshape the realized choice and fulfillment process. This highlights a practical implication for OM: evaluating decision systems in constrained environments requires the consideration of outcome metrics (e.g., GMROI, revenue per on-hand, stockout pressure, substitution) in addition to offline ranking accuracy.

The framework also supports monitoring and controlled experimentation. Because the agent’s belief updates mirror the offline inference logic, changes in posterior concentration and evidence accumulation can provide auditable signals for state changes. More broadly, the separation between preference learning and operational scoring enables organizations to adjust governance parameters independently of model retraining, enabling faster iteration and safer implementation.

6.1. Limitations and future research

This work also opens up multiple research opportunities at the intersection of OM, agentic decision-making, and governance. We highlight the following limitations and directions:

We condition on realized basket sizes. Jointly modeling basket formation and stopping decisions would connect the consumer world model to OM outcomes such as welfare, congestion, and substitution under capacity and service constraints.

Our simulation abstracts from strategic consumer responses to recommendations. A richer consumer-behavior model that captures persuasion and learning effects, that is, how recommendations change choices and how consumers adapt over time, is a valuable extension and a natural direction for future work. In addition, incorporating consumer learning and anticipation would enable agentic analysis of feedback loops, including when governance policies mitigate unintended demand shifting.

Our strongest gains arise when feasibility binds and substitution is meaningful. In regimes with weak substitution or near-perfect availability, the operational uplift may be smaller. Likewise, under highly volatile, short-horizon preferences (e.g., trend- or event-driven baskets), stronger short-horizon signals (e.g., recency weighting or seasonality/promotion indicators) may be needed for belief tracking. This motivates OM-style characterization of when a feasibility-aware recommendation is most valuable and how governance levers should adapt across scarcity regimes.

The current evaluation emphasizes consistently active items. Extensions are needed for cold-start products and rapidly changing assortments, where the agent must learn safely online. This setting highlights governance trade-offs between bounded exploration and business risk.

Multi-objective governance (e.g., margin, fairness, long-run brand objectives, compliance) and field validation remain open directions, including methods for auditing decisions and translating policy levers into manager-operable controls.

Overall, this research demonstrates how basket-aware demand modeling and agentic decision-making can be integrated in a way that is both statistically principled and operationally actionable. By emphasizing order invariance, auditable controls, and feasibility-first decision-making, the proposed approach provides a foundation for implementing AI-driven retail decision systems that integrate predictive modeling with operational objectives.

Supplemental Material

Supplemental material, sj-pdf-1-pao-10.1177_10591478261458079 for Governed agentic AI for retail baskets: A consumer world model with inventory-aware actions by Xiexin Liu and Xinwei Chen in Production and Operations Management

Footnotes

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental material

Supplemental material for this article is available online (doi: ).

How to cite this article

Liu X and Chen X (2026) Governed agentic AI for retail baskets: A consumer world model with inventory-aware actions. Production and Operations Management x(x): 1–19.
