Fair Incentives for Repeated Engagement

Abstract

We study a decision-maker’s problem of finding optimal monetary incentive schemes for retention when faced with agents whose participation decisions (stochastically) depend on the incentive they receive. Our focus is on policies constrained to fulfill two fairness properties that preclude outcomes wherein different groups of agents experience different treatment on average. We formulate the problem as a high-dimensional stochastic optimization problem and study it through the use of a closely related deterministic variant. We show that the optimal static solution to this deterministic variant is asymptotically optimal for the dynamic problem under fairness constraints. Though solving for the optimal static solution gives rise to a nonconvex optimization problem, we uncover a structural property that allows us to design a tractable, fast-converging heuristic policy. Traditional schemes for retention ignore fairness constraints; indeed, the goal in these is to use differentiation to incentivize repeated engagement with the system. Our work (i) shows that even in the absence of explicit discrimination, dynamic policies may unintentionally discriminate between agents of different types by varying the type composition of the system, and (ii) presents an asymptotically optimal policy to avoid such discriminatory outcomes.

Keywords

Stochastic models; Algorithmic fairness; Platform churn; Customer retention

1. Introduction

Stakeholder retention is a fundamental challenge faced by many organizations. For instance, it was said of former IBM executive Buck Rodgers that he “behaved as if every IBM customer were on the verge of leaving and that [he’d] do anything to keep them from bolting” (Rodgers and Shook, 1986). Indeed, though it is conventional wisdom that continued growth is necessary to ensure the success of a business, studies have routinely found that increasing customer retention rates by as little as 5% could lead to an increase in profits of up to 95% (Gallo, 2014).

Of the many ways in which an organization can increase retention, one important lever it has at its disposal is that of unconditional monetary incentives, which are paid out regardless of the retention outcome. Such incentives are used in many practical contexts:

Customer-side retention by for-profit corporations: Many loyalty programs use monetary rewards, for example, gift cards in exchange for redeemable points, to retain customers. When the redemption rewards vary with time and points need to be redeemed at an arbitrary point of time (e.g., after a certain point balance has been achieved but before they expire), the customer’s reward can be modeled as, effectively, random. A concrete example of this arises in luxury condos that offer reward programs to their tenants (Gables Residential, 2023). Though tenants receive loyalty points every month, points can only be redeemed when the balance reaches a certain point. In addition, points need to be redeemed quickly, lest they expire, and their redemption is independent of a lease renewal decision, that is, they are unconditional.

Nonprofit organizations: Lotteries have been found to be extremely effective in incentivizing charitable giving in empirical studies (Landry et al., 2006). In practice, certain banks have set up programs, referred to in Germany as Gewinnsparen, that enroll their clients for recurring donations for local causes in exchange for a chance to win a cash amount (Stiftung Warentest, 2004). Monetary awards have also been found to effectively combat volunteer attrition, which plagues nonprofit organizations across the board (Downs et al., 2014; Frey and Gallus, 2017).

In implementing these sorts of monetary incentives, one important concern that a decision-maker faces is that of fairness, especially in light of the abundance of examples in which algorithms deployed in the real world unintentionally discriminate against protected groups (Kleinberg et al., 2018). Within the context of monetary incentives for retention, a decision-maker may want to maximize her bang-per-buck by paying individuals with different earnings sensitivities different amounts in order to minimize the cost of retaining them. However, since individuals’ earnings sensitivities correlate with protected classes such as gender (Heckert et al., 2002), such unconstrained policies would discriminate between classes.

Our work aims to better understand the role of fairness in the optimal design of unconditional monetary incentives for repeated engagement. Indeed, despite the frequent use of such incentives in practice, as well as extensive empirical work on their effectiveness in a variety of settings, to the best of our knowledge, there have been few attempts to develop theoretical insights into their design and the associated risk of algorithmic group discrimination. We further detail our contributions below.

1.1. Summary of Contributions

We consider a model wherein agents join a system in each (discrete-time) period, and receive a (possibly random) reward to remain in the system in the next period. At the end of the period, unaware of the underlying distribution from which rewards are drawn, agents probabilistically make a decision based on the reward received in the period to stay in the system, or leave once and for all. Specifically, we assume that agents are partitioned into types defined by (i) their sensitivity to rewards, formalized via a departure function that maps rewards received to the probability of departing, and (ii) the rate at which they join the system.¹ This model, though simple, gives rise to a spectrum of models of agent behavior. In particular, most of our results hold for any departure functions, as long as these functions are nonincreasing in the reward paid out to an agent.

The decision-maker collects some revenue associated with the number of agents in the system in each period and incurs the cost associated with incentivizing these agents to stay according to the chosen reward distribution (where the support of this distribution is assumed to be an arbitrary finite set). The goal of the decision-maker is to determine the optimal policy to maximize her long-run average profit. As is common in classical stochastic control problems, the infinite-horizon Markov Decision Process (MDP) associated with the decision-maker’s optimization problem suffers from the curse of dimensionality, which motivates the task of finding near-optimal policies.

One natural approach a decision-maker may want to take to maximize her profit is to learn, then discriminate: given the history of rewards paid out to each agent, the decision-maker could try to estimate each agent’s type, and “target” agents whom she believes would stay in the system for lower rewards. However, not only is such explicit discrimination potentially problematic from a public relations standpoint, but it also runs counter to group fairness, which at a high level requires that an algorithm treat (reward) individuals belonging to different groups (e.g., demographic groups) similarly.² Avoiding this sort of explicit discrimination, the decision-maker can then turn to dynamic policies that draw rewards i.i.d. from the same distribution in each period, all the while actively managing the number of agents in the system by varying the distribution across periods. However, even these seemingly fair policies may discriminate implicitly: we show that they can lead to some groups receiving consistently higher rewards than others due to the members of the former (latter) group self-selecting into periods with higher (lower) rewards.

Against this backdrop, our work aims to characterize retention policies that fulfill two stringent fairness requirements: (i) agents must be paid from the same reward distribution in each time period, and (ii) agents of different types must experience the same reward distribution on average, over a long enough time horizon (Definition 1). The two fairness constraints respectively enforce distributional envy-freeness within and across periods. On the one hand, if one views the reward distribution abstraction as agents playing a lottery for retention, distributional envy-freeness within periods ensures that agents know they are playing the same lottery. This is a reasonable principle given that, in the systems we consider, agents are symmetric from a revenue perspective (e.g., there is no “specialization” across types). On the other hand, though group fairness may not be a requirement in all settings, the lack thereof may be problematic (for instance, in settings where departure probabilities are correlated with protected classes such as gender and race (Heckert et al., 2002)). In particular, in such contexts, an audit—or even a company-authored DEI report—might find in hindsight that, despite avoiding explicit discrimination and not even attempting to learn the types of agents, an organization may nonetheless pay higher bonuses/rewards to some demographic than to another; whether or not this is justifiable, it poses a potential risk that organizations should be aware of.

In our first contribution, we show that there exists a static policy that is asymptotically optimal amongst the space of all policies that fulfill our fairness criteria (Theorems 1 and 3). We also show that dynamic policies that violate these criteria can strictly outperform any static policy (Propositions 2 and 3). Taken together, these results imply that the asymptotic value of dynamic policies in our setting arises solely from the ability to (implicitly) discriminate between types, that is, from exploiting different retention probabilities across groups. Though we do not seek to prescribe our stringent fairness constraints for all retention settings, we view this as a main insight of our work.

The proof that a static policy is asymptotically optimal is constructive, that is, we design a heuristic fluid-based static policy for which our asymptotic guarantees hold as the market size is scaled by a parameter $θ$ . We moreover show (Theorem 3) that our fluid-based heuristic is fast-converging in the sense that it converges to a fluid upper bound at a rate of $O (1 / θ)$ (under a mild technical condition on the decision-maker’s revenue function; without this condition it converges at a rate of $O (1 / \sqrt{θ})$ ).

While our policy satisfies these natural desiderata, computing it requires us to solve a high-dimensional nonconvex optimization problem which is, a priori, nontrivial to optimize. In our final technical contribution, we show a surprising structural property of the problem that allows us to efficiently compute its optimal solution. In particular, independent of the size of the reward set, the number of agent types, and their departure probabilities, there exists a fluid-optimal reward distribution that places positive weight on at most two rewards (Theorem 2). This allows us to identify an optimal solution by considering all pairs of rewards and then solving a KKT condition that consists of a single equation in one variable for each pair. A similar two-reward structure of fluid-based policies has been identified in other operational settings (Bassamboo and Randhawa, 2016); we use this result to derive insights into the structure of the fluid optimal policy for certain special cases. In particular, we show that the convexity of the departure probability function impacts the optimal dispersion level of the optimal reward scheme, lending credence to the use of “surprise-and-delight” lotteries for retention (Proposition 6).

1.1.1. Structure of the Paper

In Section 1.2, we survey related literature. We then present the model and formulate the decision-maker’s optimization problem in Section 2. We use a deterministic relaxation of our system to show in Section 3 that the fluid-based heuristic is optimal amongst all policies that satisfy our fairness constraints; though we also show the existence of dynamic policies that outperform the fluid heuristic, these policies are inherently discriminatory. Section 4 is devoted to analyzing the fluid-based heuristic and proving its fast convergence to the value of the fluid relaxation in a large-market regime. Finally, Section 5 leverages our analysis of the fluid heuristic to characterize optimal policies in special cases of interest. All figures and proofs of results are relegated to the E-Companion.

1.2. Related Work

1.2.1. Workforce Capacity Planning

Our work is related to the topic of workforce capacity planning, which has a long history in the operations management literature (for an excellent survey, see, e.g., De Bruecker et al., 2015). Within this line of work, we highlight papers that consider attrition and retention aspects of workforce planning. In contrast to our work, which focuses on the question of issuing monetary incentives throughout an agent’s lifetime to retain them, these works are concerned with hiring, promotion, and termination decisions. For example, motivated by the naval aviation system, early work by Grinold (1976) considered optimal accession policies when aviators have a known and deterministic lifetime. More recently, Hu et al. (2016) studied optimal hiring, admission, and training policies for junior nurses; nursing is a profession in which attrition is pervasive and which, as a result, has been a central focus of much of the workforce planning literature. In their model, a fixed and exogenous fraction of the population leaves the system in each period, whereas in ours the decision-maker sets incentives in order to affect agents’ retention. Their work also resembles ours in that agents are homogeneous from a skills perspective (though ours are heterogeneous with respect to their departures).

A subset of the literature on workforce capacity planning is interested in worker heterogeneity; however, most of these works focus on heterogeneity with respect to skill set, not with respect to attrition. Ahn et al. (2005) consider a model in which workers turn over independently of the organization’s policy and the state of the system, whereas Gans and Zhou (2002) and Arlotto et al. (2014) allow workers’ departure decisions to depend on the state of the system, but not on the decision-maker’s policy or the workers’ own history in the system. Most recently, Jaillet et al. (2022) considered a more complex model of hiring, dismissing, and promoting when workers’ resignation decisions depend on their “time-in-grade,” or lifetime, in the system. To the best of our knowledge, no works in this stream consider the fairness implications of policies.

1.2.2. Customer Retention

We highlight the most closely related works here, as this area has a rich history in the marketing literature (for a survey, see, e.g., Ascarza et al., 2018). To the best of our knowledge, none of these papers focuses on designing fair customer retention policies. On the contrary, the goal in these works is precisely to use differentiation in order to incentivize customers to stay. For example, in a computational study, Lemmens and Gupta (2020) define a profit-based loss function to predict, for each customer, the financial impact of a retention intervention, and rank customers based on the marginal impact of the intervention on churn and on postintervention profits. Aflaki and Popescu (2014) develop theoretical insights around optimal retention policies, in a setting where customer “types,” or sensitivities to interventions, are known by the decision-maker, thus allowing for customer differentiation; in their model, the optimal decision across the population decouples into optimal decisions for each individual customer. A separate stream of work investigates how capacity decisions affect service access quality and, as a result, customer retention (Afèche et al., 2017; Furman et al., 2021). In contrast to the interventions we consider, which occur in each period, the decision-maker in the settings these latter papers consider is constrained to make a single decision at the beginning of the time horizon.

1.2.3. Empirical Work on Effectiveness of Monetary Incentives

The effectiveness of monetary incentives for retention is also well-documented in the medical community. Empirical studies highlight their efficacy within the context of adherence to medication (Kimmel et al., 2012; Volpp et al., 2008), weight loss and exercise (Meeker et al., 2021; Volpp et al., 2008), postpartum compliance (Stevens-Simon et al., 1994), and home-based health monitoring (Sen et al., 2014), for instance.

1.2.4. Algorithmic Fairness

Finally, our work adds to the large and growing body of work on the design of fair algorithms. We note that there is no universally agreed-upon notion of fairness (for a comprehensive overview of the different notions of fairness that have been considered in the literature, see, e.g., Mehrabi et al., 2021). The notion of fairness that we choose to focus on is that of group fairness (as opposed to individual fairness (Dwork et al., 2012)), which itself has no single definition. The one closest to ours, that arises in the machine learning literature within the context of group-fair classifiers, is statistical parity (Corbett-Davies et al., 2017). This requires that individuals in both protected and nonprotected groups have equal probabilities of being assigned to the positive predicted class; interpreting the set of rewards as possible predicted classes, our fairness definition can be understood as a multidimensional variant of statistical parity.

An operational setting that is related to our study is fair (online) resource allocation; this has received significant recent attention (Allouah et al., 2022; Balseiro and Xia, 2022; Banerjee et al., 2023; Banerjee and Freund, 2024; Bateni et al., 2022; Freund et al., 2023; Manshadi et al., 2023; Sinclair et al., 2023). Other related literature on fair algorithms includes pricing problems with fairness constraints such as those studied by Cohen et al. (2021, 2022) and Salem et al. (2021). However, none of these works model customer attrition.

2. Preliminaries

We consider a discrete-time, infinite-horizon model of an organization, which we henceforth generically refer to as a system. In each period agents join the system, receive a reward, and decide whether to stay in the system for future periods or leave. The decision-maker makes a profit, in each period, composed of the revenue from the number of agents in the system, net of the cost of the rewards paid out to agents. For example, within the employment context, an agent corresponds to a worker performing a set of tasks in each period, with the reward corresponding to a bonus incentive; an agent can similarly correspond to a customer who enjoys service from her cable provider in a given period, with the reward corresponding to a discount. We formalize each component of the model below, beginning with some technical notation. For clarity of exposition, we defer a lengthy discussion of modeling assumptions to the end of the section.

Technical Notation. Throughout the paper, $R^{+} = {x \in R | x \geq 0}$ , $R^{> 0} = {x \in R | x > 0}$ , and $N^{+} = {i \in N | i \geq 1}$ . For $K \in N^{+}$ , we let $[K] = {1, \dots, K}$ . We use $supp (f)$ to denote the support of a given probability mass function $f$ , that is, $supp (f) = {x : f (x) > 0}$ , and $| supp (f) |$ denotes the cardinality of its support. Moreover, given set $Ξ \subset R$ , we let $Δ^{| Ξ |} = {x \in [0, 1]^{| Ξ |} : \sum_{r} x_{r} = 1}$ be the standard probability simplex over $Ξ$ . Finally, $e_{r}$ denotes the unit vector in the direction of reward $r$ .

2.1. Basic Setup

2.1.1. Agents

We assume there are $K \in N^{+}$ types of agents, defined by (i) their reward sensitivity, and (ii) the rate at which they join the system. Specifically, for $i \in [K]$ , a type $i$ agent is associated with a departure probability function $ℓ_{i} : Ξ \mapsto (0, 1]$ , where $Ξ \subset R^{+}$ is a finite set of rewards from which the decision-maker chooses to compensate her agents.³ We assume that $ℓ_{i}$ is nonincreasing and known to the decision-maker. Let $r_{min} = inf {r | r \in Ξ}$ and $r_{max} = sup {r | r \in Ξ}$ . Moreover, the number of type $i$ arrivals in period $t$ , $A_{i} (t)$ , is drawn i.i.d. (across types and periods) from a $Pois (λ_{i})$ distribution also known to the decision-maker; let $A (t) = (A_{i} (t), i \in [K])$ .

Remark 1

The exogenous arrival assumption is for ease of exposition. In Appendix EC.4.1 we show that our main result applies to an endogenous entry model in which agents choose to enter the system by comparing their long-run average earnings to a type-dependent reservation value.

2.1.2. Periods

Period $t \in N$ is defined by the following sequence of events: (i) for all $i \in [K]$ , $A_{i} (t)$ type $i$ agents join the system; (ii) each agent $w$ is in the system (e.g., enjoying service, or working, for the duration of the period), and collects a (possibly random) reward $r_{w}$ ; (iii) having collected reward $r_{w}$ , agent $w$ , unaware of the reward distribution, departs from the system with probability $ℓ_{i_{w}} (r_{w})$ , where $i_{w} \in [K]$ denotes the type of agent $w$ ; otherwise, she remains in the system and moves on to the next period. We assume that the agent’s decision to leave the system is made independently of all other agents, and independently of her prior history of rewards, that is, agents make a decision based only on their most recent experience; moreover, once an agent leaves, she does not return in a later period. This latter assumption, that the agent is “lost for good,” is standard in the workforce planning and customer retention literature (see, e.g., Aflaki and Popescu, 2014; Arlotto et al., 2014).

We use $D_{i} (t)$ to denote the number of type $i$ departures at the end of period $t$ , with $D (t) = (D_{i} (t), i \in [K])$ . Finally, let $N_{i} (t)$ be the number of type $i$ agents in the system in period $t$ (and thus requiring payment), with $N (t) = (N_{i} (t), i \in [K])$ , and $N (t) = \sum_{i \in [K]} N_{i} (t)$ . $N (t)$ is based on the number of agents who were in the system in the previous period and did not depart, as well as the number of new agents who joined at the beginning of the current period. Given the described dynamics, the state of the system is fully characterized by $N (t)$ , which evolves as

N (t + 1) = N (t) - D (t) + A (t + 1), \forall t \in N .
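The state update can be illustrated with a minimal simulation sketch. It assumes a hypothetical two-type instance (the reward set, departure probabilities, and arrival rates below are invented for illustration) and uses only the Python standard library; `sample_poisson` and `step` are names introduced here, not part of the model.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def step(counts, x, rewards, depart, arrival_rates, rng):
    """Advance the per-type counts N(t) by one period under reward distribution x.

    Returns (N(t+1), D(t), A(t+1)), so that N(t+1) = N(t) - D(t) + A(t+1).
    """
    next_counts, departures, arrivals = [], [], []
    for i, n_i in enumerate(counts):
        d_i = 0
        for _ in range(n_i):
            # Each agent draws a reward from the common distribution x ...
            r = rng.choices(rewards, weights=x)[0]
            # ... and departs with probability l_i(r), independently of all others.
            if rng.random() < depart[i][r]:
                d_i += 1
        a_i = sample_poisson(arrival_rates[i], rng)
        departures.append(d_i)
        arrivals.append(a_i)
        next_counts.append(n_i - d_i + a_i)
    return next_counts, departures, arrivals

# Hypothetical instance: type 0 is more reward-sensitive than type 1.
rewards = [0.0, 1.0]
depart = [{0.0: 0.9, 1.0: 0.2}, {0.0: 0.5, 1.0: 0.4}]
rng = random.Random(0)
counts = [10, 10]
history = []
for _ in range(50):
    new_counts, d, a = step(counts, [0.5, 0.5], rewards, depart, [3.0, 3.0], rng)
    history.append((counts, d, a, new_counts))
    counts = new_counts
```

Each entry of `history` records one period, so the identity $N (t + 1) = N (t) - D (t) + A (t + 1)$ can be checked type by type.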

We next specify the decision-maker’s objective and corresponding optimization problem.

2.1.3. Objective

Given $N (t)$ , the total number of agents in period $t$ , the decision-maker obtains revenue $R (N (t))$ , where $R : R^{+} \mapsto R^{+}$ . We assume that $R$ is time-invariant, and depends only on the total number of agents in the system, rather than the type composition of the agent pool in each period. $R$ is moreover assumed to be $L$ -Lipschitz continuous and differentiable over $R^{+}$ , as well as nondecreasing and concave. Given ${r_{w}}_{w = 1}^{N (t)}$ , the (possibly random) set of rewards paid out to agents in period $t$ , the period- $t$ profit is given by:

Π (t) = R (N (t)) - \sum_{w = 1}^{N (t)} r_{w} .

Let $\hat{Π} (t)$ denote the expected profit in period $t$, where the expectation is taken over the randomness in the reward realizations.

We formulate the decision-maker’s optimization problem as a discrete-time, infinite-horizon MDP, where the objective is to maximize the long-run average profit. Suppose the initial condition is $N (0) = c_{0}$ , $c_{0} \in N^{K}$ . For any policy $φ$ , let $v (φ)$ denote the long-run average profit under $φ$ (assuming this limit exists). Formally:

v (φ) = lim_{T \to \infty} \frac{1}{T} E [\sum_{t = 1}^{T} \hat{Π} (t) | N (0) = c_{0}] .

In complete generality, a policy $φ$ maps the set of agents in the system in a given period, and the history of rewards observed by each agent, to a distribution of rewards for each of these agents in that period. We restrict our attention to the set of policies for which the above limit exists. More importantly, we impose that any policy $φ$ satisfy the following two fairness criteria:

(i) If two agents are in the system in the same period, the distribution from which their rewards are drawn is identical. Thus, $φ$ must map the set of agents in the system in a given period, and the history of rewards observed by each agent, to a single distribution over rewards in every period. We denote the distribution in period $t$ by $x (t) = (x_{r} (t), r \in Ξ)$ . Then, in a fixed period $t$ , we have

\hat{Π} (t) = R (N (t)) - N (t) (\sum_{r} r x_{r} (t)),

where the second term captures the expected payout across all $N (t)$ agents, who each receive reward $r$ with probability $x_{r} (t)$.

(ii) The average reward distribution observed by agents of different types, conditional on their types, is “approximately” the same over time. We refer to this latter constraint as group fairness and provide its mathematical formalization in Section 3.
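As a small numerical sketch of the expected period profit under a common reward distribution, the snippet below evaluates $R (N (t)) - N (t) \sum_{r} r x_{r} (t)$ for two distributions with the same mean. The revenue curve, the reward values, and the population size are hypothetical choices made only for illustration; the revenue curve is picked to be nondecreasing, concave, differentiable, and Lipschitz, as the model assumes of $R$.

```python
import math

def expected_period_profit(n_agents, x, revenue):
    """Expected profit R(N) - N * sum_r r * x_r when every agent draws from x."""
    mean_reward = sum(r * p for r, p in x.items())
    return revenue(n_agents) - n_agents * mean_reward

def revenue(n):
    # Hypothetical revenue curve: nondecreasing, concave, differentiable,
    # and 5-Lipschitz on [0, inf).
    return 10.0 * (math.sqrt(1.0 + n) - 1.0)

# Two distributions over rewards with the same mean of 0.5.
lottery = {0.0: 0.5, 1.0: 0.5}
sure_thing = {0.5: 1.0}

profit_lottery = expected_period_profit(24, lottery, revenue)
profit_sure = expected_period_profit(24, sure_thing, revenue)
```

For a fixed population, the two distributions yield the same expected period profit, since only the mean of $x (t)$ enters the payout term; the shape of $x (t)$ matters through the departure probabilities, and hence through the population in later periods.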

At this point we reiterate that it is not our goal to prescribe our fairness definitions as the only reasonable ones across all industries. However, given the spectrum of ways in which a decision-maker can differentiate between groups, it is unclear where one should draw the line. For instance, consider a policy in which type 1 agents receive a reward of 50 almost surely, whereas type 2 agents receive a reward of zero 49% of the time, a reward of 100 49% of the time, and a reward of 50 2% of the time. The reward distributions seen by both agent types have the same expectation and median; however, depending on the context, one type may be perceived as having a more desirable reward distribution (as type 1 benefits from the certainty of a reward of 50 in each period). Our fairness definition may be particularly stringent in choosing where to draw the line, but it provides a simple first step to understanding how heterogeneous groups differentially self-selecting into the system may produce unfair outcomes.
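The arithmetic behind this example can be checked with a short sketch. The helper names are introduced here for illustration, and the median is taken to be the smallest reward whose cumulative probability reaches one half.

```python
def mean(pmf):
    return sum(r * p for r, p in pmf.items())

def median(pmf):
    """Smallest reward whose cumulative probability reaches 1/2."""
    cumulative = 0.0
    for r in sorted(pmf):
        cumulative += pmf[r]
        if cumulative >= 0.5:
            return r
    raise ValueError("probabilities sum to less than 1/2")

def variance(pmf):
    m = mean(pmf)
    return sum(p * (r - m) ** 2 for r, p in pmf.items())

type_1 = {50: 1.0}
type_2 = {0: 0.49, 50: 0.02, 100: 0.49}

summary = {
    name: (mean(pmf), median(pmf), variance(pmf))
    for name, pmf in [("type 1", type_1), ("type 2", type_2)]
}
```

Both distributions have mean 50 and median 50, but their variances are 0 and 2450, respectively, which is one way to quantify why matching the mean and the median alone does not make the two reward experiences comparable.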

2.2. Large-Market Regime

Given the size of the state space, the curse of dimensionality renders the goal of solving the MDP to optimality intractable. As a result, we turn to the more attainable goal of designing asymptotically optimal policies in a so-called large-market limit.

The regime we consider is defined by a sequence of systems parametrized by $θ \in N^{+}$ , with $λ_{i}^{θ} = θ λ_{i}$ for all $i \in [K]$ , and departure probabilities ${ℓ_{i} (\cdot)}_{i \in [K]}$ held fixed. We use $N^{θ} (t)$ to denote the number of agents in the system in period $t$ in the scaled system, and $N^{θ}$ the steady-state number of agents in the system. In order to keep the cost and revenue of any given policy in the same order, we define the normalized profit in the scaled system at time $t$ to be

{\hat{Π}}^{θ} (t) = R (\frac{N^{θ} (t)}{θ}) - \frac{N^{θ} (t)}{θ} (\sum_{r} r x_{r} (t)) .

For $θ \in N^{+}$, we let $v_{θ} (φ)$ denote the long-run average normalized profit of policy $φ$.

We briefly motivate this choice of scaling via a newsvendor-like revenue function. Suppose $R (N) = min {N, D}$ , for all $N \in N$ and some $D > 0$ . Then, scaling both demand and supply by $θ$ , a more standard scaling in which we simply divide $R$ by $θ$ would give us:

\frac{1}{θ} R (N^{θ}) = \frac{1}{θ} min {N^{θ}, θ D} = min {\frac{N^{θ}}{θ}, D} = R (\frac{N^{θ}}{θ}) .

With strictly concave and increasing $R$, a more standard scaling in which the revenue function is not normalized (i.e., $R (N^{θ})$ is not replaced with $R (N^{θ} / θ)$), yields a vacuous asymptotically optimal solution $x = e_{r_{min}}$, and the objective going to $- \infty$. Scaling the system in this manner overcomes such vacuities. Finally, we remark that a constant additive loss in ${\hat{Π}}^{θ}$ translates to an $Ω (θ)$ loss in the nonnormalized profit $θ R (N^{θ} / θ) - N^{θ} (\sum_{r} r x_{r})$.
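The effect of the scaling can be seen in a rough numerical sketch. It assumes, purely for illustration, that the population grows linearly in $θ$ (as in a fluid approximation), a positive mean payout per agent, and a hypothetical strictly concave revenue curve; none of these numbers come from the model itself.

```python
import math

def newsvendor(n, demand):
    return min(n, demand)

def concave(n):
    # Hypothetical strictly concave, increasing revenue curve.
    return 4.0 * (math.sqrt(1.0 + n) - 1.0)

fluid_agents = 2.0   # hypothetical population per unit of market size
mean_reward = 1.0    # hypothetical mean payout per agent (positive)
demand = 1.5         # hypothetical newsvendor demand per unit of market size

rows = []
for theta in (1, 100, 10_000):
    n_theta = theta * fluid_agents
    # Newsvendor: dividing revenue by theta equals rescaling its argument.
    divided = newsvendor(n_theta, theta * demand) / theta
    rescaled = newsvendor(n_theta / theta, demand)
    # Strictly concave R: profit without and with normalizing the argument of R.
    unnormalized = concave(n_theta) - n_theta * mean_reward
    normalized = concave(n_theta / theta) - (n_theta / theta) * mean_reward
    rows.append((theta, divided, rescaled, unnormalized, normalized))
```

In this sketch the two newsvendor quantities coincide for every $θ$. With the strictly concave curve, the profit computed from $R (N^{θ})$ decreases without bound as $θ$ grows because the linear payout term dominates, whereas the normalized profit does not depend on $θ$.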

2.2.1. Deterministic Relaxation of the Stochastic System

In order to analyze first-order differences between policies, we consider a deterministic relaxation of the stochastic system, formally defined below.

Let $(x (t))_{t \in N^{+}}$ denote a sequence of reward distributions. In each period $t$ , for all $i \in [K]$ , we observe $λ_{i}$ arrivals and ${\tilde{N}}_{i} (t) \sum_{r} ℓ_{i} (r) x_{r} (t)$ departures of type $i$ agents, where ${\tilde{N}}_{i} (t)$ satisfies the following inductive relation:

{\tilde{N}}_{i} (t + 1) = {\tilde{N}}_{i} (t) + λ_{i} - {\tilde{N}}_{i} (t) \sum_{r} ℓ_{i} (r) x_{r} (t), \forall t \in N^{+} .

(1)
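Recursion (1) is straightforward to iterate numerically. The sketch below does so for a static sequence $x (t) = x$ on a hypothetical two-type instance (all numbers are invented for illustration) and compares the result with the fixed point $λ_{i} / \sum_{r} ℓ_{i} (r) x_{r}$ obtained by setting ${\tilde{N}}_{i} (t + 1) = {\tilde{N}}_{i} (t)$ in (1).

```python
def mean_departure(depart_i, x):
    """Average departure probability sum_r l_i(r) x_r of one type under x."""
    return sum(depart_i[r] * p for r, p in x.items())

def fluid_step(n_tilde, arrival_rates, depart, x):
    """One application of recursion (1) to every type."""
    return [
        n_i + lam_i - n_i * mean_departure(dep_i, x)
        for n_i, lam_i, dep_i in zip(n_tilde, arrival_rates, depart)
    ]

# Hypothetical two-type instance with rewards {0, 1}.
arrival_rates = [1.0, 1.0]
depart = [{0.0: 0.9, 1.0: 0.2}, {0.0: 0.5, 1.0: 0.4}]
x = {0.0: 0.5, 1.0: 0.5}  # static policy: the same distribution in every period

n_tilde = [0.0, 0.0]
for _ in range(200):
    n_tilde = fluid_step(n_tilde, arrival_rates, depart, x)

fixed_point = [
    lam / mean_departure(dep, x) for lam, dep in zip(arrival_rates, depart)
]
```

Although both types arrive at the same rate here, their fluid populations settle at different levels (roughly 1.82 and 2.22), because the same distribution $x$ induces different average departure probabilities for the two types.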

Let $\tilde{N} (t) = \sum_{i} {\tilde{N}}_{i} (t)$, for $t \in N^{+}$. Let $\tilde{φ}$ denote the policy which pays out rewards from $(x (t))_{t \in N^{+}}$ in this deterministic system, and let $\tilde{Π} (\tilde{φ})$ denote the long-run average profit induced by $\tilde{φ}$ in the deterministic system, that is,

\tilde{Π} (\tilde{φ}) = lim_{T \to \infty} \frac{1}{T} \sum_{t = 1}^{T} [R (\tilde{N} (t)) - (\sum_{r} r x_{r} (t)) \tilde{N} (t)] .

(2)

Given policy $φ$, when necessary we use ${\tilde{N}}^{φ} (t)$ to emphasize the dependence on $φ$. (As in the stochastic system, we assume $\tilde{φ}$ is such that this limit exists.)
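As a worked special case, consider a static policy, that is, $x (t) = x$ for all $t$. The shorthand $\bar{ℓ}_{i} (x)$ below is introduced here for convenience and is not notation from the model. Subtracting the candidate limit from both sides of (1) shows that each type's population contracts geometrically toward it, since $\bar{ℓ}_{i} (x) \in (0, 1]$:

```latex
% Shorthand (introduced here): average departure probability of type i under x
\bar{ℓ}_{i} (x) := \sum_{r} ℓ_{i} (r) x_{r} ,
\qquad
{\tilde{N}}_{i} (t + 1) - \frac{λ_{i}}{\bar{ℓ}_{i} (x)}
  = (1 - \bar{ℓ}_{i} (x)) ({\tilde{N}}_{i} (t) - \frac{λ_{i}}{\bar{ℓ}_{i} (x)}) .

% Since R is continuous and the summands in (2) converge, so does their average:
\tilde{Π} (\tilde{φ})
  = R (\sum_{i} \frac{λ_{i}}{\bar{ℓ}_{i} (x)})
  - (\sum_{r} r x_{r}) \sum_{i} \frac{λ_{i}}{\bar{ℓ}_{i} (x)} .
```

Under this reading, the long-run average profit of a static policy in the deterministic system depends on $x$ only through the mean payout $\sum_{r} r x_{r}$ and the per-type average departure probabilities, and it does not depend on the initial condition.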

In order to establish the connection between the two systems, consider the following coupling: for any $T, θ \in N^{+}$ , fix payout distributions $x (1), \dots, x (T)$ . Moreover, let $Δ (T)$ be the absolute difference between the expected profit in the stochastic system and the profit in the deterministic system over $T$ periods, that is,

\begin{aligned} Δ (T) := \Big| & \sum_{t = 1}^{T} \Big( E [R (\frac{N^{θ} (t)}{θ})] - \sum_{r} r x_{r} (t) E [\frac{N^{θ} (t)}{θ}] \Big) \\ & - \sum_{t = 1}^{T} \Big( R (\tilde{N} (t)) - \sum_{r} r x_{r} (t) \tilde{N} (t) \Big) \Big| . \end{aligned}

The following proposition states that the long-run average profit in the deterministic system converges to the long-run average expected profit in the large-market regime.

Proposition 1

Suppose ${\tilde{N}}_{i} (0) = E [\frac{N_{i}^{θ} (0)}{θ}]$ for all $i \in [K]$ . Then, there exists a constant $C_{0} > 0$ (independent of $T$ , $θ$ ) such that $lim_{T \to \infty} \frac{Δ (T)}{T} \leq C_{0} / \sqrt{θ}$ .
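One heuristic way to see where a $1 / \sqrt{θ}$ rate can come from is sketched below. This is not the proof from the E-Companion. It uses the fixed payout distributions $x (1), \dots, x (T)$ of the coupling and the $L$-Lipschitz property of $R$, and it additionally assumes (an assumption made here) that the initial population $N^{θ} (0)$ is deterministic, so that every later population is a sum of independent binomial survivors and thinned Poisson arrivals and thus has variance at most its mean.

```latex
% Means coincide because the dynamics are linear in the population:
E [N_{i}^{θ} (t + 1)] = (1 - \sum_{r} ℓ_{i} (r) x_{r} (t)) E [N_{i}^{θ} (t)] + θ λ_{i}
\quad \Longrightarrow \quad
E [\frac{N^{θ} (t)}{θ}] = \tilde{N} (t) .

% Hence the payout terms in Δ(T) cancel, and only the revenue terms remain:
| E [R (\frac{N^{θ} (t)}{θ})] - R (\tilde{N} (t)) |
  \leq L \, E | \frac{N^{θ} (t)}{θ} - \tilde{N} (t) |
  \leq \frac{L}{θ} \sqrt{Var (N^{θ} (t))}
  \leq L \sqrt{\frac{\tilde{N} (t)}{θ}} .
```

Summing over $t$ and dividing by $T$ gives a bound of order $L \sqrt{sup_{t} \tilde{N} (t)} / \sqrt{θ}$. The supremum is finite because every departure probability is at least $min_{i} ℓ_{i} (r_{max}) > 0$, which keeps the recursion (1) bounded.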

We conclude the section with a discussion of our modeling assumptions.

2.3. Discussion of Modeling Assumptions

2.3.1. Memoryless Agents

One limitation of our model is that agents decide to stay or leave based only on the most recent reward; in Appendix EC.8 we extend our fairness result to a setting in which agents have a two-period memory. The fact that the same results hold there suggests that similar results may hold for even more general models of memory. However, generalizing the memoryless assumption of our model is not the main objective of our work. Indeed, the assumption is motivated by the fact that agents typically lack insight into the algorithms that generate the decisions they receive (e.g., why an algorithm paid out a reward in a given period, in our setting). Moreover, the memoryless assumption follows a long tradition of models that consider agent attrition (also referred to as disengagement, in some settings) in the operations literature. For instance, within the context of recommender systems, Ben-Porat et al. (2022) and Bastani et al. (2022) assume that the probability of a user disengaging with the recommendations depends only on the quality of the most recent recommendation. Afèche et al. (2017) similarly note that this sort of “recency effect” is typically assumed in the customer retention setting, in models that link demand to past service levels (Hall and Porteus, 2000; Ho et al., 2006; Liu et al., 2007). Lemmens and Gupta (2020) also consider memoryless customers in their churn prediction problem. As a result, the focus of our work is not on improving existing models of agent memory, but rather on leveraging existing memoryless models to gain insights into fairness considerations for these well-studied systems.

We conclude the discussion of the memoryless assumption by noting that the abstraction of a period is very general. For example, in a contractual employment setting, a period could be considered to be the length of the contract. In a noncontractual setting, a period could constitute however long agents are believed to consider past rewards before making the decision to leave the system. In the setting where agents are customers, a period would be the duration of the subscription contract.

2.3.2. Time-invariance of the Revenue Function

Another assumption upon which our model relies is the fact that the revenue function depends only on the number of agents in the system in a given period. In an employment setting, for instance, the time-invariant assumption models a mature market, with newsvendor-like dynamics. The work performed in the system can be viewed as “low-skill,” in the sense that workers arriving at the system are homogeneous, and the decision-maker does not benefit from workers gaining skill specificity with time. In the customer retention setting, on the other hand, stationarity of the revenue function is a reasonable assumption within the context of profit generated from the number of active subscribers in a mature market (ignoring heterogeneity in subscription plans). An interesting question beyond the scope of our work is whether a dynamic policy outperforms a static policy when the revenue function is dictated by a state that can be either low or high; though a static policy would do much worse in such a setting, it is unclear whether a dynamic policy can outperform a static one without violating our fairness constraints.

2.3.3. Unconditional Versus Conditional Incentives

As in Lemmens and Gupta (2020), we focus on unconditional incentives, wherein the decision-maker pays out the reward independently of the agent’s decision to stay or leave, as opposed to conditional incentives. Empirical evidence of the effectiveness of such unconditional incentives has been found in the behavioral sciences, for example, within the context of physician and patient surveys (Abdulaziz et al., 2015; Young et al., 2015; Rosoff et al., 2005), in addition to clinical study enrollment (Young et al., 2020; Kumar et al., 2022). We believe that the analysis of conditional incentives can be similarly approached.

3. Optimal Policies via the Deterministic Relaxation

In this section, we design and analyze a heuristic policy within the context of the deterministic system. We begin by formalizing the group fairness constraint, first introduced in Section 2.

Definition 1 (Group-fair policy)

A policy $\tilde{φ}$ defined by a sequence of reward distributions $(x (t))_{t \in N^{+}}$ is group fair if, for all $δ > 0$, there exists $τ_{0} \in N^{+}$ such that for all $τ > τ_{0}$:

\begin{aligned} {‖ \frac{1}{\sum_{t = t^{'}}^{t^{'} + τ} {\tilde{N}}_{i}^{φ} (t)} \sum_{t = t^{'}}^{t^{'} + τ} {\tilde{N}}_{i}^{φ} (t) x (t) - \frac{1}{\sum_{t = t^{'}}^{t^{'} + τ} {\tilde{N}}_{j}^{φ} (t)} \sum_{t = t^{'}}^{t^{'} + τ} {\tilde{N}}_{j}^{φ} (t) x (t) ‖}_{1} \\ < δ \forall t^{'} \in N^{+}, \forall i, j \in [K] . \end{aligned}

(3)

Informally, a group-fair policy guarantees that, over any long enough time interval, the expected reward distributions respectively observed by different agent types do not differ too greatly.
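
To make the quantity inside (3) concrete, the following Python sketch computes the left-hand side of (3) for one window and one pair of types. The function name `group_fairness_gap`, the list-based inputs, and the population values are illustrative assumptions rather than part of the model; the populations were chosen to be consistent with the averages $\frac{19}{39}$ and $0.4$ reported for the cyclic instance in Section 3.2.

```python
def group_fairness_gap(n_i, n_j, x):
    """L1 distance between the population-weighted average reward
    distributions seen by two types over one window.

    n_i, n_j: per-period populations of the two types (same length as x).
    x: per-period reward distributions, each a list of probabilities.
    """
    def weighted_average(n):
        total = sum(n)
        num_rewards = len(x[0])
        return [sum(n_t * x_t[k] for n_t, x_t in zip(n, x)) / total
                for k in range(num_rewards)]

    avg_i = weighted_average(n_i)
    avg_j = weighted_average(n_j)
    return sum(abs(a - b) for a, b in zip(avg_i, avg_j))


# Illustrative two-period window; distributions are over (reward 0, reward r).
x_window = [[0.0, 1.0], [1.0, 0.0]]  # pay r in the first period, 0 in the second
n_type1 = [1.9, 2.0]                 # hypothetical populations of type 1
n_type2 = [1.0, 1.5]                 # hypothetical populations of type 2
gap = group_fairness_gap(n_type1, n_type2, x_window)

# A static policy pays the same distribution in every period, so the gap vanishes.
static_gap = group_fairness_gap(n_type1, n_type2, [[0.3, 0.7], [0.3, 0.7]])
```

A policy is group fair only if this gap can be made smaller than any $δ$ for every sufficiently long window and every starting period; the sketch evaluates a single window only.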

We first show that, despite the unwieldiness of the group fairness constraint, there exists an exceedingly simple group-fair policy that is optimal in the context of the deterministic system: a policy that pays out the same distribution in each period.

3.1. Optimality of the Fluid-Based Heuristic

Consider the following optimization problem, termed Fluid-Opt, which computes the optimal static policy in the deterministic system described above:

\begin{aligned} {\tilde{Π}}^{*} & := max_{x \in Δ^{| Ξ |}, \tilde{N} \in N^{K}} R (\sum_{i} {\tilde{N}}_{i}) - (\sum_{r} r x_{r}) (\sum_{i} {\tilde{N}}_{i}) \\ s.t. λ_{i} = {\tilde{N}}_{i} \sum_{r} ℓ_{i} (r) x_{r} \forall i \in [K] . \end{aligned}

(Fluid-Opt)

Here, the stability constraint ensures that, for each type, the number of arrivals and departures are equal, and follows from plugging $x_{r} (t) = x_{r}$, for all $r \in Ξ, t \in N^{+}$, into (1). Note moreover that omitting the group fairness constraint (3) is without loss of generality, as static policies are necessarily group fair. We have the following theorem.

Theorem 1

Let $Φ$ denote the space of all fair policies. Then, $sup_{φ \in Φ} \tilde{Π} (φ) = {\tilde{Π}}^{*}$ . That is, there exists an optimal fair policy that is static.

In the remainder of the paper, we refer to the optimal static policy as the fluid heuristic. The proof of Theorem 1 is constructive. In particular, we show that the static policy which allocates each reward $r \in Ξ$ according to its long-run average probability under any fair dynamic policy induces a weakly higher long-run average revenue at a weakly lower cost, thus implying weakly improved profit. In fact, in Appendix EC.4.1 we prove an even stronger statement: that in a system with endogenous arrivals, where types choose to join the system by comparing their respective long-run average rewards to a reservation wage (see Appendix EC.4.1 for a formal specification of such a model), there exists an optimal fair policy that is static. Since exogenous arrivals are a special case of endogenous arrivals (i.e., all types have a reservation wage of zero), we obtain Theorem 1.
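
As a minimal illustration of the (Fluid-Opt) objective, the sketch below evaluates the profit of a given static reward distribution by first solving the stability constraint for each ${\tilde{N}}_{i}$. The function name `static_fluid_profit` and the toy instance (one type, two rewards, a capped linear revenue function) are assumptions made purely for illustration.

```python
def static_fluid_profit(x, rewards, arrival_rates, departure_fns, revenue):
    """Profit of the static policy paying rewards[k] with probability x[k].

    The stability constraint lambda_i = N_i * sum_r l_i(r) x_r pins down
    N_i = lambda_i / sum_r l_i(r) x_r for every type i.
    """
    total_agents = 0.0
    for lam, dep in zip(arrival_rates, departure_fns):
        expected_departure = sum(p * dep(r) for p, r in zip(x, rewards))
        if expected_departure <= 0.0:
            raise ValueError("stability requires a positive departure probability")
        total_agents += lam / expected_departure
    average_reward = sum(p * r for p, r in zip(x, rewards))
    return revenue(total_agents) - average_reward * total_agents


# Toy instance (hypothetical numbers): one type, rewards {0, 1}.
rewards = [0.0, 1.0]
departure = lambda r: 1.0 if r == 0.0 else 0.5
revenue = lambda n: 3.0 * min(n, 10.0)
profit_low = static_fluid_profit([1.0, 0.0], rewards, [1.0], [departure], revenue)
profit_high = static_fluid_profit([0.0, 1.0], rewards, [1.0], [departure], revenue)
```

In this toy instance, always paying the higher reward doubles the steady-state population and raises profit, even though every agent is paid more.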

3.2. Impact of Discrimination by Type

We next investigate the impact of the two fairness constraints imposed. In particular, when expanding the space of policies beyond fair ones, one approach a decision-maker could take would be in the flavor of learn, then discriminate: by deploying machine learning algorithms to learn agents’ types, a decision-maker can leverage this additional information to then pay agents of different types different amounts. We say that such policies explicitly discriminate.

Proposition 2 formalizes the intuition described above that policies that learn agent types and target “cheaper” agents can greatly outperform optimal fair policies.

Proposition 2
Consider the setting with $K = 2$ , $Ξ = {0, v_{1}, v_{2}}$ , $v_{1} < v_{2}$ , and the following departure probabilities:
$\begin{aligned} ℓ_{1} (r^{'}) & = \begin{cases} 1 & \text{if } r^{'} = 0 \\ 0 & \text{if } r^{'} \in \{v_{1}, v_{2}\} \end{cases} \quad \text{and} \\ ℓ_{2} (r^{'}) & = \begin{cases} 1 & \text{if } r^{'} \in \{0, v_{1}\} \\ 0 & \text{if } r^{'} = v_{2} . \end{cases} \end{aligned}$
Moreover, let $R (\tilde{N}) = α min {\tilde{N}, D}$ , $α > 2 v_{2}$ , and $λ_{1} = D / 4, λ_{2} = D / 2$ . Then, there exists a policy $φ^{b}$ that explicitly discriminates such that $\tilde{Π} (φ^{b}) - \tilde{Π} (φ^{s}) = Ω (D)$ , where $φ^{s}$ is the optimal static policy.

The policy $φ^{b}$ that we construct is belief-based, that is, it targets cheaper type 1 agents by first learning their type and then keeping them in the system, all the while keeping type 2 agents out of the system. Specifically, $φ^{b}$ learns the type of agents early on by paying all arriving agents $v_{1}$ . If an agent stays in the system after having been paid $v_{1}$ , then this agent is necessarily a type 1 agent, who is “cheaper” to keep in the system than a type 2 agent. Once enough type 1 agents are in the system, the policy no longer needs to keep arriving agents in the system and can pay them nothing for the rest of time.

The above policy clearly violates our first fairness desideratum of drawing rewards from the same distribution for all agents within a given period. Our next result shows that there exist policies that satisfy this first fairness constraint, but fail to be group fair; moreover, avoiding group fairness allows this policy to outperform any fair policy by an unbounded amount. We refer to this more subtle version of discrimination, which pays agents in the same period according to the same distribution, as implicit discrimination. In order to illustrate this, we introduce the notion of a cyclic policy.
Definition 2 (Cyclic policy)

Policy $φ$ is cyclic if there exists $τ \in N^{+}$ such that $x (t + τ) = x (t)$ for all $t \in N^{+}$ . The smallest $τ$ for which this holds is the cycle length of policy $φ$ , which we term $τ$ -cyclic.

When making the distinction between a $τ$-cyclic policy $φ^{τ}$ and another policy $φ$, we sometimes write ${\tilde{N}}_{i}^{τ} (t)$, for $i \in [K], t \in N^{+}$, for the number of type $i$ agents in the system under $φ^{τ}$. Proposition 3 shows that cyclic policies may implicitly discriminate, and outperform the optimal static policy.

Proposition 3
Suppose $K = 2$ , and $λ_{2} = λ, λ_{1} = 0.1 λ$ , $λ > 0$ . Let $Ξ = {0, r}$ , for some $r > 0$ , with departure probabilities given by:
$ℓ_{1} (r^{'}) = \begin{cases} 0 & \text{if } r^{'} = r \\ 0.1 & \text{if } r^{'} = 0 \end{cases} \quad \text{and} \quad ℓ_{2} (r^{'}) = \begin{cases} 0.5 & \text{if } r^{'} = r \\ 1 & \text{if } r^{'} = 0. \end{cases}$
Suppose moreover that $R (\tilde{N}) = α \tilde{N}$ , $α \in [0.7 r, r)$ . Consider the cyclic policy $φ^{c}$ of length $2$ which alternates between the two rewards in every period, that is, the policy defined by $(x_{r} (t), t \in N^{+})$ such that:
$x_{r} (t) = \begin{cases} 1 & \text{if } t \text{ odd} \\ 0 & \text{if } t \text{ even} . \end{cases}$
Then, $\tilde{Π} (φ^{c}) - \tilde{Π} (φ^{s}) = Ω (λ) .$

The cyclic policy described above engages in strategic reward slashing: it induces a large number of type 2 agents to stay in the system every other period, thus benefiting from their presence in the next period. In this next period, however, the decision-maker is able to retain all of its revenue as net profit by not incentivizing agents to stay in the system. We show in Proposition EC.1 that under $φ^{c}$ , while a type 1 agent in expectation receives the higher reward approximately 50% of the time, a type 2 agent is only paid the higher reward 40% of the time in expectation. Thus, $φ^{c}$ fails to satisfy the group-fairness constraint. (This instance technically violates the assumption that $ℓ (r_{max}) > 0$ . We chose these inputs for ease of exposition; one can similarly construct instances where $ℓ_{1} (r_{max}) = ϵ$ for small enough $ϵ > 0$ .)

In both examples constructed above, a reward of 0 and high exogenous arrival rates were chosen for clarity of exposition. One can similarly construct an example with $r_{min} > 0$ , and significantly smaller arrival rates (e.g., ${\tilde{λ}}_{1} + {\tilde{λ}}_{2} = 0.01 (λ_{1} + λ_{2})$ ), with a significantly longer learning period/period of building up the number of agents in the system. Thus, these insights are not intrinsically tied to the exogenous arrival rate, or a large presence of “free” agents. Finally, this phenomenon is also not tied to the fact that agents join the system independent of their expected earnings. In this example, the expected reward of type 1 is $\frac{19}{39} \cdot r$ and the average reward of type 2 is $0.4 r$ (see Proposition EC.1 for a derivation). Hence, in the setting with endogenous participation decisions (see Appendix EC.4.1), as long as type 1 and 2 agents have reservation wages of at most $\frac{19}{39} \cdot r$ and $0.4 r$ , respectively, both types would choose to join the system under an endogenous arrival model as in Remark 1, and type 2 agents would still receive lower rewards on average.
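
The averages $\frac{19}{39} \cdot r$ and $0.4 r$ quoted above can be reproduced numerically. The sketch below iterates deterministic dynamics for the instance of Proposition 3 under the assumption, not restated in this section, that each type's population evolves as ${\tilde{N}}_{i} (t + 1) = {\tilde{N}}_{i} (t) (1 - ℓ_{i} (r_{t})) + λ_{i}$, where $r_{t}$ is the reward paid in period $t$. The function name and the normalization $λ = 1$ are illustrative choices.

```python
def cyclic_high_reward_share(arrival_rate, dep_high, dep_low, periods=2000):
    """Share of agent-periods in which a type is paid the high reward under
    the two-period cyclic policy (high reward in odd periods, 0 in even ones).

    Assumed dynamics: N(t + 1) = N(t) * (1 - departure prob at t) + arrival_rate.
    """
    n = 0.0
    paid_high = 0.0
    paid_any = 0.0
    for t in range(1, periods + 1):
        high = (t % 2 == 1)
        if t > periods - 2:  # measure over the final cycle only
            paid_any += n
            if high:
                paid_high += n
        n = n * (1.0 - (dep_high if high else dep_low)) + arrival_rate
    return paid_high / paid_any


lam = 1.0
share_type1 = cyclic_high_reward_share(0.1 * lam, dep_high=0.0, dep_low=0.1)
share_type2 = cyclic_high_reward_share(lam, dep_high=0.5, dep_low=1.0)
```

Under these assumed dynamics, the type 1 share converges to $\frac{19}{39}$ and the type 2 share to $0.4$, matching the values stated in the text.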

We conclude the section by noting that a natural quantity to consider is the price of fairness, that is, the worst-case ratio (across all problem instances) between the decision-maker’s optimal profit with all fairness constraints relaxed, and her profit under the optimal fair policy. A slight modification to the instance in Proposition 3 immediately gives us that the price of fairness in our setting is unbounded. To see this, let $v^{*}$ denote the value of the optimal fair solution for this instance. Defining revenue function $\hat{R} (\tilde{N}) = R (\tilde{N}) - v^{*}$, the optimal fair solution for this new problem instance achieves a profit of zero, whereas the cyclic policy still achieves strictly positive profit, thus resulting in an unbounded price of fairness.
4. Structure and Analysis of the Fluid Heuristic Policy

Leveraging a deterministic relaxation to develop asymptotically optimal policies is a classical approach in many stochastic control problems. In these models, not only is the deterministic relaxation a natural upper bound, but it is also an attractive candidate for developing good policies due to its tractability; in particular, the corresponding fluid problem can typically be cast as either a linear or convex program. In this section, we first establish that our problem does not inherit such a convenient structure. Proposition 4 formalizes this.

Proposition 4
(Fluid-Opt) is, in general, nonconvex.

Thus, solving (Fluid-Opt) efficiently is a priori a difficult task given its high-dimensional and nonconvex nature. Though the issue of nonconvexity is typically circumvented via an exchange of variables (Cao et al., 2020), this approach fails in our setting due to the nonlinearity in the total cost of employing ${\tilde{N}}_{i}$ agents, ${\tilde{N}}_{i} (\sum_{r} r x_{r})$, for $i \in [K]$. Further, the arbitrary heterogeneity in agent types (i.e., the lack of assumptions on $\{ℓ_{i} (r)\}_{i \in [K]}$ stronger than nonincreasing in $r$) makes a crisp characterization of the optimal solution seem elusive. Despite this, we derive structural properties that allow us to efficiently identify a solution to this a priori intractable problem. Thereafter, we show that our solution is asymptotically optimal for the stochastic system.
4.1. Structure and Computation of the Fluid Optimum

Theorem 2 first formalizes that it is possible to derive tractable optimal solutions based on a surprising structural property.

Theorem 2
For any instance of (Fluid-Opt), independent of $K$ and $| Ξ |$, there exists an optimal solution $x^{⋆}$ such that $| supp (x^{⋆}) | \leq 2$.

Theorem 2 presents a structural characterization of the fluid optimal solution which has far-reaching implications for the analytical and computational tractability of (Fluid-Opt). In particular, suppose we fix $r_{1}, r_{2} \in Ξ$ and assume $supp (x^{⋆}) = \{r_{1}, r_{2}\}$, with $x$ being the weight placed on $r_{1}$, and $1 - x$ the weight placed on $r_{2}$. Then, (Fluid-Opt) becomes a one-dimensional problem in $x$, and can be solved via the KKT conditions. This then implies that an optimal solution to the nonconvex problem (Fluid-Opt) can be found efficiently by exhaustively searching over $\binom{| Ξ |}{2}$ possible pairs of rewards. We provide some high-level intuition for the strategy used to prove this key structural result below and defer the proof of the theorem to Appendix EC.5.

Our proof technique is motivated by the following natural interpretation: a profit-maximizing solution must trade off recruiting enough agents to collect high revenue against keeping costs relatively low. We disentangle these two competing effects by introducing a closely related budgeted supply maximization problem, which we term (Supply-Opt). At a high level, (Supply-Opt) simplifies (Fluid-Opt) by removing a degree of freedom: namely, how much the decision-maker can spend to retain agents. Given this added constraint, there are no longer two competing effects: the goal of the decision-maker is to simply recruit as many agents as possible. We then constructively show that, given an optimal solution to (Supply-Opt) that places positive probability mass on more than two rewards, there exists a “support reduction” procedure that takes three appropriately chosen rewards and allocates all of the weight on one reward to the other two. Crucially, this procedure is designed so that both the average reward paid out to agents and the total number of agents are maintained. We complete the proof by showing that if an optimal solution to (Supply-Opt) has a support of size no more than 2, it must be that the same holds for (Fluid-Opt).
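
The computational consequence of Theorem 2 can be sketched as follows: enumerate all pairs of rewards and, for each pair, solve the resulting one-dimensional problem in the weight. For simplicity, this sketch replaces the KKT-based solve mentioned above with a plain grid search over the weight, so it is only an approximation; the function names, the grid resolution, and the toy instance are all assumptions made for illustration.

```python
from itertools import combinations


def two_point_profit(r1, r2, w, arrival_rates, departure_fns, revenue):
    """Fluid profit when r1 is paid with probability w and r2 with 1 - w."""
    total_agents = 0.0
    for lam, dep in zip(arrival_rates, departure_fns):
        expected_departure = w * dep(r1) + (1.0 - w) * dep(r2)
        if expected_departure <= 0.0:
            return float("-inf")  # stability constraint cannot be satisfied
        total_agents += lam / expected_departure
    average_reward = w * r1 + (1.0 - w) * r2
    return revenue(total_agents) - average_reward * total_agents


def search_reward_pairs(rewards, arrival_rates, departure_fns, revenue,
                        grid_size=101):
    """Exhaustive search over reward pairs with a grid over the weight.

    Weights 0 and 1 are on the grid, so single-reward solutions are covered.
    Returns (best_profit, (r1, r2, weight_on_r1)).
    """
    best_profit, best_solution = float("-inf"), None
    for r1, r2 in combinations(sorted(rewards), 2):
        for k in range(grid_size):
            w = k / (grid_size - 1)
            profit = two_point_profit(r1, r2, w, arrival_rates,
                                      departure_fns, revenue)
            if profit > best_profit:
                best_profit, best_solution = profit, (r1, r2, w)
    return best_profit, best_solution


# Toy instance (hypothetical numbers): one type, rewards {0, 1}.
departure = lambda r: 1.0 if r == 0.0 else 0.5
revenue = lambda n: 3.0 * min(n, 10.0)
best_profit, best_solution = search_reward_pairs(
    [0.0, 1.0], [1.0], [departure], revenue)
```

The search performs $\binom{| Ξ |}{2}$ one-dimensional solves, so its cost grows quadratically in the number of rewards and only linearly in the number of types.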
4.2. Asymptotic Optimality of the Fluid Heuristic in the Stochastic System

We now analyze the performance of the fluid-based heuristic in our original stochastic system. Let $φ^{⋆}$ be the policy that draws rewards according to $x^{⋆}$ in every period. Proposition 5 first establishes that ${\tilde{Π}}^{*}$ is an upper bound on the profit of the optimal policy in the system parameterized by $θ$ .

Proposition 5
Let $v_{θ}^{⋆} = sup_{φ \in Φ} v_{θ} (φ)$. Then, $v_{θ}^{⋆} \leq {\tilde{Π}}^{*}$ for all $θ \in N^{+}$.

As a corollary of Propositions 1 and 5, we obtain the standard $O (\frac{1}{\sqrt{θ}})$ additive loss bound of the fluid heuristic. Under a mild additional technical condition, we further prove that $v_{θ} (φ^{⋆})$ actually converges to this upper bound at a linear rate. Let $\bar{Λ} = \sum_{i} \frac{λ_{i}}{ℓ_{i} (r_{max})}$ and $\underline{Λ} = \sum_{i} \frac{λ_{i}}{ℓ_{i} (r_{min})}$. Theorem 3 characterizes the performance of policy $φ^{⋆}$ relative to ${\tilde{Π}}^{*}$.
Theorem 3
Suppose $R$ is twice-continuously differentiable over $R_{> 0}$, and that there exists a constant $α > 0$ such that $R^{″} (n) \geq - θ^{α}$ for all $n \in [\frac{1}{θ}, \bar{Λ})$. Then, there exists a constant $C > 0$ such that $v_{θ} (φ^{⋆}) \geq {\tilde{Π}}^{*} - \frac{C}{θ} .$

It is easy to check that “reasonable” concave functions such as $R (n) = n^{β}$ , $β \in (0, 1)$ , and $R (n) = \log (1 + n)$ satisfy the conditions of Theorem 3.
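
For completeness, here is a short verification sketch for these two examples. It uses only that $θ \in N^{+}$ (so $θ \geq 1$) and that the bound is required for $n \geq \frac{1}{θ}$; the exponent $α = 2 - β$ chosen for the power function is one valid choice and not necessarily the one used in the paper's proofs.

```latex
\begin{aligned}
R (n) = \log (1 + n): \quad & R^{″} (n) = - \frac{1}{(1 + n)^{2}} \geq - 1 \geq - θ^{α} \quad \text{for any } α > 0, \\
R (n) = n^{β}, \ β \in (0, 1): \quad & R^{″} (n) = - β (1 - β) n^{β - 2} \geq - β (1 - β) θ^{2 - β} \geq - θ^{2 - β} \quad \text{for all } n \geq \tfrac{1}{θ} .
\end{aligned}
```

The second chain uses that $n^{β - 2}$ is decreasing in $n$, so it is largest at $n = \frac{1}{θ}$, and that $β (1 - β) \leq \frac{1}{4}$.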

At a high level, our system distinguishes itself from many systems considered in the classical stochastic control literature due to the fact that it is “self-adjusting,” or, more specifically, “self-draining.” What we mean by this is that, even though $φ^{⋆}$ is a static policy, its effect on the system does in fact depend on the current state: the higher the number of agents in the system, the more agents depart from the system in the next period, on average. This directly follows from the fact that the mean of a Binomial distribution is increasing in the number of trials. Such a “self-draining” feature acts as self-regulation, preventing the system from being overloaded with agents who yield little marginal revenue relative to the reward paid out. On the other end of the spectrum, the lower bound on $R^{″}$ for small values of $n$ precludes exponentially steep functions, for which the revenue loss could be large when there are few agents in the system. A similar “self-adjustment” phenomenon leading to $O (\frac{1}{θ})$ convergence was identified in Cao et al. (2020), though for an entirely different setting.
4.2.1. Numerical Experiments

We conclude this section by complementing our theoretical results with numerical experiments. In particular, we illustrate the performance of $φ^{⋆}$ relative to two natural reward schemes a decision-maker may use: a deterministic payout in each period, and a lottery. We first describe the experimental setup.

Reward Schemes

We assume the revenue function is newsvendor-like: $R (x) = 100 min {x, 150}$ for $x \in R_{+}$ . We let $Ξ = {15, 16, \dots, 60}$ and consider three different reward schemes: (i) the fluid-based heuristic, (ii) a lottery with variance ${\hat{σ}}^{2}$ such that agents receive $μ$ in expectation, with $μ = \sum_{r} r x_{r}^{⋆}$ and $\hat{σ} = 10$ , and (iii) a fixed reward $r^{det}$ (specifically, the optimal fixed reward).

Agent Types

We let $K = 3$ , with $λ_{1} = λ_{2} = λ_{3} = 10 / 3$ , and consider the following departure probability functions, depicted in Figure EC.1a:

type 1 agents: $ℓ_{1} (r) = min {1, e^{α_{1} (- r + 15)}}$ , $α_{1} = 7 / 100$

type 2 agents: $ℓ_{2} (r) = - α_{2} r + β_{2}$ , $α_{2} = 1 / 45$ , $β_{2} = 4 / 3$

type 3 agents: $ℓ_{3} (r) = - α_{3} r^{2} + β_{3} r + γ_{3}$ , $α_{3} = (1 / 2025), β_{3} = 2 / 135, γ_{3} = 8 / 9$ .
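
The three departure probability functions above can be written down directly. The following sketch transcribes them with the stated parameters and tabulates them at a few rewards in $Ξ$; the function names `l1`, `l2`, `l3` are illustrative.

```python
import math


def l1(r):
    """Type 1: capped exponential decay with alpha_1 = 7/100."""
    return min(1.0, math.exp((7 / 100) * (15 - r)))


def l2(r):
    """Type 2: linear with alpha_2 = 1/45, beta_2 = 4/3."""
    return -(1 / 45) * r + 4 / 3


def l3(r):
    """Type 3: quadratic with alpha_3 = 1/2025, beta_3 = 2/135, gamma_3 = 8/9."""
    return -(1 / 2025) * r ** 2 + (2 / 135) * r + 8 / 9


reward_set = list(range(15, 61))  # Xi = {15, 16, ..., 60}
table = {r: (l1(r), l2(r), l3(r)) for r in (15, 30, 45, 60)}
```

With these parameters, all three functions equal 1 at the smallest reward $r = 15$ and are nonincreasing over $Ξ$; the linear and quadratic functions evaluate to zero (up to floating-point error) at $r = 60$.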

Performance Metric

For each of the three reward schemes $φ$ described above, we compute ${\tilde{Π}}^{*} - v_{θ} (φ)$ , for $θ \in {1, \dots, 5 \cdot 10^{3}} .$

Results

We show the results (for 1,000 replications of simulations) in Figure EC.1b. These results illustrate the strong performance of the fluid heuristic for this particular problem instance. In particular, we indeed observe that the fluid heuristic converges to ${\tilde{Π}}^{*}$ , as predicted by our theoretical results. Moreover, the fluid policy outperforms the deterministic and lottery schemes, both of which have performance loss bounded away from zero, that is, they are not (asymptotically) optimal. We refer the reader to Appendix EC.6.3 for further investigations of the asymptotic convergence, including in moderately sized markets.

5. Special Cases

The tractable structure of the fluid optimum enables not only our numerical experiments, but also allows us to gain additional analytical insights into optimal policies under more structured departure probability models. We first investigate how the optimal policy depends on the convexity of the departure probability function, and subsequently consider the optimal policy when agents’ departure probability function is a type of softmax function.

5.1. Impact of Convexity of Departure Function

To understand the impact of the structure of the agents’ departure probability functions on the optimal reward scheme, we define the notions of convexity (resp., concavity) over the reward set, as well as the dispersion of a reward scheme.

Definition 3
The departure probability function of a type $i$ agent is strictly convex if:
$\begin{aligned} ℓ_{i} (r_{2}) & < ℓ_{i} (r_{1}) \cdot \frac{r_{2} - r_{3}}{r_{1} - r_{3}} + ℓ_{i} (r_{3}) (1 - \frac{r_{2} - r_{3}}{r_{1} - r_{3}}) \\ \forall r_{1} > r_{2} > r_{3} \in Ξ . \end{aligned}$
The departure probability function of a type $i$ agent is strictly concave if:
$\begin{aligned} ℓ_{i} (r_{2}) & > ℓ_{i} (r_{1}) \cdot \frac{r_{2} - r_{3}}{r_{1} - r_{3}} + ℓ_{i} (r_{3}) (1 - \frac{r_{2} - r_{3}}{r_{1} - r_{3}}) \\ \forall r_{1} > r_{2} > r_{3} \in Ξ . \end{aligned}$
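
Because $Ξ$ is finite, Definition 3 can be checked by brute force over all triples of rewards. The sketch below does exactly that; the helper names are hypothetical, and the example functions are the two departure probability functions used in Section 5.1.1 (with $a_{1} = 1$) on the reward grid $\{0, 0.05, \dots, 1\}$.

```python
import math
from itertools import combinations


def _interpolation_gap(dep, r1, r2, r3):
    """dep(r2) minus the chord through r3 and r1 evaluated at r2 (r1 > r2 > r3)."""
    t = (r2 - r3) / (r1 - r3)
    return dep(r2) - (dep(r1) * t + dep(r3) * (1.0 - t))


def is_strictly_convex(dep, rewards):
    """True if dep lies strictly below every chord over the reward set."""
    return all(_interpolation_gap(dep, r1, r2, r3) < 0.0
               for r3, r2, r1 in combinations(sorted(rewards), 3))


def is_strictly_concave(dep, rewards):
    """True if dep lies strictly above every chord over the reward set."""
    return all(_interpolation_gap(dep, r1, r2, r3) > 0.0
               for r3, r2, r1 in combinations(sorted(rewards), 3))


rewards = [k / 20 for k in range(21)]
convex_type = lambda r: math.exp(-r)
concave_type = lambda r: (1 - math.exp(2 * (r - 1))) / (1 - math.exp(-2))
```

The check enumerates $\binom{| Ξ |}{3}$ triples, which is cheap for reward sets of the size considered here. Exact comparisons against zero can be sensitive to rounding for functions that are nearly linear.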

Remark 2
At a high level, concave departure probabilities correspond to “risk-seeking” agents, since for these agents the expected departure probability from any convex combination of two rewards is higher than the departure probability from handing out the convex combination of these two rewards deterministically. An example of this would be people playing the lottery with minuscule odds of claiming the jackpot and (due to the cost of a ticket) a negative return in expectation. Conversely, convex types can be viewed as “loss averse,” as they are more likely to stay if paid out a reward deterministically. This behavior may be more prevalent when there is a significant cost to remaining active in a rewards program. For instance, the Chase Sapphire Reserve card costs $550 a year, and comes with a deterministic $300 travel credit (and a range of other benefits), that is renewed each year. There, we would assume that most customers prefer the certainty of their travel benefits over a small chance to win a more valuable travel credit.

We next introduce the notion of dispersion of a reward scheme.
Definition 4 (Maximal and minimal dispersion)

Consider any feasible solution $x$ to (Fluid-Opt), and suppose $supp (x) = \{r, r^{'}\}$, with $r > r^{'}$. $x$ is said to be maximally dispersed if $r = r_{max}$ and $r^{'} = r_{min}$. $x$ is said to be minimally dispersed if $r$ and $r^{'}$ are consecutive rewards in $Ξ$, or if $x = e_{r}$ for some $r \in Ξ$.

The following proposition formalizes the intuitive idea that the dispersion of an optimal policy vastly differs depending on the convexity of $ℓ_{i} (\cdot)$ .

Proposition 6
If $ℓ_{i} (\cdot)$ is strictly convex for all $i \in [K]$, then $x^{⋆}$ is minimally dispersed. On the other hand, if $ℓ_{i} (\cdot)$ is strictly concave for all $i \in [K]$, then $x^{⋆}$ is maximally dispersed.

Proposition 6 tells us that, if $ℓ_{i} (\cdot)$ is strictly convex for all types, in the limit where $Ξ$ is a compact convex set, it is optimal for the decision-maker to guarantee a fixed reward; in contrast, for types with strictly concave $ℓ_{i} (\cdot)$ , the decision-maker should run a lottery. The proof of Proposition 6 crucially relies on the two-reward structure of the optimal static solution derived in Section 4.1, thus highlighting its importance in terms of deriving insights into optimal compensation schemes for practical special cases of agent behavior.

We next numerically investigate whether these analytical insights, derived for populations whose departure probability functions all share the same structure, carry over to mixed populations.
5.1.1. Numerical insights

We consider a setting with two types of agents defined by the following departure probability functions:

ℓ_{1} (r) = e^{- a_{1} r}, a_{1} \in {0.5, 1, 2, 3, 4} and ℓ_{2} (r) = \frac{1 - e^{2 (r - 1)}}{1 - e^{- 2}} .

Figure EC.2 illustrates these departure probabilities.

The total arrival rate is normalized to 1, with $α$ fraction of arrivals in each period being type 1 (convex) agents, and $1 - α$ being type 2 (concave) agents. The revenue function is newsvendor-like, with $R (N) = 5 min \{N, 5\}$. Finally, $Ξ = \{0, 0.05, \dots, 1\}$.

Figure EC.3 shows the coefficient of variation of the optimal reward distribution for $α \in [0, 1]$. Observe that the highest coefficient of variation occurs when $α = 0$ (i.e., all arrivals have a concave departure probability function), in line with our theoretical result around maximal dispersion. Conversely, the lowest coefficient of variation occurs at $α = 1$ (i.e., all arrivals have a convex departure probability function). In between, the coefficient of variation is decreasing in the fraction of convex arrivals (modulo slight perturbations due to discretization). For $a_{1} \in \{2, 3, 4\}$ we observe a sharp phase transition: there exists a threshold fraction of convex arrivals past which the coefficient of variation of the optimal reward scheme is close to zero. As the convexity of $ℓ_{1}$ increases, this threshold decreases.
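
Since Theorem 2 guarantees an optimal reward distribution supported on at most two rewards, the coefficient of variation plotted here reduces to a closed-form expression in the two rewards and the weight. A minimal sketch, with a hypothetical function name and illustrative inputs:

```python
import math


def two_point_cv(r_low, r_high, weight_high):
    """Coefficient of variation (std / mean) of a lottery paying r_high with
    probability weight_high and r_low otherwise."""
    mean = weight_high * r_high + (1.0 - weight_high) * r_low
    if mean <= 0.0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    variance = weight_high * (1.0 - weight_high) * (r_high - r_low) ** 2
    return math.sqrt(variance) / mean


cv_lottery = two_point_cv(0.0, 1.0, 0.5)  # maximally dispersed, equal weights
cv_fixed = two_point_cv(0.4, 0.45, 1.0)   # all weight on one reward
```

A fixed reward has a coefficient of variation of zero, which is the value the optimal scheme approaches past the phase transition described above.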

5.2. Linearized S-shape Departure Functions

While Section 5.1 considered the setting where types’ departure probability functions were either convex or concave, in this section we study the setting in which agents’ departure probability functions are similar to a softmax function. We define the notion of a “linearized S-shape” departure probability function below.

Definition 5 (Linearized S-shape)

Departure probability function $ℓ_{i} (\cdot)$ has a linearized S-shape if there exist $ϵ, v_{i} > 0$ such that

ℓ_{i} (r) = \begin{cases} 1 & r < v_{i} - ϵ \\ \frac{1}{2} - \frac{r - v_{i}}{2 ϵ} & r \in [v_{i} - ϵ, v_{i} + ϵ] \\ 0 & r > v_{i} + ϵ . \end{cases}
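
Definition 5 translates directly into a piecewise function. The sketch below is a plain transcription; the function name and the example values $v = 25$, $ϵ = 5$ are illustrative.

```python
def linearized_s_shape(r, v, eps):
    """Departure probability at reward r for an agent with value v and
    noise parameter eps > 0, following Definition 5."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    if r < v - eps:
        return 1.0
    if r > v + eps:
        return 0.0
    return 0.5 - (r - v) / (2.0 * eps)


values = [linearized_s_shape(r, v=25.0, eps=5.0) for r in (10, 20, 25, 30, 40)]
```

The function is continuous at both kinks: it equals 1 at $r = v - ϵ$, $\frac{1}{2}$ at $r = v$, and 0 at $r = v + ϵ$.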

Remark 3
In Definition 5 we have relaxed the assumption that $ℓ_{i} (r_{max}) > 0$ for all $i \in [K]$ . This however is without loss of generality for the class of revenue functions we consider in this section, as for these functions the profit would go to $- \infty$ if the policy were to deterministically pay out any reward $r \geq v_{i} + ϵ$ .

Figure EC.4 illustrates these departure probabilities, for a fixed $v_{i} = 25$ and various values of $ϵ$ . At a high level, $ϵ$ can be seen as controlling agents’ tolerance of reward uncertainty. As $ϵ$ approaches 0, the above departure probability function mirrors that of an agent who, upon receiving reward $r$ , deterministically stays or leaves. As $ϵ$ goes to $\infty$ , we obtain $ℓ_{i} (r) \to \frac{1}{2}$ for all $r \in Ξ$ , and the decision to stay or leave becomes independent of the reward paid out; phrased differently, agents act independently of the reward they receive.

In the remainder of the section, for analytical tractability, our results pertain to the setting where $K = 1$ . However, our structural result regarding the fluid optimum allows us to gain numerical insights into settings with $K > 1$ , as we will see later on in the section. For ease of notation, we omit the dependence of all quantities on the agent type. We moreover assume that $Ξ$ is such that $v \in (r_{min}, r_{max})$ . Finally, we make the following assumption on the revenue function.
Assumption 1
$R$ is twice-continuously differentiable, with $R^{″} (x) < 0$ for all $x \in R_{+}$ .

The following proposition characterizes the optimal solution $x^{⋆} (ϵ)$ to (Fluid-Opt).
Proposition 7
Suppose $ϵ \leq min \{v - r_{min}, r_{max} - v\}$. The optimal reward distribution $x^{⋆} (ϵ)$ to (Fluid-Opt) is a lottery that randomizes between $r_{min}$ and $v + ϵ$. With a slight abuse of notation, let $x^{⋆} (ϵ)$ also denote the weight that this distribution places on $v + ϵ$. Then, $x^{⋆} (ϵ) < 1$ for all $ϵ > 0$. Moreover, the following holds:
If $R^{'} (λ) \leq v$ , then $x^{⋆} (ϵ) = 0$ for all $ϵ > 0$ .

If $R^{'} (λ) > v$ , then:
for all $ϵ \leq R^{'} (λ) - v$ , $x^{⋆} (ϵ)$ is continuously decreasing in $ϵ$ , and

for all $ϵ > R^{'} (λ) - v$ , $x^{⋆} (ϵ) = 0$ .

We provide some intuition for the above result. $R^{'} (λ) - v$ can be interpreted as the marginal revenue the decision-maker obtains for an agent, in addition to the $λ$ arrivals per period, net of the opportunity cost of that additional agent as $ϵ$ approaches 0. Thus, when $R^{'} (λ) - v \leq 0$ , agents are too costly to keep in the system. On the other hand, when $R^{'} (λ) - v > 0$ , it is worthwhile to incentivize agents to stay on with some probability. However, this benefit decreases as agents make noisier decisions, and as a result, become more and more costly.

In the remainder of the section, we let $ϵ_{0} = R^{'} (λ) - v$ be the threshold past which it becomes too costly to keep agents in the system. We make the following mild assumption relating $ϵ_{0}$ , agents’ value $v$ , and the reward set $Ξ$ .
Assumption 2
$ϵ_{0} \leq min {v - r_{min}, r_{max} - v}$ .

On the one hand, $ϵ_{0} \leq r_{max} - v$ is a weak assumption that enforces that the maximum reward paid out by the decision-maker is at least the marginal revenue from the $(λ + 1)$st agent in the system. On the other hand, the condition that $ϵ_{0} \leq v - r_{min}$ can be viewed as a nontriviality condition: it simply imposes that $ϵ_{0}$ not be so large that the decision-maker finds it optimal to pay out the minimum possible reward with probability 1 and use the newly arriving agents “for free”.

Let $\tilde{Π} (x^{⋆} (ϵ); ϵ)$ denote the decision-maker’s optimal profit as a function of $ϵ$. We have the following fact.
Proposition 8
$\tilde{Π} (x^{⋆} (ϵ); ϵ)$ is nonincreasing in $ϵ$ .

At a high level, one would expect the decision-maker to be able to leverage the fact that agents are less sensitive to low rewards as $ϵ$ increases, and can spend less on average as a result. Proposition 8 invalidates this false intuition: as $ϵ$ increases, the fact that agents must be rewarded more frequently to guarantee they stay in the system counterbalances these potential gains, and as a result, profit cannot increase.

Figure EC.5 shows that these trends persist even in the presence of many types, for a newsvendor-like revenue function. In particular, in all settings considered, we observe that the profit exhibits a threshold structure, in which there exists $ϵ_{0}$ such that it decreases for $ϵ \leq ϵ_{0}$ , and remains constant for $ϵ > ϵ_{0}$ . The threshold $ϵ_{0}$ at which this occurs, however, depends on the specific arrival rates; the threshold decreases as the arrival rate of agents with higher values grows. This is intuitive as these agents are the most “expensive” to keep in the system. The extent to which this threshold depends on the type composition of arrivals, however, becomes more muted as $D$ grows large. This is because, for large enough $D$ , there aren’t enough cheap agents to satisfy all demand. As a result, the decision-maker must incentivize more expensive agents to stay on.
6. Conclusion

Our paper studies a decision-maker aiming to design fair incentive schemes when agents make stochastic participation decisions based on recent rewards. Fairness constraints in this setting lead to an a priori nonobvious obstacle for dynamic reward policies: when different agent types exhibit heterogeneous reactions to different rewards, dynamic policies can induce some types to be overrepresented in periods with lower rewards. Essentially, this is due to agents of these types self-selecting into those low-rewards periods based on the preceding high-reward periods. In the long run, this can result in bias in the reward distributions they experience. When dynamic policies are restricted to avoid such bias, we find that they offer no asymptotic benefit compared to static ones. Moreover, we prove, under a weak technical condition on the decision-maker’s revenue function, that the asymptotic benefits of dynamic policies vanish fast, as the static policy converges to the fluid upper bound at a linear rate. Finally, leveraging the two-reward structure of optimal solutions to the fluid upper bound, we derive insights into the type of policies that perform best for certain special cases of departure probability functions.

Supplemental Material

Supplemental material, sj-pdf-1-pao-10.1177_10591478241273874 for Fair Incentives for Repeated Engagement by Daniel Freund and Chamsi Hssaine in Production and Operations Management

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Chamsi Hssaine

Supplemental Material

Supplemental material for this article is available online.

How to cite this article

Freund D and Hssaine C (2025) Fair Incentives for Repeated Engagement. Production and Operations Management 34(1): 16–29.
