Optimal policy for managing stochastic cash flows in a financial supply chain

Abstract

Billions of dollars are exchanged between companies through accounts payable in today's business landscape. Given its immense scale and critical role in business operations, effectively managing accounts payable and working capital is essential for organizations. In this study, we examine the financial supply chain problem, where a company seeks to minimize total payments toward accounts payable received from its upstream partners (e.g., suppliers) while leveraging cash inflows from downstream partners (e.g., distributors, wholesalers, retailers, and customers). This is accomplished by optimizing payment decisions based on payment terms and capitalizing on interest gains from cash on hand over time. Unlike prior studies, we investigate this problem in a more realistic setting where information about incoming invoices and cash inflows is uncertain. We formulate the problem as a stochastic dynamic program and identify the structural properties of an optimal policy. The optimal policy reveals payment priorities among invoices and establishes thresholds for maintaining cash on hand. We further find that payment priorities can be deterministic or stochastic, depending on the problem's state and random parameters. Additionally, we identify all instances where payment priorities are deterministic. Our study provides valuable managerial insights and practical implications derived from the structural properties of the optimal policy. Notably, some of these insights challenge well-known heuristics and seemingly intuitive practices. Lastly, we develop a simple heuristic based on the identified structural properties and demonstrate that it outperforms other widely used methods for solving large-scale practical problems.

Keywords

Financial Supply Chain; Cash Flows; Accounts Payable; Payments; Stochastic Dynamic Programming; Structural Properties of an Optimal Policy

1. Introduction

Supply chains have been a focal point of study in operations management, with growing attention to the financial dimension—the flow of money that directly affects the financial performance of supply chain parties (Kouvelis et al., 2006; Pfohl and Gomm, 2009).

One of the research initiatives focusing on the financial aspects of supply chains explores the optimal management of money flow, similar to the control of material or information flow within a supply chain (Gupta and Dutta, 2011; Gupta et al., 1987; Ng et al., 2012; Rios-Solis et al., 2017). For instance, consider a simple three-layer supply chain with a company positioned in the middle layer. The company settles multiple invoices from its upstream partners (suppliers, vendors) using incoming cash inflows received from downstream partners (distributors, retailers, wholesalers, customers). Invoices come with payment terms that include discount and penalty conditions, and unused cash on hand generates interest gains. The company aims to minimize total payments by strategically timing and sizing invoice payments. We follow Gupta and Dutta (2011) in calling this the financial supply chain problem.

Prior studies of this problem (Gupta and Dutta, 2011; Gupta et al., 1987; Ng et al., 2012; Rios-Solis et al., 2017) have assumed that all invoices and cash inflows are known in advance—a deterministic setting that yields foundational structural insights but does not reflect the dynamic uncertainty firms face in practice. In reality, purchase orders and incoming cash arrive stochastically over time (Berling and Martínez-de-Albéniz, 2011; Katehakis et al., 2016). We address this gap by studying the Stochastic Dynamic Financial Supply Chain Problem (SDFSCP).

In SDFSCP, a firm receives a stream of invoices with discount and penalty terms while facing uncertain cash inflows and interest on cash holdings. At each decision epoch, the firm chooses which invoices to pay (and by how much, subject to preemptive or non-preemptive rules) to minimize total payment over a finite horizon, where unpaid invoices are valued at their outstanding balances at the end of the horizon. The setting jointly features time-variant resource capacity (cash that evolves with inflows and interest) and time-variant processing requirements (invoice balances that change with discounts and penalties) under uncertainty.

To underscore the practical importance of studying SDFSCP, we now give a stylized illustrative example and point to company evidence showing how naïve payment rules can be costly.

Because the problem is complex, practitioners often rely on simple heuristics such as snowball or avalanche (Merritt, 2024), which are known to be far from optimal (Rios-Solis et al., 2017). Consider three invoices I₁, I₂, and I₃ with terms in Table 1. The firm receives $350 per week, and unused cash earns 0.1% weekly interest. The snowball rule (ascending payment amount) pays I₂ in week 4 ($1400, discounted), I₁ in week 14 ($3200, normal), and I₃ in week 28 ($5000, normal), yielding a net-present total of $9411.96, which is 34% above the optimal $7020.77. The avalanche rule (descending penalty rate) pays I₃ in week 10 ($3500, discounted), I₁ in week 20 ($3200, normal), and I₂ in week 25 ($2000, normal), which is 22% above optimal. Thus, even in a small instance, intuitive heuristics can perform poorly.

Table 1.
Payment terms of the three invoices.

Payment term	I₁	I₂	I₃
Face value	$3200	$2000	$5000
Discounted value (30%)	$2240	$1400	$3500
Penalty rate per week	0.4%	0.3%	0.5%
Discount period due (discount ends after this)	10	15	22
Payment due (penalty accrues after this)	22	30	40

Note: 1. The structure of payment terms follows prior literature (Gupta & Dutta, 2011; Gupta et al., 1987; Ng et al., 2012; Rios-Solis et al., 2017). 2. The discounted value is paid if an invoice is paid before or at its discount payment due date. If the invoice is paid after the discount period but before or at the payment due date, the face value must be paid. If the invoice is paid after the payment due date, the payment amount is subject to weekly compounded penalty rates, leading to an increased total. For example, if I₁ is paid before or at week 10, $2240 is needed to pay off the invoice; however, if it is paid after week 10 but before or at week 22, it must be paid at its face value, $3200. If it is paid after week 22, for instance, at week 24, it must be paid at the increased total of $3225.65 (=$3200*(1 + 0.004)²).
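The payment terms in Table 1 and the two heuristic schedules described above can be checked with a short script. The sketch below is illustrative only: the `Invoice` class and the functions `amount_due` and `present_value` are hypothetical names, and the discounting convention (each payment divided by 1.001 raised to its payment week) is an assumption that happens to reproduce the reported snowball total. The schedules are taken as given rather than derived from a cash simulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    """Payment terms of one invoice, taken from Table 1."""
    name: str
    face: float          # face value, due during the normal period
    discounted: float    # amount due on or before discount_due
    penalty_rate: float  # weekly penalty rate after payment_due
    discount_due: int    # last week at which the discounted value applies
    payment_due: int     # last week at which the face value applies


def amount_due(inv: Invoice, week: int) -> float:
    """Amount needed to settle the invoice in full at the given week."""
    if week <= inv.discount_due:
        return inv.discounted
    if week <= inv.payment_due:
        return inv.face
    # Weekly compounded penalty on the face value after the due date.
    return inv.face * (1.0 + inv.penalty_rate) ** (week - inv.payment_due)


def present_value(schedule, weekly_interest: float = 0.001) -> float:
    """Discount each (invoice, week) payment back to week 0 (assumed convention)."""
    return sum(amount_due(inv, week) / (1.0 + weekly_interest) ** week
               for inv, week in schedule)


I1 = Invoice("I1", 3200.0, 2240.0, 0.004, 10, 22)
I2 = Invoice("I2", 2000.0, 1400.0, 0.003, 15, 30)
I3 = Invoice("I3", 5000.0, 3500.0, 0.005, 22, 40)

snowball = [(I2, 4), (I1, 14), (I3, 28)]    # schedule stated in the text
avalanche = [(I3, 10), (I1, 20), (I2, 25)]  # schedule stated in the text

snowball_total = present_value(snowball)
avalanche_total = present_value(avalanche)
```

Under these assumptions, `snowball_total` agrees with the reported $9411.96 to the cent, and both totals sit at the stated percentages (34% and 22%) above the optimal $7020.77.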

Company financial statements underscore the stakes. Accounts payable (AP) balances are sizable in absolute terms and relative to assets or retained earnings (Table 2). In 2022, Amazon reports $79.6B in AP (17.2% of total assets; 95.7% of retained earnings), Tesla $15.3B (18.6% of assets; 118.6% of retained earnings), Intel $9.6B (5.3% of assets), Alphabet $5.1B (1.4%), and Macy's $2.9B (16.5%). Given these magnitudes, AP policies materially affect costs and liquidity. These practical stakes motivate the modeling approach below and guide our main results.

Table 2.

Size of accounts payable (in billions, in 2022).

Company	Accounts payable	Total assets	Retained earnings
Amazon	79.6	462.7 (17.2%)	83.2 (95.7%)
Tesla	15.3	82.3 (18.6%)	12.9 (118.6%)
Intel	9.6	182.1 (5.3%)	70.4 (13.6%)
Alphabet	5.1	365.3 (1.4%)	195.6 (2.6%)
Macy's	2.9	17.6 (16.5%)	6.3 (46.0%)

Note: Percentages in parentheses represent the ratios of accounts payable to total assets and retained earnings. Company data is extracted from 10-K reports.

We formulate SDFSCP as a finite-horizon stochastic dynamic program and characterize structural properties of the optimal policy, including threshold-type conditions and priority ordering rules (deterministic and stochastic), with an at-most-one-split property. Leveraging these results, we design a simple heuristic that mimics the structure of the optimal policy and runs in O(n log n) time. Extensive experiments across cash tightness, inflow skewness, and preemptive/non-preemptive settings show that the heuristic consistently and significantly outperforms widely used benchmarks (e.g., snowball and avalanche).

Our contributions are: (i) to our knowledge, the first dynamic stochastic formulation of SDFSCP that jointly models time-varying cash capacity and time-dependent invoice requirements; (ii) new structural properties—threshold conditions, priority orderings, and an at-most-one-split property—that clarify optimal behavior; (iii) a scalable, simple, structure-guided heuristic with both preemptive and non-preemptive variants; (iv) an empirical evaluation demonstrating robustness across realistic and worst-case regimes and offering actionable guidance; and (v) methodological insights that extend beyond SDFSCP to dynamic job assignment and stochastic scheduling with time-dependent resource requirements and time-dependent resource capacity.

2. Related literature

2.1. Supply chain finance, working capital, and trade credits

Supply chain finance—encompassing trade credits, buyer-led finance, and third-party financing—has been extensively studied (Babich and Kouvelis, 2018; Kouvelis, 2023). Trade credits in particular have received substantial attention from both economics and operations management perspectives, with research examining their rationale (Burkart and Ellingsen, 2004; Petersen and Rajan, 1997) and their strategic use in supply chains (Astvansh and Jindal, 2022; Kouvelis and Zhao, 2018; Lee et al., 2018; Wu et al., 2019a).

This study addresses a financial supply chain management problem closely related to trade credit, often interchangeably referred to as AP. While prior work has focused on the structure of trade credits (Kouvelis and Zhao, 2018; Wu et al., 2019a) and their supply chain impacts (Lee et al., 2018), this study centers on the operational payment decisions themselves: how a company strategically settles AP to minimize costs by leveraging payment terms and interest on unused cash.

2.2. Financial supply chain problem: multiple accounts payable optimization

Despite the extensive literature on trade credits and supply chain management, there is limited research addressing the financial supply chain problem that optimizes a company's payment decisions for multiple AP. Gupta et al. (1987) were the first to introduce this financial supply chain problem. They define the problem, examine some unique characteristics of optimal payment strategies, and propose a branch and bound algorithm to solve the problem optimally. Although Gupta et al. (1987) open up this new problem category, their optimal solution approach is limited by the computational complexity of the problem. Gupta and Dutta (2011) extend the work of Gupta et al. (1987). They show that the financial supply chain problem is NP-hard, provide some heuristics, and use computational experiments to show that the heuristics perform well in a certain range of problem instances. Ng et al. (2012) address a simpler version of the financial supply chain problem by assuming a constant rate of cash inflows and no interest gains on cash on hand over time. They formulate this simplified version as a single machine scheduling problem, convert it into a continuous non-linear optimization problem, and provide an approximate solution using a linear programming approximation. Rios-Solis et al. (2017) examine the financial supply chain problem from a more practical perspective. They conduct a computational experiment to show that widely used practical heuristics are far from the optimal solution, including the highest interest debt method, where invoices are paid in descending order of their penalty rates, and the debt snowball method, where invoices are paid in ascending order of their invoice amounts.

Similar problems have been studied. For the two-layer supply chain, Devalkar and Krishnan (2019) consider the working capital financing problem with cash flows between the bank, supplier, and buyer. Similarly, Peng and Zhou (2019) study the optimal deployment of working capital in a supply chain with one supplier and one retailer facing uncertain demand, with the goal of maximizing profits. Huang (2022) also studies dyadic supply chain financing, considering advance payment with a tailored discount rate and an extended payment timeline for the balance due. Taking a discount rate into account, Wu et al. (2019b) examine how three supply chain finance schemes (early payment, delayed payment, and reverse factoring) influence the financial performance of the supplier and retailer. Semaa et al. (2020) study the deterministic version of the financial supply chain problem for a three-layer supply chain with supplier and customer invoices. The invoices have a given payment date and corresponding discount and penalty rates for early and late payments, respectively. In the same vein, Zhu et al. (2022) investigate the optimal operational schedule and accounts receivable financing in a supply chain.

The above studies collectively establish the deterministic structure of the problem, but all assume invoices and cash inflows are known in advance. Gupta and Dutta (2011) partially relax this by sketching a stochastic extension, but stop short of a full stochastic formulation and do not characterize an optimal policy. Our work closes this gap: we formulate SDFSCP as a stochastic dynamic program, characterize structural properties of the optimal policy, and propose a structure-guided heuristic.

2.3. Related problems in stochastic dynamic optimization

Related problems exist in the literature. One example is the stochastic dynamic job assignment problem, as studied by Akcay et al. (2010) and Kang et al. (2016). These problems are similar to SDFSCP in that they involve dynamic decision-making to allocate resources (cash on hand in SDFSCP) to complete incoming jobs (invoices in SDFSCP). While both problems address dynamic decisions under uncertainty regarding incoming tasks and resource availability, they differ significantly in several respects. One key distinction is that, unlike the job assignment problem, where the maximum resource capacity is fixed, in SDFSCP, the level of cash on hand—analogous to resource capacity—is dynamic and may increase or decrease over time. Additionally, while the resource required to complete a given job in the job assignment problem is fixed, in SDFSCP, the cash required to pay off an invoice is not fixed but increases over time due to payment terms involving discounts and penalties.

Another related problem is the scheduling problem with time-dependent processing times (e.g., Cheng et al., 2004). A simplified version of the financial supply chain problem can be conceptualized as a scheduling problem where job processing times increase as a function of their start times, often taking a specific non-linear form (Ng et al., 2012). However, there are fundamental differences between these two problems. In scheduling problems, machine capacity remains constant or unaffected by idle periods. By contrast, in the financial supply chain problem, cash on hand (the analog of machine capacity) increases over time through interest gains if left unused, or fluctuates with dynamic cash inflows. Prior studies have also explored stochastic scheduling problems in queuing systems, which assume dynamic and stochastic incoming jobs for scheduling (e.g., Pinedo, 1983; Shanthikumar and Yao, 1992).

In both dynamic job assignment and scheduling problems, while not identical, settings similar to SDFSCP can be observed. For instance, in manufacturing and service scheduling, delays in processing jobs can increase resource requirements due to degradation or complications (Browne and Yechiali, 1990; Mosheiov, 1994; Oron, 2014). Conversely, server capacities often improve after periods of inactivity due to recovery mechanisms. Workers regain efficiency with rest, machines perform better after cooling or maintenance, and battery-powered systems recharge during idle times, enhancing overall performance.

However, to the best of our knowledge, no prior research directly addresses the specific problem examined in this study—one that considers both time-variant capacity and time-variant processing time under uncertainty while analyzing the structural properties of its optimal policy. Our work, therefore, goes beyond application-specific insights. Methodologically, we introduce a dynamic stochastic programming formulation that explicitly incorporates both evolving capacity (cash inflows and reserves) and time-dependent processing requirements (invoice balances with discounts and penalties). We characterize structural properties of the optimal policy, including threshold-type decision rules and priority orders that we classify into deterministic and stochastic types. Finally, we design a heuristic that leverages these properties, demonstrating how structural analysis can inform practical, implementable decision rules. Taken together, these methodological innovations extend beyond the financial supply chain context and contribute to the broader literature on dynamic job assignment and stochastic scheduling, where problems with both stochastic arrivals and time-dependent resource requirements are increasingly relevant.

3. Problem definition and formulation

In this section, we formally define SDFSCP as a finite-horizon, discrete-time, stochastic dynamic program. All notation is explained when it is first introduced. We also provide a summary of the main notation in the e-companion, which contains all proofs as well.

3.1. Stochastic dynamic financial supply chain problem (SDFSCP)

The decision horizon of SDFSCP is composed of T time intervals of uniform length (e.g., a day or a week). Interval inv(t) is defined as [t, t + 1), where t = 0,1,2,…,T − 1. New invoices from upstream partners and cash inflows from downstream partners arrive at the end of each interval. At the start of each interval, the company uses available cash to decide which existing invoices to pay.

Consider an invoice k that arrives at the end of inv(t − 1) (i.e., at time t). We set its availability time to $a_{k} = t$. The invoice offers a discounted payment $L_{k}$ if paid by the end of the discount period $b_{k}$ ($\geq a_{k}$). If paid after $b_{k}$ but no later than the due date $d_{k}$ ($\geq b_{k}$), the amount becomes $L_{k} α_{k}$, with $α_{k} \geq 1$ (the discount conversion rate). If paid after $d_{k}$, a per-interval penalty factor $γ_{k}$ ($\geq 1$) applies, so payment at time $t_{pay} > d_{k}$ equals $L_{k} α_{k} (γ_{k})^{t_{pay} - d_{k}}$. This payment term structure follows prior work (Gupta and Dutta, 2011; Gupta et al., 1987; Rios-Solis et al., 2017).

Cash on hand accrues interest at factor per interval r (≥1): cash c at time $t_{0}$ grows to $c r^{t_{1} - t_{0}}$ by $t_{1} (> t_{0})$ if unused. We assume $r < γ_{k}$ for all k to avoid degenerate incentives to delay payments beyond due dates. Following prior studies (Gupta and Dutta, 2011; Gupta et al., 1987; Ng et al., 2012; Rios-Solis et al., 2017), the company's objective is to minimize total payments over the horizon, valuing any unpaid invoices at their outstanding amounts at the terminal time.
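The role of the assumption $r < γ_{k}$ can be seen from a one-step comparison. The following is a sketch under simplifying assumptions rather than the authors' argument: it considers a single overdue invoice in isolation, values payments in end-of-horizon units (a dollar paid at time t is scaled by $r^{T - t}$, as in Section 3.5), and ignores cash constraints and competing invoices. The symbol $C_{k}(t)$ is introduced here only for illustration.

```latex
% Cost, in end-of-horizon units, of settling invoice k in full at time t > d_k:
C_{k}(t) = L_{k} α_{k} \, γ_{k}^{\, t - d_{k}} \, r^{\, T - t}.

% Delaying the payment by one more interval multiplies this cost by
\frac{C_{k}(t + 1)}{C_{k}(t)} = \frac{γ_{k}}{r} > 1
\quad \text{whenever } r < γ_{k},

% so each additional interval of delay past d_k makes this invoice more expensive.
% If instead r \geq γ_{k}, the ratio is at most 1 and delaying indefinitely is never worse.
```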

3.2. State variables

For an invoice k with arrival time $a_{k} < t$ , define the remaining discount- and normal-period lengths at decision time $t > 0$ as

$$u_{k, t} = \begin{cases} b_{k} - t + 1 & \text{if } t \leq b_{k} \\ 0 & \text{if } t > b_{k} \end{cases} \qquad s_{k, t} = \begin{cases} d_{k} - b_{k} & \text{if } t \leq b_{k} \\ d_{k} - t + 1 & \text{if } b_{k} < t \leq d_{k} \\ 0 & \text{if } t > d_{k} \end{cases}$$

Here, $u_{k, t} (\geq 0)$ is the number of intervals remaining to pay the discounted amount $L_{k}$. Similarly, $s_{k, t} (\geq 0)$ is the number of intervals remaining to pay the normal amount $L_{k} α_{k}$. Consequently, $u_{k, t} + s_{k, t}$ is the total remaining no-penalty window. We bound these by integers B and D: $u_{k, t} \in \{0, 1, 2, \dots, B\}$ and $s_{k, t} \in \{0, 1, 2, \dots, D\}$.

For notation and aggregation, we group invoices into types by their discount conversion rate $(α)$ and their late payment penalty rate $(γ)$. Let $l \in \{1, 2, \dots, I\}$ index types, with parameters $α_{l}$ and $γ_{l}$. Let $x_{l, t}^{u, s}$ denote the total dollar amount of type-$l$ invoices that have u remaining discount intervals and s remaining normal intervals at time t, and let $x_{t} = (x_{l, t}^{u, s})_{l = 1, 2, \dots, I; \; u = 0, 1, 2, \dots, B; \; s = 0, 1, 2, \dots, D}$. Let $β_{t}$ represent the amount of cash on hand at time t. The system state is $(x_{t}, β_{t})$, where all $x_{l, t}^{u, s}, β_{t} \geq 0$, capturing outstanding invoices and available cash.
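The state construction above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names, the tuple layout `(type_index, balance, b, d)`, and the example data are all hypothetical, and `balance` is taken to be the amount currently outstanding on the invoice.

```python
from collections import defaultdict


def remaining_periods(b: int, d: int, t: int) -> tuple[int, int]:
    """Return (u, s): remaining discount and normal intervals at time t."""
    if t <= b:
        return b - t + 1, d - b
    if t <= d:
        return 0, d - t + 1
    return 0, 0


def aggregate_state(invoices, t: int) -> dict:
    """Sum outstanding balances by (type, u, s).

    `invoices` is a hypothetical list of (type_index, balance, b, d) tuples.
    """
    x = defaultdict(float)
    for type_index, balance, b, d in invoices:
        u, s = remaining_periods(b, d, t)
        x[(type_index, u, s)] += balance
    return dict(x)


# Illustrative data only: two type-1 invoices with identical windows, one type-2 invoice.
example = [(1, 2240.0, 10, 22), (1, 500.0, 10, 22), (2, 2000.0, 3, 8)]
state_at_5 = aggregate_state(example, t=5)
```

Invoices of the same type with the same remaining windows collapse into a single entry, which is what keeps the state indexed by (l, u, s) rather than by individual invoice.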

3.3. Decision variables

Payment decisions are made at the beginning of each interval (D1 in Figure 1). Let $p_{l, t}^{u, s} \geq 0$ denote the payment applied at time t to invoice $x_{l, t}^{u, s}$ , and let $p_{t} = (p_{l, t}^{u, s})_{l, u, s}$ collect all payments. Following Powell (2007) and Kang et al. (2016), we use post-decision states. After applying $p_{t}$ , the post-decision state is $(y_{t}, λ_{t})$ , where $y_{l, t}^{u, s} = x_{l, t}^{u, s} - p_{l, t}^{u, s} \geq 0, λ_{t} = β_{t} - \sum_{l, u, s} p_{l, t}^{u, s} = β_{t} - \sum_{l, u, s} (x_{l, t}^{u, s} - y_{l, t}^{u, s}) \geq 0.$ Here, $y_{t} = (y_{l, t}^{u, s})_{l, u, s}$ and $λ_{t}$ represent the remaining invoices and cash balance immediately after payment. Consistent with prior work (Ng et al., 2012; Rios-Solis et al., 2017) and industry practice (McMillan, 2025; Resolve, 2025a), we assume preemptive payment with partial settlements allowed, that is, $0 \leq p_{l, t}^{u, s} \leq x_{l, t}^{u, s}$ .

Figure 1.

SDFSCP event diagram.

3.4. Random processes and transitions

Let $Q_{t + 1}$ denote the total cash inflow received during inv(t), and let $Z_{t + 1} = (Z_{l, t + 1}^{u, s})_{l, u, s}$ denote the vector of newly arriving invoices (E1 and E2 in Figure 1). Both become available at time t + 1. The state at t + 1, $(x_{t + 1}, β_{t + 1})$ , reflects new arrivals and updated existing quantities. Outstanding invoices $x_{t + 1}$ equal new arrivals plus prior unpaid balances. For existing invoices, the balance remains at the discounted level if still within the discount period (T1a in Figure 1), increases to the normal amount if the discount expires (T1b in Figure 1), or grows by the penalty factor if past the due date (T1c in Figure 1). The cash balance $β_{t + 1}$ equals unused cash from t, accrued with interest, plus $Q_{t + 1}$ (T2 in Figure 1). We denote this transition from the post-decision state $(y_{t}, λ_{t})$ by $(x_{t + 1}, β_{t + 1}) = Π (y_{t}, λ_{t}; Z_{t + 1}, Q_{t + 1}),$ where $Π (\cdot)$ captures transitions T1 and T2 in Figure 1.
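One possible reading of the transition $Π (\cdot)$ is sketched below. The index bookkeeping is inferred from the definitions of $u_{k, t}$ and $s_{k, t}$ in Section 3.2 and is an assumption, since the exact mapping is given in Figure 1 and the e-companion. The function name, the dictionary layout, and the handling of the edge case $b_{k} = d_{k}$ are hypothetical choices made for illustration.

```python
def transition(y, lam, new_invoices, inflow, alpha, gamma, r):
    """One reading of the transition: post-decision state -> next pre-decision state.

    y            -- dict {(l, u, s): unpaid balance after payment at time t}
    lam          -- cash left after payment at time t
    new_invoices -- dict {(l, u, s): amount} arriving during inv(t)
    inflow       -- cash received during inv(t)
    alpha, gamma -- dicts of per-type discount conversion and penalty factors
    r            -- per-interval interest factor on cash (>= 1)
    """
    x_next = dict(new_invoices)
    for (l, u, s), balance in y.items():
        if balance == 0:
            continue
        if u > 1:                       # T1a: still inside the discount period
            key, factor = (l, u - 1, s), 1.0
        elif u == 1:                    # T1b: discount expires, balance steps up
            key, factor = (l, 0, s), alpha[l]
            if s == 0:                  # assumed edge case b = d: penalty starts at once
                factor *= gamma[l]
        elif s > 1:                     # normal period continues, balance unchanged
            key, factor = (l, 0, s - 1), 1.0
        else:                           # T1c: past due at t + 1, penalty applies
            key, factor = (l, 0, 0), gamma[l]
        x_next[key] = x_next.get(key, 0.0) + balance * factor
    beta_next = lam * r + inflow        # T2: interest on unused cash plus inflow
    return x_next, beta_next


# Illustrative numbers only (chosen to be exact in floating point).
y = {(1, 2, 3): 100.0, (1, 1, 3): 100.0, (1, 0, 2): 100.0,
     (1, 0, 1): 100.0, (1, 0, 0): 100.0}
x_next, beta_next = transition(y, lam=8.0, new_invoices={(1, 2, 3): 40.0},
                               inflow=5.0, alpha={1: 2.0}, gamma={1: 1.5}, r=1.25)
```

In the example, the two balances that are past due at t + 1 are merged into the (1, 0, 0) entry after the penalty factor is applied, and the new arrival occupies the (1, 2, 3) entry vacated by the aging invoice.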

Both $Q_{t + 1}$ and $Z_{t + 1}$ are unknown until realized. Following prior studies (e.g., Akçay et al., 2010; Kang et al., 2016), we do not impose parametric distributional assumptions on $Q_{t + 1}$ and $Z_{t + 1}$ ; our formulation accommodates general stochastic processes, provided that expectations appearing in the Bellman recursion are well-defined (e.g., finite first moments). These mild regularity conditions ensure that the value function and structural properties are properly defined while preserving general applicability across firms and settings. For clarity, we provide a concrete example of SDFSCP in the e-companion.

3.5. Stochastic dynamic program

We formulate SDFSCP as a stochastic dynamic program that minimizes expected total payments over the horizon, with payments valued in end-of-horizon units. A dollar paid at time t is scaled by $r^{T - t}$. At T, unpaid invoices and unused cash are valued at their outstanding amounts. The value function satisfies the Bellman equation:

$$υ_{t} (x_{t}, β_{t}) = \min_{(y_{t}, λ_{t})} \left\{ r^{T - t} \sum_{l, u, s} (x_{l, t}^{u, s} - y_{l, t}^{u, s}) + E [υ_{t + 1} (Π (y_{t}, λ_{t}; Z_{t + 1}, Q_{t + 1}))] \right\}$$

subject to

$$y_{l, t}^{u, s} \leq x_{l, t}^{u, s}, \quad l = 1, 2, \dots, I, \; u = 0, 1, 2, \dots, B, \; s = 0, 1, 2, \dots, D, \tag{C1}$$

$$\sum_{l, u, s} y_{l, t}^{u, s} \geq \sum_{l, u, s} x_{l, t}^{u, s} - β_{t}, \text{ and} \tag{C2}$$

$$y_{l, t}^{u, s} \geq 0, \quad l = 1, 2, \dots, I, \; u = 0, 1, 2, \dots, B, \; s = 0, 1, 2, \dots, D. \tag{C3}$$

The first term of the objective function represents the end-of-horizon value of payments made at time t; the second term is the expected minimal payment from t + 1 to T. Constraint (C1) ensures that the payment toward an invoice cannot be negative, and constraint (C3) ensures that it cannot exceed the outstanding invoice amount. Constraint (C2) enforces that the total payment does not exceed the available cash on hand, which also implies $λ_{t} \geq 0$.
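The three constraints and the first term of the objective are easy to evaluate for a candidate decision. The sketch below is a minimal illustration, assuming states are stored as dictionaries keyed by (l, u, s); the function name `check_decision`, the tolerance, and the example numbers are hypothetical. It does not compute the expectation term of the Bellman equation.

```python
def check_decision(x, y, beta, r, T, t, tol=1e-9):
    """Check (C1)-(C3) for a candidate post-decision vector y and price the payment.

    x, y -- dicts {(l, u, s): amount} before and after payment at time t
    Returns (feasible, lam, immediate_cost).
    """
    keys = set(x) | set(y)
    payments = {k: x.get(k, 0.0) - y.get(k, 0.0) for k in keys}
    total_paid = sum(payments.values())
    c1 = all(p >= -tol for p in payments.values())    # (C1): y <= x, no negative payment
    c2 = total_paid <= beta + tol                     # (C2): total paid <= cash on hand
    c3 = all(y.get(k, 0.0) >= -tol for k in keys)     # (C3): y >= 0, no overpayment
    lam = beta - total_paid                           # cash left after payment
    immediate_cost = r ** (T - t) * total_paid        # first term of the objective
    return (c1 and c2 and c3), lam, immediate_cost


x = {(1, 0, 0): 100.0, (2, 3, 5): 50.0}
y = {(1, 0, 0): 40.0, (2, 3, 5): 50.0}   # pay 60 toward the overdue type-1 balance
feasible, lam, cost = check_decision(x, y, beta=80.0, r=1.001, T=10, t=8)
```

Here the decision is feasible and leaves 20 in cash; a candidate that paid 90 with the same cash balance would violate (C2).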

For structural analysis, it is convenient to define the post-decision objective $ϕ_{t} (y_{t}, λ_{t}) = - r^{T - t} \sum_{l, u, s} y_{l, t}^{u, s} + E [υ_{t + 1} (Π (y_{t}, λ_{t}; Z_{t + 1}, Q_{t + 1}))],$ so that $υ_{t} (x_{t}, β_{t}) = r^{T - t} \sum_{l, u, s} x_{l, t}^{u, s} + \min_{(y_{t}, λ_{t})} ϕ_{t} (y_{t}, λ_{t}) .$ Let $(y_{t}^{*} (x_{t}, β_{t}), λ_{t}^{*} (x_{t}, β_{t}))$ represent an optimal payment decision given state $(x_{t}, β_{t})$. Then an optimal policy for SDFSCP can be found by solving the following optimization P1 for all $(x_{t}, β_{t})$ and at every time t.

$$\text{P1:} \quad (y_{t}^{*} (x_{t}, β_{t}), λ_{t}^{*} (x_{t}, β_{t})) = \underset{(y_{t}, λ_{t})}{\arg\min} \; ϕ_{t} (y_{t}, λ_{t}) \quad \text{s.t. (C1)–(C3)}$$

Computing an exact optimal policy for SDFSCP is computationally prohibitive due to the curse of dimensionality (Powell, 2007). Accordingly, we focus on characterizing structural properties of the optimal policy and the managerial insights they imply. These properties provide a foundation for efficient, implementable heuristics; leveraging them, we propose a new heuristic for SDFSCP.

4. Structural properties of an optimal policy

To characterize the structural properties of an optimal policy, we analyze the objective function of P1, $ϕ_{t} (y_{t}, λ_{t})$ , and establish three key properties: convexity, subconvexity, and the directional property. Convexity guarantees the existence of an optimal policy (Section 4.1). Subconvexity provides insight into how much payment should be made and how much cash should be reserved for future periods (Section 4.2.1). The directional property identifies which invoice should be prioritized for payment (Section 4.2.2).

4.1. Existence of an optimal policy

Constraints (C1)–(C3) in P1 define a closed and bounded convex set. Hence, if the objective function $ϕ_{t} (y_{t}, λ_{t})$ is continuous and jointly convex in $(y_{t}, λ_{t})$, an optimal policy always exists by standard convex optimization theory (Rockafellar, 1970).

Lemma 1.
$ϕ_{t} (y_{t}, λ_{t})$ is continuous and jointly convex in $(y_{t}, λ_{t})$ for $t = 0, 1, 2, \dots, T$ .

Lemma 1 therefore establishes the existence of an optimal policy for SDFSCP.

4.2. Structure of an optimal policy in two-dimensional spaces

We examine the structural properties of an optimal policy. Because the problem's state spaces are multidimensional, characterizing the global structure of an optimal policy is challenging (Zhuang and Li, 2012). Following prior work (Kang et al., 2016; Zhuang and Li, 2012), we address this difficulty by first analyzing the two-dimensional cases (Sections 4.2.1 and 4.2.2). Insights gained there are then used to inform and characterize the structure of the optimal policy in the original multidimensional setting (Section 4.3).

4.2.1. Optimal policy in $(x_{l, t}^{u, s}, β_{t})$

Consider a firm that holds a single outstanding invoice and some cash on hand. The central question is simple: how much of the available cash should be used to pay the invoice now, and how much should be kept in reserve for future obligations? Paying more now reduces the invoice balance and avoids potential penalties, but it leaves less cash available for upcoming invoices or cash shortfalls. Keeping more cash in reserve preserves flexibility but may increase the cost of the current invoice over time. The optimal policy must strike the right balance between these two competing pressures.

This trade-off has a clean geometric interpretation. In the two-dimensional space $(x_{l, t}^{u, s}, β_{t})$ , where the first coordinate is the remaining invoice balance and the second is the remaining cash, any feasible payment moves the state along a 45° line: one dollar paid toward the invoice reduces both the invoice balance and cash on hand by exactly one dollar. The optimal decision is the point on this 45° line that minimizes total expected future cost. As illustrated in Figure 2(a), this optimal point traces a downward-sloping curve (the dashed line) that partitions the state space into two regions: region (A), where it is optimal to make a payment and move toward the curve, and region (B), where it is optimal to retain cash and make no payment. To characterize this structure formally, we use subconvexity, which captures how the optimal point on a 45° line shifts as the state changes.
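The movement along a 45° line can be illustrated with a small search. The sketch below is not the model's objective: `toy_phi` is a hypothetical convex stand-in for the post-decision cost, payments are restricted to whole dollars, and the function names are assumptions. It only shows how a payment p moves the state from $(x, β)$ to $(x - p, β - p)$ and how the best point on that line determines whether the state falls in region (A) or region (B).

```python
def best_payment(x, beta, phi):
    """Search whole-dollar payments p along the 45-degree line from (x, beta).

    phi(y, lam) is a caller-supplied post-decision cost; a payment p moves the
    state to (x - p, beta - p), and p is capped by both the balance and the cash.
    """
    candidates = range(0, min(x, beta) + 1)
    return min(candidates, key=lambda p: phi(x - p, beta - p))


def toy_phi(y, lam):
    """Hypothetical cost: unpaid balance is expensive, and so is cash far from 10."""
    return 2 * y * y + (lam - 10) ** 2


pay_all = best_payment(6, 16, toy_phi)    # ample cash: region (A), pay in full
pay_part = best_payment(6, 8, toy_phi)    # tight cash: region (A), partial payment
pay_none = best_payment(1, 5, toy_phi)    # thin reserve: region (B), no payment
```

With this toy cost, the three states yield payments of 6, 3, and 0 respectively, which mirrors the pay-toward-the-curve and hold-cash behavior described above.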

Figure 2.

Structure of the $(x_{l, t}^{u, s}, β_{t})$-policy and the $(x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}}; δ)$-policy.

We now formalize this structure. We introduce subconvexity (Koole, 1998; Zhuang and Li, 2012) and its monotone structure, represented by the notation $({\vec{x}}_{1}^{*} (x^{1}), {\vec{x}}_{2}^{*} (x^{1}))$ (Lemma 2). We then show that the objective function of P1, $ϕ_{t} (y_{t}, λ_{t})$ , is subconvex (Theorem 3), which yields the key structural property illustrated in Figure 2(a).

Definition (subconvexity). A real function $f (x_{1}, x_{2})$ on $R^{2}$ is subconvex in $(x_{1}, x_{2})$ if, for every $(x_{1}, x_{2}) \in R^{2}$ and all $δ \geq 0$, $δ_{1} \geq 0$, the following two inequalities hold:

$$f (x_{1}, x_{2}) + f (x_{1} + δ, x_{2} + δ + δ_{1}) \geq f (x_{1}, x_{2} + δ_{1}) + f (x_{1} + δ, x_{2} + δ)$$

$$f (x_{1}, x_{2}) + f (x_{1} + δ + δ_{1}, x_{2} + δ) \geq f (x_{1} + δ_{1}, x_{2}) + f (x_{1} + δ, x_{2} + δ)$$
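The two inequalities in the definition can be tested numerically on a finite grid. The sketch below is a hypothetical helper, not part of the paper's analysis: it uses integer points and step sizes so that comparisons are exact, and the two test functions are illustrative choices. A grid check can refute subconvexity but cannot prove it.

```python
from itertools import product


def is_subconvex_on_grid(f, points, steps):
    """Check both subconvexity inequalities for integer points and step sizes.

    A finite grid can only refute subconvexity or fail to refute it; it is not a proof.
    """
    for (x1, x2), d, d1 in product(points, steps, steps):
        first = (f(x1, x2) + f(x1 + d, x2 + d + d1)
                 >= f(x1, x2 + d1) + f(x1 + d, x2 + d))
        second = (f(x1, x2) + f(x1 + d + d1, x2 + d)
                  >= f(x1 + d1, x2) + f(x1 + d, x2 + d))
        if not (first and second):
            return False
    return True


grid = list(product(range(-3, 4), repeat=2))
steps = range(0, 4)


def separable_quadratic(x1, x2):
    """Illustrative function for which both inequalities hold with slack 2*d*d1."""
    return x1 * x1 + x2 * x2


def negative_product(x1, x2):
    """Illustrative function that violates the first inequality when d, d1 > 0."""
    return -x1 * x2


quadratic_ok = is_subconvex_on_grid(separable_quadratic, grid, steps)
product_ok = is_subconvex_on_grid(negative_product, grid, steps)
```

For the separable quadratic, both sides of each inequality differ by $2 δ δ_{1} \geq 0$, so the check passes; for the negative product the first inequality fails by $δ δ_{1}$.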

Notation. For a given base point $x^{1} = (x_{1}^{1}, x_{2}^{1}) \in R^{2}$, define

$$({\vec{x}}_{1}^{*} (x^{1}), {\vec{x}}_{2}^{*} (x^{1})) = \underset{(x_{1}, x_{2})}{\arg\min} \; f (x_{1}, x_{2}) \quad \text{s.t. } x_{2} = (x_{1} - x_{1}^{1}) + x_{2}^{1} .$$

That is, $({\vec{x}}_{1}^{*} (x^{1}), {\vec{x}}_{2}^{*} (x^{1}))$ is a minimizer of f on the 45° line passing through $x^{1}$ . If f is subconvex on $R^{2}$ , the minimizers on such 45° lines satisfy the following structural monotonicity.

Lemma 2 (Monotone property of subconvexity). Let f be subconvex on $R^{2}$. For any $ε > 0$ and two points $x^{1} = (x_{1}^{1}, x_{2}^{1})$ and $x^{2} = (x_{1}^{2}, x_{2}^{2})$ satisfying $x_{1}^{2} - x_{2}^{2} + ε = x_{1}^{1} - x_{2}^{1}$, the corresponding minimizers on the 45° lines through these points obey ${\vec{x}}_{1}^{*} (x^{1}) - ε \leq {\vec{x}}_{1}^{*} (x^{2}) \leq {\vec{x}}_{1}^{*} (x^{1})$ and ${\vec{x}}_{2}^{*} (x^{1}) + ε \geq {\vec{x}}_{2}^{*} (x^{2}) \geq {\vec{x}}_{2}^{*} (x^{1})$.

Lemma 2 establishes monotonicity of $({\vec{x}}_{1}^{*} (x_{1}, x_{2}), {\vec{x}}_{2}^{*} (x_{1}, x_{2}))$ with respect to the intercept $x_{2} - x_{1}$. Concretely, when $x_{2} - x_{1}$ increases by one unit, ${\vec{x}}_{1}^{*} (x_{1}, x_{2})$ weakly decreases and ${\vec{x}}_{2}^{*} (x_{1}, x_{2})$ weakly increases, each by at most one unit. The locus of points $({\vec{x}}_{1}^{*} (x_{1}, x_{2}), {\vec{x}}_{2}^{*} (x_{1}, x_{2}))$ therefore traces a downward-sloping curve in the $(x_{1}, x_{2})$ plane (see the dashed lines in Figure 2(a) and Figure OA6(b) in the e-companion). This structural result describes optimal decisions that correspond to diagonal moves at 45° in the two-dimensional state space: consuming one unit of a resource to complete one unit of a task. Such diagonal assignment decisions commonly arise in resource-task assignment problems (Kang et al., 2016), and Lemma 2 quantifies how the optimal assignment shifts as the line intercept $x_{2} - x_{1}$ changes.
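To make Lemma 2 concrete, consider a hypothetical quadratic that is not part of the model. The sketch below checks that it is subconvex, finds its minimizer on each 45° line, and confirms the bounds of Lemma 2; the function and its constants are assumptions chosen only for illustration.

```latex
% Hypothetical example (not from the model):
f(x_{1}, x_{2}) = (x_{1} - 3)^{2} + (x_{2} - 5)^{2}.

% Subconvexity: in both defining inequalities, left side minus right side equals
2 δ δ_{1} \geq 0.

% Minimizer on the 45° line x_{2} = x_{1} + c, where c = x_{2} - x_{1} is the intercept:
\frac{d}{d x_{1}} \left[ (x_{1} - 3)^{2} + (x_{1} + c - 5)^{2} \right] = 0
\;\Longrightarrow\;
\vec{x}_{1}^{*} = \frac{8 - c}{2}, \qquad \vec{x}_{2}^{*} = \frac{8 + c}{2}.

% Raising the intercept from c to c + ε moves the minimizer by
\Delta \vec{x}_{1}^{*} = -\frac{ε}{2} \in [-ε, 0], \qquad
\Delta \vec{x}_{2}^{*} = +\frac{ε}{2} \in [0, ε],

% which is within the bounds of Lemma 2. The minimizers lie on the
% downward-sloping line \vec{x}_{1}^{*} + \vec{x}_{2}^{*} = 8.
```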

Subconvexity (and the related notion of the directional property used in Lemma 4) is closely related to the more familiar concepts of supermodularity (and its dual, submodularity), which are widely used in optimization to characterize monotone behavior of optimal solutions (Kang et al., 2016; Powell, 2007). Whereas supermodularity describes how an optimal solution varies with respect to an individual variable (e.g., in $x_{1}$ or in $x_{2}$ ), subconvexity (or the directional property) characterizes how the optimal solution varies with respect to the line intercept (e.g., in $x_{2} - x_{1}$ or $x_{2} + x_{1}$ ). Additional discussion and formal comparisons of these concepts are provided in the e-companion.

Theorem 3 establishes the subconvexity of the two-dimensional objective function for t = 1, 2, …, T.

Let $({\vec{x}}_{l, t}^{u, s *} (x_{l, t}^{u, s}, β_{t}), {\vec{β}}_{t}^{*} (x_{l, t}^{u, s}, β_{t})) = \underset{(y_{l, t}^{u, s}, λ_{t})}{\arg\min} \; ϕ_{t} (y_{l, t}^{u, s}, λ_{t})$ s.t. $λ_{t} = (y_{l, t}^{u, s} - x_{l, t}^{u, s}) + β_{t}$ . Then, $({\vec{x}}_{l, t}^{u, s *} (x_{l, t}^{u, s}, β_{t}), {\vec{β}}_{t}^{*} (x_{l, t}^{u, s}, β_{t}))$ is the minimizer of $ϕ_{t} (y_{l, t}^{u, s}, λ_{t})$ along the 45° line passing through the state $(x_{l, t}^{u, s}, β_{t})$ . By Lemma 2, this structure yields the $(x_{l, t}^{u, s}, β_{t})$-policy stated in Theorem 3, which specifies the optimal balance between invoice payment and retained cash at a given state $(x_{l, t}^{u, s}, β_{t})$ , and describes how that balance changes with the invoice amount $x_{l, t}^{u, s}$ and the cash level $β_{t}$ .

Theorem 3.

Let the $(x_{l, t}^{u, s}, β_{t})$-policy be as follows:

$(y_{l, t}^{u, s}, λ_{t}) = \begin{cases} ({\vec{x}}_{l, t}^{u, s *} (x_{l, t}^{u, s}, β_{t}), {\vec{β}}_{t}^{*} (x_{l, t}^{u, s}, β_{t})) & \text{if } {\vec{x}}_{l, t}^{u, s *} (x_{l, t}^{u, s}, β_{t}) < x_{l, t}^{u, s} \text{ and } {\vec{β}}_{t}^{*} (x_{l, t}^{u, s}, β_{t}) < β_{t} \text{ (area (A))} \\ (x_{l, t}^{u, s}, β_{t}) & \text{otherwise (area (B))}, \end{cases}$

where both $x_{l, t}^{u, s}, β_{t} \geq 0$ . Then $ϕ_{t} (y_{l, t}^{u, s}, λ_{t})$ is subconvex in $(y_{l, t}^{u, s}, λ_{t})$ , and the $(x_{l, t}^{u, s}, β_{t})$-policy is optimal.

The structure of the $(x_{l, t}^{u, s}, β_{t})$-policy is shown in Figure 2(a). The dashed curve in the figure denotes the locus of minimizers $({\vec{x}}_{l, t}^{u, s *} (x_{l, t}^{u, s}, β_{t}), {\vec{β}}_{t}^{*} (x_{l, t}^{u, s}, β_{t}))$ . Since $ϕ_{t} (y_{l, t}^{u, s}, λ_{t})$ is subconvex in $(y_{l, t}^{u, s}, λ_{t})$ , Lemma 2 implies that this locus is monotone in the intercept $β_{t} - x_{l, t}^{u, s}$ : when the intercept increases by one unit, ${\vec{x}}_{l, t}^{u, s *}$ decreases and ${\vec{β}}_{t}^{*}$ increases, each by at most one unit. Consequently, the minimizing curve shifts in a controlled, downward direction. Geometrically, the dashed line is downward-sloping (i.e., it makes an angle between 0° and −90° with the positive $x_{l, t}^{u, s}$ axis), partitioning the feasible first quadrant into regions (A) and (B). A feasible payment corresponds to a downward movement along the 45° line passing through the state $(x_{l, t}^{u, s}, β_{t})$ that stays within this feasible region.

The optimal policy is a switching-curve policy that partitions the state space into regions (A) and (B), as illustrated in Figure 2(a). The boundary, $({\vec{x}}_{l, t}^{u, s *} (x_{l, t}^{u, s}, β_{t}), {\vec{β}}_{t}^{*} (x_{l, t}^{u, s}, β_{t}))$ , represents the optimal balance between the remaining invoice amount and the cash to be retained after payment. The solid arrows indicate the optimal action in each region. In region (A), it is optimal to move diagonally toward the minimizing curve, $({\vec{x}}_{l, t}^{u, s *} (x_{l, t}^{u, s}, β_{t}), {\vec{β}}_{t}^{*} (x_{l, t}^{u, s}, β_{t}))$ . If available cash is sufficient to reach the minimizing curve and fully settle the invoice (i.e., $β_{t} \geq x_{l, t}^{u, s}$ and no additional cash needs to be retained), the invoice is fully paid off (arrow a1). If cash permits movement toward the minimizing curve but some balance must be preserved for future payments, the invoice is partially paid (arrow a2). If cash is insufficient to reach the minimizing curve, all available cash is used to partially pay the invoice, and the state moves along the 45° line until $β_{t} = 0$ (arrow a3). In all three cases, payment moves the system to the optimal feasible point. In region (B), the optimal action is to remain at the current state (arrow b1), since additional payment would move the system away from the optimal balance between invoice reduction and cash retention.
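The single-invoice decision described above can be sketched numerically. The snippet below is a minimal illustration, not the paper's algorithm: it searches whole-dollar payments along the feasible part of the 45° line through the state. The function names `pay_along_45_line` and `toy_phi`, the integer grid, and the toy cost function are all assumptions introduced here; the true $ϕ_{t}$ comes from the dynamic program.

```python
from typing import Callable, Tuple


def pay_along_45_line(
    phi: Callable[[int, int], float], x: int, beta: int
) -> Tuple[int, int]:
    """Return the post-payment state (y, lam) minimizing phi on the
    feasible part of the 45-degree line through (x, beta).

    Paying p dollars moves the state from (x, beta) to (x - p, beta - p);
    p cannot exceed the invoice balance or the cash on hand.
    """
    max_payment = min(x, beta)
    # Brute-force search over whole-dollar payments (illustrative only).
    best_p = min(range(max_payment + 1), key=lambda p: phi(x - p, beta - p))
    return x - best_p, beta - best_p


def toy_phi(y: int, lam: int) -> float:
    # Hypothetical cost (an assumption): each unpaid dollar costs 2, and
    # retained cash is penalized for deviating from a target level of 3.
    return 2 * y + (lam - 3) ** 2


print(pay_along_45_line(toy_phi, 10, 5))  # partial payment
print(pay_along_45_line(toy_phi, 1, 5))   # invoice fully paid
print(pay_along_45_line(toy_phi, 10, 1))  # no payment, state unchanged
```

With this toy cost, the first state yields a partial payment, the second a full payoff, and the third no payment, which mirrors the move-or-stay structure of regions (A) and (B).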

4.2.2. Optimal policy in $(x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}})$

Now suppose the firm has two outstanding invoices and limited cash that cannot cover both. The central question shifts: which invoice should be paid first, and should cash ever be split between them? Intuitively, splitting cash proportionally might seem like a reasonable hedge. But as we show, this is never optimal—the firm should always fully settle one invoice before allocating any cash to the other.

This trade-off has a clean geometric interpretation. In the two-dimensional space of remaining invoice balances, any feasible payment moves the state along a −45° line: one dollar more toward one invoice means one dollar less toward the other. The optimal decision is the point on this −45° line that minimizes total expected cost. As shown in Figure 2(b), the unconstrained minimizer lies at the origin (0, 0), since reducing both balances is always preferable, and the locus of constrained minimizers (the dashed curve) is downward-sloping with a slope between −45° and −90°. This geometry implies that the optimal point always lies at one of the two endpoints of the feasible −45° segment: the firm pays as much as possible toward one invoice before touching the other. This is the at-most-one-split property, which defines a strict payment priority between any two invoices. To characterize which invoice takes priority, we use the directional property—the two-invoice analog of subconvexity.

We next consider allocating available cash between two invoices, $x_{l, t}^{u, s}$ and $x_{\hat{l}, t}^{\hat{u}, \hat{s}}$ , that both require payment. We now formalize this structure using the directional property (Kang et al., 2016) and its monotone structure, represented by the notation $({\overset{\leftarrow}{x}}_{1}^{*} (x^{1}), {\overset{\leftarrow}{x}}_{2}^{*} (x^{1}))$ (Lemma 4). We then show that the objective function of P1, $ϕ_{t} (y_{t}, λ_{t})$ , satisfies the directional property (Theorem 5).

Definition
(Superconvexity/Subconcavity). A real function $f (x_{1}, x_{2})$ on $R^{2}$ is superconvex (subconcave) in $x_{1}$ if, for every $(x_{1}, x_{2}) \in R^{2}$ and all $δ \geq 0$ , $δ_{1} \geq 0$ , the following inequality holds (Koole, 1998):
$\begin{aligned} f (x_{1}, x_{2}) + f (x_{1} + δ + δ_{1}, x_{2} - δ) \\ \geq (\leq) f (x_{1} + δ, x_{2} - δ) \\ + f (x_{1} + δ_{1}, x_{2}), \end{aligned}$
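As a rough numerical aid, the superconvexity inequality (the $\geq$ version) can be tested on a finite grid. The sketch below is an assumption-laden illustration: the helper name `is_superconvex_in_x1`, the integer grid, and the two test functions are introduced here, and a grid check can only refute the property, never prove it on all of $R^{2}$.

```python
from itertools import product
from typing import Callable, Iterable


def is_superconvex_in_x1(
    f: Callable[[int, int], float],
    grid: Iterable[int],
    steps: Iterable[int],
) -> bool:
    """Check f(x1, x2) + f(x1 + d + d1, x2 - d)
             >= f(x1 + d, x2 - d) + f(x1 + d1, x2)
    for all grid points (x1, x2) and all nonnegative steps d, d1 supplied."""
    grid = list(grid)
    steps = list(steps)
    for x1, x2 in product(grid, repeat=2):
        for d, d1 in product(steps, repeat=2):
            lhs = f(x1, x2) + f(x1 + d + d1, x2 - d)
            rhs = f(x1 + d, x2 - d) + f(x1 + d1, x2)
            if lhs < rhs:
                return False
    return True


# Hypothetical test functions: x1**2 satisfies the inequality
# (the gap equals 2*d*d1 >= 0), while -x1**2 violates it.
print(is_superconvex_in_x1(lambda x1, x2: x1 ** 2, range(-3, 4), range(0, 4)))
print(is_superconvex_in_x1(lambda x1, x2: -(x1 ** 2), range(-3, 4), range(0, 4)))
```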

Definition (Directional Property).
A real function $f (x_{1}, x_{2})$ on $R^{2}$ has the directional property, if $f (x_{1}, x_{2})$ is superconvex in $x_{1}$ and subconcave in $x_{2}$ (Kang et al., 2016).
Notation.
For a given base point $x^{1} = (x_{1}^{1}, x_{2}^{1}) \in R^{2}$ define
$({\overset{\leftarrow}{x}}_{1}^{*} (x^{1}), {\overset{\leftarrow}{x}}_{2}^{*} (x^{1})) = \underset{(x_{1}, x_{2})}{\arg\min} \; f (x_{1}, x_{2}) \quad \text{s.t.} \quad x_{1} + x_{2} = x_{1}^{1} + x_{2}^{1} .$

That is, $({\overset{\leftarrow}{x}}_{1}^{*} (x^{1}), {\overset{\leftarrow}{x}}_{2}^{*} (x^{1}))$ is a minimizer of f on the −45° line passing through $x^{1}$ . If f has the directional property, the minimizers on such −45° lines satisfy the following structural monotonicity.
Lemma 4
(Monotone property of the directional property). Let $f (x_{1}, x_{2})$ have the directional property in $(x_{1}, x_{2})$ . For any $ε > 0$ and two points $x^{1} = (x_{1}^{1}, x_{2}^{1})$ and $x^{2} = (x_{1}^{2}, x_{2}^{2})$ satisfying $x_{1}^{2} + x_{2}^{2} = x_{1}^{1} + x_{2}^{1} + ε$ , the corresponding minimizers obey ${\overset{\leftarrow}{x}}_{1}^{*} (x^{1}) - {\overset{\leftarrow}{x}}_{1}^{*} (x^{2}) \leq ε$ and ${\overset{\leftarrow}{x}}_{2}^{*} (x^{2}) - {\overset{\leftarrow}{x}}_{2}^{*} (x^{1}) \geq ε$ .

Lemma 4 establishes monotonicity of $({\overset{\leftarrow}{x}}_{1}^{*} (x_{1}, x_{2}), {\overset{\leftarrow}{x}}_{2}^{*} (x_{1}, x_{2}))$ with respect to the intercept $x_{2} + x_{1}$ . Concretely, when $x_{2} + x_{1}$ increases by one unit, ${\overset{\leftarrow}{x}}_{1}^{*} (x_{1}, x_{2})$ decreases by at most one unit and ${\overset{\leftarrow}{x}}_{2}^{*} (x_{1}, x_{2})$ increases by at least one unit. Hence, the locus of points $({\overset{\leftarrow}{x}}_{1}^{*} (x_{1}, x_{2}), {\overset{\leftarrow}{x}}_{2}^{*} (x_{1}, x_{2}))$ traces a downward-sloping continuous curve in the $(x_{1}, x_{2})$ plane whose slope lies between −45° and −90° with respect to the positive $x_{1}$ axis (see the dashed line in Figure 2(b) and Figure OA6(c) in the e-companion). This structural result describes optimal decisions that correspond to diagonal moves at −45° in the two-dimensional state space: reducing one unit of one resource and reallocating that unit to the other resource. Such reallocation decisions commonly arise in resource-assignment problems (Kang et al., 2016), and Lemma 4 quantifies how the optimal reallocations shift as the line intercept $x_{2} + x_{1}$ changes.

In our problem, when two invoices $x_{l, t}^{u, s}$ and $x_{\hat{l}, t}^{\hat{u}, \hat{s}}$ must be prioritized, the payment priority (i.e., which invoice to pay first) can be studied by examining this −45° diagonal movement in the $(y_{l, t}^{u, s}, y_{\hat{l}, t}^{\hat{u}, \hat{s}})$ plane. Consider the −45° line through the current state $(x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}})$ . If the optimal reallocation point on that line lies to the upper left of the current state, then reallocating the invoice amount from $x_{l, t}^{u, s}$ to $x_{\hat{l}, t}^{\hat{u}, \hat{s}}$ , if it were allowed, would reduce the total cost, so $x_{l, t}^{u, s}$ has higher priority. If the optimal reallocation point on that line lies to the lower right of the current state, then $x_{\hat{l}, t}^{\hat{u}, \hat{s}}$ has higher priority. Although such a reallocation (a movement along a −45° line) is not feasible in our problem, it reveals the relative marginal value of one dollar devoted to each invoice. Thus, the −45° line analysis provides a ranking (priority) between the two invoices even when a literal diagonal reallocation cannot be executed.

To analyze payment priority between invoices $x_{l, t}^{u, s}$ and $x_{\hat{l}, t}^{\hat{u}, \hat{s}}$ , we define the two-dimensional function $ϕ_{t} (y_{l, t}^{u, s}, y_{\hat{l}, t}^{\hat{u}, \hat{s}})$ by projecting the original function to the $(y_{l, t}^{u, s}, y_{\hat{l}, t}^{\hat{u}, \hat{s}})$ plane. We next examine the directional property of $ϕ_{t} (y_{l, t}^{u, s}, y_{\hat{l}, t}^{\hat{u}, \hat{s}})$ . Theorem 5 establishes by induction that $ϕ_{t} (y_{l, t}^{u, s}, y_{\hat{l}, t}^{\hat{u}, \hat{s}})$ has the directional property for every t. Building on this result, Theorem 5 characterizes the payment priority between two invoices $x_{l, t}^{u, s}$ and $x_{\hat{l}, t}^{\hat{u}, \hat{s}}$ when a fixed amount $δ$ (>0) of cash must be allocated.

Let $({\overset{\leftarrow}{x}}_{l, t}^{u, s *} (x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}}), {\overset{\leftarrow}{x}}_{\hat{l}, t}^{\hat{u}, \hat{s} *} (x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}})) = \underset{(y_{l, t}^{u, s}, y_{\hat{l}, t}^{\hat{u}, \hat{s}})}{\arg\min} \; ϕ_{t} (y_{l, t}^{u, s}, y_{\hat{l}, t}^{\hat{u}, \hat{s}})$ s.t. $y_{l, t}^{u, s} + y_{\hat{l}, t}^{\hat{u}, \hat{s}} = x_{l, t}^{u, s} + x_{\hat{l}, t}^{\hat{u}, \hat{s}}$ . Then $({\overset{\leftarrow}{x}}_{l, t}^{u, s *} (x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}}), {\overset{\leftarrow}{x}}_{\hat{l}, t}^{\hat{u}, \hat{s} *} (x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}}))$ is the minimizer of $ϕ_{t} (y_{l, t}^{u, s}, y_{\hat{l}, t}^{\hat{u}, \hat{s}})$ along the −45° line passing through the state $(x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}})$ . Lemma 4 yields the $(x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}}; δ)$-policy specified in Theorem 5, which prescribes the optimal priority rule for this allocation problem.
Theorem 5.
Let the $(x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}}; δ)$-policy be as follows:
$(y_{l, t}^{u, s}, y_{\hat{l}, t}^{\hat{u}, \hat{s}}) = \begin{cases} (x_{l, t}^{u, s} - δ, x_{\hat{l}, t}^{\hat{u}, \hat{s}}) & \text{if } x_{l, t}^{u, s} \geq δ \text{ (area (A))} \\ (0, x_{\hat{l}, t}^{\hat{u}, \hat{s}} - (δ - x_{l, t}^{u, s})) & \text{if } x_{l, t}^{u, s} < δ \text{ and } x_{l, t}^{u, s} + x_{\hat{l}, t}^{\hat{u}, \hat{s}} \geq δ \text{ (area (B))} \\ (0, 0) & \text{if } x_{l, t}^{u, s} < δ \text{ and } x_{l, t}^{u, s} + x_{\hat{l}, t}^{\hat{u}, \hat{s}} < δ \text{ (area (C))}, \end{cases}$
where $x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}}, δ \geq 0$ . Then $ϕ_{t} (y_{l, t}^{u, s}, y_{\hat{l}, t}^{\hat{u}, \hat{s}})$ has the directional property in $(y_{l, t}^{u, s}, y_{\hat{l}, t}^{\hat{u}, \hat{s}})$ , and the $(x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}}; δ)$-policy is optimal.

The structure of the $(x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}}; δ)$-policy is shown in Figure 2(b). The black dot marks the unconstrained minimizer of $ϕ_{t} (y_{l, t}^{u, s}, y_{\hat{l}, t}^{\hat{u}, \hat{s}})$ subject to $y_{l, t}^{u, s} \geq 0$ and $y_{\hat{l}, t}^{\hat{u}, \hat{s}} \geq 0$ ; this point is (0, 0), which is intuitive since the reduced P1 is a minimization. The dashed curve in Figure 2(b) represents the locus of minimizers $({\overset{\leftarrow}{x}}_{l, t}^{u, s *} (x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}}), {\overset{\leftarrow}{x}}_{\hat{l}, t}^{\hat{u}, \hat{s} *} (x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}}))$ when $x_{l, t}^{u, s} + x_{\hat{l}, t}^{\hat{u}, \hat{s}} = 0$ . By Lemma 4, this locus passes through (0, 0) and is downward-sloping with angle between −45° and −90° relative to the positive $x_{l, t}^{u, s}$ axis.

The $(x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}}; δ)$-policy governs payment choices when an amount $δ$ of cash must be paid. Feasible post-payment states lie on the line $y_{l, t}^{u, s} + y_{\hat{l}, t}^{\hat{u}, \hat{s}} = x_{l, t}^{u, s} + x_{\hat{l}, t}^{\hat{u}, \hat{s}} - δ$ , subject to $0 \leq y_{l, t}^{u, s} \leq x_{l, t}^{u, s}$ and $0 \leq y_{\hat{l}, t}^{\hat{u}, \hat{s}} \leq x_{\hat{l}, t}^{\hat{u}, \hat{s}}$ ((C1)–(C3)). Geometrically, this is the closed −45° segment connecting the left-up endpoint $(\max (0, x_{l, t}^{u, s} - δ), x_{\hat{l}, t}^{\hat{u}, \hat{s}})$ and the right-down endpoint $(x_{l, t}^{u, s}, \max (0, x_{\hat{l}, t}^{\hat{u}, \hat{s}} - δ))$ . Because Lemma 4 characterizes the dashed locus of unconstrained minimizers and this locus passes through (0, 0), the optimal feasible payment is the point on the feasible −45° segment that is closest to the dashed locus, namely the most left-up feasible point on that closed segment.

The optimal policy forms a switching-curve policy. The policy divides the state space into three adjacent areas: areas (A), (B), and (C). The solid arrows represent the optimal decision in each area. According to the policy, in area (A), all $δ$ cash is paid for $x_{l, t}^{u, s}$ , but $x_{l, t}^{u, s}$ is not completely paid off (arrow a1). In areas (B) and (C), $x_{l, t}^{u, s}$ is paid off first, and the remaining cash, $δ - x_{l, t}^{u, s}$ , is used to pay $x_{\hat{l}, t}^{\hat{u}, \hat{s}}$ . $x_{\hat{l}, t}^{\hat{u}, \hat{s}}$ is fully paid off in area (C) (arrow c1), while it is not in area (B) (arrow b1).

The $(x_{l, t}^{u, s}, x_{\hat{l}, t}^{\hat{u}, \hat{s}}; δ)$-policy holds for any given $δ$ and state values. Consequently, in an optimal policy, when allocating cash between two invoices, one invoice must be fully paid before the other receives any payment; cash should not be split across invoices without fully settling one. Theorem 5 implies that, for any pair of invoices, one invoice has priority over the other, defining the payment priority between them. Note that in Theorem 5, $ϕ_{t} (y_{l, t}^{u, s}, y_{\hat{l}, t}^{\hat{u}, \hat{s}})$ has the directional property in $(y_{l, t}^{u, s}, y_{\hat{l}, t}^{\hat{u}, \hat{s}})$ , assuming that the invoice with payment priority is $y_{l, t}^{u, s}$ . The determination of priority for any invoice pair is characterized in Section 4.4.
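The three areas of the two-invoice policy translate directly into a small function. This is a minimal sketch under the assumption that the first argument is the invoice that already holds payment priority; the function name `two_invoice_policy` and the sample numbers are illustrative choices, not part of the paper.

```python
from typing import Tuple


def two_invoice_policy(
    x: float, x_hat: float, delta: float
) -> Tuple[float, float]:
    """Post-payment balances (y, y_hat) when delta of cash is paid and the
    invoice with balance x has priority over the one with balance x_hat."""
    if x >= delta:
        # Area (A): all cash goes to the priority invoice.
        return x - delta, x_hat
    if x + x_hat >= delta:
        # Area (B): settle the priority invoice, then pay down the other.
        return 0.0, x_hat - (delta - x)
    # Area (C): cash covers both invoices.
    return 0.0, 0.0


print(two_invoice_policy(5, 4, 3))  # area (A)
print(two_invoice_policy(2, 4, 3))  # area (B)
print(two_invoice_policy(2, 4, 7))  # area (C)
```

In every branch at most one invoice ends up partially paid, which is the no-splitting behavior the theorem describes.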

Theorems 3 and 5 provide structural properties based on the location of an optimal solution, which is typically unknown due to the computational challenges posed by the curse of dimensionality. However, such properties often yield valuable insights even in the absence of direct knowledge of the actual optimal solution (Kang et al., 2016; Smith and McCardle, 2002). Theorem 3 offers a key managerial insight: managers should avoid increasing cash on hand as invoice amounts rise, regardless of the optimal solution's location. Similarly, Theorem 5 suggests that one invoice should be fully paid before initiating payment on another.
4.3. Structure of an overall optimal policy in $(x_{t}, β_{t})$

Theorems 3 and 5 provide key structural insight into an overall optimal policy for SDFSCP within the original multidimensional state space $(x_{t}, β_{t})$ .

Corollary 6.
Under an optimal policy $(x_{t}, β_{t})$ , outstanding invoices at time t can be ordered by payment priority. This priority ordering is state-dependent and is induced by the optimal value function; when all pairwise priorities are deterministic (Propositions 8–11), the ordering is additionally state-independent, corresponding to a total order based on the monotone dominance of J-functions. Let $i_{t} (n)$ denote the invoice with the n-th highest priority. Payments follow this order strictly: invoice $i_{t} (n + 1)$ is considered only after $i_{t} (n)$ is fully settled. There exists a threshold $β_{t}^{*} (x_{t}) \leq β_{t}$ , reserved for future use; payments continue in priority order until cash reaches $β_{t}^{*} (x_{t})$ , after which no further payments are made at time t.
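The payment loop implied by Corollary 6 can be sketched as follows. This is an illustration only: the priority order and the cash threshold are taken as given inputs, whereas in the paper both come from the optimal value function and are generally unknown without solving the dynamic program. The name `pay_in_priority_order` and the sample figures are assumptions.

```python
from typing import List, Tuple


def pay_in_priority_order(
    balances: List[float], cash: float, reserve: float
) -> Tuple[List[float], float]:
    """Pay invoices listed from highest to lowest priority until cash
    falls to the reserve threshold. Returns (remaining balances, cash)."""
    remaining = list(balances)
    for n, balance in enumerate(remaining):
        spendable = cash - reserve
        if spendable <= 0:
            break  # threshold reached: no further payments this period
        payment = min(balance, spendable)
        remaining[n] = balance - payment
        cash -= payment
    return remaining, cash


# Hypothetical state: three invoices, 100 in cash, 20 to be retained.
print(pay_in_priority_order([40, 30, 50], cash=100, reserve=20))
```

Because each invoice is settled before the next one is touched, at most one invoice is left partially paid when the loop stops.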
4.4. Structure of optimal payment priorities

Corollary 6 and Theorem 5 show that the optimal policy for SDFSCP is governed by payment priorities among invoices. We therefore characterize the structure of these priorities, distinguishing between deterministic priority, which is state-independent, and stochastic priority, which depends on the state and random parameters. We first identify conditions for deterministic priority and when it becomes stochastic. The analysis proceeds from a single invoice (Section 4.4.1), to two invoices (Section 4.4.2), and then to the multi-invoice case, where we revisit the overall policy structure (Section 4.4.3).

4.4.1. Single invoice

For a single invoice, we characterize the optimal payment timing independent of other invoices. By converting all payment amounts into end-of-horizon values, we define a projected cost function that allows time-invariant comparison across payment dates. We show that each invoice has a well-defined optimal payment time determined by the trade-off between discount, interest, and penalty effects, and Lemma 7 establishes that in an optimal policy an invoice is paid only at (or after) its optimal timing threshold, never before.

Consider invoice k. Let $L_{k} (t)$ represent the invoice amount to be paid if invoice k is paid off at time t.

$L_{k} (t) = \begin{cases} L_{k} & \text{if } t \leq b_{k} \\ L_{k} α_{k} & \text{if } b_{k} < t \leq d_{k} \\ L_{k} α_{k} γ_{k}^{t - d_{k}} & \text{if } d_{k} < t \leq T \end{cases}$

We assume the value of money changes over time according to the interest rate r. Hence, we define $A_{k} (t)$ as the projected value of $L_{k} (t)$ at time T, which converts the value of the invoice at time t to its equivalent value at time T: $A_{k} (t) = L_{k} (t) r^{T - t}$ . $A_{k} (t)$ provides a convenient way to compare invoice values irrespective of time. Throughout this paper, we use the term “projected” to refer to this time-invariant monetary value evaluated at time T. Additionally, let $I_{k} (t) = A_{k} (t) / L_{k}$ . $I_{k} (t)$ represents the relative projected amount of payment required at time t to pay off one dollar of invoice k. Let ${\hat{t}}_{k} = \underset{t}{\arg\min} \; I_{k} (t)$ . Then ${\hat{t}}_{k}$ is given as follows.

${\hat{t}}_{k} = \begin{cases} b_{k} & \text{if } α_{k} > r^{d_{k} - b_{k}} \\ d_{k} & \text{if } α_{k} \leq r^{d_{k} - b_{k}} \end{cases}$

${\hat{t}}_{k}$ represents the optimal payment time to minimize the projected payment amount of invoice, assuming there is sufficient cash to pay the invoice regardless of other outstanding invoices. This result implies that, to minimize the projected payment amount of invoice k, it must be paid at the end of discount period $(b_{k})$ if the discount amount $(α_{k} - 1)$ exceeds the interest gain $(r^{d_{k} - b_{k}} - 1)$ accumulated during the normal payment period $(d_{k} - b_{k})$ . Otherwise, it must be paid at the end of normal payment period $(d_{k})$ .
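The closed form for ${\hat{t}}_{k}$ can be cross-checked against a direct search over $I_{k} (t)$. The sketch below assumes integer time periods $0, 1, …, T$, a gross interest rate $r > 1$, and a penalty rate $γ_{k} > r$ so that paying after $d_{k}$ is never cheaper; the function names and the parameter values are hypothetical.

```python
def invoice_amount(t, L, b, d, alpha, gamma):
    """L_k(t): amount due if the invoice is paid off at time t."""
    if t <= b:
        return L
    if t <= d:
        return L * alpha
    return L * alpha * gamma ** (t - d)


def relative_projected_amount(t, T, b, d, alpha, gamma, r):
    """I_k(t) = A_k(t) / L_k, where A_k(t) = L_k(t) * r**(T - t)."""
    return invoice_amount(t, 1.0, b, d, alpha, gamma) * r ** (T - t)


def t_hat_closed_form(b, d, alpha, r):
    """Pay at b_k if the lost discount outweighs the interest gain."""
    return b if alpha > r ** (d - b) else d


def t_hat_by_search(T, b, d, alpha, gamma, r):
    """Brute-force argmin of I_k(t) over t = 0, ..., T."""
    return min(
        range(T + 1),
        key=lambda t: relative_projected_amount(t, T, b, d, alpha, gamma, r),
    )


# Hypothetical invoice: discount ends at t=2, due at t=5, horizon T=10.
for alpha in (1.02, 1.05):
    print(alpha, t_hat_closed_form(2, 5, alpha, 1.01),
          t_hat_by_search(10, 2, 5, alpha, 1.05, 1.01))
```

With these numbers, $r^{d_{k} - b_{k}} = 1.01^{3} \approx 1.0303$, so the smaller multiplier defers payment to $d_{k}$ while the larger one calls for payment at $b_{k}$.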

Let ${\underline{I}}_{k} (t) = \min_{t \leq t^{'} \leq T} I_{k} (t^{'})$ . ${\underline{I}}_{k} (t)$ represents the minimum projected payment, evaluated at the end of the decision horizon, that the company can make at or after time t to pay off one dollar of invoice k's original discounted amount. Since ${\underline{I}}_{k} (t)$ is non-decreasing in time t, the optimal projected amount that the company needs to pay on an invoice does not decrease as time progresses.

${\underline{I}}_{k} (t)$ provides valuable managerial insights into the optimal timing for paying an invoice. To minimize the payment amount, the company has to wait until just before ${\underline{I}}_{k} (t)$ increases (e.g., at time $b_{k}$ or $d_{k}$ ). However, if for any reason (e.g., insufficient cash) the invoice is not paid at that moment and ${\underline{I}}_{k} (t)$ has already increased, the company has to wait again and pay invoice k just before the next increase in ${\underline{I}}_{k} (t)$ (e.g., at or after time $d_{k}$ ). Thus, past values of ${\underline{I}}_{k} (t)$ and their changes over time do not influence the optimal payment decision at given time t. Instead, only the current and future values of ${\underline{I}}_{k} (t)$ do.

Let ${\hat{t}}_{k} (t) = \max \left( \underset{t \leq t^{'} < T}{\arg\min} \; {\underline{I}}_{k} (t^{'}) \right)$ . Then ${\hat{t}}_{k} (t)$ represents the optimal time at or after t when the payment for invoice k should be made. Suppose the current time is t. If ${\hat{t}}_{k} (t) = t$ , the payment should be made immediately. Otherwise, the payment should be deferred until $\tilde{t} = {\hat{t}}_{k} (t) > t$ . If the payment is not made at $\tilde{t}$ , the new optimal payment time, ${\hat{t}}_{k} (\tilde{t})$ , is recalculated.
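A short sketch may help make ${\underline{I}}_{k} (t)$ and ${\hat{t}}_{k} (t)$ concrete. It assumes integer periods and takes the values $I_{k} (0), …, I_{k} (T)$ as a plain list; the helper names and the sample list are hypothetical and chosen only so that the result is easy to trace by hand.

```python
from typing import List


def running_min_from(I: List[float]) -> List[float]:
    """I_under[t] = min of I[t'] over t <= t' <= T, with T = len(I) - 1."""
    I_under = list(I)
    for t in range(len(I) - 2, -1, -1):
        I_under[t] = min(I_under[t], I_under[t + 1])
    return I_under


def t_hat_at(I_under: List[float], t: int) -> int:
    """Largest minimizer of I_under over t <= t' < T (requires t < T)."""
    T = len(I_under) - 1
    window = range(t, T)
    lowest = min(I_under[u] for u in window)
    return max(u for u in window if I_under[u] == lowest)


I_values = [1.08, 1.07, 1.06, 1.09, 1.08, 1.10]  # hypothetical I_k(0..5)
I_under = running_min_from(I_values)
print(I_under)               # non-decreasing in t
print(t_hat_at(I_under, 0))  # defer payment until this period
print(t_hat_at(I_under, 3))  # recomputed target if the first one is missed
```

The second call illustrates the recalculation step: once the first target period has passed unpaid, only current and future values of the running minimum matter.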

Lemma 7.
In an optimal policy, if invoice k is paid at time $t$ , then $t = {\hat{t}}_{k} (t)$ and $t \geq {\hat{t}}_{k}$ .

Lemma 7 provides meaningful managerial insights into the timing of invoice payments. If there is sufficient cash on hand to pay all existing invoices and no new invoices arrive, then every invoice can be paid at its optimal payment time ${\hat{t}}_{k}$ , and this becomes an optimal policy. However, when cash is insufficient, some invoices may not be paid at their respective ${\hat{t}}_{k}$ . Also, even if there is enough cash, certain invoices may not be paid at their ${\hat{t}}_{k}$ because some cash may need to be reserved for expected future invoices. Lemma 7 implies that even in such cases, no invoice should be paid before its ${\hat{t}}_{k}$ ; instead, it must be paid at or after ${\hat{t}}_{k}$ , specifically when $t = {\hat{t}}_{k} (t)$ . Furthermore, Lemma 7 and ${\hat{t}}_{k} (t)$ suggest that in an optimal policy, invoice k should be paid at time $b_{k}$ , $d_{k}$ , or after $d_{k}$ if $α_{k} > r^{d_{k} - b_{k}}$ . Otherwise, it should be paid only at or after $d_{k}$ . Beyond its managerial implications, Lemma 7 also reduces the computational complexity of finding an optimal policy.
4.4.2. Two invoices: pairwise comparisons

We next characterize payment priorities between two invoices. Using the relative projected cost ratio $J_{k} (t; t_{0})$ , Propositions 8–11 identify conditions under which one invoice has a deterministic priority over another, independent of the state and random parameters. When these monotone dominance conditions fail and the projected cost paths intersect (Proposition 12), the priority becomes stochastic, depending on the system state and uncertainty.

Let $J_{k} (t; t_{0}) = {\underline{I}}_{k} (t) / {\underline{I}}_{k} (t_{0})$ , where $t \geq t_{0}$ . $J_{k} (t; t_{0})$ is non-decreasing in t and plays an important role in determining the optimal payment priorities between two different invoices. Suppose there are two different invoices, i and j. Then the payment priority (which invoice should be paid off first) between these two invoices is determined by Propositions 8–12.

Proposition 8.
If ${\hat{t}}_{i} (t_{0}) = {\hat{t}}_{j} (t_{0}) = t_{0}$ and $J_{i} (t; t_{0}) < J_{j} (t; t_{0})$ for any $t \geq t_{0}$ , then paying invoice j has priority over paying invoice i at time $t_{0}$ .

Under the conditions of Proposition 8, it is preferable for the company to pay both invoices i and j immediately because ${\hat{t}}_{i} (t_{0}) = {\hat{t}}_{j} (t_{0}) = t_{0}$ . However, if the company does not have sufficient cash on hand, it must decide which invoice to pay first. Proposition 8 establishes a condition under which invoice j takes precedence, implying that invoice i should be paid only after invoice j is fully settled, if it must be paid at time $t_{0}$ , as dictated by Theorem 5.
Proposition 9.
If ${\hat{t}}_{j} (t_{0}) = t_{0}$ , ${\hat{t}}_{i} (t_{0}) > t_{0}$ , and $J_{i} (t; t_{0}) < J_{j} (t; t_{0})$ for any $t > t_{0}$ , then at time $t_{0}$ , paying invoice j has priority over reserving cash to pay invoice i at ${\hat{t}}_{i} (t_{0}) > t_{0}$ .

While the company should pay invoice j at time $t_{0}$ because ${\hat{t}}_{j} (t_{0}) = t_{0}$ , it may choose to postpone the payment of invoice i at time $t_{0}$ until ${\hat{t}}_{i} (t_{0})$ because ${\hat{t}}_{i} (t_{0}) > t_{0}$ . However, it is not always optimal for the company to pay invoice j immediately and defer invoice i. The company may instead choose to reserve funds for invoice i, rather than using the same funds to pay invoice j right away. Nevertheless, if $J_{i} (t; t_{0}) < J_{j} (t; t_{0})$ for any $t > t_{0}$ , paying invoice j takes priority, which implies that the company should begin reserving cash to pay invoice i at ${\hat{t}}_{i} (t_{0})$ only after fully paying off invoice j.
Proposition 10.
If ${\hat{t}}_{j} (t_{0}) > t_{0}$ , ${\hat{t}}_{i} (t_{0}) > t_{0}$ , and $J_{i} (t; t_{0}) < J_{j} (t; t_{0})$ for any $t > t_{0}$ , then at time $t_{0}$ , reserving cash to pay invoice j at ${\hat{t}}_{j} (t_{0})$ has priority over reserving cash to pay invoice i at ${\hat{t}}_{i} (t_{0}) \geq {\hat{t}}_{j} (t_{0})$ .

Because ${\hat{t}}_{j} (t_{0}) > t_{0}$ and ${\hat{t}}_{i} (t_{0}) > t_{0}$ , the company may choose to delay payment for both invoices i and j at time $t_{0}$ . However, it may still need to determine the payment priority between the two invoices to allocate funds accordingly for future payments. Under the condition of Proposition 10, at time $t_{0}$ , reserving cash for invoice j has priority. This means that the company should begin setting aside funds for invoice i only after it secures enough cash to fully pay off invoice j. Additionally, Proposition 10 suggests that invoice j should be paid before invoice i, implying that ${\hat{t}}_{i} (t_{0}) \geq {\hat{t}}_{j} (t_{0})$ .

Propositions 8 and 9 define time-dependent payment priority rules that hold at a specific point in time but may not necessarily remain valid beyond that moment. For instance, if invoices i and j satisfy the conditions of Proposition 8 or 9 and invoice j is not fully paid at time $t_{0}$ , then the priority of paying invoice j over invoice i may no longer hold for time $t > t_{0}$ . In contrast, Proposition 10 establishes a priority rule that applies over a certain period. The priority rule specified in Proposition 10 holds from time $t_{0}$ until ${\hat{t}}_{j} (t_{0})$ . However, this rule may not necessarily remain valid after ${\hat{t}}_{j} (t_{0})$ . Next, we introduce a stronger payment priority rule, which ensures that one invoice consistently takes priority over another from a certain point in time until the end of the decision horizon. Let $Δ {\underline{I}}_{k} (t) = ({\underline{I}}_{k} (t + 1) - {\underline{I}}_{k} (t)) / {\underline{I}}_{k} (t)$ . $Δ {\underline{I}}_{k} (t)$ represents the marginal rate of change in ${\underline{I}}_{k} (t)$ .
Proposition 11.
If $Δ {\underline{I}}_{i} (t) \leq Δ {\underline{I}}_{j} (t)$ for any $t \geq t_{0}$ , paying invoice j has priority over paying invoice i at any $t \geq t_{0}$ .
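The condition of Proposition 11 is simple to test once the running minima are available. The sketch below assumes integer periods and that $Δ {\underline{I}}_{k} (t)$ is evaluated for $t = t_{0}, …, T - 1$ (the last period has no successor); the function names and the sample paths are hypothetical.

```python
from typing import List


def marginal_rate(I_under: List[float], t: int) -> float:
    """Delta I_under(t) = (I_under(t + 1) - I_under(t)) / I_under(t)."""
    return (I_under[t + 1] - I_under[t]) / I_under[t]


def j_has_lasting_priority(
    I_under_i: List[float], I_under_j: List[float], t0: int
) -> bool:
    """True if Delta I_under_i(t) <= Delta I_under_j(t) for t0 <= t < T."""
    T = len(I_under_i) - 1
    return all(
        marginal_rate(I_under_i, t) <= marginal_rate(I_under_j, t)
        for t in range(t0, T)
    )


# Hypothetical running minima for two invoices over t = 0..3.
path_i = [1.0, 1.0, 1.1, 1.1]
path_j = [1.0, 1.2, 1.5, 1.5]
print(j_has_lasting_priority(path_i, path_j, 0))  # j's cost grows faster
print(j_has_lasting_priority(path_j, path_i, 0))  # roles swapped
```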

Propositions 8, 9, 10, and 11 provide valuable insights into the payment priority rules between two invoices. These rules hold consistently regardless of other random parameters, such as the arrival rates of new invoices and cash inflows. For this reason, we refer to these payment priorities as deterministic optimal payment priorities. However, not all invoice pairs can be prioritized using these propositions. In some cases, the payment priority between two existing invoices depends on the problem state and random parameters. We refer to this as the stochastic optimal payment priority. Proposition 12 illustrates an example of this stochastic optimal payment priority.
Proposition 12.
If there exist $t_{1}$ and $t_{2}$ , where $t_{0} < t_{1} < T$ , $t_{0} < t_{2} < T$ , and $t_{1} \neq t_{2}$ , which satisfy $(J_{i} (t_{1}; t_{0}) - J_{j} (t_{1}; t_{0})) \times (J_{i} (t_{2}; t_{0}) - J_{j} (t_{2}; t_{0})) < 0$ , then at time $t_{0}$ , invoices i and j do not have a deterministic optimal payment priority but a stochastic optimal payment priority.

Figure 3 shows an example when the optimal payment priority between invoices i and j becomes stochastic. The figure shows ${\hat{t}}_{i} (a) = {\hat{t}}_{j} (a) = a$ , meaning that both invoices need to be paid at time a if there is sufficient cash on hand. However, suppose only one invoice can be paid due to limited cash availability. In this case, none of Propositions 8, 9, 10, or 11 applies. As shown in Figure 3, $J_{i} (t; a)$ and $J_{j} (t; a)$ intersect at $t = d_{j}$ . If invoice j is paid at time a and invoice i is paid later at $t > a$ , the total relative payment amount is $1 + J_{i} (t; a)$ . Conversely, if invoice i is paid at time a and invoice j is paid later at $t > a$ , the total payment is $1 + J_{j} (t; a)$ . Notably, $J_{i} (t; a) < J_{j} (t; a)$ when $t < d_{j}$ but $J_{i} (t; a) > J_{j} (t; a)$ when $t > d_{j}$ . This means if the invoice deferred at time $a$ is likely to be paid before time $d_{j}$ , then paying invoice j at time a minimizes the expected total payment for both invoices. Conversely, if the deferred invoice is more likely to be paid after $d_{j}$ , then paying invoice i at time a is the better choice. In this scenario, the optimal payment priority cannot be determined using deterministic rules alone. Instead, it requires computing an exact optimal policy based on the problem's random parameters—such as the arrival rates of invoices and cash inflows—as well as the current state of the system.

Figure 3.
An example of the stochastic optimal payment priority.
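The crossing condition of Proposition 12 can be detected mechanically. The following sketch assumes integer periods and compares $J_{i} (t; t_{0})$ and $J_{j} (t; t_{0})$ at interior times $t_{0} < t < T$; a sign change in their difference signals a stochastic priority. The function names and the two sample paths are assumptions for illustration, not data from the paper.

```python
from typing import List


def relative_cost_ratio(I_under: List[float], t0: int) -> List[float]:
    """J(t; t0) = I_under(t) / I_under(t0) for t = t0, ..., T."""
    return [value / I_under[t0] for value in I_under[t0:]]


def has_stochastic_priority(
    I_under_i: List[float], I_under_j: List[float], t0: int
) -> bool:
    """True if J_i - J_j takes both signs at times strictly between
    t0 and T, i.e. the two projected cost paths cross."""
    J_i = relative_cost_ratio(I_under_i, t0)
    J_j = relative_cost_ratio(I_under_j, t0)
    # Drop the first (t = t0) and last (t = T) entries.
    gaps = [a - b for a, b in zip(J_i, J_j)][1:-1]
    return any(g < 0 for g in gaps) and any(g > 0 for g in gaps)


# Hypothetical paths over t = 0..4: i starts cheaper, then overtakes j.
crossing_i = [1.0, 1.0, 1.02, 1.3, 1.3]
crossing_j = [1.0, 1.1, 1.1, 1.1, 1.1]
print(has_stochastic_priority(crossing_i, crossing_j, 0))

# Hypothetical paths where j is always the costlier one to defer.
print(has_stochastic_priority([1.0, 1.0, 1.05, 1.05, 1.2],
                              [1.0, 1.1, 1.2, 1.3, 1.4], 0))
```

When the check returns false, the J-condition of the deterministic rules may still apply; the conditions on ${\hat{t}}_{i} (t_{0})$ and ${\hat{t}}_{j} (t_{0})$ in Propositions 8–10 would need to be verified separately.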
4.4.3. Multiple invoices

We now understand both the overall structure of an optimal policy and the payment priorities between two invoices. Building on these insights, we extend the structure to characterize an optimal policy for multiple invoices.

Let $s_{l}^{*}$ be the value of $s$ that satisfies $α_{l} = r^{s}$. We refer to $s_{l}^{*}$ as the type-l invoice's discount-interest gain threshold. This threshold represents the number of time intervals required for the cumulative interest gain to equal the discount benefit obtained by paying the invoice before the end of the discount period. Similarly, let $u_{l}^{*}$ be the value of $u$ that satisfies $(γ_{l} / r)^{u} = α_{l}$. We refer to $u_{l}^{*}$ as the discount-penalty threshold. This threshold represents the number of time intervals required for the cumulative penalty to equal the discount benefit. Given $α_{l}$, $γ_{l}$, and r, both thresholds follow by taking logarithms: $s_{l}^{*} = \ln α_{l} / \ln r$ and $u_{l}^{*} = \ln α_{l} / \ln (γ_{l} / r)$.
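As an illustration, both thresholds can be computed by taking logarithms of their defining equations. The sketch below assumes $r > 1$ and $γ_{l} > r$ so that both logarithms are positive; the function names are hypothetical, and the sample values are chosen from the parameter ranges used later in the experiments.

```python
import math


def discount_interest_threshold(alpha: float, r: float) -> float:
    """s* solving alpha = r**s (assumes r > 1)."""
    if r <= 1:
        raise ValueError("r must exceed 1")
    return math.log(alpha) / math.log(r)


def discount_penalty_threshold(alpha: float, gamma: float, r: float) -> float:
    """u* solving (gamma / r)**u = alpha (assumes gamma > r)."""
    if gamma <= r:
        raise ValueError("gamma must exceed r")
    return math.log(alpha) / math.log(gamma / r)


# Sample values: 3% discount, 1.2% weekly penalty, 0.1% weekly interest.
alpha, gamma, r = 100 / 97, 1.012, 1.001
s_star = discount_interest_threshold(alpha, r)
u_star = discount_penalty_threshold(alpha, gamma, r)
print(round(s_star, 2), round(u_star, 2))
```

With these sample values, the interest gain needs roughly 30 weeks to match the discount, whereas the net penalty erodes the discount in fewer than 3 weeks.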

We consider two different types of invoices: l and $l^{'}$. Let $s^{*}$ (or $u^{*}$) and $s^{' *}$ (or $u^{' *}$) be the discount-interest gain threshold (or discount-penalty threshold) for l and $l^{'}$, respectively. Let s and $s^{'}$ be the remaining discount payment periods, let u and $u^{'}$ be the remaining normal payment period, let $γ_{l}$ and $γ_{l^{'}}$ be the penalty rate, and let $α_{l}$ and $α_{l^{'}}$ be the discount rate for l and $l^{'}$, respectively. Using the payment priorities introduced in Section 4.4.2, we further develop the conditions under which $l^{'}$ has a deterministic payment priority over l at time $t_{0}$.

Rule (1) represents the case when both invoices $l^{'}$ and l should be paid at the end of the normal payment period, regardless of whether the discount period remains, because the discount gain is not greater than the interest gain ($s > s^{*}$ and $s^{'} > s^{' *}$), or when the normal payment period has already ended ($s = 0$ and $s^{'} = 0$). If the penalty rate of invoice $l^{'}$ is greater than that of invoice l ($γ_{l^{'}} > γ_{l}$), one might assume that invoice $l^{'}$ always takes payment priority over invoice l to mitigate the risk of incurring more penalties, an approach aligned with the basic principle of the avalanche method. However, this is only true if the total remaining time of invoice $l^{'}$ until the end of its normal payment period is shorter than that of invoice l ($u^{'} + s^{'} \leq u + s$), which satisfies the condition of Proposition 10. In such cases, invoice $l^{'}$ has a deterministic payment priority over invoice l (see rule (1.1)), and this priority remains valid at any time until T due to Proposition 11. Otherwise, they have a stochastic payment priority according to Proposition 12. Similarly, when invoice l has a higher penalty rate ($γ_{l^{'}} \leq γ_{l}$), one might assume that invoice l should be prioritized over invoice $l^{'}$ to reduce the risk of higher penalties. However, if the relative minimal projected amount of invoice $l^{'}$ at time T is greater than that of invoice l ($J_{l^{'}} (T; t_{0}) \geq J_{l} (T; t_{0})$), then invoice $l^{'}$ has a deterministic payment priority over invoice l (see rule (1.2)). This situation can occur when the normal payment deadline for invoice l is significantly later than that of invoice $l^{'}$ ($u^{'} + s^{'} \leq u + s$).
However, even in this case, at the due date for the normal payment of invoice $l^{'}$ ($u^{'} = 1$, $u > 1$), if the projected amount of invoice $l^{'}$ is lower than that of invoice l at the end of the decision horizon ($J_{l^{'}} (T; t_{0}) < J_{l} (T; t_{0})$), paying invoice $l^{'}$ immediately may not be the optimal decision. In this case, the priority is no longer deterministic due to Proposition 12, and reserving funds for a future payment toward invoice l may be preferable, depending on the problem state and stochastic parameters.

Rule (2) addresses the cases when both invoices benefit from being paid at discounted prices rather than normal prices, as the discount gains exceed the interest gains that can be earned during the normal periods ($0 < s \leq s^{*}$ and $0 < s^{'} \leq s^{' *}$). The priority rules in this scenario become more complex, as the conditions under which invoice $l^{'}$ has a deterministic payment priority over invoice l depend on the relative values of their penalty and discount rates. When both the penalty and discount rates of invoice $l^{'}$ are higher than those of invoice l, one might assume that invoice $l^{'}$ always takes payment priority over invoice l as it appears to offer higher discount savings or reduce penalties more effectively. However, this is true only if the remaining time of the discount period of invoice $l^{'}$ is shorter than that of invoice l ($u^{'} \leq u$) and the relative minimal projected amount of invoice $l^{'}$ is greater than that of invoice l at the end of the normal period of invoice $l^{'}$ ($J_{l^{'}} (t_{0} + u^{'} + s^{'}; t_{0}) \geq J_{l} (t_{0} + u^{'} + s^{'}; t_{0})$). If these conditions are not met, it has a stochastic priority (see rule (2.1)). On the contrary, when both the penalty and discount rates of invoice $l^{'}$ are smaller than those of invoice l, invoice $l^{'}$ still can take payment priority over invoice l if the relative minimal projected amount of invoice $l^{'}$ is greater than that of invoice l at both the end of the discount period of invoice l ($t = u$) and the end of the decision horizon ($t = T$) (see rule (2.4)). This can happen when invoice l has a sufficiently long remaining discount period, even longer than the total of remaining discount and normal periods of invoice $l^{'}$ ($u^{'} + s^{'} \leq u$).
When invoice $l^{'}$ has an advantage of payment priority over invoice l in penalty ($γ_{l^{'}} > γ_{l}$) but not in discount ($α_{l^{'}} \leq α_{l}$), invoice $l^{'}$ has a deterministic payment priority over invoice l only if the relative minimal projected amount of invoice $l^{'}$ is greater than that of invoice l at the end of the discount period of invoice l (see rule (2.2)). On the other hand, when invoice $l^{'}$ has an advantage in discount ($α_{l^{'}} > α_{l}$) but not in penalty ($γ_{l^{'}} \leq γ_{l}$), the necessary condition becomes $u^{'} \leq u$ and $J_{l^{'}} (T; t_{0}) \geq J_{l} (T; t_{0})$ (see rule (2.3)).

Rule (3) addresses the cases when invoice $l^{'}$ is more advantageous to be paid at the normal price, while invoice l benefits more from being paid at the discounted price. When the penalty rate of invoice $l^{'}$ is greater than that of invoice l ($γ_{l^{'}} > γ_{l}$), invoice $l^{'}$ has a payment priority only if the relative minimal projected amount of invoice $l^{'}$ is greater than that of invoice l at the end of the discount period of invoice l ($t = u$) (see rule (3.1)). In fact, this condition ensures that invoice $l^{'}$ always maintains payment priority over invoice l after $t = u$, as established by Proposition 11. However, when the penalty rate of invoice $l^{'}$ is smaller than that of invoice l ($γ_{l^{'}} \leq γ_{l}$), an additional condition, $J_{l^{'}} (T; t_{0}) \geq J_{l} (T; t_{0})$, must also be satisfied to have a deterministic payment priority over invoice l (see rule (3.2)).

Rule (4) represents the opposite cases of rule (3), showing situations where invoice $l^{'}$ is more advantageous to be paid at the discounted price, while invoice l benefits more from being paid at the normal price. For invoice $l^{'}$ to take priority over invoice l, first, its remaining time until the end of the discount period ($u^{'}$) must be shorter than the total remaining time of invoice l until the end of its normal period ($u + s$). Additionally, an extra condition must be met. When the penalty rate of invoice $l^{'}$ is greater than that of invoice l ($γ_{l^{'}} > γ_{l}$), the relative minimal projected amount of invoice $l^{'}$ must be greater than that of invoice l at the end of the normal period of invoice $l^{'}$ (see rule (4.1)). When $γ_{l^{'}} \leq γ_{l}$, the relative minimal projected amount of invoice $l^{'}$ must be greater than that of invoice l at $t = T$ (see rule (4.2)).

Theorem 13.
$l^{'}$ has a deterministic optimal payment priority over l under the following conditions:

(1) For $s > s^{*}$ (or $s = 0$) and $s^{'} > s^{' *}$ (or $s^{'} = 0$),

(1.1) When $γ_{l^{'}} > γ_{l}$ , $u^{'} + s^{'} \leq u + s$ .

(1.2) When $γ_{l^{'}} \leq γ_{l}$ , $J_{l^{'}} (T; t_{0}) \geq J_{l} (T; t_{0})$ .

(2) For $0 < s \leq s^{*}$ and $0 < s^{'} \leq s^{' *}$,

(2.1) When $γ_{l^{'}} > γ_{l}$ and $α_{l^{'}} > α_{l}$ , $u^{'} \leq u$ and $J_{l^{'}} (t_{0} + u^{'} + s^{'}; t_{0}) \geq J_{l} (t_{0} + u^{'} + s^{'}; t_{0}) .$

(2.2) When $γ_{l^{'}} > γ_{l}$ and $α_{l^{'}} \leq α_{l}$ , $J_{l^{'}} (t_{0} + u; t_{0}) \geq J_{l} (t_{0} + u; t_{0})$ .

(2.3) When $γ_{l^{'}} \leq γ_{l}$ and $α_{l^{'}} > α_{l}$ , $u^{'} \leq u$ and $J_{l^{'}} (T; t_{0}) \geq J_{l} (T; t_{0})$ .

(2.4) When $γ_{l^{'}} \leq γ_{l}$ and $α_{l^{'}} \leq α_{l}$ , $J_{l^{'}} (t_{0} + u; t_{0}) \geq J_{l} (t_{0} + u; t_{0})$ and $J_{l^{'}} (T; t_{0}) \geq J_{l} (T; t_{0})$ .

(3) For $0 < s \leq s^{*}$ and $s^{'} > s^{' *}$ (or $s^{'} = 0$),

(3.1) When $γ_{l^{'}} > γ_{l}$ , $J_{l^{'}} (t_{0} + u; t_{0}) \geq J_{l} (t_{0} + u; t_{0})$ .

(3.2) When $γ_{l^{'}} \leq γ_{l}$ , $J_{l^{'}} (t_{0} + u; t_{0}) \geq J_{l} (t_{0} + u; t_{0})$ and $J_{l^{'}} (T; t_{0}) \geq J_{l} (T; t_{0})$ .

(4) For $s > s^{*}$ (or $s = 0$) and $0 < s^{'} \leq s^{' *}$,

(4.1) When $γ_{l^{'}} > γ_{l}$, $u^{'} \leq u + s$ and $J_{l^{'}} (t_{0} + u^{'} + s^{'}; t_{0}) \geq J_{l} (t_{0} + u^{'} + s^{'}; t_{0})$.

(4.2) When $γ_{l^{'}} \leq γ_{l}$ , $u^{'} \leq u + s$ and $J_{l^{'}} (T; t_{0}) \geq J_{l} (T; t_{0})$ .

Theorem 13 characterizes when one invoice has a deterministic optimal payment priority over another. When these conditions hold, the ordering is stable and state-independent; otherwise, priority becomes state-dependent and stochastic. For invoices of the same type, additional refinements are provided in the e-companion. Together, Propositions 8–12 and Theorem 13 clarify the structure of the optimal policy. Once parameters are specified, deterministic priorities can be identified at each state, substantially reducing computational complexity in both exact optimization and heuristic design.
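The case analysis of Theorem 13 maps directly onto a small decision function. The sketch below is illustrative only: the `Invoice` container, its field names, and the callable `J` (returning $J (t; t_{0})$ for a fixed $t_{0}$) are hypothetical, and the projected-cost curves are assumed to be supplied by the caller. A return value of False means only that none of the listed conditions holds, in which case the priority may be stochastic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Invoice:
    s: float          # s (or s') as used in Theorem 13
    u: float          # u (or u') as used in Theorem 13
    s_star: float     # discount-interest gain threshold s* (or s'*)
    gamma: float      # penalty rate
    alpha: float      # discount rate
    J: Callable[[float], float]  # t -> J(t; t0), supplied by the caller


def prefers_normal(inv: Invoice) -> bool:
    """Condition 's > s* (or s = 0)' used in cases (1), (3), and (4)."""
    return inv.s > inv.s_star or inv.s == 0


def has_deterministic_priority(lp: Invoice, l: Invoice, t0: float, T: float) -> bool:
    """True if a condition of Theorem 13 gives l' (lp) priority over l at t0."""
    ge = lambda t: lp.J(t) >= l.J(t)          # J_{l'}(t; t0) >= J_l(t; t0)
    hi_pen = lp.gamma > l.gamma
    hi_disc = lp.alpha > l.alpha
    end_lp = t0 + lp.u + lp.s                 # t0 + u' + s'
    end_disc_l = t0 + l.u                     # t0 + u
    if prefers_normal(l) and prefers_normal(lp):            # case (1)
        return (lp.u + lp.s <= l.u + l.s) if hi_pen else ge(T)
    if not prefers_normal(l) and not prefers_normal(lp):    # case (2)
        if hi_pen and hi_disc:                              # (2.1)
            return lp.u <= l.u and ge(end_lp)
        if hi_pen:                                          # (2.2)
            return ge(end_disc_l)
        if hi_disc:                                         # (2.3)
            return lp.u <= l.u and ge(T)
        return ge(end_disc_l) and ge(T)                     # (2.4)
    if not prefers_normal(l):                               # case (3)
        return ge(end_disc_l) and (hi_pen or ge(T))
    # case (4): l prefers the normal price, l' prefers the discount
    return lp.u <= l.u + l.s and (ge(end_lp) if hi_pen else ge(T))


# Hypothetical pair falling under case (1) with flat projected-cost curves.
lp = Invoice(s=0, u=2, s_star=5, gamma=1.013, alpha=1.02, J=lambda t: 1.2)
l = Invoice(s=0, u=3, s_star=5, gamma=1.005, alpha=1.02, J=lambda t: 1.1)
print(has_deterministic_priority(lp, l, t0=0, T=52))
print(has_deterministic_priority(l, lp, t0=0, T=52))
```

Calling the function for every ordered pair of open invoices gives the set of deterministic orderings that can be fixed before any stochastic comparison is needed.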

Importantly, Theorem 13 shows that common managerial instincts can be misleading. Consider rule (4.1), where invoice $l^{'}$ is at the last discount opportunity $(u^{'} = 1)$ while invoice l still has time remaining $(u > 1, s > 1)$ . A natural reaction is to pay $l^{'}$ immediately—both to capture the expiring discount and to avoid its higher penalty rate $(γ_{l^{'}} > γ_{l})$ . However, the optimal decision depends not only on rates and deadlines but also on the projected future burden. If the projected minimal amount of $l^{'}$ at the end of its normal period is not greater than that of l, then paying $l^{'}$ immediately is not necessarily optimal. In some states, preserving cash and allocating it later to invoice l leads to a lower expected total cost. Thus, simple rules such as “use the last discount” or “pay the highest penalty first” are not universally optimal.

Although the complete payment rules are intricate and state-dependent, the structural results yield several high-level managerial takeaways:
Pay only when the invoice reaches its economic trigger ( Lemma 7 ). An invoice should be paid when further delay increases its projected payment burden—typically at the end of the discount period if the discount benefit exceeds the interest gain, or at/after the normal due date otherwise. Paying earlier than this trigger destroys liquidity without reducing long-run cost.

When discount benefits are weak relative to interest, prioritize urgency and downside risk ( Propositions 10 and 11 ; Theorem 13 ). If holding cash is more valuable than capturing the discount, priority should be driven by remaining time to due date and the potential growth of penalties. Invoices closer to penalty escalation or with faster projected cost growth should be paid first.

When discounts are economically meaningful, compare overall projected burden—not just rates ( Proposition 12 ; Theorem 13 ). An invoice with a higher discount or penalty rate does not automatically deserve priority. When projected cost paths intersect, priority becomes state-dependent. In such cases, invoices with the larger cumulative projected burden over the remaining horizon should be prioritized.

These principles guide the design of our heuristic, which operationalizes the structural ordering and pay-or-reserve rules in a computationally tractable manner.
5. Heuristic development and computational experiments

To demonstrate the effectiveness of the structural properties of the optimal policy, we develop a simple heuristic that leverages these properties. We then compare its performance to existing heuristics, specifically the snowball and the avalanche methods. These two methods are the most widely used in practice and are implemented in many commercial software packages for SDFSCP (Merritt, 2024; Rios-Solis et al., 2017).

5.1. A new heuristic: optimal-policies-based heuristic

Our proposed heuristic, the optimal-policies-based heuristic (OPBH), operationalizes the structural priority rules (Propositions 8–12) and the pay-or-reserve logic of Lemma 7. It first applies all deterministic priority conditions. When priority is stochastic (Proposition 12), invoices are ranked using a second-order dominance approximation: we compare the areas under their projected cost ratio curves, $\sum_{t^{'} = t}^{T} J_{k} (t^{'}; t)$ , over the remaining planning horizon, and prioritize the invoice with the larger aggregate projected burden (see Section 4.4.2 for details on $J_{k} (t^{'}; t)$ ). The simplified logic of OPBH is as follows (see the e-companion for full details). For each period $t = 0, 1, \dots, T - 1$ , update cash and invoice balances as in Figure 1, then:

Step 1 (Ordering). Apply deterministic priority rules when their conditions hold (Propositions 8–11); otherwise rank invoices using the area-based rule (Proposition 12).

Step 2 (Pay-or-reserve). Process invoices sequentially. If invoice k satisfies the pay-now condition (Lemma 7), allocate cash fully if sufficient, otherwise partially if preemptive payment is allowed (or stop if non-preemptive). If not, reserve the cash. Proceed to $t + 1$ once cash is exhausted or all invoices are processed.
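The two steps can be sketched as follows. This is a simplified illustration rather than the full algorithm in the e-companion: the function names are hypothetical, the curves $J_{k} (t^{'}; t)$ and the pay-now flag from Lemma 7 are assumed to be computed elsewhere, and "reserve" is interpreted here as keeping the cash and moving on to the next invoice.

```python
from typing import Callable, Dict, List, Tuple


def area_score(J: Callable[[int], float], t: int, T: int) -> float:
    """Sum of J(t'; t) over t' = t, ..., T (area under the projected cost ratio curve)."""
    return sum(J(tp) for tp in range(t, T + 1))


def rank_by_area(curves: Dict[str, Callable[[int], float]], t: int, T: int) -> List[str]:
    """Order invoice ids by decreasing area score (larger projected burden first)."""
    return sorted(curves, key=lambda k: area_score(curves[k], t, T), reverse=True)


def pay_or_reserve(
    ordered: List[Tuple[str, float, bool]], cash: float, preemptive: bool = True
) -> Tuple[Dict[str, float], float]:
    """Process (id, balance, pay_now) triples in priority order for one period."""
    payments: Dict[str, float] = {}
    for inv_id, balance, pay_now in ordered:
        if cash <= 0:
            break                      # cash exhausted: move to the next period
        if not pay_now:
            continue                   # assumption: keep the cash, check the next invoice
        if cash >= balance:
            payments[inv_id] = balance # full payment
            cash -= balance
        elif preemptive:
            payments[inv_id] = cash    # partial payment with all remaining cash
            cash = 0.0
        else:
            break                      # non-preemptive: stop
    return payments, cash


# Hypothetical linear cost curves and balances.
curves = {"A": lambda tp: 1.0 + 0.012 * tp, "B": lambda tp: 1.0 + 0.002 * tp}
order = rank_by_area(curves, t=0, T=10)
print(order)
print(pay_or_reserve([("A", 3000.0, True), ("B", 2000.0, True)], cash=4000.0))
```

Sorting dominates the per-period cost of this sketch, which is consistent with a heuristic whose running time grows as the number of invoices times a logarithmic factor.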

5.2. Experimental design

We consider a company that receives invoices from upstream partners and cash inflows from downstream partners on a weekly basis, making payment decisions over a 1-year horizon (52 weeks). Parameter values reflect common business practices: early payment discounts of 1% to 5% (PYMNTS, 2022), late penalties of 0.3% to 1.3% per week (Resolve, 2025b), and a 0.1% weekly interest rate (about 5.3% annually). These settings align with prior literature (Gupta and Dutta, 2011; Gupta et al., 1987; Rios-Solis et al., 2017) and provide a realistic basis to evaluate the effectiveness of our heuristic. The parameters used in this experiment are detailed in Table 3. For each combination of parameters, we randomly generate 30 problem instances.

Table 3.
Parameters used in the experiments.

Parameter	Values	Parameter	Values	Parameter	Values
$n$	100 (base*) to 100,000	$α_{k}$	$U (100 / 99, 100 / 95)$	$L_{k}$	$U (1000, 5000)$
$T$	52 weeks	$γ_{k}$	$U (1.003, 1.013)$	$r$	1.001
$a_{k}$	$U (- T / 4, T / 4)$	$b_{k}$	$a_{k} + U (T / 6, T / 3)$	$d_{k}$	$b_{k} + U (T / 6, T / 3)$
$Q_{t}$	$Q_{t} = U (0.9, 1.1) \times \sum L_{k} α_{k} \times \text{tightness} / T$, where $\text{tightness} \in \{0.4, 0.7, 1 \text{ (base)}, 1.3, 1.6\}$

Note. *For the main analysis, we use the base values for n (number of invoices) and the tightness to control cash inflow $(Q_{t})$ .
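A problem generator following Table 3 might look like the sketch below. It is not the authors' generator: the function name, the dictionary layout, the use of continuous (unrounded) times, and the independent draw of the $U (0.9, 1.1)$ factor in each period are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple


def generate_instance(
    n: int = 100, T: int = 52, tightness: float = 1.0, seed: int = 0
) -> Tuple[List[Dict[str, float]], List[float]]:
    """Sample n invoices and T cash inflows from the distributions in Table 3."""
    rng = random.Random(seed)
    invoices = []
    for _ in range(n):
        a = rng.uniform(-T / 4, T / 4)          # a_k
        b = a + rng.uniform(T / 6, T / 3)       # b_k
        d = b + rng.uniform(T / 6, T / 3)       # d_k
        invoices.append({
            "L": rng.uniform(1000, 5000),       # L_k
            "alpha": rng.uniform(100 / 99, 100 / 95),
            "gamma": rng.uniform(1.003, 1.013),
            "a": a, "b": b, "d": d,
        })
    # Per-period inflow level: sum of L_k * alpha_k, scaled by tightness / T.
    base = sum(inv["L"] * inv["alpha"] for inv in invoices) * tightness / T
    Q = [rng.uniform(0.9, 1.1) * base for _ in range(T)]
    return invoices, Q


invoices, Q = generate_instance(n=100, tightness=1.0, seed=1)
print(len(invoices), len(Q))
```

Fixing the seed makes each of the 30 instances per parameter combination reproducible, which is convenient when comparing heuristics on identical inputs.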

To demonstrate the effectiveness of our proposed heuristic, we compare it against two established approaches: the snowball method and the avalanche method (Merritt, 2024; Rios-Solis et al., 2017). Specifically, we implement four variations tailored to our problem structure: (a) snowball with the smallest face value (SF), (b) snowball with the smallest current debt amount (SC), (c) avalanche with the highest penalty rate (AP), and (d) avalanche with the highest discount rate (AD). Detailed algorithmic descriptions are provided in the e-companion. We use the same objective as the analytical model: total payment—the amount paid toward all invoices received during the horizon; any unpaid invoices at the end are valued at their outstanding balance and included. All heuristics and the problem generator are implemented in Python, and the experiments are conducted on a system with an Intel i9-10900KF processor and 64 GB of RAM.

5.3. An illustrative sample path: why avalanche and snowball fail

Before presenting aggregate results, we provide a small illustrative sample path to clarify why simple benchmark policies can perform poorly relative to our OPBH heuristic. Consider $T = 52$ weeks with weekly interest $r = 1.001$ . Two invoices arrive at $t = 0$ . Invoice A has $L_{A} = 3000$ , $α_{A} = 100 / 97$ , $γ_{A} = 1.012$ , $b_{A} = 0$ , and $d_{A} = 2$ ; invoice B has $L_{B} = 3000$ , $α_{B} = 100 / 95$ , $γ_{B} = 1.013$ , $b_{B} = 11$ , and $d_{B} = 30$ . Initial cash is $β_{0} = 3000$ , with inflow $Q_{10} = 4000$ ; partial payment is allowed. Benchmark heuristics (SF/SC/AD/AP) prioritize B because it has a smaller balance, a larger discount, and a higher penalty rate. A is then settled at $t = 10$ for 3402.47, yielding a total end-of-horizon payment of 6708.38. In contrast, OPBH applies the pay-now screening (Lemma 7) and the projected-cost comparison (Proposition 9), pays A at $t = 0$ , and defers B to $t = 10$ , reducing the total payment to 6288.66, which is approximately 6.7% lower. The economic logic is straightforward: Invoice A has a discount period ending at $t = 0$ and incurs a 1.2% weekly penalty thereafter, making immediate payment urgent. Invoice B, by contrast, retains its discounted price until $t = 11$ , so deferring it costs nothing for 11 weeks while the firm earns interest on the reserved cash. The benchmark heuristics miss this asymmetry because they rank invoices by surface attributes—balance size, discount rate, or penalty rate—without accounting for the time value of delay. OPBH, guided by Lemma 7's pay-now trigger and Proposition 9's projected-cost comparison, correctly identifies that the urgency differential between A and B justifies a different sequencing than any of the benchmarks would prescribe. We next evaluate whether these structural advantages persist systematically across the full experimental design.
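The arithmetic of this sample path can be retraced with a few lines of code. The sketch below rests on two assumptions that are our reading of the example rather than statements from the text: after the normal due date $d_{A}$, the balance of invoice A equals $L_{A} α_{A}$ compounded weekly at $γ_{A}$, and the total end-of-horizon payment is each payment carried forward to $T$ at the interest rate $r$. All function names are hypothetical.

```python
r, T = 1.001, 52

# Invoice A: discounted price, discount factor, penalty rate, normal due date.
L_A, alpha_A, gamma_A, d_A = 3000.0, 100 / 97, 1.012, 2
# Invoice B is still within its discount period at t = 0 and t = 10.
L_B = 3000.0


def future_value(amount: float, t: int) -> float:
    """Value at T of an amount paid at t (assumed valuation of total payment)."""
    return amount * r ** (T - t)


def late_amount_A(t: int) -> float:
    """Balance of A at t > d_A: normal price with weekly compounding penalty (assumed)."""
    return L_A * alpha_A * gamma_A ** (t - d_A)


# Benchmarks: pay B at t = 0, settle A at t = 10.
benchmark = future_value(L_B, 0) + future_value(late_amount_A(10), 10)
# OPBH: pay A at t = 0, pay B at its discounted price at t = 10.
opbh = future_value(L_A, 0) + future_value(L_B, 10)
gap = (benchmark - opbh) / opbh * 100

print(round(late_amount_A(10), 2), round(benchmark, 2), round(opbh, 2), round(gap, 2))
```

Under these assumptions, the computed totals agree with the reported 6708.38 and 6288.66 to within rounding, and the 6.7% figure corresponds to the difference measured relative to the OPBH total.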

5.4. Experimental results

Table 4 reports the main results. It illustrates the comparative effectiveness of our heuristic against the others, measured by gap performance, defined as (other heuristic's total payment – our heuristic's total payment)/our heuristic's total payment × 100. Each cell in the table provides descriptive statistics of gap performance in the format “average (standard deviation) [min – max].” OPBH consistently outperforms all alternatives: the average gap is 4.01%, statistically significant (t-test, p < 0.001). It consistently outperforms the other methods in every problem instance (i.e., minimum gap > 0). The source of this systematic advantage is structural: OPBH leverages two properties that benchmark heuristics ignore. First, the pay-now trigger (Lemma 7) prevents premature payments that sacrifice liquidity without reducing long-run cost—benchmark heuristics frequently pay invoices before their economic trigger, locking up cash that could earn interest or cover higher-urgency invoices arriving later. Second, the projected-cost priority (Propositions 8–11) sequences invoices by their time-adjusted burden rather than by static attributes such as balance size or penalty rate, capturing the compounding dynamics of discounts and penalties over the remaining horizon. These advantages are most pronounced when cash is tight (tightness = 0.4–0.7), where the pay-now discipline prevents costly early commitments, and when the horizon is long, where priority misordering by benchmarks compounds over many periods. When cash is abundant (tightness = 1.6), all invoices can be paid at their optimal times regardless of sequencing, so the structural advantage shrinks—though OPBH still dominates in every instance.
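The gap measure and the cell format used in the tables can be written compactly. In the sketch below, the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator is reported.

```python
import statistics
from typing import Sequence


def gap_performance(other_total: float, our_total: float) -> float:
    """(other heuristic's total payment - our total payment) / our total payment x 100."""
    return (other_total - our_total) / our_total * 100


def summarize(gaps: Sequence[float]) -> str:
    """Format gaps as 'average (standard deviation) [min-max]'."""
    return (
        f"{statistics.mean(gaps):.2f} ({statistics.stdev(gaps):.2f}) "
        f"[{min(gaps):.2f}-{max(gaps):.2f}]"
    )


# Hypothetical total payments for three instances.
ours = [100.0, 200.0, 400.0]
other = [101.0, 204.0, 412.0]
gaps = [gap_performance(o, b) for o, b in zip(other, ours)]
print(summarize(gaps))
```

A strictly positive minimum in the bracketed range indicates that the benchmark paid more than OPBH in every instance of that cell.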

Table 4.
Gap performance.

SF	SC	AP	AD	Overall
4.69 (0.72) [3.5–6.51]	5.46 (0.78) [4.04–7.33]	1.65 (0.27) [1.2–2.23]	4.24 (0.53) [3.36–5.52]	4.01 (1.56) [1.20–7.33]

Table 5 presents a sensitivity analysis on cash-on-hand tightness and the number of invoices. Cash tightness materially affects gap performance. The problem becomes close to trivial when cash is abundant: firms can pay all discounted invoices and avoid late penalties, so differences among heuristics narrow, yet our heuristic still outperforms others by better exploiting payment terms. When cash is scarce, the problem is harder because selecting which invoices to pay is critical; our heuristic is designed for this setting and continues to perform strongly. By contrast, gap performance is largely unaffected by the number of invoices, indicating that the heuristic performs consistently well across problem sizes. The heuristic runs in $O (n \log n)$ time, where n is the number of invoices, making it suitable for large instances. In our experiments, 100,000-invoice cases (∼2000 new invoices per week) average 437 s of computation time.

Table 5.

Sensitivity analysis on cash-on-hand tightness and the number of invoices.

Tightness	SF	SC	AP	AD	Overall
0.4	4.73 (0.59) [3.56–5.8]	5.06 (0.62) [3.84–6.31]	1.78 (0.43) [0.97–2.69]	4.52 (0.73) [3.21–6.04]	4.02 (1.44) [0.97–6.31]
0.7	5.72 (0.58) [4.53–7.06]	6.31 (0.6) [5.2–7.47]	1.93 (0.34) [1.42–2.91]	5.41 (0.61) [4.21–6.56]	4.84 (1.80) [1.42–7.47]
1 (base)	4.69 (0.72) [3.5–6.51]	5.46 (0.78) [4.04–7.33]	1.65 (0.27) [1.2–2.23]	4.24 (0.53) [3.36–5.52]	4.01 (1.56) [1.20–7.33]
1.3	2.78 (0.62) [1.77–4.55]	3.28 (0.7) [2.1–5.06]	1.43 (0.21) [1.03–2.02]	2.55 (0.43) [1.88–3.77]	2.51 (0.86) [1.03–5.06]
1.6	1.92 (0.31) [1.3–2.89]	2.17 (0.34) [1.38–3.05]	1.17 (0.17) [0.9–1.58]	1.72 (0.28) [1.19–2.56]	1.74 (0.46) [0.90–3.05]
n	SF	SC	AP	AD	Overall
100 (base)	4.69 (0.72) [3.5–6.51]	5.46 (0.78) [4.04–7.33]	1.65 (0.27) [1.2–2.23]	4.24 (0.53) [3.36–5.52]	4.01 (1.56) [1.20–7.33]
1000	4.63 (0.22) [4.25–5.0]	5.44 (0.25) [5.02–5.85]	1.72 (0.08) [1.53–1.87]	4.44 (0.19) [4.0–4.81]	4.06 (1.42) [1.53–5.85]
10,000	4.61 (0.1) [4.43–4.78]	5.42 (0.1) [5.26–5.57]	1.71 (0.03) [1.65–1.79]	4.41 (0.1) [4.26–4.57]	4.04 (1.40) [1.65–5.57]
100,000	4.63 (0.06) [4.51–4.76]	5.44 (0.06) [5.31–5.57]	1.7 (0.01) [1.68–1.73]	4.43 (0.06) [4.31–4.58]	4.05 (1.42) [1.68–5.57]

5.5. Worst case and additional analyses

The proposed heuristic, OPBH, utilizes both optimal deterministic and stochastic payment priorities identified in Section 4.4.2 and the insights from Theorems 3 and 5. While deterministic priorities ensure optimality, computing the optimal stochastic priority is computationally infeasible due to its complexity. Instead, OPBH employs a simple approximation: it compares the areas under the $J_{k} (t^{'}; t)$ curves to prioritize invoices under stochastic priority conditions. $J_{k} (t^{'}; t)$ represents the increased cost of an invoice when its payment is deferred to a future time $t^{'}$ rather than being made immediately at t (see Section 4.4.2). The intuition behind comparing these areas is based on the assumption that all possible future payments at every $t^{'}$ have an equal probability of occurring. However, this assumption may not hold in scenarios where cash availability is highly skewed, particularly when cash inflows are heavily left-skewed, that is, concentrated toward the end of the decision horizon.

To assess OPBH's performance under such conditions, we test it in scenarios with highly skewed cash inflows, as shown in Table 6. As expected, the effectiveness of OPBH decreases as the skewness of cash inflows increases. For instance, when 90% of cash inflows are concentrated in the last five periods (∼10% of the horizon), the average performance gap decreases by 2.35 percentage points (from 4.01% to 1.66%).

Table 6.
Sensitivity analysis on the skewedness of cash inflow.

Skewed	SF	SC	AP	AD	Overall
Even(base)	4.69 (0.72) [3.5–6.51]	5.46 (0.78) [4.04–7.33]	1.65 (0.27) [1.2–2.23]	4.24 (0.53) [3.36–5.52]	4.01 (1.56) [1.20–7.33]
70%	4.34 (0.43) [3.53–5.3]	4.64 (0.48) [3.76–5.65]	1.54 (0.3) [1.04–2.2]	4.17 (0.54) [2.77–5.2]	3.67 (1.32) [1.04–5.65]
90%	1.94 (0.21) [1.46–2.36]	2.1 (0.2) [1.6–2.43]	0.64 (0.19) [0.3–1.11]	1.95 (0.35) [1.31–2.54]	1.66 (0.64) [0.30–2.54]

Note. Seventy percent (70%) or ninety percent (90%) indicates that 70% or 90% of the total cash inflow is concentrated in the last five periods, which represent approximately 10% of the decision horizon.
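A skewed inflow profile of this kind can be constructed as in the sketch below. The even split within the last five periods and within the earlier periods is an assumption made for illustration, as are the function name and the sample total; the text specifies only the share concentrated at the end of the horizon.

```python
from typing import List


def skewed_inflows(total: float, T: int = 52, share: float = 0.9, last: int = 5) -> List[float]:
    """Place `share` of `total` in the final `last` periods, the rest in the earlier ones."""
    if not 0 <= share <= 1 or not 0 < last < T:
        raise ValueError("need 0 <= share <= 1 and 0 < last < T")
    early = total * (1 - share) / (T - last)   # per-period inflow before the last block
    late = total * share / last                # per-period inflow in the last block
    return [early] * (T - last) + [late] * last


# Hypothetical total inflow of 52,000 over the horizon.
Q = skewed_inflows(52000.0, share=0.9)
print(len(Q), round(sum(Q[-5:]) / sum(Q), 2))
```

Under such a profile most cash arrives after many payment triggers have passed, which is the situation in which an equal-weight area comparison is least representative.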

According to the sensitivity analysis of problem parameters, OPBH's gap is minimized when cash is abundant (tightness = 1.6). Combined with this disadvantageous parameter setting, highly skewed cash inflows (90%) constitute a worst-case configuration in our experiments. In this worst case, the average gap is 2.30%, and the difference is statistically significant. OPBH still outperforms all benchmark heuristics in every instance (see Table 7).

Table 7.

Worst case scenario analysis.

SF	SC	AP	AD	Overall
2.7 (0.28) [2.19–3.24]	2.91 (0.28) [2.38–3.56]	1.0 (0.28) [0.55–1.63]	2.59 (0.29) [2.15–3.29]	2.30 (0.81) [0.55–3.56]

We further evaluate OPBH across a range of settings. Although our main experiments use a weekly decision epoch and a one-year horizon, the heuristic applies equally to other intervals (including daily decisions). Additional experiments show that changing the decision epoch does not materially affect performance. The decision horizon can also be lengthened or shortened; the observed performance gap tends to widen as the horizon increases, suggesting that OPBH's advantages accumulate over time and yield greater payment efficiency. We also test the effect of invoice arrival concentration and find that concentration alone does not materially alter OPBH's performance. Finally, we conduct sensitivity analyses over discount, penalty, and interest rates; OPBH continues to perform robustly. Detailed results are reported in the e-companion.

In summary, across all problem instances, including those under worst-case or extreme scenarios, our heuristic consistently outperforms all other heuristics. Given the relative size of AP compared to retained earnings or total assets (see Table 2), even a small percentage improvement in total payment savings can translate into millions or even billions of dollars, directly enhancing profit margins.

5.6. Non-preemptive case analysis

Partial invoice payments are increasingly common in supply chain and financial management. According to Resolve (2025a), over 30% of B2B transactions now involve partial payment arrangements. For instance, Amazon allows its business customers and AWS users to make partial payments, providing greater flexibility in managing cash flows (McMillan, 2025). Similarly, Buy Now, Pay Later (BNPL) providers, such as Klarna, Affirm, and Afterpay, enable consumers to make partial payments on their purchases (Investopedia, 2025). Many enterprise systems, such as SAP, natively support partial invoice payments, enabling firms to record and manage split settlements in both accounts receivable and AP processes. Prior research has similarly assumed preemptive invoice structures that allow partitioned payments (Ng et al., 2012; Rios-Solis et al., 2017). These practices and prior studies support our assumption that invoices are preemptive, allowing partitioning for partial settlements to support dynamic cash flow management.

While our analysis focuses on preemptive payments, in which partial settlements are allowed, non-preemptive payments, in which an invoice must be settled in full, are also widely used in practice. The structural properties of an optimal preemptive policy do not guarantee optimality in non-preemptive settings, due to stricter indivisibility and sequencing constraints (Correa et al., 2012; Lawler and Labetoulle, 1978). Nevertheless, key insights, such as priority ordering and threshold-based decision rules, can often be adapted as effective heuristics in non-preemptive environments (Correa et al., 2012; Pinedo, 2016). Although deriving tight theoretical performance bounds when applying preemptive-based strategies to non-preemptive contexts is challenging, assessing whether these structural insights yield good practical outcomes remains feasible. Motivated by this, we modify the heuristic and compare its performance to that of existing heuristics in non-preemptive scenarios.

For this analysis, we enforce a non-preemptive constraint in OPBH: if available funds are insufficient to fully pay an invoice, payment is skipped rather than partially allocated. We likewise adjust all benchmark heuristics to follow the same non-preemptive rules for a fair comparison. The pseudocode appears in the e-companion. Table 8 reports results under the same parameter settings as the preemptive analysis. In the non-preemptive setting, OPBH consistently outperforms existing heuristics, with an average performance gap of 4.08%. As in the preemptive setting, OPBH dominates in every instance, reinforcing the robustness of the structural insights derived from the preemptive model. The average performance difference between the non-preemptive and preemptive OPBHs is 0.07%. While not a theoretical bound, this suggests that the loss of optimality from enforcing non-preemption may be small.

Table 8.
Gap performance—non-preemptive case.

SF	SC	AP	AD	Overall
4.77 (0.73) [3.57–6.62]	5.57 (0.79) [4.14–7.47]	1.66 (0.28) [1.25–2.3]	4.3 (0.55) [3.41–5.65]	4.08 (1.59) [1.25–7.47]

6. Academic contributions and managerial implications

This study investigates the SDFSCP, in which a company seeks to minimize total invoice payments when incoming invoices and cash inflows are uncertain, taking into account payment terms and interest earned on cash holdings. Our contributions are fourfold.

First, we provide the first stochastic dynamic programming formulation of the SDFSCP. The model explicitly incorporates both time-variant processing requirements (invoice balances that evolve through discounts and penalties) and time-variant capacity (cash inflows and reserves), extending beyond prior deterministic or simplified approaches (Gupta and Dutta, 2011).

Second, we identify structural properties of the optimal policy. These properties, which reveal threshold-like decision rules and priority-ordering patterns, reduce the effective complexity of the problem and provide new theoretical insights into stochastic dynamic optimization with evolving capacity and requirements.

Third, we develop a heuristic grounded in these structural properties. Computational experiments show that it consistently outperforms widely used strategies such as the snowball and avalanche methods. We also extend the analysis to non-preemptive settings, where partial payments are disallowed, and show that our heuristic continues to outperform the benchmarks across all problem instances. This robustness underscores the practical value of the structural results, even under stricter real-world constraints.

Fourth, our findings contribute to the broader literatures on scheduling and dynamic job assignment. The SDFSCP shares features with problems in which both resource capacity and job requirements evolve dynamically, a combination rarely addressed jointly in prior research. Methodologically, our formulation and structural results, including diagonal analysis and structural properties based on subconvexity and the directional property (Kang et al., 2016), may inform the design of algorithms and heuristics in domains such as healthcare scheduling, cloud computing, and energy management, where time-dependent capacities and processing requirements play a central role.

From a managerial perspective, our findings demonstrate that intuitive practices, such as prioritizing invoices with higher penalty rates or smaller amounts, may be systematically suboptimal. The structural properties identified provide actionable guidance for designing payment strategies that achieve significant cost savings while maintaining computational efficiency.
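For reference, the two benchmark rules named above can be written as simple sort orders. The sketch below applies them to the face values and weekly penalty rates of the three invoices in Table 1. It assumes the common definitions of the two rules (snowball: smallest balance first; avalanche: highest rate first; see Merritt, 2024); the `Invoice` class and the function names are illustrative and are not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    name: str
    face_value: float    # dollars
    penalty_rate: float  # per week, as a fraction


# Face values and penalty rates as listed in Table 1.
INVOICES = [
    Invoice("I1", 3200.0, 0.004),
    Invoice("I2", 2000.0, 0.003),
    Invoice("I3", 5000.0, 0.005),
]


def snowball_order(invoices):
    """Snowball rule: pay the smallest balance first."""
    return [inv.name for inv in sorted(invoices, key=lambda inv: inv.face_value)]


def avalanche_order(invoices):
    """Avalanche rule: pay the highest penalty rate first."""
    return [inv.name for inv in sorted(invoices, key=lambda inv: -inv.penalty_rate)]


print(snowball_order(INVOICES))   # ['I2', 'I1', 'I3']
print(avalanche_order(INVOICES))  # ['I3', 'I1', 'I2']
```

Both rules rank invoices on a single static attribute and ignore discount deadlines, due dates, and the cash position, which are exactly the quantities the structural results take into account.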

7. Limitations and future research

This study has several limitations that also create opportunities for future research. For tractability, the main model assumes preemptive invoices, that is, partial payments are allowed. While this assumption aligns with many modern enterprise systems, not all financial settings permit such flexibility. We therefore extend our heuristic to non-preemptive settings and show experimentally that it remains robust, consistently outperforming existing heuristics. A formal structural analysis of the non-preemptive case, or theoretical bounds on the performance gap between preemptive and non-preemptive policies, represents a promising and challenging direction.
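The difference between the two settings can be made concrete with a small allocation routine. In the sketch below, the balances are the discounted values from Table 1, while the function name `pay_in_priority`, the cash amount of $3000, and the priority order are illustrative assumptions. In the non-preemptive branch, skipping an invoice that does not fit and continuing down the list is one possible convention, not necessarily the one used in the experiments.

```python
def pay_in_priority(balances, budget, allow_partial):
    """Allocate `budget` to invoices in the given priority order.

    allow_partial=True  -> preemptive: an invoice may be paid in part.
    allow_partial=False -> non-preemptive: an invoice is paid in full or not at all.
    """
    payments = []
    for balance in balances:
        if allow_partial:
            pay = min(budget, balance)
        else:
            pay = balance if balance <= budget else 0.0
        payments.append(pay)
        budget -= pay
    return payments


balances = [2240.0, 1400.0, 3500.0]  # discounted values of I1, I2, I3 (Table 1)
print(pay_in_priority(balances, 3000.0, allow_partial=True))   # [2240.0, 760.0, 0.0]
print(pay_in_priority(balances, 3000.0, allow_partial=False))  # [2240.0, 0.0, 0.0]
```

With the same cash and the same ordering, the non-preemptive rule leaves $760 unused in this example, which is the kind of loss a bound on the preemptive versus non-preemptive gap would need to control.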

Computing an exact optimal policy for SDFSCP remains difficult due to the curse of dimensionality (Powell, 2007). Although we characterize structural properties of the optimal policy that guide efficient algorithm design, solving the full model exactly is often computationally intractable. Consequently, some structural insights arise from optimal solutions that cannot be directly computed in large-scale instances. Nonetheless, these properties provide valuable guidance for heuristic design and deepen qualitative understanding of how optimal decisions respond to parameter changes (Smith and McCardle, 2002).
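To give a sense of scale, the sketch below counts grid points under a simplifying assumption that is ours and not part of the model: the state consists of one outstanding balance per invoice plus cash on hand, and each of these quantities is discretized into the same number of levels. The function name `grid_size` is hypothetical.

```python
def grid_size(num_invoices: int, levels: int) -> int:
    """Grid points per period when each invoice balance and the cash
    balance take `levels` discrete values (a simplifying assumption)."""
    return levels ** (num_invoices + 1)


for n in (3, 10, 20):
    print(n, grid_size(n, levels=100))
```

Under this assumption, three invoices at 100 levels already yield 10^8 grid points per period, and twenty invoices yield 10^42, so exhaustive backward induction quickly becomes impractical.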

Beyond these limitations, our findings highlight several avenues for future research. The structural properties we identify (threshold-type reservation rules and priority orderings), together with the analytical features underlying them (directional modular properties such as subconvexity and superconcavity, and the directional property; Kang et al., 2016), suggest broader methodological opportunities for stochastic dynamic resource allocation and scheduling, particularly when resource capacity and requirements fluctuate over time.

These properties may inform approximation schemes (e.g., rollout policies or index-based scoring rules), structure-guided decompositions (e.g., Lagrangian relaxations with efficiently solvable subproblems and column generation), instance-specific performance bounds via dual–primal gaps, and learning-based approaches that embed structural constraints to improve sample efficiency and interpretability. Potential application domains include patient scheduling in healthcare (where treatment complexity evolves with delay), task allocation in cloud computing (where server availability and task requirements fluctuate), and energy storage management (where capacity both depletes and replenishes dynamically).

To illustrate how these structural insights can be operationalized, we describe two concise methodological “templates” that future researchers may adopt.

Structure-Guided Decomposition Template. When dynamic resource allocation problems exhibit diagonal structure (e.g., subconvexity along 45° directions) and pairwise directional properties that induce strict priority ordering, a natural approach is to relax the shared resource constraint via Lagrangian multipliers and decompose the problem into single-entity dynamic programs. The relaxed subproblems inherit threshold-type structures similar to Theorems 3 and 5 and can be solved efficiently using monotone or switching-curve policies, while coordination is achieved through price updates on the shared resource. This template is effective when resource coupling is the primary source of computational complexity and each subproblem preserves the underlying diagonal structure.
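The price-coordination step of this template can be sketched on a deliberately static, single-period toy problem. In the template itself each subproblem would be a single-entity dynamic program; here each entity simply picks an integer amount of the shared resource. The quadratic cost functions, the names `best_response` and `price_coordination`, and the diminishing step size are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Cost = Callable[[int], float]


def best_response(cost: Cost, cap: int, price: float) -> int:
    """Single-entity subproblem: minimize own cost plus priced resource use."""
    return min(range(cap + 1), key=lambda x: cost(x) + price * x)


def price_coordination(costs: Sequence[Cost], caps: Sequence[int], budget: int,
                       iterations: int = 50, step0: float = 1.0
                       ) -> Tuple[float, List[int], float]:
    """Projected subgradient ascent on the Lagrangian dual of
        min sum_i cost_i(x_i)  subject to  sum_i x_i <= budget.
    Returns (price, allocation, dual lower bound) of the best iterate."""
    price = 0.0
    best: Tuple[float, List[int], float] = (price, [], float("-inf"))
    for k in range(1, iterations + 1):
        alloc = [best_response(c, cap, price) for c, cap in zip(costs, caps)]
        dual = sum(c(x) + price * x for c, x in zip(costs, alloc)) - price * budget
        if dual > best[2]:
            best = (price, alloc, dual)
        # Raise the price when the resource is over-used, lower it when slack.
        price = max(0.0, price + (step0 / k) * (sum(alloc) - budget))
    return best


# Hypothetical instance: entity i would ideally receive targets[i] units.
targets = [10, 8, 6]
costs = [lambda x, a=a: 0.5 * (a - x) ** 2 for a in targets]
price, alloc, bound = price_coordination(costs, caps=[10, 10, 10], budget=12)
print(price, alloc, bound)  # 4.0 [6, 4, 2] 24.0
```

In this toy instance the allocation at the final price uses the budget exactly and its cost equals the dual bound, so the gap is zero. In general a gap remains, the relaxed allocation may violate the shared constraint, and a repair step is needed to recover a feasible policy.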

Structure-Constrained Learning Template. When exact dynamic programming is computationally intractable, structural properties such as threshold reservation levels and priority orderings can be embedded directly into the policy class. A practical pipeline is to (i) restrict the policy space to threshold-based reservation rules and priority-index rankings consistent with deterministic structural results (e.g., Corollary 6 and Theorem 13), and (ii) estimate remaining parameters via simulation-based optimization or reinforcement learning. Enforcing monotonicity and priority consistency reduces the effective dimensionality of the policy space and improves interpretability. This template is most effective when the structural form of the optimal policy is robust across instances but the stochastic primitives are high-dimensional or partially unknown.
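The two-step pipeline above can be sketched as a grid search over a restricted policy class evaluated by simulation. Everything numerical below is a hypothetical choice made for illustration (the interest rate, horizon, uniform inflow distribution, and candidate reserve levels), and the toy omits discount periods and due dates, so its fitted parameters say nothing about the SDFSCP itself. Only the shape of the pipeline is the point.

```python
import itertools
import random
from statistics import mean

# Hypothetical primitives, not taken from the paper's experiments.
BALANCES = {"I1": 3200.0, "I2": 2000.0, "I3": 5000.0}
PENALTY = {"I1": 0.004, "I2": 0.003, "I3": 0.005}  # growth of unpaid balance per period
INTEREST = 0.001                                   # interest on cash per period
HORIZON = 12


def simulate(reserve, order, seed):
    """Terminal net position (cash minus unpaid balances) on one inflow path."""
    rng = random.Random(seed)
    cash, bal = 0.0, dict(BALANCES)
    for _ in range(HORIZON):
        cash += rng.uniform(0.0, 2000.0)   # assumed inflow distribution
        budget = max(0.0, cash - reserve)  # threshold-type reservation rule
        for name in order:                 # fixed priority ordering
            pay = min(budget, bal[name])
            bal[name] -= pay
            budget -= pay
            cash -= pay
        cash *= 1.0 + INTEREST
        for name in bal:
            bal[name] *= 1.0 + PENALTY[name]
    return cash - sum(bal.values())


def evaluate(reserve, order, seeds):
    """Average terminal net position over a fixed set of inflow paths."""
    return mean(simulate(reserve, order, s) for s in seeds)


def fit_policy(reserves, seeds):
    """Step (ii): pick the best (reserve, priority order) pair by simulation."""
    candidates = itertools.product(reserves, itertools.permutations(BALANCES))
    return max(candidates, key=lambda c: evaluate(c[0], c[1], seeds))


seeds = range(30)  # common random numbers across all candidates
reserve, order = fit_policy([0.0, 500.0, 1000.0, 2000.0], seeds)
print(reserve, order)
```

Because every candidate is a (reserve level, priority order) pair, the search space has 24 points here regardless of how fine the underlying state space is, which is the dimensionality reduction the template relies on. A reinforcement learning method could replace the grid search in step (ii) without changing the policy class.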

Finally, while the proposed heuristic performs strongly in our experiments, formal optimality guarantees remain open. Deriving theoretical performance bounds or certificates is an important avenue for future research. In addition, empirical validation using proprietary, transaction-level AP and cash flow data would further test generalizability and enable more precise calibration.

Supplemental Material

Supplemental material, sj-pdf-1-pao-10.1177_10591478261460124 for Optimal policy for managing stochastic cash flows in a financial supply chain by Keumseok Kang, Sushil Gupta, Inkyoung Hur and Sungbum Jun in Production and Operations Management

Footnotes

ORCID iDs

Keumseok Kang

Sungbum Jun

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the IITP-ITRC grant (IITP-2026-RS-2021-II211816), the IITP-Global Data-X Leader HRD program grant (IITP-RS-2024-00440626), and the NRF grants (2024S1A5A2A0303904513; RS-2022-NR068758; RS-2024-00341647; RS-2025-16072058) funded by the Korea Government.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental material

Supplemental material for this article is available online.

How to cite this article

Kang K, Gupta S, Hur I, Jun S (2026) Optimal Policy for Managing Stochastic Cash Flows in a Financial Supply Chain. Production and Operations Management XX(XX): 1–21.

References

Akçay, Balakrishnan (2010) Dynamic assignment of flexible service resources. Production and Operations Management 19(3): 279–304.

Astvansh, Jindal (2022) Differential effects of received trade credit and provided trade credit on firm value. Production and Operations Management 31(2): 781–798.

Babich, Kouvelis (2018) Introduction to the special issue on research at the interface of finance, operations, and risk management (iFORM): Recent contributions and future directions. Manufacturing & Service Operations Management 20(1): 1–18.

Berling, Martínez-de-Albéniz (2011) Optimal inventory policies when purchase price and demand are stochastic. Operations Research 59(1): 109–124.

Browne, Yechiali (1990) Scheduling deteriorating jobs on a single processor. Operations Research 38(1): 495–498.

Burkart, Ellingsen (2004) In-kind finance: A theory of trade credit. American Economic Review 94(3): 569–590.

Cheng, Ding, Lin (2004) A concise survey of scheduling with time-dependent processing times. European Journal of Operational Research 152(1): 1–13.

Correa, Skutella, Verschae (2012) The power of preemption on unrelated machines and applications to scheduling orders. Mathematics of Operations Research 37(2): 379–398.

Devalkar, Krishnan (2019) The impact of working capital financing costs on the efficiency of trade credit. Production and Operations Management 28(4): 878–889.

Gupta, Dutta (2011) Modeling of financial supply chain. European Journal of Operational Research 211(1): 47–56.

Gupta, Kunnathur, Dandapani (1987) Optimal repayment policies for multiple loans. Omega 15(4): 323–330.

Huang (2022) Financing disruptive suppliers: Payment advance, timeline, and discount rate. Production and Operations Management 31(3): 1115–1134.

Investopedia (2025) Buy now, pay later (BNPL): What it is, how it works, pros and cons. investopedia.com.

Kang, Shanthikumar, Altinkemer (2016) Postponable acceptance and assignment: A stochastic dynamic programming approach. Manufacturing & Service Operations Management 18(4): 493–508.

Katehakis, Melamed, Shi (2016) Cash-flow based dynamic inventory management. Production and Operations Management 25(9): 1558–1575.

Koole (1998) Structural results for the control of queueing systems using event-based dynamic programming. Queueing Systems 30(3–4): 323–339.

Kouvelis (2023) OM forum—Supply chain finance redefined: A supply chain-centric viewpoint of working capital, hedging, and risk management. Manufacturing & Service Operations Management 25(6): 2074–2084.

Kouvelis, Chambers, Wang (2006) Supply chain management research and production and operations management: Review, trends, and opportunities. Production and Operations Management 15(3): 449–469.

Kouvelis, Zhao (2018) Who should finance the supply chain? Impact of credit ratings on supply chain decisions. Manufacturing & Service Operations Management 20(1): 19–35.

Lawler, Labetoulle (1978) On preemptive scheduling of unrelated parallel processors by linear programming. Journal of the ACM 25(4): 612–619.

Lee H-H, Zhou, Wang (2018) Trade credit financing under competition and its impact on firm performance in supply chains. Manufacturing & Service Operations Management 20(1): 36–52.

McMillan (2025) How Amazon Business & Melio are streamlining invoice payment. Procurement Magazine.

Merritt (2024) Debt avalanche vs. debt snowball: Which is best for you? Investopedia. Retrieved January 19, 2025, from https://www.investopedia.com/articles/personal-finance/080716/debt-avalanche-vs-debt-snowball-which-best-you.asp

Mosheiov (1994) Scheduling jobs under simple linear deterioration. Computers & Operations Research 21(6): 653–659.

Ding, Cheng, et al. (2012) Preemptive repayment policy for multiple loans. Annals of Operations Research 192(1): 141–150.

Oron (2014) Scheduling controllable processing time jobs in a deteriorating environment. Journal of the Operational Research Society 65(1): 49–56.

Peng, Zhou (2019) Working capital optimization in a supply chain perspective. European Journal of Operational Research 277(3): 846–856.

Petersen, Rajan (1997) Trade credit: Theories and evidence. The Review of Financial Studies 10(3): 661–691.

Pfohl H-C, Gomm (2009) Supply chain finance: Optimizing financial flows in supply chains. Logistics Research 1(3–4): 149–161.

Pinedo (1983) Stochastic scheduling with release dates and due dates. Operations Research 31(3): 559–572.

Pinedo (2016) Scheduling: Theory, Algorithms, and Systems. Cham, Switzerland: Springer.

Powell (2007) Approximate Dynamic Programming: Solving the Curses of Dimensionality (Vol. 703). Hoboken, NJ: John Wiley & Sons.

PYMNTS (2022) Average SMB offers 4.1% early payments discount. PYMNTS.com.

Resolve (2025a) 8 statistics that outline the prevalence of partial payments in B2B. ResolvePay Blog.

Resolve (2025b) 17 statistics on late-payment penalties across wholesale contracts. ResolvePay Blog.

Rios-Solis, Saucedo-Espinosa, Caballero-Robledo (2017) Repayment policy for multiple loans. PloS One 12(4): e0175782.

Rockafellar (1970) Convex Analysis. Princeton, NJ: Princeton University Press.

Semaa, Hou, Fadili, et al. (2020) Design of an efficient strategy for optimization of payment induced by a rational supply chain process: A prerequisite for maintaining a satisfactory level of working capital. Procedia Computer Science 170: 881–886.

Shanthikumar, Yao (1992) Multiclass queueing systems: Polymatroidal structure and optimal scheduling control. Operations Research 40(3-supplement-2): S293–S299.

Smith, McCardle (2002) Structural properties of stochastic dynamic programs. Operations Research 50(5): 796–809.

Zhang, Baron (2019a) A trade credit model with asymmetric competing retailers. Production and Operations Management 28(1): 206–231.

Wang, et al. (2019b) Collect payment early, late, or through a third party’s reverse factoring in a supply chain. International Journal of Production Economics 218: 245–259.

Zhu, Cao, et al. (2022) Optimum operational schedule and accounts receivable financing in a production supply chain considering hierarchical industrial status and uncertain yield. European Journal of Operational Research 302(3): 1142–1154.

Zhuang (2012) Monotone optimal control for a class of Markov decision processes. European Journal of Operational Research 217(2): 342–350.
