Three Methods for Anticipating and Understanding Uncertainty of Outputs from Transportation and Land Use Models

Abstract

This study demonstrates three methods for uncertainty propagation in transportation and land-use models (LUMs): Local Sensitivity Analysis with Interaction (LSAI), Monte Carlo (MC), and Bayesian Melding (BM). Two case-study settings are used to illustrate how these methods work, allowing for inter-method comparisons. LSAI can provide the sign of the output change implied by changes in model inputs, the relative importance of changes in different inputs, and a decomposition of output changes into the individual and interactive effects of inputs. LSAI is limited to relatively small problems because its computing time rises exponentially with the number of (groups of) inputs. Moreover, LSAI obtains only point estimates, while MC and BM methods can deliver entire distributions of each output through an understanding of the uncertainty in all model inputs and parameters. MC delivers each output’s distribution but requires hundreds of samples, especially for more accurate results. Fortunately, MC methods are especially useful for high-dimensional problems because convergence rates are not a function of model dimensionality and errors depend only on sample size and input uncertainties. BM delivers posterior distributions for model outputs, using prior probability distributions and likelihoods of inputs and parameters, along with validation of/comparison to intermediate model outputs. A BM approach can be extremely expensive, in terms of computing time, since it requires several hundred model runs.

Scientific computing involves large-scale simulations to represent real-world phenomena, like the evolution of cities and their traffic patterns. With improvements in computing capabilities and more efficient algorithms, results can be more accurately simulated to represent real-world conditions. Models allow for accurate forecasts over longer prediction periods. Predictions are typically affected by uncertainties in input data and model parameters, and by incomplete knowledge of underlying behaviors ( 1 ). Models (and the systems they represent) are often explicitly stochastic ( 2 ), with random components being generated and used throughout the predictive process. It is very important for system design optimization and policy-making to capture and represent this uncertainty information appropriately in model results. Much work has been done in land-use forecasting (3 –5).

A few studies have begun to examine uncertainty propagation in the context of the integrated land-use transportation modeling framework, motivated by the Intermodal Surface Transportation Efficiency Act (ISTEA) of 1991, which required integration of metropolitan and statewide transportation plans with land-use plans ( 6 ).

Obviously, one must first identify sources of uncertainty ( 7 , 8 ), to carry out a probabilistic analysis of a system. Pradhan and Kockelman ( 3 ) reviewed the literature on the sources of uncertainty in land-use transportation models. Later, Sevcikova et al. ( 9 ) reviewed the key sources of uncertainty for UrbanSim land-use modeling outputs.

Uncertainty quantification is the process of representing imperfectly known or understood inputs and parameters, propagating this variability through the model system, and then characterizing the uncertainty in the model’s results. The outcome usually comes with attached “error bars” to indicate uncertainty ranges. Uncertainty is usually represented by interval mathematics ( 10 ), fuzzy theory ( 11 ), or probabilistic analysis ( 8 ). In the probabilistic analysis approach, uncertainties described by probability distributions are associated with model inputs to estimate the outputs’ probability distributions, as was done using UrbanSim in ( 3 ). However, methods for studying uncertainty propagation in the integrated land-use transportation modeling framework still attract attention.

This work demonstrates and compares methods for uncertainty propagation in complex transportation and land-use models (LUMs). The following sections provide a literature review, introduction and two test examples for uncertainty propagation in transportation and LUMs, along with a summary of findings.

Literature Review

Many researchers have examined different methods for anticipating and understanding output uncertainties. This paper introduces three methods for forecasting uncertainty in land-use and transportation model outputs: Local Sensitivity Analysis with Interaction (LSAI), Monte Carlo (MC) method, and Bayesian Melding (BM) method. Two reasonably transparent transportation settings are then used to illustrate and review these methods.

LSAI

Sensitivity analysis is the study of how uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs ( 12 ). There are many different ways of conducting sensitivity analyses; however, the various analyses may not produce identical results in answering the same question. Thus, Hamby ( 13 ) summarized many available methods to conduct sensitivity analyses by presenting details of the types of sensitivity analyses utilized for various modeling situations. Hamby ( 14 ) compared the assessment of several methods and intended to demonstrate calculation rigor and parameter sensitivity rankings resulting from various sensitivity analysis techniques.

Local sensitivity analysis is the assessment of the local impact of inputs’ variation on the model response by concentrating on the sensitivity in the vicinity of a set of input values. Such sensitivity is often evaluated through gradients or partial derivatives of the output functions at these inputs, so other inputs’ values are held constant when studying the local sensitivity of a specific input. Kockelman ( 15 ) ran a gravity-based land-use model (G-LUM) over three alternative scenarios, with total employment counts (EMP), household counts (HH) and link impedances or travel times (TT) increasing by 50%, utilizing each set individually.

To overcome the limitations of local methods, Saltelli et al. ( 12 ) used the global sensitivity analysis in contrast to local sensitivity analysis to consider the entire range of input variations.

Besides the joint sensitivity of generic model outputs for small changes in all inputs (including parameters) addressed in ( 16 , 17 ), Saltelli and Tarantola ( 18 ) defined group sensitivity indices in the context of a global sensitivity analysis. As far as finite changes are concerned, Borgonovo ( 19 ) introduced sensitivity measures for individual exogenous variables. To find sensitivity measures for specific input factor sets (typically, sets of parameters), Borgonovo and Peccati ( 20 ) proved that a change in model output is decomposed as a function of factors with the same structure as the parameter decomposition and introduced factor finite-change sensitivity indices (FCSI) for model parameters to investigate the relationship between the factor FCSIs and parameter FCSIs. Then, Borgonovo et al. ( 21 ) used the G-LUM by Kockelman ( 15 ) to illustrate LSAI techniques and found that the outputs respond almost additively to variations in model inputs over the given scenarios. Wang and Kockelman ( 22 ) applied LSAI to evaluate random–utility-based multiregional input-output models by producing FCSI for the variation of inputs under different scenarios.

MC Method

MC techniques have been widely used for uncertainty propagation due to their conceptual simplicity and ease of implementation. In MC techniques, one samples random variables, runs an ensemble of simulations, and simply presents the distribution of outputs, for all uncertainty statistics. A reasonable result usually requires many ensemble runs, which is computationally expensive for large-scale systems. More efficient approaches need to be developed to represent and propagate uncertainties in large-scale simulations ( 23 ).

Zhao and Kockelman ( 24 ) conducted a study of the propagation of uncertainty in four-step travel demand models by using MC simulation to quantify variability in model outputs. Krishnamurthy and Kockelman ( 25 ) researched propagation of uncertainty in transportation LUMs through multivariate MC sampling of 200 scenarios. Harvey and Deakin ( 26 ) conducted a similar study to consider uncertainty in population growth, fuel price and household income levels in the Los Angeles region. Thompson et al. ( 27 ) examined the impact of higher-than-projected population estimates on emission trends for metropolitan regions in California. Saadi et al. ( 28 ) integrated Markov Chain MC simulation and profiling-based methods to capture the behavioral complexity and the great heterogeneity of agents of the true population for large-scale microsimulation scenarios of transportation and urban systems. Clay and Johnston ( 29 ) aimed to identify which sources of uncertainty have the largest impact on outputs in a fully integrated land-use and transportation forecasting model. They ran all possible combinations of each variable at each level of uncertainty to analyze the impacts of uncertainty on the model’s outputs. Clay et al. ( 30 ) used point-estimate inputs to research how uncertainty affects the Large Zone Economic Module outputs by tracking the effects of this uncertainty through the various submodels to the model outputs.

BM Method

Pradhan and Kockelman ( 3 ) examined the propagation of uncertainty in the context of UrbanSim. However, this study analyzed only the sensitivity to a small sample of selected input values and explored the effect of these changes on outputs, in addition to simple stochastic simulation error from variation in random seeds.

Another approach to extrapolate prediction accuracy for LUMs was taken in ( 31 ). Models calibrated at different time points were used to simulate the present land-cover change and estimate how accurately the model will predict the future through a measure derived by a validation with empirical data. Sevcikova et al. ( 9 ) developed and then applied BM by extending an earlier method to agent-based stochastic models for calibrating a stochastic model system with respect to uncertainty. This method encodes all available information about model inputs and outputs in terms of prior probability distributions and likelihoods, and uses Bayes’ theorem to obtain a posterior distribution for any quantity. However, BM can be very computationally intensive since it requires several hundred runs of the model. Thus, they provided various ways to reduce the required time ( 9 ). Based on work ( 9 , 32 ), Sevcikova et al. ( 2 ) described the first incorporation of uncertainty assessment into the development of an official land-use forecast published by a metropolitan planning organization. They demonstrated how BM for assessing uncertainty could be used to support the application of an academically founded land-use model.

Introduction to Three Methods

Introduction to LSAI

Mathematical models are used to denote input-output mappings as follows:

y = f (x), f : Ω_{X} \to R

(1)

where $y$ is the output of interest, $Ω_{X} \subseteq R^{K}$ and $x = (x_{1}, x_{2}, \dots, x_{K})$ , $x \in Ω_{X}$ is the vector of inputs, and K is the number of inputs whose variations are of interest. Here, the components $x_{i} (i = 1, 2, \dots, K)$ are supposed to be independent.

Therefore, the base-case output of the simulation, $y^{0} = f (x^{0})$ , can be obtained by running the simulation with inputs set to a base-case scenario, $x^{0}$ . Similarly, different outputs depend on alternative values of inputs, $y^{s} = f (x^{s})$ , where $s = 1, 2, \dots, S$ . The analyst knows the response of the output in each scenario, although they have no information about the sources of change ( 21 ). The change of inputs from scenario 0 to scenario 1 induces the change $Δ y = y^{1} - y^{0}$ in the output. This change can be decomposed by using a multivariate Taylor expansion of $Δ y$ , supposing that $f (x)$ is $r$ ( $r \leq K$ ) times differentiable at $x^{0}$ ( 3 , 14 , 20 ):

\begin{matrix} Δ y = y^{1} - y^{0} = \sum_{k_{1} = 1}^{K} f_{k_{1}}^{'} (x^{0}) Δ x_{k_{1}} + \sum_{k_{1} < k_{2}}^{K} f_{k_{1}, k_{2}}^{″} (x^{0}) Δ x_{k_{1}} Δ x_{k_{2}} + \dots \\ + \sum_{k_{1} < k_{2} < \dots < k_{r}}^{K} f_{k_{1}, k_{2}, \dots, k_{r}}^{r} (x^{0}) Δ x_{k_{1}} Δ x_{k_{2}} \dots Δ x_{k_{r}} + \dots + o ({| | h | |}^{r}) \end{matrix}

(2)

where $h = \max_{1 \leq i \leq r} Δ x_{k_{i}}$ , and $o ({| | h | |}^{r})$ denotes a term that vanishes faster than ${| | h | |}^{r}$ as $| | h | | \to 0$ .

Then, if $f (x)$ is $K$ times differentiable at $x^{0}$ , one can obtain the following:

Δ y = f (x^{1}) - f (x^{0}) = \sum_{k_{1} = 1}^{K} Δ_{k_{1}} f + \sum_{k_{1} < k_{2}}^{K} Δ_{k_{1}, k_{2}} f + \sum_{k_{1} < k_{2} < k_{3}}^{K} Δ_{k_{1}, k_{2}, k_{3}} f \dots + Δ_{1, 2, \dots, K} f

(3)

where

\begin{matrix} Δ_{k_{1}} f = f (x_{k_{1}}^{1}, x_{~ k_{1}}^{0}) - f (x^{0}) \\ Δ_{k_{1}, k_{2}} f = f (x_{k_{1}}^{1}, x_{k_{2}}^{1}, x_{~ (k_{1}, k_{2})}^{0}) - Δ_{k_{1}} f - Δ_{k_{2}} f - f (x^{0}) \\ Δ_{k_{1}, k_{2}, k_{3}} f = f (x_{k_{1}}^{1}, x_{k_{2}}^{1}, x_{k_{3}}^{1}, x_{~ (k_{1}, k_{2}, k_{3})}^{0}) - Δ_{k_{1}, k_{2}} f \\ - Δ_{k_{1}, k_{3}} f - Δ_{k_{2}, k_{3}} f - Δ_{k_{1}} f - Δ_{k_{2}} f - Δ_{k_{3}} f - f (x^{0}) \end{matrix}

(4)

and where $(x_{k_{1}}^{1}, x_{~ k_{1}}^{0})$ denotes that the $k_{1}$ th element of the x vector, $x_{k_{1}}^{1}$ , is set at its Scenario 1 value, while all other variables are at their Scenario 0 values. Based on such a decomposition, FCSI can be computed as follows:

φ_{k_{1}, k_{2}, \dots, k_{r}}^{r} = Δ_{k_{1}, k_{2}, \dots, k_{r}} f

(5)

where $k_{1}, k_{2}, \dots, k_{r}$ denote a group of r indices and $φ_{k_{1}, k_{2}, \dots, k_{r}}^{r}$ is the portion of $Δ y$ due to the interaction of inputs corresponding to the selected indices.

For the $k_{i}$ th ( $k_{i} = 1, 2, \dots, K$ ) element of the x vector, $x_{k_{i}}$ , the first-order FCSI are $φ_{k_{i}}^{1} = Δ_{k_{i}} f$ and the total-order indices are $φ_{k_{i}}^{T} = Δ_{k_{i}} f + \sum_{k_{i} < k_{2}}^{K} Δ_{k_{i}, k_{2}} f + \dots + Δ_{1, 2, \dots, K} f$ , where $φ_{k_{i}}^{T}$ is the total contribution of $x_{k_{i}}$ to $Δ y$ , and is the sum of the individual contribution of $x_{k_{i}}$ , plus all the contributions due to the interaction of $x_{k_{i}}$ with the remaining inputs. Thus, the index $φ_{k_{i}}^{I} = φ_{k_{i}}^{T} - φ_{k_{i}}^{1}$ represents the effect of interactions associated with $x_{k_{i}}$ .

According to $φ_{k_{i}}^{T}$ ’s definition, it can be computed as follows:

φ_{k_{i}}^{T} = f (x^{1}) - f (x_{k_{i}}^{0}, x_{~ k_{i}}^{1})

(6)

where $f (x^{1})$ is the value of the output in scenario 1 and $f (x_{k_{i}}^{0}, x_{~ k_{i}}^{1})$ is the output obtained with $x_{k_{i}}$ held at its base-case value and all other inputs at their scenario 1 values.

As discussed in the literature ( 18 , 33 ), the sign of the first-order indices $(φ_{k_{i}}^{1})$ is the sign of the change in y due to the individual change in $x_{k_{i}}$ . The sign of $φ_{k_{1}, k_{2}, \dots, k_{r}}^{r}$ is the sign of the interaction among the inputs $x_{k_{1}}$ , $x_{k_{2}}$ , …, $x_{k_{r}}$ . The total-order indices $(φ_{k_{i}}^{T})$ are the appropriate sensitivity measures, since they deliver not only the individual effects of inputs, but also account for interaction effects of inputs. The magnitudes of $φ_{k_{1}, k_{2}, \dots, k_{r}}^{r}$ provide the natural sensitivity measures.

All FCSI can be computed with 2^K simulations if there are K exogenous variables (or groups of variables) whose variations are of interest. The triplet $(φ_{k_{i}}^{1}, φ_{k_{i}}^{I}, φ_{k_{i}}^{T})$ can be computed at the cost of 2K simulations, instead of 2^K. This reduction in computational burden makes the sensitivity measures applicable to complex simulation codes as well.
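The decomposition in Equations 3 through 6 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the function names `fcsi` and `total_order` are hypothetical, and the toy model simply borrows the trip-generation form of Equation 9 with its three inputs at their mean values.

```python
from itertools import combinations


def fcsi(f, x0, x1):
    """Finite-change sensitivity indices for every non-empty subset of inputs.

    f  : callable taking a list of K inputs and returning a scalar output
    x0 : base-case inputs (scenario 0)
    x1 : changed inputs (scenario 1)
    Returns {tuple_of_input_indices: phi}, using 2**K evaluations of f.
    """
    K = len(x0)
    y0 = f(x0)

    def shifted(subset):
        # Inputs in `subset` move to scenario 1; all others stay at scenario 0.
        return [x1[k] if k in subset else x0[k] for k in range(K)]

    phi = {}
    for order in range(1, K + 1):
        for subset in combinations(range(K), order):
            total = f(shifted(subset)) - y0
            # Subtract every lower-order term nested in this subset (Equation 4).
            lower = sum(v for s, v in phi.items() if set(s) < set(subset))
            phi[subset] = total - lower
    return phi


def total_order(f, x0, x1, k):
    """Total-order index of input k via Equation 6."""
    held = list(x1)
    held[k] = x0[k]  # input k stays at its base case, all others move
    return f(x1) - f(held)


# Toy model for illustration only: Z = x1 * x2 + x3 (the form of Equation 9).
def model(x):
    return x[0] * x[1] + x[2]


base = [2000.0, 2.303, 1000.0]
changed = [v * 1.10 for v in base]  # all inputs increased by 10%
indices = fcsi(model, base, changed)
```

In this toy case the only non-zero interaction is between the first two inputs, because they enter the model as a product, and the indices of all orders sum to the full change in the output.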

Introduction to MC Methods

The MC method is a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. MC is useful for simulating phenomena with significant uncertainty in inputs and systems with many coupled degrees of freedom ( 34 , 35 ), especially for high-dimensional problems, because its convergence and convergence rate do not depend on the dimension of the problem. The error of MC is defined by $ϵ = \frac{λ_{α} σ}{\sqrt{n}}$ , where $n$ is the number of samples, $σ$ is the standard deviation, and $λ_{α}$ is the standard normal deviate corresponding to the confidence level $α$ . MC involves random sampling from the distribution of inputs, and successive model runs until a statistically significant distribution of outputs is obtained. It can be used to solve problems with probability structures, or non-probabilistic problems such as finding the area under a curve. However, this kind of method requires many samples. To achieve computational efficiency, methods that sample the input distribution more efficiently have been introduced. One such variant of the standard MC method is the Latin Hypercube (LH) sampling method. In this method, the range of probable values for each input parameter is divided into ordered segments such that the parameter space, consisting of all uncertain parameters, is partitioned into cells with equal probabilities. Parameter values are then sampled efficiently, since each parameter is sampled only once from each of its segments. The advantage of this approach is that it allows representation of the extremes in the probability distribution of the outputs.
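Both ideas in this paragraph can be illustrated with the Python standard library. The sketch below is an assumption-laden illustration, not the sampling code used in the study: `mc_sample_size` inverts the error formula for n using a two-sided normal deviate, and `latin_hypercube` draws one point per equal-probability segment on the unit interval for each input. The function names and the numbers in the usage lines are hypothetical.

```python
import math
import random
from statistics import NormalDist


def mc_sample_size(sigma, eps, confidence=0.95):
    """Smallest n such that lambda_alpha * sigma / sqrt(n) <= eps."""
    lam = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided normal deviate
    return math.ceil((lam * sigma / eps) ** 2)


def latin_hypercube(n, dims, rng):
    """Return n points in the unit hypercube, one per stratum in every dimension."""
    columns = []
    for _ in range(dims):
        # One uniform draw inside each of the n equal-probability segments,
        # then shuffle so segments are paired at random across dimensions.
        col = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n)]


rng = random.Random(42)
points = latin_hypercube(16, 4, rng)               # 16 samples of 4 inputs
n_needed = mc_sample_size(sigma=200.0, eps=50.0)   # hypothetical sigma and target error
```

Each unit-interval coordinate can then be mapped to an input value through the inverse cumulative distribution function of that input. Halving the target error roughly quadruples the required sample size, since the error shrinks with the square root of n.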

Introduction to BM Method

Sevcikova et al. ( 2 ) presented a figure to show the basic concept of BM developed for deterministic models. There is a prior distribution of model inputs $q (X)$ from which one draws input values $X_{i}$ for $i = 1, \dots, I$ . The model is run I times from the base year to the present year, once for each input $X_{i}$ , and produces as output the quantity of interest, $Φ_{i}$ . The model can be viewed as a mapping, M, from the space of inputs to the space of outputs, which is denoted by $Φ = M_{Φ} (X)$ . The ‘present’ time is defined as a time point with observed data available. The observed data are denoted by $ψ$ and used to compute a weight $ω_{i}$ for each input $X_{i}$ : $ω_{i} = L (Φ_{i})$ , where $Φ_{i} = M (X_{i})$ . Here, $L (Φ_{i})$ is the likelihood of the model outputs given the observed data $ψ$ , $L (Φ_{i}) = p (ψ | Φ_{i}) \propto p (ψ | X_{i}) = \prod_{k = 1}^{K} p (ψ_{k} | X_{i})$ , with the product taken over the K observed quantities, as in Equation 22. For each of the I runs, the model is then run forward to the future time for which a prediction is made. The results of the ith run are denoted by $Y_{i}$ . The posterior distribution of $Y$ is approximated by a discrete distribution with values $Y_{i}$ having probabilities proportional to $ω_{i}$ .

The primary BM stages or steps are as follows ( 9 ):

Draw a sample { $X_{1}, X_{2}, \dots, X_{I}$ } of values of the inputs from the prior distribution $q (X)$ .

For each $X_{i}$ , run the model to obtain $Φ_{i}$ .

Compute weights $ω_{i} = L (Φ_{i})$ . Here, an approximate posterior distribution of inputs with values { $X_{1}, X_{2}, \dots, X_{I}$ } and probabilities proportional to { $ω_{1}, ω_{2}, \dots, ω_{I}$ } is obtained.

The posterior distribution of $Φ$ is no longer approximated by the set { $ω_{i}$ } but now has a finite mixture distribution of the following form:

π (Φ) = \sum_{i = 1}^{I} w_{i} p (Φ | X_{i})

(7)

In Equation 7, $w_{i}$ denotes the weight $ω_{i}$ normalized so that the weights sum to one, and the conditional distribution $p (Φ | X_{i})$ has an assumed parametric form that reflects the additional sources of variation. The posterior distribution of $Y$ has a similar form, namely:

π (Y) = \sum_{i = 1}^{I} w_{i} p (Y | X_{i})

(8)
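The three BM steps listed above can be sketched as a short weighting loop. This is a simplified illustration under stated assumptions, not the procedure of ( 9 ): it uses only the discrete approximation (values with normalized weights) rather than the mixture form of Equations 7 and 8, and the prior, the one-line model, the observed value `PSI`, and the likelihood standard deviation `OBS_SD` are all hypothetical.

```python
import math
import random


def bayesian_melding(prior_draw, model, likelihood, n_runs, rng):
    """Return (inputs, outputs, normalized weights) for n_runs prior draws."""
    inputs = [prior_draw(rng) for _ in range(n_runs)]   # step 1: X_i drawn from q(X)
    outputs = [model(x) for x in inputs]                # step 2: Phi_i = M(X_i)
    raw = [likelihood(phi) for phi in outputs]          # step 3: omega_i = L(Phi_i)
    total = sum(raw)
    weights = [w / total for w in raw]                  # normalize to sum to one
    return inputs, outputs, weights


def weighted_mean(values, weights):
    """Mean of the discrete posterior approximation."""
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical toy setup: X is an annual growth rate, the model compounds it
# over five years, and one observed value is compared with the model output.
PSI = 1104.0     # assumed observation at the 'present' time
OBS_SD = 10.0    # assumed standard deviation of the Gaussian likelihood


def toy_prior(rng):
    return rng.gauss(0.02, 0.002)


def toy_model(x):
    return 1000.0 * (1.0 + x) ** 5


def toy_likelihood(phi):
    z = (PSI - phi) / OBS_SD
    return math.exp(-0.5 * z * z)


rng = random.Random(0)
xs, phis, ws = bayesian_melding(toy_prior, toy_model, toy_likelihood, 200, rng)
posterior_growth = weighted_mean(xs, ws)
```

Draws whose outputs sit far from the observation receive weights near zero, so the weighted summary concentrates on inputs that reproduce the observed data.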

Two Test Examples

To illustrate these different methods for uncertainty propagation in transportation and LUMs, Example 1 offers LSAI and MC applications and Example 2 offers a BM application.

Example 1. Figure 1’s small test network enables application of a simple travel demand model (TDM), with three nodes (with node 1 as the origin, and 2 and 3 as destinations) and four links. (*,*) denotes the free-flow travel time and capacity on each link.

Figure 1.

The test network.

This TDM’s equations are as follows:

Z = x_{1} x_{2} + x_{3}

(9)

Z_{12} = Z \cdot \frac{e^{x_{4} GC_{12}}}{e^{x_{4} GC_{12}} + e^{x_{4} GC_{13}}}

(10)

Z_{13} = Z - Z_{12} = Z \cdot \frac{e^{x_{4} GC_{13}}}{e^{x_{4} GC_{12}} + e^{x_{4} GC_{13}}}

(11)

GC_{12} = \min \{LC_{1}, LC_{2}\}

(12)

GC_{13} = \min \{LC_{3}, LC_{4}\}

(13)

LC_{i} = γ \cdot {time}^{i} + {Toll} \cdot {t_{f}}^{i}

(14)

(14)

where Z denotes the number of trips generated, $Z_{1j}$ denotes the number of trips going from origin node 1 to destination node j (j = 2, 3), $GC_{1j}$ denotes the generalized cost of going from node 1 to destination node j (j = 2, 3), and $LC_{i}$ denotes the generalized cost of using link i (i = 1, 2, 3, 4). This assumes that all travelers have the same value of time (VOT), $γ$ = \$6/hour ( 36 ). Tolls vary by link, with a toll of \$0.55/mile on links 1 and 3, and a toll of \$0.20/mile on links 2 and 4, as used in ( 36 ). The four model inputs $x_{1}, x_{2}, x_{3}, x_{4}$ follow lognormal distributions, and their summary statistics are as follows:

$x_{1}$ = population of origin zone 1, with mean = 2000 persons, standard deviation (SD) = 200 persons, and thus coefficient of variation (CoV = SD/mean) = 0.10;

$x_{2}$ = average trip-making rate per person per day, with mean = 2.303 trips/day/person, SD = 0.2303, and CoV = 0.10;

$x_{3}$ = trips generated by visitors, with mean = 1000 trips/day, SD = 100, and CoV = 0.10;

$x_{4}$ = impedance parameter for generalized cost of accessing each destination, with mean = -0.02, SD = 0.002, and CoV = 0.10 (negative values are obtained by negating draws from the non-negative lognormal distribution).
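Drawing from a lognormal distribution with a given mean and SD requires converting those statistics to the parameters of the underlying normal distribution: $σ_{\ln}^{2} = \ln (1 + CoV^{2})$ and $μ_{\ln} = \ln (mean) - σ_{\ln}^{2} / 2$ . The sketch below applies this conversion to the four inputs. It assumes the inputs are drawn independently and treats $x_{4}$ as the negative of a lognormal draw; the function names are hypothetical.

```python
import math
import random


def lognormal_params(mean, sd):
    """Convert a lognormal's mean and SD to (mu, sigma) of the underlying normal."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def draw_inputs(rng):
    """One independent draw of (x1, x2, x3, x4) from the summary statistics above."""
    specs = [(2000.0, 200.0), (2.303, 0.2303), (1000.0, 100.0), (0.02, 0.002)]
    draws = [rng.lognormvariate(*lognormal_params(m, s)) for m, s in specs]
    draws[3] = -draws[3]  # assumption: x4 is the negated positive lognormal draw
    return draws


rng = random.Random(1)
samples = [draw_inputs(rng) for _ in range(16)]  # 16 input sets
```

With CoV = 0.10 for every input, the underlying normal SD is close to 0.0998 in each case, so the four distributions differ only in scale.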

Equation 9 describes trip generation per zone, and Equations 10 and 11 describe trip distribution. There is no mode split because only one mode is used in this example. Route choice is obtained via traffic assignment on the network, assuming shortest-path user equilibrium ( 37 ) and feedback of chosen routes and TT to trip distribution.

The Bureau of Public Roads (BPR) function is used here for TT on each link i, with ${time}^{i} = {t_{f}}^{i} (1 + α {(\frac{v_{i}}{c_{i}})}^{β})$ , where ${t_{f}}^{i}$ is link i’s free-flow travel time (i = 1, 2, 3, 4), $v_{i}$ is the assigned traffic flow on link i, $c_{i}$ is link i’s capacity flow rate, and α and β are link-congestion parameters. The National Cooperative Highway Research Program (NCHRP) Report 365 ( 38 ) suggests values of α = 0.84 and β = 5.5, when using true capacity values ( $c_{i}$ ).
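The building blocks of this TDM (the BPR travel time, the generalized link cost of Equation 14, and the logit split of Equations 10 and 11) can be sketched as follows. This is a single pass through the equations for given link flows, not the user-equilibrium assignment with feedback used in the study. The link free-flow times, flows, and capacities in the usage lines are hypothetical, since the Figure 1 values are not reproduced in the text.

```python
import math

ALPHA, BETA = 0.84, 5.5   # BPR parameters quoted above from NCHRP Report 365
GAMMA = 6.0               # value of time, $/hour


def bpr_time(t_free, volume, capacity):
    """Congested link travel time from the BPR function."""
    return t_free * (1.0 + ALPHA * (volume / capacity) ** BETA)


def link_cost(t_free, volume, capacity, toll):
    """Generalized link cost as written in Equation 14."""
    return GAMMA * bpr_time(t_free, volume, capacity) + toll * t_free


def split_trips(total_trips, x4, gc12, gc13):
    """Logit split of generated trips between destinations 2 and 3."""
    e12, e13 = math.exp(x4 * gc12), math.exp(x4 * gc13)
    z12 = total_trips * e12 / (e12 + e13)
    return z12, total_trips - z12


# Hypothetical link data for illustration only.
z = 2000.0 * 2.303 + 1000.0   # Equation 9 at the input means
gc12 = min(link_cost(0.5, 900.0, 1200.0, 0.55), link_cost(0.6, 1200.0, 1800.0, 0.20))
gc13 = min(link_cost(0.4, 1500.0, 2000.0, 0.55), link_cost(0.7, 1300.0, 3000.0, 0.20))
z12, z13 = split_trips(z, -0.02, gc12, gc13)
```

Because $x_{4}$ is negative, the destination with the lower generalized cost attracts the larger share of trips. A full solution would iterate between this split and the assignment of flows to links until travel times stop changing.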

LSAI for Example 1

LSAI is applied to the test example by increasing all inputs by 10%. The change in each model output (traffic flow on a link) from $x^{0}$ to $x^{1}$ can be decomposed into 15 terms: four individual effects of $x_{1}$ , $x_{2}$ , $x_{3}$ , and $x_{4}$ , six pairwise interactions, four three-way interactions, and one four-way interaction. Here, $2^{4} = 16$ model simulations are needed. If only the first-order indices, the total-order indices, and the interaction effects associated with each exogenous variable are computed, only $2 \times 4 = 8$ simulations are needed. Figure 2 illustrates the ith-order indices as well as their sum and the total-order indices. Figure 3 shows the elasticity of the first-order indices and the total-order indices, where elasticities are computed as $\frac{Δ v / v}{Δ x / x}$ .

Figure 2.

The ith-order indices as well as their sum and the total-order indices using LSAI: (a) first-order indices; (b) second-order indices; (c) third-order indices; (d) fourth-order indices; (e) sum of ith-order indices; (f) total-order indices; (g) effect of interactions associated with inputs.

Figure 3.

Elasticity of first-order and total-order indices of link flows using LSAI: (a): elasticity of first-order index; (b): the elasticity of total-order index.

Figure 2a shows that increasing inputs $x_{1}$ , $x_{2}$ and $x_{3}$ by 10% will lead to the largest increase of traffic flows on link 4. Moreover, increasing $x_{4}$ by 10% will lead to a decrease of traffic flows on links 1 and 2, and an increase of traffic flows on links 3 and 4. The individual effect of $x_{1}$ is the most important for link 3 and 4 traffic flows. The individual effect of $x_{4}$ is the most important for link 1 and 2 traffic flows. Figure 2b shows how interaction effects between inputs x₂ and x₃ deliver the largest increase of traffic flows on all links among all second-order interactions. Other interactions (with the exception of the interaction between x₁ and x₃, which is 29.2) are all very small (with all absolute values under 15). Signs of some second-order indices are negative, such as the interactions between $x_{2}$ and $x_{4}$ , and $x_{3}$ and $x_{4}$ on links 1 and 2, the interaction between $x_{1}$ and $x_{4}$ on link 3, the interactions between $x_{1}$ and $x_{2}$ , $x_{1}$ and $x_{3}$ , and $x_{1}$ and $x_{4}$ on link 4.

Figure 2c shows that almost all third-order indices deliver negative effects on all links’ traffic flows with the exception of third-order interactions among $x_{1}$ , $x_{2}$ and $x_{4}$ , and among $x_{1}$ , $x_{3}$ and $x_{4}$ , which have positive effects on link 3 and 4 traffic flows. Interactions among $x_{1}$ , $x_{2}$ and $x_{3}$ , and among $x_{2}$ , $x_{3}$ and $x_{4}$ have almost the same effect on all links’ traffic flows. The interactions among $x_{1}$ , $x_{2}$ and $x_{4}$ , and among $x_{1}$ , $x_{3}$ and $x_{4}$ have very small (positive or negative) effects on all links’ traffic flows.

Figure 2d shows how the fourth-order index has positive effects on all links, with its largest effect on link 4’s traffic flow and its smallest effect on the traffic flow of links 1 and 3. Figure 2e shows that the sum of the third-order index has non-positive effects on all links (zero for link 1, negative for other links). The sums of other order indices positively affect all links. Among all sums of the order indices with positive effects on all links, the sum of the fourth-order has the biggest effect on the traffic flow on links 1, 2, and 3, while the sum of the first-order has the biggest effect on the traffic flow on link 4. Figure 2f shows that the total-order indices positively affect all links. Inputs $x_{1}$ and $x_{3}$ have almost identical and the most impactful effects on all links’ traffic flows, with x₂ coming next and $x_{4}$ last.

Figure 2g shows the effect of interactions associated with $x_{1}$ , $x_{2}$ , $x_{3}$ , and $x_{4}$ . The effect of interactions associated with every input is almost the same on the traffic flows on each link. Compared with the first-order indices in Figure 2a, the effect of first-order indices is less than that of interactions on the traffic flows on links 1, 2, and 3, while the effect of first-order indices is greater than that of interactions on the traffic flows on link 4. The effect of interactions associated with $x_{2}$ and $x_{4}$ is almost the same as that of the total-order indices on traffic flows on all links.

Finally, Figure 3 provides elasticities of first-order and total-order indices. They follow the same pattern as the first-order and total-order indices in Figure 2a and f, because every input is increased by the same percentage in this test example.

MC for Example 1

MC was applied with the test example by randomly generating 16 different sets of model inputs and parameter values, and then solving the TDM for its equilibrium. In general, the number of simulation runs needs to be large enough to obtain robust and accurate results. Here, 16 different sets of inputs were randomly chosen, using the Example 1 distributions described earlier (for x₁ through x₄). Final link flows were obtained from the converged user equilibrium (UE) assignment results. The mean, SD and CoV for four links’ traffic flows are computed as follows:

Link 1’s mean (veh/day), SD and CoV values are 1006, 58.83 and 0.0584.

Link 2’s mean (veh/day), SD and CoV values are 1279, 165.6 and 0.1294.

Link 3’s mean (veh/day), SD and CoV values are 1569, 21.87 and 0.0139.

Link 4’s mean (veh/day), SD and CoV values are 1307, 386.4 and 0.2957.

The CoVs of link 2 and 4 traffic flows are larger than the inputs’ starting CoV values (of 0.10), suggesting that final flow uncertainties/variations can be compounded and end higher than input uncertainties. CoV values for link 1 and 3 flows are smaller than 0.06, which is lower than any inputs’ uncertainty (as measured using CoV, or SD/mean). Moreover, the ratios of traffic flow versus capacity (v/c ratios) on links 1 through 4 are computed to be 0.84, 0.71, 0.78 and 0.44, respectively. The flow uncertainty appears not to have a strong relation with congestion, which is consistent with the result in ( 24 ).

For better understanding and interpretation of the results, ordinary least squares (OLS) regression was used to identify model inputs that are key contributors to uncertainty in model output (Table 1), assuming simple linear relationships between inputs (or combinations and transformations of model inputs, if one desires) and outputs. Only 16 input samples were used here, yet t-statistics and R² values are very high in this toy network, as shown in Table 1.

Table 1.

OLS Parameter Estimates for Traffic Flows on Links (n = 16 Simulations)

	v₁		v₂		v₃		v₄
Traffic flow	Coefficients	t Stat	Coefficients	t Stat	Coefficients	t Stat	Coefficients	t Stat
Intercept	285.1	12.3	−775.5	−14.6	1290	37.3	−3815	−93.9
x ₁	0.1796	31.5	0.5088	38.9	0.0608	7.14	1.199	120.0
x ₂	0.0916	9.76	0.2507	11.7	0.0549	3.92	0.6049	36.8
x ₃	156.5	31.5	443.3	38.9	52.95	7.14	1045	120.0
x ₄	−2934	−6.25	−7267	−6.76	−493.8	−0.705	−3899	−4.74
Adjusted R²	0.9925		0.9950		0.8789		0.9995
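A regression of this kind can be sketched without external libraries by solving the normal equations. The code below is an illustrative assumption, not the software used to produce Table 1: the function name `ols` and the small data set are hypothetical, and the data are constructed so that the true coefficients are known.

```python
def ols(rows, y):
    """Least-squares coefficients [intercept, b1, ..., bp] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]   # prepend the intercept column
    p = len(X[0])
    # Build the augmented system (X'X | X'y).
    A = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)]
         + [sum(xi[a] * yi for xi, yi in zip(X, y))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data generated from y = 3 + 2*a - b, so the fit is exact.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 7.0)]
y = [3.0 + 2.0 * a - 1.0 * b for a, b in rows]
coef = ols(rows, y)
```

For inputs on very different scales, such as a population near 2000 and an impedance parameter near -0.02, the normal equations can be poorly conditioned, so rescaling the inputs or using a library least-squares routine would be the safer choice in practice.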

BM for Example 2

Example 2. To illustrate how BM works, an integrated land-use and transportation model is presented, based on the following model equations:

x_{1}^{t_{j}} = {(1 + δ)}^{Δ t} x_{1}^{t_{j - 1}} (j = 1, 2, \dots), where Δ t = t_{j} - t_{j - 1}

(15)

Z^{t_{j}} = x_{1}^{t_{j}} x_{2} + x_{3}

(16)

Z_{12}^{t_{j}} = Z^{t_{j}} \cdot \frac{e^{x_{4} GC_{12}}}{e^{x_{4} GC_{12}} + e^{x_{4} GC_{13}}}

(17)

Z_{13}^{t_{j}} = Z^{t_{j}} - Z_{12}^{t_{j}} = Z^{t_{j}} \cdot \frac{e^{x_{4} GC_{13}}}{e^{x_{4} GC_{12}} + e^{x_{4} GC_{13}}}

(18)

{GC}_{12}^{t_{j}} = \min \{{LC}_{1}^{t_{j}}, {LC}_{2}^{t_{j}}\}

(19)

{GC}_{13}^{t_{j}} = \min \{{LC}_{3}^{t_{j}}, {LC}_{4}^{t_{j}}\}

(20)

{LC}_{i}^{t_{j}} = γ \cdot {time}^{i} + {Toll} \cdot {t_{f}}^{i}

(21)

where $Z^{t_{j}}$ denotes the number of trips generated at time $t_{j}$ , $Z_{1 k}^{t_{j}}$ denotes the number of trips going from origin node 1 to destination node k (k = 2, 3) at time $t_{j}$ , ${GC}_{1 k}^{t_{j}}$ denotes the generalized cost of going from node 1 to node k at time $t_{j}$ , ${LC}_{i}^{t_{j}}$ denotes the generalized cost of link i (i = 1, 2, 3, 4) at time $t_{j}$ , and ${x_{1}}^{t_{j}}$ denotes zone 1’s population at time $t_{j}$ . ${x_{1}}^{t_{0}}$ denotes the population of origin zone 1 at time $t_{0}$ , with mean = 1000 persons, SD = 100 persons, and CoV = 0.10. $δ$ denotes the annual percentage increase in population, with a mean of 2%, SD = 0.2% and CoV = 0.1. Other variables and value assumptions are the same as those provided above, in Example 1.

In Example 2, Equation 15 is a simple LUM, while Equations 16–21 describe a simple TDM. In general, the LUM reacts to travel network changes, and vice versa. However, the link from TDM to LUM is not as strong as the link from LUM to TDM because it is difficult to move one’s home (and/or business) and costly to construct new buildings. Thus, here we investigate only the forward effect of land-use changes on the TDM outputs, rather than allow a feeding back of TDM outputs to LUM decisions. In other words, TT from the TDM’s traffic assignment step do not feed into the subsequent year’s LUM equation.

Here, 16 samples of random input values were chosen, and the model was run three times for each set of inputs (including all uncertain parameters). Therefore, I = 16 (input samples), J = 3 (runs per sample), and K = 4 (link-flow outputs). Starting back in year t₀ = 2010, t₁ = 2015 served as the “present” year (for output validation, or comparisons to measured/regionally observed values), and t₂ = 2020 served as the final prediction year. According to BM, $ω_{i}$ can be computed using $p (ψ_{k} | X_{i})$ , where $(ψ_{k} | X = X_{i}) = μ_{ik} + a + ε_{ik}$ , and where $ε_{ik}$ is an independent and identically distributed (i.i.d.) normal variable, $N (0, σ_{i}^{2})$ . Here, $μ_{ik}$ denotes the expected output, $ε_{ik}$ denotes model errors, and $a$ denotes the model’s overall bias. With $Φ_{ijk}$ denoting present-year output k from run j of input sample i, and $σ_{δ}^{2}$ denoting the variance of these repeated-run outputs around $μ_{ik}$ , the quantities $μ_{ik}$ , $σ_{δ}^{2}$ , $σ_{i}^{2}$ , and $a$ can be estimated using approximate maximum likelihood methods ( 39 ). Thus, $ψ_{k} | X_{i} \sim N (\hat{a} + {\hat{μ}}_{ik}, v_{i})$ with $v_{i} = {\hat{σ}}_{i}^{2} + \frac{{\hat{σ}}_{δ}^{2}}{J}$ , where ${\hat{μ}}_{ik} = \frac{1}{J} \sum_{j} Φ_{ijk}$ , ${\hat{σ}}_{δ}^{2} = \frac{1}{IJK} \sum_{ijk} {(Φ_{ijk} - {\hat{μ}}_{ik})}^{2}$ , $\hat{a} = \frac{1}{IK} \sum_{ik} (ψ_{k} - {\hat{μ}}_{ik})$ , and ${\hat{σ}}_{i}^{2} = \frac{1}{K} \sum_{k} {(ψ_{k} - \hat{a} - {\hat{μ}}_{ik})}^{2}$ . Therefore, $ω_{i}$ can be computed as follows:

\begin{matrix} ω_{i} \propto p (ψ | X_{i}) = \prod_{k = 1}^{K} p (ψ_{k} | X_{i}) \\ = \prod_{k = 1}^{K} \frac{1}{\sqrt{2 π v_{i}}} \exp \left[- \frac{1}{2} \frac{{(ψ_{k} - \hat{a} - {\hat{μ}}_{ik})}^{2}}{v_{i}} \right] \end{matrix}

(22)

where $\hat{a}$ and ${\hat{σ}}_{δ}^{2}$ are computed to be 0.18 and 6.65, respectively. Table 2 provides estimates of $ω_{i}$ and ${\hat{σ}}_{i}^{2}$ . To compute the posterior distribution of link-level traffic flows, the propagation factors $b_{a}$ and $b_{v}$ were set to 5 years and 5 years, or $\frac{2015 - 2010}{2020 - 2015}$ .
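The estimators and weights leading to Equation 22 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the array layout `phi[i][j][k]`, the function name `bm_weights`, and the synthetic data are all assumptions, and the numbers it produces are not meant to reproduce Table 2. Weights are normalized in log space, which is a numerical convenience rather than part of the method.

```python
import math
import random


def bm_weights(phi, psi):
    """Bayesian melding weights from present-year runs.

    phi[i][j][k]: output k from run j of input sample i.
    psi[k]: observed present-year value of output k.
    Returns (weights, sigma2_i, a_hat, sigma2_delta).
    """
    I, J, K = len(phi), len(phi[0]), len(psi)
    # mu_hat[i][k]: mean over the J repeated runs.
    mu = [[sum(phi[i][j][k] for j in range(J)) / J for k in range(K)]
          for i in range(I)]
    # Run-to-run variance, pooled over all i, j, k.
    s2_delta = sum((phi[i][j][k] - mu[i][k]) ** 2
                   for i in range(I) for j in range(J) for k in range(K)
                   ) / (I * J * K)
    # Overall bias of the model against the observations.
    a_hat = sum(psi[k] - mu[i][k]
                for i in range(I) for k in range(K)) / (I * K)
    # Model-error variance for each input sample.
    s2_i = [sum((psi[k] - a_hat - mu[i][k]) ** 2 for k in range(K)) / K
            for i in range(I)]
    # Log-likelihood of the observations under each input sample.
    log_w = []
    for i in range(I):
        v = s2_i[i] + s2_delta / J
        log_w.append(sum(-0.5 * math.log(2.0 * math.pi * v)
                         - 0.5 * (psi[k] - a_hat - mu[i][k]) ** 2 / v
                         for k in range(K)))
    top = max(log_w)
    raw = [math.exp(lw - top) for lw in log_w]
    total = sum(raw)
    return [r / total for r in raw], s2_i, a_hat, s2_delta


# Synthetic, hypothetical data with the same dimensions as Example 2.
rng = random.Random(0)
I, J, K = 16, 3, 4
psi = [900.0, 850.0, 1500.0, 650.0]  # made-up "observed" link flows
phi = []
for i in range(I):
    offset = rng.gauss(0.0, 60.0)  # sample-specific error (assumed)
    phi.append([[psi[k] + offset + rng.gauss(0.0, 2.5) for k in range(K)]
                for j in range(J)])

weights, s2_i, a_hat, s2_delta = bm_weights(phi, psi)
print([round(w, 4) for w in weights])
```

Under these formulas, the input sample with the smallest error variance receives the largest weight, since the likelihood in Equation 22 decreases as the estimated error variance grows.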

Table 2.

${\hat{σ}}_{i}^{2}$ and $ω_{i}$ Over All i (n = 16)

$i$	${\hat{σ}}_{i}^{2}$	$ω_{i}$	$i$	${\hat{σ}}_{i}^{2}$	$ω_{i}$
1	5,832	0.0017	9	3,433	0.0052
2	6,336	0.0014	10	3,748	0.0042
3	6,032	0.1630	11	10,140	0.0014
4	5,419	0.3250	12	9,270	0.0197
5	2,182	0.0226	13	1,947	0.2650
6	2,303	0.0163	14	1,862	0.1490
7	13,883	0.0045	15	20,352	0.0014
8	12,827	0.0058	16	19,022	0.0017

Note: ${\hat{σ}}_{i}^{2}$ = the variance of model errors $ε_{ik}$ ; $ω_{i}$ = the weight of conditional distribution.

Posterior distributions of link-level traffic flows (Y_k values) are given by a mixture of normal distributions, as follows:

\begin{matrix} π (Y_{k}) = \sum_{i = 1}^{I} ω_{i} N (\hat{a} b_{a} + m_{ik}, ({\hat{σ}}_{i}^{2} + \frac{{\hat{σ}}_{δ}^{2}}{J}) b_{v}) \\ = N (\sum_{i = 1}^{I} ω_{i} (\hat{a} b_{a} + m_{ik}), \sum_{i = 1}^{I} ω_{i} ({\hat{σ}}_{i}^{2} + \frac{{\hat{σ}}_{δ}^{2}}{J}) b_{v}) \end{matrix}

(23)

where $m_{ik} = \frac{1}{J} \sum_{j = 1}^{J} Y_{ijk}$ .

Therefore, link k’s traffic flow outputs will follow a normal distribution, $N (μ_{k}, σ^{2})$ , where $μ_{k} = \sum_{i = 1}^{I} ω_{i} (\hat{a} b_{a} + m_{ik})$ and $σ^{2} = \sum_{i = 1}^{I} ω_{i} ({\hat{σ}}_{i}^{2} + \frac{{\hat{σ}}_{δ}^{2}}{J}) b_{v}$ . Using these values, one can obtain the following results: $σ^{2} = 1110$ , $μ_{1} = 912$ , $μ_{2} = 862$ , $μ_{3} = 1540$ , and $μ_{4} = 646$ .
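The weighted mean and variance defined above can be computed directly, following Equation 23 as written. The sketch below uses a deliberately tiny, hypothetical case (three input samples, two links); only the bias value 0.18 comes from the text, and the propagation factors are set to 1.0 purely for illustration.

```python
def posterior_moments(weights, m, v, a_hat, b_a, b_v):
    """Mean per output and common variance, per Equation 23.

    weights[i]: BM weight of input sample i (sums to 1).
    m[i][k]: mean future-year output k over the J runs of sample i.
    v[i]: sigma2_i + sigma2_delta / J for sample i.
    """
    n_outputs = len(m[0])
    mu = [sum(w * (a_hat * b_a + m_i[k]) for w, m_i in zip(weights, m))
          for k in range(n_outputs)]
    sigma2 = sum(w * v_i * b_v for w, v_i in zip(weights, v))
    return mu, sigma2


# Hypothetical inputs (not the paper's data).
weights = [0.2, 0.5, 0.3]
m = [[900.0, 850.0], [920.0, 870.0], [910.0, 860.0]]
v = [1000.0, 1200.0, 900.0]

mu, sigma2 = posterior_moments(weights, m, v, a_hat=0.18, b_a=1.0, b_v=1.0)
print(mu, sigma2)
```

With these made-up inputs the means are about 913.18 and 863.18, and the variance is about 1,070.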

In this way, BM methods can provide model outputs (future-year predictions) by using present-year inputs. BM assesses output uncertainties for a land-use transport model by combining all available information on model inputs and outputs in a Bayesian way, to provide a posterior distribution of outputs.

Conclusion

In this paper, two small travel demand modeling examples (one with a land-use equation) were used to demonstrate how LSAI, MC, and BM methods of uncertainty characterization function and to illuminate their strengths and weaknesses. LSAI can provide the sign of the output change implied by changes in inputs, the relative importance of those input changes, and a decomposition of the output change into the individual and interaction effects of inputs. Although LSAI needs only about 2 × K simulations to obtain first-order impact indices, obtaining a model’s total-order indices and all interaction effects for K (groups of) inputs requires 2^K simulations, a number that rises exponentially with the number of exogenous (input) variables.
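To make the difference in growth concrete, the two run counts can be tabulated for a few values of K. The helper name `lsai_runs` is hypothetical; the counts are simply the 2 × K and 2^K figures quoted above.

```python
def lsai_runs(k):
    """Return (runs for first-order indices, runs for all interactions)."""
    return 2 * k, 2 ** k


for k in (3, 5, 10, 15, 20):
    first_order, full = lsai_runs(k)
    print(f"K={k:2d}: first-order ~{first_order} runs, "
          f"full decomposition {full:,} runs")
```

At K = 10 the full decomposition already needs 1,024 runs, against roughly 20 for the first-order indices.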

Thus, LSAI will be effective only for problems with a small number of exogenous variables, because it needs thousands of simulations for models with more than 10 inputs. Moreover, LSAI provides only point estimates, while MC and BM methods can provide distributions of needed outputs. MC involves random sampling from the distributions of inputs and successive model runs, until the resulting distribution of outputs is statistically stable. MC can be used to solve problems with probability structures or non-probabilistic problems (such as finding the area under a curve), and can be used with time-series models (like land-use and travel changes, forward in time) or one-time-point models, where no validation data are needed. It is straightforward to obtain output distributions via MC random sampling of the distributions of inputs and parameters. However, such activities require hundreds of samples, even for small-scale problems, to obtain accurate results. MC is especially useful for high-dimensional problems because its convergence rate does not depend on the problem’s dimensionality; prediction errors depend only on the number of samples and the standard deviations of inputs.
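The dependence of MC error on sample size can be illustrated with Equation 15 alone. The sketch below projects zone 1's population five years ahead, using the means and standard deviations stated for Example 2; treating both inputs as independent normal variables is an assumption made here for illustration, as are the function names.

```python
import math
import random
import statistics


def projected_population(x1_0, delta, dt=5):
    """Equation 15 applied once over dt years."""
    return (1.0 + delta) ** dt * x1_0


def mc_estimate(n, rng):
    """Mean projected population and its standard error from n draws."""
    draws = [projected_population(rng.gauss(1000.0, 100.0),
                                  rng.gauss(0.02, 0.002))
             for _ in range(n)]
    mean = statistics.fmean(draws)
    std_err = statistics.stdev(draws) / math.sqrt(n)
    return mean, std_err


rng = random.Random(42)
results = {n: mc_estimate(n, rng) for n in (100, 1000, 10000)}
for n, (mean, std_err) in results.items():
    print(f"n={n:5d}: mean={mean:8.1f}, standard error={std_err:6.2f}")
```

The standard error of the estimated mean shrinks roughly in proportion to one over the square root of the number of samples, regardless of how many uncertain inputs the model has.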

While LSAI can only address deterministic problems, MC and BM methods are useful in solving both deterministic and stochastic problems. BM is a way of putting analysis of simulation models on a solid statistical basis. Its advantage is that it can obtain the posterior distribution of all model outputs from prior probability distributions and likelihoods of all model inputs (which include model parameters). Users are provided with probability intervals around forecasts with calibrated uncertainty statements, which add value to model validation, scenario comparison and external review and comment procedures. However, BM can be extremely computationally expensive, since it requires several hundred runs of the model. Moreover, intermediate outputs must also be known for intermediate validation. As a result, MC appears to be the best way for complex system modelers and planners to anticipate uncertainty in our urban systems’ futures.

Acknowledgements

This paper was financially supported by the National Natural Science Foundation of China (71471167). The authors of this paper wish to thank Scott Schauer-West for his editing and administrative support, and several anonymous reviewers for their helpful suggestions.

Author Contributions

The authors confirm paper contributions as follows: study conception and design: KK and GW; data collection/model specifications: GW; analysis and interpretation of results: KK and GW; draft manuscript preparation: GW. All authors reviewed the results and approved the final version of the manuscript.

The Standing Committee on Transportation Demand Forecasting (ADB40) peer-reviewed this paper (18-00488).

References

1. Rodier C. J., Johnston R. A. Uncertain Socioeconomic Projections Used in Travel Demand and Emissions Models: Could Plausible Errors Result in Air Quality Nonconformity? Transportation Research Part A: Policy and Practice, Vol. 36, No. 7, 2002, pp. 613–631. https://doi.org/10.1016/S0965-8564(01)00026-X.

2. Sevcikova, Simonson, Jensen. Assessing and Integrating Uncertainty into Land-Use Forecasting. Journal of Transport & Land Use, Vol. 8, No. 3, 2015, pp. 57–70. http://dx.doi.org/10.5198/jtlu.2015.614.

3. Pradhan, Kockelman. Uncertainty Propagation in an Integrated Land Use-Transport Modeling Framework: Output Variation via UrbanSim. Transportation Research Record: Journal of the Transportation Research Board, 2002. 1805: 128–135.

4. Waddell. UrbanSim: Modeling Urban Development for Land Use, Transportation, and Environmental Planning. Journal of the American Planning Association, Vol. 68, No. 3, 2002, pp. 297–314. http://dx.doi.org/10.1080/01944360208976274.

5. Waddell, Borning, Noth, Freier, Becke, Ulfarsson. Microsimulation of Urban Development and Location Choices: Design and Implementation of UrbanSim. Networks and Spatial Economics, Vol. 3, No. 1, 2003, pp. 43–67. https://doi.org/10.1023/A:1022049000877.

6. Miller E. J., Kriger D. S., Hunt J. D. Research and Development Program for Integrated Urban Models. Transportation Research Record: Journal of the Transportation Research Board, 1999. 1685: 161–170.

7. Dubus I. G., Brown C. D., Beulke. Sources of Uncertainty in Pesticide Fate Modelling. Science of the Total Environment, Vol. 317, No. 1–3, 2003, p. 53. https://doi.org/10.1016/S0048-9697(03)00362-0.

8. Papoulis, Pillai S. U. Probability, Random Variables, and Stochastic Processes. McGraw-Hill Europe, 2002.

9. Sevcikova, Raftery A. E., Waddell P. A. Assessing Uncertainty in Urban Simulations using Bayesian Melding. Transportation Research Part B Methodological, Vol. 41, No. 6, 2007, pp. 652–669. https://doi.org/10.1016/j.trb.2006.11.001.

10. Moore R. E. Interval Analysis. Prentice Hall, Englewood Cliffs, NJ, 1966.

11. Klir J. G., Yuan. Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall, Englewood Cliffs, NJ, 1995.

12. Saltelli, Ratto, Andres, Campolongo, Cariboni, Gatelli, Saisana, Tarantola. Global Sensitivity Analysis: The Primer. John Wiley & Sons Ltd, New York, NY, 2008.

13. Hamby D. M. A Review of Techniques for Parameter Sensitivity Analysis of Environmental Models. Environmental Monitoring & Assessment, Vol. 32, No. 2, 1994, pp. 135–154. https://doi.org/10.1007/BF00547132.

14. Hamby D. M. A Comparison of Sensitivity Analysis Techniques. Health Physics, Vol. 68, No. 2, 1995, pp. 195–204.

15. Kockelman. Gravity-Based Land Use Model (G-LUM) Website. http://www.ce.utexas.edu/prof/kockelman/G-LUM_Website/homepage.htm. Accessed July 10, 2017.

16. Borgonovo, Apostolakis G. E. A New Importance Measure for Risk-Informed Decision Making. Reliability Engineering & System Safety, Vol. 72, No. 2, 2001, pp. 193–212. https://doi.org/10.1016/S0951-8320(00)00108-3.

17. Borgonovo. Differential Importance and Comparative Statics: An Application to Inventory Management. International Journal of Production Economics, Vol. 111, No. 1, 2008, pp. 170–179. https://doi.org/10.1016/j.ijpe.2007.01.008.

18. Saltelli, Tarantola. On the Relative Importance of Input Factors in Mathematical Models: Safety Assessment for Nuclear Waste Disposal. Journal of the American Statistical Association, Vol. 97, No. 459, 2002, pp. 702–709. http://dx.doi.org/10.1198/016214502388618447.

19. Borgonovo. Sensitivity Analysis with Finite Changes: An Application to Modified EOQ Models. European Journal of Operational Research, Vol. 200, No. 1, 2010, pp. 127–138. https://doi.org/10.1016/j.ejor.2008.12.025.

20. Borgonovo, Peccati. Managerial Insights from Service Industry Models: A New Scenario Decomposition Method. Annals of Operations Research, Vol. 185, No. 1, 2011, pp. 161–179. https://doi.org/10.1007/s10479-009-0617-1.

21. Borgonovo, Percoco, Polizzi, Kockelman, Cavalli. Sensitivity Analysis of a Gravity-Based Land Use Model: The Importance of Scenarios. marcopercoco.files.wordpress.com/2014/11/glum-final.pdf. Accessed July 10, 2017.

22. Wang G. M., Kockelman. Local Sensitivity Analysis of Forecast Uncertainty in a Random-Utility-Based Multi-Regional Input-Output Model. Journal of the Transportation Research Forum, Vol. 55, No. 2, 2016, pp. 49–70.

23. Cheng. Uncertainty Quantification and Uncertainty Reduction Techniques for Large-scale Simulations. PhD dissertation. Department of Computer Science at Virginia Polytechnic Institute and State University, Blacksburg, VA, 2009.

24. Zhao, Kockelman K. M. The Propagation of Uncertainty through Travel Demand Models: An Exploratory Analysis. Annals of Regional Science, Vol. 36, No. 1, 2002, pp. 145–163.

25. Krishnamurthy, Kockelman. Propagation of Uncertainty in Transportation Land Use Models - Investigation of DRAM-EMPAL and UTPP Predictions in Austin, Texas. Transportation Research Record: Journal of the Transportation Research Board, 2003. 1831: 219–229.

26. Harvey, Deakin. Description of the STEP Analysis Package. UC Berkeley Transportation Library, Berkeley, CA, 1996.

27. Thompson, Baker, Wade. Conformity: Long-Term Prognoses for Selected Ozone Nonattainment Areas in California. Transportation Research Record: Journal of the Transportation Research Board, 1997. 1587: 44–51.

28. Saadi, Mustafa, Teller, Cools. Forecasting Travel Behavior using Markov Chains-Based Approaches. Transportation Research Part C: Emerging Technologies, Vol. 69, 2016, pp. 402–417. https://doi.org/10.1016/j.trc.2016.06.020.

29. Clay M. J., Johnston R. A. Multivariate Uncertainty Analysis of an Integrated Land Use and Transportation Model: MEPLAN. Transportation Research Part D: Transport and Environment, Vol. 11, No. 3, 2006, pp. 191–203. https://doi.org/10.1016/j.trd.2006.02.001.

30. Clay M. J., Valdez, Norr, Otterstrom S. M. Uncertainty Analysis of the Large Zone Economic Module of the Simple, Efficient, Elegant, and Effective Model (SE3M) of Land Use and Transportation. Transportation Planning and Technology, Vol. 38, No. 5, 2015, pp. 855–874. http://dx.doi.org/10.1080/03081060.2015.1039231.

31. Pontius R. G. Jr., Spencer. Uncertainty in Extrapolations of Predictive Land-Change Models. Environment and Planning B: Planning and Design, Vol. 32, No. 2, 2005, pp. 211–230. http://journals.sagepub.com/doi/abs/10.1068/b31152.

32. Sevcikova, Raftery A. E., Waddell P. A. Uncertain Benefits: Application of Bayesian Melding to the Alaskan Way Viaduct in Seattle. Transportation Research Part A: Policy and Practice, Vol. 45, No. 6, 2011, pp. 540–553. https://doi.org/10.1016/j.tra.2011.03.009.

33. Saltelli, Tarantola, Campolongo, Ratto. Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models. John Wiley & Sons, New York, NY, 2004.

34. Kalos M. H., Whitlock P. A. Monte Carlo Methods. John Wiley & Sons, New York, NY, 2008.

35. Kroese D. P., Brereton, Taimre, Botev Z. I. Why the Monte Carlo Method Is So Important Today. Wiley Interdisciplinary Reviews: Computational Statistics, Vol. 6, No. 6, 2014, pp. 386–392.

36. Chen T. D., Kockelman K. M., Zhao. What Matters Most in Demand Model Specifications: A Comparison of Outputs. Journal of the Transportation Research Forum, Vol. 54, No. 2, 2015, pp. 71–89.

37. Sheffi. Urban Transportation Networks: Equilibrium Analysis with Mathematical Programming Methods. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1985. https://doi.org/10.1016/0191-2607(87)90038-0.

38. Martin. NCHRP Report 365: Travel Estimation Techniques for Urban Planning. HRB, National Research Council, Washington, D.C., 1998.

39. Ziegler. Generalized Estimating Equations. Springer, New York, NY, 2011. https://doi.org/10.1007/978-1-4614-0499-6.