On Time-Dependent Trip Distance Distribution with For-Hire Vehicle Trips in Chicago

Abstract

For transportation system analysis in a new space dimension with respect to individual trips’ remaining distances, vehicle trip demand has two main components: the departure time and the trip distance. In particular, the trip distance distribution (TDD) is a direct input to the bathtub model in the new space dimension, and is a very important variable to consider in many applications, such as the development of distance-based congestion pricing strategies or mileage tax. For a good understanding of the demand pattern, the distributions of both trip initiation and trip distance should be calibrated from real data. In this paper, it is assumed that the demand pattern can be described by the joint distribution of trip distance and departure time. In other words, TDD is assumed to be time-dependent, and a calibration and validation methodology of the joint probability is proposed, based on log-likelihood maximization and the Kolmogorov–Smirnov test. The calibration method is applied to empirical for-hire vehicle trips in Chicago, and it is concluded that TDD varies more within a day than across weekdays. The hypothesis that TDD follows a negative exponential, log-normal, or Gamma distribution is rejected. However, the best fit is systematically observed for the time-dependent log-normal probability density function. In the future, other trip distributions should be considered and non-parametric probability density estimation should also be explored for a better understanding of the demand pattern.

To improve mobility, researchers study ways of “shaping travel demand.” Many different strategies can be considered, and most involve the reduction of distances traveled by motorized vehicles, for example, by reducing the number of motorized trips or increasing vehicle occupancy level for the motorized trips ( 1 ). Transportation demand in a road network is traditionally stored in origin–destination (OD) matrices, where each cell represents the number of vehicle trips between each OD pair. Depending on the duration of the time interval, the OD matrix is said to be static (long period of time) or dynamic (short period). However, the estimation of OD matrices is generally not very accurate because of the limited data available. In fact, the problem is known to be under-determined when one tries to estimate OD matrices from link flows. Further, the estimation of dynamic OD matrices is very computationally expensive. Nevertheless, demand calibration or modeling is a key component in any traffic flow model to model traffic congestion accurately and propose adequate operational strategies to alleviate it.

Recently, a new paradigm for transportation system analysis has been introduced where the spatial dimension is relative. The idea is to disregard the network topology and use space as a relative distance to the destination ( 2 , 3 ). In this new paradigm, the travel demand is described by the number of trips initiated at any given time and the trip distance distribution (TDD) of these trips ( 4 ). This demand definition is a direct input (or assumption) of the so-called bathtub models (2 –5), which have been gaining interest in recent years among the research community. While the supply side of the bathtub models has had a lot of attention in the literature, the calibration of demand has been overlooked. Several studies have highlighted the important role of trip distance (in this case at a regional level) on the accurate prediction of traffic dynamics (6 –8). More recently, it has been observed, based on mobile phone data, that the mean trip distance changes over the day ( 9 ), which contradicts the common assumption of time-independent TDD for the bathtub model ( 2 , 3 , 10 , 11 ).

In transportation science, the trip distance of the users is a fundamental variable that is needed for studying different aspects. For example, it is an important input in the “trip distribution” step of the four-step model, since it represents a measure of travel impedance. In the past, the data available to estimate the trip distance has been sparse, because it was collected through surveys. However, in recent years, other ways to collect the trip distance information have become available from mobile phone data (12 –14) and GPS traces ( 15 , 16 ). Several empirical studies have reported different TDD, such as log-normal distribution ( 14 , 17 , 18 ). These new technologies for data collection will make it possible to obtain larger data sets and a more accurate estimation of demand. However, the collection of such detailed information might lead to concerns about users’ privacy. When considering the new relative space paradigm for transportation, the data requirements are lower and it might be easier to guarantee privacy because only two variables should be collected. Thus, GPS traces and mobile phone data present a potential data source to estimate the demand for the new transportation paradigm. However, there is no systematic calibration procedure in the literature for this demand definition.

In summary, the study of TDD is gaining interest in the research community, both from the practical and modeling perspectives. TDD jointly with the trip initiation rate defines the travel demand and dictates the congestion dynamics in a road network. It is natural to think of the demand as a joint distribution of trip distances and departure time, although this concept is rather novel ( 4 ). This paper proposes a trip demand estimation methodology for such joint distribution. The method is then used to calibrate the TDD of for-hire vehicle trips reported by the transportation network providers in Chicago, for two common TDD assumed in the literature and a generalized function. Through statistical testing, the hypothesis that TDD is time-independent is rejected. The hypothesis that it follows any of the distributions considered in this paper is also rejected, which reveals the necessity of further studying the joint distribution of trip distances and departure time, since the common assumptions in the literature are not supported, at least for the data set considered in this paper.

The rest of the paper is organized as follows. The next section presents a literature review of TDD assumptions and models. Then, the definition of the joint trip distance and departure time distribution is presented and the methodology for its calibration is proposed. Later, an overview of the data used is presented and the time-dependency of the TDD is established through hypothesis testing. Then, both calibration and validation of three different time-dependent probability density functions are performed, and the results are discussed. Finally, a discussion on the contributions, limitations, and practical implications of this study is presented, and the paper concludes with a short summary.

Literature Review

This section will first summarize the assumptions on TDD for bathtub models. Then, some existing models and empirical calibrations of TDD are reviewed.

Trip Distance Assumptions in Bathtub Models

The “bathtub model” was a term proposed by Vickrey ( 2 , 3 ) to describe an aggregated model capturing the completion of trips in the city as a function of the number of vehicles in it. The dynamics are described by a conservation equation of the active vehicles in the system. The supply side of the bathtub models has been studied extensively in the literature, for example, through the network fundamental diagram calibration, also known as the macroscopic fundamental diagram. Demand is another important input of the bathtub model. However, the estimation of the TDD has been largely overlooked in the bathtub model literature. This paper aims to fill this gap by defining and estimating the time-dependent TDD. This will also help us to have a better understanding of the demand.

There is a special case where the traffic dynamics can be modeled with a simple ordinary differential equation. This was first derived assuming that the trip distance of the users in the network follows a (time-independent) negative exponential (NE) distribution ( 2 , 3 ), such as

φ_{NE} (x) = \frac{1}{B} \exp (\frac{- x}{B}),

(1)

where $B$ represents the average trip distance and $φ (x)$ is the probability density function of trip distances $x$ . This bathtub model is referred to as “Vickrey’s bathtub model.” Later, the same dynamics were independently derived by other authors ( 10 , 11 ). However, the assumption of NE distribution was not explicitly stated in these later works. Empirical findings from Yokohama ( 19 ) suggest that the average distance of trips, $B$ , does not change much over time, and is a homogeneous parameter for the whole city, which relates the flow in the network with the completion rate of trips. This would be consistent with the NE assumption ( 20 ). However, more recent studies contradict these observations and have reported that the average trip length in an area does change over time ( 9 , 21 ).

Other TDD have been assumed in the literature for bathtub models, for example, the same trip distance for all travelers ( 5 ). This model with homogeneous users is often referred to as the “basic bathtub model.” Recently, the so-called “generalized bathtub model” was derived for any given TDD ( 4 ). The completion rate of trips for the generalized bathtub model depends on the distribution of remaining trip distances ( 4 ).

The so-called “trip-based model” ( 5 , 22 –24) is developed as a reformulation of the bathtub model, where the objective is to track individual users’ trip progression. Most of these studies consider constant trip distances and thus are equivalent to the basic bathtub model. A framework to determine explicit distributions of travel distances has been proposed based on information on vehicle trips in the city network through sampling of a set of (virtual) trips ( 7 ). These distributions were assumed to be time-independent but the probability density function was not calibrated. Later, the framework was extended to assume that a single trip can change its route depending on the traffic conditions. However, in this paper, it is assumed that the trip distances are fixed for individual vehicle trips.

In summary, there are several assumptions in the literature on TDD for bathtub models. Many researchers have assumed constant TDD ( 5 , 24 , 25 ), and others have assumed NE distribution either explicitly ( 2 , 3 ) or implicitly ( 10 , 11 ). In this paper, whether the trip distance is time-independent and whether the trip distance follows a NE distribution will be tested.

Existing TDD Models

In general, trip distance (or trip length) is studied through regression models based on population density and other demographic characteristics ( 13 ). The distribution step of the four-step model has traditionally been done based on gravity models, using a functional distribution of the travel impedance, for which parameters are calibrated subsequently. The travel impedance is in most cases defined by the trip distance. Thus, different functional forms of TDD have been considered in the literature. The most popular functions are NE, power-law, or a combination of the two such as $φ (x) = x^{a} \exp (- bx)$ ( 26 ). Other TDD have been used too, for example, log-normal, or logistic distribution ( 27 ). For example, Thomas and Tutert ( 17 ) established that the trip distribution can be explained well by a “negative exponential-to-the-power-law” distribution. The slopes of this function were reported to vary for different education levels and the average trip distance was shown to increase over time ( 17 ). However, a “negative-exponential-to-the-power-law” expression, $φ (x) = e^{a - b x^{0.4}}$ , does not necessarily satisfy $\int_{0}^{\infty} φ (x) dx = 1$ for all values of $a$ and $b$ . This property is one of the most important characteristics of a probability density function, since the area under the curve should be one. In particular, one can derive the required parameter $a = - 0.187$ , given the estimated value of $b = 1.5$ by Thomas and Tutert ( 17 ), which is not consistent with the empirically estimated value $a = 6$ . A more generic relation can be established between $a$ and $b$ by solving $\int_{0}^{\infty} e^{a} e^{- b x^{0.4}} dx = 1$ , that is, $a = \ln (\frac{8 b^{\frac{5}{2}}}{15 \sqrt{π}})$ .
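The normalization argument above can be checked numerically. The following is a minimal sketch, assuming the density has the form $e^{a - b x^{p}}$ with $p = 0.4$; the function names `normalizing_a` and `integral_check` are hypothetical and introduced only for illustration. It uses the identity $\int_{0}^{\infty} e^{- b x^{p}} dx = Γ (1 + 1 / p) / b^{1 / p}$, which for $p = 0.4$ reduces to the closed form given in the text.

```python
import math


def normalizing_a(b: float, power: float = 0.4) -> float:
    """Return a such that exp(a - b * x**power) integrates to one on [0, inf)."""
    # int_0^inf exp(-b x^p) dx = Gamma(1 + 1/p) / b**(1/p),
    # so exp(a) must equal b**(1/p) / Gamma(1 + 1/p).
    return math.log(b ** (1.0 / power) / math.gamma(1.0 + 1.0 / power))


def integral_check(a: float, b: float, power: float = 0.4,
                   u_max: float = 80.0, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of the area under exp(a - b * x**power)."""
    # Substitute u = x**power, so x = u**(1/power) and
    # dx = (1/power) * u**(1/power - 1) du; the integrand is then smooth in u.
    h = u_max / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += math.exp(a - b * u) * (1.0 / power) * u ** (1.0 / power - 1.0)
    return total * h


a = normalizing_a(1.5)
print(round(a, 3))  # -0.187
```

With $b = 1.5$ the area computed by `integral_check(a, 1.5)` should be one up to discretization error, whereas $a = 6$ would give an area far above one.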

The calibration of TDD for gravity models has traditionally been done with limited survey data, so the results were not very accurate. More recent findings by Colak et al. ( 14 ) based on mobile phone data suggest very interesting similarities between TDDs across the five cities analyzed. In particular, they concluded that the straight line distance between origins and destinations for commuting trips follows a log-normal distribution, that is,

φ_{LN} (x; μ, σ) = \frac{1}{x σ \sqrt{2 π}} \exp (- \frac{{(\ln (x) - μ)}^{2}}{2 σ^{2}}),

(2)

where $μ$ is the mean of the variable’s natural logarithm and $σ$ is the standard deviation of the variable’s natural logarithm. The parameters of the fitted log-normal distributions were in the ranges $μ \in [1.6; 2.1]$ and $σ \in [0.7; 1.2]$ ( 14 ). Later, another study based on three months of empirical data of Didi trips in 10 different cities in China also considered log-normal distribution of trip distance for the parameter estimation ( 28 ). The log-normal TDD has been assumed for other modeling purposes ( 29 ) as well.

All these empirical studies aggregated the trip distances across hours (or days) and calibrated a single distribution. This means that their studies (indirectly) assumed that the TDD is time-independent. However, it is important to consider trip distance variation for micro-simulations ( 6 ). For this reason, the present paper defines the joint distribution of departure time and trip distance ( 4 ) as the demand and proposes a method to study and calibrate the time-dependent TDD. Very recently the variations in mean trip distance (MTD) across peak and off-peak hours have been studied by Paipuri et al. ( 9 ). They observed, with empirical data, that MTD changes over time. However, the trip distance analysis was based on regional paths, that is, single trips were “cut” into regions and for each region the TDD was obtained. Therefore the analyzed TDD were dictated by the topological features of the city network and the network partitioning. There are three main differences between this paper and Paipuri et al. ( 9 ). First, this study considers the trip distance analysis of the whole trips, which allows us to study the demand, instead of studying the distribution of partial trip distances in regions that have been defined for the purpose of modeling. Thus, the present approach is more generic and the study of the TDD can serve other purposes than being the input of a bathtub model. Second, this study assumes a continuous TDD that is time-dependent and the parameters of three assumed distributions are calibrated, based on data. On the other hand, Paipuri et al. ( 9 ) do not present any calibration of TDD, and the time variation is only studied and discussed for the MTD. Finally, the present paper studies TDD through standard statistical hypothesis testing, which allows us to draw conclusions by rejection of certain hypotheses, rather than only by looking at graphical representation of the data.

Methodology

It is assumed that the TDD is time-dependent, as discussed in Thomas and Tutert ( 17 ). However, it is argued here that this time-dependency might be on a shorter time-scale, that is, within a day or across days, instead of considering a change over years. In the following, the time-dependent TDD is defined as a mixed continuous-discrete joint distribution.

Definition of Joint Probability Function

The concept of joint probability density function for trip lengths and time is a very natural concept, but a very novel one ( 4 ). A joint probability function defines the likelihood of two events occurring together at the same instant. This joint distribution can be discrete, continuous, or a mixture. If the joint distribution of trip distance and departure time, $φ_{T, X} (t, x)$ , is defined as a continuous function, then the probability that between $t_{0}$ and $t_{0} + Δ t$ a trip of distance $x \in [x_{0}, x_{0} + Δ x]$ is initiated is $\int_{t_{0}}^{t_{0} + Δ t} \int_{x_{0}}^{x_{0} + Δ x} φ_{T, X} (t, x) dxdt$ .

There is no empirical study that has tried to calibrate this joint distribution $φ_{T, X} (t, x)$ . This paper proposes a systematic approach to fill this gap. Notice that this joint distribution can be continuous or discrete, both in time and space. In this paper, it is assumed that $φ_{T, X} (t, x)$ is discrete in time and continuous in space. The reasons are twofold. First, this allows us to assume that, at a given time interval, the trip distances follow a well-established (continuous) probability density function, the same as those suggested in the literature, for example, NE distribution or log-normal distribution. Second, the data used for the estimation is not continuous in time, since a trip’s start time is rounded to the nearest 15 min. This means that we cannot differentiate between trips that started at 7:55 a.m. and trips that started at 8:05 a.m.; they are all reported to have started at 8:00 a.m.

From the definition of conditional probability, the joint probability density function can be written as

φ_{T, X} (t, x) = φ_{T} (t) \cdot φ_{X | T} (x | t),

(3)

where $φ_{T} (t)$ is the marginal probability function and $φ_{X | T} (x | t)$ is the conditional probability of trip distance given a time $t$ . This conditional distribution is in fact one input of the generalized bathtub model ( 4 ). Then, we can estimate $φ_{T, X} (t, x)$ by estimating both the trip generation rate mass function and the conditional probability of trip distance for any given time. In the following, the subscripts are omitted from the probability functions to simplify the notation.

Time being a discrete variable, the marginal probability of trip generation is defined as

φ (t) = \frac{e (t)}{E (T)},

(4)

where $E (t) = \sum_{i = 0}^{t} e (i)$ is the cumulative initiation of trips, and $T$ indicates the end of the day. The number of trips starting between time $t$ and $t + 1$ is $e (t)$ , where $t$ indexes the interval. Thus, $φ (t)$ is the marginal probability of a trip being initiated during the time interval between $t$ and $t + 1$ , and Equation 4 is discrete. The conditional probability of trip distance for a given time $t$ is defined as $φ (x | t) = φ (x; \bar{z} (t))$ , where $\bar{z} (t)$ are the parameters that define the trip distance probability density function and are time-dependent. Thus, we have

φ (t, x) = \frac{e (t)}{E (T)} φ (x; \bar{z} (t)) .

(5)

This mixed joint distribution based on discrete time intervals and continuous trip distance $x$ is a well-defined joint probability function since it integrates to one. The volume below the curve is

\begin{matrix} \sum_{t = 0}^{T} [(t + 1 - t) \cdot \int_{0}^{\infty} \frac{e (t)}{E (T)} φ (x; \bar{z} (t)) dx] = \\ \sum_{t = 0}^{T} [\frac{e (t)}{E (T)} \int_{0}^{\infty} φ (x; \bar{z} (t)) dx] . \end{matrix}

Clearly, the volume is one since $\int_{0}^{\infty} φ (x; \bar{z} (t)) dx = 1$ by definition of conditional distribution, and $\sum_{t = 0}^{T} \frac{e (t)}{E (T)} = 1$ , by definition of marginal mass function.
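As an illustration of Equations 4 and 5, the following minimal sketch builds the mixed joint distribution from hourly trip counts and a conditional density, and checks numerically that its total mass is one. The trip counts and mean distances are invented for illustration, the NE density is used as the conditional distribution purely for simplicity, and all function names are hypothetical.

```python
import math


def marginal_pmf(counts):
    """phi(t) = e(t) / E(T): share of the day's trips initiated in interval t."""
    total = sum(counts)  # E(T), the cumulative number of trips at the end of the day
    return [c / total for c in counts]


def ne_pdf(x, mean_distance):
    """Negative exponential density with mean trip distance B."""
    return math.exp(-x / mean_distance) / mean_distance


def joint_pdf(t, x, pmf, mean_distances):
    """phi(t, x) = phi(t) * phi(x; z(t)), here with z(t) = B(t)."""
    return pmf[t] * ne_pdf(x, mean_distances[t])


def total_mass(counts, mean_distances, x_max=200.0, steps=20_000):
    """Sum over intervals of the midpoint-rule integral over trip distance."""
    pmf = marginal_pmf(counts)
    h = x_max / steps
    mass = 0.0
    for t in range(len(counts)):
        inner = sum(joint_pdf(t, (k + 0.5) * h, pmf, mean_distances)
                    for k in range(steps))
        mass += inner * h  # each time interval has unit width
    return mass


# Hypothetical data for three time intervals (not taken from the Chicago data).
counts = [1200, 5400, 3400]      # e(t)
mean_distances = [4.1, 3.2, 3.6]  # B(t) in miles
print(total_mass(counts, mean_distances))
```

Any other conditional density that integrates to one over trip distance could be substituted for `ne_pdf` without changing the result.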

Hypotheses Considered

In this paper, the time-dependency is studied through Hypothesis 1.

Hypothesis 1: Null hypothesis: TDD is time-independent.

The joint probability distribution is then studied by considering three different possible functions. The NE distribution (Equation 1) and the log-normal distribution (Equation 2) are considered, since other empirical data were calibrated under that assumption ( 14 , 28 ). Moreover, a Gamma distribution is also considered,

φ_{Ga} (x; α, β) = \frac{β^{α} x^{α - 1} \exp (- β x)}{Γ (α)},

(6)

where $Γ (\cdot)$ is the Gamma function. This distribution has not been considered in the literature as a possible underlying function of the TDD. However, it will be considered in this paper, since $φ_{Ga} (x; α, β)$ is a generalization of the NE distribution and for some values of $α$ it has a similar “form” to the log-normal distribution. Furthermore, the Gamma distribution can be considered a particular case of a combination between exponential and power-law function, which has also been assumed to describe TDD ( 26 ).
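The statement that the Gamma distribution generalizes the NE distribution can be verified directly: setting $α = 1$ and $β = 1 / B$ in Equation 6 recovers Equation 1. A minimal sketch follows; the function names and the value $B = 3.5$ are chosen only for illustration.

```python
import math


def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and rate beta (Equation 6)."""
    return beta ** alpha * x ** (alpha - 1.0) * math.exp(-beta * x) / math.gamma(alpha)


def ne_pdf(x, mean_distance):
    """Negative exponential density with mean trip distance B (Equation 1)."""
    return math.exp(-x / mean_distance) / mean_distance


B = 3.5  # hypothetical mean trip distance
for x in (0.5, 2.0, 7.5):
    # With alpha = 1 and beta = 1 / B the two densities coincide.
    print(x, gamma_pdf(x, 1.0, 1.0 / B), ne_pdf(x, B))
```

In this parameterization the mean of the Gamma distribution is $α / β$ , so the NE case has mean $B$ as expected.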

If Hypothesis 1 is rejected, the joint probability functions defined above will be considered to be time-dependent. Hypotheses 2a, 2b, and 2c will then be tested with the following null hypotheses ( $H_{0}$ ):

Hypothesis 2a: $H_{0}$ : The hourly TDD follows a NE distribution.

Hypothesis 2b: $H_{0}$ : The hourly TDD follows a log-normal distribution.

Hypothesis 2c: $H_{0}$ : The hourly TDD follows a Gamma distribution.

These functions are presented here:

φ_{NE} (t, x) = \frac{e (t)}{E (T)} \frac{1}{B (t)} \exp (\frac{- x}{B (t)}),

(7)

which has a single time-dependent parameter: the average trip length $B (t)$ .

φ_{LN} (t, x) = \frac{e (t)}{E (T)} \frac{1}{x σ (t) \sqrt{2 π}} \exp (- \frac{{(\ln (x) - μ (t))}^{2}}{2 σ^{2} (t)}),

(8)

which has two time-dependent parameters: $μ (t)$ and $σ (t)$ .

φ_{Ga} (t, x) = \frac{e (t)}{E (T)} \frac{{[β (t)]}^{α (t)} x^{α (t) - 1} \exp (- β (t) x)}{Γ (α (t))},

(9)

which also has two time-dependent parameters $α (t)$ and $β (t)$ .

In this paper, trips are aggregated on an hourly basis, creating the discrete time variable $t \in \{0, \dots, 23\}$ . As an example, all trips that started between 7:52.5 a.m. and 8:52.5 a.m. will be sampled and labeled as $t = 8$ . However, the presented methodology could be applied to any time interval aggregation, for example, 15 min, 5 min, and so forth, if the data is detailed enough.
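The hourly labeling can be sketched as follows. This is a minimal illustration that assumes start times are rounded to the nearest quarter hour with exact midpoints rounded up (the data description does not specify how midpoints are handled), and the function names are hypothetical. Because the reported times are already rounded, the hourly label is simply the hour of the reported time.

```python
from datetime import datetime, timedelta


def round_to_quarter_hour(ts: datetime) -> datetime:
    """Round a timestamp to the nearest 15 min (midpoints rounded up, by assumption)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    seconds = (ts - midnight).total_seconds()
    quarters = int((seconds + 450) // 900)  # 450 s is half of a 15 min interval
    return midnight + timedelta(minutes=15 * quarters)


def hour_index(reported_start: datetime) -> int:
    """Hourly label t for a start time that is already rounded to 15 min."""
    return reported_start.hour


for actual in (datetime(2019, 3, 13, 7, 55), datetime(2019, 3, 13, 8, 5)):
    reported = round_to_quarter_hour(actual)
    print(actual.time(), "->", reported.time(), "-> t =", hour_index(reported))
```

Note that a trip starting in the last 7.5 min of a day would be reported at midnight of the following day under this rounding.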

The rest of this section will explain the procedure of the calibration-validation method. For each hour of the day, the trips will be randomly divided into two samples: one calibration sample used for the maximum likelihood estimation of parameters; and one validation sample to perform statistical tests.
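The random division into calibration and validation samples could be implemented along the following lines. This is a sketch only: the equal split, the fixed seeds, and the representation of trips as (hour, distance) pairs are assumptions made for illustration.

```python
import random


def split_calibration_validation(distances, seed=0):
    """Randomly split one hour's trip distances into two halves."""
    shuffled = list(distances)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def split_by_hour(trips, seed=0):
    """Group (hour, distance) pairs by hour and split each group separately."""
    by_hour = {}
    for hour, distance in trips:
        by_hour.setdefault(hour, []).append(distance)
    return {hour: split_calibration_validation(values, seed + hour)
            for hour, values in by_hour.items()}


# Hypothetical trips: (hour of day, distance in miles).
trips = [(8, 2.1), (8, 3.4), (8, 0.9), (8, 5.2), (17, 1.8), (17, 4.4)]
for hour, (calibration, validation) in sorted(split_by_hour(trips).items()):
    print(hour, calibration, validation)
```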

Calibration: Maximum Likelihood Estimation

The calibration will be based on the maximum likelihood parameter estimation method. This method estimates the parameters of a given distribution from a sample of trips, for example, $B (t)$ for the NE distribution (Equation 7) at time $t$ , and so forth. For simplicity, the time index $t$ is omitted in the following. This parameter estimation method is a typical calibration technique that maximizes the log-likelihood, L, that a set of observations is drawn from a distribution with one or more parameters. The likelihood of a given parameter $z$ given $n$ observations $x_{1}, \dots, x_{n}$ is defined as $L (z | x_{1}, \dots, x_{n}) = \prod_{i = 1}^{n} L (z | x_{i})$ , where the likelihood of a single observation is $L (z | x_{i}) = φ (x_{i}; z)$ . The log-likelihood is then defined as

L = \ln (\prod_{i = 1}^{n} φ (x_{i}; z)),

(10)

which is maximized by solving $\frac{d L}{dz} = 0$ .

For the NE function (Equation 7) the parameter $B$ can be estimated given a certain number $n$ of measurements $x_{i}$ . The likelihood is

L (B | x_{1}, \dots, x_{n}) = {(\frac{1}{B})}^{n} \exp (- \frac{1}{B} \sum_{i = 1}^{n} x_{i}) .

(11)

Maximizing Equation 10 with the likelihood in Equation 11 leads to $B^{ML} = \frac{\sum_{i = 1}^{n} x_{i}}{n}$ , where the superscript $ML$ indicates that it has been estimated through the maximum log-likelihood method. Similarly, this can be done for distributions with more than one parameter. For the log-normal distribution (Equation 2), the log-likelihood is

\begin{matrix} L (μ, σ | x_{1}, \dots, x_{n}) = \\ - \frac{n}{2} \ln (2 π σ^{2}) - \sum_{i = 1}^{n} \ln (x_{i}) - \frac{\sum_{i = 1}^{n} {(\ln x_{i})}^{2}}{2 σ^{2}} \\ + \frac{μ \sum_{i = 1}^{n} \ln (x_{i})}{σ^{2}} - \frac{n μ^{2}}{2 σ^{2}} . \end{matrix}

Thus, the maximum likelihood estimators are

μ^{ML} = \frac{\sum_{i = 1}^{n} \ln (x_{i})}{n},

and

σ^{ML} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(\ln x_{i} - μ^{ML})}^{2}} .
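The closed-form estimators for the NE and log-normal distributions translate directly into code. The following minimal sketch uses hypothetical function names and a toy sample chosen so that the result can be verified by hand.

```python
import math


def fit_negative_exponential(distances):
    """Maximum likelihood estimate of B: the sample mean."""
    return sum(distances) / len(distances)


def fit_lognormal(distances):
    """Maximum likelihood estimates (mu, sigma) of the log-normal distribution."""
    logs = [math.log(x) for x in distances]
    n = len(logs)
    mu = sum(logs) / n
    # The ML estimator divides by n, not n - 1.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


# Toy sample whose logarithms are 0, 1 and 2.
sample = [1.0, math.e, math.e ** 2]
print(fit_negative_exponential(sample))
print(fit_lognormal(sample))  # mu = 1 and sigma = sqrt(2/3), up to rounding
```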

The same procedure could be applied to find the Gamma maximum likelihood estimators, but solving $\frac{\partial L}{\partial α} = 0$ and $\frac{\partial L}{\partial β} = 0$ leads to a fixed point problem. Thus, there is no closed form expression for these estimators. To obtain the solution a numerical scheme should be used, for example, a fixed point iteration scheme. However, readily available software can perform the maximum likelihood estimation for the Gamma distribution without requiring the user to code or solve the fixed point problem. This study used Python’s stats library to obtain the Gamma maximum likelihood estimators.
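For illustration, the Gamma estimators can also be obtained without a statistics library. Setting $\frac{\partial L}{\partial β} = 0$ gives $β = α / \bar{x}$ , and substituting into $\frac{\partial L}{\partial α} = 0$ gives $\ln α - ψ (α) = \ln \bar{x} - \overline{\ln x}$ , where $ψ$ is the digamma function. The sketch below solves this scalar equation by bisection, which is swapped in here for the fixed point iteration mentioned above because it needs no derivative and no starting guess; it is not the routine used in this study. The digamma approximation (recurrence plus asymptotic series) and all function names are assumptions of this sketch.

```python
import math
import random


def digamma(x):
    """Digamma function via upward recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1 / x
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series


def fit_gamma(distances, tol=1e-10):
    """Maximum likelihood estimates (alpha, beta) with beta a rate parameter."""
    n = len(distances)
    mean = sum(distances) / n
    mean_log = sum(math.log(x) for x in distances) / n
    s = math.log(mean) - mean_log  # positive unless all observations are equal
    if s <= 0.0:
        raise ValueError("sample must contain at least two distinct values")

    def score(alpha):  # decreasing in alpha, zero at the ML estimate
        return math.log(alpha) - digamma(alpha) - s

    lo, hi = 1e-6, 1.0
    while score(hi) > 0.0:  # expand the bracket until the root is enclosed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    return alpha, alpha / mean


# Synthetic check: random.gammavariate takes (shape, scale), so rate = 1 / scale.
rng = random.Random(42)
data = [rng.gammavariate(2.0, 1.5) for _ in range(20_000)]
print(fit_gamma(data))
```

In practice a library routine is preferable. For example, if the stats library mentioned above is SciPy's `scipy.stats` (an assumption, since the text does not name the package), `scipy.stats.gamma.fit(data, floc=0)` returns the shape, the fixed location, and the scale, with $β$ equal to one over the scale.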

Validation: Kolmogorov–Smirnov Tests

To test whether the probability distribution of a sample follows a reference probability distribution, the Kolmogorov–Smirnov (KS) test can be used ( 30 , 31 ). There are two types of KS test: one-sample test and two-sample test. In the first, the null hypothesis, $H_{0}$ , is defined as: “the validation data set $x_{1}, \dots, x_{n}$ follows the estimated distribution from the calibration data set,” which is how Hypotheses 2a, 2b, and 2c are defined. The KS statistic for the one-sample test is defined as

D_{n} = \sup_{x} | Ω_{n} (x) - Ω (x) |,

(12)

where $Ω_{n} (x)$ is the empirical cumulative distribution function, and $Ω (x) = \int_{0}^{x} φ (s) ds$ is the reference cumulative distribution, which can be obtained from the calibration step in the previous section. The KS statistic quantifies the distance between the cumulative distribution function of the theoretical distribution and the empirical one ( 31 ). This test can only be used for continuous probability functions $φ (x)$ . To determine whether the null hypothesis can be rejected or not, the KS statistic should be compared with the critical value. The null hypothesis, $H_{0}$ , is rejected if the test statistic, $D_{n}$ , is greater than the critical value, $C_{n}$ , obtained from a table. For large sample sizes and for significance level $α = 0.05$ , the critical value can be approximated as

C_{n} (α = 0.05) = \frac{1.36}{\sqrt{n}},

(13)

where $n$ is the sample size, for the one-sample KS test.
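The one-sample test in Equations 12 and 13 can be sketched as follows. Since the empirical cumulative distribution is a step function, the supremum is attained at a sample point, either just before or just after the step. The function names are hypothetical, and the log-normal cumulative distribution is included only as an example of a calibrated reference distribution.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """Cumulative distribution of the log-normal density in Equation 2."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def ks_one_sample(sample, cdf):
    """KS statistic D_n between the empirical CDF of sample and a reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        reference = cdf(x)
        # Compare against the empirical CDF just after and just before the step at x.
        d = max(d, i / n - reference, reference - (i - 1) / n)
    return d


def ks_critical_value(n, coefficient=1.36):
    """Large-sample critical value at the 0.05 significance level (Equation 13)."""
    return coefficient / math.sqrt(n)


def rejects_null(sample, cdf):
    """True if H0 (sample follows the reference distribution) is rejected."""
    return ks_one_sample(sample, cdf) > ks_critical_value(len(sample))


# Hypothetical validation sample tested against hypothetical calibrated parameters.
validation = [0.8, 1.5, 2.2, 2.9, 3.6, 4.8, 6.5, 9.1]
print(ks_one_sample(validation, lambda x: lognormal_cdf(x, 1.0, 0.8)))
```

Because the critical value shrinks as $1 / \sqrt{n}$ , very large samples lead to rejection even for small discrepancies between the empirical and reference distributions.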

For the two-sample KS test, the null hypothesis, $H_{0}$ , is that both empirical data sets were sampled from populations with identical distributions. This test will be performed to compare the empirical subsets (e.g., trip distances from different times of day or different days) and test whether they can be assumed to be samples of the same TDD. If the hypothesis is rejected, the TDD of different times of day are not samples from the same distribution and it can be concluded that the TDD is time-dependent. The KS statistic for the two-sample test is

D_{n, m} = \sup_{x} | Ω_{n} (x) - ϒ_{m} (x) |,

(14)

where $Ω_{n} (x)$ is the cumulative distribution of the first sample and $n$ is its sample size; and $ϒ_{m} (x)$ is the second cumulative distribution with sample size $m$ . Again, if the critical value $C_{n, m} < D_{n, m}$ , the null hypothesis is rejected. The critical value can be approximated for large sample sizes as

C_{n, m} (α = 0.05) = 1.36 \sqrt{\frac{n + m}{nm}} .

(15)
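A minimal sketch of the two-sample test in Equations 14 and 15 is given below. Both empirical cumulative distributions are evaluated at every observed value, which is where the supremum of the difference between two step functions must occur. The function names and the example samples are hypothetical.

```python
import math
from bisect import bisect_right


def ks_two_sample(first, second):
    """KS statistic D_{n,m} between the empirical CDFs of two samples."""
    xa, xb = sorted(first), sorted(second)
    n, m = len(xa), len(xb)
    d = 0.0
    for value in xa + xb:
        # bisect_right counts the observations less than or equal to value.
        d = max(d, abs(bisect_right(xa, value) / n - bisect_right(xb, value) / m))
    return d


def ks_two_sample_critical(n, m, coefficient=1.36):
    """Large-sample critical value at the 0.05 significance level (Equation 15)."""
    return coefficient * math.sqrt((n + m) / (n * m))


# Hypothetical trip distances (miles) from two different hours of the day.
morning = [5.1, 6.3, 4.8, 7.9, 6.0, 5.5]
evening = [2.2, 3.1, 2.7, 3.9, 1.8, 2.5]
statistic = ks_two_sample(morning, evening)
print(statistic, statistic > ks_two_sample_critical(len(morning), len(evening)))
```

The approximation in Equation 15 is intended for large samples, so the tiny samples above serve only to show the mechanics.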

Empirical Data

Data Overview

The data source of this paper corresponds to the information that transportation network providers (that is, rideshare companies) in the city of Chicago have collected since late 2018. This data is publicly available at https://data.cityofchicago.org/Transportation/Transportation-Network-Providers-Trips/m6dm-c72p. Each trip recorded has a unique identifier and information about the trip distance (in miles), duration (in seconds), starting and ending times (reported in 15 min intervals, by rounding the actual time to the nearest interval), trip fare and tip, and information on whether other trips were pooled. The trips’ origins and destinations are zone-based, corresponding to the 77 community areas in Chicago, depicted in Figure 1. Trips with origin or destination at the airport or other peripheral areas are not of interest for this study. For this reason, a limited (more or less convex) area is considered for the analysis, represented in yellow in Figure 1.

Figure 1.

City of Chicago community areas. Zone of interest in yellow.

The data is cleaned to ensure that the trips recorded are meaningful, that is, trips that lasted for less than 10 s or with average speeds higher than 80 mph or lower than 1 mph are discarded from the data set. The data selected for this paper are trips that took place in 2019 and had both pick-up and drop-off points in the community areas marked in yellow, which corresponds to 45 community areas and 72.7 million trips. The millions of trips initiated and ended in each community area and their associated mean distances are depicted in Figure 2. Clearly, the average trip distance is location-dependent. However, the analysis of location-dependent trip distance is outside the scope of this paper. Recently, the spatial variation of ride-hail trip demand of this data set has been analyzed ( 32 ). Notice that the areas with a higher number of trips correspond to the areas with lower average trip distance. These community areas (6, 7, 8, 22, 24, 28 and 32) also correspond to the downtown of Chicago, where many trips are likely to originate and end in the downtown area itself, and are short trips.
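The cleaning rule described above can be written as a simple predicate. This is a sketch under the stated thresholds; the function and parameter names are hypothetical and do not correspond to column names in the data set, and the treatment of trips exactly at a threshold is an assumption.

```python
def is_plausible_trip(distance_miles, duration_s,
                      min_duration_s=10.0, min_speed_mph=1.0, max_speed_mph=80.0):
    """Return True if a trip passes the duration and average speed filters."""
    if duration_s < min_duration_s:
        return False  # also guards the division below against zero duration
    speed_mph = distance_miles / (duration_s / 3600.0)
    return min_speed_mph <= speed_mph <= max_speed_mph


# Hypothetical records: (distance in miles, duration in seconds).
records = [(3.0, 600), (0.1, 5), (10.0, 300), (0.1, 3600)]
print([is_plausible_trip(d, s) for d, s in records])  # [True, False, False, False]
```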

Figure 2.

(a) Millions of trips started/completed in each community in the zone of interest, and (b) associated average trip distance (miles).

The data analysis in this paper is based on two different samples of trips from the data described. The first set of data analyzed will be all the trips over a given week. In particular, the trips that took place during week 11 are analyzed, since it corresponds to the week of 2019 with most trips recorded. This sample has 1.63 million trips with an average trip distance of 3.47 miles and a standard deviation of 2.80 miles. For this set, the trip generation rate and the evolution of MTD over time will be analyzed in the following subsection. This will be done to study Hypothesis 1. In the next section, the time-dependent joint distribution will be estimated for March 13, 2019 (i.e., Wednesday of week 11). This day had a total of 195,119 trips with an average distance of 3.53 miles and a standard deviation of 2.85 miles. As explained in the previous section, the estimation of the density function of trip distances will be made per hour. Each hour’s data will be split into two random, equally large data samples, one for calibration and the other for validation. Finally, a larger data sample will also be considered in the next section (∼3.64 million trips) to perform further calibration-validation analysis and study the variation of TDD over a single day and across days.

Time-Dependent Trip Distance

In this section a detailed analysis of the demand over time is presented. The data sample used corresponds to the trips recorded in week 11 of 2019. As expected, the trip generation at different times of day is not the same, as shown in Figure 3, a and b, where the trip initiation for the different days is depicted with respect to time. However, the underlying TDD could potentially be time-independent. A necessary condition for the trip distribution to be time-independent is that the average trip distance is time-independent. Therefore, we will do hypothesis testing on the MTD for each hour of data. A first look into the average trip distance over time in Figure 3, c and d, suggests that the average trip distance might be time-dependent. To test this, a one-way analysis of variance (ANOVA) test will be used, considering the following null hypothesis:

Hypothesis 3: $H_{0}$ : MTD is the same across hours, that is, ${\bar{x}}_{t = 0} = {\bar{x}}_{t = 1} = \dots = {\bar{x}}_{t = 23}$ .

Figure 3.

Trip initiation and distance analysis in zone of interest for week 11 of 2019: (a) trip generation at different times of day on weekdays, (b) trip generation at different times of day on the weekend (Saturday and Sunday), (c) average trip distance of trips initiated in each hour on weekdays, and (d) average trip distance of trips initiated in each hour on the weekend.

Each weekday’s MTD for a given hour is one data point in the corresponding “hour-of-day” group. This leads to 24 groups with five data points each (Monday to Friday). It is verified that the assumptions of normal distribution and homogeneity of variances are met for all 24 groups. Considering Hypothesis 3, the statistic for the ANOVA test is 37.9 and the p-value is $8.06 \cdot 10^{- 49}$ . Thus, the hypothesis that the MTD is time-independent (over a day) is rejected. Note that the MTD for Monday to Friday follows the same trend. This suggests that the TDD may vary within a day, with a fixed pattern from day to day. This will be studied in the subsection “Two Sided KS Test.” MTD on weekends was excluded from the analysis, because it is reasonable to assume that the nature of these trips is different.
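The one-way ANOVA described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the function name `one_way_anova_f` and the synthetic hourly means are assumptions made for this example. It computes only the F statistic and its degrees of freedom; the p-value would follow from the F distribution with those degrees of freedom (for instance via `scipy.stats.f_oneway`, which returns both).

```python
import random


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Synthetic stand-in for the data: 24 hour-of-day groups, five weekday MTD
# values each, with a hypothetical early-morning increase in mean distance.
rng = random.Random(0)
groups = []
for hour in range(24):
    base = 3.5 + (1.5 if 4 <= hour < 7 else 0.0)
    groups.append([base + rng.gauss(0.0, 0.1) for _ in range(5)])

f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```

With 24 groups of five points, the test has 23 and 96 degrees of freedom, which is the setting in which the reported statistic of 37.9 is evaluated.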

Another observation to highlight from Figure 3 is that the average trip distance is longer for the hours of day when the trip generation is lower. This holds for all the days of the week. Notice that this relation is consistent with the observations in the previous subsection (i.e., Figure 2), where regions with lower production and attraction of trips had longer average trip distances. It seems that during the early morning, when the demand is low (especially between 4:00 and 7:00 a.m.), longer trips are more likely to happen than during other times of the day, increasing the mean value. This relation may be explained by the nature of the demand. For example, it is natural that people with long trip distances would choose a transportation mode other than ride-hailing, especially during peak congestion periods. Another possible explanation is that riders with longer trip distances try to avoid the morning congestion by requesting a for-hire trip earlier. Alternatively, these longer trip distances may not be observed in the afternoon because these commuters use other modes of transportation that were unsafe (or not available) in the early morning. Finally, this relation might be explained by spatial variations of trip distances, as reported in Figure 2. All these assumptions should be tested in future studies.

The MTD and standard deviation for each hour are compared pairwise between weekdays and weekends in Figure 4. The figure also shows that the standard deviation is systematically lower than the MTD, which indicates that the assumption of an NE distribution ( 2 , 3 ) might not be adequate, since the expected value and standard deviation are equal in a negative exponential distribution. This will be tested in the subsections “Joint Distribution for Single Day” and “Calibration-Validation of Tuesday and Wednesday Subsets.”
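As a rough numerical illustration of this argument, the ratio of standard deviation to mean can be compared with the value of 1 implied by an NE distribution. The check below uses the aggregate figures reported earlier for week 11 and for March 13, 2019, not the hourly values of Figure 4, and writes the NE mean as $B$ (an assumption about notation, based on the parameter $B^{ML}$ in Table 2).

```latex
\text{NE with mean } B:\quad E[x] = B,\quad \sqrt{\operatorname{Var}[x]} = B
\;\Rightarrow\; \frac{\sqrt{\operatorname{Var}[x]}}{E[x]} = 1,
\qquad
\text{week 11: } \frac{2.80}{3.47} \approx 0.81,
\qquad
\text{March 13: } \frac{2.85}{3.53} \approx 0.81 .
```

Both aggregate ratios are below 1, which points in the same direction as the hourly comparison in Figure 4.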

Figure 4.

Trip distance mean and standard deviations over time for weekdays (black) and weekends (blue): (a) compares mean trip distance between weekdays and weekends, (b) compares mean and standard deviation of trip distance for weekends, (c) compares mean and standard deviation of trip distance for weekdays, and (d) compares standard deviation of trip distance between weekdays and weekends.

The aggregated trip distance probability mass function (PMF) across all weekdays is presented in Figure 5a, while the individual days’ PMFs are presented in Figure 5b for completeness. These aggregated mass functions for the whole day have a similar form to the Gamma or log-normal probability density functions. However, the estimation of these aggregated distributions is not the purpose of the paper, because we have rejected Hypothesis 1 and concluded that TDD is time-dependent. In the next section, a time-dependent calibration-validation analysis is performed.

Figure 5.

Trip length empirical probability mass functions for (a) all days of week 11 of 2019 and (b) individual weekdays, that is, Monday, March 11 to Friday, March 15, 2019.

Empirical Calibration and Validation

Joint Distribution for Single Day

In this section, the maximum likelihood estimation (Equations 7–9) is performed for the March 13, 2019 trip data. The joint empirical PMF for that day is presented in Figure 6a, where trips are aggregated by hour and the trip distance $x$ is also discretized, in intervals of ∼0.42 miles. First, the marginal distribution $φ (t)$ is calibrated from Equation 4. This marginal mass function is non-convex because of the two peak periods. Then, the conditional TDD $φ (x | t)$ is calibrated and validated for each hour of the day, $t \in [0, 23]$ , and Hypotheses 2a, 2b, and 2c are tested for each hourly sample. The sample sizes for both calibration and validation are presented in Figure 6b.
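The hourly calibration step can be sketched as follows for the two distributions whose maximum likelihood estimators have closed forms. This is a minimal sketch on synthetic data: the function names are hypothetical, the NE parameter is assumed to be the mean trip distance (consistent with $B^{ML}$ in Table 2), and the Gamma fit is omitted because it requires a numerical solver.

```python
import math
import random


def split_calibration_validation(distances, seed=0):
    """Randomly split one hour of trip distances into two equal halves."""
    rng = random.Random(seed)
    shuffled = list(distances)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:2 * half]


def fit_negative_exponential(distances):
    """ML estimate of the NE mean trip distance: the sample mean."""
    return sum(distances) / len(distances)


def fit_log_normal(distances):
    """ML estimates (mu, sigma): mean and standard deviation of ln(x)."""
    logs = [math.log(x) for x in distances]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


# Synthetic stand-in for one hour of trip distances (miles).
rng = random.Random(1)
trips = [rng.lognormvariate(0.9, 0.8) for _ in range(20000)]

calibration, validation = split_calibration_validation(trips)
mean_distance = fit_negative_exponential(calibration)
mu_hat, sigma_hat = fit_log_normal(calibration)
print(mean_distance, mu_hat, sigma_hat)
```

The validation half is deliberately left untouched here, so that the goodness-of-fit test is run on data that did not inform the parameters.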

Figure 6.

Joint distribution for trip distance and trip initiation for Wednesday, March 13, 2019: (a) empirical joint probability mass function and (b) sample sizes of calibration and validation subsets for each hour.

The estimated parameters for the three distributions considered are omitted here for the sake of brevity. The KS statistics (Equation 12) are presented in Figure 7 for validation purposes. The subscript indicates the underlying distribution, that is, $D_{n, NE}$ for the NE distribution, $D_{n, LN}$ for the log-normal distribution, and $D_{n, Ga}$ for the Gamma distribution, where $n$ is the sample size shown in Figure 6b. Figure 7 also presents the hourly critical values, $C_{n}$ , based on Equation 13. Comparing $D_{n, NE}$ and $C_{n}$ , Hypothesis 2a is rejected for all hours. Comparing $D_{n, LN}$ and $C_{n}$ , Hypothesis 2b is rejected for all hours except 4:00 a.m. Thus, we fail to reject the null hypothesis that the samples come from a log-normal distribution at 4:00 a.m. Finally, comparing $D_{n, Ga}$ and $C_{n}$ , Hypothesis 2c is rejected for most hours, but from 2:00 to 4:00 a.m. the results of the KS test are inconclusive for the Gamma distribution.
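The validation step can be illustrated with a one-sample KS test written from scratch. This sketch makes two assumptions that are not shown in this section: the KS statistic is the usual maximum gap between the empirical and theoretical cumulative distribution functions, and the critical value takes the asymptotic form $1.36 / \sqrt{n}$ at the 5% level (which matches the value of 0.0136 quoted later for 10,000 points). The data are synthetic and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """Maximum gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_critical_value(n, coefficient=1.36):
    """Asymptotic critical value at the 5% level (assumed form)."""
    return coefficient / math.sqrt(n)


def log_normal_cdf(mu, sigma):
    standard_normal = NormalDist()
    return lambda x: standard_normal.cdf((math.log(x) - mu) / sigma)


def negative_exponential_cdf(mean_distance):
    return lambda x: 1.0 - math.exp(-x / mean_distance)


# Synthetic validation sample drawn from a log-normal distribution.
rng = random.Random(2)
validation = [rng.lognormvariate(0.9, 0.8) for _ in range(5000)]
sample_mean = sum(validation) / len(validation)

d_ln = ks_statistic(validation, log_normal_cdf(0.9, 0.8))
d_ne = ks_statistic(validation, negative_exponential_cdf(sample_mean))
c_n = ks_critical_value(len(validation))
print(d_ln, d_ne, c_n)
```

The null hypothesis is rejected when the statistic exceeds the critical value. The Gamma case follows the same pattern but needs the regularized incomplete gamma function, which is not in the Python standard library.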

Figure 7.

Kolmogorov–Smirnov (KS) statistic for each hourly estimation for the three different trip distance distributions (TDD) considered. Blue $D_{n, NE}$ for the negative exponential (NE) distribution, orange $D_{n, LN}$ for the log-normal distribution and green $D_{n, Ga}$ for the Gamma distribution. The critical values $C_{n}$ depend on the sample size of each hourly data set and are depicted in a dashed black line.

Between 4:00 a.m. and 9:00 p.m., we can see that $D_{n, LN}$ is much lower than the other two KS statistics. However, from 9:00 to 11:00 p.m. and at midnight, 2:00 a.m., and 3:00 a.m. the Gamma distribution has the lowest KS statistic. Moreover, notice that $D_{n, LN}$ is remarkably lower than the other two statistics for larger sample sizes, and that it increases for smaller sample sizes. On the other hand, the KS statistic for the other two distributions is lower for smaller sample sizes. The hypothesis of the KS test is that “the maximum difference between the empirical and the theoretical cumulative density functions tends to zero for increasingly large sample sizes.” Thus, for most of the hours, the log-normal distribution is the best assumption of the three considered. Therefore, the “best” estimated joint distribution for this day considered in this paper is

$$φ_{LN} (t, x; μ (t), σ (t)) = \frac{e (t)}{195119} \cdot \frac{1}{x σ (t) \sqrt{2 π}} \exp \left( - \frac{{(\ln (x) - μ (t))}^{2}}{2 σ^{2} (t)} \right), \tag{16}$$

presented in Figure 8a. For a better representation, Figure 8, b–e, shows cuts of the joint probability $φ (t, x)$ at different times. The time-dependent parameters are depicted in Figure 9a.
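Equation 16 can be evaluated with a few lines of code. The sketch below assumes that $e (t)$ is the number of trips initiated in hour $t$, so that $e (t) / 195119$ is the marginal probability of that hour; the hourly counts and parameters used here are hypothetical placeholders, not the calibrated values of Figure 9a.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Log-normal density of trip distance x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def joint_density(t, x, trips_per_hour, mu_by_hour, sigma_by_hour):
    """Marginal probability of hour t times the conditional density of x."""
    marginal = trips_per_hour[t] / sum(trips_per_hour)
    return marginal * log_normal_pdf(x, mu_by_hour[t], sigma_by_hour[t])


# Hypothetical inputs: 24 hourly trip counts and hourly parameters.
trips_per_hour = [2000 + 500 * ((h * 7) % 11) for h in range(24)]
mu_by_hour = [1.2 if 3 <= h <= 5 else 0.9 for h in range(24)]
sigma_by_hour = [0.85 if 3 <= h <= 5 else 0.78 for h in range(24)]

# Sanity check: summing over hours and integrating over distance
# (midpoint rule on 0-200 miles) should give a total mass close to 1.
step = 0.02
total_mass = 0.0
for t in range(24):
    for i in range(int(200.0 / step)):
        x = (i + 0.5) * step
        total_mass += step * joint_density(
            t, x, trips_per_hour, mu_by_hour, sigma_by_hour
        )
print(total_mass)
```

The normalization check reflects the construction of the joint distribution as the product of a marginal mass function over hours and a conditional density over distance.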

Figure 8.

(a) Calibrated joint distribution and (b–e) cut of joint distribution represented as a curve for each hour $t \in [0, 23]$ , that is, the conditional probability of trip distance scaled down by the probability of a trip starting at that time interval.

Figure 9.

(a) Time-dependent estimated parameters of the log-normal distribution and the estimated parameters for the whole day and (b) Kolmogorov–Smirnov (KS) statistic $D_{n}$ and critical KS value $C_{n}$ for each hour and for the 24 h analysis.

For the sake of comparison, the log-normal distribution is also calibrated and validated for the aggregated TDD of the whole day (see gray horizontal lines in Figure 9a). Compared with the parameters obtained from other cities, $σ^{ML}$ lies within the range observed in other cities ( 14 ), that is, $σ^{ML} \in [0.7, 1.2]$ , while $μ^{ML}$ is about half of the reported lower bound. Compared with the proposed time-dependent calibration, the estimation for the 24 h data set underestimates $μ^{ML}$ for the early morning and overestimates it for the peak periods and during midday. On the other hand, $σ^{ML}$ is generally underestimated and only overestimated during the peak periods. Figure 9b presents again the hourly $D_{n, LN}$ and $C_{n}$ , together with the whole-day KS statistic and critical value for comparison. Notice that the difference (both absolute and relative) between $D_{n}$ and $C_{n}$ is much larger for the whole-day estimation than for the hourly estimations. Thus, a better estimation of the TDD can be achieved with time-dependent parameters.

Time Variation Analysis

This section considers a larger data sample (from multiple days), to ensure the robustness of the results reported for a single day. Moreover, it is interesting to study whether time variation in TDD is larger across days or within days. To do so, the data collected during four different hours (two off-peak and two peak hours) for all weekdays (except Friday) in 2019 are considered. Therefore, there is a total of 16 data sets, which are labeled as in Table 1. The focus here is only on the conditional distribution estimation, $φ (x | t)$ , since the data is sampled for certain hours of the day.

Table 1.

Summary of Data Sets for Time Variation Analysis

Time of day	Mondays	Tuesdays	Wednesdays	Thursdays
4:00–5:00 a.m.	Set 1	Set 5	Set 9	Set 13
8:00–9:00 a.m.	Set 2	Set 6	Set 10	Set 14
1:00–2:00 p.m.	Set 3	Set 7	Set 11	Set 15
7:00–8:00 p.m.	Set 4	Set 8	Set 12	Set 16

The cumulative mass functions and the cross-comparison across days and hours of the day are presented in Figure 10. A first look at the cumulative mass functions highlights the similarity between days for each given hour. In the following, only Sets 5 to 12 (i.e., Tuesdays and Wednesdays) are analyzed, because they correspond to the days with the largest reported difference in the cumulative mass functions; see Figure 10, f and i. The pairwise comparison for all four hours is depicted in Figure 11a. The similarity between the early morning hours (4:00–5:00 a.m.) is not as evident as for the other hours analyzed. Figure 11b presents the differences between the cumulative mass functions on Tuesdays for different hours of the day compared with the morning peak. This figure reinforces the hypothesis in the previous section that the TDD varies more within a single day than across days of the week.

Figure 10.

(a, d, g, j) Cumulative mass functions for the different sets of data described in Table 1, Monday–Thursday. (b, e, h, k) Difference in cumulative mass functions for a given day across time of day, Monday–Thursday. (c, f, i, l) Difference in cumulative mass functions across days for a given time: (a) Monday, (b) Monday, (c) 4:00–5:00 a.m., (d) Tuesday, (e) Tuesday, (f) 8:00–9:00 a.m., (g) Wednesday, (h) Wednesday, (i) 1:00–2:00 p.m., (j) Thursday, (k) Thursday, and (l) 7:00–8:00 p.m.

Figure 11.

Difference in trip distance cumulative mass function (CMF) across subsets: (a) comparison between Tuesday and Wednesday, and (b) comparison within day for Tuesday data.

Two Sided KS Test

This section will consider the following hypothesis:

Hypothesis 4: $H_{0}$ : The empirical samples from Set x and Set y come from the same underlying distribution.

To test Hypothesis 4, the two-sample KS test is used. The KS statistic (Equation 14) is presented in Figure 12. In particular, the Wednesday data sets are tested against each other in Figure 12a and against the Tuesday data sets in Figure 12b. Pairs with a darker color have more similar cumulative distributions. The diagonal in Figure 12a is ignored, since it compares each sample with itself. The critical value $C_{n, m}$ from Equation 15 is lower than 0.0089 in all cases. All the null hypotheses are rejected, except one: comparing Set 7 with Set 11 leads to a KS statistic of $D_{n, m} = 0.0019 < 0.0031 = C_{n, m}$ , so we fail to reject the null hypothesis in this case.
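A two-sample KS comparison of this kind can be sketched as follows. Equations 14 and 15 are not reproduced in this section, so the sketch assumes the standard forms: the statistic is the maximum gap between the two empirical cumulative distribution functions, and the 5% critical value is $1.36 \sqrt{(n + m) / (n m)}$. Function names and the synthetic samples are hypothetical.

```python
import math
import random


def two_sample_ks_statistic(sample_x, sample_y):
    """Maximum gap between the empirical CDFs of two samples."""
    xs, ys = sorted(sample_x), sorted(sample_y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Advance both pointers past every observation <= v
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def two_sample_critical_value(n, m, coefficient=1.36):
    """Asymptotic critical value at the 5% level (assumed form)."""
    return coefficient * math.sqrt((n + m) / (n * m))


# Synthetic stand-ins: two samples for the same hour on different days,
# and one sample for an early-morning hour with longer trips.
rng = random.Random(3)
midday_tue = [rng.lognormvariate(0.9, 0.8) for _ in range(4000)]
midday_wed = [rng.lognormvariate(0.9, 0.8) for _ in range(4000)]
early_wed = [rng.lognormvariate(1.24, 0.86) for _ in range(4000)]

d_same_hour = two_sample_ks_statistic(midday_tue, midday_wed)
d_diff_hour = two_sample_ks_statistic(midday_wed, early_wed)
c_nm = two_sample_critical_value(4000, 4000)
print(d_same_hour, d_diff_hour, c_nm)
```

In this toy setting, the pair with different underlying parameters produces a statistic far above the critical value, which is the pattern reported for comparisons across hours of the day.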

Figure 12.

Two-sample Kolmogorov–Smirnov (KS) statistic to compare data sets. The color of each square represents the value of the KS statistic, $D_{n, m}$ : (a) Wednesday data sets compared against each other and (b) Wednesday data sets compared against Tuesday data sets.

The diagonal terms in Figure 12b are darker than the non-diagonal squares in Figure 12a, which indicates that the distributions from the same hour are more similar across days than the distributions of a given day for different hours. The peak hour TDDs are more similar to each other than the off-peak TDDs. Furthermore, the lightest pairs, which correspond to greater differences in the cumulative distributions, always involve Sets 5 and 9, which correspond to trips sampled at 4:00 a.m. This suggests that TDD during the early morning is notably different from the rest, as suggested by Figure 11.

Calibration-Validation of Tuesday and Wednesday Subsets

In this subsection, the calibration and validation are performed, as presented in the Methodology section, on different samples of the data sets corresponding to Tuesday and Wednesday (Set 5–Set 12). The estimated parameters are shown in the first part of Table 2. The log-likelihood is calculated from the calibration data as in Equation 10. From the results in Table 2 it can be seen that, for each data set, the maximum log-likelihood is obtained for the log-normal distribution, except for Sets 5 and 9. For these sets, the log-likelihood of the Gamma distribution fit is greater than or equal to the log-likelihood of the log-normal fit. Notice that the log-likelihood for the NE distribution is consistently lower than the other two, indicating a worse fit to the data.
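The log-likelihood comparison can be sketched as below, assuming that Equation 10 is the usual sum of log-densities over the calibration sample. The example covers only the NE and log-normal cases, uses synthetic data, and all function names are hypothetical.

```python
import math
import random


def log_likelihood(sample, log_pdf):
    """Sum of log-densities of the sample under a fitted distribution."""
    return sum(log_pdf(x) for x in sample)


def ne_log_pdf(mean_distance):
    """Log-density of an NE distribution with the given mean."""
    return lambda x: -math.log(mean_distance) - x / mean_distance


def ln_log_pdf(mu, sigma):
    """Log-density of a log-normal distribution."""
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return lambda x: (
        -math.log(x) - norm - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
    )


# Synthetic calibration sample (miles).
rng = random.Random(4)
calibration = [rng.lognormvariate(0.9, 0.8) for _ in range(10000)]

# Closed-form ML estimates for both candidate distributions
mean_hat = sum(calibration) / len(calibration)
logs = [math.log(x) for x in calibration]
mu_hat = sum(logs) / len(logs)
sigma_hat = math.sqrt(sum((v - mu_hat) ** 2 for v in logs) / len(logs))

ll_ne = log_likelihood(calibration, ne_log_pdf(mean_hat))
ll_ln = log_likelihood(calibration, ln_log_pdf(mu_hat, sigma_hat))
print(ll_ne, ll_ln)
```

A higher (less negative) log-likelihood indicates a better fit to the calibration data, which is the criterion used to rank the three candidate distributions in Table 2.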

Table 2.

Calibration and Validation Results for the Full Sets (All Year Data)

Data set	Negative exponential distribution	Log-normal distribution	Gamma distribution
Parameters
Set 5	$B^{ML} = 4.807$	$μ^{ML} = 1.237$ ; $σ^{ML} = 0.862$	$α^{ML} = 1.650$ ; $β^{ML} = 0.343$
Set 6	$B^{ML} = 3.346$	$μ^{ML} = 0.932$ ; $σ^{ML} = 0.750$	$α^{ML} = 1.963$ ; $β^{ML} = 0.587$
Set 7	$B^{ML} = 3.431$	$μ^{ML} = 0.912$ ; $σ^{ML} = 0.809$	$α^{ML} = 1.708$ ; $β^{ML} = 0.498$
Set 8	$B^{ML} = 3.254$	$μ^{ML} = 0.902$ ; $σ^{ML} = 0.757$	$α^{ML} = 1.949$ ; $β^{ML} = 0.599$
Set 9	$B^{ML} = 4.788$	$μ^{ML} = 1.237$ ; $σ^{ML} = 0.858$	$α^{ML} = 1.667$ ; $β^{ML} = 0.348$
Set 10	$B^{ML} = 3.304$	$μ^{ML} = 0.918$ ; $σ^{ML} = 0.751$	$α^{ML} = 1.953$ ; $β^{ML} = 0.591$
Set 11	$B^{ML} = 3.397$	$μ^{ML} = 0.900$ ; $σ^{ML} = 0.812$	$α^{ML} = 1.696$ ; $β^{ML} = 0.499$
Set 12	$B^{ML} = 3.241$	$μ^{ML} = 0.903$ ; $σ^{ML} = 0.749$	$α^{ML} = 1.981$ ; $β^{ML} = 0.611$
Log-likelihood, $L$
Set 5	$-6.31 \cdot 10^{- 4}$	$-6.16 \cdot 10^{- 4}$	$-6.15 \cdot 10^{- 4}$
Set 6	$-7.31 \cdot 10^{- 5}$	$-6.83 \cdot 10^{- 5}$	$-6.94 \cdot 10^{- 5}$
Set 7	$-4.33 \cdot 10^{- 5}$	$-4.11 \cdot 10^{- 5}$	$-4.19 \cdot 10^{- 5}$
Set 8	$-7.79 \cdot 10^{- 5}$	$-7.29 \cdot 10^{- 5}$	$-7.40 \cdot 10^{- 5}$
Set 9	$-5.75 \cdot 10^{- 4}$	$-5.60 \cdot 10^{- 4}$	$-5.60 \cdot 10^{- 4}$
Set 10	$-7.37 \cdot 10^{- 5}$	$-6.88 \cdot 10^{- 5}$	$-7.00 \cdot 10^{- 5}$
Set 11	$-4.31 \cdot 10^{- 5}$	$-4.09 \cdot 10^{- 5}$	$-4.17 \cdot 10^{- 5}$
Set 12	$-7.91 \cdot 10^{- 5}$	$-7.39 \cdot 10^{- 5}$	$-7.50 \cdot 10^{- 5}$
Kolmogorov–Smirnov statistic, $D_{n}$
Set 5	0.109	0.0333	0.0509
Set 6	0.169	0.0474	0.0784
Set 7	0.144	0.0539	0.0875
Set 8	0.165	0.0287	0.0590
Set 9	0.114	0.0419	0.0452
Set 10	0.170	0.0433	0.0762
Set 11	0.148	0.0433	0.0787
Set 12	0.167	0.0296	0.0581

The KS statistic, $D_{n}$ , is presented in the third part of Table 2 for each TDD. For the given sample sizes of all sets (black in Figure 13a), the KS critical value is much lower ( $\leq 0.0065$ ) than any $D_{n}$ . Thus, based on the sample trips from all of 2019, the null hypothesis should be rejected for all distributions considered. However, the KS statistic for the log-normal distribution is systematically lower than the others for all sets of data considered. As a graphical representation, the empirical validation data histograms are presented in Figure 14, with the superposition of the theoretical NE, log-normal, and Gamma conditional distributions estimated from the calibration sample. From visual inspection it is clear that the log-normal distribution better captures the mode of the trip distance.

Figure 13.

(a) Number of trips for each sample considered. Each color represents a sample of the data set during a different period of time and (b) analysis of Kolmogorov–Smirnov (KS) statistic, $D_{n}$ , from each trip distance distribution considered for different sample sizes against the KS critical value, $C_{n}$ (dashed gray line).

Figure 14.

Empirical conditional trip distance distribution of validation subsets and the three maximum likelihood calibration distributions, $φ (x | t)$ , obtained from calibration subsets with parameters in Table 2: (a) Tuesday 4:00–5:00 a.m., (b) Tuesday 8:00–9:00 a.m., (c) Tuesday 1:00–2:00 p.m., (d) Tuesday 7:00–8:00 p.m., (e) Wednesday 4:00–5:00 a.m., (f) Wednesday 8:00–9:00 a.m., (g) Wednesday 1:00–2:00 p.m., and (h) Wednesday 7:00–8:00 p.m.

Recall that the KS critical value decays in inverse proportion to the square root of the sample size $n$ , as per Equation 13. In general, KS tests are performed with sample sizes of less than 10,000 data points, that is, with a critical value $C_{n} (α = 0.05) > 0.0136$ . Thus, the same calibration and validation procedure is performed for different samples. Five samples are drawn and tested for each set; they are presented in Figure 13a with their corresponding sample sizes. Using samples of different sizes, multiple one-sample KS tests can be performed to ensure that the conclusion of the test is not affected by the sample size. The $D_{n}$ for all these data samples are presented in Figure 13b. For all of them, the null hypotheses of Hypotheses 2a, 2b, and 2c are rejected. Therefore, it can be concluded that TDD follows neither an NE, nor a log-normal, nor a Gamma distribution. The following observations can also be made from Figure 13b:

Increasing the sample size increases the KS statistic for the NE distribution. In other words, the larger the sample size, the larger the difference is between the sample and NE distribution.

For the log-normal and Gamma distributions, a larger sample size has little effect on $D_{n}$ . However, for smaller sample sizes, the Gamma and log-normal validations lead to similar KS statistics, while for larger sample sizes the log-normal leads to consistently lower $D_{n}$ than the Gamma distribution.

For very large samples, the $D_{n, LN}$ is systematically lower than $D_{n}$ from the other two distributions considered. This result corroborates the conclusion in the section “Joint Distribution for Single Day” that the best approximation would be to assume that TDD follows a time-dependent log-normal distribution.
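The dependence of the critical value on sample size recalled above can be made concrete with a short calculation. It assumes that Equation 13, which is not reproduced in this section, takes the usual asymptotic form with coefficient 1.36 at the 5% level; this is consistent with the value of 0.0136 quoted for 10,000 data points.

```latex
C_{n} (α = 0.05) \approx \frac{1.36}{\sqrt{n}}, \qquad
n = 10^{4} \;\Rightarrow\; C_{n} \approx 0.0136, \qquad
C_{n} \leq 0.0065 \;\Rightarrow\; n \geq \left( \frac{1.36}{0.0065} \right)^{2} \approx 4.4 \cdot 10^{4} .
```

Under that assumption, the bound of 0.0065 reported for the full-year sets corresponds to validation samples of roughly 44,000 trips or more, well above the sample sizes at which KS tests are typically applied.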

Practical Implications, Contributions, and Limitations

Travel demand can be obtained from a travel demand model (four-step model, activity-based model, etc.) or calibrated from empirical data. In this paper, the focus is on the calibration of travel demand, especially the TDD, for transportation system analysis in a relative space dimension ( 2 – 4 ). TDD is important not only for modeling traffic congestion, but also for designing and evaluating mileage tax and other real-time operations and management schemes, including transit scheduling and distance-based congestion pricing in high-occupancy toll lanes. Further, TDD can also be used to calibrate and validate the resulting trips from activity-based models.

This paper proposes a calibration procedure to find the demand pattern empirically through the definition of the joint distribution of departure time and trip distance and maximum likelihood estimation. The Chicago data set was used to perform several statistical tests on the TDD. Most of the hypotheses considered in this paper were rejected. From Hypotheses 1, 3, and 4, the TDD should be considered time-dependent, which is consistent with another recent study ( 9 ). Testing Hypothesis 2 showed that the NE distribution is a poor approximation for TDD, compared with the other two distributions considered (Gamma, as a generalization of the NE distribution, and log-normal, assumed by some researchers). Therefore, the trip dynamics of Vickrey’s bathtub model are only an imprecise approximation of the actual bathtub dynamics.

However, none of the distributions considered in this paper present a good fit for TDD. From the behavioral point of view, the distribution will depend on land use, mode choice, and departure time choice of travelers. Whether a generic TDD can be found to describe any demand pattern in any region is an open and challenging question. It is likely that a site- and mode-specific calibration is needed for each case. This highlights the importance of using the generalized bathtub model ( 4 ) over the most common bathtub models used in the literature that assume directly ( 2 , 3 ) or indirectly ( 10 , 11 ) a time-independent NE distribution of trip distances, or constant trip distances ( 5 , 25 ).

In summary, the contributions of this paper are threefold. First, it provides a systematic way to define and estimate the demand in the space–time domain, where the space dimension is with respect to the remaining trip distances. This is achieved by defining the joint distribution of departure time and trip distance as the product of the marginal and conditional distributions. Second, based on empirical data of Chicago for-hire trips, it shows through statistical tests that TDD is time-dependent, and that the variation is larger within the day than across days. Third, it rejects the hypothesis that the trip distribution follows an NE distribution (as assumed in Vickrey’s bathtub model), a log-normal distribution (considered by several authors), or a Gamma distribution. Other observations from the empirical data analyzed are that the average trip distance changes both spatially and temporally. Also, in peak periods the average trip distance is shorter, while in the early morning a higher percentage of longer trips take place. In all, a case-by-case data-based demand calibration might be needed for real-time control and traffic modeling. Even if a site-specific calibration of demand is needed, the relative space dimension paradigm allows for easier calibration and definition of demand than the traditional OD demand estimation problem in transportation analysis.

Finally, it is important to highlight that the data set used is from for-hire vehicles. For this reason, the results of this paper might not be representative of other types of trips. This study analyzed only a sample of the total demand, that is, only one mode. Thus, to have a better understanding of the total demand pattern, further calibration with other empirical data sources is required, ideally based on privately owned vehicle trips. Since it is concluded that the most common assumptions in the literature on TDD are not adequate for for-hire trips, the authors want to highlight the importance of carefully revising and selecting the demand assumptions for other modes. For this reason, studying empirical trip distances of other modes is important to define more adequate assumptions on the demand in the future.

Conclusions

The study of TDD has been gaining interest in the research community because of the newly available data collection technologies. This paper considers a different type of demand definition, where the geographical location of the origins and destinations is ignored. It presents a calibration and validation procedure to study the demand as a joint probability density function of trip distance and departure time. In the past, TDD has been studied and calibrated, but always based on daily (or larger temporal-scale) aggregation of trips. However, TDD could depend on the time of day in the same way that the trip initiation rate changes during the day. This paper considers a time-dependent TDD and proposes a methodology to calibrate the joint distribution of trip distance and trip initiation rate.

Further, using the for-hire trips data set from Chicago, it is shown through hypothesis testing that TDD is indeed time-dependent. For a day in 2019, both calibration and validation of three different joint distributions (time-dependent NE, log-normal, and Gamma distributions) were performed. With statistical tests, such as the KS test, the hypothesis that samples of trips at different times of day follow any of the considered distributions is rejected. Among the three distributions tested, the log-normal is the most promising, because it consistently leads to lower KS statistics and higher log-likelihood. One of the time-dependent parameters estimated for the log-normal distribution matches well with calibrations in other cities ( 14 ), while the other parameter is consistently lower.

In summary, this paper highlights the importance of considering other distributions in the future to study transport demand. Future research should consider the calibration and validation of the Weibull distribution or the generalized extreme value distribution, and even consider non-parametric calibration ( 33 ). In the future, the authors are also interested in performing a location-dependent TDD analysis. The proposed methodology can be applied to smaller regions, even considering each individual community area. This type of study might shed light on mobility patterns and also on regional trip length analysis.

Footnotes

Acknowledgements

The second author would like to acknowledge the support of NSF-SCC- CMMI#1952241, entitled “SCC-PG: Addressing Unprecedented Community-Centered Transportation Infrastructure Needs and Policies for the Mobility Revolution.”

Author Contributions

The authors confirm their contribution to the paper as follows: study conception and design: I. Martínez and W-L. Jin; statistical analysis: I. Martínez; interpretation of results: I. Martínez and W-L. Jin; draft manuscript preparation: I. Martínez. Both authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The first author would like to acknowledge the Graduate Balsells Fellowship support. The second author would like to acknowledge the support of NSF-SCC- CMMI#1952241, entitled “SCC-PG: Addressing Unprecedented Community-Centered Transportation Infrastructure Needs and Policies for the Mobility Revolution.”

ORCID iDs

Irene Martínez

Wen-Long Jin

References

1. Cervero; Kockelman. Travel Demand and the 3Ds: Density, Diversity, and Design. Transportation Research Part D: Transport and Environment, Vol. 2, No. 3, 1997, pp. 199–219.

2. Vickrey. Congestion in Midtown Manhattan in Relation to Marginal Cost Pricing. Unpublished Notes, Columbia University, New York, 1991.

3. Vickrey. Congestion in Midtown Manhattan in Relation to Marginal Cost Pricing. Economics of Transportation, Vol. 21, 2020, p. 100152.

4. Jin, W.-L. Generalized Bathtub Model of Network Trip Flows. Transportation Research Part B: Methodological, Vol. 136, 2020, pp. 138–157.

5. Arnott. A Bathtub Model of Downtown Traffic Congestion. Journal of Urban Economics, Vol. 76, 2013, pp. 110–121.

6. Yildirimoglu; Geroliminis. Approximating Dynamic Equilibrium Conditions with Macroscopic Fundamental Diagrams. Transportation Research Part B: Methodological, Vol. 70, 2014, pp. 186–200.

7. Batista; Leclercq; Geroliminis. Estimation of Regional Trip Length Distributions for the Calibration of the Aggregated Network Traffic Models. Transportation Research Part B: Methodological, Vol. 122, 2019, pp. 192–217.

8. Mariotte; Leclercq; Batista, S. F. A.; Krug; Paipuri. Calibration and Validation of Multi-Reservoir MFD Models: A Case Study in Lyon. Transportation Research Part B: Methodological, Vol. 136, 2020, pp. 62–86.

9. Paipuri; González, M. C.; Leclercq. Estimating MFDs, Trip Lengths and Path Flow Distributions in a Multi-Region Setting Using Mobile Phone Data. Transportation Research Part C: Emerging Technologies, Vol. 118, 2020, p. 102709.

10. Small, K. A.; Chu. Hypercongestion. Journal of Transport Economics and Policy, Vol. 47, 2003, pp. 19–52.

11. Daganzo. Urban Gridlock: Macroscopic Modeling and Mitigation Approaches. Transportation Research Part B: Methodological, Vol. 41, No. 1, 2007, pp. 49–62.

12. González, M. C.; Hidalgo, C. A.; Barabási, A. L. Understanding Individual Human Mobility Patterns. Nature, Vol. 453, No. 7196, 2008, pp. 779–782.

13. Calabrese; Diao; Di Lorenzo; Ferreira; Ratti. Understanding Individual Mobility Patterns from Urban Sensing Data: A Mobile Phone Trace Example. Transportation Research Part C: Emerging Technologies, Vol. 26, 2013, pp. 301–313.

14. Colak; Lima; Gonzalez, M. C. Understanding Congested Travel in Urban Areas. Nature Communications, Vol. 7, 2016, p. 10793.

15. Wolf; Schönfelder; Samaga; Oliveira; Axhausen, K. W. Eighty Weeks of Global Positioning System Traces: Approaches to Enriching Trip Information. Transportation Research Record: Journal of the Transportation Research Board, 2004. 1870: 46–54.

16. Gong; Morikawa; Yamamoto; Sato. Deriving Personal Trip Data from GPS Data: A Literature Review on the Existing Methodologies. Procedia - Social and Behavioral Sciences, Vol. 138, 2014, pp. 557–565.

17. Thomas; Tutert. An Empirical Model for Trip Distribution of Commuters in the Netherlands: Transferability in Time and Space Reconsidered. Journal of Transport Geography, Vol. 26, 2013, pp. 158–165.

18. Cich; Knapen; Bellemans; Janssens; Wets. TRIP/STOP Detection in GPS Traces to Feed Prompted Recall Survey. Procedia Computer Science, Vol. 52, 2015, pp. 262–269.

19. Geroliminis; Daganzo, C. F. Existence of Urban-Scale Macroscopic Fundamental Diagrams: Some Experimental Findings. Transportation Research Part B: Methodological, Vol. 42, No. 9, 2008, pp. 759–770.

20. Lamotte; Murashkin; Kouvelas; Geroliminis. Dynamic Modeling of Trip Completion Rate in Urban Areas with MFD Representations. Presented at 97th Annual Meeting of the Transportation Research Board, Washington, D.C., 2018.

21. Ramezani; Haddad; Geroliminis. Dynamics of Heterogeneity in Urban Networks: Aggregated Traffic Modeling and Hierarchical Control. Transportation Research Part B: Methodological, Vol. 74, 2015, pp. 1–19.

22. Fosgerau. Congestion in the Bathtub. Economics of Transportation, Vol. 4, No. 4, 2015, pp. 241–255.

23. Daganzo, C. F.; Lehe, L. J. Distance-Dependent Congestion Pricing for Downtown Zones. Transportation Research Part B: Methodological, Vol. 75, 2015, pp. 89–99.

24. Mariotte; Leclercq; Laval, J. A. Macroscopic Urban Dynamics: Analytical and Numerical Comparisons of Existing Models. Transportation Research Part B: Methodological, Vol. 101, 2017, pp. 245–267.

25. Arnott; Buli. Solving for Equilibrium in the Basic Bathtub Model. Transportation Research Part B: Methodological, Vol. 109, 2018, pp. 150–175.

26. de Dios Ortúzar; Willumsen, L. G. Chapter 5. Trip Distribution Modelling. In Modelling Transport, 4th ed. John Wiley and Sons Ltd, Hoboken, NJ, 2011, pp. 175–206.

27. de Vries, J. J.; Nijkamp; Rietveld. Exponential or Power Distance Decay for Commuting? An Alternative Specification. Tinbergen Institute Discussion Paper, No. 04-097/3. Tinbergen Institute, Amsterdam and Rotterdam, 2004.

28. Yang. A Universal Distribution Law of Network Detour Ratios. Transportation Research Part C: Emerging Technologies, Vol. 96, 2018, pp. 22–37.

29. Lehe, L. J. Downtown Tolls and the Distribution of Trip Lengths. Economics of Transportation, Vol. 11–12, 2017, pp. 23–32.

30. Conover. Chapter 6. In Practical Nonparametric Statistics, in Nonparametric Methods, Vol. II, 3rd ed. John Wiley & Sons, New York, NY, 1999, pp. 233–305.

31. Chakravarti, I. M.; Laha, R. G.; Roy. Handbook of Methods of Applied Statistics. John Wiley and Sons, New York, NY, 1967.

32. Dean, M. D.; Kockelman, K. M. Spatial Variation in Shared Ride-Hail Trip Demand and Factors Contributing to Sharing: Lessons from Chicago. Journal of Transport Geography, Vol. 91, 2021, p. 102944.

33. Kvam, P. H.; Vidakovic. Nonparametric Statistics with Applications to Science and Engineering. John Wiley & Sons, New York, NY, 2007.