Evaluating the Reliability and Consistency of Statistical Models for Speed Distribution Analysis at Hazardous and Non-Hazardous Roadway Locations: Multifraction Data Set Approach

Abstract

Understanding the statistical dynamics of traffic speeds at hazardous and non-hazardous locations is essential for effective roadway safety interventions. This study investigates the distinct characteristics of spot speed distributions across six Indian highway segments, including National and State Highways. It uses continuous probability distributions and hypothesis testing to assess the statistical significance of speed differences between hazardous and non-hazardous locations. The analysis is based on observed spot speed measurements, stratified into four data fractions (25%, 50%, 75%, and 100%) obtained by simple random sampling with replacement. Seven continuous probability distributions, namely normal, lognormal, gamma, logistic, Weibull, Burr, and generalized extreme value (GEV), have been fitted independently for each location type and data fraction to capture their distributional characteristics. The location, scale, and shape parameters of the models have been estimated using maximum likelihood estimation, and model adequacy has been assessed using Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values. Furthermore, a two-sample Kolmogorov–Smirnov test has been conducted to assess the statistical difference in speed profiles between hazardous and non-hazardous locations. The results reveal that the GEV distribution consistently outperforms other models across all locations and data fractions, demonstrating strong parameter stability and model adequacy. Larger data fractions improved model performance and hypothesis testing power, indicating greater distributional robustness. To address the potential effect of vehicle interactions and transient congestion during the observation periods, a modified Kaplan–Meier (KM) framework is used to estimate congestion-adjusted desired speed distributions. The KM-based findings show that hazardous roadway locations exhibit higher desired speed potential and greater upper-tail speed characteristics compared with non-hazardous locations. Statistically significant speed differences have been found in nearly all settings, supporting the notion that crash-prone zones exhibit distinct speed dynamics. These findings have significant implications for road safety policy and infrastructure design and underscore the need for location-specific speed management strategies.

Keywords

spot speed distribution; hazardous roadway segments; distribution models; heterogeneous traffic; traffic safety

Introduction and Background

Overspeeding accounts for more than half of all road crash fatalities in low- and middle-income countries, while it accounts for approximately 30% of road fatalities in high-income countries ( 1 – 3 ). In addition, over 67% of road crashes in India occur on straight road segments, highlighting speed as a significant factor in increasing crash frequency ( 4 , 5 ). Therefore, understanding the statistical dynamics of speeding is crucial for identifying hazardous roadway environments and developing effective safety interventions.

Traffic speed assessment is an important aspect of road safety research because it directly affects a driver’s reaction time, braking distance, and the overall crash risk ( 4 ). Statistical speed modeling, encompassing theoretical and simulation-based traffic flow analysis, plays a crucial role in quantifying the stochastic behavior of traffic flow and explaining speed variations across different roadway environments ( 6 ). Previous research has focused on several aspects that affect drivers’ speeding behavior, including driver-related factors (age, gender, and alcohol consumption) ( 7 , 8 ), road and vehicle-related factors (geometric design, surface quality, vehicle age, and vehicle power) ( 9 , 10 ), and contextual road environment factors (traffic composition, density, prevailing speed, and weather conditions) ( 11 ). Researchers have used different statistical methodologies, including structured regression analysis ( 12 ), binary logistic regression ( 13 ), and Tobit models ( 14 ), to examine crash-causality relationships in this research domain. These models usually provide insights into the correlations between speeding and factors such as demographic characteristics, road geometric design features, and enforcement intensity; however, they often fail to fully capture the distributional nature of spot speeds across diverse traffic environments.

Several earlier studies suggest that speed, as a continuous random variable, follows a normal distribution because it is symmetrically distributed around its central value ( 4 , 15 – 18 ). According to Haight and Mosher ( 19 ), lognormal and gamma distributions may more accurately represent speed distributions than normal distributions, because the same function can be used to convert a time–speed distribution into a space–speed distribution. Pillai and Ramanayya ( 20 ) emphasized the need to fit the speed distribution function for each vehicle type because each vehicle has distinct power, acceleration, and deceleration characteristics, resulting in significant speed differences. McLean ( 21 ) found that car speeds generally follow a normal distribution, with coefficients of variation ranging from 0.11 to 0.18 in moderately dense traffic. Some recent studies have shown that, under heterogeneous traffic conditions, observed spot speed distributions often deviate from normality because of interactions among slow- and fast-moving vehicles, weak lane discipline, and variation in driver behavior ( 18 , 22 , 23 ). For example, Harding et al. ( 24 ) found that the inclusion of auto-rickshaws in the traffic flow considerably reduces the average speed of the stream across the traffic volume range. Saha et al. ( 25 ) focused on minimizing inconsistencies in the distributional assumptions of speed data on two-lane highways with mixed traffic, aiming to improve the accuracy of capacity and Level of Service analyses. Their findings indicated that, under Indian traffic conditions, a normal distribution effectively characterizes the spot speeds of cars, light commercial vehicles (LCVs), and scooters, whereas a lognormal distribution holds for bicycles.

More recent research has increasingly focused on fitting continuous probability distribution functions to model traffic speed behavior, particularly under heterogeneous traffic conditions, in parallel to econometric and regression-based approaches ( 4 , 16 , 18 , 26 ). Researchers, such as Mondal and Gupta ( 18 ) and Sarkar and Kumar ( 4 ), have used continuous probability distributions to investigate the distributional characteristics of vehicle speeds, particularly high-speed driving behavior at four-legged signalized and three-legged unsignalized intersections, respectively. Their study results indicate that the generalized extreme value (GEV) and Burr distributions are the most appropriate empirical speed distributions, with the GEV exhibiting the best-fit above 96%. In a mixed traffic scenario, when the proportion of heavy vehicle composition (trucks, buses, and tractors) is below 10%, it adheres to the Weibull distribution; between 10% and 14%, it follows the gamma distribution; between 15% and 20%, it conforms to the GEV distribution; and above 20%, it fits with the Burr distribution. Furthermore, the normal and lognormal distributions are considered the least suitable models. These findings suggest that three-parameter distribution functions outperform their two-parameter counterparts in modeling the range of observed speeds. These functions capture central tendencies and account for variability, skewness, and kurtosis, all of which are essential for understanding extreme speeding behaviors. This is particularly relevant from a safety perspective because rare but extremely high-speed incidents often contribute disproportionately to crash occurrence and severity.

The majority of research investigations have focused on speed analysis across different road settings and traffic scenarios, including urban and rural roads, freeways, and two-lane highways ( 14 , 15 , 17 , 22 , 25 , 27 ). These analyses have largely focused on specific road facility types, such as midblock, signalized, or unsignalized intersections, without a systematic comparison across these roadway facilities for hazardous roadway locations (HRLs) and non-hazardous roadway locations (non-HRLs) ( 28 ). This represents a significant research gap. To the best of the authors’ knowledge, no previous study has conducted a comparative analysis of traffic speed characteristics at HRLs and non-HRLs on rural two-lane highways. An HRL, often termed a “black spot” or “crash hotspot,” is defined as a specific location on the road exhibiting a significantly higher-than-average concentration of crashes, in contrast to non-HRLs that experience minimal crash events ( 29 , 30 ). In real-world traffic systems, HRLs and non-HRLs often coexist in proximity yet exhibit vastly different driving behaviors because of variations in road geometry, roadway environment, traffic enforcement intensity, traffic flow, land use, and driver risk perception ( 31 , 32 ). Therefore, comparing the speed distributions at HRLs and non-HRLs is statistically and practically insightful in determining how crash-prone segments differ from relatively safer roadway segments in vehicular speed behavior. Such a comparative analysis can reveal whether HRLs exhibit distinct distributional features, such as heavier tails, higher variance, or positive skewness, which may indicate extreme speeding behavior, sudden acceleration or deceleration zones, or irregular vehicular interactions ( 4 , 18 ).
However, an additional methodological issue arises because spot speed observations collected during field surveys can capture intrinsic driver speed behavior and transient traffic interaction effects, such as vehicle-following or platooning during the measurement window, which can suppress the observed speeds relative to the desired flow conditions. Therefore, differences in observed speed distributions between HRLs and non-HRLs may partly represent interaction-induced suppression rather than true desired speed behavior. To address this concern, this study supplements conventional observed speed distribution analysis with a modified Kaplan–Meier (KM) approach for estimating the desired speed distribution ( 33 , 34 ). By distinguishing interaction-constrained observations from unconstrained observations using headway-based censoring, the KM framework provides a congestion-adjusted representation of speed behavior, offering a more robust basis for comparing speeds across hazardous and non-hazardous roadway segments.

In summary, the motivation for comparing speed distribution between HRLs and non-HRLs stems from the need to: (1) identify behavioral and statistical anomalies associated with crash occurrence; (2) refine model selection for different traffic conditions; and (3) suggest targeted and data-driven safety interventions. Therefore, this study aims to examine whether there is a statistically significant difference in speed behavior between identified HRLs and non-HRLs on rural two-lane highways under mixed traffic conditions. This study combines distribution fitting and hypothesis testing to assess similarities and differences in observed speed characteristics across the two location types. In addition, it employs a KM framework to estimate congestion-adjusted desired speed distributions. The null hypothesis $H_{0}$ assumes that there is no significant variation in speed distributions between the HRLs and non-HRLs, while the alternative hypothesis $H_{1}$ claims a significant difference in speed distributions between the two location types.

The structure of this study is as follows: the first section, Introduction and Background, introduces this study by explaining the relevance of traffic speed to road safety and providing a concise background on research on traffic speed distribution modeling. The next section, Methodology, presents the methodology for modeling spot speed distributions, including goodness-of-fit (GoF) evaluation, a hypothesis-testing framework for speed distribution analysis, and a modified KM approach for desired speed estimation. The following section, study area, data collection, and data preparation, presents the study area and field data collection procedures, along with the data stratification approach and data preparation steps for fitting the speed distributions. The subsequent section, analysis and results, describes the analysis and outcomes of hypothesis testing. The next section, discussion, presents a discussion on the key findings and their implications, and the final section, conclusions, provides the conclusions of this study.

Methodology

Speed distribution is an essential feature in traffic flow modeling for different roadway infrastructure facilities, including both signalized and unsignalized intersections on urban and rural roads under homogeneous and heterogeneous traffic conditions ( 35 ). This section investigates the characteristics of speed distributions formulated for spot speed measurements at HRLs and non-HRLs on rural two-lane highways under mixed traffic conditions.

In traffic flow theory, vehicular speed is considered a continuous random variable. Several earlier studies have assumed that speed data follow a normal distribution; however, this assumption often fails in heterogeneous traffic conditions, particularly in mixed traffic scenarios. Different vehicle types and varying driver behavior introduce nonlinearity and asymmetry in speed data. This is further exacerbated in congested traffic environments and at unsignalized intersections because of frequent and uncontrolled traffic maneuvers. These conditions necessitate flexible statistical approaches to capture skewness and extremes in speed data arising from interactions between slow and high-speed driving behaviors. This study used seven continuous probability distribution functions, including normal, lognormal, logistic, gamma, Weibull, Burr, and GEV, to accurately represent the speed data. These parametric distributions are fitted to vehicular speeds using the maximum likelihood estimation method. The empirical distributions investigated are as follows.

Normal Distribution

A continuous random variable x with mean $μ$ and standard deviation σ is considered normally distributed when its distribution is bell-shaped and symmetric about the mean. The probability density function (PDF) of x is given as follows (Equation 1):

f_{n} (x; σ, μ) = \frac{\exp (- \frac{1}{2} {(\frac{x - μ}{σ})}^{2})}{σ \sqrt{2 π}}

(1)

The cumulative distribution function (CDF) of the normal distribution is as follows (Equation 2):

F (x) = \frac{1}{2} [1 + \erf (\frac{x - μ}{σ \sqrt 2})]

(2)
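As an illustration, Equations 1 and 2 can be evaluated directly with Python's standard library. The sketch below is not the authors' implementation: the speed values are hypothetical, and the closed-form estimates used (sample mean and divisor-n standard deviation) are the standard maximum likelihood estimates for a normal model.

```python
import math

def normal_pdf(x, mu, sigma):
    """Equation 1: normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """Equation 2: normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_mle(speeds):
    """Closed-form maximum likelihood estimates (SD uses divisor n)."""
    n = len(speeds)
    mu = sum(speeds) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in speeds) / n)
    return mu, sigma

# Hypothetical spot speeds in km/h, for illustration only (not study data).
speeds = [42.0, 55.0, 61.0, 48.0, 70.0, 66.0, 53.0, 59.0]
mu_hat, sigma_hat = normal_mle(speeds)

# Fitted probability that a vehicle travels at or below 60 km/h.
share_below_60 = normal_cdf(60.0, mu_hat, sigma_hat)
```

By symmetry, the fitted CDF equals 0.5 at `mu_hat`, which is a quick sanity check on any implementation of Equation 2.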

Lognormal Distribution

A random variable x follows a lognormal distribution when $\ln x$ is normally distributed. The distribution is characterized by two continuous parameters: the location parameter $μ$ and the scale parameter $σ$ $(σ > 0)$ , which are the mean and standard deviation of $\ln x$ . The PDF of this distribution is given as follows (Equation 3):

f_{\ln} (x; σ, μ) = \frac{1}{x σ \sqrt{2 π}} (\exp (- \frac{{(\ln (x) - μ)}^{2}}{2 σ^{2}}))

(3)

The CDF of the lognormal distribution is as follows (Equation 4):

F (x) = \frac{1}{2} [1 + \erf (\frac{\ln x - μ}{σ \sqrt 2})]

(4)
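Equations 3 and 4 amount to applying the normal model to the logarithm of speed. A minimal sketch under that reading follows; the speeds are hypothetical, and the estimator shown is the standard closed-form maximum likelihood estimate for a lognormal model rather than a procedure taken from this study.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Equation 3: lognormal density for x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def lognormal_cdf(x, mu, sigma):
    """Equation 4: lognormal CDF for x > 0."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def lognormal_mle(speeds):
    """MLE: mean and divisor-n standard deviation of the log-speeds."""
    logs = [math.log(v) for v in speeds]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / n)
    return mu, sigma

# Hypothetical spot speeds in km/h, for illustration only (not study data).
speeds = [42.0, 55.0, 61.0, 48.0, 70.0, 66.0, 53.0, 59.0]
mu_hat, sigma_hat = lognormal_mle(speeds)

# The fitted median speed is exp(mu), where the CDF equals 0.5.
median_speed = math.exp(mu_hat)
```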

Logistic Distribution

The logistic distribution is a symmetric, bell-shaped distribution similar to the normal distribution, but with heavier tails, making it suitable for various purposes, such as modeling traffic speed when data show moderate skewness and variability. The PDF of this distribution is given as follows (Equation 5):

f_{lo} (x; σ, μ) = \frac{\exp (- (\frac{x - μ}{σ}))}{σ {(1 + \exp (- (\frac{x - μ}{σ})))}^{2}}

(5)

The CDF of the logistic distribution is as follows (Equation 6):

F_{lo} (x; σ, μ) = \frac{1}{1 + e^{- \frac{(x - μ)}{σ}}}

(6)
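The heavier tails of the logistic distribution can be seen by matching its mean and standard deviation to those of a normal distribution and comparing upper-tail probabilities. The sketch below relies on the known relation SD = σπ/√3 for a logistic distribution with scale σ; the mean and standard deviation used are illustrative values, not estimates from the study data.

```python
import math

def logistic_pdf(x, mu, s):
    """Equation 5: logistic density with location mu and scale s."""
    e = math.exp(-(x - mu) / s)
    return e / (s * (1.0 + e) ** 2)

def logistic_cdf(x, mu, s):
    """Equation 6: logistic CDF with location mu and scale s."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

def normal_cdf(x, mu, sd):
    """Normal CDF, used here only as the comparison model."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

mean, sd = 60.0, 10.0                # hypothetical speed mean and SD (km/h)
s = sd * math.sqrt(3.0) / math.pi    # logistic scale that gives the same SD
x = mean + 3.0 * sd                  # a speed three SDs above the mean

tail_logistic = 1.0 - logistic_cdf(x, mean, s)
tail_normal = 1.0 - normal_cdf(x, mean, sd)
```

With these inputs, the logistic upper-tail probability is roughly three times the normal one, which is the sense in which the logistic model allocates more probability to extreme speeds.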

Gamma Distribution

A random variable x is considered to follow a gamma distribution characterized by two continuous parameters, the rate parameter $σ$ and the shape parameter $k$ $(σ, k > 0)$ ; its PDF is defined as follows (Equation 7):

f_{g} (x; σ, k) = \frac{σ^{k}}{Γ (k)} x^{k - 1} \exp (- σ x)

(7)

The CDF of the gamma distribution is as follows (Equation 8):

F (x, σ, k) = \int_{0}^{x} f (t; σ, k) dt = \frac{1}{Γ (k)} \int_{0}^{σ x} t^{k - 1} e^{- t} dt

(8)

This integral is known as a lower incomplete gamma function; therefore, the CDF can be compactly written as follows (Equation 9):

F (x, σ, k) = \frac{γ (k, σ x)}{Γ (k)}

(9)

where $γ (k, σ x)$ is the lower incomplete gamma function given by Equation 10.

γ (k, σ x) = \int_{0}^{σ x} t^{k - 1} e^{- t} dt

(10)
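Equations 8 to 10 have no closed form for general $k$, so the CDF is evaluated numerically. One possible approach, sketched here as an assumption rather than as the method used in this study, is the standard power series for the lower incomplete gamma function, $γ(k, z) = z^{k} e^{-z} \sum_{n \geq 0} z^{n} / [k (k + 1) \cdots (k + n)]$. The function names are hypothetical.

```python
import math

def regularized_lower_gamma(k, z, tol=1e-14, max_terms=100000):
    """Return gamma(k, z) / Gamma(k) using the power series expansion."""
    if z <= 0.0:
        return 0.0
    term = 1.0 / k          # n = 0 term of the series
    total = term
    for n in range(1, max_terms):
        term *= z / (k + n)
        total += term
        if term < tol * total:
            break
    # Multiply by z^k * exp(-z) / Gamma(k), computed in log space for stability.
    return total * math.exp(-z + k * math.log(z) - math.lgamma(k))

def gamma_cdf(x, rate, k):
    """Equation 9: gamma CDF with rate parameter and shape parameter k."""
    return regularized_lower_gamma(k, rate * x)

# Check against closed forms: k = 1 gives 1 - exp(-z); k = 2 gives 1 - exp(-z)(1 + z).
check_k1 = gamma_cdf(2.0, 0.5, 1.0)
check_k2 = gamma_cdf(3.0, 1.0, 2.0)
```

The series converges for all positive arguments but needs more terms as the product of rate and speed grows, so a library routine would normally be preferred for production use.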

Weibull Distribution

The Weibull distribution is a two-parameter continuous probability distribution widely used in reliability analysis and survival modeling, as well as in traffic speed modeling, because of its flexibility in capturing skewness in speed data. The PDF of this distribution is as follows (Equation 11):

f_{w} (x; σ, k) = \frac{k}{σ} {(\frac{x}{σ})}^{k - 1} \exp (- {(\frac{x}{σ})}^{k})

(11)

The CDF of the Weibull distribution is expressed as follows (Equation 12):

F (x, σ, k) = \int_{0}^{x} f (t; σ, k) dt = 1 - e^{- {(\frac{x}{σ})}^{k}}

(12)
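Because Equation 12 is available in closed form, it can be inverted to obtain speed percentiles: solving $F(x) = p$ gives $x_{p} = σ {(- \ln (1 - p))}^{1 / k}$. The sketch below illustrates this with hypothetical parameter values; the 85th percentile is chosen only as a familiar example and is not a quantity reported in this section.

```python
import math

def weibull_pdf(x, scale, k):
    """Equation 11: Weibull density with scale and shape k, for x >= 0."""
    u = x / scale
    return (k / scale) * u ** (k - 1.0) * math.exp(-(u ** k))

def weibull_cdf(x, scale, k):
    """Equation 12: Weibull CDF."""
    return 1.0 - math.exp(-((x / scale) ** k))

def weibull_quantile(p, scale, k):
    """Inverse of Equation 12 for 0 < p < 1."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / k)

scale, k = 65.0, 5.0        # hypothetical scale (km/h) and shape
v85 = weibull_quantile(0.85, scale, k)
```

At `x = scale` the CDF equals $1 - e^{-1}$ regardless of the shape parameter, which is a convenient check on the implementation.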

Burr Distribution

The Burr distribution, also known as the Burr Type XII distribution or Singh–Maddala distribution, belongs to a flexible family of unimodal distributions that can take diverse shapes. In its general form, it has continuous shape parameters $α$ and $k$ , a continuous scale parameter $β$ , and a continuous location parameter $γ$ . In the three-parameter Burr distribution, the continuous location parameter is zero $(γ = 0)$ . The PDF of this distribution is given as follows (Equation 13):

f_{br} (x; α, β, k) = \frac{α k {(\frac{x}{β})}^{α - 1}}{β {(1 + {(\frac{x}{β})}^{α})}^{k + 1}}

(13)

The CDF of the Burr distribution is given as follows (Equation 14):

F (x) = 1 - {(1 + {(\frac{x}{β})}^{α})}^{- k}

(14)

The Burr distribution is appealing for data fitting because of its adaptable scale and location parameters and its flexible shape. Because it can accommodate positive skewness, it is sometimes considered as an alternative to the normal distribution.
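A density and its CDF must be consistent, so differentiating the Burr CDF in Equation 14 should recover the Burr density, $α k {(x / β)}^{α - 1} / [β {(1 + {(x / β)}^{α})}^{k + 1}]$. The sketch below checks this numerically with a central finite difference; the parameter values are hypothetical and chosen only to give speeds on a plausible km/h scale.

```python
def burr_cdf(x, alpha, beta, k):
    """Equation 14: three-parameter Burr (Type XII) CDF for x >= 0."""
    return 1.0 - (1.0 + (x / beta) ** alpha) ** (-k)

def burr_pdf(x, alpha, beta, k):
    """Burr density obtained by differentiating the CDF in Equation 14."""
    u = x / beta
    return alpha * k * u ** (alpha - 1.0) / (beta * (1.0 + u ** alpha) ** (k + 1.0))

alpha, beta, k = 6.0, 60.0, 1.5   # hypothetical shape, scale, shape
x, h = 55.0, 1e-5                 # evaluation speed (km/h) and step size

# Central finite-difference approximation of dF/dx at x.
numeric_pdf = (burr_cdf(x + h, alpha, beta, k) - burr_cdf(x - h, alpha, beta, k)) / (2.0 * h)
analytic_pdf = burr_pdf(x, alpha, beta, k)
```

The two values agree closely only when the denominator contains ${(x / β)}^{α}$, so the same check is useful for catching transcription errors in the density.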

GEV Distribution

The GEV distribution is a three-parameter distribution encompassing the Type I (Gumbel), Type II (Fréchet), and Type III (Weibull) extreme value distributions. The GEV distribution consists of three distinct parameters: (1) location $μ$ ; (2) scale $σ$ ; and (3) shape $ξ$ . The PDF of this distribution is given as follows (Equation 15):

f_{gev} (x; μ, σ, ξ) = \frac{1}{σ} {(1 + ξ \frac{x - μ}{σ})}^{- 1 - \frac{1}{ξ}} \exp (- {(1 + ξ \frac{x - μ}{σ})}^{- \frac{1}{ξ}})

(15)

The CDF of the GEV distribution is given as follows (Equation 16):

F_{gev} (x; μ, σ, ξ) = \begin{cases} \exp (- {[1 + ξ \frac{x - μ}{σ}]}^{- 1 / ξ}), & \text{if } ξ \neq 0 \\ \exp (- \exp (- \frac{x - μ}{σ})), & \text{if } ξ = 0 \end{cases}

(16)

These forms correspond to Type I (Gumbel) when $ξ = 0$ , Type II (Fréchet) when $ξ > 0$ , and Type III (Weibull) when $ξ < 0$ . The location and shape parameters can take any real value, whereas the scale parameter must be positive; the shape parameter determines the type of GEV distribution. Among the seven distributions, five are two-parameter models, whereas the Burr and GEV distributions have three parameters, implying greater flexibility than the other distribution models. Therefore, these distributions are plausible candidates for modeling speed distributions.
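The piecewise form of the GEV CDF and its bounded support can be made concrete in code. The sketch below is illustrative only: it handles the $ξ = 0$ (Gumbel) case separately and returns 0 or 1 outside the support, where $1 + ξ (x - μ) / σ \leq 0$. The parameter values are hypothetical.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """Equation 16: GEV CDF with location mu, scale sigma, and shape xi."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))          # Type I (Gumbel)
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower bound when xi > 0,
        # above the upper bound when xi < 0.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))

def gev_pdf(x, mu, sigma, xi):
    """Equation 15, with the Gumbel density used when xi = 0."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-z - math.exp(-z)) / sigma
    t = 1.0 + xi * z
    if t <= 0.0:
        return 0.0
    return t ** (-1.0 - 1.0 / xi) * math.exp(-(t ** (-1.0 / xi))) / sigma

mu, sigma, xi = 60.0, 8.0, -0.2          # hypothetical Type III (xi < 0) fit
upper_bound = mu - sigma / xi            # finite upper end point when xi < 0
p_at_location = gev_cdf(mu, mu, sigma, xi)
```

For any shape value the CDF equals $e^{-1}$ at $x = μ$, and a negative shape implies a finite upper speed limit of $μ - σ / ξ$ (100 km/h for the values above).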

GoF and Hypothesis Testing Framework for Speed Distribution Modeling

The GoF test is a commonly used approach for determining the distribution function that best fits the empirical speed distribution data. These tests serve as diagnostic tools that determine whether the hypothesized CDF adequately represents the observed speed distribution. In this study, the Kolmogorov–Smirnov (KS) test is used as a GoF test to evaluate the suitability of each distribution in modeling the observed speed data. The KS test measures the maximum absolute difference $D_{stat}$ between the empirical cumulative distribution function (ECDF) of the observed data and the CDF of the hypothesized distribution. Therefore, the $H_{0}$ of the KS test can be expressed by Equation 17.

H_{0} : F_{n} (x) = F (x)

(17)

The alternative hypothesis $H_{1}$ is expressed as follows (Equation 18):

H_{1} : F_{n} (x) \neq F (x)

(18)

The difference between the ECDF and theoretical CDF is given as follows (Equation 19):

D_{stat} = \max | F (x) - F_{n} (x) |, 0 < x < \infty

(19)

where

$F_{n} (x)$ = empirical CDF for n number of speed observations,

$F (x)$ = CDF of analytical hypothesized distribution,

k = number of estimated parameters,

n = number of observations, and

$\hat{L}$ = maximum likelihood value for the model.
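The symbols $k$, $n$, and $\hat{L}$ listed above are the inputs of the AIC and BIC used for model comparison. The expressions themselves are not shown in this section, so the sketch below assumes the standard definitions, AIC $= 2 k - 2 \ln \hat{L}$ and BIC $= k \ln n - 2 \ln \hat{L}$, where lower values indicate a better trade-off between fit and complexity. The log-likelihood values are invented for illustration.

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln(L-hat)."""
    return 2.0 * k - 2.0 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L-hat)."""
    return k * math.log(n) - 2.0 * log_likelihood

# Hypothetical maximized log-likelihoods for two candidate models fitted to
# the same n = 800 speed observations: (ln L-hat, number of parameters).
n = 800
candidates = {"normal": (-3010.0, 2), "gev": (-2995.0, 3)}

scores = {name: (aic(ll, k), bic(ll, k, n)) for name, (ll, k) in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

Because BIC penalizes each extra parameter by $\ln n$ rather than 2, it favors the three-parameter model only when the gain in log-likelihood is large enough for the sample size.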

The critical value of the KS test can be calculated as follows (Equation 20):

D_{n, α} = \frac{1.35810}{\sqrt{n}}, n > 50, α = 0.05

(20)

$D_{stat}$ is generally compared to the critical value $D_{n, α}$ at a predetermined significance level to assess the fitness of a hypothesized distribution model.
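The comparison of $D_{stat}$ with $D_{n, α}$ can be sketched as follows. Because the ECDF is a step function, the maximum in Equation 19 is evaluated on both sides of each jump. The sample is synthetic (drawn from a seeded pseudo-random generator), and the hypothesized normal parameters are assumptions made only for this illustration.

```python
import math
import random

def ks_statistic(sample, cdf):
    """Equation 19: largest gap between the ECDF and a hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The ECDF jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_critical_value(n):
    """Equation 20: large-sample critical value at alpha = 0.05 (n > 50)."""
    return 1.35810 / math.sqrt(n)

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Synthetic speeds (km/h) for illustration only, not study data.
rng = random.Random(0)
speeds = [rng.gauss(60.0, 10.0) for _ in range(200)]

d_stat = ks_statistic(speeds, lambda v: normal_cdf(v, 60.0, 10.0))
d_crit = ks_critical_value(len(speeds))
fit_not_rejected = d_stat <= d_crit
```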

This study aims to validate the hypothesis that there is a statistically significant difference in vehicular speed distributions between HRLs and non-HRLs. A two-sample KS test has been used for hypothesis testing. This test compares the ECDFs of two independent samples of observed spot speed data to determine whether they are derived from the same underlying distribution.

Let:

$F_{1} (x)$ : ECDF of spot speed data from hazardous roadway locations

$F_{2} (x)$ : ECDF of spot speed data from non-hazardous roadway locations

The hypotheses are defined as follows (Equation 21):

H_{0} : F_{1} (x) = F_{2} (x) for all x

(21)

This implies that there is no statistically significant difference between the speed distributions at hazardous and non-hazardous locations. The alternative hypothesis is defined as follows (Equation 22):

H_{1} : F_{1} (x) \neq F_{2} (x) for at least one x

(22)

This implies that there is a statistically significant difference between the speed distribution at hazardous and non-hazardous locations.

The two-sample KS test statistic is calculated as the maximum absolute difference between the two ECDFs. If the p-value of the test statistic is below the 5% significance level, the null hypothesis is rejected, indicating that the speed distributions at hazardous and non-hazardous locations are statistically different. The use of the KS test contributes to the statistical validation and reinforces the relevance of distribution-based traffic risk assessment frameworks. In addition, it provides empirical justification for implementing location-specific speed management strategies, targeted enforcement, and road geometric design interventions to improve road safety.
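The two-sample procedure can be sketched in a few lines. The statistic is the largest gap between the two ECDFs; the p-value below uses the asymptotic Kolmogorov series $2 \sum_{j \geq 1} {(- 1)}^{j - 1} e^{- 2 j^{2} λ^{2}}$ with $λ = D \sqrt{n_{1} n_{2} / (n_{1} + n_{2})}$, which is a large-sample approximation and not necessarily the computation used in this study. Both samples are synthetic.

```python
import math
import random

def ks_two_sample(a, b):
    """Maximum absolute difference between the ECDFs of two samples."""
    xs, ys = sorted(a), sorted(b)
    n1, n2 = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        x = min(xs[i], ys[j])
        # Consume all observations tied at x before comparing the ECDFs.
        while i < n1 and xs[i] == x:
            i += 1
        while j < n2 and ys[j] == x:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    return d

def ks_asymptotic_p_value(d, n1, n2, terms=100):
    """Large-sample p-value from the Kolmogorov series (approximation)."""
    lam = d * math.sqrt(n1 * n2 / (n1 + n2))
    if lam < 0.2:
        return 1.0  # the p-value is effectively 1 for such small gaps
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
            for j in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * s))

# Synthetic speeds (km/h) standing in for an HRL and a non-HRL sample.
rng = random.Random(1)
hrl = [rng.gauss(65.0, 12.0) for _ in range(300)]
non_hrl = [rng.gauss(58.0, 10.0) for _ in range(300)]

d_stat = ks_two_sample(hrl, non_hrl)
p_value = ks_asymptotic_p_value(d_stat, len(hrl), len(non_hrl))
reject_h0 = p_value < 0.05
```

As a consistency check, the series returns a p-value of about 0.05 when $λ$ equals 1.358, the same constant that appears in the one-sample critical value.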

Modified KM Estimation of Desired Speed Distribution

This study uses a modified KM survival analysis approach to distinguish intrinsic speed-choice behavior from speed reduction resulting from vehicle interactions by estimating the desired (free-flow) speed distribution ( 36 ). The KM estimator is particularly suitable for situations with right-censored data, in which the true desired speed of a vehicle is not fully observed because it is constrained by a preceding vehicle. In traffic streams, such censoring occurs when drivers travel in groups or follow slower vehicles, preventing them from attaining their desired speed.

Let $v_{i}$ denote the observed speed of vehicle $i$ , where $i = 1, 2, \dots, n$ . Each observation is classified as either desired speed (uncensored) or constrained speed (right-censored) based on the observed time headway $t_{i}$ relative to a threshold headway $h^{*}$ . The censoring indicator is defined as

δ_{i} = \begin{cases} 1, & \text{if } t_{i} > h^{*} \text{ (free-flow observation)} \\ 0, & \text{if } t_{i} \leq h^{*} \text{ (constrained observation)} \end{cases}

where $δ_{i} = 1$ indicates that the observed speed is a true free-flow realization, and $δ_{i} = 0$ indicates that the vehicle is constrained by a leading vehicle, so its true desired speed is higher than the observed speed. In the latter case, the observation is treated as right-censored.

The desired speed distribution is estimated using the KM survival function. Let $v_{(j)}$ represent the distinct ordered speed levels. The non-parametric survival estimator for the desired speed $V_{0}$ is given by Equation 23.

\hat{S} (v) = \prod_{v_{(j)} \leq v} (1 - \frac{d_{j}}{r_{j}})

(23)

where $d_{j}$ represents the number of uncensored observations (free-flow events) occurring at speed $v_{(j)}$ , and $r_{j}$ denotes the number of vehicles “at risk” just before speed $v_{(j)}$ , that is, vehicles whose observed speed is at least $v_{(j)}$ . The survival function $S (v)$ represents the probability that the desired speed exceeds a specified speed level $v$ , and the CDF of desired speeds is derived by Equation 24.

\hat{F} (v) = 1 - \hat{S} (v)

(24)
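The product-limit calculation in Equations 23 and 24 can be traced on a small example. The sketch below is a plain KM estimator with headway-based censoring, written under the definitions in this section; the five observations and the 4 s threshold are hypothetical, since the value of $h^{*}$ is not stated here.

```python
def km_survival(speeds, headways, h_star):
    """Return [(speed level, S-hat just after that level)] per Equation 23.

    An observation is uncensored (free-flow) when its headway exceeds h_star
    and right-censored (constrained) otherwise.
    """
    obs = sorted(zip(speeds, (t > h_star for t in headways)))
    n = len(obs)
    steps = []
    s = 1.0
    i = 0
    while i < n:
        v = obs[i][0]
        r = n - i              # vehicles at risk: observed speed >= v
        d = 0                  # free-flow events at exactly this speed
        while i < n and obs[i][0] == v:
            d += obs[i][1]
            i += 1
        if d > 0:
            s *= 1.0 - d / r
            steps.append((v, s))
    return steps

def km_cdf(steps, v):
    """Equation 24: F-hat(v) = 1 - S-hat(v) for the step function above."""
    s = 1.0
    for level, s_after in steps:
        if level <= v:
            s = s_after
    return 1.0 - s

# Hypothetical observations: speeds in km/h and time headways in seconds.
speeds = [40.0, 50.0, 50.0, 60.0, 70.0]
headways = [6.0, 2.0, 7.0, 8.0, 1.0]
steps = km_survival(speeds, headways, h_star=4.0)
```

In this example the survival estimate drops to about 0.8, 0.6, and 0.3 at 40, 50, and 60 km/h. The constrained vehicle at 70 km/h produces no drop, so the estimate does not reach zero within the observed range.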

From the estimated survival function, the expected desired speed $E_{d}$ is computed as the restricted mean of the distribution up to the maximum observed speed $τ$ , using Equation 25.

E_{d} = \int_{0}^{τ} \hat{S} (v) dv

(25)

Similarly, the second moment of the desired speed distribution is obtained using Equation 26.

E [V_{0}^{2}] = \int_{0}^{τ} 2 v \hat{S} (v) dv

(26)

which allows the standard deviation of the desired speed distribution to be calculated as (Equation 27),

SD (V_{0}) = \sqrt{E [V_{0}^{2}] - E_{d}^{2}}

(27)
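Because $\hat{S} (v)$ is a step function, the integrals in Equations 25 and 26 reduce to finite sums: on an interval $[a, b)$ where the survival estimate is constant at $s$, the contributions are $s (b - a)$ and $s (b^{2} - a^{2})$, respectively. The sketch below applies this to a hypothetical step function and maximum observed speed.

```python
import math

def restricted_moments(steps, tau):
    """Return (E_d, SD) from Equations 25 to 27.

    steps: [(speed level, S-hat just after that level)] in increasing order.
    tau:   maximum observed speed, the upper integration limit.
    """
    mean = 0.0
    second = 0.0
    left, s = 0.0, 1.0                 # S-hat equals 1 below the first level
    for v, s_after in steps:
        mean += s * (v - left)                     # integral of S over [left, v)
        second += s * (v * v - left * left)        # integral of 2 v S over [left, v)
        left, s = v, s_after
    mean += s * (tau - left)
    second += s * (tau * tau - left * left)
    return mean, math.sqrt(second - mean * mean)

# Hypothetical KM step function (km/h) and maximum observed speed.
steps = [(40.0, 0.8), (50.0, 0.6), (60.0, 0.3)]
tau = 70.0
e_d, sd = restricted_moments(steps, tau)   # approximately 57.0 and 11.0
```

For these inputs the sum is 40 + 8 + 6 + 3 = 57 km/h for the restricted mean, and the second moment is 3,370, giving a standard deviation of 11 km/h.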

In addition to the desired speed statistics, two supplementary indicators are reported to characterize traffic interaction effects. The constrained fraction is defined as follows (Equation 28):

ϕ = \frac{n_{c}}{n}

(28)

where $ϕ$ represents the proportion of vehicles whose speeds are influenced by leading vehicles, $n_{c}$ represents the number of constrained observations, and $n$ represents all observations. The maximum observed speed is (Equation 29),

τ = \max (v_{i})

(29)

which provides an empirical upper bound for the observed speed range.

The mean observed speed indicates operating speeds influenced by interactions, while the KM-based desired speed $E_{d}$ summarizes the congestion-adjusted desired speed distribution after accounting for right-censored observations caused by vehicle following. The difference between these two measures quantifies the extent to which vehicle interactions reduce the observed speeds. The KM framework facilitates a more reliable comparison of speed behavior between HRLs and non-HRLs by estimating the underlying desired speed distribution with the effect of vehicle interactions removed. This approach helps ensure that the observed differences in speed distributions represent intrinsic driver speed-selection behavior associated with roadway characteristics rather than variations in traffic density during the observation period ( 37 ).

Study Area, Data Collection, and Data Preparation

Haryana is a relatively small state in India, ranking 21st for area among the 28 states, accounting for less than 1.4% of the country’s landmass ( 38 ). From 2010 to 2020, the state’s road network expanded by 0.27%; however, registered motor vehicle ownership surged by 6.02% during the same time ( 39 , 40 ). The state has witnessed a disproportionately high incidence of fatal crashes, accounting for 3.4% of all crashes and 3.3% of total fatalities across India in 2022 ( 41 ). An investigation into regional crash trends on rural two-lane highways in the Northwest and Southeast regions of Haryana is conducted to facilitate further crash analysis. The selection of this region is motivated by its documented high crash rate, representative traffic composition, and increasing vehicle density. In the southeast region of the state, there are 11 two-lane, undivided rural National Highways (NHs) and State Highways (SHs). Table 1 presents the number of fatalities and the fatality rate (per 100 km) for these 11 highways. Among these highways, it is observed that three SHs (SH 14, SH 16A, and SH 22) and three NHs (NH 709, NH 334B S1, and NH 334B S2) have a high fatality rate (fatalities per 100 km) compared with other highways, as given in Table 1 and shown in Figure 1.

Table 1.

Average Fatality Rate Per 100 km of Two-Lane National Highways (NHs) and State Highways (SHs) in Haryana State

Highway name	Length (km)	Number of fatalities	Fatalities per 100 km
NH 248A	76	20	26.3
NH 334B	166	64	38.5
NH 709	175	83	47.4
NH 709A	80	19	23.7
SH 12	106	18	16.9
SH 14	61	30	49.2
SH 15A	39	9	23.1
SH 16A	63	28	44.4
SH 18	34	5	14.7
SH 19	49	8	16.3
SH 22	23	14	60.8

Note. Bold denotes the highways selected for further analysis, along with their corresponding statistics.
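The last column of Table 1 is 100 × fatalities / length. The sketch below recomputes it from the table's own length and fatality figures and ranks the highways; it is an arithmetic check rather than part of the authors' procedure. The recomputed values agree with the tabulated ones to within about 0.1, presumably because of rounding.

```python
# (highway, length in km, number of fatalities), copied from Table 1.
highways = [
    ("NH 248A", 76, 20), ("NH 334B", 166, 64), ("NH 709", 175, 83),
    ("NH 709A", 80, 19), ("SH 12", 106, 18), ("SH 14", 61, 30),
    ("SH 15A", 39, 9), ("SH 16A", 63, 28), ("SH 18", 34, 5),
    ("SH 19", 49, 8), ("SH 22", 23, 14),
]

# Fatalities per 100 km of highway length.
rates = {name: 100.0 * deaths / length_km for name, length_km, deaths in highways}

# Highways ordered from highest to lowest fatality rate.
ranked = sorted(rates, key=rates.get, reverse=True)
top_five = ranked[:5]
```

The five highest-rate rows are SH 22, SH 14, NH 709, SH 16A, and NH 334B, which matches the selected corridors once NH 334B is split into the two sections S1 and S2 described in the text.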

Figure 1.

Typical selected National Highways (NHs) and State Highways (SHs) from Haryana, India.

It is important to note that HRLs and non-HRLs were identified based on the outcomes of a previous objective of the authors’ ongoing doctoral research. In that study, HRLs were systematically identified using a negative binomial–Lindley (NB-L) model and a kernel density empirical Bayes (KDEB) approach, using police-reported, georeferenced crash data obtained from the State Police Crime and Criminal Tracking Network System (CCTNS) for 2017–2019. A location was classified as hazardous if its expected crash frequency or density exceeded the top 5% threshold determined in both NB-L and KDEB models. In contrast, locations with crash frequencies or densities consistently below this threshold over the same period were classified as non-hazardous.

Speed data are subsequently collected at these predetermined HRLs and non-HRLs along selected NHs and SHs, including NH 709, NH 334B S1, NH 334B S2, SH 14, SH 16A, and SH 22. The selected locations (HRLs and non-HRLs) consist of straight segments and three-legged unsignalized intersections located on flat terrain, with pavement widths from 7 to 10 m. Each location (e.g., L_1, L_2, and L_3) represents an independent speed measurement point that may be classified as hazardous or non-hazardous and provides location-specific speed behavior influenced by local geometric features, the localized roadway environment, and traffic volume. The locations under investigation are far from the influence of longitudinal gradients and horizontal curves, preventing confounding effects from the gradual acceleration and deceleration of vehicles. At each location, spot speed measurements are obtained for various vehicle types using a Light Detection and Ranging speed gun during a fixed 2 h duration on weekdays between September and October 2024. Data collection was conducted during daylight hours (07:00 a.m.–06:00 p.m.) in clear weather conditions to ensure uniform visibility and to minimize behavioral anomalies associated with adverse conditions. The traffic in the study region is characterized by a heterogeneous environment with poor lane discipline, making speed data collection challenging. To minimize behavioral impedance, the speed of upstream and downstream vehicles is measured at 80–200 m before approaching the conflicting zone, targeting the rear of each vehicle ( 42 ). This ensures that observed speeds reflect natural driving behavior. The observation duration remains uniform across all locations, and the collected samples inherently incorporate variations in traffic composition and operating conditions across peak and non-peak periods within the daytime window.

The 2 h videography survey provides traffic data, categorized into seven distinct vehicle classes: (1) motorized two-wheelers (2W); (2) motorized three-wheelers (3W); (3) standard cars; (4) trucks; (5) buses; (6) tractors; and (7) LCVs, according to the Indo-HCM classification. Tables 2 and 3 present the spatial information for selected hazardous and non-hazardous locations, along with their traffic and speed composition samples.

Table 2.

Location and Traffic Composition at Each Selected Location on the National Highways (NHs)

Location ID	Latitude	Longitude	Sample size	Traffic composition (%) (2W:3W:car:truck:bus:tractor:LCV)	Sample composition (%) (2W:3W:car:truck:bus:tractor:LCV)
NH 709
NH 709 L_1	28.7597	76.1011	733	36:7:23:17:7:8:2	29:9:23:12:8:12:7
NH 709 L_2	28.7263	76.0734	789	35:4:22:18:8:11:2	23:11:28:11:12:10:7
NH 709 L_3	28.6465	75.9685	658	41:1:16:12:9:12:9	24:8:24:12:9:14:9
NH 709 L_4	28.5981	75.8444	816	43:1:19:10:8:13:6	26:9:24:12:12:8:8
NH 709 L_5	28.5862	75.8241	741	42:1:19:15:10:10:3	26:9:24:12:12:8:8
NH 709 L_6	28.5273	75.7987	866	41:2:17:14:8:14:4	24:10:23:13:10:14:7
NH 709 L_7	28.4874	75.7990	887	42:1:15:18:8:9:7	25:7:23:11:12:13:9
NH 709 L_8	28.4446	75.8095	967	40:1:18:16:10:12:2	24:8:23:12:11:13:8
NH 334B S1
NH 334B S1 L_1	28.6089	76.3251	1,081	19:2:34:35:3:5:2	21:8:24:13:11:13:10
NH 334B S1 L_2	28.6147	76.3278	1,035	24:1:33:31:3:4:4	22:10:21:18:9:11:9
NH 334B S1 L_3	28.6391	76.3566	946	23:2:34:30:4:3:3	17:8:19:15:13:18:11
NH 334B S1 L_4	28.6396	76.4011	1,026	23:1:33:31:5:4:3	17:11:23:16:9:14:10
NH 334B S1 L_5	28.6259	76.4321	1,043	23:2:31:34:4:3:3	17:13:21:16:12:13:8
NH 334B S1 L_6	28.6285	76.5441	980	27:1:29:33:5:4:1	24:11:24:20:19:12:10
NH 334B S1 L_7	28.6362	76.5740	1,020	25:2:32:33:3:4:1	17:12:17:19:11:14:10
NH 334B S1 L_8	28.6312	76.5915	1,306	27:2:30:32:4:3:2	19:7:21:17:13:13:11
NH 334B S1 L_9	28.6214	76.6170	1,136	25:2:36:29:4:3:1	17:12:21:15:12:12:11
NH 334B S2
NH 334B S2 L_1	28.5933	76.2243	1,032	18:2:29:40:3:6:2	18:10:21:16:13:15:7
NH 334B S2 L_2	28.5940	76.1879	1,134	23:2:36:30:4:4:1	22:10:21:14:11:14:8
NH 334B S2 L_3	28.5946	76.1580	1,051	22:2:35:32:4:3:2	21:9:19:17:11:15:8
NH 334B S2 L_4	28.5912	76.1124	1,085	24:1:34:32:4:3:2	22:12:22:14:12:8:9
NH 334B S2 L_5	28.5389	75.9938	1,037	21:1:35:34:4:4:1	18:13:23:10:16:12:8
NH 334B S2 L_6	28.5461	76.0270	954	22:1:32:35:5:4:1	24:8:23:12:6:19:8
NH 334B S2 L_7	28.5651	76.0656	851	23:1:31:34:5:4:2	22:11:24:11:10:11:11

Note: bold font = hazardous roadway locations; standard font = non-hazardous roadway locations; 2W = two-wheeler; 3W = three-wheeler; LCV = light commercial vehicle.

Table 3.

Location and Traffic Composition at Each Selected Location on State Highways (SHs)

Location ID	Latitude	Longitude	Sample size	Traffic composition (%) (2W:3W:car:truck:bus:tractor:LCV)	Sample composition (%) (2W:3W:car:truck:bus:tractor:LCV)
SH 14
SH 14 L_1	29.3636	76.4674	1,144	26:3:29:23:7:8:6	23:12:20:9:15:13:9
SH 14 L_2	29.3723	76.5071	1,167	26:2:30:24:9:5:4	20:15:20:8:17:10:9
SH 14 L_3	29.3777	76.5354	1,021	23:1:31:24:8:7:6	17:11:20:13:13:16:11
SH 14 L_4	29.3933	76.6201	1,106	25:1:30:22:9:5:8	18:10:20:12:13:14:12
SH 14 L_5	29.4065	76.6873	989	27:1:33:20:6:7:6	19:7:18:14:16:16:10
SH 14 L_6	29.3938	76.8333	1,052	24:1:35:21:7:9:3	18:6:20:14:14:13:15
SH 16A
SH 16A L_1	28.9760	76.3098	774	26:1:31:19:8:14:2	16:11:22:14:16:12:10
SH 16A L_2	29.0046	76.3620	772	27:2:28:17:9:12:5	23:10:23:12:13:11:9
SH 16A L_3	29.0381	76.4442	778	24:1:32:14:7:17:5	21:12:21:13:12:11:10
SH 16A L_4	29.0492	76.4875	836	22:1:36:15:7:14:5	19:14:19:15:11:13:10
SH 16A L_5	29.0511	76.4964	874	24:1:33:12:6:16:8	20:10:20:16:12:13:9
SH 16A L_6	29.0541	76.5152	812	21:2:34:10:9:15:9	19:9:19:13:12:18:10
SH 16A L_7	29.0704	76.5528	799	21:1:35:11:8:18:6	21:13:18:12:14:13:10
SH 16A L_8	29.1151	76.6427	806	28:1:37:9:7:14:4	20:11:20:16:11:13:9
SH 22
SH 22 L_1	28.6586	76.8153	1,094	17:1:15:48:6:10:4	19:10:25:15:12:11:8
SH 22 L_2	28.6648	76.8317	1,014	19:1:17:49:5:6:3	21:12:19:16:11:13:7
SH 22 L_3	28.6693	76.8663	1,093	14:2:20:49:6:7:2	14:10:25:17:11:13:9
SH 22 L_4	28.6668	76.8471	979	18:2:22:46:4:5:3	20:11:20:16:11:13:12
SH 22 L_5	28.6662	76.8423	1,039	16:2:23:43:6:8:2	19:11:16:17:15:13:9
SH 22 L_6	28.6702	76.8731	900	18:3:24:40:7:4:4	18:8:17:20:16:14:8

Note: bold font = hazardous roadway locations; standard font = non-hazardous roadway locations; 2W = two-wheeler; 3W = three-wheeler; LCV = light commercial vehicle.

Data Preparation and Stratification for Modeling

The reliability and accuracy of statistical modeling in road safety research critically depend on the rigor of the data preparation process. To examine the sensitivity and robustness of statistical modeling to data availability, the full data set has been systematically divided into four data fractions (25%, 50%, 75%, and 100%) of the total sample size. This stratification is designed to investigate how data granularity affects model performance, parameter convergence, and the reliability of statistical inference. Furthermore, this study adopts a repetition-based sampling framework to evaluate the modeling consistency. For each highway and data fraction (25%, 50%, 75%, and full data set [i.e., 100%]), four data sets (Repetition_1–Repetition_4) are generated using the simple random sampling with replacement approach. The purpose of using several repetitions is twofold. First, the repetitions reinforce the robustness and stability of model fitting methodologies and ensure that any single data set configuration does not unduly influence inferential results. Second, they allow the consistency of results to be evaluated across independently drawn subsets of data. This methodological decision improves the generalizability of results and replicability of the findings across diverse traffic scenarios and geographical contexts ( 43 , 44 ).
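The fraction-and-repetition scheme described above can be illustrated with a minimal Python sketch. The function name, the seed, and the rounding of subset sizes are hypothetical choices made only for illustration. The sketch also assumes that the 100% fraction is the full data set used as-is rather than a resample, which the text does not state explicitly.

```python
import numpy as np

FRACTIONS = (0.25, 0.50, 0.75, 1.00)   # data fractions named in the text
N_REPETITIONS = 4                      # Repetition_1 to Repetition_4


def draw_fraction_samples(speeds, fractions=FRACTIONS,
                          n_rep=N_REPETITIONS, seed=0):
    """Return {(fraction, repetition): subset} for one set of spot speeds.

    Subsets are drawn by simple random sampling with replacement.
    Assumption: the 100% fraction is the unchanged full data set.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(speeds, dtype=float)
    samples = {}
    for fraction in fractions:
        size = int(round(fraction * x.size))
        for rep in range(1, n_rep + 1):
            if fraction >= 1.0:
                samples[(fraction, rep)] = x.copy()
            else:
                samples[(fraction, rep)] = rng.choice(x, size=size, replace=True)
    return samples
```

Each subset can then be passed independently to the distribution fitting and hypothesis testing steps, so that results can be compared across fractions and repetitions.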

The prepared data structure, which incorporates variability across location types (i.e., HRLs and non-HRLs), data fractions, repetitions, and highways, enables a comprehensive spot speed analysis based on a stratified experimental design, ensuring the reliability and validity of insights derived from modeling and inferential procedures.

Analysis and Results

Descriptive Statistics of Spot Speeds

Descriptive statistics have been computed for each surveyed location on the NHs and SHs to better understand the basic characteristics and variability of the spot speed data. The key statistical parameters include the sample size, minimum and maximum speed (in km/h), variance, skewness, and kurtosis. Tables 4 and 5 summarize these parameters for HRLs and non-HRLs on NH 709, NH 334B S1, NH 334B S2, SH 14, SH 16A, and SH 22. The Location ID column in the tables includes suffixes, such as L_1, L_2, and L_3, to denote Location 1, Location 2, and Location 3, respectively, along a specific highway segment, as shown in Figure 2. An interactive version of this map is available online at: https://parveen4007.github.io/Spot-Speed-Measurements-at-Hazardous-and-Non-Hazardous-Locations-on-National-and-State-Highways/.
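As an illustration of how the summary measures listed above could be computed for one location, a short Python sketch is given below. The function name is hypothetical. The use of the sample variance (denominator n − 1) and of excess kurtosis (zero for a normal distribution) are assumptions, since the text does not specify the estimators used.

```python
import numpy as np
from scipy import stats


def describe_spot_speeds(speeds):
    """Summary statistics for spot speeds (km/h) at a single location."""
    x = np.asarray(speeds, dtype=float)
    return {
        "sample_size": int(x.size),
        "min_speed": float(x.min()),
        "max_speed": float(x.max()),
        "variance": float(x.var(ddof=1)),                    # assumed: sample variance
        "skewness": float(stats.skew(x)),
        "kurtosis": float(stats.kurtosis(x, fisher=True)),   # assumed: excess kurtosis
    }
```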

Table 4.

Descriptive Statistics for Speed Data at Each Location on National Highways

Location ID	Location type	Sample size	Min. speed (km/h)	Max. speed (km/h)	Variance	Skewness	Kurtosis
NH 709 L_1	Unsignalized T junction	733	16	89	247.4	0.4	−0.6
NH 709 L_2	Midblock	789	18	135	544.9	0.3	−0.4
NH 709 L_3	Unsignalized T junction	658	15	99	329.4	0.3	−0.5
NH 709 L_4	Midblock	816	16	105	412.0	−0.1	−0.9
NH 709 L_5	Midblock	741	15	85	277.5	0.3	−0.9
NH 709 L_6	Unsignalized T junction	866	19	95	340.9	0.1	−0.8
NH 709 L_7	Midblock	887	17	115	536.3	0.3	−0.6
NH 709 L_8	Unsignalized T junction	967	15	92	373.2	0.2	−1.1
NH 334B S1 L_1	Unsignalized T junction	1,081	18	93	351.9	0.1	−0.8
NH 334B S1 L_2	Unsignalized T junction	1,035	16	116	423.9	0.5	0.0
NH 334B S1 L_3	Unsignalized T junction	946	17	88	302.5	0.0	−0.8
NH 334B S1 L_4	Unsignalized T junction	1,026	16	98	336.3	0.1	−0.8
NH 334B S1 L_5	Midblock	1,043	18	109	355.3	0.4	0.0
NH 334B S1 L_6	Midblock	980	18	88	285.5	0.1	−1.0
NH 334B S1 L_7	Unsignalized T junction	1,020	19	82	279.1	0.0	−0.9
NH 334B S1 L_8	Unsignalized T junction	1,306	20	85	219.6	0.1	−0.7
NH 334B S1 L_9	Midblock	1,136	15	82	204.3	0.3	−0.5
NH 334B S2 L_1	Unsignalized T junction	1,032	15	111	345.8	0.2	−0.6
NH 334B S2 L_2	Midblock	1,134	17	120	532.4	0.5	−0.3
NH 334B S2 L_3	Midblock	1,051	18	93	377.0	0.0	−1.1
NH 334B S2 L_4	Midblock	1,085	19	110	460.6	0.3	−0.6
NH 334B S2 L_5	Unsignalized T junction	1,037	15	92	371.8	0.2	−1.0
NH 334B S2 L_6	Midblock	954	17	123	552.9	0.6	0.0
NH 334B S2 L_7	Unsignalized T junction	851	16	89	300.4	0.5	−0.4

Note: Min. = minimum; Max. = maximum. Bold font = hazardous roadway locations; standard font = non-hazardous roadway locations.

Table 5.

Descriptive Statistics for Speed Data at Each Location on State Highways

Location ID	Location type	Sample size	Min. speed (km/h)	Max. speed (km/h)	Variance	Skewness	Kurtosis
SH 14 L_1	Unsignalized T junction	1,144	16	108	370.8	0.4	−0.3
SH 14 L_2	Midblock	1,167	19	119	482.0	0.6	0.1
SH 14 L_3	Unsignalized T junction	1,021	15	82	218.3	0.2	−0.5
SH 14 L_4	Midblock	1,106	19	109	411.7	0.2	−0.6
SH 14 L_5	Midblock	989	18	96	362.4	0.3	−0.8
SH 14 L_6	Unsignalized T junction	1,052	15	79	192.9	0.5	−0.3
SH 16A L_1	Midblock	774	15	79	213.5	0.4	−0.4
SH 16A L_2	Unsignalized T junction	772	18	121	512.9	0.5	−0.3
SH 16A L_3	Midblock	778	22	117	568.1	0.4	−0.4
SH 16A L_4	Unsignalized T junction	836	16	74	176.2	0.3	−0.6
SH 16A L_5	Midblock	874	18	108	455.2	0.3	−0.6
SH 16A L_6	Midblock	812	23	95	334.5	0.4	−0.5
SH 16A L_7	Midblock	799	17	115	422.4	0.5	0.0
SH 16A L_8	Unsignalized T junction	806	17	87	281.7	0.6	−0.5
SH 22 L_1	Midblock	1,094	18	112	430.6	0.6	0.0
SH 22 L_2	Unsignalized T junction	1,014	16	97	314.4	0.3	−0.7
SH 22 L_3	Midblock	1,093	25	110	459.1	0.4	−0.7
SH 22 L_4	Unsignalized T junction	979	15	84	265.1	0.2	−1.0
SH 22 L_5	Midblock	1,039	18	87	277.5	0.2	−0.9
SH 22 L_6	Midblock	900	17	128	253.4	0.3	−0.3

Note: Min. = minimum; Max. = maximum. Bold font = hazardous roadway locations; standard font = non-hazardous roadway locations.

Figure 2.

Spot speed data collection locations on National and State Highways.

The average sample size across all hazardous locations is relatively large, as given in Table 4, indicating sufficient data collection at each location. Maximum speeds range from 82 to 135 km/h, while minimum speeds vary between 15 and 20 km/h. The variance in speed data varies substantially across locations, from 204.3 to 552.9, indicating location-specific heterogeneity in vehicular traffic. Of note, both extremes occur at midblock segments: NH 334B S1 L_9 (non-hazardous) has the lowest variance and NH 334B S2 L_6 (hazardous) has the highest, indicating that speed dispersion at midblock segments varies widely regardless of location type. For distributional shape, skewness values are predominantly between 0.1 and 0.6, indicating a slight right skew in speed distributions, which is common in mixed traffic conditions where a few high-speed vehicles increase the mean.

The descriptive statistics for SH locations (Table 5) exhibit similar trends. HRLs have a minimum speed of 15 km/h and a maximum speed of 128 km/h.

Results

This section presents the results of fitting seven continuous probability distributions to spot speed data collected at hazardous and non-hazardous locations across six distinct highways. The subsequent results are analyzed using multiple interrelated criteria, such as KS test outcome, best-fit distribution, estimated parameters, mean speed with confidence intervals, data fraction variability, and repetition performance to determine whether the observed speeds significantly differ between HRLs and non-HRLs and how consistently the results remain stable across different data fractions and highways.

Distribution Trends of Observed Speed Data Across HRLs and Non-HRLs

The location, shape, and scale parameters of different parametric distribution models for vehicular speeds at specific HRLs and non-HRLs are presented in Tables A1 and A2 in the Appendix.

Based on penalized-likelihood criteria, namely the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), as given in Tables A3 and A4 in the Appendix, the GEV distribution consistently provides the best fit for spot speed data across all evaluated roadway segments under heterogeneous traffic conditions. Among the seven continuous probability distributions considered, the GEV outperformed the other distributions in capturing the underlying characteristics of speed variability. Therefore, all subsequent hypothesis testing evaluating significant differences in speed distributions between HRLs and non-HRLs will use the GEV distribution, given its superior goodness of fit (GoF) across diverse traffic contexts.
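The model comparison described above can be sketched in Python with SciPy. This is an illustration under stated assumptions, not the fitting code used in the study: the dictionary and function names are hypothetical, the Burr family is represented by SciPy's Burr Type XII, and the location of the lognormal, gamma, Weibull, and Burr models is fixed at zero. Note that SciPy's `genextreme` shape parameter `c` has the opposite sign to the shape parameter in the more common GEV parameterization.

```python
import numpy as np
from scipy import stats

# Candidate models: (SciPy distribution, parameters held fixed during fitting).
# Assumption: location fixed at 0 for the positive-support families.
CANDIDATES = {
    "normal": (stats.norm, {}),
    "lognormal": (stats.lognorm, {"floc": 0}),
    "gamma": (stats.gamma, {"floc": 0}),
    "logistic": (stats.logistic, {}),
    "weibull": (stats.weibull_min, {"floc": 0}),
    "burr": (stats.burr12, {"floc": 0}),    # assumed Burr Type XII
    "gev": (stats.genextreme, {}),          # shape c = -xi in SciPy
}


def rank_models(speeds, candidates=CANDIDATES):
    """Fit each candidate by maximum likelihood and rank by AIC.

    Returns a list of (name, AIC, BIC) tuples, best (lowest AIC) first.
    """
    x = np.asarray(speeds, dtype=float)
    n = x.size
    rows = []
    for name, (dist, fixed) in candidates.items():
        params = dist.fit(x, **fixed)               # maximum likelihood estimates
        loglik = float(np.sum(dist.logpdf(x, *params)))
        k = len(params) - len(fixed)                # number of free parameters
        aic = 2 * k - 2 * loglik
        bic = k * np.log(n) - 2 * loglik
        rows.append((name, aic, bic))
    return sorted(rows, key=lambda row: row[1])
```

Calling `rank_models(speeds)` on the spot speeds of one location type and data fraction returns the candidates ordered from lowest to highest AIC, which mirrors the kind of comparison reported in Tables A3 and A4.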

Distribution Robustness Across Data Fractions

The two-sample KS test evaluates whether the speed distributions at hazardous and non-hazardous locations are statistically different. As detailed in Table 6, unbold cells indicate that the test has rejected the null hypothesis (i.e., $H_{0}$ : no significant difference in speed distributions), whereas bold cells indicate that the test fails to reject it.
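A minimal Python sketch of the two-sample KS comparison is given below, using `scipy.stats.ks_2samp`. The function name and the 5% significance level are assumptions made for illustration.

```python
from scipy import stats


def compare_speed_profiles(hrl_speeds, non_hrl_speeds, alpha=0.05):
    """Two-sample KS test of H0: both samples share one speed distribution."""
    result = stats.ks_2samp(hrl_speeds, non_hrl_speeds)
    return {
        "ks_statistic": float(result.statistic),   # max gap between the two ECDFs
        "p_value": float(result.pvalue),
        "reject_h0": bool(result.pvalue < alpha),  # assumed significance level
    }
```

The KS statistic is the largest vertical distance between the two empirical cumulative distribution functions, so it is sensitive to differences in location, spread, and shape alike.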

Table 6.

Two-Sample KS Test Statistics across Data Fractions and Repetitions (Rep) for Different Highways

Highway	Repetition (Rep)	Two-sample KS test statistic (p-value) by data fraction				Best-performing repetition
		25%	50%	75%	100%	
NH 709	Rep_1	0.062 (0.084)	0.046 (0.062)	0.049 (0.005)	0.051 (0.001)	Rep_2
NH 709	Rep_2	0.067 (0.048)	0.060 (0.039)	0.192 (< 0.001)	0.051 (0.001)
NH 709	Rep_3	0.073 (0.025)	0.050 (0.129)	0.051 (0.117)	0.051 (0.001)
NH 709	Rep_4	0.061 (0.097)	0.048 (0.157)	0.075 (0.005)	0.051 (0.001)
NH 334B S1	Rep_1	0.109 (< 0.001)	0.097 (< 0.001)	0.092 (< 0.001)	0.103 (< 0.001)	Rep_1
NH 334B S1	Rep_2	0.073 (0.003)	0.088 (< 0.001)	0.101 (< 0.001)	0.103 (< 0.001)
NH 334B S1	Rep_3	0.080 (0.001)	0.049 (0.049)	0.043 (0.174)	0.103 (< 0.001)
NH 334B S1	Rep_4	0.122 (< 0.001)	0.094 (< 0.001)	0.094 (< 0.001)	0.103 (< 0.001)
NH 334B S2	Rep_1	0.065 (0.048)	0.055 (0.009)	0.065 (0.048)	0.066 (< 0.001)	Rep_3
NH 334B S2	Rep_2	0.061 (0.045)	0.072 (0.004)	0.078 (0.002)	0.066 (< 0.001)
NH 334B S2	Rep_3	0.091 (0.001)	0.064 (0.018)	0.075 (0.003)	0.066 (< 0.001)
NH 334B S2	Rep_4	0.069 (0.028)	0.078 (0.002)	0.063 (0.018)	0.066 (< 0.001)
SH 14	Rep_1	0.095 (0.001)	0.102 (< 0.001)	0.110 (< 0.001)	0.111 (< 0.001)	Rep_3
SH 14	Rep_2	0.133 (< 0.001)	0.104 (< 0.001)	0.078 (0.002)	0.111 (< 0.001)
SH 14	Rep_3	0.130 (< 0.001)	0.119 (< 0.001)	0.128 (< 0.001)	0.111 (< 0.001)
SH 14	Rep_4	0.098 (0.008)	0.123 (< 0.001)	0.111 (< 0.001)	0.111 (< 0.001)
SH 16A	Rep_1	0.041 (0.500)	0.028 (0.520)	0.025 (0.254)	0.025 (0.254)	Rep_3
SH 16A	Rep_2	0.023 (0.976)	0.028 (0.795)	0.044 (0.249)	0.025 (0.254)
SH 16A	Rep_3	0.057 (0.138)	0.050 (0.137)	0.039 (0.387)	0.025 (0.254)
SH 16A	Rep_4	0.049 (0.266)	0.043 (0.274)	0.027 (0.817)	0.025 (0.254)
SH 22	Rep_1	0.073 (0.032)	0.068 (0.002)	0.074 (< 0.001)	0.071 (< 0.001)	Rep_2
SH 22	Rep_2	0.095 (0.002)	0.028 (< 0.001)	0.078 (0.004)	0.071 (< 0.001)
SH 22	Rep_3	0.079 (0.016)	0.075 (0.006)	0.095 (0.002)	0.071 (< 0.001)
SH 22	Rep_4	0.094 (0.002)	0.048 (0.183)	0.071 (0.011)	0.071 (< 0.001)

Note: NH = National Highway; SH = State Highway; KS = Kolmogorov–Smirnov.

An important finding from the two-sample KS testing is the consistent rejection of the null hypothesis across the majority of repetitions and data fractions, confirming that speed patterns differ statistically between HRLs and non-HRLs. These results support the use of spot speed as a reliable indicator for evaluating roadway safety risk. An interesting pattern emerges when analyzing how the KS statistics and p-values vary across increasing data fractions (25%, 50%, 75%, and 100%). While the KS statistics remain of similar magnitude, the p-values typically decrease as the data fraction increases on several highways (e.g., NH 709 and SH 14), suggesting that larger data sets improve the test’s ability to identify nuanced distributional differences ( 45 , 46 ). This pattern arises because, as the percentage of available data or the sample size increases, the empirical estimates of the distribution parameters, such as shape, scale, and location, converge more closely to their true population values ( 47 , 48 ). Thus, the robustness of the fitted distributions improves with increased data fractions, thereby enhancing the reliability of the inference. However, in some cases (e.g., SH 16A), notably low KS statistics with high p-values have been observed across all data fractions, potentially because of uniformity in driving behavior regardless of location type, resulting from local speed limits, lack of enforcement, and geometric consistency ( 49 , 50 ).

Sampling Repetitions and Model Robustness

In addition to distributional robustness, this study adopts a repetition-based framework to evaluate modeling consistency by creating four independently drawn data sets, or repetitions (Repetition_1–Repetition_4), for each highway and data fraction. The comparative analysis of repetitions (Table 6) shows that some repetitions yield higher KS statistics and lower p-values across most data fractions, indicating a stronger ability to detect distributional differences between HRLs and non-HRLs. For example, Repetition_2 for NH 709, Repetition_1 for NH 334B S1, Repetition_3 for NH 334B S2 and SH 14, and Repetition_2 for SH 22 have demonstrated consistent statistical significance (p < 0.05) and are thus deemed the most reliable. For SH 16A, Repetition_3 is identified as the best-performing repetition, although none of its p-values falls below 0.05. This consistency suggests that these data sets more accurately reflect the underlying distributional differences, possibly because of their better temporal coverage or a wider range of heterogeneity in traffic conditions ( 51 , 52 ). The implementation of repetition-based testing serves as a safeguard against overfitting to a particular data configuration and validates the robustness of the results across other observational environments ( 47 , 48 ). The study demonstrates that, even under varying real-world sample settings, the statistical differentiation between hazardous and non-hazardous locations remains observable on most highways.

The comparative analysis of GEV distribution parameters, consisting of location, shape, and scale, between HRLs and non-HRLs across varying data fractions (25%, 50%, 75%, and 100%), as shown in Tables 7 and 8, provides significant insights into traffic speed characteristics for different highway types. The GEV location parameter consistently shows higher values for HRLs, indicating a greater tendency toward extreme-speed or crash risk in these segments than in non-hazardous roadway segments. For example, the location parameter for NH 709 ranges from 46.62 at the 25% data fraction to 45.51 at 100% data fraction at HRLs, whereas at non-HRLs, the values decrease from 45.14 to 44.37. Likewise, NH 334B S1 and S2 exhibit higher location parameters for HRLs, with NH 334B S1 achieving 46.73 at a 100% data fraction, compared to 42.84 for non-HRLs, indicating a distinct contrast between the two types of road segments. Similar trends have been observed across the majority of highways for vehicular speed profiles at hazardous and non-hazardous locations, particularly when speed deviations are pronounced but not extreme ( 52 , 53 ).

Table 7.

Parameter Estimates of Best-Fit Distribution across Data Fractions for Hazardous Locations

Highway	Best-fit distribution	Parameter	Data fractions (hazardous locations)
			25%	50%	75%	100%
NH 709	GEV	Location	46.62	45.60	45.38	45.51
		Shape	19.76	19.29	19.29	19.24
		Scale	−0.14	−0.14	−0.12	−0.12
NH 334B S1	GEV	Location	46.14	46.12	46.11	46.73
		Shape	17.14	17.17	17.03	17.43
		Scale	−0.18	−0.18	−0.19	−0.20
NH 334B S2	GEV	Location	45.75	45.33	45.75	45.14
		Shape	19.16	19.17	19.16	19.54
		Scale	−0.13	−0.13	−0.13	−0.11
SH 14	GEV	Location	43.98	44.37	44.60	44.52
		Shape	18.00	17.88	18.05	18.02
		Scale	−0.12	−0.12	−0.12	−0.12
SH 16A	GEV	Location	41.61	41.90	42.11	42.02
		Shape	16.99	16.99	16.92	17.10
		Scale	−0.06	−0.07	−0.07	−0.07
SH 22	GEV	Location	41.82	42.30	42.62	42.37
		Shape	16.44	16.66	16.82	16.77
		Scale	−0.05	−0.06	−0.08	−0.05

Note: NH = National Highway; SH = State Highway; GEV = generalized extreme value.

Table 8.

Parameter Estimates of Best-Fit Distribution across Data Fractions for Non-Hazardous Locations

Highway	Best-fit distribution	Parameter	Data fractions (non-hazardous locations)
			25%	50%	75%	100%
NH 709	GEV	Location	45.14	45.09	44.58	44.37
		Shape	18.93	18.74	18.37	18.32
		Scale	−0.25	−0.24	−0.22	−0.20
NH 334B S1	GEV	Location	42.37	42.60	42.78	42.84
		Shape	16.03	16.16	16.15	16.22
		Scale	−0.17	−0.18	−0.18	−0.18
NH 334B S2	GEV	Location	43.69	44.01	43.69	43.63
		Shape	18.06	18.16	18.06	17.95
		Scale	−0.14	−0.16	−0.14	−0.16
SH 14	GEV	Location	40.48	40.62	40.61	40.61
		Shape	15.42	15.25	15.26	15.23
		Scale	−0.18	−0.09	−0.09	−0.09
SH 16A	GEV	Location	40.79	41.35	41.21	41.14
		Shape	16.18	16.23	16.24	16.18
		Scale	−0.03	−0.05	−0.03	−0.03
SH 22	GEV	Location	41.50	40.82	41.10	40.82
		Shape	16.53	16.30	16.30	16.19
		Scale	−0.20	−0.19	−0.20	−0.19

Note: NH = National Highway; SH = State Highway; GEV = generalized extreme value.

For the shape parameter, which describes the tail behavior of the distribution, HRLs often show higher values than non-HRLs, suggesting a greater frequency of crashes or extreme-speed events. For example, NH 709 has a shape parameter of 19.76 at the 25% data fraction for HRLs, whereas non-HRLs have a shape parameter of 18.93 and tend to exhibit lower values across data fractions. The shape parameter does not vary substantially across data fractions for either HRLs or non-HRLs, indicating a consistent distribution of extreme values within those segments. However, the non-HRLs demonstrate greater variability in the shape parameter. For example, in NH 334B S2, the shape parameter values vary from 18.06 at the 25% data fraction to 17.95 at the 100% data fraction, suggesting a relatively less stable data distribution compared to HRLs ( 54 ). The scale parameter, which denotes the dispersion of the data, shows a consistent trend: HRLs exhibit less negative scale values than non-HRLs. For example, in NH 709, the scale parameter ranges from −0.14 to −0.12 for HRLs, whereas for non-HRLs it ranges from −0.25 to −0.20, further emphasizing the higher concentration of extreme values in hazardous zones.

Stability and Superiority of GEV Distribution Across Data Fractions

The cross-fractional analysis revealed significant consistency in identifying the best-fit distribution models. As shown in Tables 7 and 8, the GEV is the most stable and consistently best-performing model, both for HRLs and non-HRLs, and across nearly all highways and data fractions. Its flexibility in modeling skewed, tail-heavy vehicular speed distributions enhances its ability to fit empirical speed data. The GEV shape parameter k has marginally lower values at HRLs than at non-HRLs on the majority of highways, as shown in Tables 7 and 8, suggesting higher speed variability at locations associated with crash risk. The GEV is particularly sensitive to such differences because of its tail modeling capabilities ( 4 , 18 , 22 , 55 ). In these cases, three-parameter distributions, such as the Burr and GEV distributions, highlight the importance of capturing rare and high-magnitude deviations that are overlooked by two-parameter distributions, including the normal, gamma, and Weibull distributions ( 56 – 58 ). Importantly, as the sample size increases from 25% to 100%, the GEV distribution consistently retains its statistical superiority across all highways, reaffirming the model’s parameter convergence and stability in larger data sets. This pattern aligns with statistical theory, which states that the asymptotic characteristics of estimators improve the fit quality and reduce estimation variance as the sample size increases ( 59 , 60 ). The consistent re-emergence of GEV as the best distribution in larger and more diversified data sets reinforces its methodological suitability and reliability for speed-based risk modeling ( 61 , 62 ).

The ECDFs and PDFs, along with fitted GEV distributions, collectively demonstrate the underlying distributional characteristics and tail behaviors of vehicle speeds between HRLs and non-HRLs along selected NHs and SHs, as shown in Figures 3–8. Among these visualizations, CDFs illustrate shifts in overall speed levels by showing how rapidly the cumulative probability rises. In addition, PDFs highlight differences in dispersion and tail behavior, which are essential for understanding extreme-speed events on both HRLs and non-HRLs. The fitted GEV curves across all highways, including NH 709, NH 334B S1, NH 334B S2, SH 14, SH 16A, and SH 22, closely align with the empirical PDFs, especially in the upper-speed range, confirming their suitability for modeling the central and tail characteristics of observed speeds.
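The agreement between an empirical CDF and a fitted GEV curve can also be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, and the maximum vertical gap is used here as a convenience summary that the study itself does not report.

```python
import numpy as np
from scipy import stats


def ecdf(speeds):
    """Return sorted speeds and their empirical cumulative probabilities."""
    x = np.sort(np.asarray(speeds, dtype=float))
    p = np.arange(1, x.size + 1) / x.size
    return x, p


def gev_cdf_gap(speeds):
    """Largest vertical gap between the ECDF and a fitted GEV CDF."""
    x, p = ecdf(speeds)
    c, loc, scale = stats.genextreme.fit(x)    # maximum likelihood fit
    fitted = stats.genextreme.cdf(x, c, loc=loc, scale=scale)
    return float(np.max(np.abs(p - fitted)))
```

Plotting `x` against `p` together with the fitted CDF for HRLs and non-HRLs gives a comparison of the kind described for panel (a) of each figure.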

Figure 3.

(a) Comparison of ECDFs and (b) PDFs with fitted GEV distributions for hazardous (HRL) and non-hazardous locations (non-HRLs) on NH 709.

Figure 4.

(a) Comparison of ECDFs and (b) PDFs with fitted GEV distributions for hazardous (HRL) and non-hazardous locations (non-HRLs) on NH 334B S1.

Figure 5.

(a) Comparison of ECDFs and (b) PDFs with fitted GEV distributions for hazardous (HRL) and non-hazardous locations (non-HRLs) on NH 334B S2.

Figure 6.

(a) Comparison of ECDFs and (b) PDFs with fitted GEV distributions for hazardous (HRL) and non-hazardous locations (non-HRLs) on SH 14.

Figure 7.

(a) Comparison of ECDFs and (b) PDFs with fitted GEV distributions for hazardous (HRL) and non-hazardous locations (non-HRLs) on SH 16A.

Figure 8.

(a) Comparison of ECDFs and (b) PDFs with fitted GEV distributions for hazardous (HRL) and non-hazardous locations (non-HRLs) on SH 22.

On NHs, particularly NH 334B S1, as shown in Figure 4, the hazardous (hot spot) segment CDF is shifted to the right of the non-hazardous (non-hot spot) CDF over a significant portion of the mid-speed range, indicating consistently higher operating speeds at HRLs. This shift is accompanied by a wider PDF with a heavier right tail, suggesting higher average speeds, increased speed variability, and a greater likelihood of extreme-speed events. A related but more subtle pattern is observed on NH 334B S2 and NH 709, where the hazardous and non-hazardous CDFs largely overlap in the central region, implying similar typical speeds for the majority of drivers. However, separation becomes more evident in the upper tail, as hazardous CDFs approach unity at higher speeds, and the corresponding PDFs exhibit extended right tails. This suggests that, on these highways, hazardous roadway differentiation is primarily influenced by a few, but significantly higher-speed, observations rather than a pronounced shift in average speed. The corresponding PDFs highlight this distinction, with HRLs exhibiting flatter peaks and heavier right tails, while non-HRLs have more sharply concentrated peaks around the modal speed range. In contrast, SH 22 shows a more moderately separated CDF for observed speeds at HRLs and non-HRLs, where differences between the two location types are less evident in the central region but become readily apparent in the upper tail. The PDF comparison for SH 22 similarly shows greater dispersion and tail heaviness at HRLs, indicating that extreme-speed behavior plays a significant role in distinguishing risk-prone locations along this corridor.

SH 16A presents a different scenario in which the hazardous and non-hazardous CDFs and PDFs essentially coincide over most of the distribution. Both the central tendency and dispersion of speeds are comparable, indicating that the speed distribution alone has limited discriminatory power in distinguishing between HRLs and non-HRLs on this highway. This overlap suggests that factors other than speed, such as roadway geometric features, access density, traffic composition, and roadside population exposure, may be more influential in explaining crash occurrence on SH 16A, despite minor differences observed in the extreme tail of the GEV fits.

In summary, HRLs tend to exhibit broader distributions with heavier right tails, suggesting a higher likelihood of extreme-speed events. In contrast, non-HRLs show peaked distributions concentrated around the modal speed range, typically between 45 and 60 km/h. These distributional shapes are consistent with risk theory, which predicts that higher mean speeds and greater variability are associated with increased crash propensity.

Speed-Based Differentiation Between Hazardous and Non-Hazardous Locations

Table 9 presents a comparative analysis of mean vehicular speeds and their corresponding 95% confidence intervals across HRLs and non-HRLs for all highways under investigation. A systematic pattern is evident across almost all data fractions and highways, where HRLs consistently have higher mean speeds than non-HRLs. This pattern supports the hypothesis that high speeds are strongly associated with crash-prone locations, confirming speed as a major contributor to crash risk ( 63 – 66 ). The two-sample t-test confirms the statistical significance of speed differentials across HRLs and non-HRLs, with the majority of highways showing significant differences (p < 0.05) across all data fractions.
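For illustration, the mean speed, its 95% confidence interval, and the two-sample t-test can be sketched in Python as follows. The function names are hypothetical. The t-based interval and Welch's unequal-variance form of the test are assumptions, because the text does not specify how the intervals or the test were computed.

```python
import numpy as np
from scipy import stats


def mean_with_ci(speeds, level=0.95):
    """Mean speed with a t-based confidence interval (assumed method)."""
    x = np.asarray(speeds, dtype=float)
    mean = float(x.mean())
    sem = stats.sem(x)                           # standard error of the mean
    lower, upper = stats.t.interval(level, df=x.size - 1, loc=mean, scale=sem)
    return mean, float(lower), float(upper)


def compare_means(hrl_speeds, non_hrl_speeds):
    """Welch two-sample t-test (assumed variant); returns (t, p-value)."""
    result = stats.ttest_ind(hrl_speeds, non_hrl_speeds, equal_var=False)
    return float(result.statistic), float(result.pvalue)
```

A positive t-statistic from `compare_means` indicates a higher mean speed in the first sample, here the hazardous locations.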

Table 9.

Comparison of Mean Speeds and Confidence Intervals (CIs) Across Data Fractions for Hazardous and Non-Hazardous Locations on Different Highways

Highway	Hazardous location					Non-hazardous location
		Data fractions					Data fractions
	Parameter	25%	50%	75%	100%	Parameter	25%	50%	75%	100%
NH 709	Mean speed (km/h)	54.9	54.5	58.1	54.6	Mean speed (km/h)	52.5	51.8	48.5	51.9
	Lower CI	53.5	53.2	56.6	53.9	Lower CI	51.1	50.7	47.3	51.2
	Upper CI	56.6	55.7	59.4	55.4	Upper CI	53.8	52.9	49.6	52.6
	t-statistic	2.32	3.03	10.04	5.35		na	na	na	na
	p-value	0.021	0.002	< 0.001	< 0.001		na	na	na	na
NH 334B S1	Mean speed (km/h)	53.5	53.4	53.3	53.9	Mean speed (km/h)	49.3	49.5	49.6	49.7
	Lower CI	52.5	52.7	52.7	53.4	Lower CI	48.3	48.7	49.1	49.2
	Upper CI	54.5	54.1	53.8	54.4	Upper CI	50.4	50.3	50.3	50.3
	t-statistic	3.53	6.07	5.36	11.15		na	na	na	na
	p-value	0.004	< 0.001	< 0.001	< 0.001		na	na	na	na
NH 334B S2	Mean speed (km/h)	54.8	54.1	54.4	54.5	Mean speed (km/h)	51.3	51.5	51.5	51.7
	Lower CI	53.2	52.6	53.1	53.8	Lower CI	50.1	50.5	50.5	51.1
	Upper CI	56.3	55.4	55.7	55.3	Upper CI	52.6	52.5	52.6	52.3
	t-statistic	2.74	3.91	3.74	5.59		na	na	na	na
	p-value	0.006	< 0.001	< 0.001	< 0.001		na	na	na	na
SH 14	Mean speed (km/h)	53.2	54.0	53.6	53.0	Mean speed (km/h)	48.1	48.5	48.1	48.3
	Lower CI	51.9	52.8	52.4	52.3	Lower CI	46.9	47.5	47.1	47.7
	Upper CI	54.6	55.3	54.8	53.7	Upper CI	49.2	49.6	49.2	48.9
	t-statistic	5.12	5.43	5.42	6.61		na	na	na	na
	p-value	0.002	< 0.001	< 0.001	< 0.001		na	na	na	na
SH 16A	Mean speed (km/h)	50.7	51.2	51.2	50.9	Mean speed (km/h)	48.9	49.8	49.9	50.1
	Lower CI	49.3	49.9	49.9	50.3	Lower CI	47.5	48.7	48.6	49.3
	Upper CI	52.1	52.4	52.4	51.6	Upper CI	50.2	51.1	51.1	50.7
	t-statistic	0.16	0.68	6.61	1.77		na	na	na	na
	p-value	0.875	0.495	< 0.001	0.078		na	na	na	na
SH 22	Mean speed (km/h)	51.7	51.3	51.1	51.3	Mean speed (km/h)	48.1	46.9	47.6	47.6
	Lower CI	50.4	50.1	49.9	50.5	Lower CI	46.7	45.8	46.6	47.1
	Upper CI	53.2	52.5	52.3	52.1	Upper CI	49.3	47.9	48.7	48.3
	t-statistic	3.87	5.30	4.15	7.35		na	na	na	na
	p-value	< 0.001	< 0.001	< 0.001	< 0.001		na	na	na	na

Note: NH = National Highway; SH = State Highway; na = not applicable.

Particularly high t-statistics, such as those observed for NH 709 at the 75% fraction (t = 10.04), NH 334B S1 at the full data set (t = 11.15), and SH 22 across all data fractions (t range 3.87–7.35), provide strong evidence that the mean speeds at HRLs significantly differ from those at non-HRLs. These findings support speed profiling as a reliable indicator of hazardous roadway environments.

Along with the overall consistency, some highways exhibit localized variations. For example, NH 709 exhibits an unusual increase in average speed at HRLs at the 75% data fraction. This localized variation may stem from context-specific anomalies, such as temporal changes in traffic composition or inconsistencies in speeding behavior over the data collection period ( 4 ). A similar consistency in trends has been observed in SH 14 and SH 22, where speeds at non-HRLs have remained relatively low, supporting the generalizability of speed-based risk differentiation ( 67 ). An exception occurs in SH 16A, where the speed differences between HRLs and non-HRLs are not significant at the 25% (t = 0.16, p = 0.875), 50% (t = 0.68, p = 0.495), and 100% (t = 1.77, p = 0.078) data fractions. This suggests that for SH 16A, speed may not be a strong predictor of road safety risk, implying that other contributing factors, such as geometric design inconsistencies, sight-distance constraints, driver behavior, road environment conditions, and vehicle malfunction, play a more significant role in shaping crash occurrence.

Congestion-Adjusted Desired Speed Analysis Using Modified KM-Based Speed Distribution

In this study, spot speed measurements were collected during 2-h observation windows that may have experienced varying levels of traffic interaction. Differences in observed speeds between HRLs and non-HRLs could therefore reflect temporary congestion rather than intrinsic speed behavior. To account for this possibility, a modified KM framework is used to estimate congestion-adjusted desired speed distributions. In this approach, time headways are first computed for each vehicle observation, and a site-specific headway threshold $h^{*}$ is determined using a piecewise speed–headway regression method that identifies the breakpoint separating interaction-dominated from free-flow regimes. Observations with headways below the threshold are classified as right-censored (constrained), implying that the true desired speed may exceed the observed value because of interaction with a leading vehicle, whereas observations above the threshold are considered unconstrained. The KM estimator then reconstructs the underlying desired speed distribution while accounting for censored observations. Therefore, the KM approach serves as a traffic interaction correction procedure, differentiating desired speed behavior from speeds suppressed by vehicle-following.
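The censoring logic described above can be illustrated with a minimal sketch of the standard product-limit (Kaplan–Meier) estimator applied to speeds. This is not the study's modified implementation: the function names, the toy data, and the 3 s headway threshold are hypothetical, and summarizing the curve by a mean restricted at the maximum speed $τ$ is an assumption made here for illustration.

```python
def km_survival(speeds, headways, h_star):
    """Product-limit (Kaplan-Meier) estimate of S(v) = P(desired speed > v).

    A vehicle with headway < h_star is treated as constrained, i.e. its
    desired speed is right-censored at the observed speed.
    Returns [(speed, survival)] at each speed with an uncensored observation.
    """
    # Pair each speed with an "uncensored" flag and sort by speed.
    obs = sorted((v, h >= h_star) for v, h in zip(speeds, headways))
    at_risk = len(obs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(obs):
        v = obs[i][0]
        events = 0  # uncensored observations at this speed
        total = 0   # all observations at this speed
        while i < len(obs) and obs[i][0] == v:
            events += obs[i][1]
            total += 1
            i += 1
        if events:
            surv *= 1.0 - events / at_risk
            curve.append((v, surv))
        at_risk -= total
    return curve


def km_restricted_mean(curve, tau):
    """Area under the KM survival step function from 0 to tau."""
    mean, prev_v, prev_s = 0.0, 0.0, 1.0
    for v, s in curve:
        if v >= tau:
            break
        mean += prev_s * (v - prev_v)
        prev_v, prev_s = v, s
    return mean + prev_s * (tau - prev_v)


# Toy data (hypothetical): speeds in km/h, headways in s, assumed h* = 3 s.
speeds = [40, 45, 50, 55, 60]
headways = [1.0, 5.0, 2.0, 6.0, 7.0]
curve = km_survival(speeds, headways, h_star=3.0)
desired_mean = km_restricted_mean(curve, tau=max(speeds))
observed_mean = sum(speeds) / len(speeds)
print(curve, desired_mean, observed_mean)
```

In this toy case the two constrained vehicles (40 and 50 km/h) only reduce the risk set, so the estimated desired-speed mean lies above the plain observed mean, which is the direction of adjustment the KM framework is meant to capture.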

Table 10 presents the observed speed statistics and the congestion-adjusted desired speed estimates obtained using the modified KM framework for HRLs and non-HRLs across the analyzed highway corridors. The sample sizes $n$ range from 2,723 to 5,008 observations per location, providing a robust empirical basis for estimating the observed and the interaction-adjusted speed distributions. The mean observed speeds represent operating speeds influenced by vehicle interactions, whereas the modified KM desired speed $E_{d}$ summarizes the underlying desired speed distribution after accounting for right-censored observations caused by vehicle-following conditions. Therefore, the difference between these two measures quantifies the extent to which vehicle interactions suppress the observed speed distribution. The constrained fraction $ϕ$ indicates the proportion of observations affected by vehicle interactions, reflecting the presence of platooning or traffic interference rather than unconstrained flow conditions.

Table 10.

Comparison of Observed Speed Statistics and Kaplan–Meier (KM) Desired Speed Estimates for HRLs and non-HRLs

Location ID	Location type	Sample size (n)	Constrained fraction (ϕ)	Mean observed speed (km/h)	KM desired speed, $E_{d}$ (km/h)	SD, $V_{0}$ (km/h)	Maximum speed, $τ$ (km/h)
NH 709	HRL	3,275	0.626	54.63	92.11	31.24	135
NH 709	Non-HRL	3,182	0.578	51.89	85.32	24.86	105
NH 334B S1	HRL	5,008	0.782	53.91	72.02	30.63	116
NH 334B S1	Non-HRL	4,175	0.664	49.75	55.31	24.30	109
NH 334B S2	HRL	3,094	0.631	54.53	99.41	50.65	123
NH 334B S2	Non-HRL	4,014	0.249	51.71	65.32	28.46	111
SH 14	HRL	3,300	0.753	53.01	91.58	39.42	119
SH 14	Non-HRL	3,179	0.705	48.27	83.53	22.39	109
SH 16A	HRL	3,235	0.492	50.95	74.64	19.36	121
SH 16A	Non-HRL	3,222	0.247	50.05	55.84	36.16	117
SH 22	HRL	2,723	0.775	51.31	71.21	35.97	128
SH 22	Non-HRL	3,032	0.329	47.65	48.66	39.92	97

Note: HRL = hazardous road location; SD = standard deviation.
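As a worked illustration of the arithmetic behind Table 10, the short sketch below computes, for each corridor, the gap between the KM desired speed and the mean observed speed, and the HRL minus non-HRL contrast on both measures. The speed values are copied from Table 10; the dictionary layout and function names are hypothetical conveniences.

```python
# (location, type) -> (mean observed speed, KM desired speed), km/h, from Table 10
table10 = {
    ("NH 709", "HRL"): (54.63, 92.11), ("NH 709", "Non-HRL"): (51.89, 85.32),
    ("NH 334B S1", "HRL"): (53.91, 72.02), ("NH 334B S1", "Non-HRL"): (49.75, 55.31),
    ("NH 334B S2", "HRL"): (54.53, 99.41), ("NH 334B S2", "Non-HRL"): (51.71, 65.32),
    ("SH 14", "HRL"): (53.01, 91.58), ("SH 14", "Non-HRL"): (48.27, 83.53),
    ("SH 16A", "HRL"): (50.95, 74.64), ("SH 16A", "Non-HRL"): (50.05, 55.84),
    ("SH 22", "HRL"): (51.31, 71.21), ("SH 22", "Non-HRL"): (47.65, 48.66),
}


def suppression_gap(location, kind):
    """KM desired speed minus mean observed speed (km/h)."""
    observed, desired = table10[(location, kind)]
    return round(desired - observed, 2)


def hrl_contrast(location):
    """HRL minus non-HRL difference for (observed mean, KM desired speed)."""
    obs_h, km_h = table10[(location, "HRL")]
    obs_n, km_n = table10[(location, "Non-HRL")]
    return round(obs_h - obs_n, 2), round(km_h - km_n, 2)


for loc in ["NH 709", "NH 334B S1", "NH 334B S2", "SH 14", "SH 16A", "SH 22"]:
    obs_diff, km_diff = hrl_contrast(loc)
    print(f"{loc:11s} gap(HRL)={suppression_gap(loc, 'HRL'):6.2f} "
          f"gap(non-HRL)={suppression_gap(loc, 'Non-HRL'):6.2f} "
          f"obs diff={obs_diff:5.2f} KM diff={km_diff:5.2f}")
```

Both contrasts are positive in every corridor, so the HRL values exceed the non-HRL values on the observed and the KM-adjusted measures alike.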

The constrained fraction $ϕ$ further indicates that a considerable proportion of observations are influenced by vehicle-following interactions across the HRL and non-HRL segments. In several corridors, HRLs exhibit constrained fractions comparable to or even higher than those of non-HRLs (e.g., NH 709, NH 334B S1, NH 334B S2, and SH 14), suggesting that HRLs are not systematically associated with lower traffic interaction levels or free-flow conditions. The observed mean speed differential between HRLs and non-HRLs is relatively small on SH 16A, NH 334B S2, and NH 709.

The distinction between interaction-influenced observed speeds and congestion-adjusted desired speeds is further illustrated by the CDFs shown in Figures 9–14. The CDF plots illustrate the observed speed distributions along with the KM desired speed distributions for HRLs and non-HRLs across the six NH and SH corridors. In each of Figures 9 to 14, the red curve represents the observed speed CDF at HRLs, the green curve represents the observed speed CDF at non-HRLs, the black curve represents the KM estimated desired speed CDF for HRLs, and the blue curve represents the corresponding KM estimated desired speed CDF for non-HRLs. The observed curves reflect interaction-affected operating speeds, and the KM curves represent congestion-adjusted desired speed distributions, accounting for right-censored observations because of vehicle-following conditions. A comparison between the observed HRL and non-HRL curves shows that, across all six corridors, the HRL distributions are generally shifted toward higher speeds than the corresponding non-HRL distributions. This pattern is most evident on NH 334B S1, SH 14, and SH 22, where the red curves lie consistently to the right of the green curves over most of the central speed range, indicating that vehicles traversing hazardous segments often operate at higher speeds than those in non-hazardous locations. A similar pattern is observed across NH 709, NH 334B S2, and SH 16A, although with varying degrees of separation. These CDF representations are consistent with the observed mean speeds reported in Table 10, where HRLs exhibit higher operating speeds than non-HRLs across all six highways, with the largest observed differences on SH 14 (53.01 versus 48.27 km/h), NH 334B S1 (53.91 versus 49.75 km/h), and SH 22 (51.31 versus 47.65 km/h).

Figure 9.

Comparison of observed speed CDFs for HRLs and non-HRLs with the modified KM estimated desired speed distribution for NH 709.

Figure 10.

Comparison of observed speed CDFs for HRLs and non-HRLs with the modified KM estimated desired speed distribution for NH 334B S1.

Figure 11.

Comparison of observed speed CDFs for HRLs and non-HRLs with the modified KM estimated desired speed distribution for NH 334B S2.

Figure 12.

Comparison of observed speed CDFs for HRLs and non-HRLs with the modified KM estimated desired speed distribution for SH 14.

Figure 13.

Comparison of observed speed CDFs for HRLs and non-HRLs with the modified KM estimated desired speed distribution for SH 16A.

Figure 14.

Comparison of observed speed CDFs for HRLs and non-HRLs with the modified KM estimated desired speed distribution for SH 22.

The intra-location comparison between observed and KM curves further highlights the effect of traffic interaction on observed speeds. In all corridors, the KM desired speed curves are shifted to the right of the corresponding observed curves, indicating that the desired speed distribution is higher than the directly observed operating-speed distribution once vehicle-following effects are accounted for. This pattern is most apparent in the HRL curves for NH 334B S1, NH 334B S2, SH 14, and SH 22, where the large separation between the red and black curves indicates a substantial suppression of observed speeds because of traffic interaction. A comparable but generally smaller separation is visible between the green and blue curves for non-HRLs. These CDF placements are consistent with the numerical differences between the mean observed speed and the KM desired speed listed in Table 10. For example, at the NH 334B S2 HRL, the observed mean speed is 54.53 km/h, while the KM desired speed is 99.41 km/h; similarly, at the SH 14 HRL, the observed mean speed is 53.01 km/h, whereas the KM desired speed is 91.58 km/h. This suggests that field-observed spot speeds underestimate the underlying desired speed potential when vehicle-following effects are present.

The most important comparison, however, is between the KM-adjusted HRL and non-HRL curves, because this directly tests whether the speed differences persist after accounting for congestion and platooning effects. Across all six corridors, the KM HRL curve (black) is predominantly shifted to the right of the KM non-HRL curve (blue), indicating that hazardous locations have a higher desired speed distribution even after interaction-induced suppression is accounted for. This is particularly evident on NH 334B S1, NH 334B S2, SH 14, SH 16A, and SH 22, where the black curve lies at higher speeds than the blue curve over the majority of the distribution. The corridor-specific differences in separation magnitude suggest that the extent to which hazardous locations differ from non-hazardous locations is not uniform across corridors; however, the overall pattern remains consistent: HRLs generally exhibit higher desired speed potential than non-HRLs. This indicates that the observed speed differences between HRLs and non-HRLs cannot be attributed solely to variations in instantaneous traffic density during the measurement windows, but rather reflect intrinsic roadway and behavioral characteristics, including longer uninterrupted tangents, higher perceived design speeds, or delayed speed adaptation near intersections, roadside minor accesses, fuel stations, and merging and diverging sections.

In conclusion, the observed speed statistics, the KM desired speed estimates, and the corresponding CDF visualizations provide a consistent understanding of speeding behavior across roadway segments. The mean observed speeds reflect operational conditions influenced by interactions, and the KM-based desired speeds isolate the intrinsic desired speed potential by treating vehicle-following observations as right-censored. The CDF plots further confirm that the speed distributions at HRLs systematically extend into higher-speed ranges than those at non-HRLs. This indicates that higher operating speeds and a greater likelihood of extreme-speed events characterize hazardous segments. These patterns align with well-established road safety theory, which links higher desired speeds and greater speed variability to higher crash risk ( 33 , 34 ). Therefore, the combined evidence suggests that the increased speeds observed near HRLs reflect intrinsic speed-selection behavior linked to roadway characteristics rather than solely variations in traffic density.

Discussion

This study investigates the statistical dynamics of vehicular spot speeds at identified HRLs and non-HRLs across six NH and SH corridors using a multi-stage data fractionation (i.e., at 25%, 50%, 75%, and 100%) and a multi-layered analytical framework. The framework combines continuous probability distribution modeling, inferential statistical testing, and congestion-adjusted desired speed estimation. The aim is to examine whether there is a statistically significant difference in speed behavior between identified HRLs and non-HRLs on rural two-lane highways under mixed traffic conditions.

The distribution fitting results demonstrate that the GEV distribution consistently provides the best statistical representation of the observed spot speed data across all highways and sampling fractions. The GEV model’s superiority is supported by penalized fit criteria such as AIC and BIC, and reflects the GEV’s ability to capture skewness, heavy tails, and high-speed extremes. These characteristics are commonly observed in mixed traffic environments with weak lane discipline and frequent overtaking behavior ( 68 ). Furthermore, as the sample fractions increased, GEV parameter estimates showed consistent convergence, indicating reduced estimation variance and improved reliability, a pattern consistent with conventional asymptotic characteristics. The heavy-tail sensitivity of the GEV distribution is particularly useful for capturing rare but critical extreme-speed events that often contribute disproportionately to crash risk ( 4 ). GEV-based modeling enhances the diagnostic capacity of speed-based safety assessments by accurately modeling both central and extreme behavior ( 16 , 18 , 23 ).
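A minimal sketch of this kind of model comparison is given below, assuming SciPy's maximum likelihood `fit` method and the usual definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L. The data are synthetic, only three of the seven candidate distributions are shown for brevity, and the helper name `score_fit` is hypothetical; it is an illustration, not the study's fitting code.

```python
import numpy as np
from scipy import stats


def score_fit(data, dist):
    """Fit `dist` by maximum likelihood and return (AIC, BIC, params)."""
    params = dist.fit(data)
    loglik = np.sum(dist.logpdf(data, *params))
    k, n = len(params), len(data)
    aic = 2 * k - 2 * loglik
    bic = k * np.log(n) - 2 * loglik
    return aic, bic, params


# Synthetic right-skewed "spot speeds" in km/h; NOT the study data.
# Note: SciPy's shape parameter c has the opposite sign to the usual GEV xi.
rng = np.random.default_rng(0)
speeds = stats.genextreme.rvs(c=-0.1, loc=48, scale=9, size=2000, random_state=rng)

candidates = {
    "normal": stats.norm,
    "logistic": stats.logistic,
    "GEV": stats.genextreme,
}
scores = {name: score_fit(speeds, dist) for name, dist in candidates.items()}

# Lower AIC/BIC indicates a better penalized fit.
for name, (aic, bic, _) in sorted(scores.items(), key=lambda kv: kv[1][0]):
    print(f"{name:9s} AIC={aic:10.1f} BIC={bic:10.1f}")
```

Repeating the same scoring on 25%, 50%, 75%, and 100% subsamples would show how the fitted location, scale, and shape parameters settle as the fraction grows.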

The comparative analysis of speed distributions using the KS test provides strong evidence that the distributional form of speeds differs significantly between HRLs and non-HRLs on the majority of highways. Significant KS test outcomes across major highways indicate that HRLs exhibit higher speeds, higher variability, and heavier right-tailed characteristics, which are often associated with increased crash potential ( 69 , 70 ). These distributional characteristics were graphically supported by empirical CDF and PDF plots, which showed HRLs with flatter, right-shifted CDFs and broader PDFs, indicating higher variability and a higher likelihood of extreme-speed events. These characteristics align with established risk theory, which implies that crash likelihood increases in environments where the average speeds and variability surpass typical operating conditions ( 64 ). The stability of the KS test results across multiple sampling repetitions further strengthens the conclusion that speed behavior systematically differs across risk categories. Using four mutually independent repetitions for each highway and sample fraction, this study confirmed that the distributional differences did not depend on any specific data configuration. The consistent statistical significance observed across several highways in different repetitions indicates robustness to temporal fluctuations, traffic heterogeneity, and sampling variability ( 44 , 52 , 71 , 72 ). Repetitions that consistently yielded significant results presumably captured broader traffic conditions characterized by more pronounced behavioral heterogeneity, providing a stronger differential capability. This methodological design, focused on repetition-based validation, provides an additional buffer against overfitting or misinterpretation arising from single-sample analyses, a common shortcoming in earlier road traffic speed-safety research ( 73 ).
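The repetition-based design can be sketched as follows, assuming simple random sampling with replacement at each data fraction and SciPy's `ks_2samp` for the two-sample KS test. The synthetic speeds, the seed, and the function name `ks_by_fraction` are illustrative assumptions rather than the study's actual data or code.

```python
import numpy as np
from scipy import stats


def ks_by_fraction(hrl, non_hrl, fractions=(0.25, 0.5, 0.75, 1.0), reps=4, seed=0):
    """Two-sample KS test on repeated subsamples drawn with replacement.

    Returns a list of (fraction, repetition, D statistic, p-value) tuples.
    """
    rng = np.random.default_rng(seed)
    rows = []
    for frac in fractions:
        for rep in range(reps):
            a = rng.choice(hrl, size=int(frac * len(hrl)), replace=True)
            b = rng.choice(non_hrl, size=int(frac * len(non_hrl)), replace=True)
            res = stats.ks_2samp(a, b)
            rows.append((frac, rep, res.statistic, res.pvalue))
    return rows


# Synthetic speeds (km/h) with a 4 km/h shift between location types; NOT study data.
data_rng = np.random.default_rng(1)
hrl_speeds = data_rng.normal(52.0, 10.0, size=3000)
non_hrl_speeds = data_rng.normal(48.0, 10.0, size=3000)

rows = ks_by_fraction(hrl_speeds, non_hrl_speeds)
for frac, rep, d_stat, p_val in rows:
    print(f"fraction={frac:.2f} rep={rep} D={d_stat:.3f} p={p_val:.2e}")
```

Counting how many of the four repetitions reject the null hypothesis at each fraction gives a simple stability summary of the kind discussed above.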

In addition to distributional assessments, the comparison of mean speeds between HRLs and non-HRLs revealed a consistent and significant pattern, where hazardous locations exhibit higher mean speeds across almost all data fractions and highways. The two-sample t-tests validate the statistical significance of these differences, with several highways, such as NH 709, NH 334B S1, and SH 22, exhibiting notably high t-statistics, indicating substantial divergence in central tendencies ( 74 , 75 ). These findings provide empirical support for the long-standing understanding that high speed remains a principal determinant of crash likelihood and crash severity ( 76 , 77 ). However, the joint consideration of the KS and t-test results highlights subtle differences. The majority of highways showed good alignment between the two tests, while certain corridors, such as SH 16A, showed disagreement between them. This discrepancy suggests that speed behavior alone does not adequately explain the potential crash risk in these locations, implying the influence of geometric design factors, road environment conditions, vehicle conditions, traffic volume, and road user behavior ( 78 , 79 ). Similarly, NH 709 showed cases where the t-test was significant while the KS test was not, highlighting situations in which mean speeds differ despite similar distributional shapes. These discrepancies align with theoretical expectations: t-tests capture differences in central tendency, whereas the KS test responds to differences anywhere in the distribution, including spread and shape. These findings highlight the significance of using both statistical measures to capture complementary dimensions of speed behavior.
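The complementary behavior of the two tests can be seen in a small constructed example, sketched below under stated assumptions: two synthetic samples are built from normal quantiles so that they share the same mean but differ in spread. Welch's t-test (`ttest_ind` with `equal_var=False`) is used here as one reasonable choice; the t-test variant used in the study is not specified in this section.

```python
import numpy as np
from scipy import stats

# Deterministic samples built from normal quantiles: equal means (50 km/h),
# different standard deviations (5 versus 15 km/h). Purely illustrative.
q = np.linspace(0.001, 0.999, 500)
narrow = stats.norm.ppf(q, loc=50, scale=5)
wide = stats.norm.ppf(q, loc=50, scale=15)

# Welch's t-test compares means only.
t_res = stats.ttest_ind(narrow, wide, equal_var=False)
# The KS test compares the full empirical CDFs.
ks_res = stats.ks_2samp(narrow, wide)

print(f"t-test: t={t_res.statistic:.3f}, p={t_res.pvalue:.3f}")
print(f"KS test: D={ks_res.statistic:.3f}, p={ks_res.pvalue:.2e}")
```

Here the t-test finds no difference in means while the KS test detects the difference in spread, which is the sense in which the two tests capture different dimensions of speed behavior.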

Given the underlying roadway environment and traffic characteristics, the observed speeds at identified HRLs and non-HRLs during short-term measurements may partly reflect transient congestion rather than intrinsic speed behavior. Vehicles that operate in platoons or under car-following conditions may experience speed suppression because of preceding vehicles, which can obscure drivers’ intrinsic speed-choice behavior. To address this issue, this study incorporates the KM framework to estimate congestion-adjusted desired speed distributions. In this approach, vehicles traveling under interaction conditions are treated as right-censored observations and unconstrained vehicles are treated as uncensored observations. The comparison of mean observed speeds with KM estimated desired speeds reveals that traffic interactions substantially suppress the observed speed distributions in HRLs and non-HRLs. In several corridors, the KM-derived desired speed estimates substantially exceed the corresponding observed mean speeds, indicating that interaction effects over the observation period influence the measured spot speeds. However, the KM-based results also show that HRLs generally maintain higher desired speeds and exhibit broader upper-tail speed characteristics, even after accounting for these interaction effects. This finding suggests that the higher-speed behavior observed at hazardous locations cannot be attributed solely to temporary variations in traffic density during the observation period. Instead, it reflects intrinsic roadway characteristics and behavioral responses associated with hazardous roadway environments. The KM-based CDFs further illustrate the distinction between interaction-influenced operating speeds and the desired speed. In the majority of corridors, the desired speed CDF lies to the right of the observed speed CDF, confirming that the KM approach effectively reconstructs the suppressed portion of the speed distribution. Furthermore, HRL distributions frequently exhibit greater dispersion and more pronounced upper-tail characteristics than non-HRL segments, reinforcing the notion that hazardous segments correlate with higher speed potential and more heterogeneous driver behavior.

The combined evidence from distribution modeling, inferential testing, and KM-based desired speed analysis provides a comprehensive characterization of speed behavior across hazardous and non-hazardous roadway environments. Hazardous locations are characterized by higher mean speeds, broader dispersion, heavier upper tails, and higher desired speed potential. These characteristics indicate that hazardous segments allow or encourage speed choices that are inconsistent with the surrounding roadway environment or conflict density. Therefore, this study’s results validate the adopted methodological design, comprising repeated subsampling, KS and t-test comparisons, continuous probability distribution fitting, and KM-based desired speed distributions, as a robust framework for analyzing road user speeding behavior in hazardous and non-hazardous roadway environments under heterogeneous traffic conditions.

From a traffic management perspective, these findings indicate that uniform corridor-level speed limits are insufficient to address localized risk mechanisms ( 80 , 81 ). Speed management strategies should be driven by distribution-based diagnostics, especially those capturing upper-tail behavior. Previous studies indicate that crashes on rural highways are disproportionately associated with speed variance and extreme speeds rather than mean speeds alone ( 33 , 82 , 83 ). Therefore, the fitted GEV parameters and percentile-based speed thresholds derived from HRLs can be used to identify locations where a small proportion of high-speed vehicles substantially increase crash risk ( 84 ). Given the crash history on these NHs and SHs, the HRLs are concentrated near roadside eateries, fuel stations, minor access points, and culverts, where merging and diverging movements interact directly with uninterrupted through traffic.

The findings support the implementation of location-specific speed transition strategies before access-dense sections. Empirical evidence from rural highway studies suggests that gradual speed reduction zones, supported by advance warning signs and perceptual speed reduction measures, are more effective than sudden speed limit changes ( 85 , 86 ). Distributional speed analysis can inform the placement and length of these transition zones by identifying the distance at which upper-tail speeds start to deviate from non-hazardous patterns ( 87 – 89 ). This approach aligns speed control with observed driving behavior rather than relying solely on prescriptive geometric standards.

The heavy-tailed speed distributions observed at hazardous segments also provide a strong empirical justification for targeted speed enforcement strategies. Rather than enforcing compliance uniformly across all vehicles, enforcement efforts can be concentrated on the upper tail of the speed distribution, where marginal reductions provide the most significant safety benefits ( 90 , 91 ). Automated speed enforcement, mobile enforcement units, or dynamic speed feedback displays can be strategically placed on the approach to an identified hazardous location, prioritized using GEV-based exceedance probability. Data-driven, targeted interventions are also more likely to be accepted by road users and can significantly improve enforcement efficiency, particularly in resource-constrained rural areas ( 92 , 93 ).
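One way such a prioritization could be computed is sketched below, using the survival function and percent-point function of SciPy's `genextreme`. The site names, GEV parameters, and the 80 km/h threshold are hypothetical placeholders, not values estimated in this study.

```python
from scipy import stats

# Hypothetical fitted GEV parameters per site (SciPy convention: c = -xi).
sites = {
    "site A": {"c": -0.10, "loc": 52.0, "scale": 9.0},
    "site B": {"c": -0.10, "loc": 48.0, "scale": 9.0},
}
THRESHOLD_KMH = 80.0  # assumed enforcement threshold, for illustration only


def tail_summary(params, threshold=THRESHOLD_KMH):
    """Return (85th percentile speed, P(speed > threshold)) for a fitted GEV."""
    c, loc, scale = params["c"], params["loc"], params["scale"]
    v85 = stats.genextreme.ppf(0.85, c, loc=loc, scale=scale)
    exceed = stats.genextreme.sf(threshold, c, loc=loc, scale=scale)
    return float(v85), float(exceed)


summary = {name: tail_summary(p) for name, p in sites.items()}

# Rank sites by exceedance probability, highest first.
ranked = sorted(summary, key=lambda name: summary[name][1], reverse=True)
for name in ranked:
    v85, exceed = summary[name]
    print(f"{name}: V85={v85:.1f} km/h, P(v > {THRESHOLD_KMH:.0f})={exceed:.4f}")
```

Sites with the largest exceedance probability would be the first candidates for upper-tail-focused enforcement or speed feedback displays.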

Conclusions

This study aimed to examine whether there is a statistically significant difference in speed behavior between identified HRLs and non-HRLs on rural two-lane highways under mixed traffic conditions. The HRLs and non-HRLs were systematically identified using an NB-L model and a KDEB approach, based on police-reported, georeferenced crash data obtained from the State Police CCTNS for 2017–2019. The study used an integrated analytical framework that incorporates two-sample KS tests, t-tests, and continuous probability distribution modeling of spot speeds. Seven continuous probability distribution functions, including the normal, lognormal, gamma, logistic, Weibull, Burr, and GEV distributions, were applied to determine the best-fitting distributions for distinct locations on NHs and SHs. KS tests and t-tests were used to determine statistically significant differences in spot speeds across HRLs and non-HRLs within a multifractional data and repetition design framework. This study treats each measurement site as an independent observational unit, maintains a constant observation duration, and focuses on location-specific speed behavior rather than the effects of spatial or temporal coverage. The combination of GoF metrics (AIC and BIC), distributional parameter estimates (location, scale, and shape), and inferential testing (two-sample KS) has allowed for a statistically robust and empirically validated evaluation of speed behavior concerning road safety.

In addition, the modified KM analysis is used to distinguish the desired speeds from the observed speeds, accounting for the effects of traffic density and congestion. The KM technique facilitates the reconstruction of the underlying desired speed distribution for each location by differentiating between unconstrained and interaction-constrained observations using headway-based censoring. The resulting desired speed estimates showed that, even after accounting for vehicle-following effects, HRLs generally maintain higher desired speed potential, greater speed dispersion, and stronger upper-tail speed behavior than non-HRLs. This confirms that the higher-speed characteristics at hazardous segments cannot be attributed solely to differences in instantaneous traffic density during the observation period but are also associated with roadway characteristics and behavioral factors.

This study provides statistically stable and practically relevant insights on speed-related risk differences between HRLs and non-HRLs. The results demonstrate that the GEV distribution consistently outperforms other distributions in HRLs and non-HRLs, highlighting its empirical adaptability and suitability for modeling spot speed data and capturing extreme tail behavior in heterogeneous traffic conditions, where standard normality assumptions for vehicular speed data may be invalidated. The comparative findings across various data fractions indicate that larger data sizes improve model reliability, parameter stability, and inferential robustness, underscoring the statistical benefits of larger sample sizes in transportation safety research. Furthermore, the KS test findings indicate that spot speeds significantly differ across HRLs and non-HRLs on the majority of highways, with HRLs exhibiting higher speeds, higher variability and heavier right-tailed characteristics, which are often associated with increased crash risk. The stability of the KS test results across multiple sampling repetitions further strengthens the notion that speed behavior varies substantially between HRLs and non-HRLs. The combined results from the KS test, t-test, and GEV model indicate that speed behavior, characterized by distributional shape, central tendency, variability, and tail behavior, serves as a robust, statistically validated criterion for distinguishing the associated risk between HRLs and non-HRLs.

This study demonstrates that speed distribution analysis can extend beyond simple descriptive characterization to facilitate targeted, data-driven safety interventions on rural two-lane highways. The findings indicate that HRLs are more consistently characterized by increased dispersion and a propensity for extreme speeds than by uniformly higher mean speeds, particularly in areas with frequent roadside access and complex merging and diverging conditions. For safety management, the findings support a shift toward segment-specific speed management strategies that prioritize locations exhibiting heavy-tailed speed distributions, especially near eateries, fuel stations, minor access roads, and culverts. Distribution-informed interventions, such as graduated speed transition zones, upper-tail-focused enforcement, and context-sensitive speed feedback systems, offer a scalable, evidence-based approach to reduce crash risk in heterogeneous traffic environments. By explicitly linking speed distribution characteristics to roadway environment and access-related risk mechanisms, the proposed framework provides a realistic decision-support tool for improving safety on rural undivided highways.

Despite its empirical rigor, this study has certain limitations. It focused exclusively on a univariate distributional analysis of traffic speed and therefore does not capture the full range of factors that affect road users’ risk level, speeding behavior, and crash likelihood, such as traffic flow, geometric design characteristics, meteorological conditions, lighting, enforcement intensity, and driver gender and age composition. Therefore, future studies may extend the spot speed analysis to include multilane highways and integrate the aforementioned factors using a multivariate modeling approach to achieve a more comprehensive understanding of speed-risk relationships. A longitudinal approach that considers temporal variations in speeding behavior during day and night, peak and off-peak periods, and seasonal variations such as summer, winter, and rainy seasons may uncover dynamic patterns that are frequently overlooked in static analyses. The observed stabilization of distributional estimates after approximately three-fourths of the full data provides empirical insight for conducting efficient spot speed analysis under similar traffic and roadway environments. Further studies may focus on determining universal minimum sample size thresholds, using longer monitoring periods and continuous data sources to improve model robustness.

Supplemental Material

Supplemental material, sj-docx-1-trr-10.1177_03611981261448538 for Evaluating the Reliability and Consistency of Statistical Models for Speed Distribution Analysis at Hazardous and Non-Hazardous Roadway Locations: Multifraction Data Set Approach by Parveen Kumar, Geetam Tiwari and Sourabh Bikas Paul in Transportation Research Record

Footnotes

Acknowledgements

We would like to extend our gratitude to all faculty members and researchers who have provided invaluable support throughout our research work. We would also like to express our gratitude to Parvinder Ghanghas for his help with data collection through the site surveys.

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: Parveen Kumar; data collection: Parveen Kumar; analysis and interpretation of results: Parveen Kumar, Geetam Tiwari, and Sourabh Bikas Paul; draft manuscript preparation: Parveen Kumar, Geetam Tiwari, and Sourabh Bikas Paul. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors acknowledge the research fellowship from the Ministry of Housing and Urban Affairs (MoHUA), erstwhile the Ministry of Urban Development (MoUD), Government of India (IITD/TRIPP/MUL/2023-2024/25436).

Data Accessibility Statement

The data will be available on reasonable request.

Supplementary Material

Supplemental material for this article is available online.

References

1. Bhalla, Mohan, and O’Neill. How Much Would Low- and Middle-Income Countries Benefit from Addressing the Key Risk Factors of Road Traffic Injuries? International Journal of Injury Control and Safety Promotion, Vol. 27, No. 1, 2020, pp. 89–90. https://doi.org/10.1080/17457300.2019.1708411

2. Peden, Scurfield, Sleet, Mohan, Hyder, A. A., Jarawan, and Mathers. World Report on Road Traffic Injury Prevention. World Health Organization, Geneva 27, Switzerland, 2004.

3. Bhalla, Mohan, and O’Neill. What Can We Learn from the Historic Road Safety Performance of High-Income Countries? International Journal of Injury Control and Safety Promotion, Vol. 27, No. 1, 2020, pp. 27–34. https://doi.org/10.1080/17457300.2019.1704789

4. Sarkar, D. R., and Kumar. An Investigation of Traffic Speed Distributions for Uninterrupted Flow at Blackspot Locations in a Mixed Traffic Environment. IATSS Research, Vol. 48, No. 2, 2024, pp. 180–188. https://doi.org/10.1016/j.iatssr.2024.03.004

5. MoRTH. Road Accidents in India 2022. Transport Research Wing, Ministry of Road Transport and Highways, New Delhi, 2023.

6. Kumar, Tiwari, and Paul, S. B. Road Safety Studies at Micro, Meso, and Macroscopic Levels: A Systematic Review. IATSS Research, Vol. 49, No. 1, 2025, pp. 10–26. https://doi.org/10.1016/j.iatssr.2024.12.001

7. Williams, A. F., Kyrychenko, S. Y., and Retting, R. A. Characteristics of Speeders. Journal of Safety Research, Vol. 37, No. 3, 2006, pp. 227–232. https://doi.org/10.1016/j.jsr.2006.04.001

8. Møller and Haustein. Peer Influence on Speeding Behaviour among Male Drivers Aged 18 and 28. Accident Analysis and Prevention, Vol. 64, 2014, pp. 92–99. https://doi.org/10.1016/j.aap.2013.11.009

9. Atombo, Zhong, and Zhang. Investigating the Motivational Factors Influencing Drivers Intentions to Unsafe Driving Behaviours: Speeding and Overtaking Violations. Transportation Research Part F: Traffic Psychology and Behaviour, Vol. 43, 2016, pp. 104–121. https://doi.org/10.1016/j.trf.2016.09.029

10. Horswill, M. S., and Coster, M. E. The Effect of Vehicle Characteristics on Drivers’ Risk-Taking Behaviour. Ergonomics, Vol. 45, No. 2, 2002, pp. 85–104. https://doi.org/10.1080/00140130110115345

11. Wang, Quddus, M. A., and Ison, S. G. The Effect of Traffic and Road Characteristics on Road Safety: A Review and Future Research Direction. Safety Science, Vol. 57, 2013, pp. 264–275. https://doi.org/10.1016/j.ssci.2013.02.012

12. Fernandes, Hatfield, and Soames Job, R. F. A Systematic Investigation of the Differential Predictors for Speeding, Drink-Driving, Driving While Fatigued, and Not Wearing a Seat Belt, among Young Drivers. Transportation Research Part F: Traffic Psychology and Behaviour, Vol. 13, No. 3, 2010, pp. 179–196. https://doi.org/10.1016/j.trf.2010.04.007

13. Yeh, M. S., Tseng, C. M., Liu, H. H., and Tseng, L. S. The Factors of Female Taxi Drivers’ Speeding Offenses in Taiwan. Transportation Research Part F: Traffic Psychology and Behaviour, Vol. 32, 2015, pp. 35–45. https://doi.org/10.1016/j.trf.2015.04.005

14. Mahmoud, Abdel-Aty, and Cai. Factors Contributing to Operating Speeds on Arterial Roads by Context Classifications. Journal of Transportation Engineering, Part A: Systems, Vol. 147, No. 8, 2021. https://doi.org/10.1061/jtepbs.0000548

15. Dey, P. P., Chandra, and Gangopadhaya. Speed Distribution Curves under Mixed Traffic Conditions. Journal of Transportation Engineering, Vol. 132, No. 6, 2006, pp. 475–481. https://doi.org/10.1061/(ASCE)0733-947X(2006)132:6(475)

16. Jun. Understanding the Variability of Speed Distributions under Mixed Traffic Conditions Caused by Holiday Traffic. Transportation Research Part C: Emerging Technologies, Vol. 18, No. 4, 2010, pp. 599–610. https://doi.org/10.1016/j.trc.2009.12.005

17. Vadeby and Forsman, Å. Changes in Speed Distribution: Applying Aggregated Safety Effect Models to Individual Vehicle Speeds. Accident Analysis and Prevention, Vol. 103, 2017, pp. 20–28. https://doi.org/10.1016/j.aap.2017.03.012

18. Mondal and Gupta. Speed Distribution for Interrupted Flow Facility under Mixed Traffic. Physica A: Statistical Mechanics and Its Applications, Vol. 570, 2021, p. 125798. https://doi.org/10.1016/j.physa.2021.125798

19.

Haight

F. A.

Mosher

W. H.

A Practical Method for Improving the Accuracy of Vehicular Speed Distribution Measurements. Highway Research Record, Vol. 341, 1962, pp. 92–116.

20.

Pillai

K. S.

Ramanayya

T. V.

Traffic Inputs for Simulation of Mixed Traffic on a Digital Computer. Proc., National System Conference, Indore, India, 1977, pp. 66–71.

21.

McLean

J. R.

Observed Speed Distributions and Rural Road Traffic Operations. In Australian Road Research Board Conference Proc, vol. 9, no. 5, 1978.

22.

Hashim

I. H.

Analysis of Speed Characteristics for Rural Two-Lane Roads: A Field Study from Minoufiya Governorate, Egypt. Ain Shams Engineering Journal, Vol. 2, No. 1, 2011, pp. 43–52. https://doi.org/10.1016/j.asej.2011.05.005

23.

Zou

Zhang

Use of Skew-Normal and Skew-t Distributions for Mixture Modeling of Freeway Speed Data. Transportation Research Record: Journal of the Transportation Research Board, 2011. 2260(1): 67–75. https://doi.org/10.3141/2260-08

24.

Harding

S. E.

Badami

M. G.

Reynolds

C. C. O.

Kandlikar

Auto-Rickshaws in Indian Cities: Public Perceptions and Operational Realities. Transport Policy, Vol. 52, 2016, pp. 143–152. https://doi.org/10.1016/j.tranpol.2016.07.013

25.

Saha

Roy

Sarkar

A. K.

Pal

Speed Distribution on Two-Lane Rural Highways with Mixed Traffic: A Case Study in North East India. Journal of The Institution of Engineers (India): Series A, Vol. 98, 2017, pp. 107–113. https://doi.org/10.1007/s40030-017-0208-0

26.

Maurya

A. K.

Das

Dey

Nama

Study on Speed and Time-Headway Distributions on Two-Lane Bidirectional Road in Heterogeneous Traffic Condition. Transportation Research Procedia, Vol. 17, 2016, pp. 428–437. https://doi.org/10.1016/j.trpro.2016.11.084

27.

Atombo

Zhong

Zhang

28.

Sarkar

D. R.

Rao

K. R.

Chatterjee

Automatic Traffic Safety Analysis Using Unmanned Aerial Vehicle Technology at Unsignalized Intersections in Heterogeneous Traffic. Transportation Research Record: Journal of the Transportation Research Board, 2025. 2679(2): 1274–1290. https://doi.org/10.1177/03611981241266838

29.

Hauer

Identification of Sites with Promise. Transportation Research Record: Journal of the Transportation Research Board, 1996(1542): 54–60. https://doi.org/10.3141/1542-09

30.

Bisht

L. S.

Tiwari

Identification of Road Traffic Crashes Hotspots on an Intercity Expressway in India Using Geospatial Techniques. IATSS Research, Vol. 47, No. 3, 2023, pp. 349–356. https://doi.org/10.1016/j.iatssr.2023.07.003

31.

Huang

Chin

H. C.

Haque

Empirical Evaluation of Alternative Approaches in Identifying Crash Hot Spots. Transportation Research Record: Journal of the Transportation Research Board, 2009. 2103: 32–41. https://doi.org/10.3141/2103-05

32.

Meng

A Note on Hotspot Identification for Urban Expressways. Safety Science, Vol. 66, 2014, pp. 87–91. https://doi.org/10.1016/j.ssci.2014.02.006

33.

Wang

Zhou

Quddus

Fan

Fang

Speed, Speed Variation and Crash Relationships for Urban Arterials. Accident Analysis and Prevention, Vol. 113, 2018, pp. 236–243. https://doi.org/10.1016/j.aap.2018.01.032

34.

Aljanahi

A. A. M.

Rhodes

A. H.

Metcalfe

A. V.

Speed, Speed Limits and Road Traffic Accidents under Free Flow Conditions. Accident Analysis and Prevention, Vol. 31, No. 1–2, 1999, pp. 161–168.

35.

Sarkar

D. R.

Rao

K. R.

Chatterjee

A Review of Surrogate Safety Measures on Road Safety at Unsignalized Intersections in Developing Countries. Accident Analysis and Prevention, Vol. 195, 2024, p. 107380. https://doi.org/10.1016/j.aap.2023.107380

36.

Hoogendoorn

S. P.

Unified Approach to Estimating Free Speed Distributions. Transportation Research Part B: Methodological, Vol. 39, No. 8, 2005, pp. 709–727. https://doi.org/10.1016/j.trb.2004.09.001

37.

Hoogendoorn

S. P.

Vehicle-Type and Lane–Specific Free Speed Distributions on Motorways a Novel Estimation Approach Using Censored Observations. Transportation Research Record: Journal of the Transportation Research Board, 2005. 1934(1): 148–156.

38.

Directorate of Census Operations, Haryana Census 2011. Ministry of Home Affairs, Government of India. https://haryana.census.gov.in/census

39.

MoRTH. Road Transport Year Book (2019-20), M. Transport Research Wing, Ministry of Road Transport and Highways, IDA Building, Jamnagar House, Shahjahan Road, New Delhi, 110011, New Delhi, 2021.

40.

Haryana Public Works Department (B&R). Annual Administrative Report of Public Works Department, Haryana for the Year of 2020-21. Haryana Public Works (B&R) Department, Nirman Sadan, Plot No. 1 & 2, Dakshin Marg, Sector-33 A, Chandigarh-160020, 2020.

41.

MoRTH. Annual Report 2022-2023. Transport Research Wing, Ministry of Road Transport and Highways, IDA Building, Jamnagar House, Shahjahan Road, New Delhi, 110011, 2023.

42.

Kumar

Sarkar

D. R.

Conflict-Based Crash Risk Estimation of Heterogeneous Lane-Changing Traffic at the Panipat Toll Plaza (NH-44, India) Using Surrogate Safety Measures and UAV-Based Trajectory Data. IATSS Research, Vol. 49, No. 4, 2025, pp. 580–592. https://doi.org/10.1016/j.iatssr.2025.11.005

43.

Eluru

Chakour

Chamberlain

Miranda-Moreno

L. F.

Modeling Vehicle Operating Speed on Urban Roads in Montreal: A Panel Mixed Ordered Probit Fractional Split Model. Accident Analysis & Prevention, Vol. 59, 2013, pp. 125–134. https://doi.org/10.1016/j.aap.2013.05.016

44.

Afghari

A. P.

Haque

M. M.

Washington

Applying Fractional Split Model to Examine the Effects of Roadway Geometric and Traffic Characteristics on Speeding Behavior. Traffic Injury Prevention, Vol. 19, No. 8, 2018, pp. 860–866. https://doi.org/10.1080/15389588.2018.1509208

45.

Ignatiadis

Klaus

Zaugg

J. B.

Huber

Data-Driven Hypothesis Weighting Increases Detection Power in Genome-Scale Multiple Testing. Nature Methods, Vol. 13, No. 7, 2016, pp. 577–580. https://doi.org/10.1038/nmeth.3885

46.

Uddin

Dataset Meta-Level and Statistical Features Affect Machine Learning Performance. Scientific Reports, Vol. 14, No. 1, 2024. https://doi.org/10.1038/s41598-024-51825-x

47.

McNeish

D. M.

Stapleton

L. M.

The Effect of Small Sample Size on Two-Level Model Estimates: A Review and Illustration. Educational Psychology Review, Vol. 28, 2016, pp. 295–314. https://doi.org/10.1007/s10648-014-9287-x

48.

Viechtbauer

Hypothesis Tests for Population Heterogeneity in Meta-Analysis. British Journal of Mathematical and Statistical Psychology, Vol. 60, No. 1, 2007, pp. 29–60. https://doi.org/10.1348/000711005X64042

49.

Ziakopoulos

Spatial Analysis of Harsh Driving Behavior Events in Urban Networks Using High-Resolution Smartphone and Geometric Data. Accident Analysis & Prevention, Vol. 157, 2021, p. 106189. https://doi.org/10.1016/j.aap.2021.106189

50.

Ziakopoulos

Vlahogianni

Antoniou

Yannis

Spatial Predictions of Harsh Driving Events Using Statistical and Machine Learning Methods. Safety Science, Vol. 150, 2022. https://doi.org/10.1016/j.ssci.2022.105722

51.

Bramich

D. M.

Menéndez

Ambühl

Fitting Empirical Fundamental Diagrams of Road Traffic: A Comprehensive Review and Comparison of Models Using an Extensive Data Set. IEEE Transactions on Intelligent Transportation Systems, Vol. 23, No. 9, 2022, pp. 14104–14127. https://doi.org/10.1109/TITS.2022.3142255

52.

Deng

Liao

Understanding the Distribution Characteristics of Bus Speed Based on Geocoded Data. Transportation Research Part C: Emerging Technologies, Vol. 82, 2017, pp. 337–357. https://doi.org/10.1016/j.trc.2017.07.004

53.

Drobinski

Coulais

Jourdier

Surface Wind-Speed Statistics Modelling: Alternatives to the Weibull Distribution and Performance Evaluation. Boundary-Layer Meteorology, Vol. 157, 2015, pp. 97–123. https://doi.org/10.1007/s10546-015-0035-7

54.

Kumar

Tiwari

Paul

S. B.

Segment Length Optimization for Crash Frequency Modelling: Evaluating Power Spectral Segment Length in Safety Performance Assessment. Accident Analysis and Prevention, Vol. 219, 2025. https://doi.org/10.1016/j.aap.2025.108122

55.

Roy

Saha

Headway Distribution Models of Two-Lane Roads under Mixed Traffic Conditions: A Case Study from India. European Transport Research Review, Vol. 10, 2018, pp. 1–12. https://doi.org/10.1007/s12544-017-0276-2

56.

Akgül

F. G.

Şenoʇlu

Arslan

An Alternative Distribution to Weibull for Modeling the Wind Speed Data: Inverse Weibull Distribution. Energy Conversion and Management, Vol. 114, 2016, pp. 234–240. https://doi.org/10.1016/j.enconman.2016.02.026

57.

Huang

Shu

Chan

P. W.

Copula-Based Joint Distribution Analysis of Wind Speed and Wind Direction: Wind Energy Development for Hong Kong. Wind Energy, Vol. 9, 2023, pp. 900–922. https://doi.org/10.1002/we.2847

58.

Wang

Chen

Bai

Chen

Wang

New Estimation Method of Wind Power Density with Three-Parameter Weibull Distribution: A Case on Central Inner Mongolia Suburbs. Wind Energy, Vol. 25, No. 2, 2022, pp. 368–386. https://doi.org/10.1002/we.2677

59.

Cam

Lo Yang

Asymptotics in Statistics–Some Basic Concepts, Springer, New York, NY, 2000.

60.

Afify

A. Z.

Mohamed

O. A.

A New Three-Parameter Exponential Distribution with Variable Shapes for the Hazard Rate: Estimation and Applications. Mathematics, Vol. 8, No. 1, 2020, p. 135. https://doi.org/10.3390/math8010135

61.

Gómez

Y. M.

Gallardo

D. I.

Marchant

Sánchez

Bourguignon

An In-Depth Review of the Weibull Model with a Focus on Various Parameterizations. Mathematics, Vol. 12, No. 1, 2024, p. 56. https://doi.org/10.3390/math12010056

62.

Lai

C. D.

Murthy

D. N.

Xie

Weibull Distributions and Their Applications. In Springer Handbook of Engineering Statistics ( Pham

, ed.), Springer Handbooks Springer, London, 2006, pp. 63–78.

63.

Hussain

Feng

Grzebieta

Brijs

Olivier

The Relationship between Impact Speed and the Probability of Pedestrian Fatality during a Vehicle-Pedestrian Crash: A Systematic Review and Meta-Analysis. Accident Analysis and Prevention, Vol. 129, 2019, pp. 241–249. https://doi.org/10.1016/j.aap.2019.05.033

64.

Aarts

Van Schagen

Driving Speed and the Risk of Road Crashes: A Review. Accident Analysis and Prevention, Vol. 38, No. 2, 2006, pp. 215–224. https://doi.org/10.1016/j.aap.2005.07.004

65.

Khanuja

R. K.

Tiwari

Safety-in-Numbers for Route Choice of Bicycle Trips: A Choice Experiment Approach for Commuters. Accident Analysis and Prevention, Vol. 203, 2024. https://doi.org/10.1016/j.aap.2024.107624

66.

Khanuja

R. K.

Tiwari

A Comparative Study of the Spatial Variability of Cyclist, Pedestrian, and Motorised Two-Wheeler Rider Fatalities in an Urban Area. Journal of Transport and Health, Vol. 47, 2026. https://doi.org/10.1016/j.jth.2025.102228

67.

Kutela

Ngeni

Ruseruka

Chengula

T. J.

Novat

Shita

Kinero

The Influence of Roadway Characteristics and Built Environment on the Extent of Over-Speeding: An Exploration Using Mobile Automated Traffic Camera Data. International Journal of Transportation Science and Technology, Vol. 17, 2025, pp. 120–130. https://doi.org/10.1016/j.ijtst.2024.03.003

68.

Khanuja

R. K.

Tiwari

Exploring Determinants of Bicycle Fatalities in Urban Mixed-Traffic Road Segments: A Bayesian Microscopic Study. IATSS Research, Vol. 50, 2026, pp. 801–810. https://doi.org/10.1016/j.iatssr.2026.02.010

69.

Damani

Vedagiri

Multivariate Analysis of Following and Filtering Manoeuvres of Motorized Two Wheelers in Mixed Traffic Conditions. IATSS Research, Vol. 47, No. 2, 2023, pp. 121–133. https://doi.org/10.1016/j.iatssr.2023.05.004

70.

Bin-Nun

A. Y.

Lizarazo

Panasci

Madden

R. J.

Tebbens. What Do Surrogate Safety Metrics Measure? Understanding Driving Safety as a Continuum. Accident Analysis and Prevention, Vol. 195, 2024. https://doi.org/10.1016/j.aap.2023.107245

71.

Cam

Lo Yang

Asymptotics in Statistics–Some Basic Concepts, Springer, New York, NY, 2000.

72.

Bali

T. G.

The Generalized Extreme Value Distribution. Economics Letters, Vol. 79, No. 3, 2003, pp. 423–427. https://doi.org/10.1016/S0165-1765(03)00035-1

73.

Thams

Saengkyongam

Pfister

Peters

Statistical Testing under Distributional Shifts. Journal of the Royal Statistical Society. Series B: Statistical Methodology, Vol. 85, No. 3, 2023, pp. 597–663. https://doi.org/10.1093/jrsssb/qkad018

74.

Hou

Edara

Sun

Speed Limit Effectiveness in Short-Term Rural Interstate Work Zones. Transportation Letters, Vol. 5, No. 1, 2013, pp. 8–14. https://doi.org/10.1179/1942786712Z.0000000002

75.

Pudasaini

Haule

Y.-J.

Empirical Analysis of Dilemma Zone Using High-Resolution Event Data. Transportmetrica B: Transport Dynamics, Vol. 12, No. 1, 2024. https://doi.org/10.1080/21680566.2024.2379376

76.

Ewing

Hamidi

Grace

J. B.

Urban Sprawl as a Risk Factor in Motor Vehicle Crashes. Urban Studies, Vol. 53, No. 2, 2016, pp. 247–266. https://doi.org/10.1177/0042098014562331

77.

Charlton

S. G.

Starkey

N. J.

Risk in Our Midst: Centrelines, Perceived Risk, and Speed Choice. Accident Analysis and Prevention, Vol. 95, 2016, pp. 192–201. https://doi.org/10.1016/j.aap.2016.07.019

78.

Gargoum

S. A.

El-Basyouny

Exploring the Association between Speed and Safety: A Path Analysis Approach. Accident Analysis and Prevention, Vol. 93, 2016, pp. 32–40. https://doi.org/10.1016/j.aap.2016.04.029

79.

Soltani

Afshari

Amiri

M. A.

Time-Series Projecting Road Traffic Fatalities in Australia: Insights for Targeted Safety Interventions. Injury, Vol. 56, No. 3, 2025. https://doi.org/10.1016/j.injury.2025.112166

80.

Keller

M. E.

Watson

Kaye

S. A.

King

Lewis

Experts’ Perspectives on Shared Responsibility for Speed Management: A Thematic Analysis Informed by Systems Thinking. Accident Analysis and Prevention, Vol. 221, 2025. https://doi.org/10.1016/j.aap.2025.108185

81.

Sahebi

Abrishami

Mokhtarian

Meskar

Spatial-Temporal Planning of Road Traffic Speed Management Mobile Resources: Enhancing Road Traffic Safety by Optimizing Resource Utilization. Traffic Injury Prevention, 2025, pp. 1–10. https://doi.org/10.1080/15389588.2025.2509826

82.

Quddus

Exploring the Relationship Between Average Speed, Speed Variation, and Accident Rates Using Spatial Statistical Models and GIS. Journal of Transportation Safety and Security, Vol. 5, No. 1, 2013, pp. 27–45. https://doi.org/10.1080/19439962.2012.705232

83.

Figueroa Medina

Tarko

Speed Factors on Two-Lane Rural Highways in Free-Flow Conditions. Transportation Research Record: Journal of the Transportation Research Board, 2005. 1912: 39–46. https://doi.org/10.1177/0361198105191200105

84.

Sarkar

D. R.

Rao

K. R.

Chatterjee

Crash Risk Assessment at Unsignalized Intersections Using Vehicle Trajectory Data. IATSS Research, Vol. 49, No. 4, 2025, pp. 459–469. https://doi.org/10.1016/j.iatssr.2025.09.001

85.

Gehlert

Schulze

Schlag

Evaluation of Different Types of Dynamic Speed Display Signs. Transportation Research Part F: Traffic Psychology and Behaviour, Vol. 15, No. 6, 2012, pp. 667–675. https://doi.org/10.1016/j.trf.2012.07.004

86.

Liu

Zhu

Wang

Xia

Literature Review and Prospect on the Study of Perceptual Speed Reduction. IEEE International Conference on Service Operations and Logistics, and Informatics, Beijing, China, October 12–15, 2008, pp. 342–346.

87.

Russo

Biancardo

S. A.

Dell’Acqua

Consistent Approach to Predictive Modeling and Countermeasure Determination by Crash Type for Low-Volume Roads. Baltic Journal of Road and Bridge Engineering, Vol. 9, No. 2, 2014, pp. 77–87. https://doi.org/10.3846/bjrbe.2014.10

88.

Matírnez

Mántaras

D. A.

Luque

Reducing Posted Speed and Perceptual Countermeasures to Improve Safety in Road Stretches with a High Concentration of Accidents. Safety Science, Vol. 60, 2013, pp. 160–168. https://doi.org/10.1016/j.ssci.2013.07.003

89.

Ghadiri

S. M. R.

Prasetijo

Sadullah

A. F.

Hoseinpour

Sahranavard

Intelligent Speed Adaptation: Preliminary Results of on-Road Study in Penang, Malaysia. IATSS Research, Vol. 36, No. 2, 2013, pp. 106–114. https://doi.org/10.1016/j.iatssr.2012.08.001

90.

Elvik

A Re-Parameterisation of the Power Model of the Relationship between the Speed of Traffic and the Number of Accidents and Accident Victims. Accident Analysis and Prevention, Vol. 50, 2013, pp. 854–860. https://doi.org/10.1016/j.aap.2012.07.012

91.

Soole

D. W.

Watson

B. C.

Fleiter

J. J.

Effects of Average Speed Enforcement on Speed Compliance and Crashes: A Review of the Literature. Accident Analysis and Prevention, Vol. 54, 2013, pp. 46–56. https://doi.org/10.1016/j.aap.2013.01.018

92.

Sohail

Cheema

M. A.

Ali

M. E.

Toosi

A. N.

Rakha

H. A.

Data-Driven Approaches for Road Safety: A Comprehensive Systematic Literature Review. Safety Science, Vol. 158, 2023, pp. 1–23. https://doi.org/10.1016/j.ssci.2022.105949

93.

Wegman

Berg

H. Y.

Cameron

Thompson

Siegrist

Weijermars

Evidence-Based and Data-Driven Road Safety Management. IATSS Research, Vol. 39, pp. 19–25.
