Comprehensive Investigation of Commercial Motor Vehicle Crashes along Roadway Segments in Kentucky

Abstract

The negative binomial (NB) model, traditionally used for safety performance function (SPF) development, assumes a fixed over-dispersion parameter and is only valid for over-dispersed data (i.e., data whose variance exceeds the mean). A more flexible approach is the zero-inflated heterogeneous Conway–Maxwell–Poisson (ZI-HTCMP) model, an extension of Conway–Maxwell–Poisson (CMP)-based models that handles over-dispersed data, under-dispersed data, and excess zero counts, and allows the dispersion parameter to vary as a function of site-specific characteristics. This study develops fatal + injury (FI) commercial motor vehicle (CMV) crash-specific SPFs for four roadway segment facilities in Kentucky, U.S. (urban multilane, rural multilane, urban two-lane, and rural two-lane segments). The traditional NB model and the newly introduced CMP-based models (ZI-HTCMP, zero-inflated Conway–Maxwell–Poisson [ZI-CMP], heterogeneous Conway–Maxwell–Poisson [HTCMP], and CMP) were compared using 14,967 CMV-related crashes on Kentucky’s road segments (between 2015 and 2019) and various roadway variables, for example, shoulder width, annual average daily traffic (AADT), and heavy vehicle percentage (HVP). In the developed SPFs, AADT and HVP >10% significantly increased FI CMV-related crashes on all four segment facilities. Various goodness-of-fit (GOF) statistics, including Akaike information criterion (AIC), mean absolute deviance (MAD), and mean square prediction error (MSPE), were used for model assessment and selection. For all four roadway facilities, CMP-based models showed better model fitting and prediction performance than the NB models. Furthermore, ZI-HTCMP was the best-fit model for urban multilane segments, which had a high representation of zero-crash sites. CMP-family models could be used to effectively predict FI CMV-related crashes (with excess zeros) on road segments.

Keywords

safety, commercial vehicles, general, truck and bus data, trucks

Roadway segments are considered critical locations along the transportation network, mainly because of the higher incidence of speeding violations on them ( 1 ). Segment-related crashes could be even more severe when they involve commercial motor vehicles (CMVs) (i.e., large trucks and commercial buses), mainly because of their larger size and weight and the sight-distance obstruction they cause.

Figure 1 shows the distribution of “severe” CMV crashes (i.e., “K + A” or “fatal + suspected serious injury” crashes) and “fatal + injury” (FI) CMV crashes (i.e., “K + A + B + C” crashes on the KABCO scale of injury severity) in Kentucky, U.S., between 2015 and 2019, broken down by three main road entities (segments, intersections, and other, e.g., ramps and driveways). Note that the severe (K + A) and FI (K + A + B + C) percentages use different denominators: each is computed relative to the total number of crashes of that severity. For example, the percentage of severe (K + A) CMV crashes on road segments was 68.3%, obtained by dividing the number of severe CMV crashes on road segments (705) by the total number of severe CMV crashes (1,032), that is, 705 × 100/1,032 = 68.3%. Similarly, the percentage of FI (K + A + B + C) CMV crashes along road segments was 66.6% (i.e., 3,335 × 100/5,010). As seen, roadway segments account for the highest percentage of both severe and FI CMV crashes in Kentucky. Compared with intersections, roadway segments have more than double the share of both severe and FI CMV crashes.

Figure 1.

Distribution of severe and fatal + injury (FI) commercial motor vehicle (CMV) crashes by roadway entity in Kentucky (2015–2019).

The safety investigation of different crash types along roadway segments is well-represented in the literature. However, few studies have specifically analyzed CMV-related crashes along roadway segments, especially for crash frequency analysis and safety performance function (SPF) development. Instead, the majority of CMV safety studies have focused on analyzing crash injury severity (see, for example, Azimi et al., and Bernard and Mondy) ( 2 , 3 ). To the authors’ knowledge, CMV-crash-specific SPFs have very rarely been developed along roadway segments to serve as the benchmark for CMV crash prediction along those hazardous roadway entities.

For SPF (or crash prediction model) development, the negative binomial (NB) model is the primary approach adopted in the Highway Safety Manual (HSM) ( 4 ). However, the traditional NB model restricts the over-dispersion parameter to be fixed across the sites. Moreover, the traditional NB model is only valid for over-dispersed data (i.e., data with variance greater than the mean). A more flexible modeling approach that could handle both over-dispersed and under-dispersed data and exhibits a varying dispersion parameter is the heterogeneous Conway–Maxwell–Poisson (HTCMP) model, which is an extension of the traditional Conway–Maxwell–Poisson (CMP) model. A more-refined form of the HTCMP model that could handle excess zeros while fitting an SPF (e.g., for fitting FI CMV crashes) is the zero-inflated heterogeneous Conway–Maxwell–Poisson (ZI-HTCMP) model. The ZI-HTCMP model incorporates varying parameter distribution as part of the “zero-model portion,” in addition to varying dispersion parameter as a function of site-specific characteristics, as well as over-dispersed and under-dispersed data handling flexibility. Despite the advantages of CMP-family models, they have not been applied in the safety literature for developing FI CMV crash SPFs (i.e., models with excess zero counts).

This study has two objectives: 1) to conduct a preliminary analysis of FI CMV-related crashes along roadway segments in Kentucky; and 2) to develop FI CMV crash-specific SPFs on roadway segments using the traditional NB model and the newly introduced CMP-based models (ZI-HTCMP, ZI-CMP, HTCMP, and CMP) for four segment facilities in the HSM (urban multilane, rural multilane, urban two-lane, and rural two-lane). To accomplish these objectives, comprehensive data were used, including 14,967 CMV-related crashes along Kentucky’s road segments (between 2015 and 2019) and a range of roadway variables (e.g., right shoulder width, median barrier type, and functional classification).

Literature Review

CMV-Related Studies

Few studies have investigated CMV crash frequencies along roadway segments. For example, Daniel and Chien investigated truck crashes on urban arterials in New Jersey ( 5 ). NB and Poisson regression models were developed using truck-crash data collected from 1998 to 2000. The NB model fit the over-dispersed crash data better than the Poisson model. Also applying the NB and Poisson models, Dissanayake and Amarasingha investigated the factors contributing to large-truck collisions on highway sections in Kansas between 2005 and 2010 ( 6 ). They found that lane width, shoulder width, and roadway grade significantly influenced large-truck crashes. Dong et al. identified the characteristics related to truck-involved crashes on Tennessee highways between 2004 and 2007 using the bivariate zero-inflated NB model ( 7 ). Consistent with Dissanayake and Amarasingha ( 6 ), lane width and right shoulder width significantly affected truck crashes on Tennessee highways. Another CMV-related study on road segments was conducted by Zhou et al. ( 8 ). However, that study mainly dealt with traffic operations (rather than traffic safety) by estimating passenger car equivalents on level freeway segments in Nebraska.

As noted from the safety literature, the majority of CMV-related studies have focused on analyzing crash injury severity. For example, Azimi et al. examined the severity of large-truck crashes on state highways in Florida using crashes from 2007 to 2016 and random-parameter ordered logit models ( 2 ). The authors found that roadway characteristics, such as shoulder type (paved versus unpaved), significantly affected the severity. Bernard and Mondy used Missouri crash data from 2002 to 2012 and chi-square automatic interaction detector decision tree models to investigate the relationship between driver gender and injury severity in large-truck-related crashes ( 3 ). They found that, for female drivers, following too closely, physical impairment, and improper passing were strongly associated with increased crash injury severity.

Safety Studies Performed on Roadway Segments

Alarifi et al. analyzed roadway-corridor-related crashes (including segments) that took place between 2010 and 2012 in Florida, using the multivariate hierarchical Poisson-lognormal spatial joint model ( 9 ). The most significant factor affecting corridor-related crashes was the segment’s annual average daily traffic (AADT). In another Florida study, Pande et al. used classification-tree-based models to analyze crash frequency between 2004 and 2008 on one corridor in Pasco County ( 10 ). The authors found that the higher the heavy vehicle percentage (HVP), the higher the collision risk. Other segment-related safety studies can be found in Zheng et al., Zhang et al., and Afghari et al. ( 11 – 13 ).

SPF-Related Studies

Lord and Park developed SPFs by applying the standard NB and generalized NB (GNB) models to 5 years (1997–2001) of crashes at rural three-legged intersections in California ( 14 ). The GNB model showed better statistical performance in relation to the goodness-of-fit (GOF) measures, as well as in the identification of hazardous sites. Lu et al. developed and compared the simple (or AADT-only) and full SPF models using the standard NB model ( 15 ). Both SPFs were developed for total and fatal/injury crashes that occurred on urban four-lane freeway interchange influence areas in Florida from 2007 to 2010. They showed that both models resulted in similar predictive performance and network screening results.

The Conway–Maxwell–Poisson (CMP) modeling approach has rarely been used for SPF development. Recently, Shirani-bidabadi et al. used the CMP model to develop separate SPFs for five facilities, including urban two-lane undivided segments, urban four-lane divided/undivided segments, rural two-lane undivided segments, urban four-leg and three-leg signalized intersections, and urban four-leg and three-leg stop-controlled intersections ( 16 ). The CMP models were compared to the multivariate adaptive regression splines (MARS) data mining technique. A total of 1,311 bicycle-vehicle crashes collected from 2011 through 2015 in Alabama were used. The MARS models were found to outperform the corresponding CMP models for all five roadway facilities.

Some safety studies have applied zero-inflated models to analyze crash count data characterized by excessive zeros. For example, using crash data collected in the state of Washington from 2002 to 2005, Easa and You developed different SPFs, including Poisson, NB, zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB) models, to investigate the safety impacts of geometric design elements on traffic crashes that occurred on two-lane rural highways ( 17 ). The ZIP model was found to be the best-fit approach. The application of the ZINB model can also be found in Apronti et al. and Anastasopoulos ( 18 , 19 ). For example, Apronti et al. examined the safety effectiveness of steep grade advance warning signs used for truck safety on Wyoming’s mountainous passes ( 18 ). The ZINB model was developed as an SPF tool using 10-year (2006–2015) crash records in Wyoming. The study showed that the advance warning systems used did not significantly reduce truck crash risks at high-risk locations.

Literature Review Summary and Study Contribution to the State-of-the-Art

The literature review revealed few studies that specifically investigated CMV (or large truck and bus) crash frequencies along roadway segments. In addition, to the authors’ knowledge, few, if any, studies have developed SPFs for predicting FI CMV crashes along roadway segments. This study attempts to fill this research gap by analyzing FI CMV-related crashes along roadway segments in Kentucky and developing SPFs tailored to FI CMV crashes on Kentucky’s roadway segments, applying the traditional NB model and several CMP-based models (i.e., ZI-HTCMP, ZI-CMP, HTCMP, and CMP) to four segment facilities in the HSM (urban multilane, rural multilane, urban two-lane, and rural two-lane) using 14,967 CMV-related crashes along Kentucky’s roadway segments (2015–2019).

Data Collection and Preparation

General Data Preparation for Preliminary CMV Crash Investigation

The data used were collected from two different databases provided by the Kentucky Transportation Cabinet (KYTC). These were the roadway inventory database (containing information on roadway characteristics and traffic volume for all roadways in the state of Kentucky), as well as the crash database (consisting of 5-year [2015–2019] CMV-related crashes). The 5-year CMV crash data was then integrated with the roadway inventory data based on the milepost of each crash, roadway ID, and the beginning and ending milepost of each road segment. Since this study investigates CMV crashes on road segments only, all intersection-related crashes, occurring within 250 ft of the center of the intersection or ramp terminal, were excluded.
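The milepost-based integration and the exclusion of intersection-related crashes can be sketched as follows. This is only an illustrative sketch under stated assumptions: the field names (`route_id`, `begin_mp`, `end_mp`, `milepost`), both helper functions, and the sample records are hypothetical and are not taken from the KYTC databases.

```python
FT_PER_MILE = 5280.0
INTERSECTION_BUFFER_MI = 250.0 / FT_PER_MILE  # 250 ft expressed in miles

def match_crash_to_segment(crash, segments):
    """Return the segment whose route ID and milepost range contain the crash, else None."""
    for seg in segments:
        if (seg["route_id"] == crash["route_id"]
                and seg["begin_mp"] <= crash["milepost"] <= seg["end_mp"]):
            return seg
    return None

def is_intersection_related(crash, intersection_mileposts):
    """True if the crash lies within 250 ft of an intersection or ramp terminal on its route."""
    return any(abs(crash["milepost"] - mp) <= INTERSECTION_BUFFER_MI
               for mp in intersection_mileposts.get(crash["route_id"], []))

# Hypothetical inventory and crash records (illustration only)
segments = [
    {"route_id": "R1", "begin_mp": 0.0, "end_mp": 2.5},
    {"route_id": "R1", "begin_mp": 2.5, "end_mp": 6.0},
]
intersection_mileposts = {"R1": [2.5]}
crashes = [
    {"crash_id": 1, "route_id": "R1", "milepost": 1.20},
    {"crash_id": 2, "route_id": "R1", "milepost": 2.52},  # about 106 ft from the intersection
    {"crash_id": 3, "route_id": "R1", "milepost": 4.80},
]

# Drop intersection-related crashes, then attach each remaining crash to its segment
segment_crashes = [c for c in crashes if not is_intersection_related(c, intersection_mileposts)]
matched = {c["crash_id"]: match_crash_to_segment(c, segments) for c in segment_crashes}
```

In this toy example, crash 2 is discarded because it falls inside the 250-ft buffer, while crashes 1 and 3 are assigned to the first and second segment, respectively.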

After removing incomplete crash records, a total of 14,967 CMV crashes remained in the final merged database. Of the 14,967 CMV crashes, 11,482 crashes occurred on multilane road segments, of which 6,304 crashes (55%) occurred in urban areas and the remaining 5,178 crashes (45%) occurred in rural areas. Of the remaining 3,485 crashes on two-lane roadways, 2,899 (83%) took place in rural areas and the remaining 586 crashes (17%) occurred in urban areas. To conduct the chi-square test of independence and the Z-test of proportions for the preliminary investigation, all continuous variables (e.g., speed limit, HVP, and right shoulder width) were transformed into dummy (or indicator) variables.

Data Preparation for CMV Crash SPF Development

To develop FI CMV-related SPFs, CMV-related crashes were screened to only include FI crashes. These crashes were then aggregated into roadway segments. In this regard, roadway sections were split into homogeneous segments based on the homogeneity in roadway geometric and traffic characteristics. To this end, the right shoulder width, number of traffic lanes, area type (urban or rural), roadway type (divided or undivided), and AADT were used to produce the final homogeneous segments. A homogeneous segment ends where at least one of these variables changes and a new homogeneous segment starts afterwards. This segmentation regime produces road segments in which geometric design, traffic conditions, or both, are consistent for each homogeneous segment.

Figure 2 shows how eight imaginary homogeneous segments were generated based on any changes in the five roadway characteristics. For example, it can be seen that the first homogeneous segment (Homog. Seg. 1) ended where the roadway changed from divided to undivided, while other roadway characteristics remained unchanged (i.e., area type is rural, AADT = 2,000, number of traffic lanes = 2, and right shoulder width = 4 ft). Similarly, the second homogeneous segment (Homog. Seg. 2) ended where area type changed from rural to urban, while other roadway attributes were still consistent. Similar explanations can be given for the other homogeneous segments (Homog. Seg. 3 through Homog. Seg. 8). Segments shorter than 0.1 mi were removed from the database to avoid the issue of low exposure ( 20 ).

Figure 2.

Schematic diagram of homogeneous roadway segmentation in this study.
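The segmentation rule described above (start a new homogeneous segment whenever at least one of the five attributes changes, then discard segments shorter than 0.1 mi) can be sketched as follows. The record layout, the attribute keys in `ATTRS`, the function name, and the sample sections are hypothetical; the sketch assumes the input sections are contiguous along one route and sorted by beginning milepost.

```python
ATTRS = ("rsw_ft", "lanes", "area", "divided", "aadt")  # the five segmentation variables
MIN_LENGTH_MI = 0.1

def homogeneous_segments(sections):
    """Merge consecutive sections with identical attributes, then drop short segments."""
    merged = []
    for sec in sections:
        key = tuple(sec[a] for a in ATTRS)
        if merged and merged[-1]["key"] == key:
            merged[-1]["end_mp"] = sec["end_mp"]  # same attributes: extend the current segment
        else:
            merged.append({"begin_mp": sec["begin_mp"], "end_mp": sec["end_mp"], "key": key})
    # Segments shorter than 0.1 mi are removed to avoid low exposure
    return [s for s in merged if s["end_mp"] - s["begin_mp"] >= MIN_LENGTH_MI]

def section(begin_mp, end_mp, rsw_ft, lanes, area, divided, aadt):
    return {"begin_mp": begin_mp, "end_mp": end_mp, "rsw_ft": rsw_ft,
            "lanes": lanes, "area": area, "divided": divided, "aadt": aadt}

# Hypothetical inventory sections along one route
sections = [
    section(0.00, 1.00, 4, 2, "rural", True, 2000),
    section(1.00, 2.00, 4, 2, "rural", True, 2000),   # identical attributes: merged with previous
    section(2.00, 3.50, 4, 2, "rural", False, 2000),  # divided -> undivided: new segment
    section(3.50, 3.55, 4, 2, "urban", False, 2000),  # only 0.05 mi long: dropped
    section(3.55, 5.00, 4, 2, "urban", False, 5000),  # AADT changes: new segment
]
result = homogeneous_segments(sections)
```

Here the five input sections collapse into three homogeneous segments (0.00 to 2.00, 2.00 to 3.50, and 3.55 to 5.00 mi).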

After the segmentation process, a total of 3,139 homogeneous segments were produced with lengths varying from 0.102 to 11.08 mi and an average length of 1.85 mi. Of the 3,139 homogeneous segments, 63 zero-crash segments were randomly collected across the state of Kentucky to develop valid SPFs for different roadway facilities. For each roadway facility, the homogeneous segments were then randomly split into the calibration (75% or 2,354 segments) and validation (25% or 785 segments) datasets, where the former was used to develop crash frequency models and the latter was used to assess the GOF of the fitted models. Table 1 shows descriptive statistics of explanatory variables used in SPF development for the four roadway segment facilities.
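A random 75%/25% split of one facility's segments can be sketched as follows. The function name, the seed, and the use of integer segment IDs are assumptions for illustration; the sketch does not reproduce the authors' actual random draw. The site count used in the example (611 urban multilane sites) is the sum of the calibration and validation site counts reported in Table 1.

```python
import random

def split_calibration_validation(segment_ids, calibration_share=0.75, seed=0):
    """Randomly split segment IDs into calibration and validation subsets."""
    ids = list(segment_ids)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    rng.shuffle(ids)
    n_calibration = int(len(ids) * calibration_share)
    return ids[:n_calibration], ids[n_calibration:]

# Urban multilane facility: 458 + 153 = 611 sites in Table 1
calibration, validation = split_calibration_validation(range(611))
n_calibration, n_validation = len(calibration), len(validation)
```

With 611 sites, truncating 0.75 × 611 = 458.25 gives 458 calibration and 153 validation sites, which agrees with the urban multilane counts in Table 1.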

Table 1.

Descriptive Statistics of Explanatory Variables

Variable	Variable description	Urban multilane				Rural multilane				Urban two-lane				Rural two-lane
Variable	Variable description	Min.	Max.	Mean	SD	Min.	Max.	Mean	SD	Min.	Max.	Mean	SD	Min.	Max.	Mean	SD
Continuous variables (calibration dataset):
Ln(AADT)	Natural logarithm of AADT	7.32	12.26	10.43	0.89	7.27	11.19	9.79	0.77	5.56	10.79	8.65	0.80	3.37	9.91	7.34	1.08
HDC	Horizontal degree of curve	0	11	0.35	1.26	1	11	0.26	1.11	0	28	2.29	5.61	0	28	3.02	6.05
Vertical grade (%)	Vertical grade (in percent)	0.20	7.50	1.79	1.28	0.20	5.50	1.90	1.12	0.20	7.5	1.08	1.37	0.20	8.50	1.38	1.79
IRI	Road IRI	27	399	95.48	48.57	28	167	63.69	25.12	42	400	131.71	61.97	30	483	113.86	43.80
SL (mph)	Speed limit (mph)	25	70	53.75	9.57	35	70	65.03	7.33	25	55	44.29	8.85	25	55	53.49	4.77
RSW (ft)	Right shoulder width (ft)	0	14	7.36	4.19	0	14	8.46	2.97	0	14	3.97	3.03	0	14	4.14	2.64
Lane width (ft)	Lane width (ft)	9	13	11.9	0.39	10	13	11.98	0.21	8	21	10.69	1.42	6	15	9.93	1.17
Number of lanes*	Total number of lanes	3	12	4.90	1.54	3	7	4.40	0.84	na*	na	na	na	na	na	na	na
Median width (ft)*	Median width (ft)	0	98	25.86	21.88	0	98	39.57	20.96	na	na	na	na	na	na	na	na
		Urban multilane				Rural multilane				Urban two-lane				Rural two-lane
Variable	Variable description	No. of sites		No. of FI CMV crashes		No. of sites		No. of FI CMV crashes		No. of sites		No. of FI CMV crashes		No. of sites		No. of FI CMV crashes
Number of sites and FI CMV crashes (calibration dataset)	Site and crash summary statistics (calibration dataset)	458		816		292		761		251		61		1,353		478
Number of sites and FI CMV crashes (validation dataset)	Site and crash summary statistics (validation dataset)	153		237		98		220		83		23		451		162
Indicator dummy variables (calibration dataset):
Road functional class
Interstate	1 if true, otherwise 0	161		686		146		620		1		2		0		0
Principal arterial	1 if true, otherwise 0	215		104		125		122		21		11		107		53
Minor arterial	1 if true, otherwise 0	77		25		15		10		138		30		275		132
Major collector	1 if true, otherwise 0	4		1		6		9		64		14		513		195
Minor collector	1 if true, otherwise 0	1		0		0		0		16		1		358		77
Local	1 if true, otherwise 0	0		0		0		0		11		3		100		21
Paved shoulder	1 if true, otherwise 0	339		765		284		752		204		54		1,326		473
HVP >10%	1 if HVP >10%, otherwise 0	199		558		248		732		66		23		445		201
Flat terrain type	1 if true, otherwise 0	186		314		89		194		78		20		115		43
Undivided*	1 if roadway is not divided, otherwise 0	88		34		12		13		na		na		na		na
Type of median barrier*:
Cable barrier	1 if true, otherwise 0	47		161		34		191		na		na		na		na
Concrete barrier	1 if true, otherwise 0	109		471		58		270		na		na		na		na
Guardrail	1 if true, otherwise 0	4		7		3		1		na		na		na		na

Note: AADT = annual average daily traffic; CMV = commercial motor vehicle; FI = fatal + injury; HVP = heavy vehicle percentage; IRI = international roughness index; Min. = minimum; Max. = maximum; SD = standard deviation; na = not applicable.

*Not applicable for urban and rural two-lane road segments.

Figure 3 shows histograms of FI CMV crash frequencies for the four different facilities. The figure illustrates that almost all the roadway facilities, especially rural two-lane segments and urban multilane segments, possess excess zeros in the crash data. Under such conditions, the standard count models (such as the NB and CMP models) might not be able to handle excess zeros. To better fit the crash data characterized by excess zero counts, the CMP-based models (including ZI-HTCMP, ZI-CMP, HTCMP, and CMP) were introduced and compared with the NB model in this study.

Figure 3.

Histograms of fatal + injury (FI) commercial motor vehicle (CMV) crash frequencies along road segments for: (a) urban multilane, (b) rural multilane, (c) urban two-lane, and (d) rural two-lane.

Methodology

Preliminary Analysis

The preliminary analysis was performed using the Z-test of proportions, chi-square test of independence, and odds ratio (OR) analysis. The Z-test of proportions was used to assess whether there was a significant difference between two proportions (e.g., the proportion of FI CMV crashes in urban versus rural areas). The Z-statistic can be calculated using Equations 1 and 2.

Z\text{-statistic} = \frac{\hat{P_{1}} - \hat{P_{2}}}{\sqrt{\frac{\bar{P} \bar{q}}{n_{1}} + \frac{\bar{P} \bar{q}}{n_{2}}}}

(1)

\bar{P} = \frac{\text{number of successes in sample 1} + \text{number of successes in sample 2}}{n_{1} + n_{2}}

(2)

where

$\hat{P_{1}}$ and $\hat{P_{2}}$ = the two sample proportions,

$n_{1}$ and $n_{2}$ = sample sizes, and

$\bar{q}$ = 1 – $\bar{P}$ .
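Equations 1 and 2 can be evaluated with a few lines of code. The sketch below is illustrative only: the function name and the input counts are hypothetical, and the two-sided P-value convention is an assumption (it is, however, consistent with Table 2, where a statistic of 2.961 in absolute value is reported with a P-value of 0.003).

```python
import math

def two_proportion_z_test(successes_1, n_1, successes_2, n_2):
    """Z-test of two proportions with a pooled proportion (Equations 1 and 2)."""
    p_hat_1 = successes_1 / n_1
    p_hat_2 = successes_2 / n_2
    p_bar = (successes_1 + successes_2) / (n_1 + n_2)   # Equation 2
    q_bar = 1.0 - p_bar
    std_error = math.sqrt(p_bar * q_bar / n_1 + p_bar * q_bar / n_2)
    z = (p_hat_1 - p_hat_2) / std_error                 # Equation 1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal tail probability
    return z, p_value

# Hypothetical counts: 180 FI crashes out of 1,000 (urban) vs. 200 out of 1,000 (rural)
z_stat, p_value = two_proportion_z_test(180, 1000, 200, 1000)
```

A negative statistic here means the second group (rural in this example) has the higher FI proportion, matching the sign convention used in Tables 2 and 3.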

The chi-square test of independence was performed to determine whether there was a significant association between two categorical variables, for example, “crash injury severity” and various crash and roadway features. Equation 3 shows the calculation of the chi-square test statistic.

X^{2} = \sum_{i = 1}^{n} \frac{{(O_{i} - E_{i})}^{2}}{E_{i}}

(3)

where

$X^{2}$ = chi-square test statistic,

$O_{i}$ = observed CMV crash frequency in cell $i$ ,

$E_{i}$ = expected CMV crash frequency in cell $i$ ,

$i$ = specific cell ID, and

$n$ = number of cells in the contingency table.
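As a worked illustration of Equation 3, the sketch below computes the statistic for a contingency table, with expected frequencies obtained from the row and column totals in the usual way. The function name and the 2 × 2 counts are hypothetical, and the closed-form P-value shown applies only to one degree of freedom.

```python
import math

def chi_square_statistic(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / total  # expected frequency under independence
            stat += (o_ij - e_ij) ** 2 / e_ij
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof

# Hypothetical 2 x 2 table: rows = feature present / absent, columns = FI / non-FI crashes
table = [[30, 70],
         [20, 80]]
stat, dof = chi_square_statistic(table)
p_value = math.erfc(math.sqrt(stat / 2.0))  # upper-tail probability, valid for dof = 1 only
```

For this table every expected FI count is 25 and every expected non-FI count is 75, so the statistic equals 1 + 1/3 + 1 + 1/3 = 8/3.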

As part of the preliminary safety investigation, the OR was used to assess the relative risk of FI CMV-related crashes between two groups, for example, urban two-lane facilities versus rural two-lane facilities. The OR indicates how likely it is that FI CMV-related crashes happen in urban two-lane facilities versus rural two-lane facilities. Equation 4 shows an example of how to calculate the OR for FI CMV-related crashes along urban two-lane segment facilities:

OR_{Urb.} = \frac{N_{\text{FI, urban 2-lane}} / N_{\text{non-FI, urban 2-lane}}}{N_{\text{FI, non-urban 2-lane}} / N_{\text{non-FI, non-urban 2-lane}}}

(4)

where

$N_{\text{FI, urban 2-lane}}$ = number of FI CMV-related crashes on urban two-lane facilities,

$N_{\text{FI, non-urban 2-lane}}$ = number of FI CMV-related crashes on non-urban two-lane facilities,

$N_{\text{non-FI, urban 2-lane}}$ = number of non-FI CMV crashes on urban two-lane facilities, and

$N_{\text{non-FI, non-urban 2-lane}}$ = number of non-FI CMV crashes on non-urban two-lane facilities.

A similar formula for OR_Rur. can be derived for rural facilities while replacing “urban” with “rural” and “non-urban” with “non-rural” in Equation 4. Note that, whenever the OR is >1, FI CMV-related crashes on a specific segment facility type are more likely to occur, and vice versa.
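A minimal numerical sketch of Equation 4 follows. The function name and all four counts are hypothetical and serve only to show the arithmetic.

```python
def odds_ratio(n_fi_group, n_non_fi_group, n_fi_other, n_non_fi_other):
    """Odds of an FI crash in the group of interest divided by the odds elsewhere (Equation 4)."""
    odds_group = n_fi_group / n_non_fi_group
    odds_other = n_fi_other / n_non_fi_other
    return odds_group / odds_other

# Hypothetical counts: 40 FI and 160 non-FI crashes in the group of interest,
# 100 FI and 900 non-FI crashes in the comparison group
or_example = odds_ratio(40, 160, 100, 900)
```

The odds are 0.25 in the group of interest and about 0.111 in the comparison group, giving an OR of 2.25, that is, a value above 1 and therefore a higher likelihood of FI outcomes in the group of interest.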

SPF Model Development

The NB model has traditionally been used to serve as an SPF tool in multiple safety studies, and readers are referred to Lord and Mannering for more details ( 21 ). The NB model can handle over-dispersion appropriately; however, it is unable to accommodate the issue of under-dispersion where the crash variance is less than its mean. Another limitation of the NB model is that it fails to properly estimate traffic crashes characterized by small sample size ( 7 ). The CMP regression model can appropriately account for the above issues. An advantage of the CMP model over the NB model is that the former can model both under-dispersed and over-dispersed count data ( 16 ). The probability mass function of a CMP distribution is presented by:

P (Y = y) = \frac{λ^{y}}{{(y!)}^{ν} Z (λ, ν)}, y = 0, 1, 2, \dots

(5)

\ln (λ) = β X_{i}

(6)

where

$λ$ approximates the mean of observations,

$β$ = vector of coefficients to be estimated,

$X_{i}$ = set of explanatory variables,

$ν$ = dispersion parameter (if $ν = 1$ , a CMP model reduces to the standard Poisson model; $ν > 1$ indicates under-dispersion; and $ν < 1$ indicates over-dispersion), and

$Z (λ, ν) = \sum_{s = 0}^{\infty} \frac{λ^{s}}{{(s!)}^{ν}}$ , which normalizes the distribution ( 22 ).
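The CMP probability mass function in Equation 5 can be evaluated numerically by truncating the infinite series for the normalizing constant. The sketch below is an assumption-laden illustration, not the estimation routine used in this study: the function names, the truncation rule (stop once a term falls 50 log-units below the largest term), and the parameter values are all hypothetical. The truncation may be inadequate when ν is very small and λ is large.

```python
import math

def cmp_log_normalizer(lam, nu, max_terms=2000):
    """log Z(lam, nu), computed in log space with the series truncated after the peak."""
    log_terms = []
    peak = -math.inf
    for s in range(max_terms):
        log_term = s * math.log(lam) - nu * math.lgamma(s + 1)  # log of lam^s / (s!)^nu
        log_terms.append(log_term)
        peak = max(peak, log_term)
        if log_term < peak - 50.0:  # remaining terms are negligible
            break
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

def cmp_pmf(y, lam, nu):
    """P(Y = y) for the CMP distribution (Equation 5)."""
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - cmp_log_normalizer(lam, nu))

def cmp_mean_variance(lam, nu, y_max=200):
    """Mean and variance obtained by summing over y = 0, ..., y_max - 1."""
    probs = [cmp_pmf(y, lam, nu) for y in range(y_max)]
    mean = sum(y * q for y, q in enumerate(probs))
    variance = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return mean, variance

poisson_check = cmp_pmf(2, 1.5, 1.0)                  # nu = 1: should equal the Poisson pmf
mean_over, var_over = cmp_mean_variance(1.5, 0.5)     # nu < 1: over-dispersed
mean_under, var_under = cmp_mean_variance(1.5, 2.0)   # nu > 1: under-dispersed
```

Setting ν = 1 recovers the Poisson probabilities, while ν = 0.5 and ν = 2 yield a variance above and below the mean, respectively, which is the flexibility that motivates the CMP family.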

Since this study modeled FI CMV-related crash frequency separately for different roadway facilities, many segments had no FI CMV crashes during the study period, which resulted in zero inflation in the crash data (refer to Figure 3). This study thus adopted three rarely used approaches: the ZI-HTCMP, HTCMP, and ZI-CMP models. ZI-HTCMP extends the ZI-CMP model by allowing the dispersion parameter to vary across observations. ZI-CMP was initially introduced by Sellers and Raim ( 23 ). The ZI-HTCMP model is a mixture of a distribution degenerate at zero, with probability $p$ , and a heterogeneous CMP distribution, with probability (1 − $p$ ) ( 24 – 26 ). An advantage of ZI-HTCMP over its ZINB counterpart is that it is more flexible in handling under-dispersion or over-dispersion while still accounting for zero-inflation, in addition to the varying dispersion parameter ( 26 ). The probability mass function for the ZI-HTCMP model is given as:

P (Y = y) = \begin{cases} p + (1 - p) \frac{1}{Z (λ, ν)} & \text{if } y = 0 \\ (1 - p) \frac{λ^{y}}{{(y!)}^{ν} Z (λ, ν)} & \text{if } y \geq 1 \end{cases}

(7)

logit (p) = \log (\frac{p}{1 - p}) = ω K_{i}

(8)

\log (ν) = γ ψ_{i}

(9)

where

$p$ ( $0 < p < 1$ ) = probability of being in the zero-crash state,

$K_{i}$ = vector of explanatory variables (which are not necessarily the same as those used for estimating $λ$ ),

$ω$ = vector of regression coefficients corresponding to covariates $K_{i}$ ,

$ψ_{i}$ = vector of explanatory variables for the dispersion parameter $ν$ , and

$γ$ = vector of regression coefficients corresponding to covariates $ψ_{i}$ .
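To make the three link functions concrete, the sketch below evaluates the ZI-HTCMP probabilities of Equation 7 for a single site, with λ, p, and ν obtained from Equations 6, 8, and 9. All names, coefficient values, and covariate values are hypothetical, and the normalizing constant is approximated by a truncated series; this is an illustration rather than the fitting procedure used in this study.

```python
import math

def log_z(lam, nu, max_terms=2000):
    """log of the CMP normalizing constant, truncated once terms become negligible."""
    log_terms, peak = [], -math.inf
    for s in range(max_terms):
        log_term = s * math.log(lam) - nu * math.lgamma(s + 1)
        log_terms.append(log_term)
        peak = max(peak, log_term)
        if log_term < peak - 50.0:
            break
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

def dot(coefs, values):
    return sum(c * v for c, v in zip(coefs, values))

def zi_htcmp_pmf(y, x, k, psi, beta, omega, gamma):
    """P(Y = y) under the ZI-HTCMP model for one site (Equation 7)."""
    lam = math.exp(dot(beta, x))                   # Equation 6: ln(lambda) = beta X
    p = 1.0 / (1.0 + math.exp(-dot(omega, k)))     # Equation 8: logit(p) = omega K
    nu = math.exp(dot(gamma, psi))                 # Equation 9: log(nu) = gamma psi
    cmp_prob = math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z(lam, nu))
    return p + (1.0 - p) * cmp_prob if y == 0 else (1.0 - p) * cmp_prob

# Hypothetical site: intercept plus Ln(AADT) = 9 in the count part, intercept-only elsewhere
x, beta = [1.0, 9.0], [-4.0, 0.5]   # ln(lambda) = 0.5
k, omega = [1.0], [0.0]             # p = 0.5
psi, gamma = [1.0], [0.0]           # nu = 1 (Poisson-type count part)
prob_zero = zi_htcmp_pmf(0, x, k, psi, beta, omega, gamma)
total_prob = sum(zi_htcmp_pmf(y, x, k, psi, beta, omega, gamma) for y in range(200))
```

With these illustrative values the count part is Poisson with mean exp(0.5), so the zero probability is 0.5 + 0.5 × exp(−exp(0.5)), which shows how the zero state inflates the share of zero-crash sites.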

The parameters of the ZI-HTCMP model can be estimated by maximizing the log-likelihood function derived from Equation 7, as follows ( 26 ):

\log L (λ, ν, p | X, ψ, K) = \sum_{i = 1}^{n} \left\{ u_{i} \log \left[ p_{i} Z (λ_{i}, ν_{i}) + (1 - p_{i}) \right] + (1 - u_{i}) \left[ \log (1 - p_{i}) + y_{i} \log λ_{i} - ν_{i} \log (y_{i}!) \right] - \log Z (λ_{i}, ν_{i}) \right\}

(10)

where

$u_{i}$ = indicator that equals 1 when $Y_{i} = 0$ and 0 otherwise.

ZI-HTCMP is a more flexible approach that makes it possible to understand how the dispersion and zero-inflation portions of the model affect the distributional form of the response variable (crash frequency) ( 26 ). Note that ZI-HTCMP reduces to the ZI-CMP model whenever no variables are significant in the model’s dispersion part. Likewise, ZI-HTCMP reduces to the HTCMP model if no variables are found significant in the model’s zero part.
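The log-likelihood implied by Equation 7 can be written compactly in code. In the sketch below, site-level values of λ, ν, and p are supplied directly (in practice they would come from Equations 6, 8, and 9), and a general-purpose optimizer would then maximize the function over the regression coefficients. The function names, the series truncation, and the sample values are assumptions made for illustration.

```python
import math

def log_z(lam, nu, max_terms=2000):
    """log of the CMP normalizing constant, truncated once terms become negligible."""
    log_terms, peak = [], -math.inf
    for s in range(max_terms):
        log_term = s * math.log(lam) - nu * math.lgamma(s + 1)
        log_terms.append(log_term)
        peak = max(peak, log_term)
        if log_term < peak - 50.0:
            break
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

def zi_htcmp_loglik(y, lam, nu, p):
    """Sum of site-level log-probabilities; y, lam, nu, and p are equal-length sequences."""
    total = 0.0
    for y_i, lam_i, nu_i, p_i in zip(y, lam, nu, p):
        lz = log_z(lam_i, nu_i)
        if y_i == 0:
            # u_i = 1: log[p + (1 - p) / Z], i.e., log[p Z + (1 - p)] - log Z
            total += math.log(p_i + (1.0 - p_i) * math.exp(-lz))
        else:
            # u_i = 0: log(1 - p) + y log(lambda) - nu log(y!) - log Z
            total += (math.log(1.0 - p_i) + y_i * math.log(lam_i)
                      - nu_i * math.lgamma(y_i + 1) - lz)
    return total

# Hypothetical data: three sites sharing lambda = 2, nu = 1, p = 0.3
y_obs = [0, 1, 3]
loglik = zi_htcmp_loglik(y_obs, [2.0] * 3, [1.0] * 3, [0.3] * 3)
```

Because ν = 1 in this example, the result can be checked against a zero-inflated Poisson likelihood computed by hand.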

Model Comparison and Selection

The likelihood ratio test (LRT) was applied to compare the fitted ZI-HTCMP and ZI-CMP models, or the HTCMP and CMP models, where the latter model in each pair is nested within the former. The LRT statistic follows a chi-square distribution with degrees of freedom equal to the number of variables found significant in the dispersion part, as follows:

LRT = 2 \times (LL_{\text{ZI-HTCMP or HTCMP}} - LL_{\text{ZI-CMP or CMP}}) \sim χ_{(d.f. = p)}^{2}

(11)

where

$p$ = number of covariate parameters in the dispersion part of a ZI-HTCMP or HTCMP model (i.e., the parameters absent from the nested ZI-CMP or CMP model).

A significant value for LRT indicates that the model’s dispersion parameter is not fixed, but rather varies from one site to another.
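The LRT in Equation 11 can be sketched as follows. The log-likelihood values are invented for illustration, and `chi2_sf` is a hypothetical helper that computes the chi-square upper-tail probability for integer degrees of freedom from the standard incomplete-gamma recurrence; a statistical library would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)               # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))    # closed form for df = 1
    # Recurrence Q(a + 1, h) = Q(a, h) + h^a exp(-h) / Gamma(a + 1), stepping df up by 2
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q

def likelihood_ratio_test(ll_full, ll_nested, df):
    """LRT statistic (Equation 11) and its chi-square P-value."""
    stat = 2.0 * (ll_full - ll_nested)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods: heterogeneous model vs. its fixed-dispersion nested model,
# with two covariates in the dispersion part
lrt_stat, lrt_p = likelihood_ratio_test(-520.3, -525.1, df=2)
```

In this invented example the statistic is 9.6 on two degrees of freedom, which would be significant at the 1% level and would therefore favor the varying-dispersion model.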

For each roadway facility, the Akaike information criterion (AIC) and Bayesian information criterion (BIC) were used to compare the NB model with its CMP-based counterpart (i.e., ZI-HTCMP, HTCMP, or ZI-CMP), as follows:

AIC = - 2 LL + 2 P

(12)

BIC = - 2 LL + P (\ln (n))

(13)

where

LL = model’s log-likelihood at convergence,

P = number of model parameters, and

n = number of observations.

The best models are those with the lowest AIC and BIC values. All models were fitted using R version 3.6.3 ( 27 ).
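Equations 12 and 13 translate directly into code. In the sketch below, the log-likelihood and the number of parameters are hypothetical; the number of observations (458) is the urban multilane calibration site count from Table 1 and is used only as an example.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion (Equation 12)."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (Equation 13)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical fitted model: LL = -520.3 with 8 parameters on 458 sites
aic_value = aic(-520.3, 8)
bic_value = bic(-520.3, 8, 458)
```

Since ln(458) is about 6.13, each parameter costs roughly three times more under BIC than under AIC at this sample size, so BIC penalizes the additional dispersion and zero-state parameters more heavily.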

Two additional evaluation measures were used to compare the prediction performance of the NB and CMP-based models (i.e., ZI-HTCMP, HTCMP, or ZI-CMP) on the validation dataset, namely the mean absolute deviance (MAD) and mean square prediction error (MSPE). These two measures are calculated as follows:

MAD = \frac{\sum_{i = 1}^{n} | μ_{i} - y_{i} |}{n}

(14)

MSPE = \frac{\sum_{i = 1}^{n} {(μ_{i} - y_{i})}^{2}}{n}

(15)

where

$y_{i}$ = observed number of crashes on segment $i$ ,

$μ_{i}$ = predicted number of crashes on segment $i$ , and

$n$ = sample size in the validation dataset.

Models with lower MAD and MSPE values provide better prediction performance.
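A short sketch of Equations 14 and 15 is given below. The predicted and observed crash counts are hypothetical values chosen to keep the arithmetic easy to follow.

```python
def mad(predicted, observed):
    """Mean absolute deviance (Equation 14)."""
    return sum(abs(mu - y) for mu, y in zip(predicted, observed)) / len(observed)

def mspe(predicted, observed):
    """Mean square prediction error (Equation 15)."""
    return sum((mu - y) ** 2 for mu, y in zip(predicted, observed)) / len(observed)

# Hypothetical validation sample of four segments
predicted = [0.4, 1.2, 0.1, 2.5]
observed = [0, 2, 0, 3]
mad_value = mad(predicted, observed)
mspe_value = mspe(predicted, observed)
```

The absolute errors are 0.4, 0.8, 0.1, and 0.5, so MAD = 1.8/4 = 0.45 and MSPE = (0.16 + 0.64 + 0.01 + 0.25)/4 = 0.265. MSPE weights the largest error more heavily than MAD does.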

Preliminary FI CMV Crash Analysis

Chi-Square Test, ORs, and Z-test of Proportions Results

Tables 2 and 3 show the results for those significant variables (at the 10% significance level) from the conducted chi-square test and Z-test of proportions for multilane segment facilities and two-lane segment facilities, respectively. In both tables, a negative sign for the Z-test statistic indicates the FI CMV crash percentage in rural facilities was higher than that percentage in urban facilities, and vice-versa. As seen, rural facilities (either two-lane or multilane) had consistently higher FI CMV crash involvement compared with urban facilities. The only exception was interstate/principal arterial, which had significantly higher FI CMV crash involvement along urban two-lane segment facilities (39.58%), as opposed to rural two-lane facilities (25.77%).

Table 2.

Significant Chi-Square Test and Z-Test of Proportions Results for Commercial Motor Vehicle (CMV)-Related Crashes on Urban and Rural Multilane Segment Facilities

Variable	Chi-square test				Z-test
	Urban multilane		Rural multilane		Urban multilane	Rural multilane	Z-test stat.	P-value
	OR_Urb.	P-value (Chi-sq.)	OR_Rur.	P-value (Chi-sq.)	FI CMV crash percentage	FI CMV crash percentage	Z-test stat.	P-value
Crash characteristics
Multi-vehicle crash	N/S*	N/S	1.27	0.007	17.68	20.07	–2.961	0.003
Characteristics of most-severely injured occupant:
Young age (≤30 years)	1.57	<0.001	1.4	<0.001	22.59	23.39	N/S	N/S
Middle age (31–59 years)	0.75	<0.001	0.89	0.100	15.72	18.53	−3.017	0.003
Old age (≥60 years)	N/S	N/S	0.82	0.053	16.55	16.77	N/S	N/S
Male	0.45	<0.001	0.52	<0.001	14.02	16.57	–3.250	0.001
Restraint used	0.27	<0.001	0.23	<0.001	16.56	17.80	−1.712	0.087
Environmental characteristics
Rainy	1.18	0.053	1.27	0.008	19.77	22.52	N/S	N/S
Clear/cloudy	0.86	0.060	0.84	0.030	17.15	18.60	–1.789	0.074
Roadway characteristics
Road functional class:
Minor arterial	N/S	N/S	1.75	0.051	16.12	29.31	–2.384	0.017
Dry road	0.75	<0.001	0.87	0.065	16.56	18.61	–2.468	0.014
Wet road	1.28	0.002	1.24	0.008	20.57	22.01	N/S	N/S
Icy road	1.33	0.073	N/S	N/S	21.90	17.51	N/S	N/S
Median width >30 ft	N/S	N/S	0.61	<0.001	17.32	18.42	N/S	N/S
Concrete median barrier	N/S	N/S	0.82	0.009	17.82	17.36	N/S	N/S
Median with no barrier	N/S	N/S	1.43	<0.001	17.97	23.58	–3.422	<0.001
Right shoulder width >6 ft	N/S	N/S	0.85	0.037	17.51	18.50	N/S	N/S
AADT/1,000/lane >5	N/S	N/S	0.61	<0.001	N/S	N/S	N/S	N/S
HVP (>10%)	N/S	N/S	0.60	0.021	17.54	19.07	–1.946	0.052
Daytime light condition	0.81	0.002	N/S	N/S	16.62	19.18	–2.870	0.004
Dark without streetlight	1.53	<0.001	N/S	N/S	24.02	19.35	2.108	0.035
Work zone	0.67	0.082	N/S	N/S	12.64	16.49	N/S	N/S
At-fault driver and vehicle characteristics
Driver characteristics:
Kentucky resident	1.47	<0.001	1.95	<0.001	22.22	30.29	–3.478	<0.001
Young age (≤30 years)	1.42	<0.001	1.23	0.015	21.71	21.85	N/S	N/S
Middle age (31–59 years)	N/S	N/S	1.13	0.084	17.57	20.08	–2.534	0.011
Driver action before crash:
Going straight ahead	1.69	<0.001	1.35	<0.001	21.45	20.83	N/S	N/S
Changing lanes	0.67	<0.001	0.65	<0.001	13.54	14.07	N/S	N/S
Merging	0.58	<0.001	0.59	0.013	11.35	12.50	N/S	N/S
Backing up	0.28	0.003	0	0.002	5.62	0.00	N/S	N/S
Turning	N/S	N/S	1.49	0.089	16.82	26.04	–1.888	0.059
Driver behavior:
Speeding	2.80	0.001	2.16	<0.001	35.88	32.79	N/S	N/S
Aggressive driving	N/S	N/S	1.28	0.022	18.26	22.82	–2.345	0.019
Distracted driving	N/S	N/S	1.25	0.003	18.09	21.48	–2.687	0.007
Alcohol use	3.14	0.001	2.24	<0.001	39.71	34.44	N/S	N/S
Drug use	5.92	0.001	3.93	<0.001	55.32	47.92	N/S	N/S
Drowsiness	2.02	0.001	1.32	0.069	29.90	23.69	N/S	N/S
Vehicle characteristics:
Passenger car	1.44	0.001	1.58	<0.001	21.70	25.19	–2.16	0.031
Light truck	1.31	0.003	1.58	<0.001	21.19	26.12	–2.235	0.025
Large truck/bus	0.67	0.001	0.56	<0.001	15.12	15.83	N/S	N/S

Note: AADT = annual average daily traffic; FI = fatal + injury; HVP = heavy vehicle percentage; OR_Rur. = odds ratio for FI CMV crashes along rural multilane roadways; OR_Urb. = odds ratio for FI CMV crashes along urban multilane roadways.

*N/S = not significant at 10% significance level.

Table 3.

Significant Chi-Square Test and Z-Test of Proportions Results for Commercial Motor Vehicle (CMV)-Related Crashes on Urban and Rural Two-Lane Segment Facilities

Variable	Chi-square test				Z-test
	Urban two-lane		Rural two-lane		Urban two-lane	Rural two-lane	Z-test stat.	P-value
	OR_Urb.	P-value (Chi-sq.)	OR_Rur.	P-value (Chi-sq.)	FI CMV crash percentage	FI CMV crash percentage	Z-test stat.	P-value
Crash characteristics
Multi-vehicle crash	3.37	<0.001	N/S*	N/S	21.01	22.83	N/S	N/S
Characteristics of most-severely injured occupant:
Young age (≤30 years)	N/S	N/S	1.19	0.091	18.25	25.75	–1.860	0.063
Middle age (31–59 years)	0.66	0.060	0.75	0.001	14.07	21.18	–2.974	0.003
Old age (≥60 years)	1.86	0.018	1.45	0.001	24.51	29.18	N/S	N/S
Male	0.40	<0.001	0.56	<0.001	13.38	21.50	–3.962	<0.001
Restraint used	0.37	0.008	0.24	<0.001	15.55	21.08	–2.952	0.003
Environmental characteristics
Autumn	N/S	N/S	1.19	0.069	13.25	25.57	–3.278	0.001
Winter	N/S	N/S	0.74	0.006	15.44	19.44	N/S	N/S
Roadway characteristics
Road functional class:
Interstate/principal arterial	3.85	<0.001	N/S	N/S	39.58	25.77	1.980	0.048
Local road	N/S	N/S	0.60	0.022	13.64	15.69	N/S	N/S
Minor arterial	N/S	N/S	1.34	0.004	15.57	27.54	–4.195	<0.001
Minor collector	N/S	N/S	0.68	0.001	7.41	18.28	N/S	N/S
Icy road	N/S	N/S	0.55	0.040	10.00	14.58	N/S	N/S
Grade >1.5%	0.51	0.053	N/S	N/S	10.00	21.61	–2.689	0.007
IRI >72	0.33	0.001	0.76	0.019	15.00	22.55	–3.887	<0.001
Lane width >10 ft	N/S	N/S	1.17	0.095	18.73	25.35	–2.270	0.023
Speed limit ≤45 mph	0.60	0.024	0.63	0.003	14.10	16.61	N/S	N/S
AADT/1,000/lane >5	1.68	0.040	N/S	N/S	18.81	25.05	–2.247	0.025
HVP (>10%)	N/S	N/S	0.84	0.065	14.59	22.29	–2.945	0.003
Daytime light condition	N/S	N/S	1.39	0.003	16.74	24.63	–3.644	<0.001
Dark with streetlights	N/S	N/S	0.38	0.060	12.20	10.53	N/S	N/S
Dark without streetlights	N/S	N/S	0.76	0.024	19.15	19.44	N/S	N/S
At-fault driver and vehicle characteristics
Driver characteristics:
Kentucky resident	1.86	0.005	1.54	<0.001	22.44	29.30	–1.946	0.052
Young age (≤30 years)	N/S	N/S	1.21	0.064	16.41	26.17	–2.333	0.020
Middle age (31–59 years)	N/S	N/S	0.85	0.073	17.61	22.15	–1.853	0.064
Old age (≥60 years)	N/S	N/S	1.43	0.002	15.91	29.02	–2.536	0.011
Driver action before crash:
Going straight ahead	1.77	0.020	1.38	0.002	19.29	24.71	–2.253	0.024
Parked	N/S	N/S	0.44	0.056	13.33	12.00	N/S	N/S
Backing up	0.00	0.002	0.10	<0.001	0.00	3.13	N/S	N/S
Driving wrong way	N/S	N/S	4.95	0.052	0.00	60.00	N/S	N/S
Driver behavior:
Speeding	N/S	N/S	3.34	<0.001	42.86	49.35	N/S	N/S
Aggressive driving	N/S	N/S	1.48	0.018	15.79	30.43	–2.444	0.015
Distracted driving	N/S	N/S	1.19	0.061	15.38	25.58	–3.341	0.001
Drowsiness	N/S	N/S	2.79	<0.001	50.00	45.28	N/S	N/S
Vehicle characteristics:
Passenger car	2.05	0.006	2.62	<0.001	26.04	41.18	–2.686	0.007
Light truck	N/S	N/S	1.31	0.054	22.06	28.00	N/S	N/S
Large truck/bus	0.48	0.001	0.51	<0.001	13.35	20.30	–3.288	0.001

Note: AADT = annual average daily traffic; FI = fatal + injury; HVP = heavy vehicle percentage; IRI = international roughness index; OR_Urb. = odds ratio for FI CMV crashes along urban two-lane roadways; OR_Rur. = odds ratio for FI CMV crashes along rural two-lane roadways.

*N/S = not significant at 10% significance level.

Both tables also report the OR obtained as part of the chi-square test. For each of the four studied roadway segment facilities, the six highest OR values are presented for comparison purposes in Figure 4. As seen in Figure 4, driver-related factors, namely driving under the influence of drugs or alcohol, had the highest OR values (affecting FI CMV crashes) along the two multilane facilities (urban and rural). These two variables were significant only on multilane roadways, likely because of the improved and wider roadway conditions. Another driver-related factor, speeding, was among the highest OR estimates on rural two-lane, urban multilane, and rural multilane facilities, plausibly because higher speeds reduce drivers’ ability to control their vehicles.

Figure 4.

Highest six fatal + injury (FI) commercial motor vehicle (CMV) crash odds ratios for the four segment facilities: (a) urban multilane, (b) rural multilane, (c) urban two-lane, and (d) rural two-lane.

“At-Fault Driver: Kentucky Resident” was also among the highest OR values on three of the four facilities, possibly because local drivers are more familiar with the roadway circumstances, which may lead them to drive aggressively in some instances. From Figure 4, “At-Fault Driver in Passenger Car” was a common significant variable among the six highest OR values on two facilities, urban two-lane and rural two-lane segments. This is possibly because passenger cars are smaller and provide less protection than a CMV or light truck, leading to serious injury crashes when colliding with CMVs. As expected, the OR value for “At-Fault Driver in Light Truck” was lower than that for “At-Fault Driver in Passenger Car,” since light trucks are larger (fairly comparable with large trucks), which could enable light truck occupants to sustain less severe injuries (i.e., a lower OR estimate).

Interestingly, “At-Fault Driver: Going Wrong Way” and “At-Fault Driver: Driving Aggressively” ranked among the six highest OR values only on rural two-lane facilities (and not in urban areas). Overtaking lead vehicles is a common maneuver on rural two-lane roadways, and aggressive driving is also common in rural areas, which could explain why these two variables were found to be contributing factors to FI CMV-related crashes on rural facilities.
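The OR, chi-square, and Z-test quantities reported in the two preceding tables can be illustrated with a minimal sketch. It assumes that each OR contrasts FI with non-FI crashes by whether a factor is present (a 2 × 2 table), that the chi-square test is the uncorrected Pearson test, and that the Z-test is the pooled two-proportion test comparing urban and rural FI crash shares. All counts below are hypothetical and are not taken from the study data.

```python
import math

def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a = FI with factor, b = non-FI with factor,
    c = FI without factor, d = non-FI without factor."""
    return (a * d) / (b * c)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 d.f., no continuity correction) and P-value."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z statistic and two-sided P-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical counts for one factor on one facility (NOT from the study).
or_value = odds_ratio(120, 280, 300, 1700)
chi_stat, chi_p = chi_square_2x2(120, 280, 300, 1700)

# Hypothetical FI crash counts out of all crashes: urban versus rural.
z_stat, z_p = two_proportion_z(177, 1000, 201, 1000)

print(f"OR = {or_value:.2f}, chi-square = {chi_stat:.2f}, P = {chi_p:.4f}")
print(f"Z = {z_stat:.3f}, P = {z_p:.3f}")
```

Under these assumptions, an OR above 1 indicates higher odds of an FI outcome when the factor is present, and a negative Z statistic indicates a lower FI crash percentage on the urban facility than on the rural one.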

FI CMV Crash Safety Performance Function (SPF) Results

Table 4 shows parameter estimates of the best-fitted CMP-based models (i.e., best model from ZI-HTCMP, ZI-CMP, HTCMP, or CMP), as well as the NB models for the four different roadway segment facilities in the HSM. The NB model is presented because it is the standard approach used for developing SPFs in the HSM. The deviance statistic (D) for all the fitted models was significant at the 1% level, which rejected the null hypothesis that the fitted models yielded the same performance as their constant-only models. Moreover, for the four roadway facilities, the AIC, MAD, and MSPE statistics were lower for the CMP-based models than for their NB counterparts. The BIC was also lower for the CMP-based models on all facilities except rural multilane segments, where the two values were nearly identical (1059.9 for HTCMP versus 1059.4 for NB). Overall, these results indicate that the CMP family models could be used as SPF tools for effectively predicting FI CMV-related crashes on roadway segments. It should be noted that the dispersion parameter of the NB model for urban two-lane segments was not statistically significant at the 10% level. However, this model was retained for comparison purposes with the best CMP-based model.
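As a consistency check on the information criteria, the following sketch recomputes AIC and BIC from the log-likelihoods in Table 4 using the standard definitions AIC = 2k − 2LL and BIC = k·ln(n) − 2LL. It assumes that k is the reported number of significant parameters and that n is the number of segments; under these assumptions the results agree with the tabulated values to within rounding.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_lik

# (log-likelihood at convergence, no. of significant parameters, no. of segments),
# copied from the GOF rows of Table 4.
table4 = {
    ("Urban multilane", "ZI-HTCMP"): (-599.8, 14, 458),
    ("Urban multilane", "NB"): (-621.9, 9, 458),
    ("Rural multilane", "HTCMP"): (-501.6, 10, 292),
    ("Rural multilane", "NB"): (-509.8, 7, 292),
    ("Urban two-lane", "ZI-CMP"): (-149.2, 10, 251),
    ("Urban two-lane", "NB"): (-157.7, 7, 251),
    ("Rural two-lane", "ZI-CMP"): (-1107.8, 8, 1353),
    ("Rural two-lane", "NB"): (-1131.5, 7, 1353),
}

results = {
    key: (aic(ll, k), bic(ll, k, n)) for key, (ll, k, n) in table4.items()
}

for (facility, model), (aic_value, bic_value) in results.items():
    print(f"{facility:16s} {model:9s} AIC = {aic_value:7.1f}  BIC = {bic_value:7.1f}")
```

Because BIC penalizes each additional parameter by ln(n) rather than 2, a model with more parameters can rank better on AIC than on BIC, which is why both criteria are reported.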

Table 4.

Parameter Estimates for Fatal + Injury (FI) Commercial Motor Vehicle (CMV) Crash-Specific Safety Performance Functions (SPFs) along Different Roadway Segment Facilities in Kentucky

Variable	Urban multilane								Rural multilane						Urban two-lane						Rural two-lane
	Best CMP-based model: ZI-HTCMP						NB		Best CMP-based model: HTCMP				NB		Best CMP-based model: ZI-CMP				NB		Best CMP-based model: ZI-CMP				NB
	Count part		Zero part		Disp. part		Count part		Count part		Disp. part		Count part		Count part		Zero part		Count part		Count part		Zero part		Count part
Constant	–15.038	***	–3.429^a		9.673	**	–14.257	***	–6.467	***	5.464	***	–7.771	***	–10.413	***	0.578^a		–11.437	***	–5.297	***	–3.261	***	–6.710	***
Ln(AADT)	1.359	***	N/S^b		–0.863	**	1.288	***	0.859	***	N/S		0.922	***	0.831	***	N/S		0.691	***	0.309	***	N/S		0.409	***
HVP >10%	0.491	***	N/S		N/S		0.458	***	0.618	**	N/S		N/S		0.675	***	N/S		0.639	**	0.174	*	N/S		0.214	*
Hor. degree of curvature	0.244	***	N/S		N/S		0.190	***	N/S		N/S		N/S		0.041	**	N/S		N/S		N/S		N/S		N/S
Flat terrain type	N/S		N/S		N/S		N/S		N/S		–0.703	**	N/S		N/S		N/S		N/S		0.679	***	1.083	***	0.360	*
Speed limit (mph)	N/S		–0.297	***	N/S		N/S		–0.061	***	N/S		–0.044	***	N/S		N/S		N/S		N/S		N/S		N/S
Road functional class:
Principal arterial	N/S		N/S		N/S		–0.396	**	–0.445	***	N/S		–0.514	***	N/S		–1.602	**	N/S		N/S		N/S		N/S
Minor arterial	N/S		N/S		N/S		N/S		N/S		N/S		N/S		N/S		N/S		–0.772	**	N/S		N/S		N/S
Minor collector	N/S		N/S		N/S		N/S		N/S		N/S		N/S		N/S		N/S		N/S		–0.318	**	N/S		–0.348	**
IRI	N/S		N/S		N/S		N/S		N/S		N/S		0.004	*	0.008	***	0.007	**	N/S		N/S		N/S		0.003	*
Right shoulder width (ft)	–0.048	***	N/S		N/S		–0.039	**	N/S		N/S		N/S		N/S		N/S		N/S		N/S		N/S		N/S
Paved shoulder	N/S		N/S		N/S		N/S		N/S		N/S		N/S		N/S		–1.103	*	0.822	*	N/S		N/S		N/S
Vertical grade (%)	N/S		N/S		0.504	**	N/S		N/S		N/S		N/S		N/S		N/S		N/S		N/S		N/S		N/S
Undivided	N/S		1.892	**	N/S		–0.606	**	N/S		N/S		N/S		na^c	na	na		na		na		na		na
Number of lanes	N/S		N/S		N/S		N/S		N/S		–1.109	***	N/S		na		na		na		na		na		na
Median width (ft)	–0.010	***	N/S		N/S		–0.008	***	–0.004	*	N/S		–0.006	**	na		na		na		na		na		na
Type of median barrier:
Cable barrier	N/S		N/S		–0.792	*	N/S		N/S		N/S		N/S		na	na	na	na	na		na		na		na
Concrete barrier	N/S		N/S		N/S		N/S		N/S		1.838	***	N/S		na	na	na	na	na		na		na		na
Lane width (ft)	N/S		1.251	*	N/S		N/S		N/S		N/S		N/S		N/S		N/S		0.202	*	N/S		N/S		N/S
Ln(fixed disp. parameter)							–1.501	***					–1.804	***	–2.283	**			–2.319^a		–0.288	**			–0.507	*

GOF statistics	Urban multilane		Rural multilane		Urban two-lane		Rural two-lane
GOF statistics	ZI-HTCMP	NB	HTCMP	NB	ZI-CMP	NB	ZI-CMP	NB
No. of segments	458	458	292	292	251	251	1,353	1,353
No. of sign. parameters	14	9	10	7	10	7	8	7
$LL_{\beta}$ (log-likelihood at convergence)	–599.8	–621.9	–501.6	–509.8	–149.2	–157.7	–1107.8	–1131.5
Deviance (D) statistic (P-value)	737 (<0.001)	298.2 (<0.001)	353.0 (<0.001)	148.2 (<0.001)	42.0 (<0.001)	24.2 (<0.001)	183.6 (<0.001)	98.4 (<0.001)
LRT (ZI-HTCMP versus ZI-CMP or HTCMP versus CMP): Chi-sq. (deg. of freedom, P-value)	ZI-HTCMP versus ZI-CMP LRT: Chi-sq. = 12.21 (3, 0.007)	na	HTCMP versus CMP LRT: Chi-sq. = 13.47 (3, 0.004)	na	na	na	na	na
AIC	1227.6	1261.9	1023.1	1033.7	318.4	329.3	2231.6	2277.0
BIC	1285.4	1299.0	1059.9	1059.4	353.7	354.0	2273.3	2313.5
MAD (validation data)	3.343	3.410	6.328	6.934	0.265	0.270	0.391	0.460
MSPE (validation data)	1.089	1.100	1.537	1.634	0.361	0.362	0.480	0.508

Note: AADT = annual average daily traffic; AIC = Akaike information criterion; BIC = Bayesian information criterion; CMP = Conway–Maxwell–Poisson; GOF = goodness-of-fit; HTCMP = heterogeneous Conway–Maxwell–Poisson; HVP = heavy vehicle percentage; LRT = likelihood ratio test; MAD = mean absolute deviance; MSPE = mean square prediction error; na = not applicable; NB = negative binomial; N/S = not significant; ZI-CMP = zero-inflated Conway–Maxwell–Poisson; ZI-HTCMP = zero-inflated heterogeneous Conway–Maxwell–Poisson.

* Statistically significant at 10% level.

** Statistically significant at 5% level.

*** Statistically significant at 1% level.

^a Not statistically significant at 10% level (only for constant term or constant dispersion parameter).

^b N/S = not statistically significant at 10% level (for any parameter other than constant term).

^c na = not applicable.

As shown in Table 4, ZI-HTCMP was the best-fit model for urban multilane segments, which had a high representation of zero-crash sites (refer to Figure 3a). The superiority of the ZI-HTCMP model for urban multilane segments indicates that this model, with a varying dispersion parameter, could appropriately capture the presence of excess zeros and over-dispersion in the crash data by incorporating different sets of variables in the zero and dispersion parts. Specifically, speed limit, lane width, and undivided sections explained the presence of excess zero counts, while the natural logarithm of AADT (Ln(AADT)), vertical grade, and median cable barrier presence significantly explained the dispersion part. For example, undivided urban multilane segments increased the likelihood of zero FI crashes compared with divided ones, possibly because drivers tended to travel less aggressively and at lower speeds along undivided roads. This was also reported by Abegaz et al. (28). Furthermore, wider lanes were found to reduce the probability of at least one FI CMV-involved crash along urban multilane segments.
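The roles of the count, zero, and dispersion parts can be sketched as a zero-inflated CMP probability mass function. This is a minimal illustration only: it assumes the commonly used parameterization in which the CMP rate λ and dispersion ν are positive and the zero-state probability π comes from a logit link, and it approximates the CMP normalizing constant by a truncated sum. The parameter values and the linear predictor `eta_zero` are hypothetical and are not the fitted values from Table 4.

```python
import math

def cmp_pmf(y, lam, nu, max_terms=200):
    """P(Y = y) for a Conway-Maxwell-Poisson variable; the normalizing
    constant is approximated by the first `max_terms` terms of its series."""
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(max_terms)]
    m = max(log_terms)
    log_z = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)

def logistic(eta):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def zi_cmp_pmf(y, lam, nu, pi_zero):
    """Zero-inflated CMP: a point mass at zero with probability pi_zero,
    otherwise a CMP count."""
    base = (1.0 - pi_zero) * cmp_pmf(y, lam, nu)
    return pi_zero + base if y == 0 else base

# Hypothetical site: values chosen for illustration only.
lam, nu = 1.5, 0.8      # nu < 1 corresponds to over-dispersion
eta_zero = -0.5         # hypothetical zero-part linear predictor
pi_zero = logistic(eta_zero)

p_zero = zi_cmp_pmf(0, lam, nu, pi_zero)
print(f"P(zero FI crashes) = {p_zero:.3f}")
print(f"P(at least one FI crash) = {1.0 - p_zero:.3f}")
```

In a heterogeneous (HTCMP) specification, ν would itself be computed from site covariates rather than held fixed; in this sketch that would amount to replacing the constant `nu` with a positive function of a second linear predictor.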

The HTCMP model outperformed the other CMP-based and NB models for rural multilane segments (possibly because that facility had the lowest representation of zero-crash sites among the four facilities). For both rural and urban two-lane segments, the ZI-CMP model was found to best fit the data. On these two-lane segments, the ZI-CMP model was capable of handling zero inflation and over-dispersion in the crash data despite its fixed dispersion parameter.
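The likelihood ratio tests (LRTs) in Table 4, which compare the heterogeneous models with their fixed-dispersion counterparts, can be checked with a short sketch. The LRT statistic is twice the log-likelihood difference between nested models and is referred to a chi-square distribution. Because the restricted-model log-likelihoods are not reported, the sketch below only converts the reported statistics into P-values; the helper `chi2_sf` is an illustrative implementation for integer degrees of freedom, not the routine used in the study.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer
    degrees of freedom, via the recurrence for the incomplete gamma function."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0 - 1e-9:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

# LRT statistics and degrees of freedom as reported in Table 4.
p_urban_multilane = chi2_sf(12.21, 3)   # ZI-HTCMP versus ZI-CMP
p_rural_multilane = chi2_sf(13.47, 3)   # HTCMP versus CMP

print(f"ZI-HTCMP versus ZI-CMP: P = {p_urban_multilane:.3f}")
print(f"HTCMP versus CMP:       P = {p_rural_multilane:.3f}")
```

Both P-values fall below 0.01, which is consistent with the table and supports allowing the dispersion parameter to vary with site characteristics on the two multilane facilities.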

In relation to the parameter signs in the models, Ln(AADT) and HVP >10% significantly increased FI CMV crashes in all four best-fit CMP-based models. These findings are intuitive and also in agreement with prior research because, when traffic volume and HVP increase, FI CMV crash exposure increases as a result (29, 30). Not surprisingly, the horizontal degree of curvature was associated with an increased number of FI CMV-related crashes on urban multilane and urban two-lane segments. This is because sharp curves increase the risk of losing vehicle control and striking roadside hazards (e.g., guardrails and concrete barriers), which increases the FI CMV crash risk. Compared with other terrain types, there were significantly more FI CMV-related crashes on flat terrain along rural two-lane segments. This might be because the probability of aggressive driving on flat road segments is higher than on rolling or mountainous terrain, therefore increasing the FI CMV crash risk (also reported by Hosseinpour et al. and Uddin and Huynh) (31, 32).

A higher speed limit was also associated with a reduced frequency of FI CMV crashes on rural multilane roadways. Generally, higher speed limits are posted on road segments with better geometric design features (e.g., more traffic lanes and wider lanes), where the number of FI CMV crashes could be reduced. This was also reported by Vadlamani et al. and Hauer et al. (29, 33).

Principal arterials saw a reduction in the frequency of FI CMV-related crashes on rural multilane segments, whereas minor collectors saw a reduction in the number of FI CMV-related crashes on rural two-lane segments. This is possibly because principal arterials and minor collectors might exhibit safer geometric design at specific road sections, therefore reducing FI CMV crashes. A higher international roughness index (IRI) was associated with an increase in FI CMV crash frequency on urban and rural two-lane segments. A higher IRI indicates a rougher road surface, which could be detrimental to road safety and lead to an increase in the frequency of FI CMV-related crashes (also found in Dong et al. and Yuan et al.) (7, 34).

An increase in the right shoulder width was associated with a reduction in FI CMV-related crashes along urban multilane segments. This result is reasonable, as wider shoulders provide more recovery room for errant vehicles running off the road (7). A similar finding was reached for median width, where wider medians contributed to fewer FI CMV-related crashes on both urban and rural multilane segments. This is because wider medians reduce the risk of FI CMV-related median crossover crashes.
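To give a sense of the magnitude behind these signs, the sketch below converts selected coefficients of the urban multilane NB model in Table 4 into multiplicative effects on expected FI CMV crash frequency. It assumes the standard log link of an NB SPF, so that a one-unit change in a covariate multiplies the expected frequency by exp(coefficient). The NB column is used because this interpretation is direct for the NB mean; count-part coefficients of the CMP-based models act on the CMP rate parameter and need not translate to the mean in the same way.

```python
import math

# Count-part coefficients of the urban multilane NB model (Table 4).
nb_urban_multilane = {
    "HVP >10%": 0.458,
    "Hor. degree of curvature": 0.190,
    "Right shoulder width (ft)": -0.039,
    "Median width (ft)": -0.008,
}
LN_AADT_COEF = 1.288

def rate_ratio(coef, delta=1.0):
    """Multiplier on expected crashes for a `delta`-unit covariate change (log link)."""
    return math.exp(coef * delta)

def aadt_multiplier(coef, factor):
    """Multiplier on expected crashes when AADT is scaled by `factor`,
    given that AADT enters the model as Ln(AADT)."""
    return factor ** coef

effects = {name: rate_ratio(coef) for name, coef in nb_urban_multilane.items()}
median_10ft = rate_ratio(nb_urban_multilane["Median width (ft)"], delta=10.0)
aadt_doubling = aadt_multiplier(LN_AADT_COEF, 2.0)

for name, value in effects.items():
    print(f"{name:28s} x{value:.3f} per unit")
print(f"{'Median width, +10 ft':28s} x{median_10ft:.3f}")
print(f"{'AADT doubled':28s} x{aadt_doubling:.3f}")
```

Multipliers above 1 correspond to the positive coefficients discussed above (AADT, HVP, curvature), and multipliers below 1 correspond to the protective effects of wider shoulders and medians.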

Conclusions and Recommendations

This study developed FI CMV crash-specific SPFs along four roadway segment facilities in Kentucky (urban multilane, rural multilane, urban two-lane, and rural two-lane segments). The traditional NB and newly introduced CMP-based models (including ZI-HTCMP, ZI-CMP, HTCMP, and CMP) were compared using 14,967 CMV-related crashes on Kentucky’s road segments (between 2015 and 2019) and various roadway variables (e.g., shoulder width, median width, median barrier type, AADT, and HVP). Various GOF statistics, including AIC, BIC, MAD, and MSPE, were used for model assessment and selection.

Before developing the SPFs, a preliminary investigation was conducted to analyze FI CMV-related crashes in Kentucky using the Z-test of proportions and OR analysis. As expected, rural facilities (either two-lane or multilane) had consistently higher FI CMV crash involvement compared with urban facilities, mainly because of speeding and aggressive driving. Interestingly, driver-related factors, such as driving under the influence of drugs or alcohol, had the highest OR values (affecting FI CMV crashes) along the two multilane facilities (urban and rural multilane segments).

From the developed SPFs, AADT and HVP >10% significantly increased FI CMV-related crashes on all four segment facilities. On the other hand, some variables were found significant in one segment facility, but not in the others. For example, principal arterials were associated with an FI CMV crash frequency reduction on rural multilane segments, whereas minor collectors saw a reduced number of FI CMV-related crashes on rural two-lane segments. Not surprisingly, the horizontal degree of curvature was associated with an increased number of FI CMV-related crashes on urban multilane and urban two-lane segments. Increases in the right shoulder and median widths were associated with a reduction in FI CMV-related crashes along urban multilane segments.

For all four roadway facilities, CMP-based models showed better model fitting (i.e., lower AIC and, apart from a marginal difference on rural multilane segments, lower BIC) and better prediction performance (i.e., lower MAD and MSPE) than the NB models. Furthermore, ZI-HTCMP was the best-fit model for urban multilane segments, which had a high representation of zero-crash sites. The superiority of the ZI-HTCMP model for urban multilane segments indicates that this model, with a varying dispersion parameter, could appropriately capture the presence of excess zeros and over-dispersion in the crash data by incorporating different sets of variables in the zero and dispersion parts. For example, the ZI-HTCMP model for urban multilane segments showed that speed limit, lane width, and undivided sections explained the presence of excess zero counts, while Ln(AADT), vertical grade, and median cable barrier presence significantly explained the dispersion part. Furthermore, from the ZI-HTCMP model, wider lanes were found to reduce the probability of at least one FI CMV-involved crash along urban multilane segments.

In general, it is recommended to use CMP-based models for effectively predicting FI CMV-related crashes (with excess zeros) on road segments. More specifically, sites experiencing a larger representation of zero-crash counts will be best modeled using the ZI-HTCMP or ZI-CMP models, which are characterized by varying parameter estimates in the “zero part” of the model.

This study could help suggest specific geometric design recommendations to reduce FI CMV-related crashes. For example, it is recommended to design wider right shoulders and medians for urban multilane segments since they were associated with fewer FI CMV crashes. Furthermore, installing advance warning signs before sharp horizontal curves and installing rumble strips on curved sections could reduce FI CMV crashes along curved sections. To build on this study, future research can compare the SPF model results along the same four segment facilities in other states in the U.S. and worldwide. Specifically, it will be interesting to test the transferability of the proposed models to other U.S. states and other countries.

Footnotes

Acknowledgements

The authors would like to extend their gratitude to the Kentucky Transportation Cabinet (KYTC) for providing the necessary crash and roadway data used in this research.

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: K. Haleem; data collection: M. Hosseinpour; analysis and interpretation of results: M. Hosseinpour, R. Love, B. Williams; draft manuscript preparation: M. Hosseinpour, R. Love, B. Williams, K. Haleem. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors would like to acknowledge the Kentucky Transportation Cabinet (KYTC) for the grant provided to conduct this research.

The opinions, findings, and conclusions in this paper are those of the authors and not necessarily those of the Kentucky Transportation Cabinet (KYTC).

References

1. Zhang. Bayesian Approach Based on Geographic Information System to Identify Hazardous Roadway Segments for Traffic Crashes. Transportation Research Record: Journal of the Transportation Research Board, 2007. 2024: 63–72.

2. Azimi, Asgari, Rahimi, and Jin. Investigation of Heterogeneity in Severity Analysis for Large Truck Crashes. Presented at 98th Annual Meeting of the Transportation Research Board, Washington, D.C., 2019. (In Print)

3. Bernard and Mondy. Correlation of Driver Gender With Injury Severity in Large Truck Crashes in Missouri. Transportation Research Record: Journal of the Transportation Research Board, 2016. 2585: 49–58.

4. American Association of State Highway and Transportation Officials. Highway Safety Manual. Washington, DC: AASHTO, 2010.

5. Daniel and Chien S. I. J. Truck Safety Factors on Urban Arterials. ASCE: Journal of Transportation Engineering, Vol. 130, 2004, pp. 742–752.

6. Dissanayake and Amarasingha. Effects of Geometric Design Features on Truck Crashes on Limited-Access Highways. Final Report #MATC-KSU: 454. Mid-America Transportation Center, Romeoville, IL, 2012.

7. Dong, Nambisan, and Richards. Assessment of the Effects of Highway Geometric Design Features on the Frequency of Truck Involved Crashes Using Bivariate Regression. Transportation Research Part A: Policy and Practice, Vol. 75, 2015, pp. 30–41.

8. Zhou, Rilett, Jones, and Chen. Estimating Passenger Car Equivalents on Level Freeway Segments Experiencing High Truck Percentages and Differential Average Speeds. Transportation Research Record: Journal of the Transportation Research Board, 2018. 2672: 44–54.

9. Alarifi, Abdel-Aty, and Lee. A Bayesian Multivariate Hierarchical Spatial Joint Model for Predicting Crash Counts by Crash Type at Intersections and Segments along Corridors. Accident Analysis & Prevention, Vol. 119, 2018, pp. 263–273.

10. Pande, Abdel-Aty, and Das. A Classification Tree Based Modeling Approach for Segment Related Crashes on Multilane Highways. Journal of Safety Research, Vol. 41, No. 5, 2010, pp. 391–397.

11. Zheng, Liu, and Wang. Investigating the Predictability of Crashes on Different Freeway Segments Using the Real-Time Crash Risk Models. Accident Analysis & Prevention, Vol. 159, 2021, p. 106213.

12. Zhang, King, Liu, Chen, Yan, Xing, and Zhang. A Crash Risk Identification Method for Freeway Segments With Horizontal Curvature Based on Real-Time Vehicle Kinetic Response. Accident Analysis & Prevention, Vol. 150, 2021, p. 105911.

13. Afghari, Haque, and Washington. Applying a Joint Model of Crash Count and Crash Severity to Identify Road Segments With High Risk of Fatal and Serious Injury Crashes. Accident Analysis & Prevention, Vol. 144, 2020, p. 105615.

14. Lord and Park. Investigating the Effects of the Fixed and Varying Dispersion Parameters of Poisson-Gamma Models on Empirical Bayes Estimates. Accident Analysis & Prevention, Vol. 40, No. 4, 2008, pp. 1441–1457.

15. Haleem, Alluri, and Gan. Full versus Simple Safety Performance Functions: Comparison Based on Urban Four-Lane Freeway Interchange Influence Areas in Florida. Transportation Research Record: Journal of the Transportation Research Board, 2013. 2398: 83–92.

16. Shirani-bidabadi, Mallipaddi, Haleem, and Anderson. Developing Bicycle-Vehicle Crash-Specific Safety Performance Functions in Alabama Using Different Techniques. Accident Analysis & Prevention, Vol. 146, 2020, p. 105735.

17. Easa and You. Collision Prediction Models for Three-Dimensional Two-Lane Highways: Horizontal Curves. Transportation Research Record: Journal of the Transportation Research Board, 2009. 2092: 48–56.

18. Apronti, Saha, Moomen, and Ksaibati. Truck Safety Evaluation on Wyoming Mountain Passes. Accident Analysis & Prevention, Vol. 122, 2019, pp. 342–349.

19. Anastasopoulos. Random Parameters Multivariate Tobit and Zero-Inflated Count Data Models: Addressing Unobserved and Zero-State Heterogeneity in Accident Injury-Severity Rate and Frequency Analysis. Analytic Methods in Accident Research, Vol. 11, 2016, pp. 17–32.

20. Xiong and Abdel-Aty. A Correlated Random Parameter Approach to Investigate the Effects of Weather Conditions on Crash Risk for a Mountainous Freeway. Transportation Research Part C: Emerging Technologies, Vol. 50, 2015, pp. 68–77.

21. Lord and Mannering. The Statistical Analysis of Crash-Frequency Data: A Review and Assessment of Methodological Alternatives. Transportation Research Part A: Policy and Practice, Vol. 44, No. 5, 2010, pp. 291–305.

22. Conway and Maxwell. A Queuing Model with State Dependent Service Rates. Journal of Industrial Engineering, Vol. 12, No. 2, 1962, pp. 132–136.

23. Sellers and Raim. A Flexible Zero-Inflated Model to Address Data Dispersion. Computational Statistics & Data Analysis, Vol. 99, 2016, pp. 68–80.

24. Choo-Wosoba, Levy, and Datta. Marginal Regression Models for Clustered Count Data Based on Zero-Inflated Conway–Maxwell–Poisson Distribution With Applications. Biometrics, Vol. 72, No. 2, 2016, pp. 606–618.

25. Sim, Gupta, and Ong. Zero-Inflated Conway-Maxwell Poisson Distribution to Analyze Discrete Data. The International Journal of Biostatistics, Vol. 14, No. 1, 2018, p. 20160070.

26. Sellers and Young. Zero-Inflated Sum of Conway-Maxwell-Poisson (ZISCMP) Regression. Journal of Statistical Computation and Simulation, Vol. 89, No. 9, 2019, pp. 1649–1673.

27. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2019.

28. Abegaz, Berhane, Worku, Assrat, and Assefa. Effects of Excessive Speeding and Falling Asleep While Driving on Crash Injury Severity in Ethiopia: A Generalized Ordered Logit Model Analysis. Accident Analysis & Prevention, Vol. 71, 2014, pp. 15–21.

29. Vadlamani, Chen, Ahn, and Washington. Identifying Large Truck Hot Spots Using Crash Counts and PDOEs. ASCE: Journal of Transportation Engineering, Vol. 137, No. 1, 2011, pp. 11–21.

30. Dong, Huang, and Nambisan. Estimating Factors Contributing to Frequency and Severity of Large Truck–Involved Crashes. ASCE: Journal of Transportation Engineering, Part A: Systems, Vol. 143, No. 8, 2017, p. 04017032.

31. Hosseinpour, Sahebi, Zamzuri, Yahaya, and Ismail. Predicting Crash Frequency for Multi-Vehicle Collision Types Using Multivariate Poisson-Lognormal Spatial Model: A Comparative Analysis. Accident Analysis & Prevention, Vol. 118, 2018, pp. 277–288.

32. Uddin and Huynh. Factors Influencing Injury Severity of Crashes Involving HAZMAT Trucks. International Journal of Transportation Science and Technology, Vol. 7, No. 1, 2018, pp. 1–9.

33. Hauer, Council, and Mohammedsha. Safety Models for Urban Four-lane Undivided Road Segments. Transportation Research Record: Journal of the Transportation Research Board, 2004. 1897: 96–105.

34. Yuan, Abdel-Aty, Yue, and Eluru. Developing Safety Performance Functions for Freeways at Different Aggregation Levels Using Multi-State Microscopic Traffic Detector Data. Accident Analysis & Prevention, Vol. 151, 2021, p. 105984.