Application of the Bayesian Model Averaging in Predicting Motor Vehicle Crashes
YAJIE ZOU
DOMINIQUE LORD, PH.D.
YUNLONG ZHANG, PH.D.
YICHUAN PENG
Zachry Department of Civil Engineering
Texas A&M University, 3136 TAMU
College Station, TX 77843-3136
Phone: 979/595-5985,
Fax: 979/845-6481
Email: yajiezou@tamu.edu
KEYWORDS: crash model, Poisson, negative binomial, Bayesian model averaging, prediction
Abstract
Developing reliable statistical models is critical for predicting motor vehicle crashes in highway safety studies. However, conventional statistical methods ignore model uncertainty: transportation safety analysts typically select a single "best" model from a series of candidate models (called the model space) and proceed as if the selected model were the true model. This paper proposes a new approach for deriving more reliable and robust crash prediction models than the conventional statistical modeling method. This approach uses Bayesian model averaging (BMA) to account for model uncertainty. The derived BMA crash model is an average of the candidate models in the model space, weighted by their posterior model probabilities. To examine the applicability of BMA to the Poisson and negative binomial (NB) regression models, the approach is applied to crash data collected on 338 rural interstate road sections in Indiana over a five-year period (1995 to 1999). The results show that BMA was successfully applied to Poisson and NB regression models. More importantly, in the presence of model uncertainty, the proposed approach can provide better prediction performance than single models selected by conventional statistical techniques. Thus, this paper provides transportation safety analysts with an alternative methodology for predicting motor vehicle crashes when model uncertainty is suspected to exist.
Introduction
In highway safety analysis, regression models play a significant role in identifying relationships between motor vehicle crashes and different explanatory variables, predicting accident frequency and screening variables. Up to now, a large number of analysis tools and models for analyzing crash data have been proposed by transportation safety analysts (Lord and Mannering 2010). Among these models, the negative binomial (NB) model remains the most frequently used tool for crash-frequency modeling (e.g., Lord and Mannering 2010; Miaou 1994; Miaou and Lord 2003; Malyshkina et al. 2009). Recently, some new methodologies and models have been proposed for the purpose of modeling and predicting motor vehicle crashes. For example, artificial neural networks (ANN) have been suggested as an alternative method for analyzing and predicting accident frequency (e.g., Abdelwahab and Abdel-Aty 2002; Chang 2005). However, these models can sometimes overfit the data. To overcome this problem, a few researchers (Xie et al. 2007) have examined the Bayesian neural networks (BNN) and concluded that BNN are more efficient than NB models for predicting crashes. The support vector machine (SVM) model (Li et al. 2008) was recently applied to crash data collected in Texas and was found to predict crashes more accurately than both NB and BNN models. Haleem et al. (2010) used the multivariate adaptive regression splines (MARS) technique to predict motor vehicle crashes and showed that the MARS predicts crashes almost as effectively as the traditional NB models, and its goodness-of-fit performance seems to show promise for adequately predicting crashes.
Despite extensive efforts on modeling and predicting crash data, the conventional statistical approach faces a few important challenges. The selection of subsets of explanatory variables is a basic part of building a crash prediction model. Given the dependent variable accident frequency y_{i} and the candidate explanatory variables X_{1}, ..., X_{k}, the general routine is to find the "best" regression model, based on a selected number of variables, to describe the crash frequency. In highway safety research, one typical approach is to select a single "best" model based on some model selection criterion, such as the log-likelihood, Akaike information criterion (AIC), Bayesian information criterion (BIC), or deviance information criterion (DIC) (e.g., Park and Lord 2009; Pei et al. 2011). After the model is selected, further inferences are made under the assumption that the selected model is the true model. However, this approach neglects the uncertainty associated with the choice of model, especially among models from the same category (e.g., Poisson, NB or Poisson-lognormal models) but with different combinations of explanatory variables. The uncertainty between models may be important when making inferences, particularly in cases where more than one model is considered plausible but the models differ in their predictions (Li and Shi 2010). If this model uncertainty is ignored, the uncertainty in the quantities of interest (accident frequency) may be underestimated. BMA combines and averages all possible models (models with different combinations of explanatory variables) when making inferences about the quantities of interest (crash frequency) (Raftery et al. 1997). By averaging over many different competing models, BMA incorporates model uncertainty into the modeling output related to parameter estimation and prediction. BMA has been applied successfully in various fields including engineering (Li and Shi 2010), meteorology (Raftery et al.
2005), epidemiology (Viallefont et al. 2001), water resources (Duan et al. 2007), etc., and in most cases, BMA can improve the prediction performance. In this study, we have two objectives: the first objective is to examine the applicability of BMA to the Poisson and NB regression models for traffic accident analysis (the most basic models for count data); the second objective is to compare the model prediction performance between BMA and the conventional statistical approach used in transportation safety analysis. To accomplish these two objectives, BMA is examined using accident data collected on 338 rural interstate road sections in Indiana. The next section outlines the methodology used in this study.
Methodology
This section describes the characteristics of the NB regression and BMA, as well as the Occam's Window Method. This latter method is used for discarding models that predict much more poorly than their competitors in the model space.
Negative Binomial Regression
Because the crash-frequency data on a highway section are non-negative and discrete integers, the most basic model for modeling crash data is the Poisson regression model. The advantage of the Poisson regression model is that its parameters are easy to estimate. However, past studies (Lord and Mannering 2010) have indicated that the Poisson regression model cannot accommodate the over-dispersion observed in crash data. Moreover, this model (and its sister the NB model described below) can be adversely influenced by the low sample-mean and small sample size bias (Lord 2006). The NB regression model is an extension of the Poisson regression model and is used for handling the over-dispersion often observed in crash data. The derivation of the NB regression model is as follows: the number of crashes y_{i} at roadway entity i during some time period is assumed to be Poisson distributed and independent over all entities, which is defined by:
(1) P(y_{i}) = λ_{i}^{y_{i}} exp(-λ_{i}) / y_{i}!
where P(y_{i}) is the probability of roadway entity i having y_{i} crashes for a given time period and λ_{i} is the expected accident frequency E[y_{i}] for roadway entity i. The expected accident frequency λ_{i} is structured as a function of explanatory variables,
(2) λ_{i} = exp(β X_{i})
where X_{i} is a vector of explanatory variables and β is a vector of estimable coefficients.
The NB regression model arises if we assume that the parameter λ_{i} follows a gamma distribution. A gamma-distributed error term is added to the parameter λ_{i} and equation (2) is rewritten as follows:
(3) λ_{i} = exp(β X_{i} + ε_{i})
where exp(ε_{i}) is the added error term with mean 1 and variance α, and α is the dispersion parameter. With this new structure, the mean is allowed to differ from the variance such that VAR[y_{i}] = E[y_{i}] [1 + αE[y_{i}]] = E[y_{i}] + αE[y_{i}]^{2}. Despite its documented limitations (Lord and Mannering 2010; Hilbe 2011; Zou et al. 2012), the NB model is popular for modeling crash data for several reasons. First, most statistical software programs have built-in functions that can handle such models. Second, two types of analysis commonly used in highway safety are available within the NB modeling framework: the empirical Bayes method, and the estimation of confidence and prediction intervals for NB models (see Lord 2006). In addition, Hauer (1997) concluded
that the NB model is the most common distribution used for modeling crash data because its marginal distribution has a closed form and this mixture results in a conjugate model.
Bayesian Model Averaging
When describing BMA, consider a model space M of K models M_{k} (k = 1, 2, ..., K) and let y denote the quantity of interest (a future observation of the accident frequency using new input data). The posterior distribution of y given the observed data D is
(4) p(y|D) = ∑_{k=1}^{K} p(y|M_{k},D) p(M_{k}|D)
where p(y | M_{k},D) is the posterior distribution of y under model M_{k} given data D and p(M_{k} | D) is the likelihood of M_{k} being the correct prediction model given the observational data D, which is also known as the posterior model probability (PMP). The output of BMA method is an average of the posterior distribution p(y | M_{k},D) weighted by the corresponding posterior probabilities, w_{k} = p(M_{k} | D). For any model space, the sum of w_{k} equals 1. The posterior model probability is given by:
(5) p(M_{k}|D) = p(M_{k}) p(D|M_{k}) / ∑_{l=1}^{K} p(M_{l}) p(D|M_{l})
where p(M_{k}) is the prior probability that M_{k} is the true model and p(D|M_{k}) is the corresponding marginal model likelihood. In this study, the model space M is initially taken to be the set of all possible combinations of explanatory variables. For a given model space, the results of the BMA approach depend on the specification of the prior probabilities. When there is little prior information about the relative plausibility of the models considered, the assumption that all models are equally likely a priori is a reasonable "neutral" choice (Hoeting et al. 1999). The marginal model likelihood p(D|M_{k}) is calculated by:
(6) p(D|M_{k}) = ∫p(D | θ_{k},M_{k})p(θ_{k}|M_{k})dθ_{k}
where θ_{k} is the vector of parameters in model M_{k}, p(θ_{k} | M_{k}) is the prior density of θ_{k} under model M_{k}, and p(D | θ_{k}, M_{k}) is the likelihood.
The posterior mean and variance of the BMA prediction can be defined as follows:
(7) E[y|D] = ∑_{k=1}^{K} E[y|D,M_{k}] w_{k}
(8) Var[y|D] = ∑_{k=1}^{K} (Var[y|D,M_{k}] + E[y|D,M_{k}]^{2}) w_{k} - E[y|D]^{2}
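Equations (7) and (8) are straightforward to evaluate once the per-model predictive moments and PMP weights are available. A minimal sketch (the means, variances and weights below are made-up numbers, not taken from the paper):

```python
import numpy as np

# Hypothetical per-model posterior predictive moments for one site
means = np.array([15.0, 18.0, 12.0])      # E[y | D, M_k]
variances = np.array([20.0, 25.0, 15.0])  # Var[y | D, M_k]
w = np.array([0.7, 0.2, 0.1])             # PMP weights w_k, summing to 1

# Equation (7): BMA posterior mean
bma_mean = np.sum(means * w)

# Equation (8): BMA posterior variance (law of total variance over models)
bma_var = np.sum((variances + means**2) * w) - bma_mean**2

print(bma_mean, bma_var)
```

Note that the BMA variance exceeds the weighted average of the within-model variances: the between-model spread of the predictions contributes extra uncertainty, which is exactly the model uncertainty a single selected model ignores.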
Although BMA is theoretically attractive, two practical difficulties need to be solved before its implementation. First, the results of BMA rely heavily on the model space, and it is necessary to select a proper set of candidate models. One obvious approach is to include all possible models; however, when the number of possible models is large, the BMA procedure becomes very time-consuming. Currently, two approaches are available to solve this problem. One is the Occam's window method, which is introduced in the following section. The other, Markov chain Monte Carlo model composition (MC3), uses a Markov chain Monte Carlo method to directly approximate the terms in equation (4) (see Madigan and York 1995). The implementation of MC3 is very complicated, and Occam's window tends to be much faster computationally (Raftery et al. 1997). Thus, we adopted the Occam's window method.
The second difficulty associated with the BMA approach is that the marginal model likelihood may be analytically intractable especially in many cases where no closed form integral is available. Several alternative methods have been proposed in the literature to calculate or approximate the likelihood (Gibbons et al. 2008):
(i) The most popular approximation of the marginal likelihood is the Laplace approximation, which can be evaluated at the posterior mode or at the maximum likelihood parameter estimates; (ii) another approximation is the harmonic mean estimator, which is relatively simple but quite unstable and sensitive to small likelihood values, and hence is not used in this study; (iii) Kass and Wasserman (1995) derived the Bayesian information criterion (BIC) as a rough but adequate approximation, and this BIC approximation was used in this paper.
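Under the BIC approximation with equal prior model probabilities, the posterior model probability of M_{k} is proportional to exp(-BIC_{k}/2). A small sketch with hypothetical BIC values (not taken from the paper):

```python
import numpy as np

# Hypothetical BIC values for three candidate models (illustrative only)
bic = np.array([2100.0, 2103.2, 2110.5])

# Under equal priors, p(M_k | D) is approximated by exp(-BIC_k / 2),
# normalized over the model space; subtract the minimum first so the
# exponentials do not underflow.
delta = bic - bic.min()
weights = np.exp(-delta / 2)
pmp = weights / weights.sum()
print(pmp.round(3))
```

Even modest BIC differences translate into sharply concentrated posterior model probabilities: a BIC gap of about 10 already renders a model almost negligible in the average.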
Occam's Window Method
Because the number of terms in equation (4) can be very large, the Occam's window approach was used to discard models that predict much more poorly than their competitors. The Occam's window algorithm was first developed by Madigan and Raftery (1994); Raftery et al. (1997) later applied this method to linear regression models. There are two basic principles underlying the Occam's window method. First, if a model predicts the data far more poorly than the model that provides the best predictions, then it should be excluded from the model space and no longer considered. Models not belonging to
(9) A' = {M_{k} : max_{l}{p(M_{l}|D)} / p(M_{k}|D) ≤C}
should be discarded from the sum in equation (4). Here, max_{l}{p(M_{l}|D)} is the highest posterior model probability in the model space, and the value of C is chosen by the data analyst. A common choice is C = 20, which was also used in this study.
The second (optional) principle is called Occam's razor and this method is used to exclude complex models that receive less support from the data than any of their simpler submodels. Those models excluded from model space belong to
(10) B = {M_{k} : ∃ M_{l} ∈ M, M_{l} ⊂ M_{k}, p(M_{l}|D) / p(M_{k}|D) > 1}
This method can significantly reduce the number of models in the sum in equation (4). Typically, the number of terms can be reduced to fewer than 20 models, and often to as few as one or two. Equation (4) can then be rewritten as:
(11) p(y|D) = ∑_{M_{k}∈A} p(y|M_{k},D) p(M_{k}|D)
where A = A' \ B ⊆ M.
To implement these principles, this study adopted the leaps and bounds algorithm as the search strategy. For more details about the search strategy, interested readers are referred to Raftery (1995).
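The first Occam's window principle (equation 9) amounts to a simple filter on the posterior model probabilities. A sketch with hypothetical PMPs and the C = 20 cut-off used in this study:

```python
import numpy as np

# Hypothetical posterior model probabilities for five candidate models
pmp = np.array([0.60, 0.25, 0.10, 0.04, 0.01])
C = 20  # cut-off value used in this study

# Equation (9): keep M_k only if max_l p(M_l | D) / p(M_k | D) <= C
keep = (pmp.max() / pmp) <= C
selected = pmp[keep]

# Renormalize the surviving PMPs so the BMA weights w_k sum to 1
w = selected / selected.sum()
print(keep, w.round(3))
```

In this toy example the last model is more than 20 times less probable than the leader and is dropped; the remaining four weights are rescaled before entering equation (11).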
Data Description
The dataset used for this study contains crash data collected on 338 rural interstate road sections in Indiana over a five-year time period from 1995 to 1999. The data have been investigated in previous studies (e.g., Anastasopoulos et al. 2008; Geedipally et al. 2012). The explanatory variables in table 1 are considered to construct the model space M for Bayesian model averaging in this study. The available highway geometric design information includes the length of the section, minimum friction reading, pavement surface type, median width, presence of a median barrier, presence of an interior shoulder and interior shoulder width, while the available traffic information contains the average daily traffic (ADT) of various vehicle types and the truck percentage. During the five-year study period, there were 5,737 crashes. The summary statistics for the model variables are presented in table 1. As shown in this table, the observed crash frequency ranges from 0 to 329, and the mean frequency is 16.97. For a complete list of variables in this dataset, interested readers can consult Washington et al. (2011).
Results and Discussion
This section describes the modeling results for the Poisson and NB regression models using the BMA approach. Although the Poisson regression model has significant disadvantages and is now rarely used for analyzing crash data (Lord and Mannering 2010), this study considers it as an example to demonstrate the usefulness of BMA. When analyzing the crash data, we consider the segment length as an offset term, which means that the number of crashes is assumed to be linearly proportional to the segment length. Thus, we have 8 candidate explanatory variables, and these variables can potentially result in 2^{8} = 256
different models. For the model averaging strategies, all possible combinations of candidate explanatory variables are assumed to be equally likely a priori. The Occam's window method is implemented to exclude the models with poor prediction performance. The results show that the BMA approach can provide additional insight in interpreting the explanatory variables and averaging over the selected models provides better prediction performance than basing inference on a single model in the NB regression example. All statistical analyses were carried out in an R package.
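The full model space of 2^8 = 256 candidates can be enumerated directly. A brief sketch (the variable names follow table 1; this is not the search routine actually used, which relied on the leaps and bounds algorithm):

```python
from itertools import combinations

# The eight candidate explanatory variables (X2 through X9 in table 1)
variables = ["ADT", "FRICTION", "PAVEMENT", "MW",
             "BARRIER", "SHOULDER", "SW", "TRUCKS"]

# Every subset of the candidates defines one model in the space M,
# including the intercept-only (empty) model: 2^8 = 256 models in total.
model_space = [subset
               for r in range(len(variables) + 1)
               for subset in combinations(variables, r)]
print(len(model_space))  # 256
```

Exhaustive enumeration is feasible here, but as noted in the Discussion it scales as 2^p and becomes impractical for, say, 20 candidate variables.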
Poisson Regression Model
The BMA approach was performed using the leaps and bounds algorithm and the results are provided in tables 2 and 3. Table 2 contains the selected models with the highest posterior probabilities under the Occam's window method. As shown in this table, only two models are selected. The model with the higher posterior model probability accounts for 90% of the total posterior probability. Although the amount of model uncertainty is not large in this case, some model uncertainty still exists. Compared to Model 1, Model 2 excludes the variable X_{7}, presence of interior shoulder. Table 3 lists the posterior means of β|D, standard deviations of β|D and posterior effect probabilities P(β ≠ 0|D) for the coefficient associated with each variable under the BMA approach. The posterior effect probability P(β ≠ 0|D) for an explanatory variable is obtained by summing the posterior model probabilities of the models that contain that variable. Using the conventional statistical technique and assuming the full model, the estimates, standard errors and p-values for the coefficients are also provided in table 3. Note that all standard deviations under the BMA approach are larger than the corresponding standard errors under the full model. This is because the BMA parameter estimates and standard deviations directly incorporate model uncertainty (Hoeting et al. 1999). Another point to note is that the posterior effect probability of the coefficient associated with variable X_{7} is 90%; this is because only Model 1 in the model space includes variable X_{7} in the analysis.
TABLE 1 Summary Statistics of Characteristics for the Data
Variable | Minimum | Maximum | Mean(SD) | Sum |
---|---|---|---|---|
Number of crashes (5 years) X_{1}* | 0 | 329 | 16.97 (36.30) | 5737 |
Average daily traffic over the 5 years ( ADT) X_{2} | 9442 | 143422 | 30237.6 (28776.4) | |
Minimum friction reading in the road section over the 5-year period (FRICTION) X_{3} | 15.9 | 48.2 | 30.51 (6.67) | |
Pavement surface type (1: asphalt, 0: concrete) (PAVEMENT) X_{4} | 0 | 1 | 0.77 (0.42) | |
Median width (in feet) (MW) X_{5} | 16 | 194.7 | 66.98 (34.17) | |
Presence of median barrier (1: present, 0: absent) (BARRIER) X_{6} | 0 | 1 | 0.16 (0.37) | |
Presence of interior shoulder (1: present, 0 absent) (SHOULDER) X_{7} | 0 | 1 | 0.93 (0.26) | |
Interior shoulder width (in feet) (SW) X_{8} | 2.7 | 24.1 | 5.35 (2.80) | |
Percentage of trucks (average daily) (TRUCKS) X_{9} | 7.32% | 44.87% | 31.74% | |
Segment length (in miles) (L) X_{10} | 0.009 | 11.53 | 0.89 (1.48) | 300.09 |
* X_{1} is the variable label assigned to the number of crashes.
As shown in table 3, all explanatory variables except variable X_{7} are highly important when predicting the crash frequency. Both the posterior effect probabilities and the p-values indicate very strong evidence of an effect: the posterior effect probabilities are all 100% and the p-values are less than 0.0001. The estimated coefficient values of the variables from both approaches demonstrate that: first, an increase in ADT is found to be linked to an increase in the crash frequency (although the relationship is non-linear), and road sections with an asphalt surface tend to have more crashes than sections with a concrete surface; second, increases in the other variables are found to be associated with a decrease in the crash frequency. For the explanatory variable X_{7}, the p-value indicates that the effect is significant, and the posterior effect probability likewise indicates a strong effect.
TABLE 2 Models with Highest Posterior Model Probabilities for Poisson Regression
Model number | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | PMP*** |
---|---|---|---|---|---|---|---|---|---|
1 | T* | T | T | T | T | T | T | T | 0.9 |
2 | T | T | T | T | T | F** | T | T | 0.1 |
* T denotes that the explanatory variable is considered in the corresponding model.
** F means that the explanatory variable is NOT considered in the corresponding model.
*** PMP is the posterior model probability.
TABLE 3 Comparison of BMA Results to Full Model for Poisson Regression
Variable | Bayesian model averaging | Full model | ||||
---|---|---|---|---|---|---|
 | Posterior mean | Posterior SD* | P(β ≠ 0) (%) | Estimate | SE** | p-value |
Ln(ADT) X_{2} | 0.7069974 | 0.035122 | 100 | 0.706032 | 0.035035 | < 2e-16 |
FRICTION X_{3} | -0.02241202 | 0.002124 | 100 | -0.02246 | 0.00212 | < 2e-16 |
PAVEMENT X_{4} | 0.32118 | 0.044125 | 100 | 0.322125 | 0.044051 | 2.62E-13 |
MW X_{5} | -0.00342991 | 0.000758 | 100 | -0.0034 | 0.000752 | 6.28E-06 |
BARRIER X_{6} | -3.498241 | 0.357748 | 100 | -3.55938 | 0.315016 | < 2e-16 |
SHOULDER X_{7} | -0.9032813 | 0.437834 | 90 | -0.99939 | 0.340566 | 0.00334 |
SW X_{8} | -0.07734632 | 0.018532 | 100 | -0.07867 | 0.018197 | 0.0000154 |
TRUCKS X_{9} | -1.497767 | 0.164029 | 100 | -1.50363 | 0.163252 | < 2e-16 |
* SD is the standard deviation.
** SE means the standard error.
Negative Binomial Regression Model
The BMA approach was also applied to the NB regression model and the results are presented in tables 4 and 5. As the BMA results in table 4 indicate, the model with the highest posterior model probability accounts for 89.7% of the total posterior probability. Thus, we can conclude that there is a certain amount of model uncertainty. Compared with the other selected models, Model 1 is in a dominant position, and this model considers only two explanatory variables, X_{6} and X_{9}. Table 5 gives the statistics of the coefficient associated with each variable under the BMA approach and the conventional statistical technique. For the two explanatory variables X_{6} and X_{9}, since the corresponding posterior effect probabilities are equal to 100% and the p-values are less than 0.001, both criteria demonstrate that these variables have very strong effects on the crash frequency. For the other six explanatory variables, the results show a qualitative difference between the two methods. If 0.01 is chosen as the significance level, then five variables (X_{2}, X_{4}, X_{5}, X_{7} and X_{8}) are rejected based on the reported p-values. On the one hand, for the variables X_{2}, X_{4}, X_{5}, X_{7} and X_{8}, the p-values indicate that the effects are insignificant and the posterior effect probabilities conclude that there is a weak or no effect. On the other hand, for the variable X_{3}, the posterior effect probability indicates that minimum friction has no effect on the crash frequency, while the corresponding p-value shows that the effect of minimum friction on the crash frequency is significant. Overall, the posterior effect probabilities of the four variables (X_{2}, X_{3}, X_{4} and X_{5}) imply weaker evidence for these effects than the corresponding p-values. This is because the p-values from the full model do not take account of model uncertainty, and the p-values thus overstate the evidence for the effects (Hoeting et al. 1999). For some variables, the posterior means of the coefficients are 0, which means that BMA shrinks these estimates toward zero (Hoeting et al. 1999).
Prediction Performance Comparisons
In problems where model uncertainty is present, BMA can yield prediction performance improvements over single selected models. This conclusion has been verified in various fields, as discussed above. In order to measure the applicability of BMA to predicting the crash data, the mean absolute deviance (MAD), the mean squared predictive error (MSPE) and the logarithmic score (LS) were used to compare the prediction performance of BMA and the conventional statistical approach. The first two performance indexes were calculated as follows: MAD = (1/n) ∑_{i=1}^{n} |ŷ_{i} - y_{i}| and MSPE = (1/n) ∑_{i=1}^{n} (ŷ_{i} - y_{i})^{2}, where n is the testing data size, and y_{i} and ŷ_{i} are the observed and predicted numbers of accidents for observation i, respectively (Oh et al. 2003). The LS was introduced by Good (1952), and previous studies (e.g., Hoeting et al. 1999; Madigan and Raftery 1994) have used the LS to measure the prediction performance of BMA. The observed data in this study are randomly split into two subsets. The first subset is referred to as the build data; we apply the BMA method and the conventional statistical approach to this subset. The second subset, the test data, is then used to measure the prediction performance. The number of sections used for building the models is 238, and the number of sections used for testing is 100. The logarithmic score measures the prediction ability of an individual model M_{i} using the equation -∑_{y∈D^{T}} ln p(y|M_{i},D^{B}), where D^{B} is the build data and D^{T} is the test data. The prediction performance of BMA is examined using the equation -∑_{y∈D^{T}} ln(∑_{M_{i}∈A} p(y|M_{i},D^{B}) p(M_{i}|D^{B})). To make the comparison more convincing, the random data separation process was repeated four times and four scenarios were considered. Smaller MAD, MSPE and LS values indicate a better overall prediction performance for the given model.
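The three performance indexes are simple to compute once predictions are in hand. A sketch on made-up observed and predicted counts (not from the Indiana data), using a Poisson predictive distribution as a stand-in for p(y|M, D^B) in the logarithmic score:

```python
import numpy as np
from math import lgamma

# Hypothetical observed and predicted crash counts for a small test set
y_obs = np.array([3, 0, 12, 7, 25])
y_hat = np.array([4.1, 0.8, 9.5, 6.2, 21.0])

# Mean absolute deviance and mean squared predictive error
mad = np.mean(np.abs(y_hat - y_obs))
mspe = np.mean((y_hat - y_obs) ** 2)

# Logarithmic score with a Poisson predictive density as an illustrative
# stand-in: log p(y) = y*ln(mu) - mu - ln(y!); smaller LS is better
log_pmf = y_obs * np.log(y_hat) - y_hat - np.array([lgamma(k + 1) for k in y_obs])
ls = -log_pmf.sum()

print(mad, mspe, ls)
```

For BMA, the predictive density inside the logarithm would instead be the PMP-weighted mixture over the models in A, as in the equation above.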
TABLE 4 Models with Highest Posterior Model Probabilities for Negative Binomial Regression
Model number | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | PMP |
---|---|---|---|---|---|---|---|---|---|
1 | F | F | F | F | T | F | F | T | 0.897 |
2 | F | F | F | F | T | T | F | T | 0.052 |
3 | F | F | F | T | T | F | F | T | 0.051 |
TABLE 5 Comparison of BMA Results to Full Model for Negative Binomial Regression
Variable | Bayesian model averaging | Full model | ||||
---|---|---|---|---|---|---|
 | Posterior mean | Posterior SD | P(β ≠ 0) (%) | Estimate | SE | p-value |
Ln(ADT) X2 | 0 | 0 | 0 | 0.345986 | 0.161449 | 0.032* |
FRICTION X3 | 0 | 0 | 0 | -0.02822 | 0.010116 | 0.005 |
PAVEMENT X4 | 0 | 0 | 0 | 0.393769 | 1.72E-01 | 0.022* |
MW X5 | -0.00026525 | 0.001242 | 5.1 | -0.00362 | 0.002089 | 0.083* |
BARRIER X6 | -2.830011 | 0.280228 | 100 | -3.08466 | 0.406729 | 3.35E-14 |
SHOULDER X7 | -0.02144128 | 0.142661 | 5.2 | -0.57188 | 0.502978 | 0.256* |
SW X8 | 0 | 0 | 0 | -0.02819 | 0.038981 | 0.470* |
TRUCKS X9 | -3.847502 | 0.627213 | 100 | -2.66588 | 0.779984 | 0.000631 |
* Insignificant at 0.01 level of significance.
Table 6 reports the MAD, MSPE, and LS values of the competing methods for the NB regression model. Bold values in table 6 are the smallest MAD, MSPE, and LS values among the selected models. As shown in table 6, except for the MAD value in scenario 1, all goodness-of-fit values indicate that BMA can improve the prediction accuracy for the test data. The difference in LS of 15.38 (between BMA and the full model in scenario 1) can be viewed as an improvement in prediction performance. For example, if the average prediction probability, (1/100) ∑_{y∈D^{T}} p(y|M,D^{B}), is 25%, then the corresponding per-observation logarithmic score is -ln(0.25) = 1.386. After implementing BMA, the new average prediction probability will be exp(-(1.386 - 15.38/100)) = 29.2%. This means that BMA can predict the number of crashes 4.2% more accurately than the method using the full model. Although the difference appears small for some of the measures, it is large enough that BMA should be selected over the full model (see a related discussion about GOF and biased models in Lord (2006)). In sum, in predicting the crash frequency of the test data, the proposed BMA model outperforms the conventional models based on the MAD, MSPE and LS values. Thus, we conclude that BMA can improve the prediction performance for the NB regression model.
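The back-calculation from logarithmic scores to average prediction probabilities can be checked numerically; the 25% starting probability is the paper's illustrative assumption, not an estimated quantity:

```python
import math

# LS difference of 15.38 over 100 test sections = 0.1538 per observation
ls_gain_per_obs = 15.38 / 100

p_full = 0.25                # assumed average prediction probability, full model
ls_full = -math.log(p_full)  # per-observation logarithmic score, about 1.386

# Subtract the per-observation gain and transform back to a probability
p_bma = math.exp(-(ls_full - ls_gain_per_obs))
print(round(p_bma, 3))  # about 0.292, i.e., 29.2%
```

The result confirms the arithmetic in the text: a 15.38 drop in total LS over 100 sections corresponds to raising the average prediction probability from 25% to roughly 29.2%.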
Discussion
In this study, the results showed that BMA can provide better prediction performance than the conventional statistical technique for the NB regression model. The findings suggest that BMA may be an appropriate methodology for predicting crash data; note that BMA should not be used for examining relationships between variables. Further studies are thus needed to examine the applicability of BMA to other types of crash models. In previous studies, conventional statistical techniques were commonly used for traffic accident analysis, partially because built-in functions for crash models are available in many statistical software programs, and the analysis results can usually be interpreted easily and provide clear and valuable information for traffic safety analysts to make further inferences. In contrast, the advantage of BMA is that it overcomes the problem of accounting for model uncertainty by conditioning not on a single "best" regression model but on the entire model space, and the output of BMA combines inferences and predictions from multiple candidate models; for the NB regression example, a total of three models were selected. Another advantage of BMA is that, in the presence of model uncertainty, it can yield prediction performance improvements over single selected models. Despite these merits, there are a few limitations associated with BMA. First, when the number of explanatory variables in the crash data is large, for instance 20 explanatory variables, the application of the Occam's window method is very time-consuming because there are a total of 2^{20} = 1,048,576 candidate models in the complete model space. Therefore, the efficiency of BMA can be compromised by the number of explanatory variables examined in the analysis.
Second, after the Occam's window method is applied, in our experience, the number of terms in equation (4) can be reduced to fewer than 20, often to as few as 1 or 2. For example, as illustrated in tables 2 and 4, only two or three models are selected based on the Occam's window method. This finding may differ from the typical application of BMA and thus understate the value of BMA. To better demonstrate its usefulness, other statistical models for analyzing crash data (e.g., Poisson-lognormal, Poisson-Weibull) could be used. Another way to increase model uncertainty is to implement BMA without using Occam's window.
TABLE 6 Performance Index Values for Negative Binomial Regression Models
Scenario | Performance Index | Full model | Model with significant variables* | BMA |
---|---|---|---|---|
1 | MAD | 5.82 | 6.35 | 5.85 |
MSPE | 100.63 | 118.88 | 91.21 | |
LS | 292.28 | 288.89 | 276.9 | |
2 | MAD | 7.99 | 8 | 7.76 |
MSPE | 285.77 | 289.96 | 266.41 | |
LS | 311.65 | 308.61 | 298.57 | |
3 | MAD | 8.91 | 8.4 | 7.6 |
MSPE | 357.49 | 314.37 | 231.53 | |
LS | 314.75 | 312.98 | 304.16 | |
4 | MAD | 5.48 | 5.44 | 5.1 |
MSPE | 84.19 | 83.55 | 64.89 | |
LS | 281 | 277.65 | 266.57 |
* Model with significant variables at a significance level of 0.05.
Conclusions
This paper has documented the application of the Bayesian model averaging approach for predicting motor vehicle crashes. Crash data collected on rural interstate road sections in Indiana were analyzed using the proposed approach. Poisson and NB regression models were used to establish the relationship between traffic accident frequency and highway geometric variables and traffic characteristics. The results of this study revealed that the model uncertainty problem can be solved, or at least minimized, using BMA and that, in the presence of model uncertainty, the proposed approach can provide better prediction performance than single models selected by conventional statistical techniques for the NB models. This study thus presents an alternative methodology for predicting traffic accident frequency. For future work, since the crash data used in this study were collected on rural interstate roads, an application of BMA to other types of data would be meaningful. Moreover, it would also be interesting to examine the results of applying BMA to more complex crash prediction models, such as the newly introduced Negative Binomial-Lindley model (Geedipally et al. 2012). Finally, this study did not apply the Markov chain Monte Carlo model composition method to directly approximate the terms in equation (4). The Occam's window method and the Markov chain Monte Carlo model composition method should be compared, and their influence on the modeling results investigated.
Acknowledgements
The authors would like to thank Dr. Fred Mannering from Purdue University for graciously providing us with the Indiana data.
References
Abdelwahab, H.T. and M.A. Abdel-Aty. 2002. Artificial Neural Networks and Logit Models for Traffic Safety Analysis of Toll Plazas. Transportation Research Record 1784:115–125.
Anastasopoulos, P.C., A. Tarko, and F. Mannering. 2008. Tobit Analysis of Vehicle Accident Rates on Interstate Highways. Accident Analysis and Prevention 40(2):768–775.
Chang, L.Y. 2005. Analysis of Freeway Accident Frequencies: Negative Binomial Regression versus Artificial Neural Network. Safety Science 43(8):541–557.
Duan, Q., N.K. Ajami, and S. Sorooshian. 2007. Multimodel ensemble hydrologic prediction using Bayesian model averaging. Advances in Water Resources 30(5):1371–1386.
Geedipally, S.R., D. Lord, and S.S. Dhavala. 2012. The Negative Binomial-Lindley Generalized Linear Model: Characteristics and Application Using Crash Data. Accident Analysis and Prevention, forthcoming.
Gibbons, J.M., G.M. Cox, A.T.A. Wood, J. Craigon, S.J. Ramsden, D. Tarsitano, and N.J.M. Crout. 2008. Applying Bayesian averaging to mechanistic models: an example and comparison of methods. Environmental Modelling and Software 23(8):973–985.
Good, I.J. 1952. Rational decisions. Journal of the Royal Statistical Society Series B 14(1):107–114.
Haleem, K., M.A. Abdel-Aty, and J. Santos. 2010. Multiple Applications of Multivariate Adaptive Regression Splines Technique to Predict Rear-End Crashes at Unsignalized Intersections. Transportation Research Record 2165:33–41.
Hauer, E. 1997. Observational Before–After Studies in Road Safety. Pergamon Press, Elsevier Science Ltd., Oxford, England.
Hilbe, J.M. 2011. Negative Binomial Regression, 2nd Edition, Cambridge University Press, Cambridge, UK.
Hoeting, J.A., D. Madigan, A.E. Raftery, and C.T. Volinsky. 1999. Bayesian model averaging: a tutorial. Statistical Science 14(4):382–417.
Kass, R.E. and L. Wasserman. 1995. A reference Bayesian test for nested hypotheses and its relationship to the Schwarz Criterion. Journal of the American Statistical Association 90(431):928–934.
Li, G. and J. Shi. 2010. Application of Bayesian model averaging in modeling long-term wind speed distributions. Renewable Energy 35(6):1192–1202.
Li, X., D. Lord, Y. Zhang, and Y. Xie. 2008. Predicting Motor Vehicle Crashes Using Support Vector Machine Models. Accident Analysis and Prevention 40(4):1611–1618.
Lord, D. 2006. Modeling Motor Vehicle Crashes Using Poisson-Gamma Models: Examining the Effects of Low Sample Mean Values and Small Sample Size on the Estimation of the Fixed Dispersion Parameter. Accident Analysis and Prevention 38(4):751–766.
Lord, D., and F.L. Mannering. 2010. The Statistical Analysis of Crash-frequency Data: A Review and Assessment of Methodological Alternatives. Transportation Research Part A 44(5):291–305.
Madigan, D. and A.E. Raftery. 1994. Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association 89(428):1535–1545.
Madigan, D. and J. York. 1995. Bayesian graphical models for discrete data. International Statistical Review 63:215–232.
Malyshkina, N.V., F.L. Mannering, and A.P. Tarko. 2009. Markov Switching Negative Binomial Models: an Application to Vehicle Accident Frequencies. Accident Analysis and Prevention 41(2):217–226.
Miaou, S.P. 1994. The Relationship between Truck Accidents and Geometric Design of Road Sections: Poisson versus Negative Binomial Regressions. Accident Analysis and Prevention 26(4):471–482.
Miaou, S.P. and D. Lord. 2003. Modeling Traffic Crash-Flow Relationships for Intersections: Dispersion Parameter, Functional Form, and Bayes versus Empirical Bayes. Transportation Research Record 1840:31–40.
Oh, J., C. Lyon, S.P. Washington, B.N. Persaud, and J. Bared. 2003. Validation of the FHWA Crash Models for Rural Intersections: Lessons Learned. Transportation Research Record 1840:41–49.
Park, B.J. and D. Lord. 2009. Application of Finite Mixture Models for Vehicle Crash Data Analysis. Accident Analysis and Prevention 41(4):683–691.
Pei, X., S.C. Wong, and N.N. Sze. 2011. A joint-probability approach to crash prediction models. Accident Analysis and Prevention 43(3):1160–1166.
Raftery, A.E. 1995. Bayesian model selection in social research. Sociological Methodology 25:111–163.
Raftery, A.E., D. Madigan, and J.A. Hoeting. 1997. Bayesian model averaging for linear regression models. Journal of the American Statistical Association 92(437):179–191.
Raftery, A.E., T. Gneiting, F. Balabdaoui, and M. Polakowski. 2005. Using Bayesian model averaging to calibrate forecast ensembles. Monthly Weather Review 133:1155–1174.
Viallefont, V., A.E. Raftery, and S. Richardson. 2001. Variable selection and Bayesian model averaging in case-control studies. Statistics in Medicine 20:3215–3230.
Washington, S., M. Karlaftis, and F. Mannering. 2011. Statistical and Econometric Methods for Transportation Data Analysis. Second edition, Chapman and Hall/ CRC, Boca Raton, FL.
Xie, Y., D. Lord, and Y. Zhang. 2007. Predicting Motor Vehicle Collisions Using Bayesian Neural Network Models: An Empirical Analysis. Accident Analysis and Prevention 39(5):922–933.
Zou, Y., D. Lord, and Y. Zhang. 2012. Analyzing highly dispersed crash data using the Sichel generalized additive models for location, scale and shape. Working paper.