dc.description.abstract |
Road traffic crashes have become one of the leading causes of death worldwide, and Bangladesh, a developing country, is rapidly becoming one of the countries hardest hit by road accidents. Traffic crashes cause different types of injuries depending on the severity of the crash. Double vehicle crashes are the most critical type of road accident, with the potential to cause serious injuries and fatalities. Unfortunately, Bangladesh is still at a nascent stage in dealing with road accidents, especially double vehicle crashes. Precise prediction of crash severity can significantly improve traffic safety. Consequently, safety researchers have recently shifted toward applying machine learning (ML) algorithms to estimate crash severity because of their superior predictive ability. Although ML methods are increasingly used in crash severity research, they have seen limited application to estimating the severity of double vehicle crashes. This study therefore applies ML algorithms to predict double vehicle crash severity in the context of Bangladesh.
The aim of this study is to compare the predictive performance of several machine learning and traditional statistical regression techniques in modeling double vehicle crash severity, and to identify the contributing factors and how they affect crash severity prediction. Using the most recent Dhaka crash records (2017-2020), collected by the Accident Research Institute (ARI), BUET, this study employed classification and regression tree, support vector machine, random forest, adaptive boosting, logistic regression, and hybrid models based on a soft voting classifier. Logistic regression and the other machine learning classifiers were compared using the most widely used evaluation criteria: accuracy (ACC), the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). The comparison showed that the hybrid model built on logistic regression, random forest, and adaptive boosting outperformed the individual models, achieving an accuracy of 75% and an AUC of 0.71 with a subset of twenty explanatory variables. With the same feature subset, random forest performed best among the individual models, with an accuracy of 70% and an AUC of 0.69. The study then used the SHAP (SHapley Additive exPlanations) methodology to quantify how much each feature contributes to the severity prediction and thereby identify the influential factors. SHAP global feature importance represents the average marginal contribution of each feature to the prediction, while SHAP local explanations show how the contributing factors affect double vehicle crash severity. According to the SHAP analysis, the most significant factors in double vehicle crash severity are the day of the week, vehicle type, time of day, vehicle maneuver, and road geometry, which contribute to predicting crash severity by an average of 10.2, 4.8, 4.6, 3.8, and 2.9 percentage points, respectively.
In other words, the day of the week alone contributed an average of 10.2 percentage points to predicting whether a double vehicle crash would be fatal. Vehicle type is another critical variable, contributing an average of 4.8 percentage points to the same fatal-or-not prediction. |
en_US |
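The soft voting hybrid works by averaging the class probabilities predicted by its member models and then predicting the class with the highest averaged probability. The minimal sketch below shows that rule for a binary fatal/non-fatal outcome, together with accuracy and AUC. It assumes equal weights and a 0.5 decision threshold, because the record does not give the study's settings. The six crashes and their per-model probabilities are made-up numbers, not the study's data. In practice, scikit-learn's `VotingClassifier(voting="soft")` and `sklearn.metrics.roc_auc_score` provide the same operations.

```python
from typing import List, Optional, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Average each model's predicted P(fatal), optionally weighted (equal weights assumed by default)."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def accuracy(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Share of crashes whose thresholded prediction matches the label (assumed 0.5 threshold)."""
    preds = [1 if s > threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random fatal crash is scored above a
    random non-fatal one, counting ties as half a win."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = fatal) and per-model P(fatal) for six crashes.
y_true = [1, 0, 1, 0, 0, 1]
p_lr = [0.60, 0.40, 0.30, 0.20, 0.50, 0.70]   # logistic regression
p_rf = [0.70, 0.30, 0.60, 0.40, 0.30, 0.80]   # random forest
p_ada = [0.50, 0.50, 0.55, 0.35, 0.45, 0.60]  # adaptive boosting

p_hybrid = soft_vote([p_lr, p_rf, p_ada])
print(f"accuracy={accuracy(y_true, p_hybrid):.3f}, AUC={roc_auc(y_true, p_hybrid):.3f}")
# prints accuracy=0.833, AUC=1.000
```

On these made-up numbers, the hybrid ranks every fatal crash above every non-fatal one (AUC 1.0) but still misclassifies one fatal crash at the 0.5 threshold. This illustrates why accuracy and AUC are reported separately.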
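The percentage-point figures read most naturally as mean absolute SHAP values on the predicted-probability scale. This is an assumption, because the record does not state the output scale.

The sketch below computes exact Shapley values by brute-force enumeration of feature coalitions. Its value function averages the model output over a background dataset, which is the kind of quantity SHAP explainers estimate. It then averages the absolute values across crashes to obtain a global importance in percentage points. The three binary features, the logistic `toy_model` and its coefficients, and the background rows are all hypothetical illustrations, not the study's model or data. Libraries such as `shap` use much faster algorithms, for example TreeSHAP for tree ensembles, instead of enumerating every coalition.

```python
import math
from itertools import combinations

# Hypothetical binary features; not the study's actual encoding.
FEATURES = ["weekend", "heavy_vehicle", "night"]


def toy_model(x):
    """Hypothetical logistic model returning P(fatal); coefficients are made up."""
    z = -1.0 + 0.8 * x[0] + 0.5 * x[1] + 0.3 * x[2]
    return 1.0 / (1.0 + math.exp(-z))


def value(model, x, coalition, background):
    """Mean model output when features in `coalition` take x's values
    and the remaining features come from each background row."""
    total = 0.0
    for b in background:
        mixed = [x[j] if j in coalition else b[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values: weighted marginal contributions over all coalitions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            for subset in combinations(others, size):
                coalition = set(subset)
                weight = (math.factorial(len(coalition))
                          * math.factorial(n - len(coalition) - 1)
                          / math.factorial(n))
                gain = (value(model, x, coalition | {i}, background)
                        - value(model, x, coalition, background))
                phi[i] += weight * gain
    return phi


def global_importance(model, X, background):
    """Mean |Shapley value| per feature, in percentage points of P(fatal)."""
    sums = [0.0] * len(X[0])
    for x in X:
        for i, p in enumerate(shapley_values(model, x, background)):
            sums[i] += abs(p)
    return [100.0 * s / len(X) for s in sums]


# Hypothetical background rows and crashes to explain.
background = [[0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]]
X = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]

imp = global_importance(toy_model, X, background)
for name, pp in sorted(zip(FEATURES, imp), key=lambda t: -t[1]):
    print(f"{name}: {pp:.1f} percentage points")
```

Shapley values satisfy efficiency: each crash's values sum to its predicted probability minus the background average. This property is what allows a feature's contribution to be read directly in percentage points.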