Abstract:
Road traffic accidents are a major cause of fatalities in developing countries such as Bangladesh, where the accident fatality rate significantly exceeds that of neighboring countries. Using police-reported accident data from the Accident Research Institute (ARI) at BUET, this study conducts a comprehensive analysis of the determinants of accident severity (AS) in Bangladesh with machine learning (ML) techniques. The dataset is clustered by area (urban/rural), vehicle involvement (single/two vehicles), and road class (highways/other roads). Previous studies of AS rely primarily on traditional statistical models, which are limited by assumptions about data distribution and linear relationships, and they rarely employ explainable AI methods or cluster-wise analysis to identify the significant factors within each cluster. To address these limitations, this study applies two Explainable Artificial Intelligence approaches, permutation importance and SHapley Additive exPlanations (SHAP), across the clusters, using the tree-based Random Forest (RF) and Extreme Gradient Boosting (XGBoost) models, the distance-based K-Nearest Neighbor (KNN) model, and a hybrid stacking model. The analysis shows that the stacking model most effectively captures the complex structure of the data across all clusters. The results indicate that vehicle type, collision type, district, divider, surface quality, location type, time, and driver age are the key variables for predicting AS. Further analysis shows that common collision scenarios on Bangladeshi roads include vehicle-pedestrian collisions, head-on collisions, collisions between heavy and light vehicles, and incidents involving drivers aged between 31 and 45 years.
Based on this analysis, the study provides insights for key organizations in Bangladesh, including the Bangladesh Road Transport Authority (BRTA), the Roads and Highways Department (RHD), the Bangladesh Police (BP), and the Local Government Engineering Department (LGED).
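
For readers unfamiliar with permutation importance, one of the two explainability methods named above, the following is a minimal, model-agnostic sketch in plain Python. It is not the study's implementation: the function names, the toy data, and the rule-based `toy_model` are hypothetical and stand in for a trained classifier and the ARI accident records. The idea is to shuffle one feature at a time and measure how much the model's accuracy drops; a larger drop suggests the model relies more heavily on that feature.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving all other features untouched.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data (not from the ARI dataset): feature 0 fully
# determines the label, feature 1 is pure noise.
data_rng = random.Random(42)
rows = [[data_rng.randint(0, 1), data_rng.randint(0, 1)] for _ in range(200)]
labels = [row[0] for row in rows]


def toy_model(row):
    """Stand-in for a trained classifier: predicts from feature 0 only."""
    return row[0]


scores = permutation_importance(toy_model, rows, labels)
print(scores)  # feature 0 receives a large score, feature 1 a score of zero
```

In a cluster-wise analysis such as the one described here, the same procedure would be repeated separately on each cluster's held-out data with that cluster's fitted model, so that the feature rankings can be compared across clusters.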