An Automatic Detection Method for Traffic Accidents Based on ADASYN-XGBoost
Abstract: Data-driven automatic detection of road traffic accidents plays an important role in timely rescue and in reducing the impact of accidents. To address the sample imbalance problem in automatic traffic accident detection, this paper studies a detection method that combines adaptive synthetic oversampling with the extreme gradient boosting tree algorithm (ADASYN-XGBoost). First, an initial combination of feature variables is constructed to mine the intrinsic relationship between the spatio-temporal features of the data and accident occurrence from imbalanced accident samples. The adaptive synthetic oversampling method (ADASYN) is then introduced to balance the numbers of accident and non-accident samples, which improves the quality of the training data. Second, to improve detection performance, a traffic accident detection model based on XGBoost is built and used to screen the features of the augmented samples. Finally, to obtain the best parameter combination, a Bayesian optimization algorithm is used to calibrate the XGBoost hyperparameters quickly. The ADASYN-XGBoost method is validated on the Portland freeway dataset. The results show that, compared with state-of-the-art benchmark models, ADASYN-XGBoost achieves the best value on every detection metric, with an F1 score of 94.47% and a false detection rate as low as 8.95%. With 2,800, 500 (18% of the initial sample size), and 150 (5% of the initial sample size) training samples, the F1 scores of ADASYN-XGBoost are 94.47%, 88.89%, and 81.93%, respectively. In further ablation experiments, balancing the positive and negative samples improved the performance metrics of each benchmark model by 2.68% to 44.85%. The proposed method effectively addresses the sample imbalance problem in road traffic accident detection and provides technical support for road traffic safety prevention and accident management.
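The oversampling step in the abstract can be illustrated with a minimal pure-Python sketch of ADASYN as described by He et al. (2008). This section does not state which implementation the paper used, so the following are assumptions: Euclidean distance on already-scaled features, k = 5 neighbours, beta = 1.0 (aiming for full balance), and no check against the original method's imbalance threshold. The function name `adasyn` and its signature are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def adasyn(minority: Sequence[Point], majority: Sequence[Point],
           k: int = 5, beta: float = 1.0, seed: int = 0) -> List[Point]:
    """Return synthetic minority samples following the ADASYN procedure.

    Assumptions (not from the paper): Euclidean distance on pre-scaled
    features, k = 5, beta = 1.0 (target full class balance).
    """
    if len(minority) < 2:
        raise ValueError("ADASYN needs at least two minority samples")
    rng = random.Random(seed)
    pool = list(minority) + list(majority)  # indices < n_min are minority
    n_min = len(minority)
    # G: total number of synthetic samples to generate.
    g_total = round((len(majority) - n_min) * beta)

    # r_i: share of majority samples among the k nearest neighbours of x_i.
    ratios = []
    for i, x in enumerate(minority):
        nbrs = sorted((j for j in range(len(pool)) if j != i),
                      key=lambda j: math.dist(x, pool[j]))[:k]
        ratios.append(sum(j >= n_min for j in nbrs) / k)

    total = sum(ratios)
    if g_total <= 0 or total == 0:
        return []  # already balanced, or no minority sample borders the majority

    synthetic: List[Point] = []
    for i, x in enumerate(minority):
        # Harder samples (more majority neighbours) receive more synthetic points.
        g_i = round(ratios[i] / total * g_total)
        # Interpolate only towards the k nearest *minority* neighbours.
        min_nbrs = sorted((j for j in range(n_min) if j != i),
                          key=lambda j: math.dist(x, minority[j]))[:k]
        for _ in range(g_i):
            z = minority[rng.choice(min_nbrs)]
            lam = rng.random()  # random position on the segment x -> z
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic


# Toy data: a 64-point majority grid and 6 minority points between grid nodes.
maj = [(float(i), float(j)) for i in range(8) for j in range(8)]
mino = [(1.5, 1.5), (3.5, 1.5), (5.5, 1.5), (1.5, 5.5), (3.5, 3.5), (5.5, 5.5)]
syn = adasyn(mino, maj)
print(len(syn))  # 60: each of the 6 samples gets round(58 / 6) = 10
```

Because each per-sample count g_i is rounded, the total generated is close to G but not exactly G, so the classes end up only approximately balanced. In practice, the `ADASYN` class in imbalanced-learn (`ADASYN(n_neighbors=5).fit_resample(X, y)`) implements the same idea on NumPy arrays.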
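Table 1. Initial feature variable set and notation

| Time | Location | Parameter | Symbol | Index |
|---|---|---|---|---|
| 1 min before the accident | Upstream detector | Volume/speed/occupancy | b1_up_vol/sped/ocup | 1/2/3 |
| 1 min before the accident | Downstream detector | Volume/speed/occupancy | b1_dn_vol/sped/ocup | 4/5/6 |
| 2 min before the accident | Upstream detector | Volume/speed/occupancy | b2_up_vol/sped/ocup | 7/8/9 |
| 2 min before the accident | Downstream detector | Volume/speed/occupancy | b2_dn_vol/sped/ocup | 10/11/12 |
| 3 min before the accident | Upstream detector | Volume/speed/occupancy | b3_up_vol/sped/ocup | 13/14/15 |
| 3 min before the accident | Downstream detector | Volume/speed/occupancy | b3_dn_vol/sped/ocup | 16/17/18 |
| 1 min after the accident | Upstream detector | Volume/speed/occupancy | a1_up_vol/sped/ocup | 19/20/21 |
| 1 min after the accident | Downstream detector | Volume/speed/occupancy | a1_dn_vol/sped/ocup | 22/23/24 |
| 2 min after the accident | Upstream detector | Volume/speed/occupancy | a2_up_vol/sped/ocup | 25/26/27 |
| 2 min after the accident | Downstream detector | Volume/speed/occupancy | a2_dn_vol/sped/ocup | 28/29/30 |
| 3 min after the accident | Upstream detector | Volume/speed/occupancy | a3_up_vol/sped/ocup | 31/32/33 |
| 3 min after the accident | Downstream detector | Volume/speed/occupancy | a3_dn_vol/sped/ocup | 34/35/36 |
| Time of the accident | Upstream detector | Volume/speed/occupancy | now_up_vol/sped/ocup | 37/38/39 |
| Time of the accident | Downstream detector | Volume/speed/occupancy | now_dn_vol/sped/ocup | 40/41/42 |
| Time of the accident | Upstream detector | Predicted volume/speed/occupancy | pred_up_vol/sped/ocup | 43/44/45 |
| Time of the accident | Downstream detector | Predicted volume/speed/occupancy | pred_dn_vol/sped/ocup | 46/47/48 |
| Time of the accident | Upstream and downstream | Difference in volume/speed/occupancy | up_dn_vol/sped/ocup | 49/50/51 |
| Time of the accident | Upstream detector | Difference between predicted and detected values of the three parameters | up_now_pred_vol/sped/ocup | 52/53/54 |
| Time of the accident | Downstream detector | Difference between predicted and detected values of the three parameters | dn_now_pred_vol/sped/ocup | 55/56/57 |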
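A short sketch shows how the 57 features of Table 1 could be assembled in index order. Several details are assumptions not stated in this section. Detector readings are passed as (volume, speed, occupancy) tuples keyed by minute offset from the accident (-3 to 3). The predicted values come from an unspecified short-term prediction model supplied by the caller. The difference features use upstream minus downstream and detected minus predicted. The names `build_features`, `WINDOWS` and `Reading` are hypothetical.

```python
from typing import Dict, Tuple

Reading = Tuple[float, float, float]  # (volume, speed, occupancy)
PARAMS = ("vol", "sped", "ocup")
# (symbol prefix, minute offset relative to the accident), in Table 1 order
WINDOWS = [("b1", -1), ("b2", -2), ("b3", -3),
           ("a1", 1), ("a2", 2), ("a3", 3), ("now", 0)]


def build_features(up: Dict[int, Reading], dn: Dict[int, Reading],
                   up_pred: Reading, dn_pred: Reading) -> Dict[str, float]:
    """Assemble the Table 1 features; dict insertion order follows indices 1..57.

    up/dn map minute offsets -3..3 to detector readings; up_pred/dn_pred are
    predicted readings at the accident time from an external model (assumed).
    """
    feats: Dict[str, float] = {}
    for prefix, offset in WINDOWS:                          # indices 1-42
        for side, series in (("up", up), ("dn", dn)):
            for name, value in zip(PARAMS, series[offset]):
                feats[f"{prefix}_{side}_{name}"] = value
    for side, pred in (("up", up_pred), ("dn", dn_pred)):   # indices 43-48
        for name, value in zip(PARAMS, pred):
            feats[f"pred_{side}_{name}"] = value
    for i, name in enumerate(PARAMS):                       # indices 49-51
        # Assumed sign convention: upstream minus downstream.
        feats[f"up_dn_{name}"] = up[0][i] - dn[0][i]
    for side, series, pred in (("up", up, up_pred), ("dn", dn, dn_pred)):  # 52-57
        for i, name in enumerate(PARAMS):
            # Assumed sign convention: detected minus predicted.
            feats[f"{side}_now_pred_{name}"] = series[0][i] - pred[i]
    return feats


up = {m: (10.0 + m, 60.0 - m, 0.1) for m in range(-3, 4)}
dn = {m: (8.0, 50.0, 0.3) for m in range(-3, 4)}
feats = build_features(up, dn, up_pred=(9.0, 61.0, 0.1), dn_pred=(8.5, 50.0, 0.2))
print(len(feats))  # 57
```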
Table 2. Hyperparameter tuning settings
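| Method | Tuning process and result |
|---|---|
| LR | The LR model has no hyperparameters. |
| SVM | Gaussian kernel. |
| RF | 100 sub-estimators; sampling with replacement. |
| RSKNN | 100 KNN base learners with K = 5 nearest neighbors; maximum subspace sampling rate 1.0; sampling with replacement. Because KNN handles high-dimensional features poorly, only the top 5 important features are used for each sample. |
| BPNN | Two-layer neural network with ReLU activation, built from two linear layers with 30 hidden units; cross-entropy loss, Adam optimizer, learning rate 0.01, 200 epochs. |
| E-SVM-KNN | SVM with a Gaussian kernel; KNN with 5 nearest neighbors. |
| FA-WRF | Factor analysis (FA) extracts 7 features; WRF subspace dimension 3; 100 sub-estimators. |
| SASYNO-RF-RSKNN | RF is used only to extract important features and has no important hyperparameters; RSKNN hyperparameters as above. |
| ADASYN-XGBoost | Hyperparameters searched by Bayesian optimization. Optimal XGBoost parameters: maximum tree depth 6, learning rate 0.06, minimum child weight 2.46, regularization coefficient (gamma) 0.125, subsample ratio (subsample) 0.79. In the small-sample setting, only the top 8 important features are used. |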
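The hyperparameters in the ADASYN-XGBoost row are easiest to interpret through the standard regularized objective of XGBoost (Chen and Guestrin, 2016). Assuming the usual logistic loss for binary accident detection, let p_i be the predicted accident probability before round t. Each round then fits a tree f_t with T leaves and leaf weights w_j to the second-order expansion below. Table 2 does not report the L2 weight lambda, so it is left symbolic.

```latex
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta\, f_t(\mathbf{x}_i), \qquad \eta = 0.06
\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n}\Big[g_i f_t(\mathbf{x}_i) + \tfrac{1}{2} h_i f_t^{2}(\mathbf{x}_i)\Big] + \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}
g_i = p_i - y_i, \qquad h_i = p_i (1 - p_i)
w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i
\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda} - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda}\right] - \gamma
```

In the xgboost library's parameter names, `learning_rate` is the shrinkage eta and `gamma` (0.125) is the minimum loss reduction a split must exceed, so a split is kept only if Gain > 0. `min_child_weight` (2.46) is a lower bound on H_L and H_R, the summed Hessians in each child. `max_depth` (6) caps tree depth, and `subsample` (0.79) is the fraction of training rows drawn for each tree.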
Table 3. Performance gains of different models from data augmentation
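| Model | Positive:negative sample ratio | ACC/% | Precision/% | DR/% | FDR/% | MCC/% | F1/% | F1 improvement/% |
|---|---|---|---|---|---|---|---|---|
| LR | 1:8.62 | 94.33 | 100.00 | 45.63 | 54.37 | 65.36 | 62.40 | 44.85 |
| LR | 1:1.05 | 97.93 | 92.14 | 88.19 | 11.82 | 88.62 | 89.83 | |
| SVM | 1:8.62 | 97.95 | 100.00 | 80.35 | 28.26 | 79.84 | 88.66 | 4.96 |
| SVM | 1:1.05 | 98.60 | 97.97 | 88.37 | 11.63 | 92.23 | 92.76 | |
| RF | 1:8.62 | 95.68 | 97.53 | 60.42 | 39.58 | 74.67 | 74.12 | 13.72 |
| RF | 1:1.05 | 97.09 | 97.91 | 73.78 | 26.22 | 83.47 | 83.87 | |
| RSKNN | 1:8.62 | 94.24 | 100.00 | 52.08 | 68.59 | 69.91 | 68.50 | 22.72 |
| RSKNN | 1:1.05 | 97.49 | 97.50 | 81.25 | 18.75 | 87.70 | 88.64 | |
| BPNN | 1:8.62 | 97.84 | 98.74 | 79.93 | 20.07 | 87.65 | 88.08 | 7.20 |
| BPNN | 1:1.05 | 98.84 | 97.92 | 90.55 | 9.45 | 93.53 | 94.07 | |
| XGBoost | 1:8.62 | 98.45 | 97.53 | 87.20 | 12.80 | 91.34 | 91.96 | 2.83 |
| XGBoost | 1:1.05 | 98.95 | 98.60 | 91.05 | 8.95 | 94.10 | 94.47 | |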
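Table 3 reports ACC, precision, DR, FDR, MCC and F1 without defining them in this section. The sketch below computes them from a confusion matrix using the standard definitions, with accidents as the positive class. The FDR definition is an assumption. In every row of Table 4 and in most rows of Table 3, DR + FDR = 100%, which suggests FDR is the share of missed accidents, FN/(TP+FN). The conventional false alarm rate, FP/(FP+TN), is included for contrast. `Confusion` and `detection_metrics` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    tp: int  # accidents correctly detected
    fp: int  # non-accident samples flagged as accidents
    fn: int  # accidents missed
    tn: int  # non-accident samples correctly passed


def detection_metrics(c: Confusion) -> Dict[str, float]:
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    dr = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0  # detection rate (recall)
    f1 = 2 * precision * dr / (precision + dr) if precision + dr else 0.0
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0
    return {
        "ACC": (c.tp + c.tn) / total,
        "Precision": precision,
        "DR": dr,
        # Assumption: FDR as miss rate, matching DR + FDR = 100% in the tables.
        "FDR": 1.0 - dr,
        # Conventional false alarm rate, shown for contrast only.
        "FAR": c.fp / (c.fp + c.tn) if c.fp + c.tn else 0.0,
        "MCC": mcc,
        "F1": f1,
    }


balanced = detection_metrics(Confusion(tp=8, fp=2, fn=2, tn=88))
# Roughly the 1:8.62 imbalance of Table 3, with a detector that never fires.
lazy = detection_metrics(Confusion(tp=0, fp=0, fn=10, tn=86))
print(round(lazy["ACC"], 3), lazy["DR"], lazy["F1"])  # 0.896 0.0 0.0
```

The second case shows why accuracy alone is a poor guide under imbalance: a detector that never reports an accident still scores about 90% ACC, while DR, F1 and MCC are all zero.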
Table 4. Model performance with different numbers of input features
| Number of features | ACC | Precision | DR | FDR | MCC | F1 |
|---|---|---|---|---|---|---|
| 57 | 0.9799 | 0.9429 | 0.8462 | 0.1539 | 0.8824 | 0.8919 |
| 55 | 0.9749 | 0.9143 | 0.8205 | 0.1795 | 0.8526 | 0.8649 |
| 50 | 0.9774 | 0.8947 | 0.8718 | 0.1282 | 0.8707 | 0.8831 |
| 45 | 0.9799 | 0.9429 | 0.8462 | 0.1539 | 0.8824 | 0.8919 |
| 40 | 0.9824 | 0.9706 | 0.8462 | 0.1539 | 0.8970 | 0.9041 |
| 35 | 0.9824 | 0.9444 | 0.8718 | 0.1282 | 0.8979 | 0.9066 |
| 30 | 0.9895 | 0.9860 | 0.9105 | 0.0895 | 0.9410 | 0.9447 |
| 25 | 0.9849 | 0.9460 | 0.8974 | 0.1026 | 0.9131 | 0.9211 |
| 20 | 0.9824 | 0.9440 | 0.8718 | 0.1282 | 0.8979 | 0.9067 |
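In Table 4, the top 30 features give the best scores (F1 = 0.9447). The feature screening described in the abstract (rank features by XGBoost importance, then keep the best-scoring top-k subset) can be sketched as below. The `evaluate` callback is hypothetical. In a real pipeline it would train and validate the detector on the given subset. Here, to keep the example self-contained, it looks up the F1 values reported in Table 4. The feature names are placeholders, since this section does not give the ranked order or the importance type used.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def rank_features(importance: Dict[str, float]) -> List[str]:
    """Feature names ordered from most to least important."""
    return sorted(importance, key=importance.__getitem__, reverse=True)


def select_feature_count(ranked: Sequence[str], ks: Iterable[int],
                         evaluate: Callable[[List[str]], float]
                         ) -> Tuple[int, List[str], Dict[int, float]]:
    """Score the top-k subset for each k and return the best k.

    Ties are broken in favour of the smaller subset.
    """
    scores = {k: evaluate(list(ranked[:k])) for k in ks}
    best_k = max(scores, key=lambda k: (scores[k], -k))
    return best_k, list(ranked[:best_k]), scores


# Illustration only: F1 values copied from Table 4 stand in for a real
# train-and-validate step; the feature names are hypothetical placeholders.
TABLE4_F1 = {57: 0.8919, 55: 0.8649, 50: 0.8831, 45: 0.8919, 40: 0.9041,
             35: 0.9066, 30: 0.9447, 25: 0.9211, 20: 0.9067}
ranked = [f"feature_{i}" for i in range(1, 58)]


def lookup_f1(features: List[str]) -> float:
    return TABLE4_F1[len(features)]


best_k, subset, scores = select_feature_count(ranked, TABLE4_F1, lookup_f1)
print(best_k)  # 30
```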