Crowd Count Neural Network Based on Attention Mechanism in Traffic Scenes

WANG Liyuan, YAO Yuntao, JIA Yang, XIAO Jinsheng, LI Bijun

Citation: WANG Liyuan, YAO Yuntao, JIA Yang, XIAO Jinsheng, LI Bijun. Crowd Count Neural Network Based on Attention Mechanism in Traffic Scenes[J]. Journal of Transport Information and Safety, 2023, 41(6): 107-113. doi: 10.3963/j.jssn.1674-4861.2023.06.012

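Funding:

Key Research and Development Program of Hubei Province (2023BAB022)

Science and Technology Research and Development Project of China Communications Construction Group Co., Ltd. (2019-ZJKJ-ZDZX02)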

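    About the author:

    WANG Liyuan (born 1980), M.S., professor-level senior engineer. Research interests: image and video processing, intelligent transportation technology. Email: 13397123890@126.com

    Corresponding author:

    XIAO Jinsheng (born 1975), Ph.D., associate professor. Research interests: image and video processing. Email: xiaojs@whu.edu.cn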

  CLC number: TP29


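  Abstract: Crowd counting is an important task in computer vision. In traffic scenes it plays a key role in keeping public travel safe and in making transportation intelligent. Public transport scenes often contain pedestrians who occlude one another and cluttered backgrounds, both of which make counting difficult. To count crowds with high accuracy, this paper studies a crowd density estimation network based on an attention mechanism. The network has three parts. The feature extraction module generates multi-scale feature maps, which strengthens the network's feature representation and makes it more robust to variation in pedestrian size. The attention module suppresses background-noise responses, strengthens crowd-feature responses, and generates a probability distribution of crowd regions over the feature map, improving the network's ability to separate crowd regions from background regions. The density estimation module, constrained by the attention mechanism, guides the network to regress a high-resolution crowd density map, making the network more sensitive to crowd regions. A background-aware structural loss function is designed to lower the model's false-recognition rate and raise its counting accuracy, and a multi-level supervision mechanism guides training, which aids gradient back-propagation and reduces overfitting, further improving counting accuracy. Experiments on the public ShanghaiTech dataset show that, compared with current state-of-the-art algorithms, the mean absolute error (MAE) is reduced by 2.4% and 1.5% on ShanghaiTechA and ShanghaiTechB respectively, and the mean square error (MSE) is reduced by 3.3% and 0.9%, demonstrating better accuracy and robustness in both congested and sparse crowd scenes. Experiments on a real-scene dataset give MAE = 7.7 and MSE = 12.6, demonstrating that the proposed algorithm is practical.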
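The abstract describes regressing a crowd density map (whose sum is the count) and supervising the network at several levels. The paper's exact ground-truth construction, supervision levels, and loss weights are not given in this section, so the following is only a minimal sketch under common assumptions: each annotated head becomes a normalized Gaussian with a fixed, hypothetical `sigma`; coarser targets are obtained by sum-pooling, which preserves the count; and the multi-level loss (`multi_level_loss`, a hypothetical name) is a weighted sum of per-level pixel-wise MSE terms with hypothetical weights. Plain Python lists stand in for tensors.

```python
import math


def gaussian_density_map(points, height, width, sigma=2.0):
    """Build a ground-truth density map from (row, col) head annotations.

    Assumption: one fixed-sigma Gaussian per head, normalized over the grid
    so each annotated person contributes 1 to the total (up to rounding).
    """
    density = [[0.0] * width for _ in range(height)]
    for pr, pc in points:
        kernel = [[math.exp(-((r - pr) ** 2 + (c - pc) ** 2) / (2.0 * sigma ** 2))
                   for c in range(width)]
                  for r in range(height)]
        total = sum(map(sum, kernel))
        for r in range(height):
            for c in range(width):
                density[r][c] += kernel[r][c] / total
    return density


def count(density):
    """Crowd count = sum (discrete integral) of the density map."""
    return sum(map(sum, density))


def sum_pool(density, factor):
    """Downsample by summing non-overlapping factor x factor blocks.

    Summing rather than averaging keeps the total count unchanged, so the
    pooled map can act as the target for a lower-resolution output.
    """
    h, w = len(density), len(density[0])
    if h % factor or w % factor:
        raise ValueError("map size must be divisible by factor")
    return [[sum(density[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor))
             for c in range(w // factor)]
            for r in range(h // factor)]


def pixel_mse(pred, target):
    """Pixel-wise mean squared error between two equally sized maps."""
    diffs = [(p - t) ** 2
             for prow, trow in zip(pred, target)
             for p, t in zip(prow, trow)]
    return sum(diffs) / len(diffs)


def multi_level_loss(preds, gt_density, factors, weights):
    """Hypothetical multi-level supervision: weighted sum of per-level losses.

    preds[k] is the network output at 1/factors[k] of the input resolution.
    """
    loss = 0.0
    for pred, factor, weight in zip(preds, factors, weights):
        loss += weight * pixel_mse(pred, sum_pool(gt_density, factor))
    return loss


if __name__ == "__main__":
    gt = gaussian_density_map([(3, 4), (10, 12), (12, 5)], height=16, width=16)
    print(round(count(gt), 6))               # 3.0
    print(round(count(sum_pool(gt, 4)), 6))  # 3.0
```

Sum-pooling is used here because it lets every supervision level share one ground truth while keeping the same total count; average pooling would shrink the count by `factor ** 2`.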

     

    Figure 1. The architecture of the proposed method

    Figure 2. Specific structure of the attention module

    Figure 3. Specific structure of the density map estimation module

    Figure 4. Visualization of the estimated density maps on ShanghaiTech and UCF-QNRF

    Figure 5. Experimental results in real scenes

    Figure 6. Visualization of the attention maps

    Figure 7. Visualization of the estimated density maps with and without the attention module
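Figure 2 shows the attention module and Figure 7 compares density maps with and without it. The abstract says the module produces a probability distribution of crowd regions that suppresses background responses and constrains the density estimate. The sketch below assumes one common way to realize this, which may differ from the paper's design: a sigmoid turns attention logits into per-pixel crowd probabilities, the raw density map is multiplied element-wise by them, and the attention map is trained with binary cross-entropy against a crowd/background mask obtained by thresholding the ground-truth density (the `threshold` value and the helper names are hypothetical). It illustrates a background-aware term in general, not the paper's structural loss.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def attention_map(logits):
    """Turn attention logits into per-pixel crowd probabilities in [0, 1]."""
    return [[sigmoid(v) for v in row] for row in logits]


def apply_attention(density, attention):
    """Gate the raw density element-wise, damping low-probability (background) pixels."""
    return [[d * a for d, a in zip(drow, arow)]
            for drow, arow in zip(density, attention)]


def background_mask(gt_density, threshold=1e-3):
    """Hypothetical target: 1.0 where ground-truth density exceeds threshold, else 0.0."""
    return [[1.0 if v > threshold else 0.0 for v in row] for row in gt_density]


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between an attention map and a crowd/background mask."""
    total, n = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            n += 1
    return total / n


if __name__ == "__main__":
    raw_density = [[0.9, 0.8, 0.3, 0.2]]  # last two pixels: false responses on background
    attn = attention_map([[6.0, 6.0, -6.0, -6.0]])
    gated = apply_attention(raw_density, attn)
    print([round(v, 4) for v in gated[0]])
    mask = background_mask([[0.9, 0.8, 0.0, 0.0]])
    print(round(bce(attn, mask), 4))
```

Multiplicative gating means background pixels with low attention contribute almost nothing to the summed count, which is one plausible reading of how an attention map could reduce false counts of the kind compared in Figure 7.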

    Table 1. Performance comparison on ShanghaiTech and UCF-QNRF

    Method          ShanghaiTechA      ShanghaiTechB      UCF-QNRF
                    MAE      MSE       MAE      MSE       MAE      MSE
    MCNN[15]        110.2    173.2     26.4     41.3      277.0    426.0
    CSRNet[16]      68.2     115.0     10.6     16.0      -        -
    CAN[17]         62.3     100.0     7.8      12.2      107.0    183.0
    S-DCNet[21]     58.3     95.0      6.7      10.7      104.4    176.1
    Bayesian[22]    62.8     101.8     7.7      12.7      88.7     154.8
    Proposed        56.9     91.8      6.6      10.6      90.8     155.1

    "-": no result reported.
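Table 1 aggregates per-image count errors over each test set, but the metric definitions are not restated in this section. The sketch below assumes the convention introduced with MCNN [15] and followed by most later crowd-counting work: MAE is the mean absolute count error, and the reported "MSE" is the square root of the mean squared count error. It also computes the relative error reduction against S-DCNet [21], the strongest ShanghaiTech baseline in Table 1; the MAE reductions come out at about 2.4% (ShanghaiTechA) and 1.5% (ShanghaiTechB), matching the abstract.

```python
import math


def mae(gt_counts, pred_counts):
    """Mean absolute error of per-image counts."""
    return sum(abs(g - p) for g, p in zip(gt_counts, pred_counts)) / len(gt_counts)


def mse(gt_counts, pred_counts):
    """Crowd-counting 'MSE' (MCNN convention): root of the mean squared count error."""
    sq = sum((g - p) ** 2 for g, p in zip(gt_counts, pred_counts))
    return math.sqrt(sq / len(gt_counts))


def relative_reduction(baseline, ours):
    """Error reduction of `ours` relative to `baseline`, in percent."""
    return (baseline - ours) / baseline * 100.0


if __name__ == "__main__":
    # (S-DCNet [21], proposed) pairs copied from Table 1.
    pairs = {
        "ShanghaiTechA MAE": (58.3, 56.9),
        "ShanghaiTechB MAE": (6.7, 6.6),
        "ShanghaiTechA MSE": (95.0, 91.8),
        "ShanghaiTechB MSE": (10.7, 10.6),
    }
    for name, (base, ours) in pairs.items():
        print(f"{name}: {relative_reduction(base, ours):.2f}% lower than S-DCNet")
```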

    Table 2. Ablation results on ShanghaiTechB

    Network                     ShanghaiTechB
                                MAE      MSE
    Without attention module    7.8      12.7
    With attention module       6.8      10.6
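Table 2 isolates the attention module on ShanghaiTechB. As a worked check on the size of the effect, the relative error reductions from adding the module follow directly from the table values:

```latex
\Delta_{\mathrm{MAE}} = \frac{7.8 - 6.8}{7.8} \times 100\% \approx 12.8\%,
\qquad
\Delta_{\mathrm{MSE}} = \frac{12.7 - 10.6}{12.7} \times 100\% \approx 16.5\%
```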
  References:

    [1] ZHANG Y Q, LI G H, LEI J, et al. FF-CAM: crowd counting based on frontend-backend fusion through channel-attention mechanism[J]. Chinese Journal of Computers, 2021, 44(2): 304-317. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-JSJX202102004.htm
    [2] DU P D, YAN H. Crowd counting network based on multi-scale spatial attention feature fusion[J]. Journal of Computer Applications, 2021, 41(2): 537-543. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-JSJY202102035.htm
    [3] WANG Z, CHEN J, HOI S. Deep learning for image super-resolution: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3365-3387. doi: 10.1109/TPAMI.2020.2982166
    [4] LEIBE B, SEEMANN E, SCHIELE B. Pedestrian detection in crowded scenes[C]. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA: IEEE, 2005.
    [5] LI M, ZHANG Z, HUANG K, et al. Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection[C]. 2008 19th International Conference on Pattern Recognition, Tampa, FL, USA: IEEE, 2008.
    [6] CHEN K, LOY C C, GONG S, et al. Feature mining for localised crowd counting[C]. British Machine Vision Conference, Guildford, Surrey, UK, 2012, 1(2): 3.
    [7] LOWE D G. Object recognition from local scale-invariant features[C]. Proceedings of the 7th IEEE International Conference on Computer Vision, Kerkyra, Greece: IEEE, 1999.
    [8] OJALA T, PIETIKAINEN M, MAENPAA T. Gray-scale and rotation invariant texture classification with local binary patterns[C]. Computer Vision - ECCV 2000: 6th European Conference on Computer Vision, Dublin, Ireland: Springer, 2000.
    [9] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, USA: IEEE, 2005.
    [10] PARAGIOS N, RAMESH V. A MRF-based approach for real-time subway monitoring[C]. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001). IEEE, 2001.
    [11] TIAN Y, SIGAL L, BADINO H, et al. Latent Gaussian mixture regression for human pose estimation[C]. Asian Conference on Computer Vision, Berlin, Heidelberg: Springer, 2010.
    [12] LEMPITSKY V, ZISSERMAN A. Learning to count objects in images[OL]. (2010-12-06)[2023-05-15]. https://www.robots.ox.ac.uk/~vgg/publications/2010/Lempitsky10b/lempitsky10b.pdf
    [13] PHAM V Q, KOZAKAYA T, YAMAGUCHI O, et al. Count forest: co-voting uncertain number of targets using random forest for crowd density estimation[C]. IEEE International Conference on Computer Vision, Santiago, Chile: IEEE, 2015.
    [14] XIAO J S, SHEN M Y, JIANG M J, et al. Abnormal behavior detection algorithm with video-bag attention mechanism in surveillance video[J]. Acta Automatica Sinica, 2022, 48(12): 2953-2961. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-MOTO202212007.htm
    [15] ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network[C]. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA: IEEE, 2016.
    [16] LI Y, ZHANG X, CHEN D. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes[C]. IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA: IEEE, 2018.
    [17] LIU W, SALZMANN M, FUA P. Context-aware crowd counting[C]. IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, USA: IEEE, 2019.
    [18] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]. Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015: 18th International Conference, Munich, Germany: Springer, 2015.
    [19] RONG L, LI C. Coarse- and fine-grained attention network with background-aware loss for crowd density map estimation[C]. IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, USA: IEEE, 2021.
    [20] IDREES H, TAYYAB M, ATHREY K, et al. Composition loss for counting, density map estimation and localization in dense crowds[C]. European Conference on Computer Vision (ECCV), Munich, Germany: Springer, 2018.
    [21] XIONG H, LU H, LIU C, et al. From open set to closed set: counting objects by spatial divide-and-conquer[C]. IEEE International Conference on Computer Vision (ICCV), Seoul, Korea (South): IEEE, 2019.
    [22] MA Z, WEI X, HONG X, et al. Bayesian loss for crowd count estimation with point supervision[C]. IEEE International Conference on Computer Vision (ICCV), Seoul, Korea (South): IEEE, 2019.
Publication history
  Received: 2023-08-18
  Published online: 2024-04-03
