An Integrated Scheduling Method for Automated Container Terminals Based on Deep Reinforcement Learning

YIN Xing, ZHANG Yu, ZHENG Qianqian, TANG Kexin (corresponding author)

doi: 10.3963/j.jssn.1674-4861.2022.06.009

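Funding: National Natural Science Foundation of China (Grant No. 72174160)

First author: YIN Xing (born 1994), master's student. Research interest: port scheduling and optimization. E-mail: 1641018405@qq.com

Corresponding author: TANG Kexin (born 1996), doctoral student. Research interests: swarm intelligence optimization algorithms, port scheduling optimization, etc. E-mail: monxint@whut.edu.cn

CLC number: U691+.31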
Publication history
- Received: 2022-06-13
- Published online: 2023-03-27

A Study of Integrated Scheduling of Automated Container Terminal Based on DDQN

YIN Xing^1, ZHANG Yu^{1, 2}, ZHENG Qianqian^1, TANG Kexin^1

1. School of Transportation and Logistics Engineering, Wuhan University of Technology, Wuhan 430063, China
2. Inland Port and Shipping Industry Research Co. Ltd. of Guangdong Province, Shaoguan 512000, Guangdong, China

Abstract

Abstract: This study addresses the unloading process at automated container terminals, where quay cranes (QCs), artificial intelligence robots of transportation (ARTs), and yard cranes (YCs) operate interactively in a complex and changing scheduling environment. With the objective of minimizing the makespan, a three-stage integrated scheduling model of the container terminal is built on the hybrid flow shop scheduling problem. Because the scheduling environment of an automated terminal is highly dynamic and requires real-time response, a deep reinforcement learning algorithm, the double deep Q-network (DDQN), is used to solve the model. Following actual terminal operations, a neural network fits the action-value function in real time, the equipment status data of each stage are fed to the model as input, and the model is trained with an experience replay mechanism. Single heuristic rules together with compound heuristic rules serve as the candidate actions of the equipment. Through the action selection and action evaluation mechanisms of reinforcement learning, the optimal container-equipment assignment strategy is obtained. Cases of different scales are designed from survey data of the automated terminal at Tianjin Port, and the method is compared with an exact solver and several commonly used heuristic and metaheuristic strategies. The results show that, on the larger-scale cases, the total operation time of the proposed method is 7.84% lower on average than that of particle swarm optimization (PSO), a relatively advanced algorithm; the gaps to the theoretical lower bound are 6.0%, 5.6%, and 4.6% for the three case sizes; the equipment load across the three stages is relatively balanced; and the average equipment utilization is 89%, which meets practical requirements. On the small-scale cases, the average error in total completion time relative to the Gurobi solver is 1.99%, and as the case size grows the proposed algorithm shows an advantage in solution time, with a maximum reduction of 59%. These results support the feasibility and efficiency of the proposed method for improving the operating efficiency of automated container terminals.

Keywords:
- intelligent transportation /
- automated container terminal /
- three-stage integrated scheduling /
- deep reinforcement learning /
- hybrid flow shop
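
The abstract's split between action selection and action evaluation is the defining step of double DQN. The following is a minimal sketch of that target computation for a single transition, not the authors' implementation: the function name and the list-based Q-value inputs are hypothetical, and the default discount factor is taken from Table 5.

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.995, done=False):
    """Return the double-DQN bootstrap target for one transition.

    next_q_online / next_q_target are hypothetical inputs: Q-value lists for
    the next state from the online and target networks, one entry per action.
    """
    if done:
        # Terminal transition: there is no future value to bootstrap from.
        return reward
    # Action selection: the online network picks the greedy next action.
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # Action evaluation: the target network scores that chosen action.
    return reward + gamma * next_q_target[best_action]


# The online network prefers action 1, so the target network's value for
# action 1 is used, not the target network's own maximum (action 2).
target = ddqn_target(1.0, [0.2, 0.9, 0.5], [1.0, 0.4, 2.0], gamma=0.5)
```

Decoupling the two roles is what distinguishes this from plain DQN, which would take the maximum of the target network's own values and tends to overestimate them.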

Figure 1. Layout of the wharf apron

Figure 2. Schematic diagram of three-stage flow shop scheduling for the container unloading process

Figure 3. Integrated container scheduling mechanism based on deep reinforcement learning

Figure 4. Flow chart of the scheduling algorithm

Figure 5. Algorithm convergence curve

Figure 6. Gantt chart of the scheduling results

Figure 7. Box plot of the gap between six strategies and the theoretical lower bound

Figure 8. Usage frequency of heuristic actions

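Table 1. Parameters and description of the three-stage hybrid flow shop scheduling model

Parameter	Description
i, j	Job (container task) indices, i, j ∈ J = {1, 2, …, n}
J	Set of container jobs
k	Stage index, k ∈ K = {1, 2, …, l}
K	Set of operation stages
m	Equipment index, m ∈ M_k = {1, 2, …, l_k}
M_k	Set of all equipment at stage k
V	Set of virtual jobs
p_ik^m	Processing time of job i on equipment m at stage k
w_k^m	Earliest time at which equipment m at stage k can start operating
t_ijk^m	Setup time required when the same equipment m at stage k handles job i and then job j
s_ik^m	Decision variable: start time of job i on equipment m at stage k
e_ik^m	Decision variable: end time of job i on equipment m at stage k
x_ik^m	Decision variable: 1 if job i is handled by equipment m at stage k, 0 otherwise
x_ijk^m	Decision variable: 1 if equipment m at stage k handles job j immediately after job i, 0 otherwise
Φ	Set of precedence relations among jobs; (i, j) ∈ Φ means job i must be handled before job j
N	A sufficiently large positive number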
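
As an illustration of how the symbols in Table 1 usually fit together, the block below sketches a generic big-N core of a hybrid flow shop model. It is an assumption, not the paper's constraint set, which is not reproduced on this page: the makespan symbol C_max is introduced here for illustration, and the stage-to-stage precedence constraints, the priority set Φ, and the virtual jobs V are left out.

```latex
\begin{align}
\min\ & C_{\max} \\
\text{s.t.}\ 
& \sum_{m \in M_k} x_{ik}^{m} = 1, && \forall i \in J,\ k \in K \\
& s_{ik}^{m} \ge w_{k}^{m} - N\,(1 - x_{ik}^{m}), && \forall i \in J,\ k \in K,\ m \in M_k \\
& e_{ik}^{m} \ge s_{ik}^{m} + p_{ik}^{m} - N\,(1 - x_{ik}^{m}), && \forall i \in J,\ k \in K,\ m \in M_k \\
& s_{jk}^{m} \ge e_{ik}^{m} + t_{ijk}^{m} - N\,(1 - x_{ijk}^{m}), && \forall i \ne j \in J,\ k \in K,\ m \in M_k \\
& C_{\max} \ge e_{il}^{m}, && \forall i \in J,\ m \in M_l
\end{align}
```

Each big-N term switches its constraint off when the corresponding binary variable is 0, so timing constraints bind only for the equipment actually assigned to a job.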


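Table 2. Scheduling rules

No.	Rule	Description
1	Johnson1	Split the current queue Ω_k into two subsets Q₁ and Q₂; the stage-2 equipment (ART) then selects a job from Q₁ by the SPT rule
2	Johnson2	The stage-2 equipment (ART) selects a job from Q₂ by the LPT rule
3	Johnson3	Split the current queue Ω_k into two subsets A₁ and A₂; the stage-3 equipment (YC) then selects a job from A₁ by the LWKR rule
4	Johnson4	The stage-3 equipment (YC) selects a job from A₂ by the MWKR rule
5	FIFO	First in, first out: the job that arrived first is processed first
6	SPT	Shortest processing time: the job with the shortest processing time is selected first
7	LPT	Longest processing time: the job with the longest processing time is selected first
8	LWKR	Least work remaining: the job with the shortest remaining processing time is selected first
9	MWKR	Most work remaining: the job with the longest remaining processing time is selected first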
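
The five single rules in Table 2 can be sketched as selectors over a stage queue, which is the form in which they would serve as candidate actions for an agent. This is an illustrative sketch only: the `Job` fields and the `select_job` helper are hypothetical names, and the four Johnson-based compound rules are omitted because the criterion for splitting the queue is not given in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: int
    arrival: float    # time the job joined the current stage queue
    proc_time: float  # processing time at the current stage
    remaining: float  # processing time still to do over all remaining stages


# Each rule maps a non-empty queue to the job it would dispatch next.
RULES = {
    "FIFO": lambda q: min(q, key=lambda j: j.arrival),
    "SPT": lambda q: min(q, key=lambda j: j.proc_time),
    "LPT": lambda q: max(q, key=lambda j: j.proc_time),
    "LWKR": lambda q: min(q, key=lambda j: j.remaining),
    "MWKR": lambda q: max(q, key=lambda j: j.remaining),
}


def select_job(rule, queue):
    """Apply the named dispatching rule to the queue and return the chosen job."""
    if not queue:
        raise ValueError("queue is empty")
    return RULES[rule](queue)


queue = [
    Job(1, arrival=0.0, proc_time=120.0, remaining=400.0),
    Job(2, arrival=5.0, proc_time=95.0, remaining=450.0),
    Job(3, arrival=9.0, proc_time=140.0, remaining=300.0),
]
chosen = {name: select_job(name, queue).job_id for name in RULES}
```

Keeping the rules in a dictionary means an agent's discrete action index can be mapped directly to a rule name.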


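Table 3. Operating parameters of equipment at each stage of the container terminal

Equipment	Operation type	Value
Single-trolley quay crane	Single operation/s	Triangular distribution (93, 103, 113)
ART	Empty speed/(m/s)	9.7
ART	Loaded speed/(m/s)	7
Double-cantilever ARMG	Single operation/s	Triangular distribution (102, 144, 216)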


Table 4. Distance from the wharf apron to each container block

Container block	1	2	3	4	5	6	7	8
Distance/m	120	170	220	280	330	390	440	500
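
The equipment parameters and block distances in Tables 3 and 4 are the kind of inputs a simulation of the three stages would draw on. The sketch below shows one way to sample a crane handling time and compute a one-way ART travel time. It rests on stated assumptions: the triangular parameters are read as (min, mode, max), the loaded speed is taken to be in m/s like the empty speed, travel is at constant speed over the tabulated distance, and all names are hypothetical.

```python
import random

# (min, mode, max) in seconds; this parameter order is an assumption.
QC_HANDLING_S = (93.0, 103.0, 113.0)
YC_HANDLING_S = (102.0, 144.0, 216.0)
# ART speeds; the unit of the loaded speed is assumed to be m/s.
ART_SPEED_M_PER_S = {"empty": 9.7, "loaded": 7.0}
# Distance from the wharf apron to container blocks 1..8, in metres.
BLOCK_DISTANCE_M = {1: 120, 2: 170, 3: 220, 4: 280, 5: 330, 6: 390, 7: 440, 8: 500}


def sample_handling_time(params, rng):
    """Draw one crane handling time from a triangular distribution."""
    low, mode, high = params
    # random.triangular takes its arguments in the order (low, high, mode).
    return rng.triangular(low, high, mode)


def art_travel_time(block, load_state):
    """One-way travel time in seconds at constant speed to the given block."""
    return BLOCK_DISTANCE_M[block] / ART_SPEED_M_PER_S[load_state]


rng = random.Random(0)  # fixed seed so runs are repeatable
qc_time = sample_handling_time(QC_HANDLING_S, rng)
loaded_trip_s = art_travel_time(8, "loaded")
```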


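Table 5. Algorithm parameter settings

Parameter	Value
Discount factor γ	0.995
Learning rate α	4×10^-4
Capacity D of the experience replay pool R	6 000
Sampling batch size N	64
Episodes	1 600
Control correction parameter β	0.4
Target Q-network update frequency/episodes	100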
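
To show where the settings in Table 5 would plug in, here is a minimal sketch of a fixed-capacity replay pool with uniform sampling and a periodic target-network sync check. The class, function, and key names are hypothetical. The role of β is not described on this page, so it is carried as a plain value and not used, and uniform sampling is an assumption, not a statement about the authors' sampling scheme.

```python
import random
from collections import deque

HYPERPARAMS = {
    "gamma": 0.995,
    "learning_rate": 4e-4,
    "replay_capacity": 6000,
    "batch_size": 64,
    "episodes": 1600,
    "beta": 0.4,               # role not specified here; stored but unused
    "target_sync_every": 100,  # episodes between target Q-network updates
}


class ReplayBuffer:
    """Fixed-capacity pool; the oldest transitions are dropped when full."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)

    def push(self, transition):
        self._items.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement (an assumption of this sketch).
        return rng.sample(list(self._items), batch_size)

    def __len__(self):
        return len(self._items)


def should_sync_target(episode, every=HYPERPARAMS["target_sync_every"]):
    """True on episodes where the target network would copy the online weights."""
    return episode > 0 and episode % every == 0


buffer = ReplayBuffer(HYPERPARAMS["replay_capacity"])
for step in range(6500):
    # Placeholder (state, action, reward, next_state) transitions.
    buffer.push((step, 0, 0.0, step + 1))
batch = buffer.sample(HYPERPARAMS["batch_size"], random.Random(0))
```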


Table 6. Scheduling scheme of single-trolley quay cranes

Quay crane No.	Quay crane operation sequence
1	24→8→22→2→13→16→26
2	20→25→18→21→27→30→1→17
3	3→10→29→12→28→14→4→5
4	6→11→23→9→7→15→19


Table 7. ART assignment results

ART No.	ART operation sequence
1	3→2
2	8→30
3	29→16
4	22→4
5	11→15
6	25→14
7	9→5
8	12→17
9	20→28
10	21→1
11	6→7
12	24→27
13	10→13
14	23→26
15	18→19


Table 8. Scheduling scheme of double-cantilever yard cranes

Yard crane No.	Yard crane operation sequence
1	6→23→19
2	24→22→15→26
3	3→18→7→16
4	10→28→30
5	25→9→1
6	20→29→2→14→5
7	8→12→13→17
8	11→21→27→4
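
A quick consistency check on Tables 6 to 8 is that every container job appears exactly once at each stage. The sketch below transcribes the three schedules and runs that check. The helper name and the job count of 30 are assumptions read off the tables, not stated in them.

```python
from collections import Counter

QC_SEQUENCES = {
    1: [24, 8, 22, 2, 13, 16, 26],
    2: [20, 25, 18, 21, 27, 30, 1, 17],
    3: [3, 10, 29, 12, 28, 14, 4, 5],
    4: [6, 11, 23, 9, 7, 15, 19],
}
ART_SEQUENCES = {
    1: [3, 2], 2: [8, 30], 3: [29, 16], 4: [22, 4], 5: [11, 15],
    6: [25, 14], 7: [9, 5], 8: [12, 17], 9: [20, 28], 10: [21, 1],
    11: [6, 7], 12: [24, 27], 13: [10, 13], 14: [23, 26], 15: [18, 19],
}
YC_SEQUENCES = {
    1: [6, 23, 19],
    2: [24, 22, 15, 26],
    3: [3, 18, 7, 16],
    4: [10, 28, 30],
    5: [25, 9, 1],
    6: [20, 29, 2, 14, 5],
    7: [8, 12, 13, 17],
    8: [11, 21, 27, 4],
}


def coverage_errors(sequences, n_jobs):
    """Return (missing, repeated) job ids for one stage's schedule."""
    counts = Counter(job for seq in sequences.values() for job in seq)
    missing = [job for job in range(1, n_jobs + 1) if counts[job] == 0]
    repeated = sorted(job for job, count in counts.items() if count > 1)
    return missing, repeated


N_JOBS = 30  # assumed from the largest job number in the tables
stage_checks = {
    name: coverage_errors(seqs, N_JOBS)
    for name, seqs in [("QC", QC_SEQUENCES), ("ART", ART_SEQUENCES), ("YC", YC_SEQUENCES)]
}
```

For the schedules as transcribed, all three stages come back with no missing and no repeated jobs.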


Table 9. Comparison of results from Gurobi and the deep reinforcement learning algorithm (DDQN) on small-scale cases

Case No.	n × Q × A × M	Gurobi e₁/s	Gurobi t₁/s	DDQN e₂/s	DDQN t₂/s	E/%
1	4×2×6×3	7 524	2.28	7 650	12.75	1.67
2	4×2×8×3	7 511	3.06	7 650	10.83	1.85
3	8×2×6×3	9 808	7.24	10 054	16.44	2.51
4	8×2×8×3	9 513	9.91	9 711	16.34	2.08
5	10×2×6×4	10 910	13.02	11 220	19.89	2.84
6	10×2×8×4	9 526	17.18	9 700	16.75	1.83
7	12×3×9×5	9 825	35.51	9 917	35.09	0.94
8	12×3×12×5	9 523	47.14	9 719	23.51	2.06
9	14×3×9×5	10 885	54.14	10 986	26.56	0.93
10	14×3×12×5	10 022	66.32	10 337	27.03	3.14
Average						1.99
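
The E column in Table 9 is consistent with the relative error (e₂ − e₁)/e₁ × 100, and the 59% figure in the abstract is consistent with the largest relative reduction in solution time, (t₁ − t₂)/t₁. The sketch below recomputes both from the tabulated values. Both formulas are inferred from the numbers, not quoted from the paper, and the variable names are hypothetical.

```python
# (e1, t1, e2, t2) per case: Gurobi and DDQN completion time and solution time, in s.
CASES = [
    (7524, 2.28, 7650, 12.75),
    (7511, 3.06, 7650, 10.83),
    (9808, 7.24, 10054, 16.44),
    (9513, 9.91, 9711, 16.34),
    (10910, 13.02, 11220, 19.89),
    (9526, 17.18, 9700, 16.75),
    (9825, 35.51, 9917, 35.09),
    (9523, 47.14, 9719, 23.51),
    (10885, 54.14, 10986, 26.56),
    (10022, 66.32, 10337, 27.03),
]

# Relative error of the DDQN completion time against Gurobi, in percent.
errors_pct = [(e2 - e1) / e1 * 100 for e1, _, e2, _ in CASES]
mean_error_pct = sum(errors_pct) / len(errors_pct)

# Relative reduction in solution time; negative where DDQN is slower.
time_saving_pct = [(t1 - t2) / t1 * 100 for _, t1, _, t2 in CASES]
max_time_saving_pct = max(time_saving_pct)
```

Rounded to two decimals, the per-case errors match the E column. The mean lands within rounding of the reported 1.99%, depending on whether rounding is applied before or after averaging. DDQN is slower than Gurobi on cases 1 to 5 and faster from case 6 onward.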


Table 10. Comparison of solutions on larger-scale cases (mean total completion time/s for each case size)

Scheduling strategy	60×12×26×14	80×12×30×14	100×12×36×14
FIFO	17 828	21 287	28 238
SPT	17 809	21 312	28 176
LPT	17 797	21 337	28 193
LWKR	17 824	21 333	28 160
MWKR	17 852	21 358	28 235
PSO	16 765	18 968	24 032
DDQN	15 582	17 756	22 027
Theoretical lower bound	14 698	16 804	21 068
Gap/%	6.0	5.6	4.6
Note: the theoretical lower bound on the makespan of the three-stage hybrid flow shop scheduling problem is computed with the method in reference [23].
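
The gap row and the 7.84% comparison with PSO can be checked against the tabulated means. In the sketch below, the gap is taken as (DDQN − lower bound)/lower bound, and the PSO comparison as (PSO − DDQN)/DDQN averaged over the three case sizes. Both definitions are inferred from the numbers, not stated on this page, and the variable names are hypothetical.

```python
PSO = [16765, 18968, 24032]
DDQN = [15582, 17756, 22027]
LOWER_BOUND = [14698, 16804, 21068]

# Gap of DDQN to the theoretical lower bound, in percent of the lower bound.
gap_pct = [(d - lb) / lb * 100 for d, lb in zip(DDQN, LOWER_BOUND)]

# How much longer the PSO schedule is, in percent of the DDQN makespan.
pso_excess_pct = [(p - d) / d * 100 for p, d in zip(PSO, DDQN)]
mean_pso_excess_pct = sum(pso_excess_pct) / len(pso_excess_pct)

# The same difference expressed in percent of the PSO makespan, for comparison.
reduction_vs_pso_pct = [(p - d) / p * 100 for p, d in zip(PSO, DDQN)]
mean_reduction_vs_pso_pct = sum(reduction_vs_pso_pct) / len(reduction_vs_pso_pct)
```

Under these definitions the gaps come out at about 6.0%, 5.7%, and 4.6%, so the middle value differs slightly from the tabulated 5.6%. The reported 7.84% is matched when the difference is divided by the DDQN makespan, while dividing by the PSO makespan gives about 7.3%.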


References (25)

[1]	GAO X F. Research on dynamic scheduling of two YC in automated container terminal based on deep reinforcement learning[D]. Dalian: Dalian University of Technology, 2020. (in Chinese)
[2]	DING Y, YUAN H, FANG H J, et al. An optimal scheduling method of AGVs at automated container terminal considering conflict avoidance[J]. Journal of Transport Information and Safety, 2022, 40(3): 96-107. (in Chinese) doi: 10.3963/j.jssn.1674-4861.2022.03.010
[3]	XIA M J, SHI X X, LI M Z. Study on handling operation rescheduling under sudden malfunction of container terminal quay cranes[J]. Journal of Shanghai Maritime University, 2022, 43(1): 30-37. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-SHHY202201005.htm
[4]	LIANG C J, YANG Q Y. Container terminal QC & IT integral scheduling model for multi-vessel operation under double-cycle strategy[J]. Journal of Chongqing Jiaotong University(Natural Science), 2018, 37(3): 106-114. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-CQJT201803019.htm
[5]	CHEN C, QIU J M, TAI W L, et al. Berth allocation planning in container terminals for the outbound containers arrival in random order[J]. Journal of Transport Information and Safety, 2014, 181(1): 91-96. (in Chinese) doi: 10.3963/j.issn.1674-4861.2014.01.019
[6]	CHANG Y M, ZHU X N. Integrated scheduling of handling operation between train and vessel in container terminal under uncertain factor[J]. Journal of Traffic and Transportation Engineering, 2017, 17(6): 115-124. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-JYGC201706017.htm
[7]	KIM K H, PARK Y M. A crane scheduling method for port container terminals[J]. European Journal of Operational Research, 2004, 156(3): 752-768.
[8]	QIN T B, GE H, SHA M. Constraint programming for the integrated scheduling problem of container handling system in container terminals[J]. System Engineering-Theory & Practice, 2015, 35(8): 2127-2136. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-XTLL201508023.htm
[9]	ZHEN L, YU S C, WANG S A, et al. Scheduling quay cranes and yard trucks for unloading operations in container ports[J]. Annals of Operations Research, 2019, 273(1): 455-478.
[10]	ZHONG L C, LI W F, HE L J, et al. Optimization of mixed no-idle flexible flow scheduling in container terminal[J/OL]. Computer Integrated Manufacturing Systems: 1-22[2022-10-28]. (in Chinese) http://kns.cnki.net/kcms/detail/11.5946.TP.20211228.1358.016.html
[11]	CHEN C, ZHANG Z, ZENG Q C. Integrated scheduling model of mixed cross-operation for container terminal[J]. Journal of Traffic and Transportation Engineering, 2012, 12(3): 92-100. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-JYGC201203017.htm
[12]	YANG C Y, ZHANG Y, XU Y J, et al. Simulation analysis of ART dynamic speed regulation of automated container terminal based on MAS[J]. Journal of Wuhan University of Technology, 2022, 44(1): 28-35. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-WHGY202201005.htm
[13]	SUTTON R S, BARTO A G. Reinforcement learning: an introduction[J]. IEEE Transactions on Neural Networks, 1998, 9(5): 1054-1054.
[14]	ZHANG H S. Research on integrated scheduling of shuttle carriers in automated terminals based on deep reinforcement learning[D]. Shanghai: Shanghai Maritime University, 2021. (in Chinese)
[15]	SHANG J, XU C S. Vehicle scheduling in container terminal based on reinforcement learning[J]. Journal of Wuhan University of Technology, 2011, 33(3): 72-76. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-WHGY201103015.htm
[16]	PARK S J, YOO Y H, PYO C W. Applying DQN solutions in fog-based vehicular networks: Scheduling, caching, and collision control[J]. Vehicular Communications, 2022, 33(C): 100397.
[17]	SALVADOR M S. A solution to a special class of flow shop scheduling problems[J]. Symposium on the Theory of Scheduling and Its Applications, 1972(86): 83-91.
[18]	SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction[M]. 2nd ed. Beijing: Publishing House of Electronics Industry, 2019.
[19]	LU Z Q, REN Y F, XU Z X. Research on the deep learning algorithm for resource investment problem[J]. Computer Integrated Manufacturing Systems, 2021, 27(6): 1558-1568. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-JSJJ202106003.htm
[20]	HAN B A, YANG J J. Research on adaptive job shop scheduling problems based on dueling double DQN[J]. IEEE Access, 2020(8): 186474-186495.
[21]	WANG Y H, ZHOU X, SHI Y X, et al. Transmission network expansion planning considering wind power and load uncertainties based on multi-agent DDQN[J]. Energies, 2021, 14(19): 6073.
[22]	LIU C L, CHANG C C, TSENG C J. Actor-Critic deep reinforcement learning for solving job shop scheduling problem[J]. IEEE Access, 2020(8): 71752-71762.
[23]	XING X W. Optimization of container loading/unloading integrated scheduling in a container terminal based on hybrid flowshop[D]. Dalian: Dalian Maritime University, 2013. (in Chinese)
[24]	XIAO P F, ZHANG C Y, MENG L L, et al. Non-permutation flow shop scheduling problem based on deep reinforcement learning[J]. Computer Integrated Manufacturing Systems, 2021, 27(1): 192-205. (in Chinese) https://www.cnki.com.cn/Article/CJFDTOTAL-JSJJ202101018.htm
[25]	YUTTHAPONG T, WERASAK K. Comparing nonlinear inertia weights and constriction factors in particle swarm optimization[J]. International Journal of Knowledge-Based and Intelligent Engineering Systems, 2011, 15(2): 65-70.