Progress on Deep Reinforcement Learning in Path Planning
Published: 2024-03-11

[1] LIU Z R,JIANG S H.Research review of mobile robot path planning based on reinforcement learning[J].Manufacturing Automation,2019,41(3):90-92.
[2] LIU F,CHEN C,LI Z,et al.Research on path planning of robot based on deep reinforcement learning[C]//2020 39th Chinese Control Conference(CCC),Shenyang,27-29 July,2020:3730-3734.
[3] WONG C,CHIEN S Y,FENG H M,et al.Motion planning for dual-arm robot based on soft actor-critic[J].IEEE Access,2021,9:26871-26885.
[4] KANG K,BELKHALE S,KAHN G,et al.Generalization through simulation:Integrating simulated and real data into deep reinforcement learning for vision-based autonomous flight[C]//2019 International Conference on Robotics and Automation,2019.
[5] KHATIB O.Real-time obstacle avoidance for manipulators and mobile robots[J].The International Journal of Robotics Research,1986,5(1):90-98.
[6] HOLTE R,PEREZ M,ZIMMER R,et al.Hierarchical A*:Searching abstraction hierarchies efficiently[C]//Proceedings of the Thirteenth National Conference on Artificial Intelligence and Eighth Innovative Applications of Artificial Intelligence Conference,1996.
[7] GURUJI A K,AGARWAL H,PARSEDIYA D.Time efficient A* algorithm for robot path planning[J].Procedia Technology,2016,23:144-149.
[8] DORIGO M.The ant system:An autocatalytic optimizing process[C]//Proceedings of the First European Conference on Artificial Life,Paris,1991.
[9] MIRJALILI S,DONG J S,LEWIS A.Ant colony optimizer:Theory,literature review,and application in AUV path planning:Methods and applications[J].Studies in Computational Intelligence,2020,811:7-21.
[10] KARAMI A H,HASANZADEH M.An adaptive genetic algorithm for robot motion planning in 2D complex environments[J].Computers & Electrical Engineering,2015,43:317-329.
[11] LIU Z R,JIANG S H,YUAN W W,et al.Mobile robot path planning based on deep Q learning[J].Measurement and Control Technology,2019,38(7):24-28.
[12] KOBER J,PETERS J.Reinforcement learning in robotics:A survey[J].International Journal of Robotics Research,2013,32(11):1238-1274.
[13] POLYDOROS A S,NALPANTIDIS L.Survey of model-based reinforcement learning:Applications on robotics[J].Journal of Intelligent & Robotic Systems,2017,86:1-21.
[14] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Playing atari with deep reinforcement learning[J].arXiv:1312.5602,2013.
[15] ZHU Y,ZHAO D,LI X.Iterative adaptive dynamic programming for solving unknown nonlinear zero-sum game based on online data[J].IEEE Transactions on Neural Networks & Learning Systems,2017,28(3):714-725.
[16] SUN Y,CAO L,CHEN X L,et al.Research review of multi-agent deep reinforcement learning[J].Computer Engineering and Applications,2020,56(5):13-24.
[17] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518:529-533.
[18] WU X M.Research on path planning algorithm based on deep reinforcement learning[D].Changchun:Changchun University of Science and Technology,2020.
[19] LEI T,MING L.A robot exploration strategy based on Q-learning network[C]//2016 IEEE International Conference on Real-time Computing and Robotics,6-10 June,2016:57-62.
[20] FENG S,SHU H,XIE B Q.Path planning for 3D environment based on improved deep reinforcement learning[J].Computer Applications and Software,2021,38(1):250-255.
[21] JING X,ZHAO H,DING L,et al.Application of deep reinforcement learning in mobile robot path planning[C]//2017 Chinese Automation Congress(CAC),Jinan,20-22 Oct,2017:7112-7116.
[22] KONG S T,LIU C C,SHI Y,et al.Overview of the application prospects of deep reinforcement learning in intelligent manufacturing[J].Computer Engineering and Applications,2021,57(2):49-59.
[23] HASSELT H V,GUEZ A,SILVER D.Deep reinforcement learning with double Q-learning[J].arXiv:1509.06461,2015.
[24] WANG Z,SCHAUL T,HESSEL M,et al.Dueling network architectures for deep reinforcement learning[C]//Proceedings of International Conference on Machine Learning,2016:1995-2003.
[25] FOERSTER J,NARDELLI N,FARQUHAR G,et al.Stabilising experience replay for deep multi-agent reinforcement learning[J].arXiv:1702.08887,2017.
[26] NAIR A,SRINIVASAN P,BLACKWELL S,et al.Massively parallel methods for deep reinforcement learning[J].arXiv:1507.04296,2015.
[27] ANSCHEL O,BARAM N,SHIMKIN N.Averaged-DQN:Variance reduction and stabilization for deep reinforcement learning[C]//Proceedings of International Conference on Machine Learning,2017:176-185.
[28] ANSCHEL O,BARAM N,SHIMKIN N.Deep reinforcement learning with averaged target DQN[J].arXiv:1611.01929,2016.
[29] LV L,ZHANG S,DING D,et al.Path planning via an improved DQN-based learning policy[J].IEEE Access,2019,7:67319-67330.
[30] DONG Y F,YANG C,DONG Y,et al.Robot path planning based on improved DQN[J].Computer Engineering and Design,2021,42(2):552-558.
[31] SUTTON R,BARTO A.Reinforcement learning:An introduction[J].IEEE Transactions on Neural Networks,1998,9(5):1054.
[32] QIU H,LIU F.A state representation dueling network for deep reinforcement learning[C]//2020 IEEE 32nd International Conference on Tools with Artificial Intelligence,Baltimore,2020:669-674.
[33] HOCHREITER S,SCHMIDHUBER J.Long short-term memory[J].Neural Computation,1997,9(8):1735-1780.
[34] HAUSKNECHT M,STONE P.Deep recurrent Q-learning for partially observable MDPs[J].arXiv:1507.06527,2015.
[35] ZHAI J W.Research on deep Q network algorithm and model[D].Suzhou:Soochow University,2017.
[36] LIU Q,YAN Y,ZHU F,et al.A deep recurrent Q network with exploration noise[J].Chinese Journal of Computers,2019,42(7):1588-1604.
[37] SCHAUL T,QUAN J,ANTONOGLOU I,et al.Prioritized experience replay[C]//Proceedings of International Conference on Learning Representations,2016:1-21.
[38] HORGAN D,QUAN J,BUDDEN D,et al.Distributed prioritized experience replay[J].arXiv:1803.00933,2018.
[39] HESTER T,VECERIK M,PIETQUIN O,et al.Learning from demonstrations for real world reinforcement learning[J].arXiv:1704.03732,2017.
[40] LV L,ZHANG S,DING D,et al.Path planning via an improved DQN based learning policy[J].IEEE Access,2019,7:67319-67330.
[41] SUN H H,HU C H,ZHANG J G.Deep reinforcement learning method in mobile robot motion planning[J].Control and Decision,2021(6):1281-1292.
[42] LIU J W,GAO F,LUO X L.Overview of deep reinforcement learning based on value function and policy gradient[J].Chinese Journal of Computers,2019,42(6):1406-1438.
[43] SUTTON R S,MCALLESTER D A,SINGH S P,et al.Policy gradient methods for reinforcement learning with function approximation[C]//Advances in Neural Information Processing Systems,2000:1057-1063.
[44] LILLICRAP T P,HUNT J J,PRITZEL A,et al.Continuous control with deep reinforcement learning[J].arXiv:1509.02971,2015.
[45] HOU Z,DONG H,ZHANG K,et al.Knowledge-driven deep deterministic policy gradient for robotic multiple peg-in-hole assembly tasks[C]//2018 IEEE International Conference on Robotics and Biomimetics,2019.
[46] ZHENG Z,YUAN C,LIN Z,et al.Self-adaptive double boot-strapped DDPG[C]//Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence,2018:3198-3204.
[47] WU Q,ZHANG Y,GUO K,et al.Path planning algorithm for reinforcement learning dynamic environment combined with LSTM[J].Journal of Chinese Computer Systems,2021,42(2):334-339.
[48] SCHULMAN J,LEVINE S,MORITZ P,et al.Trust region policy optimization[C]//Proceedings of the 32nd International Conference on International Conference on Machine Learning,2015:1889-1897.
[49] JHA D K,RAGHUNATHAN A U,ROMERES D.Quasi-Newton trust region policy optimization[C]//2019 Conference on Robot Learning,2019.
[50] ZHANG H,BAI S,LAN X,et al.Hindsight trust region policy optimization[J].arXiv:1907.12439,2019.
[51] SHANI L,EFRONI Y,MANNOR S.Adaptive trust region policy optimization:Global convergence and faster rates for regularized MDPs[J].arXiv:1909.02769,2019.
[52] SHANI L,EFRONI Y,MANNOR S.Adaptive trust region policy optimization:Global convergence and faster rates for regularized MDPs[C]//Proceedings of the AAAI Conference on Artificial Intelligence,2020:5668-5675.
[53] SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximal policy optimization algorithms[J].arXiv:1707.06347,2017.
[54] WANG Y,HE H,WEN C,et al.Truly proximal policy optimization[J].arXiv:1903.07940,2019.
[55] MNIH V,BADIA A P,MIRZA M,et al.Asynchronous methods for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on International Conference on Machine Learning,2016:1928-1937.
[56] KARTAL B,HERNANDEZ-LEAL P,TAYLOR M E.Terminal prediction as an auxiliary task for deep reinforcement learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment,2019:38-44.
[57] LABAO A B,MARTIJA M A M,NAVAL P C.A3C-GS:Adaptive moment gradient sharing with locks for asynchronous actor-critic agents[J].IEEE Transactions on Neural Networks and Learning Systems,2020,99:1-15.
[58] HAARNOJA T,ZHOU A,ABBEEL P,et al.Soft actor-critic:Off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//Proceedings of the 35th International Conference on Machine Learning,2018:2976-2989.
[59] DUAN Y,CHEN X,HOUTHOOFT R,et al.Benchmarking deep reinforcement learning for continuous control[C]// Proceedings of the International Conference on Machine Learning,2016.
[60] FU F,KANG Y,ZHANG Z,et al.Soft actor-critic DRL for live transcoding and streaming in vehicular fog computing-enabled IoV[J].IEEE Internet of Things Journal,2021,8(3):1308-1321.
[61] CHENG Y,SONG Y.Autonomous decision-making generation of UAV based on soft actor-critic algorithm[C]//Proceedings of the 39th Chinese Control Conference(CCC),Shenyang,27-29 July,2020:7350-7355.
[62] TANG H,WANG A,XUE F,et al.A novel hierarchical soft actor-critic algorithm for multi logistics robots task allocation[J].IEEE Access,2021,9:42568-42582.
[63] XIE L,WANG S,MARKHAM A,et al.Towards monocular vision based obstacle avoidance through deep reinforcement learning[J].arXiv:1706.09829,2017.
[64] HESSEL M,MODAYIL J,HASSELT H V,et al.Rainbow:Combining improvements in deep reinforcement learning[J].arXiv:1710.02298,2017.
[65] KULKARNI T D,NARASIMHAN K,SAEEDI A,et al.Hierarchical deep reinforcement learning:Integrating temporal abstraction and intrinsic motivation[C]//Advances in Neural Information Processing Systems,2016:3675-3683.
[66] XU Z X,CAO L,ZHANG Y L,et al.Research on deep reinforcement learning algorithm based on dynamic fusion target[J].Computer Engineering and Applications,2019,55(7):157-161.
[67] ZHANG J J,ZHANG C,ZHAO H J.Competitive deep Q network algorithm for reusing state values[J].Computer Engineering and Applications,2021,57(4):134-140.
[68] AVRACHENKOV K,BORKAR V S,DOLHARE H P,et al.Full gradient DQN reinforcement learning:A provably convergent scheme[J].arXiv:2103.05981,2021.
[69] HUI T S,ISHAK M K,MOHAMED M F P,et al.Balancing excitation and inhibition of spike neuron using Deep Q Network(DQN)[C]//Proceedings of the 5th International Conference on Electronic Design(ICED),2020.
[70] PAN J,WANG X,CHENG Y,et al.Multisource transfer double DQN based on actor learning[J].IEEE Transactions on Neural Networks & Learning Systems,2018,29(6):2227-2238.
[71] SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529:484-489.
[72] SILVER D,SCHRITTWIESER J,SIMONYAN K,et al.Mastering the game of Go without human knowledge[J].Nature,2017,550:354-359.
[73] LIU C Y,MU C X,SUN C Y.A review of the research status of deep reinforcement learning algorithms and applications[J].Journal of Intelligent Science and Technology,2020,2(4):314-326.
[74] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518:529-533.
[75] BADIA A P,PIOT B,KAPTUROWSKI S,et al.Agent57:Outperforming the Atari human benchmark[J].arXiv:2003.13350,2020.
[76] KEMPKA M,WYDMUCH M,RUNC G,et al.ViZDoom:A doom-based AI research platform for visual reinforcement learning[C]//2016 IEEE Conference on Computational Intelligence and Games(CIG),Santorini,20-23 Sept,2016:1-8.
[77] VINYALS O,EWALDS T,BARTUNOV S,et al.Starcraft II:A new challenge for reinforcement learning[J].arXiv:1708.04782,2017.
[78] YE D,LIU Z,SUN M,et al.Mastering complex control in MOBA games with deep reinforcement learning[J].arXiv:1912.09729,2019.
[79] JADERBERG M,MNIH V,CZARNECKI W M,et al.Reinforcement learning with unsupervised auxiliary tasks[J].arXiv:1611.05397,2016.
[80] ZHU Y,MOTTAGHI R,KOLVE E,et al.Target-driven visual navigation in indoor scenes using deep reinforcement learning[C]//2017 IEEE International Conference on Robotics and Automation(ICRA),Singapore,29 May-3 June,2017:3357-3364.
[81] KULHANEK J,DERNER E,BABUSKA R.Visual navigation in real world indoor environments using end-to-end deep reinforcement learning[J].IEEE Robotics and Automation Letters,2020,3:4345-4352.
[82] WANG Y R,JING X C,TIAN T,et al.Research on multi-agent path planning method based on reinforcement learning[J].Computer Applications and Software,2019,36(8):165-171.
[83] LIANG C.Research on multi-agent cooperative strategy based on reinforcement learning[D].Shenyang:Shenyang Ligong University,2020.
[84] FOERSTER J,FARQUHAR G,AFOURAS T,et al.Counterfactual multi-agent policy gradients[J].arXiv:1705.08926,2017.
[85] MAO H,GONG Z,NI Y,et al.ACCNet:Actor-coordinator-critic net for “learning-to-communicate” with deep multi-agent reinforcement learning[J].arXiv:1706.03235,2017.
[86] SUNEHAG P,LEVER G,GRUSLYS A,et al.Value-decomposition networks for cooperative multi-agent learning[J].arXiv:1706.05296,2017.
[87] IQBAL S,SHA F.Actor-attention-critic for multi-agent reinforcement learning[C]//Proceedings of the International Conference on Machine Learning,2019.
[88] LIANG X X,FENG Y H,MA Y,et al.A review on multi-agent deep reinforcement learning[J].Acta Automatica Sinica,2020,46(12):2537-2557.
[89] LI H,LI G J,WANG K Y.Electric vehicle real-time scheduling strategy based on deep reinforcement learning[J].Automation of Electric Power Systems,2020,692(22):166-172.
[90] ZHAO T T,KONG L,HAN Y J,et al.A review of modeling reinforcement learning[J].Journal of Frontiers of Computer Science and Technology,2020,14(6):918-927.
[91] XIE R,MENG Z,WANG L,et al.Unmanned aerial vehicle path planning algorithm based on deep reinforcement learning in large-scale and dynamic environments[J].IEEE Access,2021,9:24884-24900.
[92] ZHAO W,QUERALTA J P,WESTERLUND T,et al.Sim-to-real transfer in deep reinforcement learning for robotics:A survey[C]//2020 IEEE Symposium Series on Computational Intelligence(SSCI),Dec 1-4,2020:737-744.
