Progress on Deep Reinforcement Learning in Path Planning
Published: 2024-03-11

[1] LIU Z R,JIANG S H.Research review of mobile robot path planning based on reinforcement learning[J].Manufacturing Automation,2019,41(3):90-92.
[2] LIU F,CHEN C,LI Z,et al.Research on path planning of robot based on deep reinforcement learning[C]//2020 39th Chinese Control Conference(CCC),Shenyang,27-29 July,2020:3730-3734.
[3] WONG C,CHIEN S Y,FENG H M,et al.Motion planning for dual-arm robot based on soft actor-critic[J].IEEE Access,2021,9:26871-26885.
[4] KANG K,BELKHALE S,KAHN G,et al.Generalization through simulation:Integrating simulated and real data into deep reinforcement learning for vision-based autonomous flight[C]//2019 International Conference on Robotics and Automation,2019.
[5] KHATIB O.Real-time obstacle avoidance for manipulators and mobile robots[J].The International Journal of Robotics Research,1986,5(1):90-98.
[6] HOLTE R,PEREZ M,ZIMMER R,et al.Hierarchical A*:Searching abstraction hierarchies efficiently[C]//Proceedings of the Thirteenth National Conference on Artificial Intelligence and Eighth Innovative Applications of Artificial Intelligence Conference,1996.
[7] GURUJI A K,AGARWAL H,PARSEDIYA D.Time efficient A* algorithm for robot path planning[J].Procedia Technology,2016,23:144-149.
[8] DORIGO M.The ant system:An autocatalytic optimizing process[C]//Proceedings of the First European Conference on Artificial Life,Paris,1991.
[9] MIRJALILI S,DONG J S,LEWIS A.Ant colony optimizer:Theory,literature review,and application in AUV path planning:Methods and applications[J].Studies in Computational Intelligence,2020,811:7-21.
[10] KARAMI A H,HASANZADEH M.An adaptive genetic algorithm for robot motion planning in 2D complex environments[J].Computers & Electrical Engineering,2015,43:317-329.
[11] LIU Z R,JIANG S H,YUAN W W,et al.Mobile robot path planning based on deep Q learning[J].Measurement and Control Technology,2019,38(7):24-28.
[12] KOBER J,PETERS J.Reinforcement learning in robotics:A survey[J].International Journal of Robotics Research,2013,32(11):1238-1274.
[13] POLYDOROS A S,NALPANTIDIS L.Survey of model-based reinforcement learning:Applications on robotics[J].Journal of Intelligent & Robotic Systems,2017,86:1-21.
[14] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Playing atari with deep reinforcement learning[J].arXiv:1312.5602,2013.
[15] ZHU Y,ZHAO D,LI X.Iterative adaptive dynamic programming for solving unknown nonlinear zero-sum game based on online data[J].IEEE Transactions on Neural Networks & Learning Systems,2017,28(3):714-725.
[16] SUN Y,CAO L,CHEN X L,et al.Research review of multi-agent deep reinforcement learning[J].Computer Engineering and Applications,2020,56(5):13-24.
[17] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518:529-533.
[18] WU X M.Research on path planning algorithm based on deep reinforcement learning[D].Changchun:Changchun University of Science and Technology,2020.
[19] LEI T,MING L.A robot exploration strategy based on Q-learning network[C]//2016 IEEE International Conference on Real-time Computing and Robotics,6-10 June,2016:57-62.
[20] FENG S,SHU H,XIE B Q.Path planning for 3D environment based on improved deep reinforcement learning[J].Computer Applications and Software,2021,38(1):250-255.
[21] JING X,ZHAO H,DING L,et al.Application of deep reinforcement learning in mobile robot path planning[C]//2017 Chinese Automation Congress(CAC),Jinan,20-22 Oct,2017:7112-7116.
[22] KONG S T,LIU C C,SHI Y,et al.Overview of the application prospects of deep reinforcement learning in intelligent manufacturing[J].Computer Engineering and Applications,2021,57(2):49-59.
[23] HASSELT H V,GUEZ A,SILVER D.Deep reinforcement learning with double Q-learning[J].arXiv:1509.06461,2015.
[24] WANG Z,SCHAUL T,HESSEL M,et al.Dueling network architectures for deep reinforcement learning[C]//Proceedings of International Conference on Machine Learning,2016:1995-2003.
[25] FOERSTER J,NARDELLI N,FARQUHAR G,et al.Stabilising experience replay for deep multi-agent reinforcement learning[J].arXiv:1702.08887,2017.
[26] NAIR A,SRINIVASAN P,BLACKWELL S,et al.Massively parallel methods for deep reinforcement learning[J].arXiv:1507.04296,2015.
[27] ANSCHEL O,BARAM N,SHIMKIN N.Averaged-DQN:Variance reduction and stabilization for deep reinforcement learning[C]//Proceedings of International Conference on Machine Learning,2017:176-185.
[28] ANSCHEL O,BARAM N,SHIMKIN N.Deep reinforcement learning with averaged target DQN[J].arXiv:1611.01929,2016.
[29] LV L,ZHANG S,DING D,et al.Path planning via an improved DQN-based learning policy[J].IEEE Access,2019,7:67319-67330.
[30] DONG Y F,YANG C,DONG Y,et al.Robot path planning based on improved DQN[J].Computer Engineering and Design,2021,42(2):552-558.
[31] SUTTON R,BARTO A.Reinforcement learning:An introduction[J].IEEE Transactions on Neural Networks,1998,9(5):1054.
[32] QIU H,LIU F.A state representation dueling network for deep reinforcement learning[C]//2020 IEEE 32nd International Conference on Tools with Artificial Intelligence,Baltimore,2020:669-674.
[33] HOCHREITER S,SCHMIDHUBER J.Long short-term memory[J].Neural Computation,1997,9(8):1735-1780.
[34] HAUSKNECHT M,STONE P.Deep recurrent Q-learning for partially observable MDPs[J].arXiv:1507.06527,2015.
[35] ZHAI J W.Research on deep Q network algorithm and model[D].Suzhou:Soochow University,2017.
[36] LIU Q,YAN Y,ZHU F,et al.A deep recurrent Q network with exploration noise[J].Chinese Journal of Computers,2019,42(7):1588-1604.
[37] SCHAUL T,QUAN J,ANTONOGLOU I,et al.Prioritized experience replay[C]//Proceedings of International Conference on Learning Representations,2016:1-21.
[38] HORGAN D,QUAN J,BUDDEN D,et al.Distributed prioritized experience replay[J].arXiv:1803.00933,2018.
[39] HESTER T,VECERIK M,PIETQUIN O,et al.Learning from demonstrations for real world reinforcement learning[J].arXiv:1704.03732,2017.
[40] LV L,ZHANG S,DING D,et al.Path planning via an improved DQN based learning policy[J].IEEE Access,2019,7:67319-67330.
[41] SUN H H,HU C H,ZHANG J G.Deep reinforcement learning method in mobile robot motion planning[J].Control and Decision,2021(6):1281-1292.
[42] LIU J W,GAO F,LUO X L.Overview of deep reinforcement learning based on value function and policy gradient[J].Chinese Journal of Computers,2019,42(6):1406-1438.
[43] SUTTON R S,MCALLESTER D A,SINGH S P,et al.Policy gradient methods for reinforcement learning with function approximation[C]//Advances in Neural Information Processing Systems,2000:1057-1063.
[44] LILLICRAP T P,HUNT J J,PRITZEL A,et al.Continuous control with deep reinforcement learning[J].arXiv:1509.02971,2015.
[45] HOU Z,DONG H,ZHANG K,et al.Knowledge-driven deep deterministic policy gradient for robotic multiple peg-in-hole assembly tasks[C]//2018 IEEE International Conference on Robotics and Biomimetics,2019.
[46] ZHENG Z,YUAN C,LIN Z,et al.Self-adaptive double boot-strapped DDPG[C]//Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence,2018:3198-3204.
[47] WU Q,ZHANG Y,GUO K,et al.Path planning algorithm for reinforcement learning dynamic environment combined with LSTM[J].Journal of Chinese Computer Systems,2021,42(2):334-339.
[48] SCHULMAN J,LEVINE S,MORITZ P,et al.Trust region policy optimization[C]//Proceedings of the 32nd International Conference on International Conference on Machine Learning,2015:1889-1897.
[49] JHA D K,RAGHUNATHAN A U,ROMERES D.Quasi-Newton trust region policy optimization[C]//2019 Conference on Robot Learning,2019.
[50] ZHANG H,BAI S,LAN X,et al.Hindsight trust region policy optimization[J].arXiv:1907.12439,2019.
[51] SHANI L,EFRONI Y,MANNOR S.Adaptive trust region policy optimization:Global convergence and faster rates for regularized MDPs[J].arXiv:1909.02769,2019.
[52] SHANI L,EFRONI Y,MANNOR S.Adaptive trust region policy optimization:Global convergence and faster rates for regularized MDPs[C]//Proceedings of the AAAI Conference on Artificial Intelligence,2020:5668-5675.
[53] SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximal policy optimization algorithms[J].arXiv:1707.06347,2017.
[54] WANG Y,HE H,WEN C,et al.Truly proximal policy optimization[J].arXiv:1903.07940,2019.
[55] MNIH V,BADIA A P,MIRZA M,et al.Asynchronous methods for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on International Conference on Machine Learning,2016:1928-1937.
[56] KARTAL B,HERNANDEZ-LEAL P,TAYLOR M E.Terminal prediction as an auxiliary task for deep reinforcement learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment,2019:38-44.
[57] LABAO A B,MARTIJA M A M,NAVAL P C.A3C-GS:Adaptive moment gradient sharing with locks for asynchronous actor-critic agents[J].IEEE Transactions on Neural Networks and Learning Systems,2020,99:1-15.
[58] HAARNOJA T,ZHOU A,ABBEEL P,et al.Soft actor-critic:Off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//Proceedings of the 35th International Conference on Machine Learning,2018:2976-2989.
[59] DUAN Y,CHEN X,HOUTHOOFT R,et al.Benchmarking deep reinforcement learning for continuous control[C]// Proceedings of the International Conference on Machine Learning,2016.
[60] FU F,KANG Y,ZHANG Z,et al.Soft actor-critic DRL for live transcoding and streaming in vehicular fog computing-enabled IoV[J].IEEE Internet of Things Journal,2021,8(3):1308-1321.
[61] CHENG Y,SONG Y.Autonomous decision-making generation of UAV based on soft actor-critic algorithm[C]//Proceedings of the 39th Chinese Control Conference(CCC),Shenyang,27-29 July,2020:7350-7355.
[62] TANG H,WANG A,XUE F,et al.A novel hierarchical soft actor-critic algorithm for multi logistics robots task allocation[J].IEEE Access,2021,9:42568-42582.
[63] XIE L,WANG S,MARKHAM A,et al.Towards monocular vision based obstacle avoidance through deep reinforcement learning[J].arXiv:1706.09829,2017.
[64] HESSEL M,MODAYIL J,HASSELT H V,et al.Rainbow:Combining improvements in deep reinforcement learning[J].arXiv:1710.02298,2017.
[65] KULKARNI T D,NARASIMHAN K,SAEEDI A,et al.Hierarchical deep reinforcement learning:Integrating temporal abstraction and intrinsic motivation[C]//Advances in Neural Information Processing Systems,2016:3675-3683.
[66] XU Z X,CAO L,ZHANG Y L,et al.Research on deep reinforcement learning algorithm based on dynamic fusion target[J].Computer Engineering and Applications,2019,55(7):157-161.
[67] ZHANG J J,ZHANG C,ZHAO H J.Competitive deep Q network algorithm for reusing state values[J].Computer Engineering and Applications,2021,57(4):134-140.
[68] AVRACHENKOV K,BORKAR V S,DOLHARE H P,et al.Full gradient DQN reinforcement learning:A provably convergent scheme[J].arXiv:2103.05981,2021.
[69] HUI T S,ISHAK M K,MOHAMED M F P,et al.Balancing excitation and inhibition of spike neuron using Deep Q Network(DQN)[C]//Proceedings of the 5th International Conference on Electronic Design(ICED),2020.
[70] PAN J,WANG X,CHENG Y,et al.Multisource transfer double DQN based on actor learning[J].IEEE Transactions on Neural Networks & Learning Systems,2018,29(6):2227-2238.
[71] SILVER D,HUANG A,MADDISON C J,et al.Mastering the game of Go with deep neural networks and tree search[J].Nature,2016,529:484-489.
[72] SILVER D,SCHRITTWIESER J,SIMONYAN K,et al.Mastering the game of Go without human knowledge[J].Nature,2017,550:354-359.
[73] LIU C Y,MU C X,SUN C Y.A review of the research status of deep reinforcement learning algorithms and applications[J].Journal of Intelligent Science and Technology,2020,2(4):314-326.
[74] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518:529-533.
[75] BADIA A P,PIOT B,KAPTUROWSKI S,et al.Agent57:Outperforming the Atari human benchmark[J].arXiv:2003.13350,2020.
[76] KEMPKA M,WYDMUCH M,RUNC G,et al.ViZDoom:A doom-based AI research platform for visual reinforcement learning[C]//2016 IEEE Conference on Computational Intelligence and Games(CIG),Santorini,20-23 Sept,2016:1-8.
[77] VINYALS O,EWALDS T,BARTUNOV S,et al.Starcraft II:A new challenge for reinforcement learning[J].arXiv:1708.04782,2017.
[78] YE D,LIU Z,SUN M,et al.Mastering complex control in MOBA games with deep reinforcement learning[J].arXiv:1912.09729,2019.
[79] JADERBERG M,MNIH V,CZARNECKI W M,et al.Reinforcement learning with unsupervised auxiliary tasks[J].arXiv:1611.05397,2016.
[80] ZHU Y,MOTTAGHI R,KOLVE E,et al.Target-driven visual navigation in indoor scenes using deep reinforcement learning[C]//2017 IEEE International Conference on Robotics and Automation(ICRA),Singapore,29 May-3 June,2017:3357-3364.
[81] KULHANEK J,DERNER E,BABUSKA R.Visual navigation in real world indoor environments using end-to-end deep reinforcement learning[J].IEEE Robotics and Automation Letters,2020,3:4345-4352.
[82] WANG Y R,JING X C,TIAN T,et al.Research on multi-agent path planning method based on reinforcement learning[J].Computer Applications and Software,2019,36(8):165-171.
[83] LIANG C.Research on multi-agent cooperative strategy based on reinforcement learning[D].Shenyang:Shenyang Ligong University,2020.
[84] FOERSTER J,FARQUHAR G,AFOURAS T,et al.Counterfactual multi-agent policy gradients[J].arXiv:1705.08926,2017.
[85] MAO H,GONG Z,NI Y,et al.ACCNet:Actor-coordinator-critic net for “learning-to-communicate” with deep multi-agent reinforcement learning[J].arXiv:1706.03235,2017.
[86] SUNEHAG P,LEVER G,GRUSLYS A,et al.Value-decomposition networks for cooperative multi-agent learning[J].arXiv:1706.05296,2017.
[87] IQBAL S,SHA F.Actor-attention-critic for multi-agent reinforcement learning[C]//Proceedings of the International Conference on Machine Learning,2019.
[88] LIANG X X,FENG Y H,MA Y,et al.A review on multi-agent deep reinforcement learning[J].Acta Automatica Sinica,2020,46(12):2537-2557.
[89] LI H,LI G J,WANG K Y.Electric vehicle real-time scheduling strategy based on deep reinforcement learning[J].Automation of Electric Power Systems,2020,692(22):166-172.
[90] ZHAO T T,KONG L,HAN Y J,et al.A review of modeling reinforcement learning[J].Journal of Frontiers of Computer Science and Technology,2020,14(6):918-927.
[91] XIE R,MENG Z,WANG L,et al.Unmanned aerial vehicle path planning algorithm based on deep reinforcement learning in large-scale and dynamic environments[J].IEEE Access,2021,9:24884-24900.
[92] ZHAO W,QUERALTA J P,WESTERLUND T,et al.Sim-to-real transfer in deep reinforcement learning for robotics:A survey[C]//2020 IEEE Symposium Series on Computational Intelligence(SSCI),Dec 1-4,2020:737-744.
