Abstract
Unsignalized intersections are among the most complex and challenging scenarios in autonomous driving. As deep reinforcement learning (DRL) continues to advance, much research has been devoted to its application at unsignalized intersections. DRL algorithms require a carefully shaped reward function to achieve optimal performance, which demands extensive experimentation. In reality, humans not only receive reward-like feedback but also learn early strategies from demonstrations. Inspired by this learning paradigm, this paper proposes a method termed leveraging diminishing demonstrations in twin-delayed deep deterministic policy gradient (LDD-TD3) to mitigate DRL's reliance on reward shaping. In LDD-TD3, a risk assessment module evaluates the current environmental risk, and a prediction-based driving strategy guides the agent's actions away from high-risk scenarios during training, so that the agent accumulates successful experiences and quickly acquires a rudimentary strategy. Demonstrations are gradually phased out as training progresses, allowing the agent's exploration to identify superior strategies. Simulation results showed that LDD-TD3 effectively overcame DRL's dependence on reward shaping and increased the average success rate by 4.9% compared with TD3 under a shaped reward setting. The proposed LDD-TD3 is expected to improve the decision-making abilities of autonomous vehicles at unsignalized intersections.
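The diminishing-demonstration mechanism described above can be sketched as follows. The abstract does not specify the decay schedule or interfaces, so the linear schedule, the function names, and the initial probability `p0` below are all illustrative assumptions, not the paper's implementation.

```python
import random


def demo_probability(step, total_steps, p0=1.0):
    """Probability of injecting a demonstration action at a given
    training step. A linear decay from p0 to 0 is assumed here;
    the paper may use a different schedule."""
    return max(0.0, p0 * (1.0 - step / total_steps))


def select_action(step, total_steps, policy_action, demo_action):
    """With diminishing probability, override the TD3 policy's action
    with the demonstration (risk-avoiding) action; otherwise let the
    agent explore with its own action."""
    if random.random() < demo_probability(step, total_steps):
        return demo_action
    return policy_action
```

Early in training the demonstration strategy dominates, letting the agent accumulate successful experiences; as the probability decays to zero, exploration takes over and the agent can surpass the demonstrator.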
