Deep reinforcement learning-based low-speed vehicle following decision-making method

A reinforcement learning decision-making technology, applied to vehicle position/route/altitude control, motor vehicles, two-dimensional position/course control, etc. It addresses gaps in existing methods and achieves the effects of improved fidelity, improved driving comfort and traffic safety, and strong versatility and flexibility.

Active Publication Date: 2019-01-15
SOUTHEAST UNIV
Cites: 2 | Cited by: 21

AI-Extracted Technical Summary

Problems solved by technology

The movement of cells is discrete in space and time. This method is mainly used in...

Abstract

The invention discloses a deep reinforcement learning-based low-speed vehicle following decision-making method, implemented as follows: first, the position, speed, and acceleration information of the front and rear vehicles is received in real time through the Internet of Vehicles as the environment state, and the current state and behavior of the unmanned vehicle are expressed; then, a deep reinforcement learning structure based on the Actor-Critic framework is constructed; finally, the Actor selects an appropriate action according to the current environment state and is continuously trained through the evaluation made by the Critic, thereby obtaining an optimal control strategy that keeps the unmanned vehicle at a safe distance from the vehicles in front and behind and enables automatic low-speed driving behind the front vehicle under urban congestion conditions. The method not only improves driving comfort and ensures traffic safety, but also improves the traffic flow of congested lanes.

Application Domain

Position/course control in two dimensions; Vehicles

Technology Topic

Automotive engineering; Low speed +6

Image

  • Deep reinforcement learning-based low-speed vehicle following decision-making method

Examples

  • Experimental program (1)

Example Embodiment

[0043] The present invention will be further described in detail below in conjunction with the drawings and specific embodiments:
[0044] The present invention provides a deep reinforcement learning-based low-speed vehicle following decision-making method, which not only improves driving comfort and ensures traffic safety, but also improves the traffic flow rate on congested roads.
[0045] In this embodiment, the framework diagram shown in Figure 1 illustrates the specific process:
[0046] Step 101: Receive the position, speed, and acceleration information of the front and rear vehicles in real time through the Internet of Vehicles as the environment state, and express the current state and behavior of the unmanned vehicle. This specifically includes:
[0047] (1) The position, speed, and acceleration information of the three vehicles ahead, received in real time through the Internet of Vehicles, is expressed as $x_{f1}, v_{f1}, a_{f1}, x_{f2}, v_{f2}, a_{f2}, x_{f3}, v_{f3}, a_{f3}$, where $f_1$ is the vehicle immediately in front of the unmanned vehicle, followed by $f_2$ and $f_3$ in order of increasing distance; the position, speed, and acceleration information of the vehicle behind is expressed as $x_r, v_r, a_r$;
[0048] (2) The environment state is expressed as $E(x_{f1}, v_{f1}, a_{f1}, x_{f2}, v_{f2}, a_{f2}, x_{f3}, v_{f3}, a_{f3}, x_r, v_r, a_r)$;
[0049] (3) The current state of the unmanned vehicle is expressed as $C(x, v)$, where $x$ is the position and $v$ is the speed of the unmanned vehicle in the current state; the behavior of the unmanned vehicle is expressed as $A(a)$, where $a$ is the driving acceleration. To simulate the behavior of the unmanned vehicle under low-speed following more realistically, $a$ must satisfy $-3 \le a \le 3$ with continuous values, in units of $\mathrm{m/s^2}$.
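To make this encoding concrete, here is a minimal Python sketch of sub-steps (1) to (3); the function names and the use of numpy are illustrative assumptions, while the quantities and the $-3 \le a \le 3$ bound come from the text above.

```python
import numpy as np

def environment_state(front, rear):
    """Build E = (x_f1, v_f1, a_f1, ..., x_f3, v_f3, a_f3, x_r, v_r, a_r).

    `front` is a list of three (position, speed, acceleration) triples for
    the vehicles ahead, nearest first; `rear` is the triple for the vehicle
    behind. All values arrive in real time over the Internet of Vehicles.
    """
    assert len(front) == 3
    return np.asarray([*front[0], *front[1], *front[2], *rear], dtype=np.float32)

def ego_state(x, v):
    """Current state C(x, v) of the unmanned vehicle."""
    return np.asarray([x, v], dtype=np.float32)

def clip_action(a):
    """Behavior A(a): continuous acceleration bounded to [-3, 3] m/s^2."""
    return float(np.clip(a, -3.0, 3.0))
```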
[0050] Step 102: As shown in Figure 2, construct a deep reinforcement learning structure based on the Actor-Critic framework. The structure takes the environment state and the current state of the unmanned vehicle as input and the acceleration of the unmanned vehicle as output, and includes:
[0051] (1) Construct a 4-layer deep convolutional neural network with the same structure for the Actor and the Critic. The network consists of 1 convolutional layer, 2 fully connected layers, and an output layer; the activation function of the first 3 layers is the ReLU function, $f(x) = \max(0, x)$;
[0052] (2) The environment state and the current state of the unmanned vehicle first pass through the convolutional layer with a $5 \times 1$ kernel to obtain an intermediate feature vector, which is then transformed by two fully connected layers with 16 and 8 nodes respectively; the output layer produces the behavior of the unmanned vehicle.
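A minimal PyTorch sketch of this architecture follows. The text above fixes only the layer sequence, the $5 \times 1$ kernel, the 16- and 8-node fully connected layers, and the ReLU activations; treating the 14-element input (12 environment values plus 2 ego values) as a single-channel sequence, and every other choice, are assumptions.

```python
import torch
import torch.nn as nn

class ActorCriticNet(nn.Module):
    """Sketch of the 4-layer network shared by Actor and Critic."""

    def __init__(self, out_dim: int):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=5)  # 5x1 convolution
        self.fc1 = nn.Linear(14 - 5 + 1, 16)        # 16-node fully connected layer
        self.fc2 = nn.Linear(16, 8)                 # 8-node fully connected layer
        self.out = nn.Linear(8, out_dim)            # output layer
        self.relu = nn.ReLU()                       # f(x) = max(0, x)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        # s: (batch, 14) = environment state + current state of the ego vehicle
        h = self.relu(self.conv(s.unsqueeze(1))).squeeze(1)
        h = self.relu(self.fc1(h))
        h = self.relu(self.fc2(h))
        return self.out(h)

# Actor outputs the acceleration (1 value); Critic outputs the value V(s).
actor = ActorCriticNet(out_dim=1)
critic = ActorCriticNet(out_dim=1)
```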
[0053] Step 103: Train the parameters of the Actor network and the Critic network in the deep reinforcement learning structure, as shown in Figure 3. The specific steps include:
[0054] (1) The Actor selects an appropriate action $a$ according to the current environment state $s$; after the reward $r$ is obtained by evaluating the reward function, the state transitions from $s$ to $s'$. The tuple $\tau = (s, a, r, s')$ is stored in the experience replay pool $D$. The reward $r$ is determined by the distances between the unmanned vehicle and the front vehicles, $x_{f1} - x$, $x_{f2} - x$, $x_{f3} - x$, the distance between the unmanned vehicle and the rear vehicle, $x - x_r$, and the acceleration $a$ of the unmanned vehicle.
[0055] Since closer vehicles have a greater impact on the driving of the unmanned vehicle, the distance terms are weighted with $w_1 > w_2 > w_3$ while satisfying $w_1 + w_2 + w_3 = 1$; a hedged sketch of such a reward is given below.
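The extract names the inputs to the reward but omits the formula itself. The following Python sketch is therefore an assumption: the weights, the desired gap `D_SAFE`, and the squared-deviation form are illustrative, and only the dependence on the four gaps and on $a$ comes from the text.

```python
import numpy as np

W = np.array([0.6, 0.3, 0.1])  # w1 > w2 > w3, summing to 1 (illustrative values)
D_SAFE = 8.0                   # assumed desired gap in metres, not from the patent

def reward(x, a, x_front, x_rear):
    """Hedged sketch of the reward in step (1)."""
    gaps_front = np.asarray(x_front) - x    # x_f1-x, x_f2-x, x_f3-x (nearest first)
    targets = D_SAFE * np.arange(1, 4)      # assumed desired gap to the i-th front car
    gap_cost = np.sum(W * (gaps_front - targets) ** 2)
    rear_cost = (x - x_rear - D_SAFE) ** 2  # keep a safe gap to the rear vehicle too
    return -(gap_cost + rear_cost + a ** 2) # acceleration term penalizes discomfort
```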
[0056] (2) The unmanned vehicle follows at low speed, repeating step (1), until the designated number of steps $T$ is reached;
[0057] (3) Update the Critic network parameters $\theta_v$;
[0058] (4) Update the Actor network parameters $\theta_\mu$;
[0059] (5) Repeat steps (3) and (4) until the maximum number of iterations is reached or the loss value falls below a given threshold.
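A hypothetical driver loop tying steps (1) to (5) together is sketched below; `env`, the `act` exploration helper, and every hyperparameter value are assumptions, and `update_critic` / `update_actor` are the sketches given after steps (3) and (4) below.

```python
from collections import deque

def train(env, actor, critic, critic_opt, actor_opt, update_critic, update_actor,
          T=200, max_iters=10_000, loss_eps=1e-4):
    """Hedged driver for Step 103; env exposes reset()/step(a) and actor.act(s)
    is an assumed helper that adds exploration noise to the Actor's output."""
    D = deque(maxlen=100_000)           # experience replay pool
    s = env.reset()
    for _ in range(T):                  # steps (1)-(2): interact until T steps
        a = actor.act(s)                # Actor picks action a for state s
        s_next, r = env.step(a)        # environment returns reward r and s'
        D.append((s, a, r, s_next))    # store tau = (s, a, r, s') in D
        s = s_next
    for _ in range(max_iters):          # step (5): repeat (3) and (4)
        loss = update_critic(D, critic, critic_opt)  # step (3)
        update_actor(D, actor, critic, actor_opt)    # step (4)
        if loss < loss_eps:             # stop when the loss is below the threshold
            break
```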
[0060] Specifically, step (3), updating the Critic network parameters $\theta_v$, includes the following steps:
[0061] (3.1) Randomly sample $n$ tuples $\tau_i = (s_i, a_i, r_i, s'_i)$ from the experience replay pool $D$;
[0062] (3.2) For each $\tau_i$, calculate $y_i = r_i + \gamma V(s'_i \mid \theta_v)$;
[0063] (3.3) Update $\theta_v$ by gradient descent on the mean squared error between the targets and the Critic's estimates, $L(\theta_v) = \frac{1}{n} \sum_i \left( y_i - V(s_i \mid \theta_v) \right)^2$.
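A minimal sketch of this Critic update, assuming PyTorch and illustrative values for $\gamma$ and $n$; the MSE loss in (3.3) is reconstructed from the definition of $y_i$, since the extract omits the original formula.

```python
import random
import numpy as np
import torch
import torch.nn.functional as F

GAMMA, N = 0.99, 64  # discount factor and minibatch size n (illustrative)

def update_critic(D, critic, optimizer):
    """Sub-steps (3.1)-(3.3): sample, build TD targets, take one gradient step."""
    batch = random.sample(list(D), N)                      # (3.1) sample tau_i
    s, a, r, s_next = (torch.as_tensor(np.array(col), dtype=torch.float32)
                       for col in zip(*batch))
    with torch.no_grad():
        y = r + GAMMA * critic(s_next).squeeze(-1)         # (3.2) y_i = r_i + gamma*V(s'_i)
    loss = F.mse_loss(critic(s).squeeze(-1), y)            # (3.3) minimize (y_i - V(s_i))^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```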
[0064] Specifically, step (4), updating the Actor network parameters $\theta_\mu$, includes the following steps:
[0065] (4.1) Randomly sample $n$ tuples $\tau_j = (s_j, a_j, r_j, s'_j)$ from the experience replay pool $D$;
[0066] (4.2) For each $\tau_j$, calculate $\delta_j = r_j + \gamma V(s'_j \mid \theta_v) - V(s_j \mid \theta_v)$;
[0067] (4.3) Update $\theta_\mu$ by ascending the policy gradient weighted by the temporal-difference error $\delta_j$, i.e. $\theta_\mu \leftarrow \theta_\mu + \alpha \sum_j \delta_j \nabla_{\theta_\mu} \log \pi(a_j \mid s_j; \theta_\mu)$.
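A matching sketch of the Actor update under the same assumptions; since the patent extract does not state the policy form, a Gaussian policy with the Actor output as its mean is assumed here, and the update in (4.3) is the standard advantage-weighted policy gradient.

```python
import random
import numpy as np
import torch
from torch.distributions import Normal

GAMMA, N, SIGMA = 0.99, 64, 0.5  # discount, minibatch size, assumed policy std

def update_actor(D, actor, critic, optimizer):
    """Sub-steps (4.1)-(4.3): sample, compute TD-error advantages, ascend the gradient."""
    batch = random.sample(list(D), N)                         # (4.1) sample tau_j
    s, a, r, s_next = (torch.as_tensor(np.array(col), dtype=torch.float32)
                       for col in zip(*batch))
    with torch.no_grad():                                     # (4.2) delta_j = r_j + gamma*V(s'_j) - V(s_j)
        delta = r + GAMMA * critic(s_next).squeeze(-1) - critic(s).squeeze(-1)
    log_pi = Normal(actor(s).squeeze(-1), SIGMA).log_prob(a)  # log pi(a_j | s_j; theta_mu)
    loss = -(delta * log_pi).mean()                           # (4.3) ascend delta_j * grad log pi
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```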
[0068] The above are only preferred embodiments of the present invention, and are not intended to limit the present invention in any other form.
[0069] Any modification or equivalent change made according to the technical essence of the present invention still belongs to the scope of protection claimed by the present invention.


Similar technology patents

Traditional Chinese medicine acupuncture and massage device

Inactive | CN114366584A | Improve realism
Owner: 濮阳市第三人民医院(濮阳市口腔医院)

Unity3D visual sensor simulation method suitable for ROS system

Pending | CN114011066A | Improve realism; Solve data interaction problems
Owner: 上海机器人产业技术研究院有限公司 +1

System and method for realizing intelligent class grouping based on shift chain

Pending | CN114266458A | Strong flexibility and versatility
Owner: 成都佳发安泰教育科技股份有限公司

VR intelligent equipment capable of improving comfort

Active | CN113577756A | Improve comfort; Improve realism
Owner: 江苏赋助智能科技有限公司

Classification and recommendation of technical efficacy words

  • Strong flexibility and versatility
  • Improve realism

System and method for realizing intelligent class grouping based on shift chain

Pending | CN114266458A | Strong flexibility and versatility
Owner: 成都佳发安泰教育科技股份有限公司

A space-based observation simulation system and method targeted at satellite targets and fixed star targets

Inactive | CN105466477A | Improve realism
Owner: ACAD OF OPTO ELECTRONICS CHINESE ACAD OF SCI

User interaction method and system based on virtual reality

Inactive | CN106681479A | Improve realism
Owner: FANTEM TECH SHEN ZHEN CO LTD

Recognition emotion-based chatting robot system and judgement method of system

Pending | CN108009490A | Improve intelligence; Improve realism
Owner: 宁波高新区锦众信息科技有限公司

Multi-target signal generation method and RF multi-target signal source

Active | CN105891791A | Improve realism; Meet the needs of radar testing
Owner: BEIJING ZHENXING METROLOGY & TEST INST

3D projection rendering method based on fused electrostatic force and vibro-tactile reproduction device

Active | CN110032281A | Improve realism; Good tactile reproduction
Owner: JILIN UNIV