A method and system for mapless maze navigation based on reinforcement learning of spiking neural networks

By using a reinforcement learning method based on spiking neural networks, combined with visual odometry and path grid maps, the robot's path planning and obstacle avoidance in mazes were optimized, solving the problem of low navigation efficiency in existing technologies and achieving autonomous navigation and low power consumption.

CN116295415B · Active · Publication Date: 2026-06-16 · ZHEJIANG LAB +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
ZHEJIANG LAB
Filing Date
2023-03-02
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing mapless navigation algorithms based on reinforcement learning are inefficient and time-consuming in complex environments, and large-scale deep spiking neural networks lack effective training methods, making them difficult to apply in practical robots.

Method used

A reinforcement learning method based on spiking neural networks is adopted. Visual-inertial odometry from a visible light camera and an IMU provides spatial localization and is used to build a path grid map, and the robot's path planning and obstacle avoidance in the maze are trained with an SNN-Actor network and a CNN Critic network. The STBP algorithm is used to train the network weights.

Benefits of technology

It enables robots to autonomously find paths in maze environments, reduces power consumption, adapts to small mobile robot applications, and improves navigation efficiency and path selection optimization.

Generated by Eureka AI based on patent content.

Abstract figure: CN116295415B_ABST

Abstract

A mapless maze navigation method based on spiking neural network reinforcement learning takes as input path grid map data marked by odometry, together with radar (lidar) information, robot state, and target point information. The path grid map is established in the robot coordinate system and updated according to the path positions recorded by the robot's odometry, and the grid map information serves as the robot's state input. The spike firing rates output directly by the spiking neural network serve as the left and right wheel control signals of a differential-drive mobile robot, completing autonomous navigation in complex environments such as mazes. The present application also includes a mapless maze navigation control system based on spiking neural network reinforcement learning. The present application can complete the vehicle navigation task directly without building a map, and, with the aid of the odometry-marked path grid map, the vehicle can autonomously search for a navigation path in a maze.

Description

Technical Field

[0001] This invention relates to the field of mapless navigation for robots, and specifically to a mapless maze navigation method and system based on reinforcement learning of spiking neural networks.

Background Technology

[0002] The purpose of robot navigation is to enable robots to move from their current location to a target location in the environment, while ensuring the safety of the robot and its surroundings. Currently, a large number of emerging robot navigation-related companies have been established, leading to specific industry applications such as robotic vacuum cleaners, logistics robots, and inspection robots. Simultaneously, robot navigation technologies have been expanded and derived in the fields of unmanned vehicles, environmental surveying, and unmanned aerial vehicles. Therefore, only robots with efficient and reliable navigation technologies can adapt to more complex application scenarios.

[0003] Most mature navigation algorithms are currently map-based, such as SLAM (Simultaneous Localization and Mapping) navigation algorithms. These algorithms pre-build a map model and match the robot's current environmental perception against it to solve the robot's localization problem; path planning and motion control are then used to navigate to the target point. This approach therefore requires a detailed environmental model and real-time localization information to avoid collisions with other objects in the environment, and map-based navigation algorithms are unsuitable for unfamiliar environments. Navigation techniques that do not rely on pre-built maps are called mapless navigation. With the rise of deep learning, learning-based methods have become a popular research direction for mapless navigation, mainly using reinforcement learning and imitation learning to model the robot's navigation process. However, end-to-end navigation control algorithms based on reinforcement learning do not optimize the navigation path or speed control; as a result, the route they select to the target point is not optimal and is time-consuming.

[0004] Spiking neural network (SNN) models are third-generation neural network models closely integrated with neuroscience. They compute with neuron models that best fit the mechanisms of biological neurons and thus more closely resemble the working mechanism of the human brain. With the development of neuromorphic chips, deploying SNN control models on these chips to exploit their computational advantages can significantly reduce the power consumption of robots during neural network inference. However, because the outputs of spiking neurons are discontinuous (binary spikes), there is currently no high-performing, biologically interpretable training method for large-scale deep SNNs.

[0005] Reinforcement learning shows great promise in fields such as games and control, but most mapless navigation algorithms based on reinforcement learning are trained in relatively simple simulation environments, and their deployment on actual robots is likewise tested in simplified indoor scenarios. In complex environments, it is necessary to consider how the neural network control model trained by reinforcement learning represents multimodal data such as visible light images, depth images, and speech.

Summary of the Invention

[0006] This invention aims to overcome the shortcomings of existing technologies and provides a mapless maze navigation method based on spiking neural network reinforcement learning. It introduces a grid representation method for robot paths into the reinforcement learning of spiking neural networks, thereby enabling the robot to autonomously search for paths and navigate to the target point in the maze.

[0007] To achieve the above objectives, the technical solution adopted by the present invention is as follows:

[0008] A mapless maze navigation method based on spiking neural network reinforcement learning, comprising the following steps:

[0009] Step 1: Equip the mobile robot with a visible light camera, an IMU, and a LiDAR. Use visual odometry combining the visible light camera and the IMU as the robot's spatial positioning method. Establish a robot coordinate system based on the odometry information and determine the navigation position point in this robot coordinate system.

[0010] Step 2: Select a square with a side length of about 0.1 m to 1 m as a single grid cell, and establish a path grid map in the robot coordinate system. Update the path-marker grid map according to the path positions in the robot's odometry. The grid map information is used as the robot's state input.

[0011] Step 3: Construct three types of training maps in the simulation platform according to the functions required by the navigation model, and establish the reward function of the mobile robot. Based on the state information obtained in Step 1 and Step 2, train the end-to-end spiking neural network of the mobile robot in the simulation platform using spiking neural network reinforcement learning. According to the speed command output by the spiking neural network, the robot autonomously navigates to the set target point in the maze.

[0012] Furthermore, the grid map described in step two contains two N×N matrices, both initialized to 0. One matrix records the number of times the robot has passed through each grid cell, and the other is the path label matrix of the mobile robot, whose numbers record the sequence of the mobile robot's travel path. The grid map matrices are represented by the following formula:

[0013]

[0014]

[0015] Where G denotes the path grid map, which contains two N×N matrices $M_{count}$ and $M_{route}$: $M_{count}$ records the number of times the robot has passed through each grid cell, and $M_{route}$ is the label matrix of the robot's path. $(i_t, j_t)$ is the element position in the matrix corresponding to the robot's odometry position $(x_t, y_t)$, obtained by dividing by the side length $L$ of the square grid cell and taking the integer part, i.e., $i_t = \mathrm{INT}(x_t / L)$, $j_t = \mathrm{INT}(y_t / L)$;

[0016] $(i_{t-1}, j_{t-1})$ denotes the position of the robot's odometry in the matrix at the previous moment;
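The index mapping and map update described above can be sketched in Python. The update equations themselves are not reproduced in this text, so the update rule below is an assumption: $M_{count}$ is incremented each time the robot enters a new cell, and $M_{route}$ assigns the next sequence number to a cell the first time it is visited. The class name `PathGridMap`, the use of floor for INT(·), and the bounds check are illustrative choices, not the patent's implementation.

```python
import math

import numpy as np


class PathGridMap:
    """Hypothetical sketch of the path grid map G = (M_count, M_route)."""

    def __init__(self, n: int = 10, cell_size: float = 1.0):
        self.n = n
        self.cell_size = cell_size                   # side length L of a square cell
        self.m_count = np.zeros((n, n), dtype=int)   # passes through each cell
        self.m_route = np.zeros((n, n), dtype=int)   # path sequence label, 0 = unvisited
        self.prev = None                             # (i_{t-1}, j_{t-1})
        self.next_label = 1

    def index(self, x: float, y: float) -> tuple:
        # i_t = INT(x_t / L), j_t = INT(y_t / L); INT taken as floor (assumption),
        # which equals truncation for the non-negative coordinates assumed here
        i, j = math.floor(x / self.cell_size), math.floor(y / self.cell_size)
        if not (0 <= i < self.n and 0 <= j < self.n):
            raise ValueError(f"position ({x}, {y}) lies outside the {self.n}x{self.n} grid")
        return i, j

    def update(self, x: float, y: float) -> tuple:
        i, j = self.index(x, y)
        if (i, j) != self.prev:                      # count only entries into a new cell
            self.m_count[i, j] += 1
            if self.m_route[i, j] == 0:              # first visit: assign next sequence number
                self.m_route[i, j] = self.next_label
                self.next_label += 1
            self.prev = (i, j)
        return i, j
```

Under this rule, revisiting a cell raises its count but keeps its original route label, so the labels preserve the order in which the maze was first explored.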

[0017] Based on its current pose, the mobile robot takes as input the information of the grid cells lying within 90 degrees to the left and right of its heading, measured from the grid cell it currently occupies.

[0018] Furthermore, the state information mentioned in step three includes: 18-dimensional radar information sampled in 10-degree steps across the robot's forward direction, 3-dimensional robot pose information, 2-dimensional velocity information, 2-dimensional target position information, and 20-dimensional path grid map data. Each of the two matrices contributes 10 dimensions: 9 dimensions are the grid information at distance L, sampled in 20-degree steps across the robot's forward direction, and 1 dimension is the information of the current grid cell.
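A minimal sketch of how the 20-dimensional grid input and the full 45-dimensional state (18 + 3 + 2 + 2 + 20) might be assembled. Assumptions: the nine sampled directions are heading offsets of -80° to +80° in 20° steps, positions are expressed in the grid frame with non-negative coordinates, cells outside the map are filled with a hypothetical value of -1, and the function names are illustrative.

```python
import math

import numpy as np


def grid_features(m_count, m_route, x, y, theta, cell_size=1.0, outside=-1.0):
    """Hypothetical 20-dim path grid input: per matrix, 9 cells at distance L + current cell."""
    n = m_count.shape[0]

    def lookup(m, px, py):
        i, j = math.floor(px / cell_size), math.floor(py / cell_size)
        if 0 <= i < n and 0 <= j < n:
            return float(m[i, j])
        return outside                                 # assumed fill value off the map

    feats = []
    for m in (m_count, m_route):
        for k in range(9):                             # offsets -80, -60, ..., +80 degrees
            phi = theta + math.radians(-80 + 20 * k)
            feats.append(lookup(m, x + cell_size * math.cos(phi), y + cell_size * math.sin(phi)))
        feats.append(lookup(m, x, y))                  # the cell the robot occupies
    return np.array(feats, dtype=np.float32)


def build_state(lidar, pose, velocity, goal, grid):
    """Concatenate 18 + 3 + 2 + 2 + 20 = 45 state dimensions."""
    parts = [np.asarray(p, dtype=np.float32) for p in (lidar, pose, velocity, goal, grid)]
    expected = (18, 3, 2, 2, 20)
    if tuple(p.size for p in parts) != expected:
        raise ValueError(f"expected part sizes {expected}")
    return np.concatenate(parts)
```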

[0019] Furthermore, in step three, three different types of training maps need to be established on the simulation platform, divided according to the functions required by the spiking neural network control model: a training map mainly for target navigation, a training map mainly for obstacle avoidance, and a training map mainly for maze path search.

[0020] Furthermore, the spiking neural network described in step three employs an SNN Actor network and a CNN Critic network. Based on the observed state, the spiking neural network outputs spike counts for the left and right sides of the mobile robot, which are converted into the robot's linear and angular velocities to steer the robot in the optimal direction. The observed state includes pose information, velocity information, radar information, and path grid information. Based on the spike firing information output by the spiking neural network and the observed state, the Critic network outputs the value of the <state, action> pair, which serves as the basis for the spiking neural network's loss during training.

[0021] Furthermore, the spiking neural network comprises four fully connected layers, with LIF (Leaky Integrate-and-Fire) neurons as the connection module between the fully connected layers. The output layer contains two neurons, and their output firing rates are used to compute the vehicle speed.

[0022] Furthermore, the expression for the reward function $R(s_t, a_t)$ of the mobile robot in step three is as follows:

[0023]

[0024] Where $R_{goal} > 0$ and $R_{obstacle} < 0$ are the rewards given when the mobile robot reaches the target or approaches an obstacle, respectively; $D_t$ is the distance between the robot and the target point, and $T_{goal}$ is the threshold for determining whether the target point has been reached; the robot's distance to the obstacle at time $t$ is compared against $T_{obstacle}$, the threshold at which the robot is about to touch an obstacle; $A_1$ and $A_2$ are the magnitude coefficients of the reward; and $V_t$, which depends on the robot's current direction of travel, relates the grid map value of the next cell on the path to that of the current cell, as given by the following formula:

[0025] $$V_t = \rho\,\big(M_{count}(i_t, j_t) - M_{count}(i_{t+1}, j_{t+1})\big) + (1-\rho)\,\big(M_{route}(i_t, j_t) - M_{route}(i_{t+1}, j_{t+1})\big) \qquad (4)$$

[0026] Where $\rho$ is the relaxation coefficient between the two reward terms, with $\rho \in [0, 1]$.
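As a worked illustration of Eq. (4), the sketch below computes $V_t$ from the two matrices. Which cell counts as "next" is an assumption left to the caller (for example, the cell one grid length ahead along the current heading); $\rho$ defaults to 0.3, the value used in Example 1. The function name is hypothetical.

```python
import numpy as np


def path_grid_term(m_count, m_route, cur, nxt, rho=0.3):
    """V_t = rho*(M_count[cur] - M_count[nxt]) + (1 - rho)*(M_route[cur] - M_route[nxt])."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    (i, j), (i2, j2) = cur, nxt
    return float(rho * (m_count[i, j] - m_count[i2, j2])
                 + (1.0 - rho) * (m_route[i, j] - m_route[i2, j2]))


# Example: current cell visited twice with route label 3, next cell unvisited
m_count = np.zeros((10, 10), dtype=int)
m_route = np.zeros((10, 10), dtype=int)
m_count[0, 0], m_route[0, 0] = 2, 3
v = path_grid_term(m_count, m_route, cur=(0, 0), nxt=(0, 1))
```

Here $V_t = 0.3 \cdot 2 + 0.7 \cdot 3 = 2.7$, so heading into an unvisited cell yields a positive term, while heading into a more frequently visited cell would make the first term negative.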

[0027] Furthermore, the output of the spiking neural network is rate-encoded: the spike firing rate computed from the cumulative spike count serves as the basis for the mobile robot's speed control.
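The rate decoding can be sketched as follows. Example 1 states that the average of the two output firing rates sets the linear velocity and their difference sets the turning; the scale factors `v_max` and `w_max`, the assignment of output 0 to the left side and output 1 to the right, and the counter-clockwise-positive sign convention are assumptions.

```python
import numpy as np


def decode_velocity(spike_counts, num_steps=20, v_max=0.5, w_max=1.0):
    """Map two output spike counts to (linear, angular) velocity commands."""
    counts = np.asarray(spike_counts, dtype=float)
    if counts.shape != (2,):
        raise ValueError("expected spike counts for exactly two output neurons")
    r_left, r_right = counts / num_steps       # firing rates in [0, 1]
    linear = v_max * (r_left + r_right) / 2.0  # average rate -> forward speed
    angular = w_max * (r_right - r_left)       # rate difference -> turning (CCW positive, assumed)
    return float(linear), float(angular)
```

For example, 10 and 20 spikes over 20 steps give rates 0.5 and 1.0, i.e. a linear command of 0.375 and a left turn of 0.5 with the default scales.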

[0028] Furthermore, the spiking neural network uses the STBP algorithm to perform gradient backpropagation to complete the training of network weights.
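STBP handles the non-differentiable spike by keeping the step function in the forward pass and substituting an approximation of its derivative in the backward pass. The sketch shows one such approximation, a rectangular window around the firing threshold, applied to a single layer at a single time step; the window width, the squared-error loss, and the function names are illustrative assumptions, and full STBP additionally propagates gradients backward through time along the membrane-potential recurrence.

```python
import numpy as np

V_TH = 0.5  # firing threshold used in Example 1


def spike(u, v_th=V_TH):
    """Forward pass: Heaviside step, 1.0 where the membrane potential reaches threshold."""
    return (np.asarray(u) >= v_th).astype(float)


def spike_grad(u, v_th=V_TH, width=1.0):
    """Backward pass: rectangular surrogate for d(spike)/du, 1/width inside the window."""
    return (np.abs(np.asarray(u) - v_th) < width / 2).astype(float) / width


def surrogate_weight_grad(w, x, target, v_th=V_TH, width=1.0):
    """dL/dW for one layer and one time step, L = 0.5 * ||spike(W x) - target||^2."""
    u = w @ x                                  # membrane potential (starts from zero, no leak yet)
    err = spike(u, v_th) - target              # dL/d(spike)
    delta = err * spike_grad(u, v_th, width)   # chain rule with the surrogate derivative
    return np.outer(delta, x)                  # dL/dW
```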

[0029] This invention also provides a mapless maze navigation and control system based on spiking neural network reinforcement learning, including a mobile robot, a device equipping the mobile robot with a visible light camera, an IMU, and a lidar, and further including:

[0030] The navigation position point determination module is used to establish a robot coordinate system based on odometry information by using visual odometry combined with visible light and IMU as the robot's spatial localization method, and to determine the navigation position point in this robot coordinate system.

[0031] The path grid map module selects squares with side lengths of approximately 0.1m to 1m as individual grids, establishes a path grid map in the robot coordinate system, updates the grid map of path markers based on the path position in the robot's odometry, and uses the grid map information as robot input.

[0032] The navigation control module is used to train an end-to-end spiking neural network control model for the mobile robot in the simulation platform based on the state information obtained by the navigation position point determination module and the path grid map module. According to the speed command output by the spiking neural network control model, the robot autonomously navigates in the maze to the set target point.

[0033] Compared with the prior art, the positive effects of the present invention are as follows:

[0034] 1) This invention adopts a simple path map representation method. Without pre-establishing a map, it can autonomously find the path to the target point in a maze environment based on historical path records during the robot's gradual exploration.

[0035] 2) This method uses an SNN-Actor based on spiking neural networks as the policy, which can be deployed on neuromorphic chips. Owing to its low power consumption, it is better suited to small mobile robots in practical applications.

Attached Figure Description

[0036] Figure 1 is a schematic diagram of the network training process of the method of the present invention.

[0037] Figure 2 is a diagram illustrating the relationship between the robot path and the matrix $M_{route}$ in the present invention.

[0038] Figures 3a-3c are schematic diagrams of the robot training simulation environments for the method of the present invention: Figure 3a shows training scenario 1, Figure 3b shows training scenario 2, and Figure 3c shows training scenario 3.

[0039] Figures 4a-4b are schematic diagrams of the navigation test of the trained model in a maze using the method of the present invention: Figure 4a shows a maze environment built in the simulation platform, and Figure 4b shows the navigation result of testing the method of this invention in the maze.

[0040] Figure 5 is a system structure diagram of the present invention.

Detailed Implementation

[0041] To further illustrate the technical solution of the present invention in detail, this embodiment is implemented based on the technical solution of the present invention, and detailed implementation methods and specific steps are given in conjunction with the accompanying drawings and examples.

[0042] Example 1

[0043] As shown in Figure 1, which gives the overall flowchart of the method of the present invention, a mapless maze navigation method based on spiking neural network reinforcement learning comprises the following steps:

[0044] Step 1: The mobile robot uses visual odometry combining the visible light camera and the IMU as its spatial localization method; specifically, the VINS-MONO algorithm is used to implement the combined vision and IMU odometry. The pose of the mobile robot and the coordinates of the navigation target point are determined in the odometry coordinate system. The mobile robot is equipped with a single-line lidar to obtain a 180-degree scan in the direction the robot is facing.

[0045] Step 2: In this training example, a square with side length L = 1 m is selected as a single map grid cell. The position of the robot's current coordinates in the grid map is determined using the following formula:

[0046] $$i_t = \mathrm{INT}(x_t / L), \quad j_t = \mathrm{INT}(y_t / L)$$

[0047] Where $(i_t, j_t)$ is the element position in the grid matrix and $(x_t, y_t)$ is the robot's odometry position.

[0048] The size of the grid map matrix is determined by the actual operating scenario of the mobile robot; in this example, a 10×10 matrix is selected to represent the grid map. Based on the currently acquired element position and the stored historical data, the path grid map is updated according to the following formula:

[0049]

[0050]

[0051] Where G denotes the path grid map, which contains two N×N matrices $M_{count}$ and $M_{route}$: $M_{count}$ records the number of times the robot has passed through each grid cell, and $M_{route}$ is the label matrix of the robot's path; $(i_{t-1}, j_{t-1})$ denotes the position of the robot's odometry in the matrix at the previous moment;

[0052] Figure 2 shows the robot's positions after conversion into matrix element positions, together with the final state of the $M_{route}$ matrix once the robot has navigated to the target point. The values in $M_{route}$ are the path sequence numbers assigned while the robot explores the maze; this numbering indicates which maze areas have been explored and guides the robot back to branching nodes in the maze so that it can explore other unknown areas. The $M_{count}$ matrix is similar to $M_{route}$, but it stores the number of times the robot has passed through each grid cell; it identifies areas that have been explored repeatedly and is mainly used to keep the robot from navigating into those areas again;

[0053] Step 3: This example uses the Deep Deterministic Policy Gradient (DDPG) algorithm to construct the SNN-Actor policy network, based on the spiking neural network, and the Critic network, based on a CNN. The spiking neural network consists of four fully connected layers, with LIF neurons as the connection module between the fully connected layers (the firing threshold of the LIF neurons is set to 0.5, and the delay parameter is set to 0.8). Each hidden layer has 256 neurons. The input state consists of 18-dimensional radar information sampled in 10-degree steps across the robot's forward direction, 3-dimensional robot pose information, 2-dimensional velocity information, 2-dimensional target position information, and 20-dimensional path grid data. The input state is normalized and fed into the spiking neural network using Poisson encoding over 20 time steps. The output layer contains two neurons, and the firing rate of each output neuron is obtained by dividing its spike count by the number of time steps. These two firing rates are used to control the left and right wheel velocities of the mobile robot: their average controls the robot's linear velocity, and their difference controls the robot's turning. The Critic network uses four fully connected layers with ReLU activation. Its input comprises the robot's input state without Poisson encoding and the firing rates output by the spiking neural network, and its output is used as the Q-value for training the spiking neural network.
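A NumPy sketch of the SNN-Actor forward pass with the Example 1 settings: four fully connected layers (45-256-256-256-2), LIF neurons with threshold 0.5, Poisson encoding over 20 time steps, and output rates equal to spike count divided by the number of steps. The LIF update (leak, reset after a spike, then integrate) follows a common STBP-style formulation and reads the "delay parameter" of 0.8 as the membrane decay factor; that reading, the random weight initialization, and the class name are assumptions. Training is not shown.

```python
import numpy as np


class SNNActor:
    """Hypothetical NumPy sketch of a 4-layer LIF SNN actor (forward pass only)."""

    def __init__(self, sizes=(45, 256, 256, 256, 2), v_th=0.5, decay=0.8, seed=0):
        self.rng = np.random.default_rng(seed)
        # four fully connected weight matrices with random init (assumption)
        self.weights = [self.rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
                        for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        self.v_th = v_th      # firing threshold (0.5 in Example 1)
        self.decay = decay    # membrane decay factor; the 0.8 "delay parameter" read as decay

    def forward(self, state, num_steps=20):
        """Return the firing rate of each output neuron over num_steps time steps."""
        p = np.clip(np.asarray(state, dtype=float), 0.0, 1.0)   # normalized state
        mem = [np.zeros(w.shape[0]) for w in self.weights]
        spk = [np.zeros(w.shape[0]) for w in self.weights]
        counts = np.zeros(self.weights[-1].shape[0])
        for _ in range(num_steps):
            x = (self.rng.random(p.shape) < p).astype(float)    # Poisson (Bernoulli) input spikes
            for k, w in enumerate(self.weights):
                # LIF: leak the previous potential, reset it if the neuron spiked, integrate
                mem[k] = self.decay * mem[k] * (1.0 - spk[k]) + w @ x
                spk[k] = (mem[k] >= self.v_th).astype(float)
                x = spk[k]
            counts += x                                          # output-layer spikes
        return counts / num_steps
```

The returned two-element rate vector is what the decoding step turns into wheel or velocity commands.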

[0054] The expression for the reward function $R(s_t, a_t)$ used in training is:

[0055]

[0056] Where $R_{goal} > 0$ and $R_{obstacle} < 0$ are the rewards given when the mobile robot reaches the target or approaches an obstacle, respectively; $D_t$ is the distance between the robot and the target point, and $T_{goal}$ is the threshold for determining whether the target point has been reached; the robot's distance to the obstacle at time $t$ is compared against $T_{obstacle}$, the threshold at which the robot is about to touch an obstacle; $A_1$ and $A_2$ are the magnitude coefficients of the reward, with $A_1 = 1.0$ and $A_2 = 0.5$ in this example; and $V_t$, which depends on the robot's current direction of travel, relates the grid map value of the next cell on the path to that of the current cell, as given by the following formula:

[0057] $$V_t = \rho\,\big(M_{count}(i_t, j_t) - M_{count}(i_{t+1}, j_{t+1})\big) + (1-\rho)\,\big(M_{route}(i_t, j_t) - M_{route}(i_{t+1}, j_{t+1})\big) \qquad (4)$$

[0058] Where $\rho$ is the relaxation coefficient between the two reward terms, with $\rho \in [0, 1]$; in this example, $\rho = 0.3$.

[0059] In this example, the training scenarios shown in Figure 3 were created in the robot simulation platform Gazebo. Training scenario 1 (Figure 3a) focuses on navigation and equips the trained model with the ability to navigate to the target. Once the navigation function is established, training scenario 2 (Figure 3b) is used to train the model's obstacle avoidance function, and finally training scenario 3 (Figure 3c), a simple maze, is used to train a navigation model with path search. The model was trained for 100, 300, and 500 epochs in the three scenarios, respectively. During training, the initial position and the target point were generated by a random function.

[0060] The reinforcement learning algorithm is based on DDPG. In the simulation environment, the robot's motion is simulated according to the output of the spiking neural network. The input state, the output of the spiking neural network, the robot's resulting input state, and the reward value in the simulation environment are saved into the experience pool. During continuous simulation interaction, once the experience pool holds 100,000 entries, experience replay begins and the weights are trained and updated. The spiking neural network uses the STBP (Spatio-Temporal Backpropagation) algorithm, which replaces the non-differentiable spike output with an approximate differentiable function, and the network parameters are optimized using stochastic gradient descent.
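The experience-pool logic can be sketched as a ring buffer that only permits training once it holds 100,000 transitions, together with the standard DDPG critic target $y = r + \gamma (1 - d)\,Q'(s', \mu'(s'))$. Capacity, batch size, $\gamma$, and the terminal flag `done` are assumptions not given in the source; the target actor and critic are passed in as plain callables, and the names are hypothetical.

```python
import numpy as np


class ReplayBuffer:
    """Hypothetical ring-buffer experience pool with a warm-up threshold."""

    def __init__(self, capacity=1_000_000, warmup=100_000, seed=0):
        self.capacity, self.warmup = capacity, warmup
        self.data, self.pos = [], 0
        self.rng = np.random.default_rng(seed)

    def push(self, state, action, reward, next_state, done):
        item = (state, action, reward, next_state, done)
        if len(self.data) < self.capacity:
            self.data.append(item)
        else:
            self.data[self.pos] = item          # overwrite the oldest transition
        self.pos = (self.pos + 1) % self.capacity

    def ready(self):
        return len(self.data) >= self.warmup    # replay starts only after warm-up

    def sample(self, batch_size=64):
        idx = self.rng.integers(0, len(self.data), size=batch_size)
        s, a, r, s2, d = zip(*(self.data[i] for i in idx))
        return (np.array(s), np.array(a), np.array(r, dtype=float),
                np.array(s2), np.array(d, dtype=float))


def ddpg_targets(r, s2, d, target_actor, target_critic, gamma=0.99):
    """Critic regression target y = r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    return r + gamma * (1.0 - d) * target_critic(s2, target_actor(s2))
```

In this sketch the actor passed as `target_actor` would be the SNN (returning firing rates), and the critic the fully connected Q-network described in Example 1.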

[0061] To verify the effectiveness of the method, after model training was completed, a simulation experiment was conducted in the maze built on the Gazebo platform shown in Figure 4a. As shown in Figure 4b, the experimental results and navigation paths demonstrate that, across multiple trials, the mobile robot was able to explore the maze step by step and eventually navigate autonomously to the target point.

[0062] Example 2

[0063] Referring to Figure 5, the present invention also provides a mapless maze navigation control system based on spiking neural network reinforcement learning to implement the mapless maze navigation method based on spiking neural network reinforcement learning in Embodiment 1. The system includes a mobile robot, a device equipping the mobile robot with a visible light camera, an IMU, and a lidar, and further includes:

[0064] The navigation position point determination module is used to establish a robot coordinate system based on odometry information by using visual odometry combined with visible light and IMU as the robot's spatial localization method, and to determine the navigation position point in this robot coordinate system.

[0065] The path grid map module selects squares with side lengths of approximately 0.1m to 1m as individual grids, establishes a path grid map in the robot coordinate system, updates the grid map of path markers based on the path position in the robot's odometry, and uses the grid map information as robot input.

[0066] The navigation control module is used to train an end-to-end spiking neural network control model for the mobile robot in the simulation platform based on the state information obtained by the navigation position point determination module and the path grid map module. According to the speed command output by the spiking neural network control model, the robot autonomously navigates in the maze to the set target point.

[0067] Example 3

[0068] The present invention also provides a computing device, including a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, it implements a mapless maze navigation method based on spiking neural network reinforcement learning according to Embodiment 1.

[0069] At the hardware level, the computing device includes a processor, an internal bus, a network interface, memory, and non-volatile memory, and may also include other hardware required for business operations. The processor reads the corresponding computer program from the non-volatile memory into memory and then executes it to implement the method described above with reference to Figure 1. Of course, in addition to software implementation, this invention does not exclude other implementations, such as logic devices or a combination of hardware and software. That is to say, the execution subject of the following processing flow is not limited to logic units, but can also be hardware or logic devices.

[0070] Improvements in a technology can be clearly distinguished as either hardware improvements (e.g., improvements to the circuit structure of diodes, transistors, switches, etc.) or software improvements (improvements to the methodology). However, with technological advancements, many improvements to the methodology can now be considered direct improvements to the hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming the improved methodology into the hardware circuit. Therefore, it cannot be said that an improvement in methodology cannot be implemented using hardware physical modules. For example, a Programmable Logic Device (PLD) (such as a Field Programmable Gate Array (FPGA)) is such an integrated circuit whose logic function is determined by the user programming the device. Designers can program and "integrate" a digital system onto a PLD themselves, without needing chip manufacturers to design and manufacture dedicated integrated circuit chips. Furthermore, nowadays, instead of manually manufacturing integrated circuit chips, this programming is mostly implemented using "logic compiler" software. Similar to the software compiler used in program development, the original code before compilation must be written in a specific programming language, called a Hardware Description Language (HDL). There are many HDLs, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language). Currently, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used. Those skilled in the art should understand that by simply performing some logic programming on the method flow using one of these hardware description languages and programming it into an integrated circuit, the hardware circuit implementing the logical method flow can be easily obtained.

[0071] The controller can be implemented in any suitable manner. For example, it can take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, and embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory. Those skilled in the art will also recognize that, in addition to implementing the controller in purely computer-readable program code form, the same functionality can be achieved by logically programming the method steps to make the controller take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. Therefore, such a controller can be considered a hardware component, and the means included therein for implementing various functions can also be considered as structures within the hardware component. Alternatively, the means for implementing various functions can be considered as both software modules implementing the method and structures within the hardware component.

[0072] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, a computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.

[0073] For ease of description, the above apparatus is described by dividing it into various functional units. Of course, in implementing this invention, the functions of each unit can be implemented in one or more software and / or hardware components.

[0074] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0075] This invention is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and / or block of the flowchart illustrations and / or block diagrams, and combinations of flows and / or blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more flows of the flowcharts and / or one or more blocks of the block diagrams.

[0076] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and / or one or more blocks of the block diagrams.

[0077] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

[0078] In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

[0079] Memory may include non-persistent memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.

[0080] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0081] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0083] This invention can be described in the general context of computer-executable instructions, such as program modules, that are executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform a specific task or implement a specific abstract data type. This invention can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.

[0084] The various embodiments in this invention are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are essentially similar to the method embodiments and are therefore described more briefly; for the relevant details, refer to the description of the method embodiments.

[0085] The above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit them. Those skilled in the art can modify or make equivalent substitutions to the technical solutions of the present invention. The scope of protection of the present invention is defined by the claims.

Claims

1. A mapless maze navigation method based on spiking neural network reinforcement learning, comprising the following steps:
Step 1: Equip the mobile robot with a visible light camera, an IMU, and a LiDAR. Use visual odometry combining the visible light camera and the IMU as the robot's spatial positioning method, establish a robot coordinate system based on the odometry information, and determine the navigation position point in this robot coordinate system.
Step 2: Select a square with a side length of 0.1 m to 1 m as a single grid cell and establish a path grid map in the robot coordinate system. Update the path-marked grid map according to the path position in the robot odometry, and use the grid map information as the robot's state input.
Step 3: Construct three types of training maps in the simulation platform according to the functions required by the navigation model, and establish the reward function of the mobile robot. Based on the state information obtained in Step 1 and Step 2, train the end-to-end spiking neural network of the mobile robot in the simulation platform using spiking neural network reinforcement learning. According to the speed command output by the spiking neural network, the robot autonomously navigates to the set target point in the maze.
The grid map described in Step 2 is expressed by formulas (1) and (2), where G denotes the path grid map, which consists of two N×N matrices initialized to 0: one records the number of times the robot passes through each grid cell, and the other is the path label matrix of the mobile robot, whose cell values represent the sequence of the mobile robot's travel path. The matrix index corresponding to the robot's odometer position is obtained by dividing the position by the side length L of the square grid cell and taking the integer part; the index computed in the same way at the previous moment gives the position of the robot's odometry in the matrix at the previous moment. Based on its current pose, the mobile robot takes as input the information of the grid cells spanning 90 degrees to the left and right of its heading around the grid cell where it is currently located.
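To make the two-matrix bookkeeping of Step 2 concrete, here is a minimal Python sketch. The class name `PathGridMap`, the centre offset that keeps negative coordinates in range, the use of `math.floor` for "taking the integer part", and the rule that a revisit overwrites a cell's path label are all assumptions for illustration, not details stated in the claims.

```python
import math
import numpy as np


class PathGridMap:
    """Illustrative path grid map G made of two N x N matrices (both start at 0).

    visits[i, j]: number of times the robot has entered cell (i, j).
    labels[i, j]: path sequence number assigned when the robot last entered (i, j).
    """

    def __init__(self, n, cell_size):
        self.n = n
        self.cell_size = cell_size          # side length L, 0.1 m to 1 m per the claims
        self.visits = np.zeros((n, n), dtype=np.int64)
        self.labels = np.zeros((n, n), dtype=np.int64)
        self.prev_cell = None               # matrix position at the previous moment
        self.step = 0                       # running path sequence number

    def cell_index(self, x, y):
        # Divide the odometry position by L and take the integer part. math.floor
        # and the centre offset are assumptions so negative coordinates stay valid.
        offset = self.n // 2
        i = math.floor(x / self.cell_size) + offset
        j = math.floor(y / self.cell_size) + offset
        if not (0 <= i < self.n and 0 <= j < self.n):
            raise ValueError("odometry position lies outside the path grid map")
        return i, j

    def update(self, x, y):
        """Record an odometry position; only entering a new cell changes G."""
        cell = self.cell_index(x, y)
        if cell != self.prev_cell:
            self.step += 1
            self.visits[cell] += 1
            self.labels[cell] = self.step   # assumption: a revisit overwrites the label
            self.prev_cell = cell
        return cell
```

For example, with `n=10` and `cell_size=0.5`, feeding the positions (0.1, 0.1), (0.3, 0.2), (0.6, 0.2), (0.1, 0.1) leaves cell (5, 5) with two visits and label 3, because the second position stays in the same cell and is ignored. Comparing against the previous cell rather than the previous position is what prevents a slowly moving robot from inflating the visit count.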

2. The method as described in claim 1, wherein the state information in step three includes: 18-dimensional radar information sampled at 10-degree intervals across the robot's forward direction, 3-dimensional robot pose information, 2-dimensional velocity information, 2-dimensional target position information, and 20-dimensional path grid map data. The path grid map data take 10 dimensions from each of the two matrices: 9 dimensions represent the grid cells at distance L ahead of the robot, sampled at 20-degree intervals, and 1 dimension represents the current grid cell.
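The state of claim 2 adds up to 18 + 3 + 2 + 2 + 20 = 45 values. The sketch below assembles such a vector. The function names, the centre-offset indexing, and the choice of nine headings from -80 to +80 degrees (20 degrees apart, inside the ±90 degree window of claim 1) are hypothetical choices, not taken from the patent.

```python
import math
import numpy as np

# Assumption: 9 forward samples spaced 20 degrees apart, symmetric about the heading.
ANGLE_OFFSETS_DEG = [-80 + 20 * k for k in range(9)]


def grid_features(matrix, x, y, theta, cell_size, offset):
    """Return 10 values from one N x N path grid matrix.

    The first 9 are the cells at distance L (= cell_size) from the robot along
    each sampled direction; the last is the robot's current cell.
    """
    def index(px, py):
        # Integer part of position / L, shifted so the map centre is the origin.
        return (math.floor(px / cell_size) + offset,
                math.floor(py / cell_size) + offset)

    values = []
    for deg in ANGLE_OFFSETS_DEG:
        phi = theta + math.radians(deg)
        values.append(float(matrix[index(x + cell_size * math.cos(phi),
                                         y + cell_size * math.sin(phi))]))
    values.append(float(matrix[index(x, y)]))
    return values


def build_state(lidar, pose, velocity, goal, visits, labels, cell_size):
    """Concatenate the claim-2 state: 18 + 3 + 2 + 2 + 20 = 45 values."""
    x, y, theta = pose
    offset = visits.shape[0] // 2
    path = (grid_features(visits, x, y, theta, cell_size, offset)
            + grid_features(labels, x, y, theta, cell_size, offset))
    state = np.concatenate([
        np.asarray(lidar, dtype=float),     # 18 range readings, 10-degree step
        np.asarray(pose, dtype=float),      # x, y, heading
        np.asarray(velocity, dtype=float),  # linear, angular
        np.asarray(goal, dtype=float),      # target position (2 values)
        np.asarray(path, dtype=float),      # 10 + 10 path grid values
    ])
    if state.shape != (45,):
        raise ValueError(f"expected 45 state values, got {state.shape[0]}")
    return state
```

Because the sampling distance equals the cell size L, directions close to the heading land in the adjacent cell while wider angles may fall back into the current cell; a real implementation might choose the sampling radius differently.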

3. The method as described in claim 1, characterized in that, in step three, three different types of training maps are established in the simulation platform according to the functions required by the spiking neural network control model: training maps mainly for target navigation, training maps mainly for obstacle avoidance, and training maps mainly for maze path search.

4. The method as described in claim 1, characterized in that the spiking neural network described in step three employs an Actor network (SNN) and a Critic network (CNN). Based on the observed state, which includes pose information, velocity information, radar information, and path grid information, the spiking neural network outputs the number of spikes fired for the left and right wheels of the mobile robot; these counts are converted into the robot's linear and angular velocities to drive the robot in the optimal direction. The Critic network outputs the value of the <state, action> pair from the spike firing information output by the spiking neural network and the observation of the current state, and this value serves as the loss criterion for training the spiking neural network.
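Claim 4 states only that the Critic's <state, action> value serves as the loss criterion for the SNN Actor. One common way to realize this is a DDPG-style deterministic actor-critic update; the formulation below is an assumption for illustration, not the algorithm named in the patent. Here $\pi_\theta$ is the SNN Actor mapping the observed state $s_t$ to the left and right wheel firing rates $a_t$, $Q_\phi$ is the CNN Critic, $r_t$ is the reward, $\gamma$ is a discount factor, and primed parameters denote slowly updated target networks; all of these symbols are introduced here.

```latex
\begin{aligned}
\mathcal{L}_{\text{critic}}(\phi) &= \Big(Q_\phi(s_t, a_t) - \big[r_t + \gamma\, Q_{\phi'}\big(s_{t+1}, \pi_{\theta'}(s_{t+1})\big)\big]\Big)^2,\\
\mathcal{L}_{\text{actor}}(\theta) &= -\,Q_\phi\big(s_t, \pi_\theta(s_t)\big).
\end{aligned}
```

Minimizing the actor loss pushes the SNN toward actions the Critic rates highly. The gradient reaches the SNN weights through $a_t$, which is presumably where a spike-compatible training rule such as the STBP algorithm of claim 8 comes in.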

5. The method as described in claim 4, characterized in that the spiking neural network consists of four fully connected layers, with the LIF neuron model used as the neuron module connecting the fully connected layers. The output layer contains two neurons, whose output spike rates are used as the basis for calculating the speed of the vehicle.
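A forward pass of the network in claim 5 can be sketched in NumPy as follows. The 45-dimensional input follows the state in claim 2; the hidden widths (256), the decay constant `tau`, the threshold `v_th`, the hard reset after a spike, the direct (analog) input encoding, and the class name `LIFSpikingActor` are assumptions.

```python
import numpy as np


class LIFSpikingActor:
    """Hypothetical 4-layer fully connected SNN with LIF neurons (sizes assumed)."""

    def __init__(self, sizes=(45, 256, 256, 256, 2), tau=0.5, v_th=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Four weight matrices = four fully connected layers; shape (out, in).
        self.weights = [rng.normal(0.0, 1.0 / np.sqrt(m), size=(n, m))
                        for m, n in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.zeros(n) for n in sizes[1:]]
        self.tau = tau
        self.v_th = v_th

    def forward(self, state, timesteps=5):
        """Present `state` for `timesteps` steps and return per-output spike counts."""
        u = [np.zeros(b.shape) for b in self.biases]  # membrane potentials
        o = [np.zeros(b.shape) for b in self.biases]  # spikes from the previous step
        counts = np.zeros(self.biases[-1].shape)
        for _ in range(timesteps):
            x = np.asarray(state, dtype=float)        # same analog input every step
            for k, (w, b) in enumerate(zip(self.weights, self.biases)):
                # LIF update: leak by tau, reset where the neuron just spiked,
                # then integrate the synaptic input from the layer below.
                u[k] = self.tau * u[k] * (1.0 - o[k]) + w @ x + b
                o[k] = (u[k] >= self.v_th).astype(float)
                x = o[k]                              # spikes feed the next layer
            counts += o[-1]
        return counts                                 # spike counts of the 2 outputs
```

Usage: `LIFSpikingActor().forward(state_vector, timesteps=5)` returns two counts between 0 and 5, one per wheel; dividing by the number of time steps gives the firing rates used for speed control.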

6. The method as described in claim 4, characterized in that the reward function of the mobile robot in step three is expressed by formula (3), which specifies the rewards when the mobile robot approaches the target or an obstacle. Its terms are the distance between the robot and the target point; the threshold for determining whether the target point has been reached; the distance between the robot and an obstacle; the threshold at which the robot is deemed to touch an obstacle; and the magnitude coefficient of the reward. A further reward term, determined by the relationship between the path grid map value of the next cell and that of the current cell in the robot's current direction of travel, is given by formula (4), in which a relaxation coefficient balances the two reward terms and takes values within a specified range.
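Formulas (3) and (4) are not legible in this text, so the function below is only a hypothetical reward built from the same ingredients: a terminal reward of magnitude `c` on reaching the target, a penalty of the same magnitude on touching an obstacle, and otherwise a mix of goal progress and a path grid term weighted by `lam` in the role of the relaxation coefficient. The thresholds, default values, the progress term, and the sign-based grid comparison are all assumptions, not the patent's formulas.

```python
def hypothetical_reward(d_goal, prev_d_goal, d_obstacle, grid_next, grid_current,
                        d_goal_th=0.3, d_obs_th=0.2, c=100.0, lam=0.5):
    """Illustrative reward; NOT the patent's formulas (3) and (4)."""
    if d_goal < d_goal_th:           # target reached
        return c
    if d_obstacle < d_obs_th:        # obstacle touched
        return -c
    progress = prev_d_goal - d_goal  # > 0 when the robot moved toward the target
    # Grid term: favour moving into a cell whose path grid value (e.g. visit count)
    # is lower than the current cell's, discouraging loops in the maze.
    if grid_next < grid_current:
        grid_term = 1.0
    elif grid_next > grid_current:
        grid_term = -1.0
    else:
        grid_term = 0.0
    # lam balances the two shaping terms, like the relaxation coefficient.
    return lam * progress + (1.0 - lam) * grid_term
```

For instance, moving 0.5 m closer to the goal into a less-visited cell with `lam=0.5` yields 0.5 × 0.5 + 0.5 × 1 = 0.75.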

7. The method as described in claim 4, characterized in that the output of the spiking neural network is encoded as a firing rate computed from the cumulative spike count, which serves as the basis for controlling the speed of the mobile robot.
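The conversion from spike counts to motion in claims 4 and 7 can be sketched with standard differential-drive kinematics: v = (v_r + v_l) / 2 and ω = (v_r − v_l) / b for wheel separation b. The linear mapping from firing rate to wheel speed, the maximum wheel speed `v_max`, and the wheel separation `wheel_base` are assumed parameters, not values from the patent.

```python
def spikes_to_velocity(left_count, right_count, timesteps, v_max=0.5, wheel_base=0.3):
    """Map left/right output spike counts to (linear, angular) velocity.

    Assumptions: firing rate = count / timesteps in [0, 1], wheel speed scales
    linearly with rate up to v_max (m/s), wheel_base is in metres.
    """
    rate_left = left_count / timesteps
    rate_right = right_count / timesteps
    v_left = rate_left * v_max
    v_right = rate_right * v_max
    linear = (v_left + v_right) / 2.0          # forward speed of the chassis
    angular = (v_right - v_left) / wheel_base  # positive = turn left (counter-clockwise)
    return linear, angular
```

Equal counts drive the robot straight; for example, 5 spikes on each wheel over 5 steps gives (0.5, 0.0). A purely positive rate mapping cannot command reverse motion, which is one reason an implementation might add an offset or allow negative wheel speeds.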

8. The method as described in claim 4, characterized in that, The aforementioned spiking neural network uses the STBP algorithm for gradient backpropagation to train the network weights.
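STBP (spatio-temporal backpropagation) trains spiking networks by propagating errors both across layers (space) and across time steps, replacing the non-differentiable spike function with a surrogate derivative. The sketch below is a simplified single-layer illustration using a rectangular surrogate, one of the approximations proposed with STBP. The loss on the mean firing rate, `tau`, `v_th`, the surrogate width `a`, and the function names are assumptions; a practical implementation would normally rely on an autodiff framework instead of hand-written gradients.

```python
import numpy as np


def rect_surrogate(u, v_th=0.5, a=1.0):
    """Rectangular approximation of dH/du (H = spike step function)."""
    return (np.abs(u - v_th) < a / 2).astype(float) / a


def lif_layer_stbp(w, x_seq, target_rate, tau=0.5, v_th=0.5, a=1.0):
    """Run one LIF layer over T steps, then backpropagate through space and time.

    Dynamics: u[t] = tau * u[t-1] * (1 - o[t-1]) + w @ x[t],  o[t] = H(u[t] - v_th).
    Loss (assumed): 0.5 * ||mean_t o[t] - target_rate||^2.  Returns (loss, dL/dw).
    """
    T, n = len(x_seq), w.shape[0]
    us, spikes = [], []
    u, o = np.zeros(n), np.zeros(n)
    for x in x_seq:                       # forward pass, storing the history
        u = tau * u * (1.0 - o) + w @ x
        o = (u >= v_th).astype(float)
        us.append(u)
        spikes.append(o)
    err = np.mean(spikes, axis=0) - target_rate
    loss = 0.5 * float(err @ err)

    grad_w = np.zeros_like(w)
    du_next = np.zeros(n)                 # dL/du[t+1]; zero after the last step
    for t in reversed(range(T)):
        # dL/do[t]: direct (spatial) term plus the temporal path through the
        # reset factor (1 - o[t]) in u[t+1], whose derivative is -tau * u[t].
        do = err / T - du_next * tau * us[t]
        # dL/du[t]: surrogate-gated spike path plus the leak path into u[t+1].
        du = do * rect_surrogate(us[t], v_th, a) + du_next * tau * (1.0 - spikes[t])
        grad_w += np.outer(du, x_seq[t])
        du_next = du
    return loss, grad_w
```

Stacking this rule layer by layer, with each layer's spike train as the next layer's input, gives the full spatio-temporal gradient for a network like the one in claim 5.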

9. A mapless maze navigation and control system based on spiking neural network reinforcement learning, comprising a mobile robot and a device equipping the mobile robot with a visible light camera, an IMU, and a LiDAR, characterized in that the system includes:
The navigation position point determination module is used to establish a robot coordinate system based on odometry information, using visual odometry combining the visible light camera and the IMU as the robot's spatial localization method, and to determine the navigation position point in this robot coordinate system.
The path grid map module selects squares with side lengths of 0.1 m to 1 m as individual grid cells to establish a path grid map in the robot coordinate system, and updates the path-marked grid map based on the path position recorded by the robot's odometry; this grid map information serves as input to the robot. The grid map is expressed by formulas (1) and (2), where G denotes the path grid map, which consists of two N×N matrices initialized to 0: one records the number of times the robot passes through each grid cell, and the other is the path label matrix of the mobile robot, whose cell values represent the sequence of the mobile robot's travel path. The matrix index corresponding to the robot's odometer position is obtained by dividing the position by the side length L of the square grid cell and taking the integer part; the index computed in the same way at the previous moment gives the position of the robot's odometry in the matrix at the previous moment. Based on its current pose, the mobile robot takes as input the information of the grid cells spanning 90 degrees to the left and right of its heading around the grid cell where it is currently located.
The navigation control module is used to train an end-to-end spiking neural network control model for the mobile robot in the simulation platform based on the state information obtained by the navigation position point determination module and the path grid map module. According to the speed command output by the spiking neural network control model, the robot autonomously navigates in the maze to the set target point.