Method for recognizing behavior based on spiking neural network
A behavior recognition method based on spiking neural networks addresses the heavy computational load and high energy consumption of existing approaches on edge devices, enabling efficient recognition of human behavior with lower inference latency and better energy efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 贵州省通信产业服务有限公司
- Filing Date
- 2026-02-05
- Publication Date
- 2026-06-19
AI Technical Summary
Existing behavior recognition methods based on artificial neural networks are computationally intensive and energy-intensive on edge devices, making them difficult to deploy effectively. They also require a large amount of training data, leading to increased latency and communication costs.
A behavior recognition method based on spike neural networks is adopted. The radar data is compressed and encoded for preprocessing, converted to the time-frequency domain using short-time Fourier transform, and a convolutional spike neural network architecture is constructed. The spatial and temporal features of the action are extracted by using lateral competitive inhibition mechanism and long-term inhibition mechanism, and finally recognized by a classifier.
It enables efficient recognition of human behavior on edge devices, reduces inference latency, improves computational and energy efficiency, and is suitable for neuromorphic deployment on edge devices.
[Figure: abstract drawing, CN122241341A_ABST]
Abstract
Description
Technical Field
[0001] This invention relates to the field of behavior recognition technology, and in particular to a behavior recognition method based on spiking neural networks.
Background Technology
[0002] Currently, classic machine learning techniques, including artificial neural networks (ANNs) and deep learning models, have been used to recognize behavior from visual sensor data. However, besides privacy concerns, existing methods have another significant drawback: they are not tailored to end-to-end behavior recognition on edge devices. In various industrial sectors such as the Internet of Things (IoT), robotics, healthcare, and retail, there are numerous low-power devices at the network edge, and one approach is to use the spare computing cycles available on these devices. Compared with mainstream approaches, the data then does not need to be sent upstream through the network to the computing infrastructure, which reduces latency and communication costs.
[0003] However, the existing methods mentioned above require a large amount of data for training and are computationally and memory intensive, making them too heavy for edge devices. Pre-trained compressed models can be deployed on constrained devices, but this avoids neither the cost of training nor the need for a large amount of training data, and compression often sacrifices accuracy.
[0004] Over the past two decades, radar-based human perception technology has become a research hotspot. Researchers are constantly exploring and designing more covert radar systems to achieve non-contact physiological detection and gesture or activity recognition. Due to the non-invasive nature of radar sensors, they are now being used more and more widely in the field of action recognition.
[0005] Researchers have conducted in-depth studies on the micro-Doppler effect, which is applicable not only to rigid bodies such as pendulum motion and rotor rotation, but also to capturing the unique characteristics of non-rigid body motions such as walking humans, flapping birds, and quadrupeds. Related research covers simulation analysis and theoretical modeling, promoting the application of micro-Doppler technology in human motion and fall detection.
[0006] For example, some studies have used dual-pulse Doppler radar (RCR) to collect data and successfully detected falls in the elderly through deep learning networks. In addition, some studies have tried to use radar combined with deep convolutional autoencoders to identify independent and assisted activities. Meanwhile, Google's "Project Soli" uses micro-radar to identify various finger movements, demonstrating the application potential in the field of gesture recognition.
[0007] Human movement typically exhibits highly complex patterns. For example, during walking, small movements like arm swings (micro-motion) trigger additional frequency shifts. When electromagnetic waves interact with the moving human body, the reflected signal not only contains the Doppler frequency shift caused by overall movement but also displays micro-Doppler characteristics induced by subtle limb movements. In the time-frequency domain, the positive and negative Doppler frequencies exhibited by different actions and body parts collectively constitute the unique micro-Doppler signature of the action. This time-frequency characteristic can be used to distinguish different types of actions or behaviors.
[0008] In recent years, research has also combined deep learning with radar micro-Doppler features to develop efficient recognition methods. For example, short-time Fourier transform (STFT) is used to extract time-frequency maps, which are then input into dual-stream 1D-CNN and BiGRU models for recognition.
[0009] In addition, some studies have improved radar signal quality and classification performance through noise removal (based on the minimum entropy criterion) and a cross-residual CNN model (CRCNN). Other methods have proposed time-frequency representations with adaptive resolution to improve the quality of the micro-Doppler spectrogram, thereby improving the accuracy of deep learning-based human action recognition.
[0010] On the other hand, Spiking Neural Networks (SNNs), with their bio-inspired architecture, have demonstrated excellent energy efficiency and are gradually becoming a new direction for radar data processing. SNN models mimic the mammalian nervous system and achieve efficient parallel processing through sparse spike communication, exhibiting significant energy advantages. Recent research has proposed end-to-end SNN frameworks (such as Spike-HAR and Spike-HAR++) for event-driven human action recognition. These models are small in size and have extremely low energy consumption, requiring only 0.03–0.06 mJ to complete action recognition, demonstrating their broad applicability in edge devices. Therefore, to address the problems in the prior art, this invention proposes a behavior recognition method based on spiking neural networks in which data does not need to be sent over the network for inference, thereby reducing inference latency.
Summary of the Invention
[0011] The purpose of this invention is to address the shortcomings of existing technologies by proposing a behavior recognition method based on spike neural networks.
[0012] To achieve the above objectives, the present invention adopts the following technical solution: the behavior recognition method based on spiking neural networks includes the following steps:
S1: Compress and encode the radar data to speed up computation;
S2: Use the short-time Fourier transform (STFT) to convert the time-domain data from the two channels into the time-frequency domain; the STFT yields a complex-valued matrix;
S3: Take the modulus (magnitude) of the STFT matrix to obtain a real-valued matrix, then convert the real-valued matrix into a grayscale image and apply an appropriate threshold to obtain a binary matrix;
S4: Use this binary matrix as the input to the spiking neural network, where the i-th column is the input at the i-th time point (i = 1…T);
S5: Construct the overall architecture of the spiking neural network from a set of "category-specific" filter blocks;
S6: The filter blocks compete with one another through a lateral competitive inhibition mechanism; the input action sequence is preprocessed to form multiple spike frames, and each frame is connected to the convolutional layer pixel by pixel through a sliding window;
S7: The sliding window moves row by row, and each pixel is connected to a neuron in the first filter block of the convolutional layer; after the traversal, all pixels are connected to the filter;
S8: Subsequent spike frames are connected to the same filter in the same way, and if there are multiple convolutional spike layers, the same connection scheme is used between consecutive layers; the number of layers can be set flexibly according to the spatial feature complexity of the dataset;
S9: The spatial features of each action from the convolutional spike layers, together with its temporal features in the form of spike counts per time window, are fed into the classifier layer to identify the action.
[0013] Preferably, in S1, a first component compresses and encodes the radar data as preprocessing to speed up computation, and a second component contains multiple spike layers that extract spatial features from the input spike data.
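[0014] Furthermore, in S2, the short-time Fourier transform (STFT) of the time-domain signal $x(t)$ is given by the following equation:
$$X(\tau, \omega) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-j\omega t}\, dt$$
where $w(t)$ is the selected time window.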
[0015] A further preferred embodiment: in step S4, the number of time segments in the spectrogram is calculated using the following formula:
$$T = \left\lfloor \frac{N - N_{o}}{N_{w} - N_{o}} \right\rfloor$$
where $N$ is the total number of samples, $N_{w}$ is the window length used for the STFT calculation, and $N_{o}$ is the number of overlapping data points.
[0016] As a preferred embodiment of the present invention, the switching mechanism of the filter block in step S5 includes the following steps: A1: In order to capture intra-frame spatial juxtaposition patterns of the same action category, the spike neural network creates multiple filters within each filter block and connects them through a "switching node", which is equivalent to a special LIF neuron. A2: During training, the switching node applies suppression to all filters within a filter block, forcing only one filter to be activated while the rest become inactive. A3: The duration of suppression is controlled by an adjustable suppression strength parameter. After suppression ends, all filters compete again, and the filter that produces the largest output spike is regarded as the "winner", realizing a winner-takes-all mechanism. A4: This switching process is repeated according to the set decay time constant, ensuring that all filters have a chance to be activated during the training phase.
[0017] As a further preferred embodiment of the present invention, in step S6, the long-term suppression between filter blocks includes the following steps: B1: The spike neural network applies long-term lateral inhibition between different filter blocks to ensure that only one filter block participates in learning at each time step; B2: The spike neural network randomly initializes the weights of each filter block. For each action category, the first block to obtain the maximum spike response is considered the winner. This block then sends a stronger inhibition signal to other blocks, so that they remain in an inhibited state during training for that category. B3: When a certain pattern repeats multiple times in a short period of time, a shorter switching period can be set for the filter block.
[0018] As a further aspect of the present invention: in step S9, the temporal feature extraction includes the following steps: C1: Divide the entire action sequence into multiple equal-length time windows; C2: Count the number of spikes within each time window as a time-series feature.
[0019] Based on the aforementioned scheme: In B3, different switching cycles can be set for each filter block according to different action categories to adapt to the duration and spatial pattern complexity of the action.
[0020] The beneficial effects of this invention are as follows: This behavior recognition method based on spiking neural networks applies the neuromorphic computing paradigm to radar data to learn and recognize human behavior. It realizes a novel convolution-based spiking neural network that can learn the spatial and temporal features of actions. Because the method relies on neuromorphic concepts and spiking neural networks, it can be deployed on a neuromorphic edge device connected to the radar. The data therefore does not need to be sent over the network for inference, which reduces inference latency and improves the computational and energy efficiency of behavior recognition.
Attached Figure Description
[0021] Figure 1 is a flowchart of the behavior recognition method based on spiking neural networks proposed in this invention; Figure 2 is a schematic diagram of the data preprocessing structure of the method; Figure 3 is a schematic diagram of the convolutional spike layer (CSNN) process of the method.
Detailed Implementation
[0022] The technical solution of the present invention will be further described in detail below with reference to specific embodiments.
[0023] Example 1: A behavior recognition method based on spiking neural networks, as shown in Figures 1-3, includes the following steps:
S1: Compress and encode the radar data as preprocessing to speed up computation;
S2: Use the short-time Fourier transform (STFT) to convert the time-domain data from the two channels into the time-frequency domain; the STFT yields a complex-valued matrix;
S3: Take the modulus (magnitude) of the STFT matrix to obtain a real-valued matrix, then convert the real-valued matrix into a grayscale image and apply an appropriate threshold to obtain a binary matrix;
S4: Use this binary matrix as the input to the spiking neural network, where the i-th column is the input at the i-th time point (i = 1…T);
S5: Construct the overall architecture of the spiking neural network from a set of "category-specific" filter blocks;
S6: The filter blocks compete with one another through a lateral competitive inhibition mechanism; the input action sequence is preprocessed to form multiple spike frames, and each frame is connected to the convolutional layer pixel by pixel through a sliding window;
S7: The sliding window moves row by row, and each pixel is connected to a neuron in the first filter block of the convolutional layer; after the traversal, all pixels are connected to the filter;
S8: Subsequent spike frames are connected to the same filter in the same way, and if there are multiple convolutional spike layers (CSNN), the same connection scheme is used between consecutive layers; the number of layers can be set flexibly according to the spatial feature complexity of the dataset;
S9: The spatial features of each action from the convolutional spike layers, together with its temporal features in the form of spike counts per time window, are fed into the classifier layer to identify the action.
[0024] In this embodiment, the classifier layer uses a simple logistic regression-based classifier.
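A minimal sketch of such a classifier layer is given below, assuming a multi-class (softmax) form of logistic regression trained by plain gradient descent. The function names, learning rate, epoch count, and toy feature vectors are illustrative assumptions and are not taken from the embodiment.

```python
import numpy as np

def train_softmax_regression(X, y, n_classes, lr=0.01, epochs=500):
    """Fit multi-class logistic regression by batch gradient descent (illustrative)."""
    X = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])  # append bias column
    Y = np.eye(n_classes)[np.asarray(y)]                               # one-hot targets
    W = np.zeros((X.shape[1], n_classes))
    for _ in range(epochs):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)              # softmax probabilities
        W -= lr * X.T @ (P - Y) / X.shape[0]           # cross-entropy gradient step
    return W

def predict(W, X):
    """Return the most probable class index for each feature row."""
    X = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])
    return np.argmax(X @ W, axis=1)

# Toy feature vectors standing in for per-window spike counts (made-up values).
features = [[4, 4, 0, 0], [0, 0, 4, 4], [4, 3, 0, 1], [1, 0, 4, 4]]
labels = [0, 1, 0, 1]
weights = train_softmax_regression(features, labels, n_classes=2)
predicted = predict(weights, features)
```

In practice the feature rows would combine the spatial spike features with the per-window spike counts described for S9.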
[0025] In S1, a first component compresses and encodes the radar data as preprocessing to speed up computation, and a second component contains multiple spike layers that extract spatial features from the input spike data.
[0026] Spatial feature extraction is essentially hierarchical. The first layer captures low-level features such as edges, and the complexity increases continuously until the last layer. The convolutional features of a layer and its temporal spike features become a rich feature set, which is then passed to the classifier to finally identify the action.
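To illustrate how one convolutional spike layer could turn binary spike frames into spatial spike features, the sketch below slides a single filter over each frame and feeds the result into leaky integrate-and-fire (LIF) neurons. The neuron model details, the `threshold` and `leak` values, and the example kernel are assumptions made for illustration; the embodiment does not specify them.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lif_conv_layer(frames, kernel, threshold=2.0, leak=0.9):
    """Apply one filter to a sequence of binary spike frames with LIF neurons.

    frames: array of shape (n_frames, height, width) with 0/1 entries.
    kernel: 2-D filter weights shared by every frame (same filter for all frames).
    Returns the output spike frames, shape (n_frames, height-kh+1, width-kw+1).
    """
    kh, kw = kernel.shape
    h, w = frames.shape[1] - kh + 1, frames.shape[2] - kw + 1
    v = np.zeros((h, w))                                  # membrane potentials
    out = np.zeros((frames.shape[0], h, w), dtype=np.uint8)
    for t, frame in enumerate(frames):
        patches = sliding_window_view(frame, (kh, kw))    # window moves row by row
        v = leak * v + (patches * kernel).sum(axis=(2, 3))  # integrate weighted input
        fired = v >= threshold
        out[t] = fired                                    # emit spikes
        v[fired] = 0.0                                    # reset neurons that fired
    return out

# A vertical line in the first frame, detected by a vertical-edge-like kernel.
frames = np.zeros((2, 3, 3), dtype=np.uint8)
frames[0, :, 1] = 1
kernel = np.array([[0.0, 1.0], [0.0, 1.0]])
out = lif_conv_layer(frames, kernel)
print(out[0].tolist())  # [[1, 0], [1, 0]]
```

Stacking several such layers, each consuming the spike frames of the previous one, would give the hierarchical feature extraction described above.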
[0027] In S2, the short-time Fourier transform (STFT) of the time-domain signal $x(t)$ is given by the following equation:
$$X(\tau, \omega) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-j\omega t}\, dt$$
where $w(t)$ is the selected time window. In this embodiment, a 24 GHz continuous-wave (CW) radar was used with a sampling frequency of 2 kHz and both I (in-phase) and Q (quadrature, i.e., offset by 90 degrees) channels. The data collection time for every action was 5 seconds.
[0028] Therefore, for each activity, there is quadrature (I/Q) time-domain data of length 10,000 (5 s × 2 kHz). Based on the dataset, a spectrogram for each action is calculated, which is a time-frequency domain representation of the time-series data obtained from the radar. A 1024-point Fast Fourier Transform (FFT) and a 256-sample Kaiser window with 75% overlap are used to calculate the spectrogram.
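The preprocessing chain of S2 to S4 with these parameters can be sketched as follows. This is a NumPy-only illustration: combining the two channels as the complex signal I + jQ, the Kaiser shape parameter `kaiser_beta`, the normalization to a [0, 1] grayscale, and the `threshold` value are all assumptions, since the embodiment only calls for "an appropriate threshold".

```python
import numpy as np

def binary_spectrogram(iq, n_window=256, n_overlap=192, n_fft=1024,
                       kaiser_beta=8.0, threshold=0.5):
    """Return an (n_fft, T) binary matrix from complex I/Q radar samples."""
    hop = n_window - n_overlap
    n_frames = (len(iq) - n_overlap) // hop
    window = np.kaiser(n_window, kaiser_beta)          # beta is an assumed value
    # Slice the signal into overlapping segments, one per row.
    frames = np.stack([iq[k * hop:k * hop + n_window] for k in range(n_frames)])
    # STFT: windowed, zero-padded FFT of each segment gives a complex matrix.
    stft = np.fft.fftshift(np.fft.fft(frames * window, n=n_fft, axis=1), axes=1).T
    magnitude = np.abs(stft)                           # modulus: real-valued matrix
    gray = magnitude / (magnitude.max() + 1e-12)       # grayscale scaled to [0, 1]
    return (gray > threshold).astype(np.uint8)         # binary spike matrix

fs = 2000
t = np.arange(5 * fs) / fs
iq = np.exp(2j * np.pi * 200 * t)   # synthetic 200 Hz tone standing in for radar data
spikes = binary_spectrogram(iq)
print(spikes.shape)  # (1024, 153)
```

Each column of `spikes` is then the network input at one time point, and the rows span the ±1 kHz Doppler range after `fftshift`.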
[0029] In S4, the number of time segments in the spectrogram is calculated using the following formula:
$$T = \left\lfloor \frac{N - N_{o}}{N_{w} - N_{o}} \right\rfloor$$
where $N$ is the total number of samples (5 × 2000 = 10,000), $N_{w}$ is the window length used for the STFT calculation (256), and $N_{o}$ is the number of overlapping data points (75% of 256 = 192). This gives $T = \lfloor 9808 / 64 \rfloor = 153$ time points. The 2000 Hz frequency span (±1 kHz) is represented by 1024 frequency bins (due to the 1024-point FFT).
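The arithmetic in this paragraph can be checked with a few lines of Python. The helper name `stft_frame_count` is hypothetical; the numbers are those stated in the embodiment, and the integer division assumes that a trailing partial segment is discarded.

```python
def stft_frame_count(n_samples, window_len, overlap):
    """Number of full STFT segments that fit into n_samples."""
    hop = window_len - overlap            # samples advanced per segment
    return (n_samples - overlap) // hop   # partial trailing segment is dropped

fs = 2000                                 # sampling frequency in Hz
n_samples = 5 * fs                        # 5 s of data -> 10,000 samples
overlap = int(0.75 * 256)                 # 75% of the 256-sample window -> 192
T = stft_frame_count(n_samples, 256, overlap)
bin_width_hz = fs / 1024                  # frequency spacing of the 1024-point FFT

print(T)             # 153
print(bin_width_hz)  # 1.953125
```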
[0030] In step S5, the switching mechanism of the filter block includes the following steps: A1: In order to capture intra-frame spatial juxtaposition patterns of the same action category, the spiking neural network creates multiple filters (features) within each filter block and connects them through a "switching node", which is equivalent to a special LIF neuron. This design allows only one filter to be activated at a time, avoiding the need to learn a 3D spatiotemporal filter for consecutive spike frames; A2: During training, the switching node applies suppression to all filters within a filter block, forcing only one filter to be activated while the rest become inactive. A3: The duration of suppression is controlled by an adjustable suppression strength parameter. After suppression ends, all filters compete again, and the filter that produces the largest output spike is regarded as the "winner", realizing a winner-take-all mechanism. A4: This switching process is repeated according to the set decay time constant, ensuring that all filters have a chance to be activated during the training phase; thus, spatially juxtaposed but temporally separable features are distributed to different filters.
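The switching behavior in A2 to A4 can be sketched as a small state machine. This is a simplified illustration rather than the claimed mechanism: the class name `SwitchingNode` is hypothetical, the suppression duration is modeled as a fixed number of time steps (`inhibit_steps`) instead of a LIF decay, and the winner is taken as the filter with the largest response when suppression expires.

```python
import numpy as np

class SwitchingNode:
    """Illustrative intra-block switcher: one filter is active at a time."""

    def __init__(self, n_filters, inhibit_steps):
        self.n_filters = n_filters
        self.inhibit_steps = inhibit_steps  # stands in for the suppression duration
        self.winner = None
        self.remaining = 0

    def step(self, responses):
        """responses: output spike count of each filter at this time step.

        Returns a boolean mask marking the single filter allowed to learn.
        """
        if self.remaining == 0:                  # suppression over: compete again
            self.winner = int(np.argmax(responses))
            self.remaining = self.inhibit_steps
        self.remaining -= 1
        mask = np.zeros(self.n_filters, dtype=bool)
        mask[self.winner] = True                 # winner-take-all
        return mask

node = SwitchingNode(n_filters=3, inhibit_steps=2)
m1 = node.step(np.array([1, 5, 2]))   # filter 1 wins
m2 = node.step(np.array([9, 0, 0]))   # others still suppressed, filter 1 stays active
m3 = node.step(np.array([9, 0, 0]))   # suppression ended, filter 0 wins
```

Following the description, the node would be used only during training and removed at test time.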
[0031] In step S6, the long-term suppression between filter blocks includes the following steps: B1: The spike neural network applies long-term lateral suppression between different filter blocks to ensure that only one filter block participates in learning at each time step, thus avoiding redundant learning patterns in different blocks.
[0032] B2: The spike neural network randomly initializes the weights of each filter block. For each action category, the first block to obtain the maximum spike response is considered the winner. This block then sends a stronger inhibition signal to other blocks, so that they remain in an inhibited state during training for that category. This ensures that the winning block only produces the main output for that category in subsequent training. Such a filter block suppression mechanism has two obvious advantages: since only some filter blocks are active at the same time, the number of active neurons in the convolutional spike layer is reduced during training; and different switching periods (decay time constants) can be set for each filter block according to different action categories to adapt to the duration and spatial pattern complexity of the action.
[0033] B3: When a pattern repeats multiple times within a short period, a shorter switching period can be set for that filter block. During the testing phase, the long-term suppression between filter blocks and the intra-block switching mechanism described above are removed, because they are used only during training.
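Steps B1 and B2 can be sketched as a bookkeeping structure that fixes one winning block per action category. The class `BlockInhibition` and its methods are hypothetical, and two points are assumptions not stated in the text: a block that has already won one category is excluded from winning another, and there are at least as many blocks as categories.

```python
import numpy as np

class BlockInhibition:
    """Illustrative long-term lateral inhibition between filter blocks."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.owner = {}  # action label -> index of the winning block

    def active_block(self, label, block_responses):
        """Return the block that learns this category, choosing it on first use."""
        if label not in self.owner:
            responses = np.asarray(block_responses, dtype=float).copy()
            for taken in self.owner.values():
                responses[taken] = -np.inf     # assumption: claimed blocks cannot win again
            self.owner[label] = int(np.argmax(responses))
        return self.owner[label]

    def learning_mask(self, label, block_responses, training=True):
        """Blocks allowed to learn; inhibition is dropped outside training."""
        if not training:
            return np.ones(self.n_blocks, dtype=bool)
        mask = np.zeros(self.n_blocks, dtype=bool)
        mask[self.active_block(label, block_responses)] = True
        return mask

inhibition = BlockInhibition(n_blocks=3)
first = inhibition.active_block("walk", [1, 4, 2])    # block 1 responds most strongly
again = inhibition.active_block("walk", [9, 0, 0])    # winner stays fixed for "walk"
other = inhibition.active_block("jump", [0, 9, 3])    # block 1 is taken, so block 2 wins
```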
[0034] In step S9, the temporal feature extraction includes the following steps: C1: Divide the entire action sequence into multiple equal-length time windows; C2: Count the number of spikes within each time window as a time-series feature.
[0035] Besides spatial features, temporal features are crucial for action recognition, helping the system capture the event sequence of action execution. This is especially useful for actions that overlap spatially but differ in temporal sequence, such as sit-ups and jumps: their radar spatial signatures are similar, so they are difficult to distinguish using only the spatial features extracted by the convolutional spike layers.
[0036] In the domain of spikes, action events can be characterized by the timing of spikes and the total number of spikes. However, for spatially overlapping actions, since the number of spikes generated by the two actions is almost the same, it is often impossible to distinguish the categories by simply comparing the total number of spikes.
[0037] Therefore, this method divides the entire action sequence into multiple time windows of equal duration and counts the number of spikes within each time window as a temporal feature.
[0038] For example, the binary spectrograms of sit-ups and jumps produce almost the same total number of spikes (6253 and 6479, respectively). However, if they are divided into four equal time windows, the spike counts for each window are significantly different. In the example presented here, all actions lasted 5 seconds, each action was divided into equal time windows, and the spike counts for each window (along with the spatial features) were recorded and used as input to the classification stage.
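The windowed counting can be sketched as below. The function name `windowed_spike_counts` and the toy matrices are illustrative assumptions; `np.array_split` is used so that a column count such as T = 153, which is not divisible by four, is split into nearly equal windows.

```python
import numpy as np

def windowed_spike_counts(spikes, n_windows=4):
    """Count spikes per time window of a (freq_bins, T) binary matrix."""
    column_groups = np.array_split(np.arange(spikes.shape[1]), n_windows)
    return np.array([int(spikes[:, cols].sum()) for cols in column_groups])

# Two toy actions with the same total spike count but different timing.
early = np.zeros((2, 8), dtype=np.uint8)
early[:, :4] = 1          # all activity in the first half
late = np.zeros((2, 8), dtype=np.uint8)
late[:, 4:] = 1           # all activity in the second half

print(int(early.sum()), int(late.sum()))        # 8 8
print(windowed_spike_counts(early).tolist())    # [4, 4, 0, 0]
print(windowed_spike_counts(late).tolist())     # [0, 0, 4, 4]
```

The totals are identical, while the per-window counts separate the two toy actions, which is the effect described for sit-ups and jumps.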
[0039] This action recognition method based on spiking neural networks applies the neuromorphic computing paradigm to radar data to learn and recognize human behavior. It realizes a novel convolution-based spiking neural network capable of learning the spatial and temporal features of actions. Because the method relies on neuromorphic concepts and spiking neural networks, it can be deployed on a neuromorphic edge device connected to the radar. The data therefore does not need to be sent over the network for inference, which reduces inference latency and improves the computational and energy efficiency of action recognition.
[0040] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.
Claims
1. A behavior recognition method based on spiking neural networks, characterized in that it includes the following steps:
S1: Compress and encode the radar data to speed up computation;
S2: Use the short-time Fourier transform (STFT) to convert the time-domain data from the two channels into the time-frequency domain; the STFT yields a complex-valued matrix;
S3: Take the modulus (magnitude) of the STFT matrix to obtain a real-valued matrix, then convert the real-valued matrix into a grayscale image and apply an appropriate threshold to obtain a binary matrix;
S4: Use this binary matrix as the input to the spiking neural network, where the i-th column is the input at the i-th time point (i = 1…T);
S5: Construct the overall architecture of the spiking neural network from a set of "category-specific" filter blocks;
S6: The filter blocks compete with one another through a lateral competitive inhibition mechanism; the input action sequence is preprocessed to form multiple spike frames, and each frame is connected to the convolutional layer pixel by pixel through a sliding window;
S7: The sliding window moves row by row, and each pixel is connected to a neuron in the first filter block of the convolutional layer; after the traversal, all pixels are connected to the filter;
S8: Subsequent spike frames are connected to the same filter in the same way, and if there are multiple convolutional spike layers, the same connection scheme is used between consecutive layers; the number of layers can be set flexibly according to the spatial feature complexity of the dataset;
S9: The spatial features of each action from the convolutional spike layers, together with its temporal features in the form of spike counts per time window, are fed into the classifier layer to identify the action.
2. The behavior recognition method based on spiking neural networks according to claim 1, characterized in that, in S1, a first component compresses and encodes the radar data as preprocessing to speed up computation, and a second component contains multiple spike layers that extract spatial features from the input spike data.
3. The behavior recognition method based on spiking neural networks according to claim 1, characterized in that, in S2, the short-time Fourier transform (STFT) of the time-domain signal $x(t)$ is given by the following equation:
$$X(\tau, \omega) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-j\omega t}\, dt$$
where $w(t)$ is the selected time window.
4. The behavior recognition method based on spiking neural networks according to claim 1, characterized in that, in S4, the number of time segments in the spectrogram is calculated using the following formula:
$$T = \left\lfloor \frac{N - N_{o}}{N_{w} - N_{o}} \right\rfloor$$
where $N$ is the total number of samples, $N_{w}$ is the window length used for the STFT calculation, and $N_{o}$ is the number of overlapping data points.
5. The behavior recognition method based on spike neural network according to claim 1, characterized in that, In step S5, the switching mechanism of the filter block includes the following steps: A1: In order to capture intra-frame spatial juxtaposition patterns of the same action category, the spike neural network creates multiple filters within each filter block and connects them through a "switching node", which is equivalent to a special LIF neuron. A2: During training, the switching node applies suppression to all filters within a filter block, forcing only one filter to be activated while the rest become inactive. A3: The duration of suppression is controlled by an adjustable suppression strength parameter. After suppression ends, all filters compete again, and the filter that produces the largest output spike is regarded as the "winner", realizing a winner-takes-all mechanism. A4: This switching process is repeated according to the set decay time constant, ensuring that all filters have a chance to be activated during the training phase.
6. The behavior recognition method based on spike neural network according to claim 1, characterized in that, In step S6, the long-term suppression between filter blocks includes the following steps: B1: The spike neural network applies long-term lateral inhibition between different filter blocks to ensure that only one filter block participates in learning at each time step; B2: The spike neural network randomly initializes the weights of each filter block. For each action category, the first block to obtain the maximum spike response is considered the winner. This block then sends a stronger inhibition signal to other blocks, so that they remain in an inhibited state during training for that category. B3: When a certain pattern repeats multiple times in a short period of time, a shorter switching period can be set for the filter block.
7. The behavior recognition method based on spiking neural networks according to claim 1, characterized in that, in step S9, the temporal feature extraction includes the following steps: C1: Divide the entire action sequence into multiple equal-length time windows; C2: Count the number of spikes within each time window as a time-series feature.
8. The behavior recognition method based on spike neural network according to claim 6, characterized in that, In B3, different switching cycles can be set for each filter block according to different action categories to adapt to the duration and spatial pattern complexity of the action.