Method for training a brain-like gesture recognition model, gesture category recognition and related devices

By adaptively adjusting the number of layers and thresholds in the neuron model, and combining LIF neurons and residual networks, the gradient vanishing and network degradation problems of deep spiking neural networks are solved, achieving efficient gesture feature extraction and recognition.

CN117830799B · Active · Publication Date: 2026-06-19 · ZHEJIANG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ZHEJIANG UNIV
Filing Date
2024-01-16
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies struggle to effectively address the vanishing gradient and network degradation issues during the training process of deep spiking neural networks (SNNs), resulting in limited gesture feature extraction and recognition capabilities.

Method used

We employ an adaptive multi-level threshold spiking neuron model based on LIF neurons and a residual network structure. By adaptively adjusting the number of layers and thresholds in the neuron model and combining learnable thresholds, we optimize gradient propagation and construct a deep spiking neural network.

Benefits of technology

Stable training of deep spiking neural networks was achieved, improving gesture feature extraction and recognition capabilities, and exhibiting low latency and high recognition performance.

✦ Generated by Eureka AI based on patent content.

Figure CN117830799B_ABST

Abstract

This invention discloses a method and related apparatus for training a brain-like gesture recognition model and gesture category recognition, relating to the field of gesture recognition technology. The method for training the brain-like gesture recognition model includes: acquiring gesture actions using an event camera; preprocessing the acquired data; using the preprocessed data as input and the gesture category corresponding to the preprocessed data as output to train the brain-like gesture recognition model, resulting in a trained brain-like gesture recognition model. The brain-like gesture recognition model is a spiking neural network model constructed based on an adaptive multi-level threshold spiking neuron model using LIF neurons and a residual network structure. This invention, by setting a learnable threshold, achieves adaptive adjustment of the number of layers in the neuron model, alleviating the gradient vanishing problem caused by directly training a deep spiking neural network, thereby training the SNN network to be deeper and thus improving gesture feature extraction and recognition capabilities.

Description

Technical Field

[0001] This invention relates to the field of gesture recognition technology, and in particular to a method and apparatus for training a brain-like gesture recognition model and for gesture category recognition.

Background Technology

[0002] Spiking Neural Networks (SNNs), as a new generation of neural networks, combine spatial and temporal concepts through event-driven asynchronous pulse transmission, multi-scale neurodynamics, and the coordinated regulation of various plasticities. They possess the ability to process information in both the spatial (SD) and temporal (TD) domains, simulating the brain's pulse computing mechanisms and cognitive processes, and exhibit significant advantages in complex spatiotemporal signal processing. Compared to traditional Artificial Neural Networks (ANNs), SNNs communicate information through discrete pulse events and are therefore essentially close to binary processing on hardware platforms. This facilitates low-power, high-energy-efficiency real-time computing on hardware. For example, the energy an SNN consumes to transmit spike signals on neuromorphic hardware is only on the order of nanojoules (nJ) or picojoules (pJ). With their high energy efficiency, low latency, and biological interpretability, SNNs have become an attractive artificial intelligence solution.

[0003] In the context of highly robust and energy-efficient artificial intelligence applications, event-based cameras or dynamic vision sensors (DVS) have emerged as a promising innovative solution in the field of computer vision. Compared to traditional frame-based cameras, event cameras capture pixel-level light intensity changes independently, generating asynchronous binary event streams. The captured event features have binary pixel and temporal resolution, forming a highly sparse and energy-efficient visual representation. This spatiotemporal binarization naturally aligns with the computational mechanism of spiking neural networks (SNNs), providing a unique opportunity to bridge the gap between computer vision and neuromorphic computing.

[0004] However, because spiking activity is nonlinear and nondifferentiable, traditional gradient-based optimization methods are difficult to apply directly to SNNs. This nondifferentiability limits the application of SNNs on complex tasks and large-scale datasets, especially in scenarios requiring deep architectures and fine-tuning. Training high-performance deep SNNs remains a challenge. Currently, there are two main types of learning algorithms for training deep SNNs: indirect supervised learning algorithms, such as the ANN-SNN conversion method, and direct training methods using surrogate gradients.

[0005] Conversion-based methods convert pre-trained convolutional neural networks (CNNs) into SNNs with the same structure. They typically require a large number of time steps to obtain similar information representations and inevitably lose accuracy compared to the original CNN. Although converted SNNs can achieve performance comparable to ANNs, the requirement for a large number of time steps leads to significant inference latency and high energy consumption, and fails to fully utilize the temporal dynamics of SNNs.

[0006] In direct training methods using surrogate gradients, SNNs are treated as a special form of recurrent neural network (RNN) and trained with the backpropagation through time (BPTT) algorithm. This allows SNNs to retain their temporal dynamics while gradients are iteratively propagated in the spatial and temporal domains for optimization. The development of direct training methods has enabled SNNs to achieve performance comparable to CNNs with ultra-low latency. However, the mismatch between approximate and exact gradients significantly limits the training stability of large-scale models.

[0007] However, these methods do not address the limited width of the approximate derivative or the weak expressive power of binary pulse signals. The limited width of the surrogate gradient causes the membrane potentials of many neurons to fall into the saturation region, where the approximate derivative is zero or extremely small; this blocks gradient propagation, leads to vanishing gradients, and makes direct training of deep SNNs inefficient. Furthermore, network degradation is a severe problem in directly trained deep SNNs, even with residual structures. Vanishing gradients and network degradation severely limit the depth of directly trained SNNs.

[0008] To address vanishing gradients and network degradation, Zheng et al. proposed the STBP-tdBN method, which introduces threshold-dependent batch normalization (tdBN) to balance the dynamics of spiking neurons and adjust the firing rate, avoiding gradient vanishing or explosion to some extent. Feng et al. proposed a multi-level firing method based on STBP to achieve more efficient gradient propagation and incremental neuron expression. One of the most successful methods for addressing degradation is the residual structure: it introduces a shortcut connection that increases the network's identity-mapping ability, allowing networks to reach hundreds of layers without degradation and significantly extending network depth. Meanwhile, Fang et al. proposed SEW-ResNet, which propagates gradients directly through a designed residual structure to compensate for the difference between approximate and exact gradients. Furthermore, various regularization techniques have been proposed to stabilize SNN training, such as correcting the membrane potential distribution and backpropagation with spatiotemporal adjustments.

[0009] The nervous system achieves remarkable performance through its highly complex dynamics. Based on the spike generation mechanism in the adaptive firing process discovered in neuroscience, Fang et al. fine-tuned neuronal dynamics by optimizing the membrane time constant throughout training, according to the needs of large models. Meanwhile, DSR optimizes the membrane potential threshold during training by multiplying the binary output spike by a threshold; in this method, the relationship between the firing range and the threshold is constrained by a fixed ratio. However, a fixed threshold often leads to membrane potential overshoot, thus limiting the performance of deep spiking neural networks during highly stable training processes.

[0010] Therefore, how to design an adaptive SNN training algorithm with a learnable threshold to improve gesture feature extraction and recognition capabilities has become a technical problem that urgently needs to be solved by those skilled in the art.

Summary of the Invention

[0011] The purpose of this invention is to provide a method and related apparatus for training a brain-like gesture recognition model and gesture category recognition. By setting a learnable threshold, the number of layers in the neuron model can be adaptively adjusted, which alleviates the gradient vanishing problem caused by directly training a deep spiking neural network. This allows the SNN network to be trained deeper, thereby improving the gesture feature extraction and recognition capabilities.

[0012] To achieve the above objectives, the present invention provides the following solution:

[0013] In a first aspect, the present invention provides a method for training a brain-like gesture recognition model, comprising:

[0014] The event camera is used to capture hand gestures, and the captured data is obtained; the captured data includes images of gesture categories.

[0015] The collected data is preprocessed to obtain preprocessed data.

[0016] Using the preprocessed data as input and the gesture category corresponding to the preprocessed data as output, the brain-like gesture recognition model is trained to obtain a trained brain-like gesture recognition model; the brain-like gesture recognition model is a spiking neural network model constructed based on an adaptive multi-level threshold spiking neuron model of LIF neurons and a residual network structure.

[0017] Optionally, the collected data is preprocessed to obtain preprocessed data, specifically including:

[0018] Determine a time window size and read all events within that time window.

[0019] The number of events at each coordinate (x, y, p) within the time window is counted, where (x, y) is the spatial location of the event point and p is its polarity.

[0020] The number of events at each position is divided by the largest event count over all positions to obtain normalized data.

[0021] Multiply the normalized data by 255 gray levels to obtain a 2-channel grayscale frame image determined by the event frequency.

[0022] The two-channel grayscale frame image is downsampled using average pooling to obtain preprocessed data.

[0023] Optionally, the construction process of the brain-like gesture recognition model is as follows:

[0024] A neuron model is constructed; the neuron model is an adaptive multi-level threshold pulse firing neuron model based on LIF neurons; the neuron model can adaptively update the threshold and the number of levels.

[0025] Based on the residual network structure and the neuron model, the brain-like gesture recognition model is constructed; the residual network structure is a structure that performs spike activation before adding to the shortcut connection; the brain-like gesture recognition model consists of several basic blocks, each of which consists of two convolutional layers, two batch normalization layers for processing temporal data, and two adaptive multi-level threshold spike firing neuron models based on LIF neurons.

[0026] Optionally, the preprocessed data is used as input, and the gesture category corresponding to the preprocessed data is used as output to train the brain-like gesture recognition model, thereby obtaining a trained brain-like gesture recognition model, specifically including:

[0027] Determine the loss function.

[0028] The preprocessed data is input into the brain-like gesture recognition model to obtain the output cumulative membrane voltage.

[0029] The loss value is determined based on the output cumulative membrane voltage, the true label of the spiking neural network, and the loss function; the true label of the spiking neural network is the gesture category corresponding to the preprocessed data.

[0030] The parameters of the brain-like gesture recognition model are optimized based on the loss value to obtain a trained brain-like gesture recognition model.

[0031] Optionally, the parameters of the brain-like gesture recognition model are optimized based on the loss value, specifically including:

[0032] Based on the loss value, the gradients of the synaptic weights and biases of the brain-like gesture recognition model are calculated.

[0033] The synaptic weights and biases of the brain-like gesture recognition model are updated based on the gradient of the synaptic weights and biases of the model.

[0034] Based on the loss value, calculate the gradient of the threshold of the adaptive multi-level threshold spike firing neuron model based on LIF neurons.

[0035] Based on the gradient of the threshold of the adaptive multi-level threshold pulse firing neuron model based on LIF neurons, the threshold of the adaptive multi-level threshold pulse firing neuron model based on LIF neurons is adaptively updated, and the number of levels of the adaptive multi-level threshold pulse firing neuron model based on LIF neurons is adjusted.

[0036] In a second aspect, the present invention provides a method for gesture category recognition, comprising:

[0037] The current gesture action is captured using an event camera to obtain the current data; the current data is an image including the gesture category.

[0038] The collected current data is preprocessed to obtain preprocessed current data.

[0039] The preprocessed current data is input into the trained brain-like gesture recognition model to obtain the gesture category; the trained brain-like gesture recognition model is a model trained according to the method for training brain-like gesture recognition models described in the first aspect.

[0040] In a third aspect, the present invention provides an apparatus for training a brain-like gesture recognition model, comprising:

[0041] The data acquisition unit is used to acquire gesture actions using an event camera to obtain acquired data; the acquired data includes images of gesture categories.

[0042] The preprocessing unit is used to preprocess the collected data to obtain preprocessed data.

[0043] The training unit is used to train the brain-like gesture recognition model with the preprocessed data as input and the gesture category as output, so as to obtain the trained brain-like gesture recognition model; the brain-like gesture recognition model is a spiking neural network model constructed based on the LIF neuron adaptive multi-level threshold spiking neuron model and the residual network structure.

[0044] In a fourth aspect, the present invention provides a gesture category recognition device, comprising:

[0045] The data acquisition unit is used to collect current gesture actions using an event camera to acquire current data; the collected current data is an image including the gesture category.

[0046] The current data preprocessing unit is used to preprocess the current data to obtain preprocessed current data.

[0047] The recognition unit is used to input the preprocessed current data into the trained brain-like gesture recognition model to obtain the gesture category; the trained brain-like gesture recognition model is a model trained according to the method for training a brain-like gesture recognition model described in the first aspect.

[0048] In a fifth aspect, the present invention provides an electronic device, comprising:

[0049] Processor; and

[0050] Memory, in which computer-readable program instructions are stored.

[0051] Wherein, when the computer-readable program instructions are executed by the processor, the method for training a brain-like gesture recognition model as described in the first aspect is performed, or the method for gesture category recognition as described in the second aspect is performed.

[0052] In a sixth aspect, the present invention provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method for training a brain-like gesture recognition model as described in the first aspect, or the steps of the method for gesture category recognition as described in the second aspect.

[0053] According to specific embodiments provided by the present invention, the present invention discloses the following technical effects:

[0054] This invention provides a method and related apparatus for training a brain-like gesture recognition model and gesture category recognition. By combining an adaptive multi-level threshold spiking neuron model based on LIF neurons with a residual network structure, a multi-scale spiking pattern can be realized. This pattern can adaptively capture the multi-scale spatiotemporal dynamics of the spiking neuron model. The brain-like gesture recognition model integrated on this basis has excellent gesture feature extraction and recognition capabilities.

Attached Figure Description

[0055] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0056] Figure 1 is a flowchart of a method for training a brain-like gesture recognition model provided in Embodiment 1 of the present invention;

[0057] Figure 2 is a flowchart of the gesture recognition system provided in Embodiment 1 of the present invention;

[0058] Figure 3 is a schematic diagram of the adaptive multi-level threshold pulse firing neuron model provided in Embodiment 1 of the present invention;

[0059] Figure 4 is a schematic diagram of the network basic block structure provided in Embodiment 1 of the present invention;

[0060] Figure 5 is a schematic diagram of the adaptively adjusted threshold and the corresponding number of levels provided in Embodiment 1 of the present invention;

[0061] Figure 6 is a flowchart of a gesture category recognition method provided in Embodiment 2 of the present invention;

[0062] Figure 7 is a schematic diagram of the structure of a device for training a brain-like gesture recognition model provided in Embodiment 3 of the present invention;

[0063] Figure 8 is a schematic diagram of the structure of a gesture category recognition device provided in Embodiment 4 of the present invention.

Detailed Implementation

[0064] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0065] The purpose of this invention is to provide a method and related apparatus for training a brain-like gesture recognition model and gesture category recognition. By setting a learnable threshold, the number of layers in the neuron model can be adaptively adjusted, alleviating the gradient vanishing problem caused by directly training deep spiking neural networks, thereby allowing the SNN network to be trained to a deeper level. This algorithm can realize a multi-scale spiking pattern that can adaptively capture the multi-scale spatiotemporal dynamics of the spiking neuron model. The brain-like gesture recognition system integrated on this basis has excellent gesture feature extraction and recognition capabilities.

[0066] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0067] Embodiment 1:

[0068] As shown in Figure 1 and Figure 2, this embodiment provides a method for training a brain-like gesture recognition model, including:

[0069] A1: Use an event camera to capture hand gestures and obtain the captured data; the captured data includes images of gesture categories.

[0070] A2: The collected data is preprocessed to obtain preprocessed data.

[0071] A3: Using the preprocessed data as input and the gesture category corresponding to the preprocessed data as output, train the brain-like gesture recognition model to obtain a trained brain-like gesture recognition model; the brain-like gesture recognition model is a spiking neural network model constructed based on the LIF neuron adaptive multi-level threshold spiking neuron model and the residual network structure.

[0072] As an optional implementation method provided in this embodiment, step A2 specifically includes:

[0073] A21: Determine a time window size δt, and read all events within the time window.

[0074] A22: Using (x, y, p) as coordinates, count the number of events at each coordinate within the time window, where (x, y) is the spatial location of the event point and p is its polarity.

[0075] A23: Normalize the data by dividing the number of events at each position by the largest number of events across all positions to obtain the normalized data.

[0076] A24: Multiply the normalized data by 255 gray levels to obtain a 2-channel grayscale frame image determined by the event frequency.

[0077] A25: The two-channel grayscale frame image is downsampled using average pooling to reduce the number of pixels in the image, resulting in preprocessed data. This reduces the amount of data and speeds up the processing.
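Steps A21 to A25 can be illustrated with a minimal sketch in plain Python. The event tuple layout (t, x, y, p) with polarity p in {0, 1}, the function names `events_to_frame` and `avg_pool2d`, and the non-overlapping k x k pooling window are hypothetical choices for illustration, not details taken from the patent.

```python
from typing import List, Tuple

# Hypothetical event layout: (timestamp, x, y, polarity), polarity in {0, 1}.
Event = Tuple[float, int, int, int]
Frame = List[List[List[float]]]  # 2 x height x width


def events_to_frame(events: List[Event], t_start: float, dt: float,
                    height: int, width: int) -> Frame:
    """A21-A24: count events per (x, y, p) inside [t_start, t_start + dt),
    divide by the largest count, and scale to 255 gray levels."""
    counts = [[[0] * width for _ in range(height)] for _ in range(2)]
    for t, x, y, p in events:
        if t_start <= t < t_start + dt:
            counts[p][y][x] += 1  # channel = polarity, row = y, column = x
    peak = max(max(max(row) for row in ch) for ch in counts)
    if peak == 0:  # empty window: return an all-zero frame
        return [[[0.0] * width for _ in range(height)] for _ in range(2)]
    return [[[255.0 * c / peak for c in row] for row in ch] for ch in counts]


def avg_pool2d(frame: Frame, k: int) -> Frame:
    """A25: non-overlapping k x k average pooling applied to each channel."""
    pooled = []
    for ch in frame:
        h, w = len(ch) // k, len(ch[0]) // k
        pooled.append([[sum(ch[i * k + a][j * k + b]
                            for a in range(k) for b in range(k)) / (k * k)
                        for j in range(w)] for i in range(h)])
    return pooled
```

Dividing by the per-window maximum (rather than a fixed constant) makes the frame brightness independent of how many events the window happens to contain, which matches the "event frequency" framing of step A24.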

[0078] As an optional implementation method provided in this embodiment, the construction process of the brain-like gesture recognition model is as follows:

[0079] A neuron model is constructed; it is an Adaptive Multi-Level Firing (AMLF) model based on LIF neurons, which can adaptively update its threshold and its number of levels. A schematic diagram of the AMLF unit is shown in Figure 3.

[0080] An AMLF unit contains n levels of LIF neurons with different thresholds. The AMLF unit receives input and updates the membrane potentials of its neurons; the neuron at each level fires a pulse when its membrane potential reaches the corresponding threshold. The final output of the AMLF unit is the union of the pulses fired by the neurons at all levels. The membrane potential of the neuron at time t+1 is updated as follows:

[0081]

[0082]

[0083]

[0084] Here, k_τ denotes the decay constant, and the two state variables are the membrane potential vector and the output vector of the i-th AMLF unit in the l-th layer at time t, respectively. n denotes the total number of levels of the AMLF unit, which can be adaptively adjusted, and k indexes the k-th level.

[0085] The input of the AMLF unit at time t+1 is calculated from the output vector of the j-th AMLF unit in the (l-1)-th layer at time t+1, the synaptic weight from the j-th neuron in layer (l-1) to the i-th neuron in layer l, and the bias of the i-th neuron in the l-th layer.

[0086] V_th denotes the threshold vector of the AMLF neuron model, whose k-th element is the threshold of the neuron at level k. The threshold of the first-level neuron of the AMLF unit is initially set to 0.5.

[0087] In AMLF, LIF neurons at different levels have different thresholds, and the relationship between the thresholds of LIF neurons at different levels is as follows:

[0088]

[0089] The output of each level is generated by the step function f(·): when the membrane potential exceeds the firing threshold, the neuron emits a pulse and its membrane potential is reset to zero.

[0090] Finally, the output of the i-th AMLF unit in the l-th layer at time t is the union of the pulses emitted by its n levels of LIF neurons.
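The AMLF dynamics described above (leaky integration with decay k_τ, a step-function spike per level, reset to zero after firing, union over levels) can be sketched for a single unit as follows. This is a minimal illustration under stated assumptions: every level receives the same precomputed input current, "union" is read as a logical OR of the binary level spikes, firing at exact equality with the threshold is assumed, and the names `step` and `amlf_forward` are hypothetical. The thresholds are passed in explicitly because the patent's threshold relationship equation is not reproduced in this text.

```python
from typing import List, Sequence, Tuple


def step(v: float) -> int:
    """Step function f(.): 1 when v >= 0, else 0 (firing at equality is an assumption)."""
    return 1 if v >= 0.0 else 0


def amlf_forward(inputs: Sequence[float], thresholds: Sequence[float],
                 k_tau: float) -> Tuple[List[int], List[List[int]]]:
    """Simulate one AMLF unit for len(inputs) time steps.

    inputs:     input current at each time step (weighted sum plus bias, precomputed)
    thresholds: one firing threshold per level, so n = len(thresholds)
    k_tau:      membrane decay constant
    Returns (union spike train, per-level spikes at each time step).
    """
    n = len(thresholds)
    u = [0.0] * n            # membrane potential of each level
    o = [0] * n              # spike emitted by each level at the previous step
    union_out: List[int] = []
    level_spikes: List[List[int]] = []
    for x in inputs:
        for k in range(n):
            # Leaky integration; the (1 - o) factor resets the potential to zero after a spike.
            u[k] = k_tau * u[k] * (1 - o[k]) + x
            o[k] = step(u[k] - thresholds[k])
        union_out.append(max(o))       # union of binary spikes read as logical OR
        level_spikes.append(list(o))
    return union_out, level_spikes
```

For example, `amlf_forward([0.6, 0.2, 1.0, 2.0], [0.5, 1.5, 2.5], 0.5)` fires only the first level early on, and all three levels once the input becomes large. If "union" is meant as a count of level spikes rather than an OR, `max(o)` would become `sum(o)`.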

[0091] Based on the ResNet residual network structure and the neuron model, the brain-like gesture recognition model is constructed; the brain-like gesture recognition model uses spiking neurons to process input and transmit pulses.

[0092] The residual network structure is modified: instead of adding the shortcut connection before spike activation, spike activation is applied first and the shortcut is added afterwards. This reduces the proportion of each basic block's outputs that fall outside the rectangular surrogate-gradient window, that is, into the saturation region, as shown in Figure 4.

[0093] The brain-like gesture recognition model consists of several basic blocks. Each basic block comprises two convolutional layers (conv), two batch normalization layers for processing temporal data (tdBN), and two adaptive multi-level threshold pulse firing neuron models based on LIF neurons (AMLF). Specifically, the input of the basic block is connected to the first convolutional layer; the output of the first convolutional layer is connected to the first batch normalization layer; the output of the first batch normalization layer is connected to the first AMLF neuron model; the output of the first AMLF neuron model is connected to the second convolutional layer; the output of the second convolutional layer is connected to the second batch normalization layer; and the output of the second batch normalization layer is connected to the second AMLF neuron model. The input of the basic block is also carried by a shortcut connection and added to the output of the second AMLF neuron model, and the sum forms the output of the basic block, as shown in Figure 4. A deep network is formed by stacking multiple basic blocks, and classification is finally performed by average pooling and a linear layer.
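The data flow of one basic block, with spike activation applied before the shortcut addition, can be sketched by treating each layer as a callable. This is a minimal sketch only: the layers are hypothetical stand-ins passed in as functions, vectors are plain lists, and an identity shortcut with matching shapes is assumed (the text does not describe a downsampling shortcut).

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def basic_block(x: Vector, conv1: Layer, bn1: Layer, amlf1: Layer,
                conv2: Layer, bn2: Layer, amlf2: Layer) -> Vector:
    """conv -> tdBN -> AMLF -> conv -> tdBN -> AMLF, then the shortcut addition.

    The second spike activation (amlf2) is applied BEFORE the identity
    shortcut is added, following the modified residual ordering in the text.
    """
    h = amlf1(bn1(conv1(x)))
    h = amlf2(bn2(conv2(h)))
    return [s + r for s, r in zip(h, x)]   # identity shortcut, same shape assumed
```

With toy layers (for example `conv = lambda v: [2.0 * a for a in v]` and a 0.5 threshold as the spike function), the block output is the spike vector plus the unchanged input, so the shortcut path itself never passes through a surrogate-gradient window.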

[0094] As an optional implementation method provided in this embodiment, step A3 specifically includes:

[0095] A31: Determine the loss function, which is a cross-entropy loss based on membrane potential. The cross-entropy loss is calculated from the true label Y = (y_1, y_2, ..., y_c) of the SNN and the output cumulative membrane voltage u = (u_1, u_2, ..., u_c), where c is the number of label categories. Its expression is:

[0096] $p_i = \dfrac{e^{u_i}}{\sum_{j=1}^{c} e^{u_j}}$

[0097] $L = -\sum_{i=1}^{c} y_i \log p_i$

[0098] Where $p_i$ denotes the predicted probability of the i-th category and L denotes the loss function.
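A minimal sketch of the membrane-potential cross-entropy loss follows, assuming that $p_i$ is the softmax of the cumulative membrane voltage and that the label is one-hot; these are assumptions consistent with the "probability distribution" wording above, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(u: Sequence[float]) -> List[float]:
    m = max(u)                              # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in u]
    s = sum(exps)
    return [e / s for e in exps]


def membrane_ce_loss(u: Sequence[float], y: Sequence[float]) -> float:
    """Cross-entropy between label y and softmax of the cumulative membrane voltage u."""
    p = softmax(u)
    # Terms with y_i == 0 contribute nothing, so they are skipped.
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi > 0)
```

With two categories, equal voltages and a one-hot label, the loss is log 2, the value expected for a uniform prediction.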

[0099] A32: Input the preprocessed data into the brain-like gesture recognition model to obtain the output cumulative membrane voltage.

[0100] A33: Determine the loss value based on the output cumulative membrane voltage, the true label of the spiking neural network, and the loss function; the true label of the spiking neural network is the gesture category corresponding to the preprocessed data.

[0101] A34: Optimize the parameters of the brain-like gesture recognition model based on the loss value to obtain a trained brain-like gesture recognition model.

[0102] During forward propagation, the entire network involves processing convolutions, batch normalization, and spiking neuron activation.

[0103] Traditional Batch Normalization (BN) layers are not designed for normalizing spatiotemporal data. The Threshold-Dependent Batch Normalization (tdBN) method is therefore used to normalize presynaptic inputs in both the spatial and temporal domains. The output layer only accumulates membrane potential and does not fire pulses; the accumulated output membrane voltage u = (u_1, u_2, ..., u_c) is expressed as follows:

[0104] $u_i = \sum_{t=1}^{T} \sum_{j=1}^{N^{(l-1)}} w_{ij}^{l}\, o_j^{t,l-1}$

[0105] Where T is the number of time steps, $N^{(l-1)}$ is the number of neurons in the (l-1)-th layer, $w_{ij}^{l}$ is the synaptic weight from the j-th neuron in layer (l-1) to the i-th neuron in layer l, and $o_j^{t,l-1}$ is the output of the j-th neuron in the (l-1)-th layer at time t.
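The two forward-pass ingredients of this paragraph, tdBN and the non-spiking output layer, can be sketched as below. The tdBN function follows Zheng et al.'s formulation as we understand it (normalize each channel over batch, time and space, scale to a standard deviation of α·V_th, then apply learnable λ and β); the output layer assumes the cumulative voltage is a plain sum over time steps and presynaptic neurons. Function names and default values are hypothetical.

```python
import math
from typing import List, Sequence


def td_bn_channel(values: Sequence[float], v_th: float, alpha: float = 1.0,
                  lam: float = 1.0, beta: float = 0.0, eps: float = 1e-5) -> List[float]:
    """tdBN for ONE channel: `values` holds that channel's presynaptic inputs
    gathered over batch, time steps and spatial positions."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = alpha * v_th / math.sqrt(var + eps)   # target std is alpha * V_th
    return [lam * (v - mean) * scale + beta for v in values]


def cumulative_voltage(spikes: Sequence[Sequence[int]],
                       weights: Sequence[Sequence[float]]) -> List[float]:
    """Non-spiking output layer: u_i = sum over t and j of w_ij * o_j^t.

    spikes:  T x N(l-1) binary outputs of the last hidden layer
    weights: c x N(l-1) synaptic weights of the output layer
    """
    return [sum(w_i[j] * o_t[j] for o_t in spikes for j in range(len(o_t)))
            for w_i in weights]
```

Scaling by the threshold keeps the normalized presynaptic input on the same scale as V_th, which is the motivation usually given for making batch normalization threshold-dependent.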

[0106] As an optional implementation method provided in this embodiment, in step A34, the parameters of the brain-like gesture recognition model are optimized based on the loss value, specifically including:

[0107] A341: Based on the loss value, calculate the gradient of the synaptic weights and biases of the brain-like gesture recognition model.

[0108] A342: Update the synaptic weights and biases of the brain-like gesture recognition model based on the gradient of the synaptic weights and biases of the brain-like gesture recognition model.

[0109] A343: Based on the loss value, calculate the gradient of the threshold of the adaptive multi-level threshold spike firing neuron model based on LIF neurons.

[0110] A344: Based on the gradient of the threshold of the adaptive multi-level threshold pulse firing neuron model based on LIF neurons, the threshold of this neuron model is adaptively updated, and its number of levels is adjusted accordingly.
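The update order of steps A341 to A344 can be sketched as one optimization step. Plain SGD, the `LayerParams` container and the `levels_from_threshold` callback are hypothetical choices; the callback stands in for the level-count formula given later in this embodiment, so the sketch makes no claim about its exact form.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LayerParams:
    w: List[float]     # synaptic weights (flattened)
    b: List[float]     # biases
    v_th1: float       # learnable first-level AMLF threshold


def update_step(p: LayerParams, grad_w: List[float], grad_b: List[float],
                grad_v_th1: float, lr: float,
                levels_from_threshold: Callable[[float], int]) -> int:
    """One optimization step in the order A341-A344 (plain SGD is an assumption)."""
    # A341-A342: update synaptic weights and biases from their gradients.
    p.w = [w - lr * g for w, g in zip(p.w, grad_w)]
    p.b = [b - lr * g for b, g in zip(p.b, grad_b)]
    # A343-A344: update the learnable threshold, then re-derive the level count n.
    p.v_th1 -= lr * grad_v_th1
    return levels_from_threshold(p.v_th1)
```

Returning the new level count lets the caller rebuild the AMLF unit with a different n after each step, which is the adaptive behavior described in A344.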

[0111] During backpropagation, gradients are calculated according to the chain rule. The gradients of the loss with respect to the synaptic weight from the j-th neuron in layer (l-1) to the i-th neuron in layer l, and with respect to the bias of the i-th neuron in the l-th layer, are obtained by the following formulas:

[0112]

[0113]

[0114] Where L denotes the loss function and T the number of time steps; the remaining terms are the membrane voltage of the i-th neuron in the l-th layer at time t, the current of the i-th neuron in the l-th layer at time t, and the output of the j-th neuron in the (l-1)-th layer at time t.
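As a worked instance of the chain rule summed over time steps, the sketch below computes the exact weight gradient for the non-spiking output layer only, assuming a softmax cross-entropy on the cumulative voltage u_i = Σ_t Σ_j w_ij o_j^t. There, dL/du_i = p_i - y_i and du_i/dw_ij = Σ_t o_j^t, so no surrogate derivative is needed; hidden spiking layers would additionally require the surrogate derivative of the step function, which this sketch does not cover. Function names are hypothetical, and the result can be checked against finite differences.

```python
import math
from typing import List, Sequence


def output_voltage(w: Sequence[Sequence[float]], spikes: Sequence[Sequence[int]]) -> List[float]:
    # u_i = sum_t sum_j w_ij * o_j^t  (non-spiking output layer)
    return [sum(w_i[j] * o_t[j] for o_t in spikes for j in range(len(o_t))) for w_i in w]


def ce_loss(w, spikes, label: int) -> float:
    u = output_voltage(w, spikes)
    m = max(u)
    log_z = m + math.log(sum(math.exp(v - m) for v in u))
    return log_z - u[label]                 # equals -log softmax(u)[label]


def grad_output_weights(w, spikes, label: int) -> List[List[float]]:
    """Chain rule: dL/dw_ij = dL/du_i * du_i/dw_ij = (p_i - y_i) * sum_t o_j^t."""
    u = output_voltage(w, spikes)
    m = max(u)
    exps = [math.exp(v - m) for v in u]
    z = sum(exps)
    spike_counts = [sum(o_t[j] for o_t in spikes) for j in range(len(spikes[0]))]
    return [[(exps[i] / z - (1.0 if i == label else 0.0)) * spike_counts[j]
             for j in range(len(spike_counts))] for i in range(len(w))]
```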

[0115] In backpropagation, the gradient of the threshold is calculated and the threshold is adaptively updated, which adjusts the number of AMLF levels, n. n equals a fixed available-gradient range divided by the width of the surrogate gradient centered on the threshold. n is calculated as follows:

[0116]

[0117]

[0118] Where α denotes the width of the surrogate gradient centered on the threshold, and width denotes the fixed available-gradient range, initially set to 3. The threshold of the first-level neuron in the l-th layer AMLF unit is set as a learnable parameter, and its gradient is calculated as follows:

[0119]

[0120] Where L denotes the loss function, and the remaining terms are the membrane voltage of the i-th neuron in the l-th layer at time t and the output of the j-th neuron in the l-th layer at time t.
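The level-count rule n = width / α and the threshold derivative through a rectangular surrogate can be sketched as follows. Several pieces are assumptions, since the corresponding equations are not reproduced in this text: α is tied to the learnable threshold as α = 2·V_th1 (the text implies α depends on the threshold, because updating the threshold changes n); level thresholds follow V_th^k = (2k - 1)·V_th1, so the surrogate windows tile [0, width]; n is floored to an integer and kept at least 1; and the rectangular surrogate h(v) = 1/α for |v| < α/2 follows the common STBP choice. With V_th1 = 0.5 and width = 3 this gives α = 1 and n = 3.

```python
import math
from typing import List

WIDTH = 3.0   # fixed available-gradient range ("width"), initial value 3 per the text


def surrogate_width(v_th1: float) -> float:
    # Assumption: rectangle centered on V_th1 reaching down to 0, i.e. alpha = 2 * V_th1.
    return 2.0 * v_th1


def num_levels(v_th1: float, width: float = WIDTH) -> int:
    # n = width / alpha; flooring to an integer and keeping n >= 1 are assumptions.
    return max(1, math.floor(width / surrogate_width(v_th1)))


def level_thresholds(v_th1: float, width: float = WIDTH) -> List[float]:
    # Assumption: V_th^k = (2k - 1) * V_th1, so the n surrogate windows tile [0, n * alpha].
    return [(2 * k - 1) * v_th1 for k in range(1, num_levels(v_th1, width) + 1)]


def rect_surrogate(v: float, alpha: float) -> float:
    # Rectangular pseudo-derivative of the step function: 1/alpha inside the window.
    return 1.0 / alpha if abs(v) < alpha / 2.0 else 0.0


def d_spike_d_threshold(u: float, v_th: float, alpha: float) -> float:
    # o = f(u - V_th), so do/dV_th is approximated by -h(u - V_th).
    return -rect_surrogate(u - v_th, alpha)
```

Under these assumptions, raising the learned threshold widens each surrogate window and reduces the number of levels (V_th1 = 0.75 gives n = 2), while lowering it adds levels (V_th1 = 0.25 gives n = 6).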

[0121] Figure 5 shows, for the AMLF unit in the first layer of the network, how the threshold and the number of levels adapt during training.
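The relation in [0115] between the surrogate width and the level count can be illustrated with a small helper. The formula linking α to the learnable threshold is not recoverable from this text, so the sketch assumes α = 2 × threshold (making the first level's rectangular surrogate window span [0, α]), takes n as the floor of width/α with a minimum of one level, and places level k at threshold + (k - 1) × α. The name `amlf_levels` and all three assumptions are illustrative only.

```python
import math
from typing import List


def amlf_levels(threshold: float, width: float = 3.0) -> List[float]:
    """Per-level firing thresholds of one AMLF unit under the stated assumptions."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    alpha = 2.0 * threshold                  # assumed link between surrogate width and threshold
    n = max(1, math.floor(width / alpha))    # n = width / alpha, floored, at least one level
    # Level k (0-based here) fires at threshold + k * alpha, so the rectangular
    # surrogate windows of width alpha tile the range [0, n * alpha].
    return [threshold + k * alpha for k in range(n)]


print(amlf_levels(0.5))  # [0.5, 1.5, 2.5]
print(amlf_levels(1.0))  # [1.0]
```

With the initial width of 3, a threshold of 0.5 gives three levels, while a threshold that grows to 1.0 leaves room for only one; a shrinking threshold adds levels instead.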

[0122] Specifically, based on the AMLF neuron model and the aforementioned loss function, this embodiment constructs a trainable deep spiking residual neural network, whose overall architecture is shown in Table 1. In Table 1, $N_{b1}$, $N_{b2}$ and $N_{b3}$ denote the number of times the corresponding basic block is stacked, k denotes the kernel size, c1, c2 and c3 denote the number of output channels of the convolutional layers, and $N_c$ denotes the number of categories in the output layer.
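The block ordering stated in claim 1 (two convolutional layers, two temporal batch normalization layers and two AMLF neurons, with spike activation applied before the shortcut addition) can be sketched with the layers passed in as plain callables. This is a toy illustration under stated assumptions: lists of floats stand in for spatiotemporal tensors, the shortcut is an identity (no projection for channel or resolution changes), and `basic_block` and `stack_blocks` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Vec = List[float]                # toy stand-in for a spatiotemporal feature tensor
Layer = Callable[[Vec], Vec]     # convolution, temporal BN or spiking neuron
BlockLayers = Tuple[Layer, Layer, Layer, Layer, Layer, Layer]


def basic_block(x: Vec, conv1: Layer, bn1: Layer, spike1: Layer,
                conv2: Layer, bn2: Layer, spike2: Layer) -> Vec:
    """conv -> BN -> spiking neuron, twice; spikes are emitted before the shortcut add."""
    h = spike1(bn1(conv1(x)))
    h = spike2(bn2(conv2(h)))
    # Identity shortcut added after the second spike activation (assumed: no projection).
    return [a + b for a, b in zip(h, x)]


def stack_blocks(x: Vec, blocks: Sequence[BlockLayers]) -> Vec:
    """Apply basic blocks in sequence, e.g. N_b1 copies in the first stage."""
    for layers in blocks:
        x = basic_block(x, *layers)
    return x


# Toy layers: a doubling "conv", an identity "BN" and a single-level Heaviside "neuron".
def double(v: Vec) -> Vec:
    return [2.0 * a for a in v]


def identity(v: Vec) -> Vec:
    return list(v)


def heaviside(v: Vec) -> Vec:
    return [1.0 if a >= 1.0 else 0.0 for a in v]


toy_block: BlockLayers = (double, identity, heaviside, double, identity, heaviside)
print(stack_blocks([0.0, 0.5, 1.0], [toy_block]))  # [0.0, 1.5, 2.0]
```

Because the addition happens after spiking, the block output is no longer restricted to binary values, as the toy output shows.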

[0123] Table 1 Overall Network Architecture

[0124]

[0125] The connection weights of the network are updated with learnable surrogate gradients and the spatiotemporal backpropagation algorithm to train a deep spiking neural network for gesture recognition, while the neuron thresholds are adaptively updated to adjust the number of levels in each AMLF unit.
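Claim 3 computes the loss from the output cumulative membrane voltage. The loss function itself is defined earlier in the specification and is not reproduced in this section, so the sketch below assumes, for illustration only, that the cumulative voltage is the sum of the output-layer membrane voltage over the T time steps and that the loss is softmax cross-entropy on that sum; the predicted gesture category is taken as the index of the largest cumulative voltage. All function names are hypothetical.

```python
import math
from typing import List


def cumulative_voltage(u_out: List[List[float]]) -> List[float]:
    """Sum output-layer membrane voltage over time steps: u_out[t][c] -> V[c]."""
    return [sum(per_class) for per_class in zip(*u_out)]


def cross_entropy(v: List[float], label: int) -> float:
    """Softmax cross-entropy on the cumulative voltage (assumed loss), computed stably."""
    m = max(v)
    log_z = m + math.log(sum(math.exp(a - m) for a in v))
    return log_z - v[label]


def predict(u_out: List[List[float]]) -> int:
    """Gesture category = index of the largest cumulative membrane voltage."""
    v = cumulative_voltage(u_out)
    return max(range(len(v)), key=lambda c: v[c])


u_out = [[0.5, 1.0, 0.0], [0.5, 2.0, 0.0]]  # toy: T = 2 steps, 3 gesture classes
print(predict(u_out))  # 1
loss = cross_entropy(cumulative_voltage(u_out), label=1)
```

Subtracting the maximum before exponentiating keeps the loss numerically stable when the cumulative voltages grow large over many time steps.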

[0126] In summary, this embodiment discloses a training method for a brain-like gesture recognition model. The model combines the following key technologies: gesture acquisition with an event camera; the leaky integrate-and-fire (LIF) spiking neuron model, which mimics the brain's spike-based transmission of information; an adaptive multi-level threshold firing (AMLF) neuron model built on LIF neurons; a deep spiking neural network structure based on residual networks; and efficient capture of multi-scale spatiotemporal spike firing patterns by the spiking neural network. A brain-like gesture recognition system integrating these technologies provides efficient gesture feature extraction and recognition.

[0127] Specifically, the brain-like gesture recognition model has the following advantages:

[0128] (1) An innovative spiking neuron model with adaptive multi-level firing thresholds is proposed, which can characterize rich spatiotemporal dynamics within a limited time window.

[0129] (2) The designed spiking neuron model improves the expressive power of neurons and effectively alleviates the vanishing gradient problem. The deep spiking neural network built on the residual structure propagates gradients effectively, allowing the spiking neural network to go deeper without network degradation.

[0130] (3) High stability: Experiments show that dynamically adjusting the threshold, and adjusting the number of levels of each neuron unit according to that threshold, adapts better to complex network structures and training requirements, which improves the stability of training deep SNNs.

[0131] (4) High efficiency: Experiments on the non-neuromorphic dataset CIFAR10 and the neuromorphic datasets DVS-Gesture and CIFAR10-DVS show that the adaptive multi-level threshold firing spiking neuron model proposed in this embodiment not only makes network training converge faster but also achieves superior performance, indicating efficiency gains across different types of datasets.

[0132] (5) Low latency: The recognition system has low latency characteristics and can recognize gestures in real time, providing faster response speed for applications.

[0133] Example 2:

[0134] As shown in Figure 6, this embodiment provides a method for gesture category recognition, including:

[0135] B1: Use an event camera to capture the current gesture action to obtain the captured current data; the captured current data is an image including the gesture category.

[0136] B2: Preprocess the collected current data to obtain preprocessed current data.
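The preprocessing in step B2 follows the steps listed in claim 2: count events per (x, y, p) within a time window, divide by the largest count, scale to 255 gray levels to obtain a 2-channel frame, and downsample by average pooling. A minimal pure-Python sketch is shown below; the event tuple layout `(x, y, p, timestamp)`, polarity encoded as 0/1, the half-open window `[t_start, t_start + window)` and the pooling factor are assumptions, and `events_to_frame` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, int, int, float]  # (x, y, p, timestamp), polarity p in {0, 1} (assumed layout)
Frame = List[List[List[float]]]      # [polarity][row][col]


def events_to_frame(events: Sequence[Event], width: int, height: int,
                    t_start: float, window: float, pool: int = 2) -> Frame:
    """Hypothetical helper: event stream -> 2-channel grayscale frame, then average-pooled."""
    # 1. Count events at each (x, y, p) inside the window [t_start, t_start + window).
    counts = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, p, ts in events:
        if t_start <= ts < t_start + window:
            counts[p][y][x] += 1
    # 2. Divide by the largest count over all positions and scale to 255 gray levels.
    peak = max(max(max(row) for row in ch) for ch in counts)
    scale = 255.0 / peak if peak > 0 else 0.0
    frame = [[[c * scale for c in row] for row in ch] for ch in counts]
    # 3. Downsample each channel with non-overlapping pool x pool average pooling.
    out_h, out_w = height // pool, width // pool
    area = pool * pool
    return [
        [
            [sum(ch[r * pool + dr][c * pool + dc]
                 for dr in range(pool) for dc in range(pool)) / area
             for c in range(out_w)]
            for r in range(out_h)
        ]
        for ch in frame
    ]


events = [(0, 0, 0, 0.1), (0, 0, 0, 0.15), (1, 0, 0, 0.2), (3, 3, 1, 0.3), (2, 2, 1, 5.0)]
frame = events_to_frame(events, width=4, height=4, t_start=0.0, window=1.0)
print(frame[0][0][0], frame[1][1][1])  # 95.625 31.875
```

The last event falls outside the window and is ignored; the global maximum (two events at one position) maps to 255 before pooling.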

[0137] B3: Input the preprocessed current data into the trained brain-like gesture recognition model to obtain the gesture category; the trained brain-like gesture recognition model is the model trained according to the method for training brain-like gesture recognition models described in Example 1.

[0138] Example 3:

[0139] As shown in Figure 7, this embodiment provides an apparatus for training a brain-like gesture recognition model, comprising:

[0140] The data acquisition unit M1 is used to acquire gesture actions using an event camera to obtain acquired data; the acquired data includes images of gesture categories.

[0141] The preprocessing unit M2 is used to preprocess the collected data to obtain preprocessed data.

[0142] The training unit M3 is used to train the brain-like gesture recognition model with the preprocessed data as input and the gesture category as output, so as to obtain the trained brain-like gesture recognition model; the brain-like gesture recognition model is a spiking neural network model constructed from the adaptive multi-level threshold spiking neuron model based on LIF neurons and the residual network structure.

[0143] Example 4:

[0144] As shown in Figure 8, this embodiment provides a gesture category recognition device, including:

[0145] The data acquisition unit N1 is used to collect the current gesture action using an event camera and acquire the current data; the acquired current data is an image including the gesture category.

[0146] The current data preprocessing unit N2 is used to preprocess the current data to obtain preprocessed current data.

[0147] The recognition unit N3 is used to input the preprocessed current data into the trained brain-like gesture recognition model to obtain the gesture category; the trained brain-like gesture recognition model is a model trained according to the method for training brain-like gesture recognition models described in Example 1.

[0148] Example 5:

[0149] This embodiment provides an electronic device, including:

[0150] Processor; and

[0151] Memory, in which computer-readable program instructions are stored.

[0152] When executed by the processor, the computer-readable program instructions perform either the method for training a brain-like gesture recognition model described in Example 1 or the method for gesture category recognition described in Example 2.

[0153] Example 6:

[0154] This embodiment provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method for training a brain-like gesture recognition model described in Example 1, or the steps of the method for gesture category recognition described in Example 2.

[0155] The program portion of the technology can be regarded as a "product" or "article of manufacture" in the form of executable code and/or related data, embodied in or implemented through a computer-readable medium. Tangible, permanent storage media can include the memory or storage used by any computer, processor, or similar device or related module, for example various semiconductor memories, tape drives, disk drives, or any similar device capable of providing storage for software.

[0156] All software, or parts thereof, may sometimes communicate via networks, such as the Internet or other communication networks. Such communication can load software from one computer device or processor to another. For example, loading software from a server or host computer of a video object detection device to a hardware platform of a computer environment, or another computer environment that implements the system, or a system with similar functionality related to providing the information needed for object detection. Therefore, another medium capable of transmitting software elements can also be used as a physical connection between local devices, such as light waves, radio waves, electromagnetic waves, etc., propagated through cables, fiber optic cables, or air. Physical media used for carrier waves, such as cables, wireless connections, or fiber optic cables, can also be considered as media carrying software. In this context, unless limited to tangible "storage" media, the term "readable medium" for a computer or machine refers to the medium involved in the execution of any instructions by the processor.

[0157] Furthermore, those skilled in the art will understand that aspects of the present invention can be described and illustrated through several patentable types or situations, including any new and useful combination of processes, machines, products, or substances, or any new and useful improvements thereof. Accordingly, aspects of the present invention can be implemented entirely by hardware, entirely by software (including firmware, resident software, microcode, etc.), or by a combination of hardware and software. All of the above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present invention may be embodied as a computer product located on one or more computer-readable media, the product comprising computer-readable program code.

[0158] Unless otherwise defined, all terms used herein (including technical and scientific terms) shall have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. It should also be understood that terms such as those defined in a common dictionary shall be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and not as having an idealized or highly formalized meaning, unless expressly defined herein.

[0159] The foregoing description is illustrative of the invention and should not be construed as limiting it. Although several exemplary embodiments of the invention have been described, those skilled in the art will readily understand that many modifications can be made to the exemplary embodiments without departing from the novel teachings and advantages of the invention. Therefore, all such modifications are intended to be included within the scope of the invention as defined in the claims. It should be understood that the foregoing description is illustrative of the invention and should not be construed as limiting it to the specific embodiments disclosed, and modifications to the disclosed embodiments and other embodiments are intended to be included within the scope of the appended claims. The invention is defined by the claims and their equivalents.

[0160] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the systems disclosed in the embodiments, since they correspond to the methods disclosed in the embodiments, the descriptions are relatively simple; relevant parts can be referred to the method section.

[0161] This document uses specific examples to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. Furthermore, those skilled in the art will recognize that, based on the ideas of the present invention, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of the present invention.

Claims

1. A method for training a brain-like gesture recognition model, characterized in that it comprises: capturing hand gestures with an event camera to obtain captured data, the captured data including images that include gesture categories; preprocessing the captured data to obtain preprocessed data; and training the brain-like gesture recognition model with the preprocessed data as input and the gesture category corresponding to the preprocessed data as output, to obtain a trained brain-like gesture recognition model; wherein the brain-like gesture recognition model is a spiking neural network model constructed from an adaptive multi-level threshold spiking neuron model based on LIF neurons and a residual network structure. The brain-like gesture recognition model is constructed as follows: a neuron model is constructed, the neuron model being an adaptive multi-level threshold pulse firing neuron model based on LIF neurons that can adaptively update its threshold and number of levels; and the brain-like gesture recognition model is constructed from the residual network structure and the neuron model, the residual network structure performing spike activation before addition with the shortcut connection; the brain-like gesture recognition model consists of several basic blocks, each consisting of two convolutional layers, two batch normalization layers for processing temporal data, and two adaptive multi-level threshold spike firing neuron models based on LIF neurons.

2. The method for training a brain-like gesture recognition model according to claim 1, characterized in that preprocessing the collected data to obtain preprocessed data specifically comprises: determining a time window size and reading all events within that time window; counting the number of events falling at each coordinate (x, y, p) within the time window, where (x, y) is the spatial location of the event point and p is its polarity; dividing the number of events at each position by the largest number of events over all positions to obtain normalized data; multiplying the normalized data by 255 gray levels to obtain a 2-channel grayscale frame image determined by the event frequency; and downsampling the 2-channel grayscale frame image with average pooling to obtain the preprocessed data.

3. The method for training a brain-like gesture recognition model according to claim 1, characterized in that training the brain-like gesture recognition model with the preprocessed data as input and the corresponding gesture category as output, to obtain the trained brain-like gesture recognition model, specifically comprises: determining a loss function; inputting the preprocessed data into the brain-like gesture recognition model to obtain the output cumulative membrane voltage; determining a loss value from the output cumulative membrane voltage, the true label of the spiking neural network, and the loss function, the true label being the gesture category corresponding to the preprocessed data; and optimizing the parameters of the brain-like gesture recognition model based on the loss value to obtain the trained brain-like gesture recognition model.

4. The method for training a brain-like gesture recognition model according to claim 3, characterized in that optimizing the parameters of the brain-like gesture recognition model based on the loss value specifically comprises: calculating, based on the loss value, the gradients of the synaptic weights and biases of the brain-like gesture recognition model; updating the synaptic weights and biases based on these gradients; calculating, based on the loss value, the gradient of the threshold of the adaptive multi-level threshold pulse firing neuron model based on LIF neurons; and adaptively updating that threshold based on its gradient and adjusting the number of levels of the adaptive multi-level threshold pulse firing neuron model based on LIF neurons.

5. A method for gesture category recognition, characterized in that it comprises: capturing the current gesture action with an event camera to obtain current data, the collected current data including images that include gesture categories; preprocessing the collected current data to obtain preprocessed current data; and inputting the preprocessed current data into the trained brain-like gesture recognition model to obtain the gesture category, the trained brain-like gesture recognition model being a model trained by the method for training a brain-like gesture recognition model according to any one of claims 1-4.

6. A device for training a brain-like gesture recognition model, characterized in that it comprises: a data acquisition unit, used to capture hand gestures with an event camera to obtain acquired data, the acquired data including images that include gesture categories; a preprocessing unit, used to preprocess the acquired data to obtain preprocessed data; and a training unit, used to train the brain-like gesture recognition model with the preprocessed data as input and the gesture category as output, so as to obtain the trained brain-like gesture recognition model; wherein the brain-like gesture recognition model is a spiking neural network model constructed from the adaptive multi-level threshold spiking neuron model based on LIF neurons and the residual network structure. The brain-like gesture recognition model is constructed as follows: a neuron model is constructed, the neuron model being an adaptive multi-level threshold pulse firing neuron model based on LIF neurons that can adaptively update its threshold and number of levels; and the brain-like gesture recognition model is constructed from the residual network structure and the neuron model, the residual network structure performing spike activation before addition with the shortcut connection; the brain-like gesture recognition model consists of several basic blocks, each consisting of two convolutional layers, two batch normalization layers for processing temporal data, and two adaptive multi-level threshold spike firing neuron models based on LIF neurons.

7. A device for gesture category recognition, characterized in that it comprises: a data acquisition unit, used to collect the current gesture action with an event camera and acquire current data, the collected current data including images that include gesture categories; a current data preprocessing unit, used to preprocess the current data to obtain preprocessed current data; and a recognition unit, used to input the preprocessed current data into the trained brain-like gesture recognition model to obtain the gesture category, the trained brain-like gesture recognition model being a model trained by the method for training a brain-like gesture recognition model according to any one of claims 1-4.

8. An electronic device, comprising: a processor; and a memory storing computer-readable program instructions, wherein, when the computer-readable program instructions are executed by the processor, the method for training a brain-like gesture recognition model according to any one of claims 1-4, or the method for gesture category recognition according to claim 5, is performed.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, When executed by a processor, the program implements the steps of the method for training a brain-like gesture recognition model as described in any one of claims 1-4, or the steps of the method for gesture category recognition as described in claim 5.