A human action recognition method and device based on a neural network and a medium

By optimizing the correlation between sensor channels through neural networks and generating sensor and feature weights, the problems of data redundancy and low recognition efficiency in multi-sensor human motion recognition are solved, and high-precision and stable motion recognition is achieved.

CN117523675B · Active · Publication Date: 2026-06-26 · SHENZHEN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHENZHEN UNIV
Filing Date
2023-12-20
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies for human motion recognition often fail to achieve ideal results using a single sensor. Multiple sensors lead to data redundancy and low recognition efficiency, and improper sensor deployment makes them susceptible to environmental interference, resulting in poor robustness of the recognition model.

Method used

By acquiring sensing data from multiple sensors, the correlation between sensor channels is explored using a fusion optimization network in a neural network, sensor and feature weights are generated, data redundancy is reduced, sensor selection is optimized, and a compact and effective human motion recognition system is constructed.

Benefits of technology

It improves the accuracy and stability of human motion recognition, reduces data processing costs and computational complexity, and enhances recognition efficiency and robustness.


Abstract

The present disclosure discloses a human action recognition method and device based on a neural network and a medium, belonging to the technical field of machine learning, comprising: acquiring first sensing data through a plurality of sensors placed on different parts of the human body; setting second weights for each sensor according to the feature vectors of the first sensing data through a fusion optimization network and a sensor weight generation network; further obtaining feature weight values through a feature weight network; obtaining a first human action recognition result and calculating a loss value through a classification network according to the feature weight values to train the network; and identifying the human action of second sensing data to be identified according to the trained neural network model. The fusion optimization network explores the correlation between the internal channels of the sensor, fully utilizes the data of each channel in a sensor, reduces the redundancy of the data based on the dual selection of the sensor and the feature, reduces the sensor data, and at the same time ensures the final recognition accuracy and recognition stability.

Description

Technical Field

[0001] This disclosure relates to the field of machine learning technology, and in particular to a method, device and medium for human motion recognition based on neural networks.

Background Technology

[0002] With the development of artificial intelligence and the popularization of wearable sensor devices, wearable sensor-based human activity recognition (HAR) systems, which combine artificial intelligence and Internet of Things technologies, have received increasing attention in recent years because they are widely used in many important fields such as health monitoring, intelligent interaction, and security monitoring.

[0003] Sensors are widely used in various fields due to their high portability, small size, wide applicability, low cost, and low power consumption. Wearable smart devices are also gaining popularity because they are closely related to popular applications such as navigation, human-computer interaction, and health monitoring. Through interconnection with various electronic products, the applications of wearable devices are further expanded, making people's lives more convenient. Typically, wearable devices contain multiple sensors, which collect and analyze data changes to detect human movement in real time.

[0004] Due to the high complexity and diversity of human activities, classifying human motion with a single sensor often yields unsatisfactory results. This is because different motions are highly sensitive to sensor position; a single sensor achieves good results only when the motions are analyzed in advance and the sensor is deployed at an appropriate location. Therefore, human motion recognition tasks typically employ multiple homogeneous or heterogeneous sensors deployed at different locations on the human body. Processing and classifying the time-series data collected by multiple sensors improves recognition accuracy.

Summary of the Invention

[0005] In view of this, the present disclosure aims to provide a method, device and medium for human motion recognition based on neural networks.

[0006] The technical solution disclosed herein is implemented as follows:

[0007] In a first aspect, this disclosure provides a human action recognition method based on a neural network, characterized in that the method includes:

[0008] First sensing data of known movements is obtained by placing multiple sensors on different parts of the human body;

[0009] The feature vector of the first sensing data is input into the neural network model. After obtaining the difference vector between the internal channels of the sensor through the fusion optimization network in the neural network model, the sensor weight generation network in the neural network model uses the difference vector and the feature vector to set the corresponding first weight for each channel in each sensor and the corresponding second weight for each sensor.

[0010] After generating a first feature weight value for each feature based on the feature vector of the first sensing data through the feature weight network in the neural network model, a second feature weight value is generated based on the second weight and the first feature weight value; wherein, the first feature weight value is the local weight of each feature within the corresponding sensor, and the second feature weight value is the global weight of each feature;

[0011] The classification network in the neural network model obtains the first human action recognition result based on the second feature weight value and the feature vector.

[0012] The loss value is calculated based on the first human motion recognition result and the known motion corresponding to the first sensing data, and the network parameters in the neural network model are updated based on the loss value to obtain the trained neural network model.

[0013] The second human motion recognition result is obtained by using the trained neural network model based on the second sensing data to be identified.

[0014] In a second aspect, this disclosure provides a human motion recognition device based on a neural network, the device comprising: a data acquisition section, a first weight section, a second weight section, a first recognition section, an update section, and a second recognition section; wherein,

[0015] The data acquisition section is configured to acquire first sensing data of known actions through multiple sensors placed on different parts of the human body;

[0016] The first weight section is configured to input the feature vector of the first sensing data into the neural network model, obtain the difference vector between the internal channels of the sensor through the fusion optimization network in the neural network model, and then use the difference vector and the feature vector to set a corresponding first weight for each channel in each sensor and a corresponding second weight for each sensor through the sensor weight generation network in the neural network model;

[0017] The second weight section is configured to generate a first feature weight value for each feature based on the feature vector of the first sensing data through the feature weight network in the neural network model, and then generate a second feature weight value based on the second weight and the first feature weight value; wherein, the first feature weight value is the local weight of each feature within the corresponding sensor, and the second feature weight value is the global weight of each feature;

[0018] The first recognition section is configured to obtain a first human action recognition result through the classification network in the neural network model based on the second feature weight value and the feature vector;

[0019] The update section is configured to calculate a loss value based on the first human action recognition result and the known action corresponding to the first sensing data, and update the network parameters in the neural network model based on the loss value to obtain a trained neural network model;

[0020] The second recognition section is configured to input the feature vector of the second sensing data to be recognized into the trained neural network model to obtain the second human motion recognition result.

[0021] In a third aspect, this disclosure provides a computer storage medium storing a neural network-based human motion recognition program, which, when executed by at least one processor, implements the steps of the neural network-based human motion recognition method described in the first aspect.

[0022] This disclosure provides a method, apparatus, and medium for human motion recognition based on neural networks. A sensor weight network generates the sensor channel weights, and a fusion optimization network mines the correlations between channels within a sensor so that the data from every channel of a single sensor is fully used for recognition, avoiding the scalability issues of channel selection based on channel weights. A feature weight network generates first feature weight values based on the feature vectors, which are used for feature selection and removal of redundant features. This dual selection of sensors and features reduces data redundancy, decreases the amount of sensor data, and ensures final recognition accuracy and stability. Furthermore, observing the distribution of feature weights has practical value for studying feature extraction standards and optimizing multi-sensor deployments, which provides interpretability and practical scalability.

Description of the Drawings

[0023] Figure 1 is a schematic diagram of a neural network-based human motion recognition method provided in this disclosure;

[0024] Figure 2 is a schematic diagram of a neural network structure for human motion recognition provided in this disclosure;

[0025] Figure 3 is a schematic diagram of a sensor weight network structure provided in this disclosure;

[0026] Figure 4 is a schematic diagram of an optimization subnetwork structure provided in this disclosure;

[0027] Figure 5 is a schematic diagram of a feature subnetwork structure provided in this disclosure;

[0028] Figure 6 is a schematic diagram of the sensor location distribution for the Skoda dataset provided in this disclosure;

[0029] Figure 7 shows comparison curves of evaluation metrics for the different methods provided in this disclosure on the Skoda (left) dataset;

[0030] Figure 8 shows comparison curves of evaluation metrics for the different methods provided in this disclosure on the Skoda (left) dataset;

[0031] Figure 9 shows comparison curves of evaluation metrics for the different methods provided in this disclosure on the Skoda (right) dataset;

[0032] Figure 10 shows comparison curves of evaluation metrics for the different methods provided in this disclosure on the Skoda (right) dataset;

[0033] Figure 11 is a schematic diagram showing the locations of the top N important sensors selected in this disclosure;

[0034] Figure 12 is a schematic diagram of the confusion matrix obtained from testing on the Skoda dataset provided in this disclosure;

[0035] Figure 13 is a schematic diagram of the feature weight distribution on the Skoda dataset provided in this disclosure;

[0036] Figure 14 is a schematic diagram of a human motion recognition device based on a neural network provided in this disclosure;

[0037] Figure 15 is a schematic diagram of the structure of a computing device provided in this disclosure.

Detailed Implementation

[0038] The terms "first" and "second" in this disclosure are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.

[0039] To more clearly illustrate the technical solutions in this disclosure, the technical solutions in this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this disclosure, and not all embodiments. Based on the embodiments in this disclosure, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this disclosure.

[0040] Multi-sensor-based human motion recognition has wide applications in fitness tracking, healthcare, virtual reality, and motion analytics. Using sensor data to recognize human movements can help monitor health, improve motor skills, or design interactive applications. These sensors can be placed in different parts of the body, such as the wrist, upper arm, waist, and legs, to record data generated during human movement. By analyzing this data, specific human movements or activity patterns can be identified.

[0041] Specifically, N sensor nodes are first fixed to N different parts of the human body. Each sensor node is equipped with one or more sensors, and the collected measurement data is uploaded to a data processing platform via a receiving node. Then, the measurement data from the N sensor nodes is used to obtain human motion recognition results, such as standing, running, climbing stairs, etc.

[0042] Sensors used for human motion recognition tasks are mainly classified into motion sensors, environmental sensors, and other sensors. Among these, motion sensors are crucial and the most widely used, including gyroscopes, accelerometers, magnetometers, and pressure sensors. Gyroscopes are primarily used to measure changes in different directions of motion and angular velocity within a space. Accelerometers measure acceleration and convert it into a corresponding output signal. Magnetometers detect the strength and direction of magnetic fields. Pressure sensors measure pressure changes. Environmental sensors include temperature sensors, barometric pressure sensors, and light sensors, which monitor and record changes in the current environment. Other sensors include additional types such as sound sensors that detect changes in sound waves, skin-like sensors that attach to the skin to detect vital signs like body temperature and muscle activity, and heart rate sensors that detect changes in heart rate.

[0043] Different motion recognition systems employ varying types, numbers, and locations of sensors, which can affect classification performance in ways that are hard to predict. Furthermore, using multiple sensors for data acquisition not only produces excessively large datasets but also collects data with similar or identical trends. Transmitting and processing this redundant data wastes significant energy, contradicting the growing demand for low energy consumption and high recognition efficiency. Additionally, sensors may fail unexpectedly due to harsh deployment environments or signal interference, leading to data gaps and more invalid values. This missing data not only confuses the recognition model and further degrades classification performance, but also places high demands on the model's robustness. Therefore, it is necessary to perform multi-level filtering of redundant sensor data, identifying sensors with higher priority for the current task. This approach reduces sensor data usage while maintaining recognition accuracy, enabling the construction of a more compact and efficient human motion recognition system. Ultimately, this reduces data processing costs and computational complexity, while improving recognition efficiency.

[0044] Referring to Figure 1, this disclosure provides a neural network-based human motion recognition method. By introducing a fusion optimization network to explore the correlation between sensor channels, it fully utilizes internal sensor data for sensor selection, reduces redundant sensor data, lowers data processing costs and computational complexity, and improves recognition efficiency, while ensuring the final recognition accuracy and stability. The method includes:

[0045] S101: Acquire first sensing data of known movements by using multiple sensors placed on different parts of the human body;

[0046] S102: Input the feature vector of the first sensing data into the neural network model, obtain the difference vector between the internal channels of the sensor through the fusion optimization network in the neural network model, and then use the sensor weight generation network in the neural network model to set the corresponding first weight for each channel in each sensor and the corresponding second weight for each sensor using the difference vector and the feature vector.

[0047] S103: After generating a first feature weight value for each feature based on the feature vector of the first sensing data through the feature weight network in the neural network model, a second feature weight value is generated based on the second weight and the first feature weight value; wherein, the first feature weight value is the local weight of each feature within the corresponding sensor, and the second feature weight value is the global weight of each feature;

[0048] S104: Obtain the first human action recognition result through the classification network in the neural network model based on the second feature weight value and the feature vector;

[0049] S105: Calculate the loss value based on the first human motion recognition result and the known motion corresponding to the first sensing data, and update the network parameters in the neural network model based on the loss value to obtain the trained neural network model;

[0050] S106: Based on the second sensing data to be identified, the trained neural network model is used to obtain the second human motion recognition result.

[0051] It should be noted that the sensing data is obtained through measurements taken from multiple sensor nodes fixed to different parts of the human body. Each sensor node may include one or more sensors. Each sensor transmits measurement data through multiple signal channels, each of which is called a channel of the sensor. For example, a gyroscope can simultaneously capture measurement data along the x, y, and z axes, transmitting the measurement data through the x, y, and z channels. The measurement data acquired by a gyroscope includes data from all three channels. Similarly, an accelerometer is used to measure acceleration and convert it into a corresponding output signal. For a three-axis accelerometer, acceleration measurement data for each of the three axes can be obtained through three separate channels.

[0052] Data features provide information about the data to facilitate analysis and understanding. For example, time-series data is described using features such as mean and variance, and image data is described using features such as color histograms and texture features. The sensing data collected in this disclosure is multi-sensor, multi-channel time-series data. For each channel's data, features such as mean, maximum, and minimum values are used to describe the data characteristics of that channel. These feature values constitute a feature vector that describes the data of the corresponding channel. For sensing data of a specific duration, the number of feature vectors extracted equals the total number of channels across all sensors. In the schematic diagram of the neural network structure for human motion recognition shown in Figure 2, feature extraction is used to calculate feature values: the received sensor measurement data is processed through feature extraction to obtain the feature vector of each channel.

[0053] Typically, due to factors such as spatial structure, multiple channels within the same sensor exhibit some correlation. In practical action recognition tasks, channels within the same sensor may show similar trends, which often affects recognition performance. This disclosure uses a fusion optimization network to mine the correlations between channels within a sensor, fully leveraging the data from each channel of a single sensor for recognition. This facilitates sensor selection and avoids the scalability issues associated with channel selection based on channel weights: in practical applications it is difficult to isolate and collect data from specific channels only, so selecting channel data by channel weight has limited scalability. In the neural network structure for human motion recognition shown in Figure 2, the feature vector of the sensing data is input into the neural network model. The fusion optimization network in the neural network model explores the correlation between the internal channels of each sensor and outputs the corresponding difference vector, which is fed back to the sensor weight generation network. Based on the feature vector of the sensing data and the difference vector for the correlation between the internal channels of each sensor, the sensor weight generation network in the neural network model outputs the first weight of each channel and the second weight of each sensor. The sensor weight generation network and the fusion optimization network together form the sensor weight network. Their interaction reduces the differences among the first weights within a sensor, which helps to fully utilize the data within a single sensor, and widens the differences among the second weights of different sensors, making it easier to select sensors with higher priority.

[0054] In the neural network model shown in Figure 2, the feature weight network outputs a first feature weight value for each feature within a sensor based on the feature vector of the sensing data. A second feature weight value, representing the global weight of the feature, is obtained by combining the second weight (the sensor weight) with the first feature weight value (the local weight of the feature within its sensor). The second feature weight value therefore reflects both the sensor's weight information and the feature's weight information. The feature vector input to the neural network model is weighted by the second feature weight value and then fed into the classification network. The classification network is primarily used to optimize the correlation between the weighted features and the target and to obtain the human action recognition result. For the first sensing data used for training, the first human action recognition result output by the classification network is used together with the known action labels to calculate a loss value, which is used to optimize the parameters of the neural network model. This optimization also updates each weight value, including the first weight, second weight, first feature weight value, and second feature weight value. After training, the trained neural network model can output a second human action recognition result for the second sensing data to be recognized.
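The combination of sensor weights and local feature weights described above can be illustrated with a small sketch. The text does not state how the two weights are combined, so the element-wise product used here is an assumption, and the function name `global_feature_weights` and the example numbers are hypothetical.

```python
import numpy as np

def global_feature_weights(second_w, local_w):
    """Combine sensor weights with local feature weights.

    second_w: (sensors,) second weight of each sensor.
    local_w:  (sensors, features_per_sensor) first feature weight values.
    The product is an assumed way of "combining"; the source does not
    specify the operation.
    """
    return second_w[:, None] * local_w

second_w = np.array([0.7, 0.3])                # two sensors
local_w = np.array([[0.5, 0.5], [0.9, 0.1]])   # two features per sensor
global_w = global_feature_weights(second_w, local_w)

features = np.array([[2.0, 4.0], [10.0, 20.0]])
weighted = global_w * features                 # what the classifier would receive
```

Under this assumption, a feature with a high local weight inside a low-priority sensor still receives a small global weight, which matches the stated intent that the second feature weight value carries both kinds of information.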

[0055] In the neural network model shown in Figure 2, the two-level weight network consisting of the sensor weight network and the feature weight network not only considers the correlation between the internal channels of the sensor for sensor selection, but also reduces redundant features. It reduces the number of sensors and internal features used, thereby greatly improving recognition efficiency, while ensuring the final recognition accuracy and recognition stability.

[0056] It should also be noted that, in addition to outputting human motion recognition results from the sensor data to be recognized, the trained neural network model can also output a first weight representing channel priority, a second weight representing sensor priority, a first feature weight representing the priority of internal sensor features, and a second feature weight representing the global priority of features. Observing the distribution of feature weights is of great guiding significance for the study of feature extraction standards and has good practical value in the research of multi-sensor optimized deployment, satisfying the two major requirements of interpretability and practical scalability.

[0057] Optionally, in some examples, inputting the feature vector of the first sensed data into the neural network model includes:

[0058] The first sensing data is divided into multiple data samples by setting a sliding window and a step size;

[0059] For each of the multiple data samples, extract multiple features of each channel of data in each data sample to obtain the feature vector of each channel;

[0060] The multiple data samples are divided into multiple batches, and the feature vector corresponding to each batch is input into the neural network model batch by batch, wherein each batch includes at least one data sample.

[0061] When the acquired sensing data is time-series data of a series of actions, it is usually necessary to segment the sensing data in time order by setting a sliding window and a step size. The sensing data of each time period obtained by segmentation is one data sample. The sliding window and step size are set according to the characteristics of the actions. To ensure the continuity of actions across data samples, a 50% window overlap rate is usually set.
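The segmentation step can be sketched as follows. This is a minimal illustration that assumes the sensing data is stored as a `(time, channels)` array; the function name `sliding_windows` and the choice to drop trailing samples that do not fill a window are assumptions, not details from the disclosure.

```python
import numpy as np

def sliding_windows(data, window, step):
    """Split a (time, channels) array into overlapping windows.

    Returns an array of shape (num_windows, window, channels).
    Trailing samples that do not fill a whole window are dropped.
    """
    data = np.asarray(data)
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    starts = range(0, data.shape[0] - window + 1, step)
    if len(starts) == 0:
        return np.empty((0, window) + data.shape[1:], dtype=data.dtype)
    return np.stack([data[s:s + window] for s in starts])

# 10 time steps, 3 channels; a window of 4 with a step of 2 gives 50% overlap.
signal = np.arange(30).reshape(10, 3)
samples = sliding_windows(signal, window=4, step=2)
```

Setting `step = window // 2` reproduces the 50% overlap rate mentioned above; each element of `samples` is one data sample.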

[0062] Feature values are calculated for each channel of data in each data sample. The feature values of a channel constitute the feature vector of that channel, so the number of feature vectors for each data sample equals the number of channels in that data sample, and each data sample yields a corresponding set of feature vectors. The input data for training the neural network model can be per-sample input, meaning that only the feature vectors of one data sample are input to the neural network model at a time. Alternatively, it can be batch input: the data samples are divided into multiple batches, each containing at least one data sample, and the feature vectors of each batch are input to the neural network model batch by batch. Inputting all sample data once in this way completes one iteration over the sample data, and training is completed by performing multiple iterations. Generally, the number of iterations can be set manually or determined indirectly through convergence conditions. Batch input can improve training efficiency and help converge to suitable parameters faster. To avoid overfitting, the order of sample input is random when using per-sample input; when using batch input, the data samples in a batch can be randomly selected from all the data samples obtained from the segmentation.
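A minimal sketch of randomized batch input over several iterations is shown below. The helper `iterate_batches`, the batch size, and the fixed iteration count are illustrative assumptions; the training step itself is left as a comment.

```python
import numpy as np

def iterate_batches(features, labels, batch_size, rng):
    """Yield (features, labels) batches in a fresh random order."""
    order = rng.permutation(len(features))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield features[idx], labels[idx]

rng = np.random.default_rng(0)
features = np.arange(10 * 6, dtype=float).reshape(10, 6)  # 10 data samples
labels = np.arange(10)

num_iterations = 3  # set manually here; a convergence check is the alternative
seen_per_iteration = []
for _ in range(num_iterations):
    seen = []
    for x_batch, y_batch in iterate_batches(features, labels, 4, rng):
        seen.extend(y_batch.tolist())  # a training step would go here
    seen_per_iteration.append(sorted(seen))
```

Each pass reshuffles the samples, so every sample is visited exactly once per iteration while the batch composition changes between iterations.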

[0063] Optionally, in some examples, extracting multiple features from each channel of the data in each data sample includes:

[0064] Based on the data of each channel in each data sample, the extracted features include the minimum value, maximum value, mean, variance, skewness, kurtosis, and five peak values of the discrete Fourier transform.

[0065] Specifically, in some examples, for a data sequence X of length $N_x$, the feature extraction formulas are as follows:

[0066] Mean:

[0067] Variance:

[0068] Skewness:

[0069] Kurtosis:

[0070] Five peaks of the Discrete Fourier Transform: After performing a Discrete Fourier Transform on the data, the first five peak points of the transform result are taken. The Discrete Fourier Transform is as follows:
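The eleven features listed in [0064] can be sketched in code as follows. This is a minimal illustration, not the disclosure's implementation: it assumes population (1/N) moments for variance, skewness, and kurtosis, it interprets the "five peaks" as the five largest DFT magnitudes, and the function names `channel_features` and `sample_features` are hypothetical.

```python
import numpy as np

def channel_features(x, num_peaks=5):
    """Return the 11-element feature vector of one channel's samples."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    var = x.var()                       # population variance (assumption)
    std = np.sqrt(var)
    if std > 0:
        z = (x - mean) / std
        skew = np.mean(z ** 3)          # third standardized moment
        kurt = np.mean(z ** 4)          # fourth standardized moment
    else:
        skew = kurt = 0.0               # constant signal: use 0 by convention
    spectrum = np.abs(np.fft.rfft(x))   # DFT magnitudes of a real signal
    # "Peaks" are taken as the largest magnitudes (assumption).
    peaks = np.sort(spectrum)[::-1][:num_peaks]
    peaks = np.pad(peaks, (0, num_peaks - len(peaks)))  # zero-pad short spectra
    return np.concatenate(([x.min(), x.max(), mean, var, skew, kurt], peaks))

def sample_features(sample):
    """Map a (time, channels) data sample to a (channels, 11) feature matrix."""
    return np.stack([channel_features(sample[:, c])
                     for c in range(sample.shape[1])])
```

Calling `sample_features` on one data sample yields one feature vector per channel, which matches the statement that the number of feature vectors equals the number of channels.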

[0071] Optionally, in some examples, obtaining the difference vector between channels within the sensor through the fusion optimization network in the neural network model, and setting a corresponding first weight for each channel in each sensor and a corresponding second weight for each sensor through the sensor weight generation network in the neural network model using the difference vector and feature vector, includes:

[0072] For the current batch, the feature vector of the current batch and the difference vector generated by the fusion optimization network in the neural network model based on the feature vector of the previous batch are used. The sensor weight generation network in the neural network model sets a corresponding first weight for each channel in each sensor and a corresponding second weight for each sensor.

[0073] The fusion optimization network in the neural network model generates the difference vector of the current batch based on the first weight and the feature vector of the current batch.

[0074] It should be noted that for the first batch, since there is no difference vector feedback, the sensor weight generation network generates the first and second weights of the first batch based on the feature vector of the first batch, and the fusion optimization network generates the difference vector of the first batch based on the feature vector and the first weight. For the second batch and subsequent batches, the sensor weight generation network generates the first and second weights of the current batch based on the feature vector of the current batch and the difference vector of the previous batch, and the fusion optimization network generates the difference vector of the current batch based on the feature vector and the first weight. For example, for the second batch, the sensor weight generation network generates the first and second weights of the second batch based on the feature vector of the second batch and the difference vector of the first batch, and the fusion optimization network generates the difference vector of the second batch based on the feature vector and the first weight, for use in processing the next batch of data.
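The batch-to-batch feedback described in [0072] to [0074] can be sketched as a simple loop. The callables `weight_net` and `fusion_net` are placeholders for the sensor weight generation network and the fusion optimization network; the toy versions below exist only to make the sketch runnable and are not the networks of this disclosure.

```python
import numpy as np

def run_batches(batches, weight_net, fusion_net):
    """Chain batches so each one uses the previous batch's difference vector."""
    diff = None  # no feedback exists before the first batch
    history = []
    for feats in batches:
        first_w, second_w = weight_net(feats, diff)  # uses the previous diff
        diff = fusion_net(feats, first_w)            # kept for the next batch
        history.append((first_w, second_w, diff))
    return history

# Stand-in networks, for illustration only.
def toy_weight_net(feats, diff):
    scores = feats.mean(axis=(0, 2))                 # one score per channel
    if diff is not None:
        scores = scores * diff
    first_w = np.exp(scores - scores.max())
    first_w /= first_w.sum()
    second_w = first_w.reshape(2, 3).sum(axis=1)     # 2 sensors x 3 channels
    return first_w, second_w

def toy_fusion_net(feats, first_w):
    return 1.0 + (first_w.mean() - first_w)          # one factor per channel

rng = np.random.default_rng(0)
batches = [rng.normal(size=(4, 6, 11)) for _ in range(3)]  # (batch, channels, features)
history = run_batches(batches, toy_weight_net, toy_fusion_net)
```

The only state carried between batches is `diff`, which starts as `None` so that the first batch is processed from its feature vectors alone.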

[0075] Optionally, in some examples, the sensor weight generation network in the neural network model includes a multilayer perceptron and a softmax activation function. Setting, through the sensor weight generation network in the neural network model, a corresponding first weight for each channel in each sensor and a corresponding second weight for each sensor includes:

[0076] The feature vectors of the current batch are adjusted according to the difference vectors of the previous batch to obtain the adjusted feature vector group.

[0077] Extract the average value of each feature vector in the adjusted feature vector group to generate the first average value feature vector;

[0078] Extract the maximum value of each feature vector in the adjusted feature vector group to generate the first maximum value feature vector;

[0079] The first average value feature vector is input into the multilayer perceptron to obtain the first output vector;

[0080] The first maximum value feature vector is input into the multilayer perceptron to obtain the second output vector;

[0081] After summing the first output vector and the second output vector, the first weight corresponding to each channel in each sensor is obtained by using the softmax activation function;

[0082] The second weight is obtained for each sensor based on the first weight.

[0083] As shown in the schematic diagram of the sensor weight generation network structure in Figure 3, the length of the difference vector output by the fusion optimization network equals the total number of channels of all sensors, and each element of the difference vector is the adjustment factor for one channel. The feature vector input to the neural network model is adjusted by the difference vector to obtain the adjusted feature vector. The average and maximum values of the feature vector of each channel are then calculated to obtain the first average value feature vector and the first maximum value feature vector. These two vectors are each passed through the multilayer perceptron to obtain the first output vector and the second output vector. After the first output vector and the second output vector are summed, the channel weight of each channel is obtained through the softmax activation function.
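The channel weight computation above can be illustrated with a small sketch in plain Python. Several details here are assumptions rather than statements of this disclosure: the adjustment by the difference vector is taken to be an element-wise multiplication per channel, the multilayer perceptron is a single hidden layer with ReLU shared by both inputs, the softmax runs over all channels passed in, and the weight matrices `w1` and `w2` are arbitrary example values.

```python
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mlp(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    hidden = [max(0.0, h) for h in matvec(w1, x)]  # ReLU (assumed activation)
    return matvec(w2, hidden)


def softmax(x: Vector) -> Vector:
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def channel_weights(
    features: Matrix, w1: Matrix, w2: Matrix, diff: Optional[Vector] = None
) -> Vector:
    """features[c] is the feature vector of channel c; returns one weight per channel."""
    if diff is not None:
        # Assumed adjustment: scale each channel by its adjustment factor.
        features = [[d * v for v in ch] for ch, d in zip(features, diff)]
    avg_vec = [sum(ch) / len(ch) for ch in features]  # first average value feature vector
    max_vec = [max(ch) for ch in features]            # first maximum value feature vector
    summed = [a + b for a, b in zip(mlp(avg_vec, w1, w2), mlp(max_vec, w1, w2))]
    return softmax(summed)


features = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [2.0, 2.0, 2.0]]
w1 = [[0.5, -0.5, 0.0], [0.0, 0.5, 0.5]]    # 3 channels -> 2 hidden units
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 2 hidden units -> 3 channels
weights = channel_weights(features, w1, w2)
```

Because of the softmax, the returned channel weights are positive and sum to 1.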

[0084] The sensor weights are obtained by weighting the channel weights within the sensor. Optionally, in some examples, obtaining the sensor weight of each of all sensors based on the sensor channel weights includes:

[0085] where $S_k$ is the second weight of the k-th sensor, and $c_{k,i}$ is the first weight of the i-th channel in the k-th sensor.

[0086] Optionally, in some examples, the fusion optimization network in the neural network model generates a difference vector for the current batch based on the first weights and the feature vector of the current batch, including:

[0087] The fusion optimization network in the neural network model includes multiple optimization subnetworks, where each optimization subnetwork corresponds to a sensor;

[0088] For each optimized subnetwork, based on the feature vector of the current batch, the feature vector of all channels in the sensor corresponding to each optimized subnetwork is obtained as the first input data, and the first weight of all channels in the sensor corresponding to the optimized subnetwork is obtained as the second input data.

[0089] Based on the first input data and the second input data, a difference vector for each optimized sub-network is generated using a convolutional neural network;

[0090] The difference vectors generated by all optimized subnetworks are used as the difference vectors for the current batch.

[0091] It should be noted that when the sensor weight generation network generates sensor channel weights and sensor weights based solely on the input feature vector, it fails to consider the correlation between channels. This can lead to insignificant differences in sensor weights across different sensors, hindering sensor selection. The fusion optimization network, through a convolutional neural network, explores the correlation between channels within each sensor and outputs a difference vector to the sensor weight generation network to strengthen the relationships between channels within the sensor. This brings the weights of different channels within the same sensor closer together, further increasing the weight differences between different sensors. This allows for easier selection of efficient sensors and more effective utilization of data within a single sensor.

[0092] Optionally, in some examples, generating the difference vector for each optimized sub-network using a convolutional neural network based on the first input data and the second input data includes:

[0093] Generate a weight difference matrix based on the second input data;

[0094] The average value of each feature vector is extracted from the first input data to obtain the second average feature vector, and the average difference matrix is generated based on the second average feature vector.

[0095] The maximum value of each feature vector is extracted from the first input data to obtain the second maximum value feature vector, and the maximum value difference matrix is generated based on the second maximum value feature vector.

[0096] The average difference matrix is input into the first convolutional neural network to generate a first deep feature vector.

[0097] The maximum value difference matrix is input into the second convolutional neural network to generate a second deep feature vector.

[0098] The sum of the first deep feature vector and the second deep feature vector is multiplied by the weight difference matrix to obtain the difference vector of each optimized subnetwork.

[0099] Figure 4 illustrates the structure of an optimized subnetwork, where each subnetwork corresponds to one sensor. Based on the sensor's feature vector, the optimized subnetwork extracts the second average feature vector and the second maximum value feature vector. Then, based on the first weight, the second average feature vector, and the second maximum value feature vector of each channel within the sensor, the corresponding weight difference matrix, average difference matrix, and maximum value difference matrix are generated. Taking the weight difference matrix as an example, assuming the sensor has d channels, the size of the weight difference matrix D is d×d. The element $D_{i,j}$ of the weight difference matrix D is the absolute difference between the first weight of the i-th channel and the first weight of the j-th channel, that is, $D_{i,j} = |c_i - c_j|$. The average difference matrix and the maximum value difference matrix are obtained in the same way. The average difference matrix and the maximum value difference matrix are then input into their respective convolutional neural networks to further extract deep features between the sensor's internal channels, obtaining the corresponding deep feature vectors. Optionally, in some examples, these two convolutional neural networks have the same structure. The deep feature vectors generated by the two convolutional neural networks are summed and multiplied by the weight difference matrix to obtain a difference vector. The length of this difference vector equals the number of channels of the corresponding sensor, and each element in the difference vector is the adjustment factor for one channel. The feature vector of each channel is adjusted using the corresponding element of the difference vector and then input into the sensor weight generation network. Through the interaction between the sensor weight generation network and the fusion optimization network, the sensor channel weights and the sensor weights are continuously optimized to ultimately obtain a reasonable distribution.
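The construction of the three difference matrices can be sketched as follows. The function names and the example values are hypothetical, and the sketch covers only the pairwise absolute difference $D_{i,j} = |c_i - c_j|$ and its counterparts for the per-channel average and maximum values; the convolutional stage is not modeled because its layer configuration is not given here.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def difference_matrix(values: Vector) -> Matrix:
    """Return the d x d matrix whose (i, j) entry is |values[i] - values[j]|."""
    return [[abs(a - b) for b in values] for a in values]


def sensor_difference_matrices(
    channel_features: Matrix, first_weights: Vector
) -> Tuple[Matrix, Matrix, Matrix]:
    """Build the weight, average, and maximum value difference matrices of one sensor."""
    means = [sum(ch) / len(ch) for ch in channel_features]
    maxima = [max(ch) for ch in channel_features]
    return (
        difference_matrix(first_weights),
        difference_matrix(means),
        difference_matrix(maxima),
    )


first_weights = [0.5, 0.25, 0.25]                       # one weight per channel
channel_features = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]  # d = 3 channels
weight_diff, mean_diff, max_diff = sensor_difference_matrices(
    channel_features, first_weights
)
```

Each matrix is symmetric with a zero diagonal, so channels with similar weights or statistics produce entries close to zero.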

[0100] Optionally, in some examples, generating a first feature weight value for each feature based on the feature vector of the first sensed data through the feature weight network in the neural network model includes:

[0101] The feature weight network in the neural network model includes multiple feature subnetworks, where each feature subnetwork corresponds to a sensor;

[0102] For each feature subnetwork, the feature vector of the sensor corresponding to each feature subnetwork is obtained from the feature vector of the current batch input, and used as the third input data of each feature subnetwork;

[0103] Based on the third input data, the first feature weight value of each feature in the sensor corresponding to each feature sub-network is generated through a self-attention mechanism model.

[0104] Optionally, in some examples, the feature subnetwork is assigned based on the second weights of the sensor.

[0105] Specifically, after obtaining the second weight for each sensor, they are sorted in descending order, retaining sensors with higher weights. Sensors not retained can have their weights set to 0. The feature weight network can process only the retained sensors, meaning that feature subnetworks are assigned only to the retained sensors. For sensors not retained, no feature subnetwork is assigned to save computational resources and improve operational efficiency.
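A minimal sketch of this retention step is given below. The function name `retain_top_sensors` and the number of sensors to keep are illustrative assumptions; the disclosure only states that sensors with higher second weights are retained and that the weights of the others can be set to 0.

```python
from typing import List, Tuple


def retain_top_sensors(sensor_weights: List[float], n_keep: int) -> Tuple[List[float], List[int]]:
    """Keep the n_keep sensors with the largest second weights and zero the rest."""
    order = sorted(range(len(sensor_weights)), key=lambda k: sensor_weights[k], reverse=True)
    kept = set(order[:n_keep])
    masked = [w if k in kept else 0.0 for k, w in enumerate(sensor_weights)]
    return masked, sorted(kept)


masked_weights, kept_sensors = retain_top_sensors([0.1, 0.4, 0.2, 0.3], n_keep=2)
# Feature subnetworks would then be created only for the indices in kept_sensors.
```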

[0106] Optionally, in some examples, generating the first feature weight value for each feature within the sensor corresponding to each feature sub-network using a self-attention mechanism model based on the third input data includes:

[0107] By integrating all feature vectors from the third input data, a first feature vector is obtained;

[0108] The first feature vector is transformed by three one-dimensional convolutions to obtain the second feature vector, the third feature vector, and the fourth feature vector;

[0109] After the second feature vector is transposed and matrix-multiplied by the third feature vector, the probability distribution matrix is obtained by applying the softmax activation function.

[0110] After the fourth feature vector is multiplied by the probability distribution matrix and adjusted by a 1*1 convolution, a first feature weight vector with the same number of features as the sensor is output, wherein each element in the first feature weight vector is the first feature weight value of the corresponding feature in the sensor.

[0111] As shown in the schematic diagram of the feature subnetwork structure in Figure 5, suppose for example that a sensor includes m channels and the length of the feature vector of each channel is $n_c$; the number of features in the sensor is then $m \cdot n_c$. The feature vectors from the m channels in the sensor are integrated to form the first feature vector, whose length is $n_f = m \cdot n_c$. Three one-dimensional convolutions are set up, each with a kernel length of $n_f$ and $n_f$ channels, so that the vector obtained by one-dimensional convolution of the first feature vector has size $1 \times n_f$. The first feature vector is transformed into the second, third, and fourth feature vectors through the three one-dimensional convolutions. The transposed second feature vector is then multiplied by the third feature vector to obtain an $n_f \times n_f$ matrix, which is processed by the softmax activation function, multiplied by the fourth feature vector, and then adjusted by a 1*1 convolution to obtain a first feature weight vector with the same length as the first feature vector. Each element in the first feature weight vector is the first feature weight value of the corresponding sensor feature.
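The data flow of one feature subnetwork can be sketched in plain Python under stated assumptions. A one-dimensional convolution whose kernel spans the whole input (length $n_f$, no padding) with $n_f$ output channels acts as a linear map from $n_f$ inputs to $n_f$ outputs, so each convolution is modeled here as an $n_f \times n_f$ matrix. The row-wise softmax, the treatment of the 1*1 convolution as a single scale and bias, and the identity matrices used as example weights are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def feature_attention(
    x: Vector, w2: Matrix, w3: Matrix, w4: Matrix, scale: float = 1.0, bias: float = 0.0
) -> Vector:
    """Return one first feature weight value per feature of a single sensor."""
    second = matvec(w2, x)  # second feature vector
    third = matvec(w3, x)   # third feature vector
    fourth = matvec(w4, x)  # fourth feature vector
    # Transposed second vector (n_f x 1) times third vector (1 x n_f): n_f x n_f scores.
    scores = [[s * t for t in third] for s in second]
    probs = softmax_rows(scores)  # probability distribution matrix (row-wise, assumed)
    n = len(x)
    # Fourth vector (1 x n_f) times the probability matrix (n_f x n_f).
    mixed = [sum(fourth[i] * probs[i][j] for i in range(n)) for j in range(n)]
    return [scale * v + bias for v in mixed]  # 1*1 convolution modeled as scale and bias


channels = [[1.0, 0.0], [2.0, 1.0]]                 # m = 2 channels, n_c = 2
first_vector = [v for ch in channels for v in ch]   # n_f = m * n_c = 4
n_f = len(first_vector)
identity = [[1.0 if i == j else 0.0 for j in range(n_f)] for i in range(n_f)]
feature_weights = feature_attention(first_vector, identity, identity, identity)
```

The output has one entry per feature of the sensor, matching the length of the first feature vector.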

[0112] Optionally, in some examples, the generation of the second feature weight value based on the second weight and the first feature weight value is as shown in the following formula:

[0113] $w_{k,i,p} = S_k \cdot F_{i,p}$

[0114] where $w_{k,i,p}$ is the second feature weight value of the p-th feature of the i-th channel of the k-th sensor, $S_k$ is the second weight of the k-th sensor, and $F_{i,p}$ is the first feature weight value of the p-th feature in the i-th channel of that sensor.

[0115] It should be noted that each feature subnetwork processes data from a single sensor, and its output first feature weight is the feature weight within that sensor, representing a local feature weight. The second feature weight, determined by the product of the first feature weight and a second weight representing the sensor weight, is the global feature weight. This second feature weight reflects both the feature weight information and the sensor weight information. By sorting the second feature weights in descending order and selecting features with higher weights to obtain a feature subset, both sensor selection and feature selection can be achieved simultaneously.
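The product of the sensor weight and the local feature weight, followed by a descending sort, can be sketched as below. The nested-list layout `local_weights[k][i][p]`, the function names, and the example numbers are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Index = Tuple[int, int, int]  # (sensor k, channel i, feature p)


def global_feature_weights(
    sensor_weights: List[float], local_weights: List[List[List[float]]]
) -> Dict[Index, float]:
    """Compute w[k, i, p] = S[k] * F[i, p] for every feature of every sensor."""
    return {
        (k, i, p): sensor_weights[k] * f
        for k, sensor in enumerate(local_weights)
        for i, channel in enumerate(sensor)
        for p, f in enumerate(channel)
    }


def top_features(weights: Dict[Index, float], count: int) -> List[Index]:
    """Return the indices of the count features with the largest global weights."""
    return sorted(weights, key=lambda idx: weights[idx], reverse=True)[:count]


sensor_weights = [0.75, 0.25]                       # second weights S_k
local_weights = [[[0.5, 0.25]], [[1.0, 0.5]]]       # first feature weights per sensor
weights = global_feature_weights(sensor_weights, local_weights)
subset = top_features(weights, count=2)
```

In this example the feature with the largest local weight belongs to the second sensor, but the low sensor weight moves it to second place, which shows how one ranking covers both sensor selection and feature selection.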

[0116] Optionally, in some examples, calculating the loss value based on the first human motion recognition result and the known motion corresponding to the first sensing data includes:

[0117] Based on the first human motion recognition result and the known motion, the loss value is calculated by introducing a sensor channel weight regularization term, a feature weight regularization term, and a sensor intra-channel weight difference regularization term into the loss function, as shown in the following formula:

[0118]

[0119] where CE(·) is the cross-entropy loss function, ω is the second feature weight vector, c is the first weight vector, $\omega_{k,i,p}$ is the second feature weight value corresponding to the p-th feature of the i-th channel of the k-th sensor, $c_{k,i}$ is the first weight corresponding to the i-th channel of the k-th sensor, and λ1, λ2, and λ3 are hyperparameters.

[0120] Cross-entropy loss is a commonly used loss function in classification problems, used to measure the difference between the model's predictions and the actual labels. In the above formula, CE(ω,c) reflects the difference between the recognition results obtained from the training data based on the feature weight vector ω and the sensor channel weight vector c, and the actual human action labels. The Least Absolute Shrinkage and Selection Operator (Lasso) is a regularization technique for linear regression, designed to reduce model complexity, prevent overfitting, and select the most important features during the modeling process. In the above formula, the sensor channel weight regularization term ensures the sparsity of channel selection; the sensor intra-channel weight difference regularization term ensures the correlation of weights between channels within a single sensor, making full use of the data from a single sensor; and the feature weight regularization term ensures the sparsity of feature selection.

[0121] By optimizing the neural network model through loss calculation, a trained neural network model is obtained. This trained neural network model can be used for embedded feature selection, or as a whole for human action recognition. Alternatively, a feature subset can be obtained from the final second feature weights output by the trained neural network, used for filtered feature selection, and combined with other pre-trained networks for human action recognition.

[0122] Optionally, in some examples, obtaining the second human action recognition result based on the trained neural network model using the second sensing data to be identified includes:

[0123] The feature vector of the second sensing data is input into the trained neural network model to obtain the second human motion recognition result.

[0124] When used for embedded feature selection, the entire trained neural network model serves as a human action recognition model, outputting human action recognition results based on the input sensor data to be recognized.

[0125] Optionally, in other examples, obtaining the second human action recognition result based on the trained neural network model using the second sensing data to be identified includes:

[0126] Obtain the second feature weight value corresponding to the trained neural network model, and use it as the final feature weight;

[0127] A feature subset is obtained by filtering based on the set number of features and the final feature weights;

[0128] Acquire third sensing data of known actions;

[0129] The third sensing data is used to extract features based on the feature subset, which is then used as the fourth input data.

[0130] The fourth input data is input into the pre-trained second neural network model for training to obtain a trained second neural network model;

[0131] After extracting features from the second sensing data based on the feature subset, the results are input into the trained second neural network model to obtain the second human motion recognition result of the second sensing data.

[0132] It should be noted that, when filtered feature selection is used, all features are sorted according to the final second feature weight values corresponding to the trained neural network model, and a feature subset is obtained by selecting features in that order up to the set number of features. Features are extracted from the training sensing data according to this feature subset and then input into another pre-trained second neural network model for training, resulting in a trained second neural network model. Features are then extracted from the second sensing data to be identified according to the same feature subset and input into the trained second neural network model, which outputs the second human motion recognition result.

[0133] This disclosure evaluates the performance of the aforementioned human motion recognition model using precision, recall, accuracy, weightF (F1-measure), and AUC (area under the receiver operating characteristic curve). Given the predicted values and the ground truth labels, the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) can be counted, and these counts are used to calculate the evaluation metrics.

[0134] Precision: refers to the proportion of true positive samples out of all samples judged as positive. The formula is as follows:

[0135] $\mathrm{Precision} = \dfrac{TP}{TP + FP}$

[0136] Recall: The proportion of all true positive samples that are identified as positive samples. The formula is as follows:

[0137] $\mathrm{Recall} = \dfrac{TP}{TP + FN}$

[0138] Accuracy: refers to the proportion of samples that are correctly classified out of all samples, and its formula is as follows:

[0139] Where q is the total number of sample categories;

[0140] weightF: Also known as the F1-measure, it takes into account both precision and recall, and can measure the overall performance of the model. Its formula is as follows:

[0141] where $w_z$ is the proportion of samples of the z-th class among all samples;

[0142] AUC: The area under the ROC curve, used to measure the performance of a model. A higher value indicates better classification performance. The formula is as follows:

[0143]
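The metrics defined in words above can be computed from label lists as in the following sketch. It is an illustration under assumptions: per-class precision and recall are computed one class against the rest, weightF is taken as the per-class F1 weighted by the class proportion $w_z$, and AUC is omitted because its exact multi-class form is not stated here. The function names and the toy labels are hypothetical.

```python
from typing import Dict, Sequence


def class_report(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> Dict[str, float]:
    """Precision, recall, and F1 for one class, treating all other classes as negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Sum of per-class F1 scores, each weighted by that class's share of the samples."""
    total = len(y_true)
    score = 0.0
    for label in sorted(set(y_true)):
        share = sum(1 for t in y_true if t == label) / total  # w_z
        score += share * class_report(y_true, y_pred, label)["f1"]
    return score


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
report_class_1 = class_report(y_true, y_pred, label=1)
overall_accuracy = accuracy(y_true, y_pred)
overall_weighted_f1 = weighted_f1(y_true, y_pred)
```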

[0144] This disclosure uses the Skoda dataset from the UCI Machine Learning Repository for experimental testing.

[0145] The Skoda dataset contains data on 10 different hand gestures performed by one subject in a car repair scenario, including writing on a notebook, opening and closing the hood, checking the front door gap, opening and closing the left front door, closing both left doors, checking the trunk gap, opening and closing the trunk, and checking the steering wheel. All data in this dataset was collected using 2*10 USB triaxial accelerometers placed on both arms at a sampling rate of 98 Hz. Figure 6 shows the sensor location distribution of the Skoda dataset, with the numbers indicating sensor numbers. For each sensor, this dataset contains calibrated acceleration data (labeled "calibrated") and raw acceleration data based on ADC readings. In the experiments, the sensor data from the two arms are tested separately.

[0146] This disclosure uses a sliding window method to divide the data samples. For the Skoda dataset, the sliding window size is 2s, the step size is set to 1s, and the window overlap rate is set to 50%, thereby ensuring the continuity of actions in the data samples.
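As an illustration of this segmentation, the sketch below cuts a one-dimensional signal into windows using the stated 2 s window and 1 s step at the 98 Hz sampling rate of the Skoda dataset. The function name `sliding_windows` and the synthetic 5-second signal are assumptions made for the example.

```python
from typing import List, Sequence


def sliding_windows(
    samples: Sequence[float], rate_hz: int, window_s: float, step_s: float
) -> List[Sequence[float]]:
    """Split samples into fixed-length windows; an incomplete final window is dropped."""
    window = int(round(window_s * rate_hz))
    step = int(round(step_s * rate_hz))
    return [samples[start:start + window] for start in range(0, len(samples) - window + 1, step)]


rate_hz = 98
signal = list(range(5 * rate_hz))  # 5 seconds of synthetic samples
windows = sliding_windows(signal, rate_hz, window_s=2.0, step_s=1.0)
overlap = 1 - (1.0 * rate_hz) / (2.0 * rate_hz)  # fraction shared by adjacent windows
```

With these settings each window holds 196 samples and consecutive windows share 98 of them, which corresponds to the 50% overlap rate.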

[0147] Since the loss function disclosed herein includes a norm regularization term, which is itself non-differentiable, a hyperbolic tangent function is used in the experiments to approximate the regularization term in the function, as shown in the following formula:

[0148] $|x| \approx x f(x) = a \cdot x \cdot \tanh(\gamma x)$, where a and γ are parameters, and γ ≫ 1.

[0149] The loss function can be updated to the following form:

[0150]

[0151] This formula effectively avoids the difficulty of obtaining sparse solutions during training that is caused by the non-differentiability of the regularization term itself. In the actual experiments, a = 1 and γ = 100 were set. Repeated random subsampling was used to test the dataset: the dataset was divided into training and test sets at a ratio of 8:2, the training and testing process was repeated 10 times, and the average value was used as the final result. The entire model was optimized using the stochastic gradient descent (SGD) optimizer, with a batch size of 64, a weight decay of 0.0001, and 40 iterations. The hyperparameters (λ1, λ2, λ3) were selected from the set {1e-5, 2e-5, 5e-5, ..., 5e-1, 1, 2, 5}, choosing the combination that produced the fewest mislabeled samples. For the classification network, the pre-trained ResNet18 provided by PyTorch was used as the starting point for optimization.
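The smooth approximation of the absolute value can be checked numerically with a short sketch. The function names `smooth_abs` and `smooth_abs_grad` are hypothetical; the default values a = 1 and γ = 100 follow the experimental settings, and the derivative is obtained from the product rule applied to a·x·tanh(γx).

```python
import math


def smooth_abs(x: float, a: float = 1.0, gamma: float = 100.0) -> float:
    """Differentiable surrogate for |x|: a * x * tanh(gamma * x)."""
    return a * x * math.tanh(gamma * x)


def smooth_abs_grad(x: float, a: float = 1.0, gamma: float = 100.0) -> float:
    """Derivative of the surrogate: a * (tanh(gamma x) + gamma x (1 - tanh(gamma x)^2))."""
    t = math.tanh(gamma * x)
    return a * (t + gamma * x * (1.0 - t * t))


value_at_half = smooth_abs(0.5)
value_at_minus_half = smooth_abs(-0.5)
grad_at_zero = smooth_abs_grad(0.0)
```

Away from zero the surrogate matches |x| closely, while at zero its derivative is defined and equal to 0, unlike the derivative of |x| itself.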

[0152] The neural network proposed in this disclosure is compared with other methods, including group-based Lasso (I-CNN), discriminative pruning based on ensemble learning (MSF-EP), and channel-selective convolutional neural networks (ResNet+SelectConv). For ease of description, the neural network model proposed in this disclosure is referred to as dfLasso-Net in the experimental data below. The experimental tests can be divided into two categories: one comparing dfLasso-Net as an embedded feature selection method with other methods, and the other comparing dfLasso-Net as a filtered feature selection method.

[0153] When used as an embedded feature selection method, the test results are shown in Table 1. The method proposed in this disclosure achieves the best recognition performance on the Skoda dataset. MSF-EP performs poorly, while I-CNN yields the worst results.

[0154]

[0155] Table 1

[0156] When used as a filtered feature selection method, the test results are shown in Table 2. dfLasso-Net achieved the best average recognition accuracy on the Skoda dataset. I-CNN was slightly inferior, but generally outperformed MSF-EP. As for traditional lasso-based methods, the traditional sparse group lasso (sgLasso) method and the fused lasso method performed similarly on the Skoda dataset.

[0157]

[0158] Table 2

[0159] To verify the effectiveness and stability of the algorithm disclosed herein in the recognition process, these methods were tested within a given range of the feature count K, as shown in Figures 7-10. When K is small, the recognition performance of all methods improves significantly as K increases, indicating that when the number of retained sensors is small, effective features are crucial for improving recognition performance. Once K increases to a certain range, the performance improvement of all methods becomes less significant, indicating that there is a threshold for the number of effective sensors in HAR tasks, and that additional sensors and internal features beyond it cannot further improve recognition performance. Overall, the method proposed in this disclosure achieves the best recognition performance and stability among the compared methods, outperforming not only the traditional sparse group lasso and fused lasso methods but also the neural-network-based I-CNN. The proposed dfLasso-Net has a significant advantage when the number of features is small, indicating that, compared with the other methods, dfLasso-Net can more efficiently identify and extract the most effective data information for the current task.

[0160] In Figure 11, the locations of the top N most important sensors selected by dfLasso-Net are marked, in order to observe whether it can accurately locate the corresponding sensors for different actions. Here, N is set to half the number of sensors in the dataset. As can be seen from Figure 11, for the Skoda dataset, the sensors are primarily located on the upper and lower parts of both arms. Whether testing with data from the left or the right arm, dfLasso-Net's sensor location selection is relatively even, avoiding a situation where the majority of selected sensors are on the upper arm or on the lower arm. This is because the action categories in this dataset involve large hand movements, often requiring simultaneous movement of both the upper and lower arm. Overall, dfLasso-Net effectively and dynamically selects the most efficient sensors for different tasks based on their specific requirements.

[0161] Figure 12 presents the confusion matrix obtained from testing dfLasso-Net. By observing the number and location of correctly classified and misclassified samples in the confusion matrix, the actual recognition performance of dfLasso-Net can be analyzed. On the Skoda dataset, different classification results were obtained using data from different arms. When testing with the left arm data, errors mainly occurred in "opening and closing the hood," while in the right arm data tests, errors mainly occurred in "opening the left front door" and "closing the left front door," because these actions are highly correlated and easily confused. Overall, the dfLasso-Net proposed in this disclosure recognizes most action categories well, basically achieving the expected goals.

[0162] A good action recognition method not only achieves high recognition accuracy but also provides reasonable explanations and interpretations. Therefore, this section illustrates the sensor selection results of this model by observing the distribution of feature weights on different sensors. Figure 13 shows the distribution of feature weights, which confirms the expected conclusions of the experiment. The horizontal axis of these graphs represents the extracted feature names, and the vertical axis corresponds to the x, y, and z axes of the internal channels of multiple sensors located on different body parts. It can be seen that the distribution of important features is closely related to the sensor priority selected by dfLasso-Net. For sensors with higher priority, the internal feature weights are significantly higher, while for sensors with lower priority, the internal feature weights are suppressed to smaller values. With the smoothing effect of fused lasso, the feature weights within the same sensor are closer to each other and significantly different from those of other sensors. Selecting the sensors that are more important to the task based on sensor weights thus becomes easier, better avoiding the fluctuations in recognition performance caused by very similar sensor weights. Furthermore, the selection of features within each sensor also differs, which is very helpful for subsequent feature selection. Observing the distribution of feature weights is of great guiding significance for studying feature extraction standards for different datasets. In summary, the proposed dfLasso-Net has good practical value in multi-sensor optimization deployment research, meeting the two major requirements of interpretability and practical extensibility.

[0163] To verify the effectiveness of the different modules in the proposed method, this section conducts further tests. Specifically, the sensor weight network (denoted as sennet), the fusion optimization network (denoted as optnet), and the feature weight network (denoted as feanet) are each removed in turn, and the corresponding results after removal are tested. All test results are shown in Table 3. Since the network cannot calculate feature weights after the feature weight network is removed, the average test results in that case are calculated over different numbers of sensors. In the other cases, the average test results are calculated over a given range K of selected features. As can be seen from the table, when the sensor weight network is removed, the network only calculates the feature weights of all sensors and loses the sensor selection function, and all test results decrease. This shows that in HAR tasks, selecting sensors useful for the current task can effectively improve recognition performance. The results obtained by removing only the fusion optimization network are also lower than those of the original network, indicating that the fusion optimization network based on fused lasso does improve network performance. When the feature weight network is removed, the results are closer to those of the original network, because the network retains only the sensor selection function and the results are based on using all features within different numbers of sensors. The original network, by contrast, retains effective features during the feature selection process, avoiding the impact of redundant features on recognition performance.

[0164]

[0165] Table 3

[0166] Based on the same inventive concept as the aforementioned technical solution, and referring to Figure 14, this disclosure provides a neural network-based human motion recognition device. The device includes: a data acquisition part 1401, a first weight part 1402, a second weight part 1403, a first recognition part 1404, an update part 1405, and a second recognition part 1406; wherein:

[0167] The data acquisition part 1401 is configured to acquire first sensing data of known actions through multiple sensors placed on different parts of the human body;

[0168] The first weight part 1402 is configured to input the feature vector of the first sensing data into a neural network model, obtain the difference vector between the internal channels of the sensor through the fusion optimization network in the neural network model, and then, according to the difference vector and the feature vector, set a corresponding first weight and a corresponding second weight for each channel in each sensor through the sensor weight generation network in the neural network model;

[0169] The second weight part 1403 is configured to generate a first feature weight value for each feature based on the feature vector of the first sensing data through the feature weight network in the neural network model, and then generate a second feature weight value based on the second weight and the first feature weight value; wherein the first feature weight value is the local weight of each feature within the corresponding sensor, and the second feature weight value is the global weight of each feature;

[0170] The first recognition part 1404 is configured to obtain a first human action recognition result through the classification network in the neural network model based on the second feature weight value and the feature vector;

[0171] The update part 1405 is configured to calculate a loss value based on the first human action recognition result and the known action corresponding to the first sensing data, and update the network parameters in the neural network model based on the loss value to obtain a trained neural network model;

[0172] The second recognition part 1406 is configured to input the feature vector of the sensor sensing data to be recognized into the trained neural network model to obtain the second human motion recognition result.

[0173] It should be noted that, for the specific implementation of the functions configured in each "part" of the above-mentioned device, reference may be made to the implementation methods and examples of the corresponding steps in the neural network-based human motion recognition method shown in Figure 1, which are not repeated here.

[0174] Figure 15 shows a structural block diagram of a computing device provided in an exemplary embodiment of the present disclosure. In some examples, the computing device 150 can be at least one of a smartphone, smartwatch, desktop computer, laptop computer, virtual reality terminal, augmented reality terminal, and wireless terminal. The computing device 150 has communication capabilities and can access wired or wireless networks. The computing device 150 can refer to one of multiple terminals, and those skilled in the art will understand that the number of such terminals can be larger or smaller. In some examples, the computing device 150 can receive data from multiple sensors over the accessed wired or wireless network. It is understood that, in the technical solution of the present disclosure, the computing device 150 undertakes the calculation and processing work after the sensor data is acquired, and the present disclosure does not limit this.

[0175] The computing device in this application may include one or more of the following components: processor 1510 and memory 1520.

[0176] Optionally, the processor 1510 connects to various parts of the computing device using various interfaces and lines, and performs various functions and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1520, and by calling data stored in the memory 1520. Optionally, the processor 1510 can be implemented using at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 1510 can integrate one or a combination of several of the following: Central Processing Unit (CPU), Graphics Processing Unit (GPU), Neural-network Processing Unit (NPU), and baseband chip. The CPU primarily handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content to be displayed on the touch screen; the NPU is used to implement Artificial Intelligence (AI) functions; and the baseband chip is used to handle wireless communication. It is understandable that the aforementioned baseband chip may not be integrated into the processor 1510, but may be implemented as a separate chip.

[0177] The memory 1520 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 1520 may include a non-transitory computer-readable storage medium. The memory 1520 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1520 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as touch function, sound playback function, image playback function, etc.), instructions for implementing the various method embodiments described below, etc.; the data storage area may store data created according to the use of the computing device, etc.

[0178] In addition, those skilled in the art will understand that the structure of the computing device described above does not constitute a limitation on the computing device. The computing device may include more or fewer components than shown in the figure, or combine certain components, or have different component arrangements. For example, the computing device may also include a display screen, camera assembly, microphone, speaker, radio frequency circuit, input unit, sensors (such as accelerometer, angular velocity sensor, light sensor, etc.), audio circuit, WiFi module, power supply, Bluetooth module, etc., which will not be described in detail here.

[0179] This disclosure provides a computer storage medium storing a neural network-based human motion recognition program. When the neural network-based human motion recognition program is executed by at least one processor 1510, it implements the steps of the neural network-based human motion recognition method described in the above technical solution.

[0180] This disclosure also provides a computer program product including computer instructions stored in a computer-readable storage medium; a processor 1510 of a computing device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computing device to perform the neural network-based human motion recognition method described in the above embodiments.

[0181] It should be noted that the technical solutions described in this disclosure can be combined arbitrarily as long as they do not conflict.

[0182] The above description is merely a specific embodiment of this disclosure, but the scope of protection of this disclosure is not limited thereto. Any variations or substitutions that those skilled in the art can readily conceive within the technical scope disclosed herein should be included within the scope of protection of this disclosure. Therefore, the scope of protection of this disclosure should be determined by the scope of the claims.

Claims

1. A human motion recognition method based on a neural network, characterized in that the method comprises:
acquiring first sensing data of known motions through multiple sensors placed on different parts of the human body;
inputting the feature vector of the first sensing data into a neural network model; after a difference vector between the internal channels of each sensor is obtained through a fusion optimization network in the neural network model, setting, through a sensor weight generation network in the neural network model and using the difference vector and the feature vector, a corresponding first weight for each channel in each sensor and a corresponding second weight for each sensor;
after generating a first feature weight value for each feature based on the feature vector of the first sensing data through a feature weight network in the neural network model, generating a second feature weight value based on the second weight and the first feature weight value, wherein the first feature weight value is the local weight of each feature within the corresponding sensor, and the second feature weight value is the global weight of each feature;
obtaining a first human motion recognition result through a classification network in the neural network model based on the second feature weight value and the feature vector;
calculating a loss value based on the first human motion recognition result and the known motion corresponding to the first sensing data, and updating the network parameters in the neural network model based on the loss value to obtain a trained neural network model; and
obtaining a second human motion recognition result through the trained neural network model based on second sensing data to be identified;
wherein the step of inputting the feature vector of the first sensing data into the neural network model comprises:
dividing the first sensing data into multiple data samples by setting a sliding window and a step size;
for each of the multiple data samples, extracting multiple features from each channel of data in the data sample to obtain the feature vector of each channel; and
dividing the multiple data samples into multiple batches, and inputting the feature vector corresponding to each batch into the neural network model batch by batch, wherein each batch includes at least one data sample;
wherein the step of obtaining the difference vector between the internal channels of each sensor through the fusion optimization network, and then setting, through the sensor weight generation network and using the difference vector and the feature vector, a corresponding first weight for each channel in each sensor and a corresponding second weight for each sensor, comprises:
for the current batch, using the feature vector of the current batch and the difference vector generated by the fusion optimization network in the neural network model based on the feature vector of the previous batch, setting, through the sensor weight generation network in the neural network model, a corresponding first weight for each channel in each sensor and a corresponding second weight for each sensor; and
generating, through the fusion optimization network in the neural network model, the difference vector of the current batch based on the first weights and the feature vector of the current batch;
wherein the step of generating, through the fusion optimization network in the neural network model, the difference vector of the current batch based on the first weights and the feature vector of the current batch comprises:
the fusion optimization network in the neural network model includes multiple optimization subnetworks, each optimization subnetwork corresponding to one sensor;
for each optimization subnetwork, obtaining, from the feature vector of the current batch, the feature vectors of all channels in the sensor corresponding to that optimization subnetwork as first input data, and obtaining the first weights of all channels in the sensor corresponding to that optimization subnetwork as second input data;
generating a difference vector for each optimization subnetwork using a convolutional neural network based on the first input data and the second input data; and
using the difference vectors generated by all optimization subnetworks as the difference vector of the current batch.
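
As a non-limiting illustration, the sliding-window division, per-channel feature extraction, and batching recited in claim 1 can be sketched as follows. The function names, the window and step values, and the particular feature set (mean, standard deviation, minimum, maximum) are assumptions chosen for the example; the claim does not specify them.

```python
import numpy as np

def sliding_windows(data, window, step):
    """Split a (time, channels) array into samples of length `window`, advancing by `step`."""
    starts = range(0, data.shape[0] - window + 1, step)
    return np.stack([data[s:s + window] for s in starts])  # (samples, window, channels)

def channel_features(samples):
    """Per-channel features of each sample. The feature set here is assumed, not from the claims."""
    feats = [samples.mean(axis=1), samples.std(axis=1),
             samples.min(axis=1), samples.max(axis=1)]
    return np.stack(feats, axis=-1)  # (samples, channels, features)

def batches(features, batch_size):
    """Yield consecutive batches; each batch holds at least one data sample."""
    for i in range(0, len(features), batch_size):
        yield features[i:i + batch_size]

# Toy recording: 100 time steps, 6 channels (for example, two 3-axis sensors).
rng = np.random.default_rng(0)
recording = rng.normal(size=(100, 6))
samples = sliding_windows(recording, window=20, step=10)
features = channel_features(samples)
batch_list = list(batches(features, batch_size=4))
```

A step size smaller than the window, as used here, makes consecutive data samples overlap.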

2. The method according to claim 1, characterized in that the step of generating the difference vector for each optimization subnetwork using a convolutional neural network based on the first input data and the second input data comprises: generating a weight difference matrix based on the second input data; extracting the average value of each feature vector from the first input data to obtain a second average feature vector, and generating an average difference matrix based on the second average feature vector; extracting the maximum value of each feature vector from the first input data to obtain a second maximum value feature vector, and generating a maximum value difference matrix based on the second maximum value feature vector; inputting the average difference matrix into a first convolutional neural network to generate a first deep feature vector; inputting the maximum value difference matrix into a second convolutional neural network to generate a second deep feature vector; and multiplying the sum of the first deep feature vector and the second deep feature vector by the weight difference matrix to obtain the difference vector of each optimization subnetwork.
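
As a non-limiting illustration, the data flow of claim 2 within one optimization subnetwork can be sketched as follows. Several points are assumptions made only for this example: a "difference matrix" is taken to be the pairwise difference D[i, j] = v[i] - v[j] between channels, the final multiplication is taken to be element-wise, and the two convolutional neural networks are replaced by hypothetical stand-in callables (`cnn_avg`, `cnn_max`), so the output here is a matrix rather than a vector.

```python
import numpy as np

def pairwise_difference(v):
    """D[i, j] = v[i] - v[j] for a 1-D vector v (assumed meaning of 'difference matrix')."""
    v = np.asarray(v, dtype=float)
    return v[:, None] - v[None, :]

def subnetwork_difference(channel_feats, channel_weights, cnn_avg, cnn_max):
    """channel_feats: (channels, features); channel_weights: (channels,)."""
    weight_diff = pairwise_difference(channel_weights)          # from the second input data
    avg_diff = pairwise_difference(channel_feats.mean(axis=1))  # average of each feature vector
    max_diff = pairwise_difference(channel_feats.max(axis=1))   # maximum of each feature vector
    deep_avg = cnn_avg(avg_diff)   # stand-in for the first convolutional neural network
    deep_max = cnn_max(max_diff)   # stand-in for the second convolutional neural network
    # Sum of the two deep features multiplied by the weight difference matrix
    # (element-wise product is an assumption).
    return (deep_avg + deep_max) * weight_diff

identity = lambda m: m  # hypothetical placeholder; a trained CNN would go here
feats = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])   # 2 channels, 3 features each
weights = np.array([0.7, 0.3])        # first weights of the two channels
diff = subnetwork_difference(feats, weights, identity, identity)
```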

3. The method according to claim 1, characterized in that the step of calculating the loss value based on the first human motion recognition result and the known motion corresponding to the first sensing data comprises: based on the first human motion recognition result and the known motion, calculating the loss value by introducing a sensor channel weight regularization term, a feature weight regularization term, and a sensor intra-channel weight difference regularization term into the loss function, as shown in the following formula: [formula not reproduced in this text], where (·) represents the cross-entropy loss function, and the remaining symbols, which are likewise not reproduced in this text, denote in order: the second feature weight vector; the first weight vector; the second feature weight value corresponding to the p-th feature of the i-th channel of the k-th sensor; the first weight corresponding to the i-th channel of the k-th sensor; and the hyperparameters.
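
As a non-limiting illustration, a loss that combines cross-entropy with the three regularization terms named in claim 3 can be sketched as follows. The formula of the claim is not available in this text, so the concrete form of each term is an assumption made only for this example: L1 penalties on the first weights and on the second feature weight values, and a squared deviation from the per-sensor mean as the intra-sensor channel weight difference term. The hyperparameter names `lam1`, `lam2`, and `lam3` are likewise hypothetical.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy for class probabilities (N, C) and integer labels (N,)."""
    eps = 1e-12  # avoids log(0)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))

def regularized_loss(probs, labels, first_w, second_feat_w, lam1, lam2, lam3):
    """first_w: (sensors, channels); second_feat_w: (sensors, channels, features)."""
    ce = cross_entropy(probs, labels)
    channel_term = np.abs(first_w).sum()        # assumed L1 penalty on channel weights
    feature_term = np.abs(second_feat_w).sum()  # assumed L1 penalty on feature weights
    # Assumed form: spread of the channel weights inside each sensor.
    intra_term = ((first_w - first_w.mean(axis=1, keepdims=True)) ** 2).sum()
    return ce + lam1 * channel_term + lam2 * feature_term + lam3 * intra_term

probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
first_w = np.array([[0.6, 0.4], [0.5, 0.5]])   # 2 sensors, 2 channels each
second_feat_w = np.full((2, 2, 3), 0.1)        # 3 features per channel
loss = regularized_loss(probs, labels, first_w, second_feat_w, 0.01, 0.01, 0.1)
```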

4. The method according to claim 1, characterized in that the step of obtaining a second human motion recognition result through the trained neural network model based on the second sensing data to be identified comprises: inputting the feature vector of the second sensing data into the trained neural network model to obtain the second human motion recognition result.

5. The method according to claim 1, characterized in that the step of obtaining a second human motion recognition result through the trained neural network model based on the second sensing data to be identified comprises: obtaining the second feature weight value corresponding to the trained neural network model and using it as the final feature weight; obtaining a feature subset by filtering based on a set number of features and the final feature weights; acquiring third sensing data; extracting features from the third sensing data based on the feature subset and using the extracted features as fourth input data; inputting the fourth input data into another pre-trained second neural network model for training to obtain a trained second neural network model; and, after extracting features from the second sensing data based on the feature subset, inputting the extracted features into the trained second neural network model to obtain the second human motion recognition result of the second sensing data.
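
As a non-limiting illustration, the filtering step of claim 5 can be sketched as a top-k selection over the final feature weights. Selecting the features with the largest weights is an assumption about how the filtering is performed; the function name and the example weights are hypothetical.

```python
import numpy as np

def select_feature_subset(final_weights, num_features):
    """Return the indices of the `num_features` largest weights, highest weight first."""
    order = np.argsort(final_weights)[::-1]
    return order[:num_features]

final_weights = np.array([0.05, 0.40, 0.10, 0.30, 0.15])  # one weight per feature
subset = select_feature_subset(final_weights, num_features=3)

# Apply the same subset to any later data (third or second sensing data).
all_features = np.arange(10).reshape(2, 5)  # 2 samples, 5 features each
reduced = all_features[:, subset]
```

Because only the selected columns are extracted for the second neural network model, sensors whose features are all excluded from the subset no longer need to be read.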

6. A human motion recognition device based on a neural network, the device comprising: a data acquisition part, a first weighting part, a second weighting part, a first recognition part, an update part, and a second recognition part; wherein
the data acquisition part is configured to acquire first sensing data of known motions through multiple sensors placed on different parts of the human body;
the first weighting part is configured to input the feature vector of the first sensing data into the neural network model, obtain the difference vector between the internal channels of each sensor through the fusion optimization network in the neural network model, and then set, through the sensor weight generation network in the neural network model and using the difference vector and the feature vector, a corresponding first weight for each channel in each sensor and a corresponding second weight for each sensor;
the second weighting part is configured to generate a first feature weight value for each feature based on the feature vector of the first sensing data through the feature weight network in the neural network model, and then generate a second feature weight value based on the second weight and the first feature weight value, wherein the first feature weight value is the local weight of each feature within the corresponding sensor, and the second feature weight value is the global weight of each feature;
the first recognition part is configured to obtain a first human motion recognition result through the classification network in the neural network model based on the second feature weight value and the feature vector;
the update part is configured to calculate a loss value based on the first human motion recognition result and the known motion corresponding to the first sensing data, and update the network parameters in the neural network model based on the loss value to obtain a trained neural network model;
the second recognition part is configured to input the feature vector of the sensing data to be identified into the trained neural network model to obtain the second human motion recognition result;
the first weighting part is further configured to: divide the first sensing data into multiple data samples by using a set sliding window and step size; for each of the multiple data samples, extract multiple features from each channel of data in the data sample to obtain a feature vector for each channel; and divide the multiple data samples into multiple batches, and input the feature vector corresponding to each batch into the neural network model batch by batch, wherein each batch includes at least one data sample; and
for the current batch, using the feature vector of the current batch and the difference vector generated by the fusion optimization network in the neural network model based on the feature vector of the previous batch, set, through the sensor weight generation network in the neural network model, a corresponding first weight for each channel in each sensor and a corresponding second weight for each sensor; and generate, through the fusion optimization network in the neural network model, the difference vector of the current batch based on the first weights and the feature vector of the current batch; and
wherein the fusion optimization network in the neural network model includes multiple optimization subnetworks, each optimization subnetwork corresponding to one sensor; for each optimization subnetwork, the feature vectors of all channels in the sensor corresponding to that optimization subnetwork are obtained from the feature vector of the current batch as first input data, and the first weights of all channels in the sensor corresponding to that optimization subnetwork are obtained as second input data; a difference vector for each optimization subnetwork is generated using a convolutional neural network based on the first input data and the second input data; and the difference vectors generated by all optimization subnetworks are used as the difference vector of the current batch.
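
As a non-limiting illustration, the relationship between the local (first) and global (second) feature weight values handled by the second weighting part can be sketched as follows. The claims state only that the second feature weight value is generated based on the second weight and the first feature weight value; taking it as their product is an assumption made for this example, and the function name is hypothetical.

```python
import numpy as np

def global_feature_weights(sensor_weights, local_feature_weights):
    """sensor_weights: (sensors,); local_feature_weights: (sensors, features).

    Returns the assumed global weight of each feature: the second weight of its
    sensor multiplied by the feature's local weight within that sensor.
    """
    return sensor_weights[:, None] * local_feature_weights

sensor_weights = np.array([0.8, 0.2])             # second weight of each sensor
local_weights = np.array([[0.5, 0.5],
                          [0.9, 0.1]])            # first feature weight values
global_weights = global_feature_weights(sensor_weights, local_weights)
```

Under this assumption, a feature that dominates within a low-weight sensor (0.9 locally in the second sensor) still receives a smaller global weight than the features of a high-weight sensor.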

7. A computer storage medium storing a neural network-based human motion recognition program, wherein the neural network-based human motion recognition program, when executed by at least one processor, implements the steps of the neural network-based human motion recognition method according to any one of claims 1 to 5.