Millimeter wave radar human continuous action recognition method, model construction method and system

By constructing an initial temporal feature decoupled detection network model and an adaptive speed gating algorithm, the problem of accurate recognition of continuous human movements by radar in complex indoor environments is solved, achieving high-precision, low-false-alarm movement recognition, which is suitable for smart elderly care and medical monitoring.

CN122244945A · Pending · Publication Date: 2026-06-19 · CHINA JILIANG UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHINA JILIANG UNIV
Filing Date
2026-03-23
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing radar motion recognition methods struggle to accurately detect and recognize continuous, naturally occurring human motion flows in complex indoor environments. They are also susceptible to interference from static environmental clutter and the micro-Doppler effect of the target, resulting in insufficient robustness in feature extraction.

Method used

An initial temporal feature decoupling detection network model is constructed, including a multi-scale temporal feature extraction module, a feature pyramid network, and a temporal decoupling detection head. Through end-to-end training, accurate segmentation and recognition of continuous actions are achieved. In addition, an adaptive velocity gating algorithm is combined to generate high-quality micro-Doppler time-frequency maps and suppress dynamic clutter interference.

Benefits of technology

It significantly improves the accuracy and robustness of continuous motion recognition in complex indoor environments, reduces the false alarm rate, and provides a high-performance, privacy-preserving solution that works around the clock, suitable for smart elderly care and medical monitoring scenarios.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a millimeter-wave radar method, model construction method, and system for recognizing continuous human movements. The method first uses millimeter-wave radar to acquire raw intermediate-frequency signals of human movement. Then, it generates a high signal-to-noise ratio time-spectrum map through dynamic clutter suppression based on moving target focusing and adaptive velocity-resolved micro-Doppler feature extraction algorithms. Next, it constructs an initial temporal feature decoupling detection network, which includes a multi-scale temporal feature extraction module, a feature pyramid network sensitive to action boundaries, and a temporal decoupling detection head. Finally, it achieves accurate segmentation and recognition of continuous movements through end-to-end training. This invention effectively solves the technical problem of ambiguous boundaries in continuous movements through a spatial-action dimension decoupling mechanism of temporal features. It achieves high-precision real-time recognition of continuous movements such as falls, getting up, and walking in millimeter-wave radar systems while protecting user privacy, making it suitable for scenarios such as smart elderly care and medical monitoring.

Description

Technical Field

[0001] This invention relates to the field of radar signal processing and artificial intelligence, specifically to a millimeter-wave radar method, model building method and system for recognizing continuous human movements, applicable to fields such as human behavior perception, smart health monitoring, and intelligent security, and particularly to a method for robust continuous movement recognition in complex indoor environments with non-target dynamic interference.

Background Technology

[0002] Traditional camera-based human motion recognition technology has inherent drawbacks such as light sensitivity and invasion of personal privacy, limiting its application in dimly lit environments or private settings such as bedrooms and bathrooms. Wearable solutions based on inertial sensors are invasive and rely on user cooperation, resulting in poor compliance among the elderly, rehabilitation patients, and other groups.

[0003] Millimeter-wave radar sensing technology offers unique advantages in privacy-preserving intelligent sensing because it is non-contact, image-free, and unaffected by lighting conditions. However, most existing radar action recognition methods only classify isolated, pre-defined single action segments, and lack the ability to automatically segment and recognize continuous, naturally occurring human action flows. Furthermore, radar signals are susceptible to interference from environmental static clutter and the micro-Doppler effect of targets, resulting in insufficient robustness in feature extraction and difficulty in achieving stable and reliable action parsing in complex scenarios. Therefore, how to achieve accurate temporal detection and recognition of continuous human actions while protecting user privacy has become a pressing technical problem in this field.

Summary of the Invention

[0004] The purpose of this invention is to provide a millimeter-wave radar method, model construction method and system for continuous human motion recognition, so as to overcome the limitations of current non-contact human behavior recognition. The specific technical solution is as follows:
In a first aspect, the present invention proposes a method for constructing a millimeter-wave radar human continuous motion recognition model, comprising the following steps: An initial temporal feature decoupling detection network model is constructed. The input of the model is a sequence of micro-Doppler time-spectrum maps, and the output is the segmentation boundaries and category labels of continuous actions. The model is trained end-to-end using a continuous action dataset with time annotations, and the corresponding human action recognition model is output after training is completed.
The initial temporal feature decoupling detection network model includes a multi-scale temporal feature extraction module, a feature pyramid network, and a temporal decoupling detection head connected in sequence. The input of the multi-scale temporal feature extraction module is a micro-Doppler time-spectrum map sequence, and the output is a multi-scale spatiotemporal feature map. The input to the feature pyramid network is the multi-scale spatiotemporal feature map, and the output is a multi-scale fused feature map. The input to the temporal decoupling detection head is the multi-scale fused feature map, and the output is the action category probability and the temporal boundary position.
As one possible implementation, a two-stage strategy is used to train the initial temporal feature decoupling detection network model end-to-end. The specific training steps are as follows:
Stage 1: Pre-train the initial temporal feature decoupling detection network model using isolated action samples to obtain the pre-trained model. An isolated action sample is a micro-Doppler time-spectrum map segment containing a single action category. Each sample is labeled with the action category and a temporal bounding box covering the entire action segment.
Stage 2: Fine-tune the pre-trained model using continuous action sequences to obtain the final human action recognition model. A continuous action sequence is a micro-Doppler time-spectrum map sequence containing multiple consecutive actions, with each sample labeled with the category of each action segment in the sequence and its precise start and end timestamps.

[0005] As one possible implementation, the multi-scale temporal feature extraction module embeds a spatiotemporal separation convolution module that extracts spatial and temporal features separately and outputs a multi-scale spatiotemporal feature map. For a basic unit, the input feature $X_t$ of frame t is taken as input, the spatial feature $F_{spa}$ and the temporal feature $F_{tem}$ are extracted separately, and the spatiotemporal feature $F_{st}$ is obtained through adaptive fusion. The calculation process is as follows:
$$F_{st} = \sigma(W_{spa}) \odot F_{spa} + \sigma(W_{tem}) \odot F_{tem}$$
where $F_{spa}$ is the spatial feature, $F_{tem}$ is the temporal feature, $F_{st}$ is the fused spatiotemporal feature, $X_t$ is the input feature of the t-th frame in the micro-Doppler time-spectrum map sequence, $\sigma$ is the Sigmoid activation function, $W_{spa}$ and $W_{tem}$ are learnable weight tensors, and $\odot$ denotes element-wise multiplication. The fused spatiotemporal feature $F_{st}$ is the output of the unit, and the outputs of stacked units form the corresponding multi-scale spatiotemporal feature map.

[0006] As one possible implementation, the temporal decoupling detection head includes two parallel branches: an action classification branch and a temporal boundary regression branch. The two branches share the multi-scale fused feature map output by the feature pyramid network as input, and each has an independent top-level convolutional layer. The action classification branch is used to predict action category probabilities, and the temporal boundary regression branch is used to predict the temporal boundary location.
As one possible implementation, the loss function used in the second stage of fine-tuning is:
$$L = \lambda_1 L_{cls} + \lambda_2 L_{box} + \lambda_3 L_{time}$$
where $L_{cls}$ is the classification loss, $L_{box}$ is the bounding box regression loss, $L_{time}$ is the temporal boundary loss, and $\lambda_1$, $\lambda_2$, $\lambda_3$ are the corresponding weighting coefficients. The first stage performs backpropagation updates based solely on the classification loss and the bounding box regression loss.
As one possible implementation, the temporal boundary loss $L_{time}$ is an additional boundary refinement loss: a smooth loss is calculated between the predicted and true start and end points to ensure boundary accuracy. For each positive sample, let the predicted start time offset be $\Delta t_s$ and the predicted end time offset be $\Delta t_e$, with corresponding true offsets $\Delta t_s^{*}$ and $\Delta t_e^{*}$. Then $L_{time}$ is calculated using Smooth L1 Loss:
$$L_{time} = \frac{1}{N} \sum_{i=1}^{N} \left[ \mathrm{SmoothL1}\left(\Delta t_{s,i} - \Delta t_{s,i}^{*}\right) + \mathrm{SmoothL1}\left(\Delta t_{e,i} - \Delta t_{e,i}^{*}\right) \right]$$
where N is the total number of positive samples in the current batch, and the SmoothL1 function is defined as:
$$\mathrm{SmoothL1}(x) = \begin{cases} 0.5\,x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

[0007] In a second aspect, this invention proposes a millimeter-wave radar method for continuous human motion recognition, comprising the following steps: Intermediate frequency signals of human movement are acquired in real time using millimeter-wave radar to obtain the original intermediate frequency signal. The original intermediate frequency signal is preprocessed and a micro-Doppler time-spectrum map is generated from it. The micro-Doppler time-spectrum map is input into the human motion recognition model trained by any of the methods described above, and the model outputs the segmentation boundaries and category labels of continuous motion as the recognition result.

[0008] As one possible implementation, the signal preprocessing specifically includes:
a. Basic signal processing and static clutter suppression: The original intermediate frequency signal is subjected to fast Fourier transform in the range dimension and the Doppler dimension, and static background reflection is eliminated by the phasor mean cancellation method to generate an initial range-Doppler matrix. The phasor mean cancellation method generates a reference template of the static background reflection by averaging the signal along the slow-time dimension over multiple consecutive frames, and subtracts the template from the current frame signal.
b. Dynamic clutter suppression: A dynamic clutter map is constructed and updated in real time. By analyzing the spatiotemporal characteristics of targets in the range-velocity domain, human targets are distinguished from non-human dynamic interference sources. The energy identified as non-target interference is updated into the dynamic clutter map and canceled against the initial range-Doppler matrix to obtain the clean range-Doppler matrix.
c. Adaptive speed gating: The effective speed range containing the main human motion energy is automatically extracted from the clean range-Doppler matrix based on a preset energy threshold.
d. Time-spectrum generation: The Doppler information within the effective speed range is arranged along the time axis to generate a micro-Doppler time-spectrum map.

[0009] In a third aspect, this invention proposes a millimeter-wave radar human continuous motion recognition model construction system, comprising: a construction unit, used to construct the initial temporal feature decoupling detection network model; and a training unit, used to perform end-to-end training on the initial temporal feature decoupling detection network model using a continuous action dataset with time labels, and to output the corresponding human action recognition model after training is completed.

[0010] In a fourth aspect, this invention proposes a millimeter-wave radar continuous human motion recognition system, comprising: a millimeter-wave radar module, used to collect human motion signals; a signal processing module, which preprocesses and extracts features from the original intermediate frequency signal to generate a micro-Doppler time-spectrum map; and a deep learning processing module, equipped with a human motion recognition model trained by the method described in any one of the above claims, for continuous motion recognition.

[0011] Beneficial effects of the embodiments in this application: This invention provides a millimeter-wave radar method, model construction method, and system for continuous human motion recognition. It generates high-quality micro-Doppler time-frequency maps through an innovative adaptive velocity gating algorithm and combines it with a temporal feature decoupling detection network (introducing a temporal feature fusion module and a decoupling detection head) to achieve end-to-end accurate segmentation and recognition of continuous motion streams, directly outputting action segments with timestamps. This method effectively overcomes the limitations of traditional solutions in temporal modeling and action boundary ambiguity. While ensuring all-weather, privacy-aware advantages, it significantly improves the recognition accuracy and robustness of complex continuous actions (such as fall and stand-up sequences). Furthermore, the lightweight model design facilitates deployment in practical edge computing devices, providing a high-performance, full-stack solution for scenarios such as smart elderly care and medical monitoring. In addition, by introducing dynamic clutter suppression based on moving target focusing, this invention effectively filters out non-target dynamic interference such as fans and pets, significantly reducing the false alarm rate in complex indoor environments.

[0012] Of course, implementing any product or method of this application does not necessarily require achieving all of the advantages described above at the same time.

Description of the Drawings

[0013] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0014] Figure 1 is a flowchart of a method for constructing a millimeter-wave radar human continuous motion recognition model, provided in an embodiment of this application; Figure 2 is a schematic diagram of a temporal feature decoupling detection network based on an improved YOLO architecture, provided in an embodiment of this application; Figure 3 is a flowchart of a millimeter-wave radar continuous motion recognition method based on time-series detection, provided in an embodiment of this application; Figure 4 is a schematic diagram of the structure of a millimeter-wave radar continuous motion recognition system based on time-series detection, provided in an embodiment of this application.

Detailed Implementation

[0015] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0016] Example 1: A method for constructing a continuous human motion recognition model using millimeter-wave radar. As shown in Figure 1, it includes the following steps:

[0017] Step 100: Construct the initial temporal feature decoupling detection network model. Specifically, an initial temporal feature decoupling detection network model based on an improved YOLO architecture is constructed for temporal segmentation and recognition of continuous actions. As shown in Figure 2, its structure includes a multi-scale temporal feature extraction module, an action boundary-sensitive feature pyramid network, and a temporal decoupling detection head, connected in sequence. Specifically:

[0018] Step 110: Multi-scale temporal feature extraction module. The input to the multi-scale temporal feature extraction module is a preprocessed, aligned micro-Doppler time-spectrum map sequence, and the output is an enhanced feature map that integrates multi-scale spatiotemporal information. The core of this module is the Spatiotemporally Separable Block (ST-Separable Block), which decouples the extraction of spatial texture features from the extraction of temporal motion features. For the t-th frame feature $X_t$ in the input feature map sequence, the processing procedure is as follows:
Spatial feature branch: A standard 3×3 convolution captures local spatial texture and shape features within the current frame, giving the spatial feature $F_{spa}$.
Temporal feature branch: Centered on frame t, the features of the (2k+1) frames spanning the k frames before and after it (where k is the temporal perception radius) are extracted and concatenated along the channel dimension. The concatenated features are then convolved along the temporal dimension with a 1×3 kernel (height 1, width 3) to explicitly model short-term temporal dependencies and extract inter-frame motion change features, giving the temporal feature $F_{tem}$.
Adaptive feature fusion: Learnable spatial and temporal attention weights $W_{spa}$ and $W_{tem}$ are introduced, and gated fusion through the Sigmoid activation function $\sigma$ lets the network adaptively emphasize the feature dimensions that matter most for the current task:
$$F_{st} = \sigma(W_{spa}) \odot F_{spa} + \sigma(W_{tem}) \odot F_{tem}$$
where $\odot$ denotes element-wise multiplication. The output $F_{st}$ of this module is passed to subsequent networks as the enhanced spatiotemporal feature. The spatiotemporally separable blocks are stacked repeatedly in the backbone to form multiple stages, constructing a multi-scale feature pyramid that captures human motion patterns at different time scales.
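The gated fusion and the temporal windowing described in Step 110 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the array layout `(T, C, H, W)`, and the clamping of frame indices at the ends of the sequence are all hypothetical, and the convolutions themselves are omitted.

```python
import numpy as np

def sigmoid(x):
    """Logistic function used as the gate."""
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_spa, f_tem, w_spa, w_tem):
    """Fuse spatial and temporal features with sigmoid gates.

    Computes sigmoid(w_spa) * f_spa + sigmoid(w_tem) * f_tem element-wise.
    The weights are learnable in a real network; here they are plain arrays
    that broadcast against the features (an assumption of this sketch).
    """
    return sigmoid(w_spa) * f_spa + sigmoid(w_tem) * f_tem

def temporal_window(frames, t, k):
    """Concatenate the (2k+1) frames centred on frame t along the channel axis.

    frames has shape (T, C, H, W). Indices outside the sequence are clamped
    to the first or last frame, which is an assumed boundary policy.
    """
    num_frames = frames.shape[0]
    idx = np.clip(np.arange(t - k, t + k + 1), 0, num_frames - 1)
    return np.concatenate([frames[i] for i in idx], axis=0)
```

With all gate weights at zero each gate equals 0.5, so the fused output is simply the average of the two branches; training would move the gates away from that neutral point.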

[0019] Step 120: Action boundary-sensitive feature pyramid network. The input to the feature pyramid network is the multi-level feature map output by the multi-scale temporal feature extraction module, and the output is a boundary-enhanced multi-scale fused feature. In this embodiment, the module introduces a boundary-aware weight mechanism on top of the traditional feature pyramid structure (FPN+PAN) to enhance the feature response at action boundaries. Specifically, it includes:
FPN+PAN structure: High-level semantic information is fused through an upsampling path and low-level detail information is fused through a downsampling path, as indicated in Figure 2 by the labels "Upsampling path: fusing high-level semantics" and "Downsampling path: fusing low-level details".
Boundary-aware weighting mechanism: Along the feature fusion path, in addition to the usual channel and spatial attention, a boundary sensitivity coefficient is calculated for the feature maps of each level. First, the inter-frame gradient of the feature map is calculated along the time dimension to obtain a gradient feature map reflecting the intensity of motion. Then, a lightweight sub-network consisting of two 1×1 convolutional layers maps the gradient feature map to a boundary sensitivity coefficient. Finally, before feature addition or concatenation, this coefficient is used to reweight the original feature map. This strengthens the response at action boundaries at the feature level and weakens the features of the stable segments inside an action, making the network more sensitive to the moment of action switching.
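The boundary-aware weighting of Step 120 can be illustrated with a small NumPy sketch. It is a simplified reading of the text under several assumptions: features are reduced to one vector per frame with shape `(T, C)`, each 1×1 convolution is written as a per-frame matrix product, and the ReLU and Sigmoid nonlinearities as well as the names `boundary_weights` and `reweight` are hypothetical choices that the source does not specify.

```python
import numpy as np

def boundary_weights(feat, w1, w2):
    """Map inter-frame gradients to a boundary sensitivity coefficient.

    feat: (T, C) per-frame features.
    w1:   (C, C_mid) weights of the first 1x1 convolution.
    w2:   (C_mid, 1) weights of the second 1x1 convolution.
    Returns a (T, 1) coefficient in (0, 1).
    """
    # Inter-frame gradient along time; the first frame gets a zero gradient.
    grad = np.abs(np.diff(feat, axis=0, prepend=feat[:1]))
    hidden = np.maximum(grad @ w1, 0.0)        # ReLU (assumed nonlinearity)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # Sigmoid squashing (assumed)

def reweight(feat, coeff):
    """Scale each frame's features by its boundary coefficient."""
    return feat * coeff
```

In this sketch, frames where the features change abruptly receive a coefficient close to 1, while frames inside a stable segment stay near the neutral value 0.5, which mirrors the stated goal of emphasizing action switching moments.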

[0020] Step 130: Temporal decoupling detection head. The input to the temporal decoupling detection head is the multi-scale fused feature output by the feature pyramid network, and the output is the action category probability and the temporal boundary position. Traditional detection heads couple classification and localization (bounding box regression) together. For the temporal action segmentation task, as shown in the head section of Figure 2, this invention designs a temporal decoupling detection head that includes two parallel sub-branches with clearly separated responsibilities:
Action classification branch: Responsible for predicting the probability of each action (such as falling, walking, getting up) occurring at each preset anchor point on the timeline or at each time position.
Temporal boundary regression branch: Independently responsible for predicting the precise start time and end time of action segments, expressed as time offsets relative to the current detection position.
The two branches share the underlying features (that is, the input multi-scale fused feature map), but each has its own independent top-level convolutional layer for task-specific feature extraction. The independent convolutional layer in the classification branch focuses on action semantic discrimination, while the independent convolutional layer in the regression branch focuses on precise boundary localization. This "shared bottom layer + independent top layer" structure decouples action semantic recognition from temporal boundary localization and avoids the mutual interference between the two tasks found in traditional coupled detection heads. Each branch can focus on its own target, which improves both classification accuracy and boundary localization accuracy.
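The "shared bottom layer + independent top layer" idea can be sketched as two separate linear maps applied to the same per-position features. This is an illustrative reduction, assuming features of shape `(T, C)`, one 1×1 convolution per branch written as a matrix product, and a softmax over classes. The source does not state which normalization the classification branch uses, and all names here are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def decoupled_head(shared, w_cls, w_reg):
    """Apply two independent top layers to the same shared features.

    shared: (T, C) fused features, one vector per time position.
    w_cls:  (C, K) classification weights for K action classes.
    w_reg:  (C, 2) regression weights for start and end offsets.
    Returns class probabilities (T, K) and offsets (T, 2).
    """
    class_prob = softmax(shared @ w_cls)  # classification branch
    offsets = shared @ w_reg              # boundary regression branch
    return class_prob, offsets
```

Because `w_cls` and `w_reg` share no parameters, a gradient step driven by the classification loss leaves the regression weights untouched in this sketch, and vice versa; only the shared features see both signals.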

[0021] Step 200: Use the time-labeled continuous action dataset to train the initial temporal feature decoupling detection network model end-to-end, and output the corresponding human action recognition model after training is completed; To achieve progressive learning from isolated actions to continuous actions and effectively address the issues of blurred action boundaries and category confusion in continuous scenes, this embodiment employs a two-stage training approach. The specific training steps are as follows:

[0022] Step 210: Generate the first training data based on the continuous action dataset. The first training data consists of isolated samples labeled with time and action, where each isolated sample is a micro-Doppler time-spectrum map corresponding to an isolated action segment (for example, a single fall or a single walking cycle). Each sample contains a single action category and a bounding box label covering the entire segment.

[0023] Step 220: Isolated action pre-training. Based on the first training data, the initial temporal feature decoupling detection network model is pre-trained on isolated actions to obtain the corresponding pre-trained model.
Objective: To initialize the network parameters so that the backbone and the classification branch learn discriminative features of basic actions.
Input: First training dataset (isolated action samples).
Loss function: The classification loss and the bounding box regression loss are primarily used, and the ability of the temporal feature extraction module to capture long-term temporal dependencies is temporarily frozen or weakened.
Convergence criteria: After each training epoch, the model's classification accuracy and boundary regression accuracy are evaluated on the isolated action validation set. If these metrics no longer improve, or the improvement stays below a preset threshold for several consecutive epochs, training stops and the current model is saved as the pre-trained model.
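The convergence criterion of Step 220 is a form of early stopping, which can be sketched in plain Python. The `patience` and `min_delta` values are placeholders, since the source only mentions "several consecutive epochs" and "a preset threshold"; the function name is likewise hypothetical.

```python
def should_stop(history, patience=5, min_delta=1e-3):
    """Decide whether validation performance has plateaued.

    history: validation metric per epoch, where higher is better.
    Training stops when the best value over the last `patience` epochs
    exceeds the best earlier value by less than `min_delta`.
    """
    if len(history) <= patience:
        return False  # not enough epochs to judge yet
    best_before = max(history[:-patience])
    recent_best = max(history[-patience:])
    return recent_best - best_before < min_delta
```

For example, `should_stop([0.5, 0.6, 0.7, 0.7, 0.7, 0.7], patience=3, min_delta=0.01)` returns `True`, because the last three epochs did not improve on the earlier best of 0.7. The same check could be reused for the fine-tuning stage with a different metric.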

[0024] Step 230: Prepare the second training data. The second training data is generated based on the continuous action dataset and consists of the micro-Doppler time-spectrum maps corresponding to continuous action sequences. Each sample is labeled with the category of every action segment in the sequence and its precise start and end timestamps (for example, a long sequence containing "walking → falling down → getting up → walking").

[0025] Step 240: Fine-tuning on continuous action sequences. The pre-trained model is trained on the second training data to obtain the corresponding human motion recognition model.
Objective: To adapt the model to continuous action flow, with a focus on optimizing its temporal segmentation capability and its ability to discriminate transitions between actions.
Input: Second training dataset (continuous action sequences).
Loss function: All network layers are unfrozen and trained using the full loss function L:
$$L = \lambda_1 L_{cls} + \lambda_2 L_{box} + \lambda_3 L_{time}$$
where $L_{cls}$ is the action classification loss, for which Focal Loss with label smoothing is used to handle the class imbalance problem; $L_{box}$ is the temporal bounding box regression loss, for which CIoU Loss is adopted to better measure the overlap and consistency between the predicted action interval and the true labeled interval on the time axis; and $\lambda_1$, $\lambda_2$, $\lambda_3$ are balancing weight coefficients, set based on experimental experience to $\lambda_1 = 0.5$, $\lambda_2 = 0.3$, $\lambda_3 = 0.2$. $L_{time}$ is an additional boundary refinement loss: a smooth loss is calculated between the predicted and true start and end points to ensure boundary accuracy. For each positive sample, let the predicted start time offset be $\Delta t_s$ and the predicted end time offset be $\Delta t_e$, with corresponding true offsets $\Delta t_s^{*}$ and $\Delta t_e^{*}$. Then $L_{time}$ is calculated using Smooth L1 Loss:
$$L_{time} = \frac{1}{N} \sum_{i=1}^{N} \left[ \mathrm{SmoothL1}\left(\Delta t_{s,i} - \Delta t_{s,i}^{*}\right) + \mathrm{SmoothL1}\left(\Delta t_{e,i} - \Delta t_{e,i}^{*}\right) \right]$$
where N is the total number of positive samples in the current batch, and the SmoothL1 function is defined as:
$$\mathrm{SmoothL1}(x) = \begin{cases} 0.5\,x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$
Convergence criteria: After each training epoch, the model's average action segmentation accuracy and boundary localization error are evaluated on a continuous action validation set. When these metrics no longer improve, or the validation loss no longer decreases over multiple consecutive epochs, training stops and the model that performs best on the validation set is saved as the final human action recognition model.
Through the above steps, a human motion recognition model that can accurately segment and recognize continuous movements is finally obtained.
It should be noted that the initial temporal feature decoupling detection network described in this invention is not limited to a specific version of the YOLO architecture. The multi-scale temporal feature extraction module, the action boundary-sensitive feature pyramid network, and the temporal decoupling detection head of this invention can be embedded in different versions of the YOLO series, such as YOLOv8, YOLOv11, and YOLOv12, and these modules can also be applied to subsequent versions as the YOLO architecture is updated. This design gives the invention good scalability and adaptability.
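The boundary refinement loss and the weighted total loss of Step 240 can be sketched in plain Python. The sketch assumes the common Smooth L1 form with a transition point of 1 (`beta=1.0`), treats the classification and box losses as already computed scalars, and uses hypothetical function names; only the weights 0.5, 0.3, and 0.2 come from the text.

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1: quadratic near zero, linear beyond |x| = beta."""
    ax = abs(x)
    return 0.5 * x * x / beta if ax < beta else ax - 0.5 * beta

def boundary_loss(pred, true):
    """Mean Smooth L1 over start and end offsets of positive samples.

    pred, true: lists of (start_offset, end_offset) pairs of equal length.
    """
    n = len(pred)
    if n == 0:
        return 0.0  # no positive samples in the batch
    total = sum(smooth_l1(ps - ts) + smooth_l1(pe - te)
                for (ps, pe), (ts, te) in zip(pred, true))
    return total / n

def total_loss(l_cls, l_box, l_time, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of classification, box, and boundary losses."""
    w1, w2, w3 = weights
    return w1 * l_cls + w2 * l_box + w3 * l_time
```

As a worked case, a sample whose start offset is off by 0.5 and whose end offset is off by 2.0 contributes 0.125 + 1.5 = 1.625 to the boundary loss, showing how small errors are penalized quadratically and large ones only linearly.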

[0026] Example 2: A system for constructing a human motion recognition model, used to execute the construction method described in Example 1 above, comprising: a construction unit, used to construct the initial temporal feature decoupling detection network model; and a training unit, used to train the initial temporal feature decoupling detection network model end-to-end using a continuous action dataset with time labels, and to output the corresponding human action recognition model after training is completed.

[0027] Example 3: A millimeter-wave radar method for continuous human motion recognition. As shown in Figure 3, the specific steps are as follows: Step 300: Intermediate frequency signals of human motion are acquired in real time using millimeter-wave radar to obtain the raw intermediate frequency signal. This embodiment uses a TI IWR1843BOOST millimeter-wave radar module with the following configuration parameters: linear frequency-modulated continuous wave transmit waveform, starting frequency 77 GHz, bandwidth 4 GHz, one transmitting antenna, four receiving antennas, frame period 100 ms, and 128 chirps per frame.
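A few quantities follow directly from the configuration in Step 300 and standard FMCW relations. The sketch below computes them; it assumes the full 4 GHz sweep is actually sampled (the effective bandwidth may be smaller in practice), and the function names are illustrative. Velocity resolution is not computed because the chirp duration is not given in the text.

```python
C = 299_792_458.0  # speed of light in m/s

def range_resolution(bandwidth_hz):
    """FMCW range resolution: c / (2 * B)."""
    return C / (2.0 * bandwidth_hz)

def wavelength(freq_hz):
    """Carrier wavelength: c / f."""
    return C / freq_hz

def frame_rate(frame_period_s):
    """Frames per second, equal to time-spectrum columns per second."""
    return 1.0 / frame_period_s

if __name__ == "__main__":
    print(f"range resolution: {range_resolution(4e9) * 100:.2f} cm")
    print(f"wavelength:       {wavelength(77e9) * 1000:.2f} mm")
    print(f"frame rate:       {frame_rate(0.1):.1f} Hz")
```

Under these assumptions the range resolution is roughly 3.75 cm, the wavelength roughly 3.9 mm, and the 100 ms frame period yields about 10 time-spectrum columns per second.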

[0028] Step 400: Based on the original intermediate frequency signal, perform signal preprocessing and generate the corresponding micro-Doppler time-spectrum map. In this embodiment, the specific process for preprocessing each frame of the original intermediate frequency signal to generate a micro-Doppler time-spectrum map is as follows:

[0029] Step 410: Basic signal processing and static clutter suppression. The original intermediate frequency signal is subjected to Fast Fourier Transform in both the range and Doppler dimensions, and static background reflections are eliminated using the phasor mean cancellation method to generate the initial range-Doppler matrix.
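Step 410 can be sketched with NumPy FFTs as follows. This is a simplified, single-frame illustration: the static template is the slow-time mean within one frame rather than over multiple consecutive frames as the text describes, and it is subtracted between the range FFT and the Doppler FFT, which by linearity removes the same zero-Doppler component. The function name and array layout are assumptions.

```python
import numpy as np

def range_doppler(frame):
    """Compute a range-Doppler matrix with static clutter removed.

    frame: (num_chirps, num_samples) IF samples of one radar frame.
    Returns a complex (num_doppler, num_range) matrix with zero Doppler
    at the centre row.
    """
    range_fft = np.fft.fft(frame, axis=1)             # fast time -> range
    static = range_fft.mean(axis=0, keepdims=True)    # slow-time mean per range bin
    moving = range_fft - static                       # phasor mean cancellation
    doppler_fft = np.fft.fft(moving, axis=0)          # slow time -> Doppler
    return np.fft.fftshift(doppler_fft, axes=0)

if __name__ == "__main__":
    n_chirps, n_samples = 128, 64
    t = np.arange(n_samples)
    wall = np.tile(np.exp(2j * np.pi * 5 * t / n_samples), (n_chirps, 1))
    print(np.abs(range_doppler(wall)).max())  # close to zero for a static scene
```

A reflector that does not move produces the same phasor in every chirp, so it equals its own slow-time mean and is cancelled, while a moving target keeps its energy at a non-zero Doppler row.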

[0030] Step 420: Dynamic clutter suppression. To address interference from non-target dynamic objects such as fans, swaying plants, and pets, this embodiment introduces a dynamic clutter suppression module based on moving target focusing. The system maintains a dynamic clutter map that records persistent non-human dynamic interference patterns in the environment. For the initial range-Doppler matrix of the current frame, potential targets are rapidly classified by analyzing their spatiotemporal characteristics. Only the energy of cells classified as non-target dynamic interference is updated into the dynamic clutter map, using a small learning rate. Joint suppression is then performed by canceling the dynamic clutter map against the initial range-Doppler matrix, controlled by a dynamic suppression coefficient (ranging from 0.5 to 1.0). This step effectively filters out background dynamic noise and significantly improves the signal-to-noise ratio of human targets.
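One plausible reading of the clutter map update and the joint suppression in Step 420 is sketched below. Several parts are assumptions rather than details from the source: the exponential update rule, the subtraction of a scaled clutter map with clipping at zero, the default values of `lr` and `alpha` (the latter only chosen inside the stated 0.5 to 1.0 range), and the fact that the interference mask is supplied by the caller, since the classification rule itself is not described.

```python
import numpy as np

def update_clutter_map(clutter, rd_mag, interference_mask, lr=0.05):
    """Blend new energy into the clutter map only at flagged cells.

    clutter, rd_mag:   (num_doppler, num_range) non-negative arrays.
    interference_mask: boolean array marking non-human dynamic interference.
    lr:                small learning rate of the exponential update (assumed).
    """
    updated = clutter.copy()
    updated[interference_mask] = ((1.0 - lr) * clutter[interference_mask]
                                  + lr * rd_mag[interference_mask])
    return updated

def suppress(rd_mag, clutter, alpha=0.8):
    """Subtract the scaled clutter map and clip negative energy to zero."""
    return np.maximum(rd_mag - alpha * clutter, 0.0)
```

Because only flagged cells are updated, energy from a person never leaks into the clutter map in this sketch, so a stationary fan is learned and removed over time while a walking subject is left intact.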

[0031] Step 430: Adaptive speed gating. The purified range-Doppler matrix is summed along the velocity dimension to obtain the velocity profile. Based on a preset energy ratio threshold (taken as 0.1), the effective speed range containing the main human motion energy is automatically calculated from the velocity profile. This step adaptively filters out extremely low-speed interference (such as air disturbances) and ultra-high-speed interference (noise), focusing computational resources on the speed bands relevant to human motion.
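Since the gating formula is not reproduced in the text, the sketch below shows only one plausible interpretation of Step 430. It assumes the velocity profile holds one energy value per velocity bin (obtained here by summing over the range bins) and that the gate spans the first to the last bin whose energy reaches the ratio threshold times the profile peak. The function name and return format are hypothetical.

```python
import numpy as np

def velocity_gate(rd_energy, velocities, ratio=0.1):
    """Find the velocity interval holding the main motion energy.

    rd_energy:  (num_doppler, num_range) non-negative energy matrix.
    velocities: (num_doppler,) velocity of each Doppler bin, ascending.
    ratio:      energy ratio threshold relative to the profile peak.
    Returns (v_min, v_max, lo_index, hi_index).
    """
    profile = rd_energy.sum(axis=1)  # one energy value per velocity bin
    keep = np.flatnonzero(profile >= ratio * profile.max())
    lo, hi = keep[0], keep[-1]
    return velocities[lo], velocities[hi], lo, hi
```

A cumulative-energy criterion (keeping the narrowest band that holds a fixed share of total energy) would be an equally reasonable reading; the peak-relative rule is used here only because it is the shortest to state.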

[0032] Step 440: Generation of the micro-Doppler time-spectrum map. For each frame, the speed-gated Doppler information (that is, only the Doppler information within the effective speed interval) is taken as a column vector. These column vectors are arranged sequentially along the slow-time axis (the frame sequence, that is, real time) to form a two-dimensional time-velocity grayscale or energy map, which is the micro-Doppler time-spectrum map.
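The column stacking of Step 440 can be sketched as follows. The sketch assumes a single velocity gate `(lo, hi)` shared by all frames so that the columns have equal length, and it collapses the range dimension by summation; the source does not say how a per-frame gate or the range axis is handled, so both choices and the function names are assumptions.

```python
import numpy as np

def doppler_column(rd_energy, lo, hi):
    """Collapse range bins and keep only the gated Doppler bins."""
    return rd_energy[lo:hi + 1].sum(axis=1)

def build_spectrogram(frames, lo, hi):
    """Stack one gated Doppler column per frame along the time axis.

    frames: iterable of (num_doppler, num_range) energy matrices.
    Returns an array of shape (hi - lo + 1, num_frames).
    """
    columns = [doppler_column(f, lo, hi) for f in frames]
    return np.stack(columns, axis=1)
```

With the 100 ms frame period of this embodiment, each column of the resulting map corresponds to 0.1 s of real time.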

[0033] Step 500: Load the human motion recognition model. The obtained time-spectrum map is input into the pre-trained human motion recognition model, which outputs the segmentation boundaries and category labels of continuous actions.
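How raw head outputs might be turned into timestamped action segments can be sketched as below. Nothing in this post-processing comes from the source: the detection tuple format, the offset convention (start = position minus start offset, end = position plus end offset, in frames), the score threshold, and the class-agnostic temporal non-maximum suppression are all assumptions chosen to resemble common detector practice.

```python
def temporal_iou(a, b):
    """Intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def decode_segments(dets, frame_period=0.1, score_thr=0.5, iou_thr=0.5):
    """Convert detections into non-overlapping timestamped segments.

    dets: list of (position, label, score, start_offset, end_offset),
          with position and offsets measured in frames.
    Returns a list of (label, start_seconds, end_seconds) sorted by start.
    """
    cands = [(s, lab, (p - ds) * frame_period, (p + de) * frame_period)
             for p, lab, s, ds, de in dets if s >= score_thr]
    cands.sort(key=lambda c: c[0], reverse=True)  # highest score first
    kept = []
    for s, lab, t0, t1 in cands:
        # Greedy temporal NMS: drop candidates overlapping a kept segment.
        if all(temporal_iou((t0, t1), (k[2], k[3])) < iou_thr for k in kept):
            kept.append((s, lab, t0, t1))
    return sorted([(lab, t0, t1) for _, lab, t0, t1 in kept],
                  key=lambda seg: seg[1])
```

For instance, two overlapping "walk" detections around frame 20 collapse into a single segment of roughly 0 to 4 s, while a later "fall" detection around frame 60 survives as its own segment.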

[0034] Example 4: A millimeter-wave radar human continuous motion recognition system, used to implement the continuous motion recognition method described in Example 3 above. As shown in Figure 4, the system includes:
Millimeter-wave radar acquisition module: used for real-time acquisition of raw intermediate frequency signals.
Signal processing and feature extraction module: runs the aforementioned preprocessing algorithm to generate the micro-Doppler time-spectrum map.
Deep learning recognition module: equipped with the pre-trained human motion recognition model, it performs inference.
Data storage and analysis module: stores historical action data and supports behavioral pattern analysis.
Through the complete hardware and software collaborative solution described above, this invention achieves high-precision and robust temporal segmentation and recognition of continuous human movements while strictly protecting user privacy, providing reliable technical support for scenarios such as smart elderly care, family monitoring, and medical rehabilitation.

[0035] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0036] The above description is merely a specific embodiment of this application, enabling those skilled in the art to understand or implement this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features claimed herein.

Claims

1. A method for constructing a millimeter-wave radar human continuous motion recognition model, characterized in that it comprises the following steps: constructing an initial temporal feature decoupling detection network model, the input of which is a sequence of micro-Doppler time-frequency spectrograms and the output of which is the segmentation boundaries and category labels of continuous actions; and training the initial temporal feature decoupling detection network model end-to-end using a continuous action dataset with time annotations, and outputting the corresponding human motion recognition model after training is completed; wherein the initial temporal feature decoupling detection network model includes a multi-scale temporal feature extraction module, a feature pyramid network, and a temporal decoupling detection head connected in sequence; the input of the multi-scale temporal feature extraction module is the micro-Doppler time-frequency spectrogram sequence, and its output is a multi-scale spatiotemporal feature map; the input of the feature pyramid network is the multi-scale spatiotemporal feature map, and its output is a multi-scale fused feature map; and the input of the temporal decoupling detection head is the multi-scale fused feature map, and its output is the action category probability and the temporal boundary position.

2. The method according to claim 1, characterized in that a two-stage strategy is employed to train the initial temporal feature decoupling detection network model end-to-end, with the following training steps: Stage 1: pre-train the initial temporal feature decoupling detection network model using isolated action samples to obtain a pre-trained model, where each isolated action sample is a micro-Doppler time-frequency spectrogram segment containing a single action category and is labeled with the action category and a temporal bounding box covering the entire action segment; Stage 2: fine-tune the pre-trained model using continuous action sequences to obtain the final human motion recognition model, where each continuous action sequence is a micro-Doppler time-frequency spectrogram sequence containing multiple consecutive actions and is labeled with the category of each action segment in the sequence and its precise start and end timestamps.

3. The method according to claim 1, characterized in that the multi-scale temporal feature extraction module embeds a spatiotemporal separation convolution module that extracts spatial features and temporal features separately and outputs a multi-scale spatiotemporal feature map; for a basic unit, the input feature of frame t is taken as input, the spatial feature and the temporal feature are extracted separately, and the spatiotemporal feature is obtained through adaptive fusion. The calculation process is as follows: [formulas not reproduced in this text]; the terms appearing in the formulas are the spatial feature, the temporal feature, the fused spatiotemporal feature, the input feature of the t-th frame in the micro-Doppler time-frequency spectrogram sequence, the Sigmoid activation function, two learnable weight tensors, and element-wise multiplication; the fused spatiotemporal features output by the basic units are stacked to form the corresponding multi-scale spatiotemporal feature map.
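
For orientation only, one common way to combine a Sigmoid gate, two learnable weight tensors and element-wise multiplication is sketched below. The fusion equations are not legible in this text, so the gated form used here (gate = Sigmoid of the weighted sum of the two features, output = gate-weighted blend) is an assumption and may differ from the claimed formula. Features are represented as flat lists, and all names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(f_s, f_t, w_s, w_t):
    """Element-wise adaptive fusion of spatial (f_s) and temporal (f_t)
    features. ASSUMED form, not taken from the source:
        g   = sigmoid(w_s * f_s + w_t * f_t)
        out = g * f_s + (1 - g) * f_t
    """
    fused = []
    for s, t, ws, wt in zip(f_s, f_t, w_s, w_t):
        g = sigmoid(ws * s + wt * t)
        fused.append(g * s + (1.0 - g) * t)
    return fused


# Hypothetical feature values and weights for a four-element feature.
spatial = [1.0, 2.0, -1.0, 0.5]
temporal = [3.0, 0.0, 1.0, 0.5]
w_spatial = [0.5, -0.2, 0.1, 0.0]
w_temporal = [-0.3, 0.4, 0.2, 0.0]
fused = gated_fusion(spatial, temporal, w_spatial, w_temporal)
print(fused)
```

With this form each fused value lies between the corresponding spatial and temporal values, and the learnable weights decide which of the two dominates at each position.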

4. The method according to claim 1, characterized in that the temporal decoupling detection head comprises two parallel branches: an action classification branch and a temporal boundary regression branch; the two parallel branches share the multi-scale fused feature map output by the feature pyramid network as input, and each has an independent top-level convolutional layer; the action classification branch is used to predict the action category probabilities, and the temporal boundary regression branch is used to predict the temporal boundary position.

5. The method according to claim 2, characterized in that the loss function used in the second-stage fine-tuning is: $L = \lambda_{cls} L_{cls} + \lambda_{box} L_{box} + \lambda_{tb} L_{tb}$, where $L_{cls}$ is the classification loss, $L_{box}$ is the bounding box regression loss, $L_{tb}$ is the temporal boundary loss, and $\lambda_{cls}$, $\lambda_{box}$, $\lambda_{tb}$ are the corresponding weighting coefficients; in the first stage, backpropagation updates are based solely on the classification loss and the bounding box regression loss.
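
The stage-dependent combination of the three loss terms can be sketched as below. This is a minimal illustration that assumes the total loss is a weighted sum of the terms; the function name, the argument names and the default weights of 1.0 are hypothetical, since the text gives no coefficient values.

```python
def total_loss(l_cls, l_box, l_tb, w_cls=1.0, w_box=1.0, w_tb=1.0, stage=2):
    """Weighted training loss (default weights are placeholders).
    Stage 1 uses only the classification and bounding box regression terms;
    stage 2 additionally includes the temporal boundary term."""
    loss = w_cls * l_cls + w_box * l_box
    if stage == 2:
        loss += w_tb * l_tb
    return loss


# Hypothetical per-batch loss values.
stage1 = total_loss(0.5, 0.25, 0.125, stage=1)
stage2 = total_loss(0.5, 0.25, 0.125, stage=2)
print(stage1, stage2)
```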

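6. The method according to claim 5, characterized in that the temporal boundary loss $L_{tb}$ is an additional boundary refinement loss that computes a smooth loss between the predicted and true start and end points to ensure boundary accuracy; for each positive sample $i$, let the predicted start time offset and end time offset be $\hat{t}^{s}_{i}$ and $\hat{t}^{e}_{i}$, and the corresponding true offsets be $t^{s}_{i}$ and $t^{e}_{i}$; then $L_{tb}$ is calculated using the Smooth L1 loss: $L_{tb} = \frac{1}{N} \sum_{i=1}^{N} \left[ \mathrm{SmoothL1}(\hat{t}^{s}_{i} - t^{s}_{i}) + \mathrm{SmoothL1}(\hat{t}^{e}_{i} - t^{e}_{i}) \right]$, where $N$ is the total number of positive samples in the current batch, and the SmoothL1 function is defined as: $\mathrm{SmoothL1}(x) = 0.5x^{2}$ if $|x| < 1$, and $\mathrm{SmoothL1}(x) = |x| - 0.5$ otherwise.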
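
A minimal sketch of a boundary refinement loss of this kind is given below. It assumes the standard Smooth L1 definition with a transition point of 1 and averages the start and end offset errors over the positive samples; the function names and the sample offsets are hypothetical.

```python
def smooth_l1(x):
    """Standard Smooth L1: quadratic for |x| < 1, linear otherwise."""
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1.0 else ax - 0.5


def boundary_loss(pred, target):
    """Mean Smooth L1 error over positive samples.
    pred and target are lists of (start_offset, end_offset) pairs."""
    n = len(pred)
    if n == 0:
        return 0.0  # no positive samples in the batch
    total = 0.0
    for (ps, pe), (ts, te) in zip(pred, target):
        total += smooth_l1(ps - ts) + smooth_l1(pe - te)
    return total / n


# Two hypothetical positive samples.
pred = [(0.5, 2.0), (1.0, 1.0)]
target = [(0.0, 0.0), (1.0, 0.5)]
loss = boundary_loss(pred, target)
print(loss)
```

The quadratic region keeps gradients small for offsets that are already close to the true boundary, while the linear region limits the influence of badly mislocalized samples.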

7. A method for continuous human motion recognition using millimeter-wave radar, characterized in that it comprises the following steps: acquiring the intermediate frequency signal of human movement in real time using a millimeter-wave radar to obtain the original intermediate frequency signal; preprocessing the original intermediate frequency signal and generating the corresponding micro-Doppler time-frequency spectrogram; and inputting the obtained micro-Doppler time-frequency spectrogram into the human motion recognition model trained by the method described in any one of claims 1 to 6, the human motion recognition model outputting the segmentation boundaries and category labels of the continuous motion to obtain the corresponding recognition result.

8. The method according to claim 7, characterized in that the signal preprocessing specifically includes: a. basic signal processing and static clutter suppression: a fast Fourier transform is applied to the original intermediate frequency signal in the range dimension and the Doppler dimension, and static background reflections are eliminated by the phasor mean cancellation method to generate an initial range-Doppler matrix; the phasor mean cancellation method generates a reference template of the static background reflection by averaging the signal over the slow-time dimension of multiple consecutive frames, and subtracts the template from the current frame signal; b. dynamic clutter suppression: a dynamic clutter map is constructed and updated in real time; by analyzing the spatiotemporal characteristics of targets in the range-velocity domain, human targets are distinguished from non-human dynamic interference sources; the identified non-target interference energy is updated into the dynamic clutter map and cancelled from the initial range-Doppler matrix to obtain a clean range-Doppler matrix; c. adaptive speed gating: the effective velocity range containing the main human motion energy is automatically extracted from the clean range-Doppler matrix based on a preset energy threshold; d. time-frequency spectrogram generation: the Doppler information within the effective velocity range is arranged along the time axis to generate the micro-Doppler time-frequency spectrogram.
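
The phasor mean cancellation of item a can be illustrated with the short sketch below. It assumes the input is the complex range-FFT output arranged as slow-time samples by range bins, and that the static template is the slow-time mean of each range bin; the function name and the sample data are hypothetical.

```python
def phasor_mean_cancel(chirps):
    """chirps[k][r] is the complex range-FFT sample of slow-time index k at
    range bin r. The slow-time mean of each range bin serves as the static
    background template and is subtracted from every sample."""
    n = len(chirps)
    n_range = len(chirps[0])
    template = [sum(c[r] for c in chirps) / n for r in range(n_range)]
    return [[c[r] - template[r] for r in range(n_range)] for c in chirps]


# Hypothetical data: range bin 0 holds a static reflector (constant phasor),
# range bin 1 holds a moving target whose phase rotates from sample to sample.
chirps = [
    [2 + 1j, 1 + 0j],
    [2 + 1j, 0 + 1j],
    [2 + 1j, -1 + 0j],
    [2 + 1j, 0 - 1j],
]
cleaned = phasor_mean_cancel(chirps)
print(cleaned)
```

Because a static reflector contributes a constant phasor across slow time, it coincides with its own mean and is removed, whereas the rotating phasor of the moving target averages to zero and passes through unchanged.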

9. A millimeter-wave radar human continuous motion recognition model construction system, characterized in that it comprises: a construction unit, used to construct the initial temporal feature decoupling detection network model; and a training unit, used to perform end-to-end training on the initial temporal feature decoupling detection network model using a continuous action dataset with time annotations, and to output the corresponding human motion recognition model after training is completed.

10. A millimeter-wave radar continuous human motion recognition system, characterized in that it comprises: a millimeter-wave radar module, used to collect human motion signals; a signal processing module, which preprocesses the original intermediate frequency signal and extracts features from it to generate a micro-Doppler time-frequency spectrogram; and a deep learning processing module, equipped with a human motion recognition model trained by the method described in any one of claims 1 to 6, for continuous motion recognition.