Gesture wake-up method based on micro-doppler spectrogram and lightweight convolutional neural network

A gesture wake-up method based on micro-Doppler spectrograms and a lightweight convolutional neural network addresses the high power consumption and high false alarm rate of millimeter-wave radar gesture recognition systems. The method achieves low-power, reliable gesture wake-up, switches intelligently between standby and full-operation states, and improves the user experience.

CN121934725B · Active · Publication Date: 2026-06-19 · UNIV OF ELECTRONICS SCI & TECH OF CHINA

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
UNIV OF ELECTRONICS SCI & TECH OF CHINA
Filing Date
2026-03-30
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing millimeter-wave radar gesture recognition systems suffer from excessive power consumption due to continuous operation of the recognition model, and excessively high false alarm rates due to simple threshold triggering methods.

Method used

A gesture wake-up method based on micro-Doppler spectrograms and a lightweight convolutional neural network is adopted. The low-power processor periodically acquires radar data to generate micro-Doppler spectrograms, uses a lightweight convolutional neural network model to extract time-frequency features and output a confidence score, and compares that score with a wake-up threshold to decide whether to wake the main processor.

Benefits of technology

It reliably detects a specific wake-up gesture in low-power mode and reduces the false alarm rate; the system switches intelligently between standby mode and full working mode, balancing energy saving against the reliability of the interactive experience.

✦ Generated by Eureka AI based on patent content.

Figure CN121934725B_ABST

Abstract

This invention discloses a gesture wake-up method based on micro-Doppler spectrograms and a lightweight convolutional neural network, belonging to the field of human-computer interaction technology. The method first acquires signal data using a millimeter-wave radar in low-power mode, determines the target range cell through a range FFT, and then performs a short-time Fourier transform on the signal of this cell to generate a detailed micro-Doppler spectrogram. Next, the spectrogram is input into a lightweight convolutional neural network model deployed on a low-power processor. The model outputs a confidence score for a preset wake-up gesture. Finally, the main processor is woken up only when this score reaches a preset threshold. By extracting the dynamic "fingerprint" of the gesture and classifying it with a specially trained lightweight CNN model, this invention addresses the high power consumption and high false trigger rate of existing gesture recognition systems, achieving reliable gesture wake-up with low power consumption and an extremely low false alarm rate.

Description

Technical Field

[0001] This invention relates to the field of human-computer interaction technology, and more specifically, to a low-power gesture wake-up method based on millimeter-wave radar signal processing and deep learning.

Background Technology

[0002] With the widespread adoption of smart devices, gesture recognition, as a natural non-contact interaction method, has received considerable attention. Millimeter-wave (mmWave) radar, due to its advantages such as privacy protection, immunity to light conditions, and ability to accurately detect minute movements, has become a key sensor for realizing gesture recognition.

[0003] Currently, high-precision gesture recognition based on millimeter-wave radar typically relies on complex and computationally intensive deep learning models. To capture gesture commands, these models need to run continuously, performing real-time analysis of the radar data stream. This "always-on" operating mode results in extremely high power consumption, which is unacceptable for battery-powered mobile devices or embedded devices with strict power consumption standards.

[0004] To reduce power consumption, existing technologies have proposed some simple triggering mechanisms, such as setting a motion energy threshold to determine the presence of hand movement. However, this method has a serious drawback: an extremely high false alarm rate. Unconscious body movements by the user, the movement of other objects in the environment, etc., can easily trigger the threshold, causing the main model to be frequently and falsely awakened. This not only fails to achieve the expected energy-saving effect but also seriously affects the user experience.

[0005] Analogous to "voice wake-up" in the field of voice interaction, the field of gesture interaction also urgently needs a reliable "gesture wake-up" mechanism. This mechanism should be able to operate continuously with extremely low power consumption, responding only to one or a few specific "wake-up gestures," while remaining silent to all other unintentional actions, thereby achieving an extremely low false alarm rate.

Summary of the Invention

[0006] This invention aims to solve the problems of excessive power consumption caused by continuous operation of the recognition model and excessive false alarm rate caused by the use of simple threshold triggering methods in existing millimeter-wave radar gesture recognition systems, and provides a gesture wake-up method with low power consumption and low false alarm rate.

[0007] To address the aforementioned technical problems, this invention proposes a gesture wake-up method based on micro-Doppler spectrograms and a lightweight convolutional neural network. This method includes the following steps:

[0008] Step 1: Data Acquisition and Preprocessing;

[0009] In low-power mode, the millimeter-wave radar sensor periodically acquires raw radar data. For a frequency-modulated continuous-wave (FMCW) radar, the received intermediate-frequency (IF) signal is:

[0010] $$s_{IF}(t) = A\cos\left(2\pi f_b t + \phi_0\right), \qquad f_b = S\tau = \frac{2SR}{c}, \qquad 0 \le t \le T_c$$

[0011] where $A$ denotes the signal amplitude; $f_b$ is the beat frequency generated by a target at distance $R$, where $S$ is the sweep slope and $c$ is the speed of light; $\tau = 2R/c$ is the signal round-trip time delay; $T_c$ denotes the frequency sweep period; $t$ denotes time; and $\phi_0$ denotes the initial phase;
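The beat-frequency relation in [0010] can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the cosine form of the IF signal follows the reconstruction above, and the sweep slope of 50 MHz/µs is an assumed value, not a parameter taken from the source.

```python
import math

C = 299_792_458.0  # speed of light (m/s)


def beat_frequency(distance_m: float, slope_hz_per_s: float) -> float:
    """f_b = S * tau, with tau = 2R/c the round-trip delay."""
    tau = 2.0 * distance_m / C
    return slope_hz_per_s * tau


def distance_from_beat(f_b_hz: float, slope_hz_per_s: float) -> float:
    """Invert f_b = 2SR/c to recover the target distance R."""
    return f_b_hz * C / (2.0 * slope_hz_per_s)


def if_sample(t: float, amplitude: float, f_b_hz: float, phi0: float) -> float:
    """Real IF sample A*cos(2*pi*f_b*t + phi0), valid for 0 <= t <= T_c."""
    return amplitude * math.cos(2.0 * math.pi * f_b_hz * t + phi0)


if __name__ == "__main__":
    slope = 50e12  # hypothetical sweep slope: 50 MHz/us
    f_b = beat_frequency(0.30, slope)  # hand at 30 cm
    print(f"beat frequency: {f_b / 1e3:.1f} kHz")
    print(f"recovered distance: {distance_from_beat(f_b, slope):.3f} m")
```

Under these assumed parameters a hand at 30 cm produces a beat frequency of roughly 100 kHz, which is why the range FFT in step 2.1 can locate the hand from the IF samples alone.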

[0012] Step 2: Next, the original radar data is processed to generate a micro-Doppler spectrum that can characterize the dynamic features of the target;

[0013] Step 2.1: Perform a one-dimensional fast Fourier transform (range FFT) on the ADC data acquired for each frequency-modulated pulse (chirp) to obtain the target distance information;

[0014] Step 2.2: Calculate the energy variance of each distance cell over a short time window, and select the distance cell with the largest variance as the target distance cell of interest;
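Step 2.2 can be sketched as follows. This is a minimal illustration, assuming the per-cell energy is |X(k)|² taken from successive range FFTs and stored as a list of lists (one row per chirp or frame); the function name and the sample data are hypothetical.

```python
from statistics import pvariance


def select_target_range_cell(energy_history: list[list[float]]) -> int:
    """Return the index of the range cell whose energy varies most over the window.

    energy_history[t][k] is the energy |X_t(k)|^2 of range cell k at chirp/frame t.
    Static clutter yields nearly constant energy (low variance), while a moving
    hand modulates the energy of its cell (high variance).
    """
    if not energy_history or not energy_history[0]:
        raise ValueError("energy_history must be a non-empty 2-D list")
    num_cells = len(energy_history[0])
    variances = [
        pvariance([frame[k] for frame in energy_history]) for k in range(num_cells)
    ]
    return max(range(num_cells), key=variances.__getitem__)


if __name__ == "__main__":
    # Cell 0: strong static reflector; cell 1: moving hand; cell 2: noise.
    history = [
        [5.0, 1.0, 3.0],
        [5.0, 4.0, 3.1],
        [5.0, 0.5, 2.9],
        [5.0, 6.0, 3.0],
    ]
    print(select_target_range_cell(history))  # 1
```

Note that the strongest cell (cell 0) is not selected: the variance criterion favors the cell whose energy fluctuates, which is how step 2.2 suppresses static background clutter.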

[0015] Step 2.3: Arrange the complex fast Fourier transform results of the target distance cell in chronological order to form a one-dimensional complex time series; perform a short-time Fourier transform on the time series within a preset window, and take its squared magnitude to generate a two-dimensional micro-Doppler spectrum. The generated micro-Doppler spectrum has time on the horizontal axis and Doppler frequency on the vertical axis; the brightness and color of the image represent the signal energy at the corresponding time and velocity component. The micro-Doppler spectrum captures how the velocities of the small movements of each part of the hand change over time, forming a dynamic "fingerprint" of the gesture.

[0016] Step 3: Lightweight model inference;

[0017] A pre-trained lightweight convolutional neural network model is used to extract the time-frequency features of the micro-Doppler spectrum, and a confidence score is output based on the time-frequency features.

[0018] Step 4: Wake-up decision;

[0019] The confidence score output by the lightweight convolutional neural network model is compared with a preset wake-up threshold. When the confidence score is greater than or equal to the threshold, it is determined that a valid wake-up gesture has been detected, and the control logic is triggered to activate the main processing unit. When the confidence score is lower than the threshold, it is considered that the current input does not constitute a wake-up condition, and the system continues to maintain a low-power monitoring mode.
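The decision rule in step 4 reduces to a single comparison. The sketch below uses a hypothetical function name; the default threshold of 0.98 is the example value given in the embodiment and claim 1, and the input check reflects the fact that the sigmoid output lies in [0, 1].

```python
def is_wake_gesture(confidence: float, threshold: float = 0.98) -> bool:
    """Step 4: declare a valid wake-up gesture when confidence >= threshold."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1] (sigmoid output)")
    return confidence >= threshold


if __name__ == "__main__":
    for score in (0.50, 0.979, 0.98, 0.995):
        print(score, "wake" if is_wake_gesture(score) else "stay in low-power mode")
```

The comparison is inclusive (>=), matching the wording "greater than or equal to the threshold".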

[0020] Furthermore, the lightweight convolutional neural network model sequentially includes:

[0021] Input layer: Receives a 64x64x1 image;

[0022] Convolutional layer 1: 16 5x5 convolutional kernels with a stride of 1, using the ReLU activation function; output feature map size is 60x60x16;

[0023] Max pooling layer 1: 2x2 pooling window with a stride of 2; output feature map size is 30x30x16.

[0024] Convolutional layer 2: 32 3x3 convolutional kernels with a stride of 1, using the ReLU activation function; the output feature map size is 28x28x32;

[0025] Max pooling layer 2: 2x2 pooling window with a stride of 2; output feature map size is 14x14x32.

[0026] Flattening layer: Flattens the 14x14x32 feature map into a one-dimensional vector;

[0027] Fully connected layer 1: 64 neurons, using the ReLU activation function;

[0028] Dropout layer: Randomly deactivates neurons with a probability of 0.5 to prevent model overfitting;

[0029] Output layer: 1 neuron, using the Sigmoid activation function, outputting a confidence score between 0 and 1;
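The feature-map sizes listed above can be verified with a short shape trace. This sketch assumes unpadded ("valid") convolutions, which is what the 64 to 60 and 30 to 28 reductions imply; the function names are hypothetical and no deep-learning framework is required.

```python
def out_size(size: int, kernel: int, stride: int = 1) -> int:
    """Output length of an unpadded ("valid") convolution or pooling window."""
    return (size - kernel) // stride + 1


def describe_wake_cnn(input_size: int = 64, in_channels: int = 1):
    """Trace feature-map shapes and count trainable parameters of the wake-up CNN."""
    shapes = {"input": (input_size, input_size, in_channels)}
    params = 0
    s = out_size(input_size, 5)              # Conv1: 16 kernels 5x5, stride 1
    shapes["conv1"] = (s, s, 16)
    params += 5 * 5 * in_channels * 16 + 16  # weights + biases
    s = out_size(s, 2, stride=2)             # MaxPool1: 2x2, stride 2
    shapes["pool1"] = (s, s, 16)
    s = out_size(s, 3)                       # Conv2: 32 kernels 3x3, stride 1
    shapes["conv2"] = (s, s, 32)
    params += 3 * 3 * 16 * 32 + 32
    s = out_size(s, 2, stride=2)             # MaxPool2: 2x2, stride 2
    shapes["pool2"] = (s, s, 32)
    flat = s * s * 32                        # Flatten
    shapes["flatten"] = (flat,)
    shapes["dense1"] = (64,)                 # Dense1: 64 ReLU units
    params += flat * 64 + 64
    shapes["dropout"] = (64,)                # Dropout(0.5): no parameters
    shapes["output"] = (1,)                  # single sigmoid neuron
    params += 64 * 1 + 1
    return shapes, params


if __name__ == "__main__":
    shapes, params = describe_wake_cnn()
    for name, shape in shapes.items():
        print(f"{name:8s} {shape}")
    print(f"trainable parameters: {params}")
```

The trace reproduces the listed sizes (60x60x16, 30x30x16, 28x28x32, 14x14x32) and gives 406,593 trainable parameters, of which 401,472 (about 98.7%) sit in fully connected layer 1. Stored as 32-bit floats, the weights occupy roughly 1.6 MB.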

[0030] The lightweight convolutional neural network model is trained using binary cross-entropy as the loss function:

[0031] $$L = -\left[y\log\hat{y} + (1-y)\log\left(1-\hat{y}\right)\right]$$

[0032] where $y$ is the ground-truth label (1 for a wake-up gesture, 0 for a non-wake-up action) and $\hat{y}$ is the probability predicted by the model. The loss is minimized by gradient descent using the Adam optimizer.
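A minimal sketch of the binary cross-entropy loss, averaged over a batch of N samples (the per-sample form in [0031] is the case N = 1). The function name is hypothetical, and the clipping constant `eps` is an assumption added to avoid log(0), not something stated in the source.

```python
import math


def binary_cross_entropy(y_true: list[int], y_pred: list[float], eps: float = 1e-7) -> float:
    """Mean BCE over N samples: -(1/N) * sum(y*log(p) + (1-y)*log(1-p)).

    Predictions are clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


if __name__ == "__main__":
    # A confident, correct wake-up prediction costs little ...
    print(round(binary_cross_entropy([1], [0.99]), 4))
    # ... while the same score on a non-wake-up action is penalized heavily.
    print(round(binary_cross_entropy([0], [0.99]), 4))
```

The asymmetric penalty on confident mistakes is what pushes the sigmoid output toward 1 only for genuine wake-up gestures during training.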

[0033] Furthermore, the specific method of step 2.1 is as follows:

[0034] $$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi kn/N}, \qquad k = 0, 1, \ldots, N-1$$

[0035] where $x(n)$ is the $n$-th ADC sample of a chirp, $N$ denotes the number of sampling points, and $k$ is the discrete frequency index; the peak position of $|X(k)|$ after the one-dimensional fast Fourier transform corresponds to the distance to the target.
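The range FFT and peak search of step 2.1 can be sketched with the standard library. The DFT is evaluated directly so the code mirrors the equation term by term (an FFT returns the same values faster). Function names, the 2 MHz ADC rate and the 50 MHz/µs slope are hypothetical; skipping bin 0 (DC) is an assumption, since ADC offsets tend to accumulate there.

```python
import cmath
import math

C = 299_792_458.0  # speed of light (m/s)


def dft(x: list) -> list[complex]:
    """Direct O(N^2) evaluation of X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def peak_range_bin(adc_samples: list[float]) -> int:
    """Index k of the strongest bin in the positive-frequency half.

    For a real IF signal the upper half of the spectrum mirrors the lower half.
    Bin 0 (DC) is skipped by assumption.
    """
    spectrum = dft(adc_samples)
    return max(range(1, len(spectrum) // 2), key=lambda k: abs(spectrum[k]))


def bin_to_distance(k: int, n_pts: int, fs_hz: float, slope_hz_per_s: float) -> float:
    """Bin k has beat frequency k*fs/N; f_b = 2SR/c then gives R = f_b*c/(2S)."""
    return (k * fs_hz / n_pts) * C / (2.0 * slope_hz_per_s)


if __name__ == "__main__":
    n_pts = 64
    # Synthetic IF chirp whose beat frequency falls exactly on bin 5.
    chirp = [math.cos(2 * math.pi * 5 * n / n_pts) for n in range(n_pts)]
    k = peak_range_bin(chirp)
    # Hypothetical ADC rate (2 MHz) and sweep slope (50 MHz/us).
    print(k, f"{bin_to_distance(k, n_pts, 2e6, 50e12):.3f} m")
```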

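[0036] Furthermore, in step 2.3:

[0037] $$P(m, f_d) = \left|\sum_{n} s(n)\, w(n-m)\, e^{-j2\pi f_d n}\right|^2$$

[0038] where $s(n)$ is the time-series signal of the target range cell, $w(\cdot)$ is the window function, $m$ is the time (window position) index, and $f_d$ denotes the Doppler frequency.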
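A stdlib-only sketch of the squared-magnitude STFT in step 2.3. The Hann window, window length and hop size are assumptions (the source only says "a preset window"), and the function names are hypothetical. Each output row is one window position m; the Doppler bins are reordered like `numpy.fft.fftshift`, so negative Doppler comes first and zero Doppler sits at index `win_len // 2`.

```python
import cmath
import math


def hann(length: int) -> list[float]:
    """Symmetric Hann window (the window choice is an assumption)."""
    if length == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def micro_doppler_spectrogram(series: list[complex], win_len: int, hop: int) -> list[list[float]]:
    """|STFT|^2 of the slow-time series of the target range cell.

    Returns one row per window position m, each holding win_len Doppler bins
    with zero Doppler moved to index win_len // 2.
    """
    window = hann(win_len)
    split = (win_len + 1) // 2  # same split point as numpy.fft.fftshift
    rows = []
    for start in range(0, len(series) - win_len + 1, hop):
        seg = [series[start + n] * window[n] for n in range(win_len)]
        power = [
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * f * n / win_len) for n in range(win_len))) ** 2
            for f in range(win_len)
        ]
        rows.append(power[split:] + power[:split])
    return rows


if __name__ == "__main__":
    # Constant radial velocity -> a pure tone at +3 bins of Doppler.
    tone = [cmath.exp(2j * math.pi * 3 * n / 32) for n in range(128)]
    spec = micro_doppler_spectrogram(tone, win_len=32, hop=8)
    print(len(spec), max(range(32), key=spec[0].__getitem__))  # 13 frames, peak at index 19
```

After the shift, row index i corresponds to a Doppler frequency of (i - win_len // 2) * PRF / win_len, where PRF is the slow-time sampling rate (chirp repetition frequency). A real gesture produces a time-varying ridge rather than the constant line of this synthetic tone.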

[0039] A gesture wake-up system includes a low-power processor, a millimeter-wave radar sensor, and a main processor;

[0040] The millimeter-wave radar sensor collects data in real time. A low-power processor uses a gesture wake-up method based on micro-Doppler spectrum and lightweight convolutional neural network to process the collected data. When the confidence score is greater than or equal to the threshold, it is determined that a valid wake-up gesture has been detected, and the control logic is triggered to activate the main processing unit. When the confidence score is lower than the threshold, it is considered that the current input does not constitute a wake-up condition, and the system continues to maintain a low-power monitoring mode.

[0041] This invention enables intelligent switching between "standby mode" and "full working mode", allowing the system to maintain low power consumption most of the time while responding quickly to user operations when a target gesture is actually detected, thus balancing energy efficiency and the reliability of the interactive experience.

Attached Figure Description

[0042] Figure 1 is a schematic diagram of the architecture of a gesture wake-up system provided in an embodiment of the present invention.

[0043] Figure 2 is a flowchart of a gesture wake-up method provided in an embodiment of the present invention.

[0044] Figure 3 is a schematic diagram of the millimeter-wave radar data preprocessing process, illustrating how micro-Doppler spectra are generated.

[0045] Figure 4 is a detailed diagram of the layer structure of the lightweight convolutional neural network model in an embodiment of the present invention.

Detailed Implementation

[0046] The technical solution of the present invention will now be described in more detail with reference to the accompanying drawings.

[0047] Referring to Figure 1, the gesture wake-up system 100 provided in this embodiment of the invention includes a millimeter-wave radar sensor 101, a low-power processor 102, and a main processor 103.

[0048] Millimeter-wave radar sensor 101: an Infineon BGT60TR13C 60 GHz millimeter-wave radar sensor.

[0049] Low-power processor 102: a microcontroller (MCU) that runs a lightweight wake-up gesture detection algorithm (i.e., the lightweight CNN model).

[0050] Main processor 103: A high-performance application processor (AP) that is in a sleep state when the system is not woken up.

[0051] Referring to Figure 2, the specific process of the method of the present invention is as follows:

[0052] Step S201: Data acquisition;

[0053] The low-power processor 102 controls the BGT60TR13C radar sensor 101 to operate at a low frame rate (e.g., 20 Hz) and acquire ADC samples of the raw intermediate frequency (IF) signal.

[0054] Step S202: Data preprocessing and micro-Doppler spectrum generation;

[0055] First, a one-dimensional Fast Fourier Transform (FFT) is performed on the intermediate frequency sampling data acquired for each frequency-modulated pulse to obtain the energy distribution of different distance cells. The peak positions in the FFT results correspond to the distance information of the target.

[0056] Subsequently, to reduce interference from static background clutter and distant unrelated motion, the system analyzes the energy changes of each range cell over a short period, calculates the degree of fluctuation, and selects the range cell with the most significant energy change as the target range cell. This ensures that processing resources are concentrated on the area most likely to contain hand movements, thereby improving detection accuracy.

[0057] Next, the complex signals of the selected target range cell are arranged in chronological order to form a one-dimensional complex time series. For this time series, the system performs a short-time Fourier transform (STFT) within a preset time window to obtain the frequency distribution of the signal in different time slices, and maps the energy (squared magnitude) onto a two-dimensional plane to generate a micro-Doppler spectrum.

[0058] The resulting spectrum represents time on the horizontal axis and Doppler frequency (proportional to hand speed) on the vertical axis, while brightness or color reflects the energy intensity of that speed component at the corresponding moment. The resulting image clearly depicts the trajectory and speed changes of the gesture along the time axis, forming a unique dynamic "fingerprint" that provides reliable input for subsequent inference in lightweight convolutional neural network models.
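[0058] states that the Doppler frequency on the vertical axis is proportional to hand speed. For a monostatic radar the standard relation is f_d = 2v/λ with λ = c/f_c. The sketch below takes 60 GHz as the nominal carrier of the sensor named in [0048]; the function names are hypothetical, and the sign convention (positive Doppler for an approaching hand) is an assumption.

```python
C = 299_792_458.0  # speed of light (m/s)


def doppler_to_velocity(f_d_hz: float, carrier_hz: float = 60e9) -> float:
    """Radial velocity v = f_d * lambda / 2 for a monostatic radar (lambda = c / f_c)."""
    wavelength = C / carrier_hz
    return f_d_hz * wavelength / 2.0


def velocity_to_doppler(v_m_s: float, carrier_hz: float = 60e9) -> float:
    """Inverse mapping f_d = 2 * v / lambda."""
    return 2.0 * v_m_s * carrier_hz / C


if __name__ == "__main__":
    # A 1 m/s hand motion at a nominal 60 GHz carrier:
    print(f"{velocity_to_doppler(1.0):.1f} Hz")
```

At 60 GHz the wavelength is about 5 mm, so a 1 m/s hand motion maps to roughly 400 Hz of Doppler shift.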

[0059] Step S203: Lightweight CNN model inference;

[0060] The generated micro-Doppler spectrum (which can be normalized and scaled to, for example, a 64x64 pixel grayscale image) is input into the lightweight CNN model deployed on the low-power processor 102. Referring to Figure 4, a specific layer structure of this model can be designed as follows:

[0061] Input Layer: Receives 64x64x1 images.

[0062] Convolutional Layer 1 (Conv1): 16 5x5 convolutional kernels with a stride of 1, using the ReLU activation function. The output feature map size is 60x60x16.

[0063] MaxPool1: A 2x2 pooling window with a stride of 2. The output feature map size is 30x30x16.

[0064] Convolutional layer 2 (Conv2): 32 3x3 convolutional kernels with a stride of 1, using the ReLU activation function. The output feature map size is 28x28x32.

[0065] MaxPool2: A 2x2 pooling window with a stride of 2. The output feature map size is 14x14x32.

[0066] Flatten layer: Flattens the 14x14x32 feature map into a one-dimensional vector.

[0067] Fully connected layer 1 (Dense1): 64 neurons, using the ReLU activation function.

[0068] Dropout layer: Randomly deactivates neurons with a probability of 0.5 to prevent the model from overfitting.

[0069] Output layer: 1 neuron, using the sigmoid activation function, outputting a confidence score between 0 and 1.
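[0060] only says the spectrum "can be normalized and scaled" to a 64x64 grayscale image. One possible way to do this is sketched below. The dB compression, the -40 dB floor and the nearest-neighbour resampling are all assumptions made for illustration, and the function name is hypothetical.

```python
import math


def to_model_input(spec: list[list[float]], size: int = 64, floor_db: float = -40.0) -> list[list[float]]:
    """Map a power spectrogram to a size x size grayscale image with values in [0, 1].

    spec[r][c]: rows are Doppler bins, columns are time frames (transpose first if
    the spectrogram is stored time-major). Power is converted to dB relative to
    the peak, clipped at floor_db, then scaled so the peak maps to 1.0 and the
    floor to 0.0. Resizing uses nearest-neighbour sampling.
    """
    if floor_db >= 0:
        raise ValueError("floor_db must be negative")
    peak = max(max(row) for row in spec)
    if peak <= 0:
        return [[0.0] * size for _ in range(size)]
    rows, cols = len(spec), len(spec[0])
    image = []
    for i in range(size):
        src_r = i * rows // size  # nearest source row
        out_row = []
        for j in range(size):
            p = spec[src_r][j * cols // size]  # nearest source column
            db = 10.0 * math.log10(p / peak) if p > 0 else floor_db
            out_row.append((max(db, floor_db) - floor_db) / -floor_db)
        image.append(out_row)
    return image


if __name__ == "__main__":
    img = to_model_input([[1.0, 0.0], [0.01, 1e-9]], size=4)
    for row in img:
        print(" ".join(f"{v:.2f}" for v in row))
```

Log scaling keeps the weak micro-Doppler sidebands of the fingers visible next to the much stronger palm return. A linear min-max scaling would also satisfy "normalized", but it tends to compress those sidebands toward zero.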

[0070] Step S204: Wake-up decision;

[0071] The confidence score output by the model is compared with a rigorously calibrated high threshold (for example, 0.98).

[0072] Step S205: System state switching;

[0073] If the confidence score reaches or exceeds the threshold, a wake-up gesture is detected and the low-power processor 102 wakes up the main processor 103; otherwise, the system remains in low-power monitoring mode. Specifically, when the confidence score output by the lightweight convolutional neural network model reaches or exceeds the preset wake-up threshold, the low-power processing unit immediately generates a wake-up command and transmits it to the main processor via an internal bus or interrupt signal, switching the main processor from sleep mode to full-speed operation. The main processor then enters working mode and can perform more complex application tasks, such as complete gesture recognition, voice interaction, application startup, or other computationally intensive processing. Conversely, when the confidence score is lower than the threshold, the system determines that no valid wake-up gesture has been detected, the low-power processing unit does not issue any wake-up signal, and the main processor remains in sleep mode. In this case, the system runs only the low-power radar acquisition and lightweight model monitoring, minimizing energy consumption.
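The standby/active switching in step S205 can be sketched as a small state machine on the low-power processor. Everything here is hypothetical illustration: the class and method names, the callback that stands in for the interrupt line or internal-bus wake command, and the `on_main_processor_sleep` hook, since the source does not describe how the system returns to standby.

```python
from enum import Enum
from typing import Callable, Optional


class PowerState(Enum):
    STANDBY = "standby"  # radar acquisition + lightweight CNN on the low-power processor only
    ACTIVE = "active"    # main processor awake and running full applications


class WakeController:
    """Hypothetical wake-up controller running on the low-power processor."""

    def __init__(self, threshold: float = 0.98,
                 wake_main_processor: Optional[Callable[[], None]] = None) -> None:
        self.threshold = threshold
        self.state = PowerState.STANDBY
        # Stand-in for the interrupt line or internal-bus wake command.
        self._wake_main_processor = wake_main_processor or (lambda: None)

    def on_confidence(self, confidence: float) -> PowerState:
        """Feed one CNN confidence score; wake the main processor if it clears the threshold."""
        if self.state is PowerState.STANDBY and confidence >= self.threshold:
            self._wake_main_processor()
            self.state = PowerState.ACTIVE
        return self.state

    def on_main_processor_sleep(self) -> None:
        """Return to standby once the main processor goes back to sleep (assumed hook)."""
        self.state = PowerState.STANDBY


if __name__ == "__main__":
    wake_events = []
    ctrl = WakeController(wake_main_processor=lambda: wake_events.append("IRQ"))
    for score in (0.12, 0.97, 0.985, 0.99):
        print(score, ctrl.on_confidence(score).value)
    print("wake commands sent:", len(wake_events))
```

Once the controller is active, further high scores do not send repeated wake commands, so a gesture that spans several frames wakes the main processor only once.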

Claims

1. A gesture wake-up method based on micro-Doppler spectrograms and a lightweight convolutional neural network, characterized in that the method includes the following steps: Step 1: Data acquisition and preprocessing; in low-power mode, a millimeter-wave radar sensor periodically acquires raw radar data; for a frequency-modulated continuous-wave radar, the received intermediate-frequency signal is $s_{IF}(t) = A\cos(2\pi f_b t + \phi_0)$, $0 \le t \le T_c$, where $A$ denotes the signal amplitude, $f_b = S\tau = 2SR/c$ is the beat frequency generated by a target at distance $R$, $S$ is the sweep slope, $c$ is the speed of light, $\tau$ is the signal round-trip time delay, $T_c$ denotes the frequency sweep period, $t$ denotes time, and $\phi_0$ denotes the initial phase; Step 2: the raw radar data is processed to generate a micro-Doppler spectrum that characterizes the dynamic features of the target; Step 2.1: perform a one-dimensional fast Fourier transform on the ADC data acquired for each frequency-modulated pulse to obtain the target distance information; Step 2.2: calculate the energy variance of each distance cell over a short time window, and select the distance cell with the largest variance as the target distance cell of interest; Step 2.3: arrange the complex fast Fourier transform results of the target distance cell in chronological order to form a one-dimensional complex time series; perform a short-time Fourier transform on the one-dimensional complex time series within a preset window, and take its squared magnitude to generate a two-dimensional micro-Doppler spectrum. The generated micro-Doppler spectrum has time on the horizontal axis and Doppler frequency on the vertical axis; the brightness and color of the image represent the signal energy at the corresponding time and velocity components. The micro-Doppler spectrum finely depicts the velocity changes of the minute movements of each part of the gesture over time, forming a dynamic "fingerprint" of the gesture.
Step 3: Lightweight model inference; a pre-trained lightweight convolutional neural network model is used to extract the time-frequency features of the micro-Doppler spectrum, and a confidence score is output based on the time-frequency features. The lightweight convolutional neural network model includes, in sequence: Input layer: receives a 64x64x1 image; Convolutional layer 1: 16 5x5 convolutional kernels with a stride of 1, using the ReLU activation function; output feature map size is 60x60x16; Max pooling layer 1: 2x2 pooling window with a stride of 2; output feature map size is 30x30x16; Convolutional layer 2: 32 3x3 convolutional kernels with a stride of 1, using the ReLU activation function; output feature map size is 28x28x32; Max pooling layer 2: 2x2 pooling window with a stride of 2; output feature map size is 14x14x32; Flattening layer: flattens the 14x14x32 feature map into a one-dimensional vector; Fully connected layer 1: 64 neurons, using the ReLU activation function; Dropout layer: randomly deactivates neurons with a probability of 0.5 to prevent model overfitting; Output layer: 1 neuron, using the Sigmoid activation function, outputting a confidence score between 0 and 1; the lightweight convolutional neural network model is trained using binary cross-entropy as the loss function: $L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\right]$, where $y_i$ is the ground-truth label (1 for a wake-up gesture, 0 for a non-wake-up action), $\hat{y}_i$ is the probability predicted by the model, and $N$ denotes the number of samples; the loss is minimized by gradient descent using the Adam optimizer; Step 4: Wake-up decision; the confidence score output by the lightweight convolutional neural network model is compared with a preset wake-up threshold of 0.98. When the confidence score is greater than or equal to the wake-up threshold, it is determined that a valid wake-up gesture has been detected, and control logic is triggered to activate the main processing unit. When the confidence score is lower than the wake-up threshold, it is considered that the current input does not constitute a wake-up condition, and the system continues to maintain a low-power monitoring mode.

2. The gesture wake-up method based on micro-Doppler spectrograms and a lightweight convolutional neural network as described in claim 1, characterized in that the specific method for step 2.1 is as follows: $X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi kn/N}$, $k = 0, 1, \ldots, N-1$, where $x(n)$ is the $n$-th ADC sample, $N$ denotes the number of sampling points, and $k$ is the discrete frequency index; the peak position after the one-dimensional fast Fourier transform corresponds to the distance to the target.

3. The gesture wake-up method based on micro-Doppler spectrograms and a lightweight convolutional neural network as described in claim 1, characterized in that, in step 2.3: $P(m, f_d) = \left|\sum_{n} s(n)\, w(n-m)\, e^{-j2\pi f_d n}\right|^2$, where $s(n)$ is the time-series signal of the target range cell, $w(\cdot)$ is the window function, $m$ is the time (window position) index, and $f_d$ denotes the Doppler frequency.

4. A gesture wake-up system employing the gesture wake-up method as described in claim 1, characterized in that, The system includes a low-power processor, a millimeter-wave radar sensor, and a main processor; The millimeter-wave radar sensor collects data in real time. A low-power processor uses a gesture wake-up method based on micro-Doppler spectrum and lightweight convolutional neural network to process the collected data. When the confidence score is greater than or equal to the wake-up threshold, it is determined that a valid wake-up gesture has been detected, and the control logic is triggered to activate the main processing unit. When the confidence score is lower than the wake-up threshold, it is considered that the current input does not constitute a wake-up condition, and the system continues to maintain a low-power monitoring mode.