A gesture trajectory extraction method and related device
By constructing RDM, RAM, and REM graphs, interference signals are eliminated, and gesture targets are accurately located, solving the problem of inaccurate gesture trajectory acquisition in existing technologies and achieving accuracy and stability in gesture recognition.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA MOBILE GROUP DESIGN INST
- Filing Date
- 2026-02-05
- Publication Date
- 2026-06-12
Figure CN122194082A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of radar detection technology, and more specifically, to a method and related device for extracting gesture trajectories.
Background Technology
[0002] In recent years, millimeter-wave radar technology has been gradually introduced into the field of gesture recognition, becoming an important means to improve recognition performance. This technology achieves real-time acquisition of gesture information by measuring the reflection and scattering characteristics of millimeter waves between the human body and the environment, providing a new technical path for building efficient and stable gesture recognition systems.
[0003] Position-independent gesture recognition is a gesture recognition technology that is not affected by the relative position of the user's hand and the sensor. Its core is that even if the hand is at different distances, angles or spatial positions from the sensor, the system can still accurately recognize preset gestures (such as waving, clenching a fist, making a heart shape, etc.) without requiring the user to maintain a fixed posture or operate in a designated area.
[0004] In practical scenarios, one of the main difficulties of position-independent gesture recognition based on millimeter-wave radar is acquiring gesture trajectories accurately enough to ensure the accuracy of the recognition results.
Summary of the Invention
[0005] This application provides a method for extracting gesture trajectories to solve the problem of how to accurately obtain gesture trajectories in the prior art.
[0006] This application also provides a gesture trajectory extraction device, apparatus, computer-readable storage medium, and computer program product.
[0007] The embodiments of this application adopt the following technical solutions. A method for extracting a gesture trajectory includes: constructing a corresponding range-Doppler map (RDM) containing only the gesture target based on acquired human gesture digital signals; locating, in the human gesture digital signals, the signal data block corresponding to the unique index of the gesture target in the RDM; constructing a corresponding range-azimuth map (RAM) and range-elevation map (REM) containing only the gesture target based on the signal data block; and determining the trajectory of the gesture target according to the RDM, the RAM, and the REM.
[0008] A gesture trajectory extraction device includes: an RDM construction unit for constructing a corresponding range-Doppler map (RDM) containing only the gesture target based on acquired human gesture digital signals; a data block localization unit for locating, in the human gesture digital signals, the signal data block corresponding to the unique index of the gesture target in the RDM; a RAM and REM construction unit for constructing a corresponding range-azimuth map (RAM) and range-elevation map (REM) containing only the gesture target based on the signal data block; and a trajectory determination unit for determining the trajectory of the gesture target based on the RDM, the RAM, and the REM.
A computing device includes a memory and a processor, wherein the memory is used to store a computer program, and the processor, coupled to the memory, is used to execute the computer program stored in the memory so as to perform the method described above.
[0009] A computer-readable storage medium storing a computer program that, when executed by a computer, enables the implementation of the above-described method.
[0010] A computer program product storing instructions that, when executed by a computer, cause the computer to perform the method described above.
[0011] The above-described technical solutions adopted in the embodiments of this application can achieve the following beneficial effects. The problem is addressed through a systematic process of "interference filtering, precise positioning, multi-dimensional mapping, and trajectory fusion". First, an RDM containing only the gesture target is constructed, eliminating non-gesture interference signals at the source. Then, based on the unique index of the gesture target in the RDM, the corresponding signal data block is located in the original signal, so that gesture signals are not mixed with interference. Next, a RAM and an REM containing only the gesture target are constructed from this clean signal data block, supplying the horizontal and vertical spatial position features of the gesture. Finally, the interference-free RDM, RAM, and REM are fused to restore the gesture motion features across the dynamics, azimuth, and elevation dimensions, which avoids the impact of interference on trajectory extraction and yields an accurate gesture trajectory.
Attached Figure Description
[0012] The accompanying drawings provide a further understanding of this application and form part of it. The exemplary embodiments they illustrate are used to explain this application and do not constitute an undue limitation of it. In the drawings:
Figure 1a is a flowchart of the gesture trajectory extraction method provided in an embodiment of this application;
Figure 1b is a schematic diagram of the initial RDM obtained in an embodiment of this application;
Figure 1c is a schematic diagram of the RDM obtained after background noise suppression in an embodiment of this application;
Figure 1d is a schematic diagram of the interference-suppressed RDM obtained after 2D-CA-CFAR processing;
Figure 1e is a schematic diagram of the target-extraction RDM obtained after clustering data points using DBSCAN;
Figure 1f is a schematic diagram of the time-accumulated clusters of multiple targets obtained in an embodiment of this application;
Figure 1g is a schematic diagram of the gesture target RDM extracted in an embodiment of this application;
Figure 1h is a diagram illustrating the location of the unique index of the gesture target in the RDM;
Figure 1i is a schematic diagram comparing the RAM before and after interference suppression;
Figure 1j is a schematic diagram comparing the REM before and after interference suppression;
Figure 1k is a diagram illustrating the principle of the four-dimensional fast Fourier transform (4D-FFT) in an embodiment of this application;
Figure 1l is a side view of the spatial geometric model for REM mapping;
Figure 1m is a top view of the spatial geometric model for RDM/RAM mapping;
Figure 1n is a diagram comparing the effects before and after parameter mapping;
Figure 1o is a schematic diagram of the data acquisition locations used during experimental verification;
Figure 1p is a schematic diagram comparing the recognition performance at different data acquisition locations during experimental verification;
Figure 2 is a schematic diagram of the specific structure of a gesture trajectory extraction device provided in an embodiment of this application;
Figure 3 is a schematic diagram of the specific structure of the computing device provided in an embodiment of this application.
Detailed Implementation
[0013] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below in conjunction with specific embodiments and corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0014] As will be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.
[0015] The terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such terms are interchangeable where appropriate; this is merely a way of distinguishing objects with the same attributes in the embodiments of this application. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion, so that a process, method, system, product, or apparatus that comprises a series of elements is not necessarily limited to those elements, but may include other elements not explicitly listed or inherent to those processes, methods, products, or apparatuses.
[0016] Embodiment 1
Embodiment 1 of this application provides a method for extracting gesture trajectories, to solve the prior-art problem of how to accurately obtain gesture trajectories in scenarios containing various kinds of interference and noise.
[0017] The subject executing this method can be any computing device capable of implementing the method, such as a server, mobile phone, personal computer, smart wearable device, smart robot, etc.
[0018] Different steps of this method can be implemented by the same execution entity or by different execution entities. This application does not limit which execution entity is used to implement the method.
[0019] Furthermore, the embodiments of this application do not limit the execution order of different steps. When using the method provided in the embodiments of this application, the execution order of different steps can be adjusted according to actual needs.
[0020] For ease of description, the following uses a software system for extracting gesture trajectories as the execution subject of this method to provide a detailed description of the method provided in this application embodiment.
[0021] Figure 1a shows a flowchart of a gesture trajectory extraction method provided in this embodiment of the application, which includes the following steps:
Step 11: Based on the collected human gesture digital signals, construct the corresponding range-Doppler map (RDM) containing only the gesture target.
The human gesture digital signal is the digital electrical signal obtained by the radar equipment after it collects and converts the electromagnetic wave echo reflected during a human gesture. This signal carries physical characteristics of the gesture such as radial distance, Doppler velocity, angle, and phase, and may also contain components unrelated to the gesture, such as background noise and reflections from the static human body. It is the raw data from which the gesture trajectory is extracted in radar gesture recognition.
[0022] An RDM (range-Doppler map) is a two-dimensional feature map constructed in radar signal processing by processing the acquired target echo signal, with radial distance as one dimension and Doppler velocity as the other. The value of each pixel in the map represents the signal energy intensity at the corresponding range-velocity location, so the map directly reflects the distribution of targets in the range and velocity dimensions.
[0023] The RDM containing only gesture targets described in this application embodiment is a two-dimensional feature map constructed by filtering out interference from the acquired human gesture digital signals, with radial distance as one dimension and Doppler velocity as another dimension. The map retains only the signal features corresponding to the gesture targets, and removes all interference signals from non-gesture targets such as background noise and static human bodies. The pixel values represent the signal energy intensity of the gesture targets at the corresponding distance-velocity positions, which can accurately reflect the unique distribution characteristics of gesture targets in the distance and velocity dimensions.
[0024] In an alternative implementation, an RDM containing only gesture targets can be constructed as follows: first, a corresponding initial RDM is constructed for each frame of the multi-frame human gesture digital signals; then, background noise suppression and interference suppression are performed in sequence on each frame's initial RDM to obtain the multi-frame interference-suppressed RDMs; finally, an RDM containing only gesture targets is constructed from the multi-frame interference-suppressed RDMs.
[0025] Background noise suppression includes, but is not limited to, signal processing methods such as filtering, weighting, and thresholding to weaken or eliminate useless signal components such as random background clutter and inherent device noise in human gesture digital signals that are unrelated to the gesture target. This reduces the masking and interference of noise on the effective gesture signal, improves the signal-to-interference ratio of the gesture target signal, and lays a clean signal foundation for subsequent accurate extraction of gesture target features and construction of relevant feature maps.
[0026] In one alternative implementation, the frame difference method can be used for background noise suppression.
[0027] In an alternative implementation, a constant false alarm rate (CFAR) algorithm, such as 2D-CA-CFAR, can be used for interference suppression.
[0028] In one alternative implementation, constructing an RDM containing only gesture targets from the multi-frame interference-suppressed RDMs may include the following. First, the data points in the interference-suppressed RDM of each frame are clustered using a density-based clustering algorithm to obtain the target-extraction RDM of that frame, which contains one data point cluster per target. Second, the data point clusters corresponding to the same target in the target-extraction RDMs of the different frames are merged to obtain cumulative clusters over the continuous time dimension. Finally, based on the data points contained in the cumulative cluster with the largest variance, the power values of specified data points in the interference-suppressed RDM of each frame are set to 0 to obtain an RDM that contains only the gesture target.
[0029] The specified data points are the data points contained in the other cumulative clusters, excluding the cumulative cluster with the largest variance.
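The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the function name `keep_most_dynamic_cluster` and the label convention (-1 for unclustered cells, one shared non-negative id per target across frames) are hypothetical, and "variance" is assumed here to mean the summed variance of the range and Doppler indices of a cumulative cluster.

```python
import numpy as np

def keep_most_dynamic_cluster(rdms, labels):
    """Zero every clustered cell outside the largest-variance cumulative cluster.

    rdms:   array (frames, range_bins, doppler_bins) of power values
    labels: int array of the same shape; -1 marks unclustered cells and a
            non-negative id marks the same target in every frame (assumed
            to come from an earlier density-based clustering step)
    """
    rdms = np.asarray(rdms, dtype=float)
    labels = np.asarray(labels)
    ids = [c for c in np.unique(labels) if c >= 0]
    if not ids:
        return rdms.copy(), None

    def spread(cluster_id):
        # Pool the (range, Doppler) indices of this target over all frames.
        _, r, d = np.nonzero(labels == cluster_id)
        return np.var(r) + np.var(d)  # assumed definition of "variance"

    gesture_id = max(ids, key=spread)
    out = rdms.copy()
    # "Specified data points": members of every other cumulative cluster.
    out[(labels >= 0) & (labels != gesture_id)] = 0.0
    return out, gesture_id
```

A static reflector occupies the same cells in every frame and therefore has near-zero spread, while a moving hand sweeps through range and Doppler, which is why the largest-variance cluster is taken as the gesture target in this sketch.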
[0030] The following explanation is needed regarding the "power value of data points": In various feature maps of radar gesture signal processing (such as RDM, Range-Azimuth Map (RAM), Range-Elevation Map (REM)), the power value of a data point refers to the cumulative signal energy value corresponding to a single pixel in the map. Its magnitude is calculated from the amplitude or phase of the echo signal after processing such as Fourier transform and inter-frame accumulation. It can intuitively reflect the intensity of the gesture target signal at that location and is the core quantitative basis for distinguishing the effective gesture signal from background noise.
[0031] In a specific example, the RDM containing only the gesture target can be constructed in the following way:
1.1: First, a frequency modulated continuous wave (FMCW) millimeter-wave radar is used to collect the human gesture digital signals. FMCW is a radar signal modulation method.
[0032] The principle behind using FMCW millimeter-wave radar to acquire digital signals of human gestures is based on distance and velocity measurement using frequency differences. The specific process is as follows: Signal transmission: The radar transmits a continuous millimeter-wave signal whose frequency changes linearly with time. The frequency change of this signal follows a triangular wave or sawtooth wave pattern.
[0033] Signal reflection: When the emitted millimeter-wave signal encounters the human hand, it is reflected, and the reflected signal carries information such as the spatial position and movement state of the hand.
[0034] Frequency mixing: The radar mixes the transmitted signal with the reflected echo signal, and the frequency difference between the two is converted into an intermediate frequency signal. This frequency difference is directly related to the radial distance between the hand and the radar. If the hand is moving, a Doppler frequency shift will also be introduced, which in turn reflects the speed of the hand movement.
[0035] Analog-to-digital conversion: Converting the analog intermediate frequency signal obtained from mixing into a digital signal to provide raw data for subsequent digital signal processing.
[0036] As can be seen from the above principle, the core result obtained by directly acquiring data using FMCW millimeter-wave radar is a digital intermediate frequency signal sequence containing human gesture features. In this embodiment, the (digital) intermediate frequency signal sequence containing human gesture features is referred to as the human gesture digital signal.
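To make the frequency-difference principle concrete, the sketch below uses the standard relations for a sawtooth FMCW chirp: the beat frequency is 2·R·S/c for a target at range R with chirp slope S, and the Doppler shift is 2·v/λ. The function names and the example parameters (4 GHz bandwidth, 50 µs chirp) are illustrative assumptions and are not values given in this application.

```python
C = 299_792_458.0  # speed of light in m/s

def beat_frequency(range_m, bandwidth_hz, chirp_time_s):
    """Beat (intermediate) frequency of a static target for a sawtooth chirp."""
    slope = bandwidth_hz / chirp_time_s  # chirp slope in Hz per second
    return 2.0 * range_m * slope / C

def range_from_beat(beat_hz, bandwidth_hz, chirp_time_s):
    """Invert beat_frequency: recover the radial distance from the beat frequency."""
    slope = bandwidth_hz / chirp_time_s
    return beat_hz * C / (2.0 * slope)

def doppler_shift(velocity_mps, carrier_hz):
    """Doppler shift caused by a radial velocity at the given carrier frequency."""
    wavelength = C / carrier_hz
    return 2.0 * velocity_mps / wavelength

# Hypothetical example: a hand 3 m away, seen by a 60 GHz radar.
f_beat = beat_frequency(3.0, bandwidth_hz=4e9, chirp_time_s=50e-6)
print(f"beat frequency: {f_beat / 1e6:.3f} MHz")
print(f"Doppler shift at 1 m/s: {doppler_shift(1.0, 60e9):.1f} Hz")
```

The beat frequency is in the megahertz range while the Doppler shift is only a few hundred hertz, which is why range is resolved within one chirp and velocity across many chirps.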
[0037] Considering that the original acquired intermediate frequency signal will be superimposed with background noise, reflection signals from static parts of the human body, etc., this embodiment uses a Hanning window to highlight the effective features of the gesture target. See 1.2 for details.
[0038] 1.2: For each frame of human gesture digital signal obtained after acquisition, mixing and analog-to-digital conversion by FMCW millimeter-wave radar, Hanning windows are applied in the sampling point dimension and pulse dimension respectively to suppress the spectral leakage phenomenon generated in the subsequent Fast Fourier Transform (FFT) process. Specifically, the size of the Hanning window can be determined first based on the actual number of sampling points and pulses collected, and then the Hanning window can be applied to the human gesture digital signal corresponding to each frame of gesture.
[0039] $$S_w = \left(w_s^{T}\, w_c\right) \odot S$$
[0040] In the above formula: $S_w$ is the two-dimensional signal matrix output after weighting with the Hanning window (corresponding to the human gesture digital signal of one frame); the dimension of this matrix is "number of sampling points × number of pulses", i.e. $N_s \times N_c$; $\odot$ denotes element-wise multiplication.
[0041] $S$ (on the right side of the formula) is the original human gesture digital signal before the Hanning window is applied (also a two-dimensional matrix, with the same dimension "number of sampling points × number of pulses"); $N_s$ is the number of sampling points within a single linear frequency modulated pulse period (the length of the "sampling point dimension"); $N_c$ is the number of consecutively transmitted linear frequency modulated pulses (the length of the "pulse dimension"); $w_s$ is the Hanning window vector constructed for the "sampling point dimension", whose length equals $N_s$; $w_s^{T}$ is the transpose of that vector (converting a row vector into a column vector); $w_c$ is the Hanning window vector constructed for the "pulse dimension", whose length equals $N_c$.
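As a minimal sketch of this windowing step, assuming one frame is stored as a NumPy array of shape (sampling points, pulses), the two one-dimensional Hanning windows can be combined by an outer product and applied element-wise. The function name `apply_hanning_2d` is hypothetical.

```python
import numpy as np

def apply_hanning_2d(frame):
    """Weight one frame (sampling points x pulses) with a separable Hanning window."""
    frame = np.asarray(frame)
    n_samples, n_pulses = frame.shape
    w_fast = np.hanning(n_samples)     # window along the sampling point dimension
    w_slow = np.hanning(n_pulses)      # window along the pulse dimension
    window = np.outer(w_fast, w_slow)  # column vector times row vector
    return frame * window              # element-wise weighting

# Usage with a hypothetical frame of 64 sampling points and 32 pulses.
frame = np.ones((64, 32), dtype=complex)
windowed = apply_hanning_2d(frame)
print(windowed.shape)
```

Sizing both windows from the frame's own shape mirrors the text: the window lengths follow the actual number of sampling points and pulses collected.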
[0042] 1.3: A two-dimensional fast Fourier transform (2D-FFT) is performed on each two-dimensional signal matrix obtained after applying the Hanning window, and the resulting transforms are then summed and averaged to obtain the initial RDM.
[0043] In this embodiment of the application, the 2D-FFT of the windowed two-dimensional signal matrix may be implemented as follows: first, a range-dimension FFT is performed across the sampling points within each linear frequency modulated pulse; then, for each sampling point (range unit), a Doppler-dimension FFT is performed across the linear frequency modulated pulses.
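The two-stage transform can be sketched as below, assuming the windowed frame is a NumPy array of shape (sampling points, pulses). The function name `range_doppler_fft` and the final `fftshift`, which places zero velocity in the middle column, are illustrative choices rather than details stated in this application.

```python
import numpy as np

def range_doppler_fft(windowed):
    """Range FFT within each pulse, then Doppler FFT across pulses."""
    windowed = np.asarray(windowed)
    range_fft = np.fft.fft(windowed, axis=0)  # over sampling points in each pulse
    rd = np.fft.fft(range_fft, axis=1)        # over pulses at each range unit
    return np.fft.fftshift(rd, axes=1)        # centre the zero-velocity column

# Synthetic check: one tone at range unit 10 and Doppler unit +3.
n_s, n_p = 64, 32
s = np.arange(n_s)[:, None]
p = np.arange(n_p)[None, :]
signal = np.exp(2j * np.pi * (10 * s / n_s + 3 * p / n_p))
power = np.abs(range_doppler_fft(signal)) ** 2
peak = np.unravel_index(np.argmax(power), power.shape)
print(peak)
```

After the shift, Doppler unit +3 lands three columns to the right of the centre column (index 16 for 32 pulses), so the peak of this synthetic tone sits at row 10, column 19.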
[0044] It should be noted that, in order to improve the accuracy of target parameter estimation (such as the accuracy of distance and angle measurement), FMCW millimeter-wave radar usually adopts an antenna architecture of "multiple transmit and multiple receive" or "single transmit and multiple receive", and the receiving end will deploy multiple independent antennas (forming a receiving array).
[0045] Each receiving antenna independently receives the echo signal reflected by human gestures. In an optional implementation of this application, the intermediate frequency signal data obtained based on the reflected signal received by each receiving antenna is individually weighted by a Hanning window, and then 2D-FFT processing is performed on the two-dimensional signal matrix obtained after loading the Hanning window, so that a corresponding RDM (i.e., the aforementioned "each transformation result") is finally output for each antenna.
[0046] The RDMs corresponding to all receiving antennas are accumulated and then averaged, and the result is used as the initial RDM of one frame.
[0047] Generally, the received signal of a single antenna may be greatly affected by noise and environmental interference. However, by accumulating the RDM corresponding to all receiving antennas and taking the average, the results of multiple antennas can be fused, thereby canceling out some random interference and making the distance and speed characteristics of the gesture target clearer.
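The antenna fusion can be illustrated with the short sketch below. It assumes the per-antenna 2D-FFT outputs are stacked in an array of shape (antennas, range units, Doppler units) and that the averaging is done on power (squared magnitude); whether this application averages power, magnitude, or complex values is not stated here, so that choice and the name `initial_rdm` are assumptions.

```python
import numpy as np

def initial_rdm(per_antenna_spectra):
    """Average the per-antenna range-Doppler power maps into one initial RDM."""
    spectra = np.asarray(per_antenna_spectra)
    power = np.abs(spectra) ** 2  # assumed: accumulate power, not complex values
    return power.mean(axis=0)     # sum over antennas divided by antenna count

# Hypothetical example: 8 antennas, pure complex noise in every cell.
rng = np.random.default_rng(0)
noise = rng.standard_normal((8, 16, 16)) + 1j * rng.standard_normal((8, 16, 16))
single = np.abs(noise[0]) ** 2
fused = initial_rdm(noise)
print(single.std(), fused.std())
```

The spread of the noise floor shrinks after averaging, which is the effect the text attributes to fusing multiple antennas: random fluctuations partly cancel while a target common to all antennas is preserved.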
[0048] Figure 1b is a schematic diagram of the initial RDM obtained. In Figure 1b, the horizontal axis represents the target's velocity relative to the radar (unit: m/s), and the vertical axis represents the radial distance between the target and the radar (unit: m). Color brightness corresponds to signal power (higher brightness indicates stronger power). The elements in the diagram are as follows: 1) Background noise: It appears as a continuous bright line distributed along the "velocity=0" line, covering the entire distance dimension. It is an interference signal formed by radar waves reflected from fixed objects in the environment (such as walls and furniture), and has no dynamic change characteristics.
[0049] 2) Gesture target: The bright area at a distance of about 3 m and a velocity of about -5 m/s is a signal cluster formed by radar waves reflected from the human hand while it performs an action. Its distance and velocity change continuously over time, and it is the core target to be extracted in the embodiments of this application.
[0050] 3) Static human target: The high-brightness area at a distance of about 3 m and a velocity of 0 is a signal formed by radar waves reflected from stationary parts such as the human torso. It has strong power but no motion characteristics and is static interference that needs to be suppressed.
[0051] 4) Static interference: The low-brightness area at a distance of about 4 m and a velocity of 0 is an interference signal formed by reflections from other stationary objects in the environment (such as a tabletop). Its power is weaker than that of the static human target.
[0052] 5) Dynamic interference: The low-brightness area at a distance of about 5 m and a velocity of about -5 m/s is an interference signal formed by reflections from other moving objects in the environment (such as swaying curtains). It has dynamic characteristics, but its power is weaker than that of the gesture target.
[0053] As can be seen from Figure 1b, without any processing the initial RDM contains substantial interference, which blurs the gesture target in the map, keeps the signal-to-noise ratio low, and greatly hinders subsequent gesture recognition.
[0054] Therefore, in one alternative implementation, the frame difference method, constant false alarm rate (CFAR) detection, and density-based spatial clustering of applications with noise (DBSCAN) algorithm will be further combined to improve the signal-to-noise ratio of the gesture target by analyzing the characteristics of different interferences in RDM and adopting a multi-level interference suppression method.
[0055] The following details how the frame difference method, CFAR detection, and the DBSCAN algorithm are combined to improve the signal-to-noise ratio of the gesture target. The specific implementation process includes the following steps:
1.4: The frame difference method is used to suppress background noise in the initial RDM. As shown in Figure 1b, the background noise appears as a continuous bright line distributed along the "velocity=0" line, covering the entire distance dimension. It is an interference signal formed by radar waves reflected from fixed objects in the environment (such as walls and furniture), and has no dynamic change characteristics.
[0056] In step 1.4, the frame difference method is used to suppress background noise, which can improve the signal-to-noise ratio of the gesture target to a certain extent.
[0057] The specific implementation of 1.4 may include the following operations. First, the initial RDM is treated as an m-dimensional vector space, i.e., it is split along the distance dimension into m row vectors of length n. Then the absolute differences between corresponding elements of adjacent row vectors are calculated and their maximum is taken. The formula is:
$$D_{\max} = \max_{i \in [1,\, m-1],\; j \in [1,\, n]} \left| X(i+1, j) - X(i, j) \right|$$
[0058] In the above formula: $D_{\max}$ is the maximum absolute difference between elements of adjacent row vectors in the RDM vector space, and is the basic parameter for the subsequent calculation of the step peak detection threshold. The initial RDM is denoted as matrix X. The row index i of the matrix corresponds to the "range dimension", i∈[1,m] (m is the number of radar sampling points; each sampling point corresponds to a range unit, i.e., a radial range level between the target and the radar). The column index j of the matrix corresponds to the "velocity dimension", j∈[1,n] (n is the number of radar pulses; each pulse corresponds to a velocity unit, i.e., a Doppler velocity level of the target relative to the radar). X(i,j) is the signal power value corresponding to the i-th range unit and the j-th velocity unit; its value reflects the strength of the gesture echo signal received by the radar for this range-velocity combination (corresponding to "brightness" in the original RDM: the larger the value, the brighter the signal). In the formula, i∈[1,m-1] is the index range of the range unit (i is the row number in the range dimension); because differences between adjacent rows are calculated, the upper limit of i is m-1, which avoids exceeding the matrix range. j∈[1,n] is the index range of the velocity unit (j is the column number in the velocity dimension).
[0059] The step peak detection threshold $T$ is then calculated from the maximum value $D_{\max}$ obtained above and a threshold adjustment constant. The specific calculation formula is as follows: [formula missing]
[0060] In the above formula: $T$ is the step peak detection threshold. This threshold is the critical power value used to determine whether a range-velocity unit in the RDM is "background noise", and its unit is consistent with the RDM signal power unit (usually dBm or a linear power unit). The threshold adjustment constant is, for example, 5.
[0061] The formula for $D_{\max}$ compares elements in the same column (velocity dimension) of adjacent rows (distance dimension). It quantifies the magnitude of power change along the distance dimension: the power change of background noise is small (the difference is close to 0), while the power change of real targets (gestures, static human targets) is large (the difference is significant).
[0062] After $T$ has been calculated, the following decision is made for each cell X(i,j) in the RDM, so that pure background noise with minimal power change is filtered out according to $T$: if X(i,j) ≤ $T$, the cell is directly identified as background noise and its value is set to the ambient noise level; if X(i,j) > $T$, the value of the cell is retained. This reduces the computational load of the subsequent clutter power estimation and improves the efficiency of interference suppression.
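The two operations described for step 1.4 can be sketched as follows. Because the formula that turns the maximum adjacent-row difference into the step peak detection threshold is not reproduced in this text, the sketch takes the threshold as an explicit argument rather than guessing it; the function names are hypothetical.

```python
import numpy as np

def max_adjacent_row_difference(rdm):
    """Largest absolute difference between same-column cells of adjacent rows."""
    rdm = np.asarray(rdm, dtype=float)
    return np.abs(np.diff(rdm, axis=0)).max()  # rows are range units

def suppress_background(rdm, threshold, ambient_noise):
    """Set cells at or below the threshold to the ambient noise level."""
    rdm = np.asarray(rdm, dtype=float)
    return np.where(rdm <= threshold, ambient_noise, rdm)

# Hypothetical 3 x 3 RDM with one strong cell.
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 9.0, 1.0],
              [2.0, 1.0, 1.0]])
print(max_adjacent_row_difference(X))
print(suppress_background(X, threshold=2.0, ambient_noise=0.5))
```

Only the cell with value 9 survives the thresholding in this example; every other cell is replaced by the ambient noise level, which is the behavior the decision rule describes.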
[0063] Among them, environmental noise refers to the signal power received by the radar when there are no targets (including human bodies and moving objects) and only random electromagnetic interference, circuit thermal noise, etc. in the environment. It is the "noise reference level" defined in the embodiments of this application.
[0064] In a specific example, the ambient noise value used in an actual engineering implementation can be determined as follows: first, in the radar deployment scenario, remove all targets (e.g., clear the test area), collect at least 100 frames of target-free RDM data, calculate the average power of all elements, and use this average as the ambient noise level. If the scene changes (for example, furniture is added), a frame of target-free RDM data can be collected at a preset interval (such as every 10 seconds) and the ambient noise level recalculated and updated, so that the noise reference matches the actual environment.
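A minimal sketch of this calibration, assuming the target-free RDMs are stacked in an array of shape (frames, range units, velocity units) and hold linear power values (averaging dBm values directly would give a different result). The function name `estimate_ambient_noise` and the `min_frames` parameter are hypothetical.

```python
import numpy as np

def estimate_ambient_noise(target_free_rdms, min_frames=100):
    """Average the power of all cells over a stack of target-free RDM frames."""
    frames = np.asarray(target_free_rdms, dtype=float)
    if frames.ndim != 3:
        raise ValueError("expected an array of shape (frames, range, velocity)")
    if frames.shape[0] < min_frames:
        raise ValueError(f"need at least {min_frames} target-free frames")
    return float(frames.mean())

# Hypothetical calibration run with 100 simulated target-free frames.
rng = np.random.default_rng(1)
calibration = rng.exponential(scale=2.0, size=(100, 32, 16))
print(estimate_ambient_noise(calibration))
```

Requiring a minimum number of frames keeps a single noisy capture from defining the reference level; the periodic update described in the text would call the same kind of averaging on freshly collected target-free data.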
[0065] Typically, the range of ambient noise is related to the performance of radar hardware. In the scenario of the FMCW millimeter-wave radar (operating frequency band 60GHz) used in this application embodiment, the typical ambient noise level is -80dBm to -70dBm (the specific value needs to be determined by actual calibration in combination with the radar's receiving sensitivity, circuit noise figure and other parameters).
[0066] In this embodiment of the application, setting every cell X(i,j) whose value is less than or equal to the step peak detection threshold to the ambient noise level directly eliminates the power superposition of pure background noise. The high-power cells corresponding to real targets (gesture, static human target), i.e., the cells whose values exceed the threshold, therefore stand out more clearly in the RDM. This significantly increases the power difference (signal-to-noise ratio) between the target signal and the noise and provides a clearer signal basis for subsequent target detection.
[0067] Meanwhile, by setting the pure background noise to the level of ambient noise, the power mean of the static area is closer to the "true clutter interference intensity", which can make the clutter power estimation more accurate, thereby reducing the false judgment rate of target detection to be carried out in the subsequent 1.5 and ensuring the accuracy of gesture target extraction.
[0068] The RDM obtained after background noise suppression of the initial RDM in section 1.4 can be called the background noise suppressed RDM.
[0069] Figure 1c shows the background-noise-suppressed RDM obtained by applying background noise suppression to the initial RDM shown in Figure 1b. In Figure 1c, the horizontal axis represents the target's velocity relative to the radar (unit: m/s), and the vertical axis represents the radial distance between the target and the radar (unit: m). The bright areas in the figure correspond to effective targets with strong signal power, while the low-brightness areas represent background noise uniformly set to the ambient noise level.
[0070] Comparing Figure 1c with Figure 1b shows the following: the continuous bright line (background noise) along the "velocity=0" line in Figure 1b has been uniformly set to the ambient noise level in Figure 1c (the dark blue area in the figure) and no longer covers the entire distance dimension; the signal clusters corresponding to the "gesture target", "static human target", "static interference", and "dynamic interference" in Figure 1b are preserved as highlighted areas in Figure 1c (at distances of 3-5 m and velocities of -5 m/s to 0 m/s), with the target outlines separated from the background and their power characteristics more prominent; and the brightness of the "static interference" and "dynamic interference" areas of Figure 1b is reduced in Figure 1c, so the power difference between them and the core targets (gesture, human body) becomes more significant.
[0071] It can be seen that, through The process of filtering and setting noise units to ambient noise not only removes pure background noise covering all dimensions but also retains signal clusters containing valid targets, providing high signal-to-noise ratio input data for subsequent "2D-CA-CFAR target detection" and "DBSCAN gesture target extraction".
[0072] 1.5: The 2D-CA-CFAR algorithm is used to perform target detection on the background-noise-suppressed RDM in order to suppress low-power interference targets. 2D-CA-CFAR stands for "Two-Dimensional Cell-Averaging Constant False Alarm Rate". Its core is to distinguish targets from interference at a constant false alarm probability by means of clutter power statistics over a two-dimensional region.
[0073] In this embodiment of the application, the algorithm is used to perform target detection on a two-dimensional signal matrix (RDM after background noise suppression). The principle is as follows: a "two-dimensional reference window" is set for each element to be detected (detected unit) in the two-dimensional signal matrix (RDM after background noise suppression), and the average clutter power in the two-dimensional reference window is statistically analyzed. Based on this, the "real target" and "interference signal" are distinguished, while the "false alarm probability (the probability of misjudging interference as a target)" is kept constant to avoid fluctuations in the detection results caused by changes in environmental noise.
[0074] In this embodiment of the application, the two-dimensional reference window includes three functional areas: (1) Detected unit: a single element in the RDM matrix after background noise suppression that is to be determined as a target (corresponding to the signal power of a certain distance-velocity combination); (2) Training window: A two-dimensional region centered on the detected unit (such as a “5×5” unit matrix) is used to calculate the power mean of all elements in the region as a “clutter power estimate” (representing the background interference intensity around the detected unit). (3) Guard window: A narrow band region between the detected unit and the training window to prevent the power value of the detected unit itself from participating in the training window statistics, thus preventing the high power of the real target from interfering with the accuracy of clutter power estimation.
[0075] In step 1.5, the specific implementation process of using the 2D-CA-CFAR algorithm for target detection on the background-noise-suppressed RDM includes: first, collect the power value $P_i$ of each training unit in the training window; then, calculate the clutter power estimate using the following formula:
$$\hat{P}_c(m,n) = \frac{1}{N}\sum_{i=1}^{N} P_i$$
[0076] where $\hat{P}_c(m,n)$ represents the estimated clutter power corresponding to the currently detected unit; $(m,n)$ represents the position of the detected unit in the background-noise-suppressed RDM; $P_i$ represents the power value of the $i$-th training unit in the training window; and $N$ represents the total number of training units within the training window. The value of $N$ is determined by the two-dimensional size of the training window (for example, if the training window is set to a "5 rows × 5 columns" area, then $N = 25$; if the training window is "7 rows × 3 columns", then $N = 21$).
[0077] Based on the above formula, the average power of the $N$ training units reflects the intensity of the background interference around the detected unit.
[0078] It should be noted that the "detected unit" is the element currently being processed while traversing the RDM matrix (e.g., starting from the first row and first column of the RDM matrix, each element in turn is treated as the detected unit). Each detected unit has its own "two-dimensional reference window" (including a training window and a guard window). For each detected unit, the algorithm calculates its clutter power estimate separately according to the formula described above and then completes the target/interference determination. For example, when the detected unit is the element corresponding to "distance = 3 m, velocity = -5 m/s" in the RDM, the algorithm constructs a reference window centered on that element, estimates the clutter power around it, and finally determines whether the position is a gesture target.
[0079] A “training unit” refers to all the RDM matrix elements contained within the “training window region” of a two-dimensional reference window, and is the basic data unit used to estimate clutter power. Each training unit corresponds to an element in the RDM (range-Doppler diagram) matrix, representing the signal power value under a certain “range unit-velocity unit” combination.
[0080] Based on the above formula, in step 1.5 the clutter power estimate of each element (detected unit) in the background-noise-suppressed RDM matrix is calculated. The power value of each element is then compared with its clutter power estimate: if the power value of the current element (the currently detected unit) is greater than the clutter power estimate, the current element is determined to be a real target and its power value is retained; if the power value of the current element is less than or equal to the clutter power estimate, the current element is determined to be an interference target and its power value is set to 0.
[0081] After performing 2D-CA-CFAR processing on the RDM with background noise suppression (1.5), the resulting RDM can be called the interference-suppressed RDM.
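The detection loop of step 1.5 can be sketched as below. This is a hedged illustration, not the implementation of this application. The assumptions are: power values are linear (not dB); the training cells are those inside a square window of half-width `train` but outside the guard band of half-width `guard`; windows are truncated at the matrix edges. The `scale` factor is a hypothetical parameter added because classical CA-CFAR multiplies the clutter estimate by a factor derived from the false alarm probability; with `scale=1.0` the rule reduces to the direct comparison described above.

```python
def ca_cfar_2d(rdm, train=2, guard=1, scale=1.0):
    """Keep cells whose power exceeds scale * (mean power of training cells);
    set all other cells to 0. rdm is a list of rows of linear power values."""
    rows, cols = len(rdm), len(rdm[0])
    out = [[0.0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            total, count = 0.0, 0
            for i in range(max(0, m - train), min(rows, m + train + 1)):
                for j in range(max(0, n - train), min(cols, n + train + 1)):
                    # Skip the detected unit and the guard band around it.
                    if abs(i - m) <= guard and abs(j - n) <= guard:
                        continue
                    total += rdm[i][j]
                    count += 1
            clutter = total / count if count else 0.0
            if rdm[m][n] > scale * clutter:
                out[m][n] = rdm[m][n]
    return out


# Hypothetical 7 x 7 RDM: flat clutter of power 1.0 with one strong target.
rdm = [[1.0] * 7 for _ in range(7)]
rdm[3][3] = 10.0
detected = ca_cfar_2d(rdm)
```

Because the guard band is excluded here, a full 5 × 5 window contributes 16 training cells rather than the 25 quoted above; counting every cell of the window would be an equally valid reading of the text.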
[0082] In a specific example, the interference-suppressed RDM produced by the 2D-CA-CFAR algorithm is shown in Figure 1d.
[0083] In Figure 1d, the horizontal axis represents the target's velocity relative to the radar (unit: m/s), and the vertical axis represents the radial distance between the target and the radar (unit: m). The highlighted areas correspond to the real targets that have been detected and confirmed, while the low-brightness areas represent the suppressed interference targets.
[0084] From Figure 1d it can be seen that the highlighted clusters (including the green and yellow areas) at a distance of approximately 3 m and velocities from -5 m/s to 0 m/s are identified by the 2D-CA-CFAR algorithm as "gesture target + static human target". Compared with Figure 1c, the outlines of these targets are more focused and their boundaries with the interference are clearer.
[0085] The low-power interference present in Figure 1c (such as weak signals at a distance of 4 m and a velocity of -5 m/s) has been completely suppressed in Figure 1d (corresponding to the dark blue area in Figure 1d), so that only valid targets with power significantly higher than the clutter level are retained.
[0086] It is evident that by using "two-dimensional clutter power estimation + target / interference determination", we can accurately filter out interference targets with lower power in RDM, further purify the signal, make the characteristics of gesture targets more prominent, and provide high-purity input data for subsequent DBSCAN clustering extraction of dynamic gesture targets.
[0087] 1.6: The DBSCAN algorithm is used to suppress interference signals in the interference-suppressed RDM in order to obtain the RDM for target extraction.
[0088] In step 1.6, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to cluster the interference-suppressed RDM, with the goal of suppressing residual interference signals and extracting gesture targets.
[0089] Before the algorithm is executed, its hyperparameters are set in advance: neighborhood radius ε = 2.5 (the distance threshold that defines the "neighborhood" of a data point in the interference-suppressed RDM) and MinPts = 7 (the minimum number of data points in a neighborhood required for a point to qualify as a core point).
[0090] Based on the above settings, the specific implementation process of step 1.6 is as follows. First, the DBSCAN algorithm performs a core point search: the algorithm iterates through all data points in the interference-suppressed RDM and identifies points whose neighborhood count is ≥ MinPts as core points. The "neighborhood count" is based on the two-dimensional data characteristics of the RDM: each data point in the interference-suppressed RDM corresponds to a two-dimensional coordinate (r, v), where r represents distance (m) and v represents velocity (m/s). The "neighborhood" is the two-dimensional region centered on the current data point (a valid data point must have a non-zero power value) with radius ε = 2.5. The "number of data points in the neighborhood" is the total number of valid data points (those with non-zero power values) that fall within this region.
[0091] Then, cluster partitioning is performed: iterate through each core point and group its "directly density-reachable" points (points within the core point's neighborhood) and "density-reachable" points (points connected through a chain of core points) into the same cluster. A directly density-reachable point is defined as follows: if point B is within the neighborhood of core point A, then B is directly density-reachable from A. A density-reachable point is defined as follows: if there exists a chain of core points A1→A2→...→Ak, where each Ai is a core point lying in the neighborhood of its predecessor, and point B is directly density-reachable from Ak, then B is density-reachable from A1.
[0092] Finally, noise filtering is performed: Points not assigned to any cluster are identified as noise, and their power values are set to 0 to filter out interference.
[0093] After performing the above operations, the interference-suppressed RDM can be converted into an "RDM for target extraction".
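The three stages of step 1.6 (core point search, cluster partitioning, noise filtering) can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of this application: coordinates are taken as (row, column) bin indices rather than physical (r, v) values, the neighborhood count includes the point itself, and the function name `dbscan_filter` is hypothetical.

```python
from collections import deque


def dbscan_filter(rdm, eps=2.5, min_pts=7):
    """Cluster the non-zero cells of rdm with DBSCAN and zero out noise cells.
    Returns (filtered_rdm, labels) where labels maps (row, col) -> cluster id."""
    points = [(i, j) for i, row in enumerate(rdm)
              for j, power in enumerate(row) if power != 0]

    def neighbors(p):
        # All valid points within Euclidean distance eps of p (p included).
        return [q for q in points
                if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= eps ** 2]

    labels = {}
    cluster_id = 0
    for p in points:
        if p in labels:
            continue
        seeds = neighbors(p)
        if len(seeds) < min_pts:
            continue  # not a core point; may still join a cluster as a border point
        cluster_id += 1
        labels[p] = cluster_id
        queue = deque(seeds)
        while queue:
            q = queue.popleft()
            if q in labels:
                continue
            labels[q] = cluster_id
            q_neighbors = neighbors(q)
            if len(q_neighbors) >= min_pts:  # q is a core point: keep expanding
                queue.extend(q_neighbors)

    filtered = [[power if (i, j) in labels else 0.0 for j, power in enumerate(row)]
                for i, row in enumerate(rdm)]
    return filtered, labels


# Hypothetical 8 x 8 RDM: one dense 3 x 3 target and one isolated interference cell.
grid = [[0.0] * 8 for _ in range(8)]
for i in range(1, 4):
    for j in range(1, 4):
        grid[i][j] = 5.0
grid[6][6] = 3.0
filtered, labels = dbscan_filter(grid)
```

In this example the nine cells of the dense block form a single cluster, while the isolated cell at (6, 6) is not reachable from any core point and is therefore set to 0.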
[0094] In a specific example, the RDM for target extraction is shown in Figure 1e.
[0095] In Figure 1e, the horizontal axis represents the target's velocity relative to the radar (unit: m/s), and the vertical axis represents the radial distance between the target and the radar (unit: m). Differently colored blocks represent different target clusters, and "×" marks interference points that are identified as noise and filtered out.
[0096] The differently colored clusters in Figure 1e (red, green, cyan, and purple) are the target clusters partitioned by the DBSCAN algorithm according to the "density reachability" rule. Specifically, the red and green clusters correspond to the gesture target and the static human target within a distance of 3 m, while the cyan and purple clusters correspond to residual interference clusters at a distance of 4-5 m.
[0097] The "×" in the figure represents noise points that have not been assigned to any cluster. These points are sparse interference signals and have been identified as noise by the algorithm and set to 0 for filtering. They will no longer participate in subsequent gesture feature analysis.
[0098] Given that the distance and velocity of a gesture target change dynamically over time, it can be seen that the red and green clusters in Figure 1e are concentrated in the typical distance (3 m) and velocity (-5 m/s to 0 m/s) range of gesture actions. They will eventually be identified as the gesture target clusters to be extracted, providing accurate target data for subsequent gesture recognition.
[0099] 1.7: Based on the multi-frame RDMs for target extraction produced by the DBSCAN algorithm, an RDM containing only the gesture target (called the gesture target RDM) is extracted. Specifically, inter-frame accumulation and fusion is performed on the multi-frame RDMs for target extraction: clusters belonging to the same target in the RDMs of different frames are merged to generate the target's cumulative cluster over the continuous time dimension (the temporal cumulative cluster); the variance of each temporal cumulative cluster is calculated, and the cluster with the largest variance is determined to be the gesture target; all points in the multi-frame RDMs for target extraction other than those corresponding to the gesture target are set to 0, yielding the gesture target RDM.
[0100] It should be noted that a gesture target moves dynamically over a continuous period of time, so its temporal cumulative cluster exhibits a large variance (variance measures the degree of motion fluctuation of a target over a continuous period of time). The motion characteristics of other interfering targets differ significantly from those of gesture targets, and so do the statistical characteristics of their temporal cumulative clusters. Therefore, comparing the variances of the temporal cumulative clusters effectively distinguishes gesture targets from other interfering targets.
[0101] In this embodiment of the application, a single-frame cluster refers to a target cluster obtained by clustering a single-frame RDM using the DBSCAN algorithm (e.g., a cluster represented by a single color in Figure 1e); it only reflects the position and density characteristics of a target at a certain moment. A temporal cumulative cluster refers to the continuous cluster formed by accumulating and fusing, over time, the single-frame clusters that belong to the same target in multiple frames of RDM. It is a collection of the target's motion characteristics over multiple frames.
[0102] In a specific example, in step 1.7, the clustering results in the RDMs for target extraction of multiple consecutive frames can first be matched to identify the single-frame clusters belonging to the same target in each frame (for example, green clusters near a distance of 3 m and a velocity of -5 m/s in 5 consecutive frames all correspond to the same gesture target). Then, the single-frame clusters corresponding to the same target are merged along the time dimension to obtain the temporal cumulative cluster of that target. This temporal cumulative cluster contains all dynamic characteristics of the target, such as position and power, over a continuous time period. For example, Figure 1f is a schematic representation of the temporal cumulative clusters of multiple targets.
[0103] By calculating the variance of each temporal cumulative cluster, the cluster with the largest variance is identified as the gesture target (e.g., Cluster1 in Figure 1f), and the corresponding gesture target RDM is obtained, as shown in Figure 1g.
[0104] In Figure 1f, the horizontal axis represents the target's velocity relative to the radar (unit: m/s), and the vertical axis represents the radial distance between the target and the radar (unit: m). Differently colored blocks represent different temporal cumulative clusters, and the total variance of each cluster is marked in the upper right corner. From Figure 1f it can be seen that the total variance of the red cluster (Cluster1) is 95.62, which is the largest among all clusters and corresponds to the gesture target.
[0105] The variances of the green cluster (Cluster2), purple cluster (Cluster3), and cyan cluster (Cluster4) are 4.26, 3.15, and 7.03, respectively, with relatively small fluctuations, corresponding to static human targets or interference targets.
[0106] The red clusters (gesture targets) exhibit a clear dynamic diffusion distribution within a speed range of -4m / s to 4m / s and a distance range of 2m to 3m, reflecting the motion fluctuations of the gesture over a continuous period of time; the remaining clusters show a concentrated and stable distribution, consistent with the characteristics of static targets or interference.
[0107] Figure 1g shows the gesture target extracted after variance filtering. The horizontal axis represents the target's velocity relative to the radar (unit: m/s), and the vertical axis represents the radial distance between the target and the radar (unit: m). Only the red cluster (the gesture target) with the largest variance is retained in the image, and all other clusters are filtered out.
[0108] As can be seen from Figure 1g, in the final gesture target RDM only the gesture target with the largest temporal cumulative cluster variance is retained (highlighted in yellow). All static targets and interfering targets have been completely suppressed, the background is a uniform dark blue, and the target features are highly focused. The retained gesture target cluster fully presents its motion trajectory and fluctuation characteristics over continuous time, providing high-purity target data for subsequent gesture recognition.
[0109] It should be further explained that, in a specific example, the aforementioned process of "matching the clustering results in the RDMs for target extraction of multiple consecutive frames to identify single-frame clusters belonging to the same target in each frame" may include the following. First, single-frame cluster feature extraction is performed: for each single-frame cluster obtained by DBSCAN clustering of each frame's RDM for target extraction, the following core features are extracted as matching criteria: Cluster center coordinates: the average distance and average velocity of all data points within the cluster give the cluster center coordinates $(r_k, v_k)$, where $k$ is the cluster number in the current frame.
[0110] Cluster average power: The average power of all data points within a cluster, reflecting the target's reflection intensity.
[0111] Cluster density: The number of data points within a cluster, reflecting the density of effective signal points of the target.
[0112] Then, inter-frame cluster association matching is performed. For example, for the clusters in frame t and frame t+1, a "distance-first + feature-assisted" matching rule is used. The Euclidean distance between the center of cluster $i$ in frame t and the center of cluster $j$ in frame t+1 is calculated using the following formula:
$$d_{ij} = \sqrt{\left(r_i^{t} - r_j^{t+1}\right)^2 + \left(v_i^{t} - v_j^{t+1}\right)^2}$$
[0113] If $d_{ij} \le \delta$ (where $\delta$ is a preset spatial threshold, e.g., 1.5), cluster $i$ of frame t and cluster $j$ of frame t+1 are determined to be a potential matching pair.
[0114] Then, feature consistency verification is performed: for each potential matching pair, the rates of change of the cluster average power and the cluster density are further verified. If the power change rate and the density change rate each satisfy their preset threshold conditions, the two clusters of the potential matching pair are confirmed to belong to the same target.
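The "distance-first + feature-assisted" matching described above can be sketched as follows. This is an illustrative sketch only. The assumptions are: the change rate is taken as the relative change |b - a| / |a|; the limits `max_power_rate` and `max_density_rate` (0.3 each) are hypothetical placeholders, since no values are given here; each cluster is represented by a list of (distance, velocity, power) tuples.

```python
import math


def cluster_features(points):
    """points: list of (distance_m, velocity_mps, power) tuples of one single-frame cluster."""
    n = len(points)
    return {
        "center": (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n),
        "mean_power": sum(p[2] for p in points) / n,
        "density": n,
    }


def change_rate(a, b):
    """Relative change from a to b (assumed definition of 'rate of change')."""
    return abs(b - a) / abs(a) if a else float("inf")


def match_clusters(frame_t, frame_t1, delta=1.5,
                   max_power_rate=0.3, max_density_rate=0.3):
    """Return index pairs (i, j) of clusters judged to belong to the same target."""
    matches = []
    for i, a in enumerate(frame_t):
        for j, b in enumerate(frame_t1):
            # Distance first: centers must lie within the spatial threshold delta.
            if math.dist(a["center"], b["center"]) > delta:
                continue
            # Feature assisted: power and density must stay consistent.
            if (change_rate(a["mean_power"], b["mean_power"]) <= max_power_rate
                    and change_rate(a["density"], b["density"]) <= max_density_rate):
                matches.append((i, j))
    return matches


# Hypothetical clusters in two consecutive frames.
frame_t = [
    cluster_features([(3.0, -5.0, 10.0), (3.2, -4.8, 12.0)]),
    cluster_features([(4.5, 0.0, 2.0)]),
]
frame_t1 = [
    cluster_features([(3.1, -4.5, 11.0), (3.3, -4.7, 11.0)]),
    cluster_features([(4.6, 0.1, 2.1)]),
]
matches = match_clusters(frame_t, frame_t1)
```

Chaining such pairwise matches across consecutive frames yields, for each target, the list of single-frame clusters that make up its temporal cumulative cluster.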
[0115] Furthermore, in a specific example, the "variance of the temporal cumulative cluster" mentioned above can be calculated as follows. First, extract the position sequence of the temporal cumulative cluster: for each temporal cumulative cluster, extract its sequence of cluster center coordinates over multiple consecutive frames: $\{(r_1, v_1), (r_2, v_2), \ldots, (r_T, v_T)\}$, where $T$ is the number of accumulated frames and $(r_t, v_t)$ represents the center distance and center velocity of the cluster in frame t.
[0116] Then, calculate the variances of the distance and velocity dimensions. The specific calculation formulas are as follows:
[0117] $$\sigma_r^2 = \frac{1}{T}\sum_{t=1}^{T}\left(r_t - \bar{r}\right)^2, \qquad \sigma_v^2 = \frac{1}{T}\sum_{t=1}^{T}\left(v_t - \bar{v}\right)^2$$
[0118] where $\bar{r}$ is the mean of the cluster center distances over all frames and $\bar{v}$ is the mean of the cluster center velocities over all frames.
[0119] Finally, the variances of the distance dimension and the velocity dimension are summed to obtain the total variance of the temporal cumulative cluster. For example, the specific calculation could be: $\sigma^2 = \sigma_r^2 + \sigma_v^2$.
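The variance-based selection of the gesture target can be sketched as below. This is a minimal sketch under assumptions: the variance is normalized by the number of frames T (population variance), and the cluster names and center sequences are invented purely for illustration.

```python
def total_variance(centers):
    """centers: list of (distance, velocity) cluster centers, one per frame.
    Returns the sum of the distance variance and the velocity variance."""
    T = len(centers)
    mean_r = sum(r for r, _ in centers) / T
    mean_v = sum(v for _, v in centers) / T
    var_r = sum((r - mean_r) ** 2 for r, _ in centers) / T
    var_v = sum((v - mean_v) ** 2 for _, v in centers) / T
    return var_r + var_v


# Hypothetical temporal cumulative clusters: a moving hand and a static body.
tracks = {
    "cluster1": [(2.0, -4.0), (2.5, 0.0), (3.0, 4.0)],
    "cluster2": [(3.0, 0.0), (3.0, 0.1), (3.0, 0.0)],
}
gesture = max(tracks, key=lambda name: total_variance(tracks[name]))
```

Here "cluster1" swings across a wide velocity range and therefore has by far the larger total variance, so it is the one selected as the gesture target.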
[0120] 1.8: Extract the static human target from the RDM for target extraction. In step 1.8, based on the characteristic of human targets that they have "a large reflective area and the highest corresponding power value in the RDM", the cluster whose points (i.e., the core points constituting the clusters mentioned above) have the highest average power in the RDM for target extraction is selected as the static human target. The power values of all points other than those of the static human target can be set to 0. In this way, an RDM containing only the static human target is obtained, which is called the static human target RDM.
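The selection rule of step 1.8 can be sketched as follows. This is an illustration only: the `labels` mapping from cell position to cluster id is an assumed data structure (for example, the output of a clustering pass), and the function name is hypothetical.

```python
def extract_static_human(rdm, labels):
    """labels: dict mapping (row, col) -> cluster id.
    Keeps only the cluster with the highest mean power; all other cells become 0."""
    sums, counts = {}, {}
    for (i, j), cid in labels.items():
        sums[cid] = sums.get(cid, 0.0) + rdm[i][j]
        counts[cid] = counts.get(cid, 0) + 1
    human = max(sums, key=lambda cid: sums[cid] / counts[cid])
    out = [[power if labels.get((i, j)) == human else 0.0
            for j, power in enumerate(row)]
           for i, row in enumerate(rdm)]
    return out, human


# Hypothetical RDM with two labeled clusters.
rdm = [
    [9.0, 8.0, 0.0],
    [0.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
]
labels = {(0, 0): 1, (0, 1): 1, (1, 2): 2, (2, 1): 2}
human_rdm, human_id = extract_static_human(rdm, labels)
```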
[0121] By executing steps 1.1 to 1.8 above, an RDM containing only the gesture target (the gesture target RDM) and an RDM containing only the static human target (the static human target RDM) are finally obtained.
[0122] Step 12: Based on the unique index of the gesture target in the RDM containing only the gesture target, locate the signal data block corresponding to the unique index in the human gesture digital signal. In one optional implementation, after the gesture target RDM is obtained, a cumulative averaging method can be used to extract the unique index of the gesture target in the RDM. This index represents the position of the gesture target in the RDM matrix for the current frame, as shown in Figure 1h.
[0123] In Figure 1h, the horizontal axis represents the target's velocity relative to the radar (unit: m/s), and the vertical axis represents the radial distance between the target and the radar (unit: m). The only highlighted convergence point in the figure is the position of the gesture target in the current frame.
[0124] In Figure 1h, the bright convergence point at a distance of approximately 3 m and a velocity of approximately -5 m/s is the unique index of the gesture target extracted from the RDM using the cumulative averaging method. This index represents the precise location of the gesture target in the range-Doppler dimension in the current frame. The uniform dark blue background of the image indicates that all interfering targets and non-gesture targets have been completely suppressed, retaining only the valid signal points of the gesture target. This provides a precise index for the subsequent localization of the target's original signal and the execution of the 2D-FFT in the antenna plane dimension.
[0125] It should be noted that, in one optional implementation, the specific implementation method of "extracting the unique index of the gesture target in the RDM of the gesture target by means of cumulative averaging" may include: Step 1: Spatial accumulation within a single frame RDM: For an RDM that contains only the gesture target in a single frame, the grayscale value (power value) of all pixels (data points) in the distance-Doppler two-dimensional space of the gesture target is accumulated within the range of the effective data points (power non-zero points) of the gesture target.
[0126] The core function of this step is to gather the effective signal energy of the gesture target scattered in space, highlight the core area of the gesture target, and suppress residual sporadic noise in a single frame.
[0127] Step 2: Inter-frame averaging of multi-frame RDM: After spatial accumulation of the RDM results for multiple consecutive frames, perform inter-frame averaging calculation—sum the accumulated values at corresponding positions in each frame and divide by the total number of frames to obtain an "averaged RDM".
[0128] The core function of this step is to smooth out the minute positional fluctuations of the gesture target over a continuous period of time, eliminate the randomness of single-frame data, and make the stable core position (distance-velocity coordinates) of the gesture target clearly visible.
[0129] Step 3: Extract the unique index: From the averaged RDM, locate the pixel (data point) with the largest energy (power mean) after cumulative averaging. The distance-Doppler 2D coordinates corresponding to this point are the unique index of the gesture target.
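The three steps above can be sketched as follows. This is a simplified reading rather than the implementation of this application: Step 1 is interpreted as letting only the non-zero cells of each gesture target RDM contribute (zero cells add nothing to the sum), Step 2 averages the frames cell by cell, and Step 3 takes the position of the maximum. The frame values are invented for illustration.

```python
def unique_index(frames):
    """frames: list of gesture target RDMs (lists of rows) of identical shape.
    Returns ((row, col) of the largest averaged power, averaged RDM)."""
    n_frames = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    # Steps 1 and 2: accumulate each cell over the frames, then divide by the frame count.
    averaged = [[sum(f[i][j] for f in frames) / n_frames for j in range(cols)]
                for i in range(rows)]
    # Step 3: the cell with the largest averaged power is the unique index.
    best = max(((i, j) for i in range(rows) for j in range(cols)),
               key=lambda ij: averaged[ij[0]][ij[1]])
    return best, averaged


# Three hypothetical 2 x 3 gesture target RDMs (rows: range bins, columns: velocity bins).
frames = [
    [[0.0, 4.0, 0.0], [0.0, 6.0, 1.0]],
    [[0.0, 2.0, 0.0], [0.0, 8.0, 0.0]],
    [[1.0, 0.0, 0.0], [0.0, 7.0, 0.0]],
]
index, averaged = unique_index(frames)
```

The sporadic cells that appear in only one frame are attenuated by the averaging, while the cell that is consistently strong across frames dominates and is returned as the index.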
[0130] Step 13: Based on the signal data block located through the unique index of the gesture target in the RDM, construct the corresponding RAM containing only the gesture target and REM containing only the gesture target. In this embodiment of the application, after the unique index of the gesture target is extracted, the two-dimensional signal matrix obtained after applying the Hanning window in step 1.3 above can be used, and the signal data block corresponding to the unique index is located in this matrix. First, using the mapping between "distance-velocity coordinates" and matrix row numbers established when the range-Doppler 2D-FFT was performed on the two-dimensional signal matrix, the unique index is converted into a specific row number of the matrix; then, all column data of that row are extracted, and the resulting data set is the signal data block corresponding to the unique index. Next, a two-dimensional fast Fourier transform (2D-FFT) is performed over the two-dimensional antenna plane dimension of the located data block, and the transform result is separated into amplitude information and phase information, yielding the radar amplitude map (RAM) and radar phase map (REM) of the gesture target. The specific implementation process is as follows. First, a range-dimension fast Fourier transform is performed on the data block: for the echo signal of each antenna element in the data block, a one-dimensional FFT is performed along the sampling-point dimension of its linear frequency-modulated pulse, which transforms the signal from the time domain to the range domain, preserves the feature information of the gesture target in the radial range dimension, and yields a signal matrix over the range domain and the two-dimensional antenna plane.
Then, a two-dimensional antenna-plane fast Fourier transform is performed on this signal matrix: according to the arrangement of the radar antenna array, two-dimensional FFTs are performed along the azimuth antenna-element dimension and the elevation antenna-element dimension of the antenna plane, converting the signal from the antenna domain to the angle domain, computing the feature information of the gesture target in the azimuth and elevation dimensions, and yielding a three-dimensional range-azimuth-elevation angle-domain signal matrix. Finally, from this three-dimensional angle-domain signal matrix, the signal amplitude information in the range-azimuth dimension is extracted to construct a RAM containing only the gesture target, and the signal phase information in the range-elevation dimension is extracted to construct a REM containing only the gesture target.
[0131] In this embodiment, a signal data block containing only the gesture target is accurately located using a unique index. Subsequent 2D-FFT calculations in the angular dimension are performed only on this data block, eliminating the need for full calculation of the entire two-dimensional signal matrix, thus effectively reducing the computational load of the algorithm. Simultaneously, since all data from interfering targets in the RDM has been filtered out when locating the signal data block, the subsequently generated radar amplitude map (RAM) and radar phase map (REM) naturally contain only valid information about the gesture target, eliminating the need for additional de-interference steps on the RAM and REM, and simplifying the processing flow.
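The transform chain of Step 13 can be written compactly as below. The notation is an assumption introduced only for illustration: $x(n,a,e)$ denotes the located data block, with $n$ the sample index within a chirp ($N$ samples), $a$ the azimuth antenna-element index ($A$ elements), and $e$ the elevation antenna-element index ($E$ elements). How the three-dimensional result is reduced to two-dimensional maps is not specified above, so the fixed bins $q_0$ and $p_0$ are hypothetical choices.

```latex
\begin{align}
  % Range-dimension FFT along the chirp samples
  X_r(k,a,e) &= \sum_{n=0}^{N-1} x(n,a,e)\, e^{-j 2\pi nk/N} \\
  % Two-dimensional antenna-plane FFT (azimuth bin p, elevation bin q)
  X(k,p,q) &= \sum_{a=0}^{A-1}\sum_{e=0}^{E-1} X_r(k,a,e)\, e^{-j 2\pi \left(ap/A + eq/E\right)} \\
  % Amplitude in the range-azimuth plane, phase in the range-elevation plane
  \mathrm{RAM}(k,p) &= \left|X(k,p,q_0)\right|, \qquad
  \mathrm{REM}(k,q) = \arg X(k,p_0,q)
\end{align}
```

Because $x(n,a,e)$ covers only the data block selected by the unique index, both sums run over a small fraction of the full signal matrix, which is the source of the computational saving described above.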
[0132] Referring to Figure 1i of the accompanying drawings, the left side shows the original radar amplitude map (RAM) without interference suppression. The vertical axis represents the radial distance between the target and the radar (unit: m), and the horizontal axis represents the azimuth angle of the target (unit: °). The image contains three types of signals: background noise: weak clutter distributed at the top of the image, which is the environmental interference signal received by the radar; static human target: continuous bright stripes within a distance of 3 m and an angle of -10° to 10°, which are the reflected signals of a stationary human body and belong to non-gesture interference targets; gesture target: a local bright area near a distance of 2 m and an angle of 0°, which is the reflected signal of the dynamic gesture and is the target signal of this processing.
[0133] The right side of Figure 1i shows the RAM after interference suppression. Compared with the left side, background noise, the static human target, and other interference signals have been completely suppressed, retaining only the faint bright spot of the gesture target, while the background appears as a pure deep blue. This result verifies that the technical solution of this application can accurately filter out non-gesture interference and effectively extract the gesture target in the RAM.
[0134] Referring to Figure 1j of the accompanying drawings, the left side shows the original radar phase map (REM) without interference suppression. The vertical axis represents the radial distance between the target and the radar (unit: m), and the horizontal axis represents the azimuth angle of the target (unit: °). The image also contains three types of signals: background noise: weak clutter distributed at the top of the image; static human target: continuous colored stripes within a distance of 3 m and an angle of -20° to 20°, with stable phase characteristics, belonging to non-gesture interference targets; gesture target: a local colored area near a distance of 2 m and an angle of 0°, whose phase changes dynamically with the gesture and which is the target signal of this processing.
[0135] The right side of Figure 1j shows the radar phase map after the interference suppression processing of this application. Background noise, the static human target, and other interference signals have been completely suppressed, retaining only a faint bright spot of the gesture target, with the background appearing as a pure deep blue. This result verifies that the technical solution of this application can accurately filter out non-gesture interference and effectively extract the gesture target in the REM.
[0136] Figures 1i and 1j visually compare the interference suppression effect of the technical solution of this application on the RAM and the REM: Interference is completely filtered out: background noise, static human targets, and other interference in the original RAM and REM are completely eliminated, leaving only the valid signal of the gesture target.
[0137] Target feature preservation: The positional features of the gesture target are fully preserved in the suppressed result, providing a high-purity amplitude and phase feature basis for subsequent gesture recognition.
[0138] Technical solution verification: The effectiveness of the "unique index positioning + angle domain 2D-FFT" technical link of this application has been verified. Pure gesture target features can be obtained without additional interference removal processing of RAM and REM.
[0139] Referring to Figure 1k of the accompanying drawings, the figure illustrates the principle of the four-dimensional fast Fourier transform (4D-FFT) in the embodiments of this application.
[0140] This figure is a visual illustration of the principle by which the radar amplitude map (RAM) and the radar phase map (REM) are generated using the four-dimensional fast Fourier transform (4D-FFT) in this application. Its core is to extract the full-dimensional features of the gesture target from the four-dimensional original signal through two two-dimensional fast Fourier transforms (2D-FFT), as detailed below: 1. Left side: input basis, the distance-Doppler map (RDM) containing only the gesture target. This map shows the result after interference suppression and target extraction. The horizontal axis represents velocity, and the vertical axis represents distance.
[0141] The purple pixels in the image are unique indexes of the gesture target extracted through cumulative averaging. They represent the precise location of the gesture target in the distance-Doppler dimension and are the starting point for all subsequent processing.
[0142] 2. Middle: core processing, indexing and dimensional decomposition of the four-dimensional signal matrix. The blue and green cubes in the figure visualize the two-dimensional signal matrix after applying the Hanning window in four-dimensional space, covering four dimensions: distance, velocity, azimuth, and pitch.
[0143] Based on the purple pixels (unique index) in the left RDM, locate the azimuth dimension data block (red strip) and pitch dimension data block (yellow strip) in the blue and green cubes, respectively.
[0144] The core function of this step is to accurately delineate the original signal of the angle dimension corresponding to the gesture target from the four-dimensional signal, providing a clean input for the subsequent two-dimensional fast Fourier transform (2D-FFT) in the angle domain.
[0145] 3. Right side: Output results – RAM / REM generated by angular domain 2D Fast Fourier Transform (2D-FFT). Perform a Fast Fourier Transform (FFT) on the azimuth dimension data block of the red strip to obtain the range-azimuth map at the top right. The amplitude information of this map constitutes the azimuth component of the radar amplitude map (RAM), and the phase information constitutes the azimuth component of the radar phase map (REM).
[0146] Perform a Fast Fourier Transform (FFT) on the elevation angle dimension data block of the yellow strip to obtain the range-elevation angle map at the bottom right. The amplitude information of this map constitutes the elevation angle component of the radar amplitude map (RAM), and the phase information constitutes the elevation angle component of the radar phase map (REM).
[0147] The combination of amplitude and phase information from the two angular domain maps constitutes the final radar amplitude map (RAM) and radar phase map (REM), which fully characterize the features of the hand gesture target in four dimensions: range, velocity, azimuth, and elevation.
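The angle-domain processing described for Figure 1k can be sketched as follows. This is a minimal illustration rather than the implementation of this application: the array layout `[range, doppler, azimuth channel, elevation channel]`, the function name `angle_maps`, the choice of keeping all range bins at the Doppler bin of the unique index, and the use of the first elevation row and first azimuth column are all assumptions made for the example.

```python
import numpy as np


def angle_maps(cube, doppler_bin, n_angle_bins=64):
    """Build range-azimuth and range-elevation maps from a 4-D signal cube.

    cube: complex array shaped [range, doppler, azimuth_ch, elevation_ch]
          (assumed layout, already transformed along range and Doppler).
    doppler_bin: Doppler bin of the gesture target's unique index in the RDM.
    """
    # Data blocks located by the unique index (hypothetical slicing choice):
    az_block = cube[:, doppler_bin, :, 0]  # range x azimuth channels
    el_block = cube[:, doppler_bin, 0, :]  # range x elevation channels

    # Zero-padded FFT along the antenna axis, zero angle moved to the center.
    az_map = np.fft.fftshift(np.fft.fft(az_block, n=n_angle_bins, axis=1), axes=1)
    el_map = np.fft.fftshift(np.fft.fft(el_block, n=n_angle_bins, axis=1), axes=1)

    return {
        "range_azimuth_amplitude": np.abs(az_map),
        "range_azimuth_phase": np.angle(az_map),
        "range_elevation_amplitude": np.abs(el_map),
        "range_elevation_phase": np.angle(el_map),
    }
```

In this sketch the amplitude and phase of each angle-domain map are returned separately, mirroring the way the description assigns amplitude information to RAM and phase information to REM.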
[0148] Step 14: Determine the trajectory of the gesture target based on the RDM containing only the gesture target, the RAM containing only the gesture target, and the REM containing only the gesture target.
[0149] In one alternative implementation, the coordinate-normalized alignment of the RDM containing only the gesture target, the RAM containing only the gesture target, and the REM containing only the gesture target can be performed, and then the coordinate-normalized and aligned RDM, RAM, and REM can be fused to obtain the gesture trajectory.
[0150] In a specific example, step 14 may include the following sub-steps: Sub-step 141: Map the REM containing only the gesture target to a standard coordinate system with a predefined human body position (e.g., radial distance 3m, azimuth angle 0°). Executing sub-step 141 eliminates the positional deviation that different actual human body positions would otherwise introduce into subsequent gesture recognition, and unifies the gesture features across positions.
[0151] In sub-step 141: First, parameter extraction is performed: the radial distance R1 and pitch angle θ1 of the gesture target are extracted from the REM containing only the gesture target, and the radial distance L1 and pitch angle γ are extracted from the REM containing only the static human target, and the horizontal projection distance L1′ of the human target is obtained; the "REM containing only the static human target" can be constructed based on the "RDM containing only the static human target" in a similar way to the construction of the "REM containing only the gesture target" introduced above, which will not be repeated here.
[0152] Then, perform geometric mapping: using the similar triangles and spatial projection relationships shown in Figure 1l (an illustration of the spatial geometric relationship of REM mapping), map the gesture trajectory (red dashed line) at any position to the standard gesture trajectory (blue dashed line) at the predefined position through geometric conversion of angle and distance.
[0153] Referring to Figure 1l of the accompanying drawings, the design goal of this diagram is to solve the problem that different human body positions cause the gesture features to shift in the radar phase map (REM). By establishing a spatial geometric mapping relationship, the gesture REM collected at any human body position is uniformly transformed to the standard coordinate system of the predefined human body position (radial distance L0, azimuth angle 0°), thus eliminating the impact of position deviation on subsequent gesture recognition.
[0154] Figure 1l contains two core scenes and a set of spatial mapping relationships: Arbitrary position (black human body model): represents the user's actual standing position. At this position, the radial distance of the radar-detected gesture target is R1, the pitch angle is θ1, the radial distance of the human body target is L1, and the horizontal projection distance is L1′.
[0155] Predefined position (gray human body model): represents the standard reference position set by the system, with the radial distance of the human body being L0, the radial distance of the gesture target being R0, and the pitch angle being φ0.
[0156] Spatial mapping relationship: Based on the principles of similar triangles and projection geometry, the gesture trajectory (red dashed line) at any position is mapped to the standard gesture trajectory (blue dashed line) at a predefined position through the conversion of distance and angle, thereby achieving spatial alignment of gesture features.
[0157] The meanings of the parameters in Figure 1l are as follows: I. Parameters related to the arbitrary position (black human body model) R1: The radial distance from the radar to the gesture target, a core parameter extracted from the radar phase map (REM), representing the straight-line distance between the gesture target and the radar.
[0158] θ1: The elevation angle of the hand gesture target relative to the radar, reflecting the angular offset of the hand gesture in the vertical direction, which is directly extracted from the radar phase map (REM).
[0159] L1: Radial distance of the radar to a static human target, representing the straight-line distance between the user's standing position and the radar.
[0160] L1′: The projected distance of a static human target in the horizontal direction, which is the horizontal component of the radial distance L1 and is used for conversion of spatial geometric mapping.
[0161] γ: The elevation angle of a static human target relative to the radar, reflecting the vertical angle characteristics when the user is standing.
[0162] R1′: The projected distance of the gesture target in the horizontal direction, which is the horizontal component of the radial distance R1.
[0163] D1: Height of the human target at any position, representing the vertical height of the user when standing.
[0164] v1: Radial velocity of the gesture target at any position, reflecting the dynamic motion characteristics of the gesture.
[0165] v1′: The horizontal velocity component of the gesture target at any position, which is the horizontal projection of the radial velocity v1.
[0166] c1: The trajectory curve of the gesture movement at any position, representing the actual movement path of the user's gesture.
[0167] α: The azimuth angle between the human standing position and the horizontal line of the radar, reflecting the user's positional deviation in the horizontal direction.
[0168] II. Parameters related to the predefined position (gray human body model) R0: The standard radial distance from the radar to the gesture target at the predefined position, which is a reference distance set by the system.
[0169] φ0: The standard elevation angle of the hand gesture target relative to the radar at the predefined position, which is the reference angle set by the system.
[0170] L0: The standard radial distance from the radar to the static human target at the predefined position (e.g., 3m as set in this application), which is the reference distance for spatial mapping.
[0171] L0′: The standard projection distance of a static human target in the horizontal direction at a predefined position, which is the reference horizontal distance for spatial mapping.
[0172] D0: The standard height of the human target at the predefined position, which is the reference height set by the system.
[0173] v0: Standard radial velocity of the gesture target at a predefined position, used for unifying the gesture velocity features after mapping.
[0174] v0′: The standard horizontal velocity component of the gesture target at the predefined position, which is the horizontal projection of the radial velocity v0.
[0175] c0: The standard trajectory curve of the gesture movement in the predefined position, which is the mapped standard gesture path.
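The primed quantities in the parameter list above (R1′, L1′, v1′ and their counterparts at the predefined position) are described as horizontal components of radial quantities. A minimal sketch of that projection is given below, assuming the horizontal component is the radial value multiplied by the cosine of the corresponding elevation angle. The function name `horizontal_projection` and the numeric values are hypothetical and serve only as an illustration.

```python
import math


def horizontal_projection(radial_value, elevation_deg):
    """Horizontal component of a radial quantity observed at an elevation angle.

    Assumption: the projection is radial_value * cos(elevation angle).
    """
    return radial_value * math.cos(math.radians(elevation_deg))


# Hypothetical example values, not taken from this application.
R1, theta1 = 2.0, 30.0   # gesture target: radial distance (m), elevation (deg)
L1, gamma = 5.0, 10.0    # static human target: radial distance (m), elevation (deg)
v1 = 0.8                 # gesture target: radial velocity (m/s)

R1_h = horizontal_projection(R1, theta1)  # corresponds to R1'
L1_h = horizontal_projection(L1, gamma)   # corresponds to L1'
v1_h = horizontal_projection(v1, theta1)  # corresponds to v1'
```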
[0176] With reference to Figure 1l, in this specific example, the implementation of sub-step 141, "mapping the REM containing only the gesture target to a standard coordinate system with a predefined human body position (e.g., radial distance 3m, azimuth angle 0°)," may include: Step 1: Extract raw parameters. Extract the following key parameters from the REM containing only the gesture target and the REM containing only the static human target: the radial distance R1, pitch angle θ1, and radial velocity v1 of the gesture target; and the radial distance L1, pitch angle γ, and horizontal projection distance L1′ of the static human target. In addition, obtain the system preset parameters of the predefined position: the standard radial distance L0 of the human body and its horizontal projection distance L0′.
[0177] Step 2: Establish the geometric relationships at the arbitrary position. Based on the spatial physical relationships shown in Figure 1l and the law of cosines, establish the parametric constraint equations at the arbitrary position:
[0178] Solving these equations simultaneously yields the arm length C1 and the gesture trajectory angle φ1 (the angle swept by the gesture across its fan-shaped trajectory) at the arbitrary position. Step 3: Establish the geometric relationships of the predefined position. Similarly, based on the law of cosines, establish the parametric constraint equations for the predefined position:
[0179] Step 4: Perform parameter mapping. Based on the invariance of physical characteristics (arm length C0=C1, gesture trajectory angle φ0=φ1, body height D0=D1), combined with the known parameter L0 of the predefined position, the radial distance R0 and pitch angle θ0 of the gesture target at the predefined position can be solved.
[0180] Based on the above conditions, the pitch angle of the human target at the predefined position can be obtained. Therefore, the radial distance R0 of the gesture target can be expressed as:
[0181] The pitch angle θ0 can be expressed as:
[0182] After calculating the values of R0 and θ0 according to the above formula, substitute the values of R0 and θ0 into the original REM to complete the spatial mapping of the gesture features and generate a standard radar phase map (REM) unified to the predefined position. At this point, sub-step 141 "mapping the REM containing only the gesture target to the standard coordinate system of the predefined human body position (e.g., radial distance 3m, azimuth angle 0°)" is completed.
[0183] Sub-step 142: Map the RAM containing only the gesture target to a standard coordinate system with a predefined human body position (e.g., radial distance 3m, azimuth angle 0°). Referring to Figure 1m of the accompanying drawings, the figure is a top-down view of a human body performing a gesture. It illustrates the horizontal projection relationship between a gesture at any position and the same gesture at the predefined position. The core idea conveyed by this figure is that, based on horizontal geometric transformation, the radial distance, azimuth, velocity, and other features of the gesture at any position are converted into standard parameters for the predefined position. This ultimately unifies the RDM and RAM features across different human standing positions, eliminating the impact of positional offset on gesture recognition.
[0184] The meanings of the parameters in this diagram are as follows: 1. Predefined position (gray human model on the left) L0: The horizontal distance from the radar to the predefined human body position, which is the system's preset reference distance (e.g., 3m).
[0185] Y0: The vertical height from the radar to the predefined human body position, which is the system's preset reference height.
[0186] R0: Radial distance of the radar to the gesture target at a predefined position.
[0187] x0: The horizontal projection coordinates of the gesture target at the predefined position.
[0188] y0: Vertical projection coordinates of the gesture target at the predefined position.
[0189] v0: Radial velocity of the gesture target at the predefined position.
[0190] v0′: The horizontal velocity component of the gesture target at the predefined position.
[0191] θ0: The horizontal azimuth angle of the gesture target at the predefined position.
[0192] 2. Any position (black human figure on the right) L1: The horizontal distance from the radar to any human body position.
[0193] R1: Radial distance of the radar to the gesture target at any position.
[0194] R1′: Horizontal projection distance of the gesture target at any position.
[0195] x1: Horizontal projection coordinates of the gesture target at any position.
[0196] v1: Radial velocity of the gesture target at any position.
[0197] v1′: The horizontal velocity component of the gesture target at any position.
[0198] v1′′: The projection of the horizontal velocity component of the gesture target in any position onto the predefined position direction.
[0199] β: Horizontal azimuth deviation between any human body position and a predefined human body position.
[0200] θ1: The horizontal azimuth angle of the gesture target at any position.
[0201] In addition, in Figure 1m, the red dashed line represents the radial distance and velocity direction of the gesture target at the predefined position; the blue dashed line represents the radial distance and velocity direction of the gesture target at any position; and the black solid line represents the horizontal distance and azimuth baseline from the radar to any human body position.
[0202] In this embodiment of the application, to map the RDM and RAM of the gesture target from any position to a predefined position, it is necessary to calculate the azimuth angle θ0′ and velocity projection v0′ of the gesture.
[0203] The following is a detailed introduction: First, calculate R1′, L1′, and L0′ according to the following formula:
[0204]
[0205]
[0206] L1′ and L0′ are the projections of L1 and L0, respectively, onto the horizontal reference line directly in front of the radar (i.e., the horizontal line connecting the predefined human position and the radar).
[0207] Then, by the right triangle theorem, we can further obtain:
[0208]
[0209]
[0210] Based on the invariance of physical characteristics (arm length C0=C1), the corresponding relations between the two positions can be derived. Therefore, θ0′ can be represented as:
[0211] Based on R0 (calculated as described above) and θ0′, the RAM containing only the gesture target at any position can be mapped to the predefined position.
[0212] Specifically: Traverse all valid gesture pixels in the RAM of the gesture target: Substitute the original coordinates (R1, θ1′) of each pixel into the formula to calculate the corresponding standard coordinates (R0, θ0′); The amplitude value of the pixel is assigned to the position (R0, θ0′) in the RAM under the standard coordinate system.
[0213] After remapping the coordinates of all gesture pixels, the RAM in the standard coordinate system at the predefined position is obtained.
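The pixel-by-pixel remapping described above can be sketched as follows. The coordinate conversion itself is passed in as a callable `map_coords`, a hypothetical placeholder for the formulas of this application that convert (R1, θ1′) into (R0, θ0′). The nearest-cell assignment, the rule for pixels that land on the same cell, and the skipping of pixels that fall outside the grid are further assumptions of this sketch.

```python
import numpy as np


def remap_ram(ram, range_axis, azimuth_axis, map_coords, threshold=0.0):
    """Move each valid gesture pixel of `ram` to its standard coordinates.

    ram: 2-D amplitude array indexed [range bin, azimuth bin].
    range_axis, azimuth_axis: ascending 1-D arrays of bin centers.
    map_coords: hypothetical callable (R1, theta1) -> (R0, theta0).
    """
    range_axis = np.asarray(range_axis, dtype=float)
    azimuth_axis = np.asarray(azimuth_axis, dtype=float)
    out = np.zeros_like(ram)

    rows, cols = np.nonzero(ram > threshold)  # valid gesture pixels
    for i, j in zip(rows, cols):
        r0, th0 = map_coords(range_axis[i], azimuth_axis[j])
        inside = (range_axis[0] <= r0 <= range_axis[-1]
                  and azimuth_axis[0] <= th0 <= azimuth_axis[-1])
        if not inside:
            continue  # assumption: drop pixels mapped outside the grid
        i0 = int(np.argmin(np.abs(range_axis - r0)))     # nearest range bin
        j0 = int(np.argmin(np.abs(azimuth_axis - th0)))  # nearest azimuth bin
        # Assumption: keep the stronger amplitude if two pixels collide.
        out[i0, j0] = max(out[i0, j0], ram[i, j])
    return out
```

For example, calling `remap_ram(ram, r_axis, az_axis, lambda r, th: (r - 2.0, th - 10.0))` applies a toy shift that only exercises the loop; the actual conversion would come from the geometric relationships of Figure 1m.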
[0214] This completes sub-step 142, "Mapping the RAM containing only the gesture target to a standard coordinate system with a predefined human body position (e.g., radial distance 3m, azimuth angle 0°)".
[0215] It should be noted that the standard coordinate system mentioned in this application refers to a fixed three-dimensional orthogonal coordinate system established with the radar equipment as the origin and based on a predefined human body position (such as radial distance 3m, azimuth angle 0°, and elevation angle 0°). (Here, for RAM mapping, the focus is on the two-dimensional sub-coordinate system of "radial distance-azimuth angle" on the horizontal plane.) It is the unified target coordinate system for all arbitrary position gesture RAM mappings.
[0216] Sub-step 143: Map the RDM containing only the gesture target to a standard coordinate system with a predefined human body position (e.g., radial distance 3m, azimuth angle 0°); In this embodiment of the application, in order to map the RDM containing only the gesture target to a standard coordinate system with a predefined human body position (e.g., radial distance 3m, azimuth angle 0°) and obtain the RDM at the predefined position, it is necessary to calculate the velocity v0.
[0217] In substep 143, v1′′ and v0′′ can be calculated according to the following formula:
[0218]
[0219] As can be seen from Figure 1m, gestures at different positions have the same velocity in the distance direction, that is, v0′′=v1′′. Therefore, v0′ can be expressed as:
[0220] In the above formula, v1′ can be expressed according to the relationship in Figure 1l, the remaining intermediate quantity can be obtained by formula, and β can be obtained during the parameter estimation stage; v0′ and v0 are then obtained from the calculation formulas listed above.
[0221] Finally, the RDM can be reconstructed at a predefined location using R0 and v0.
[0222] This completes sub-step 143, "Mapping the RDM containing only the gesture target to a standard coordinate system with a predefined human body position (e.g., radial distance 3m, azimuth angle 0°)".
[0223] Referring to Figure 1n of the accompanying drawings, the figure visually demonstrates how the mapping algorithm achieves "position-independent gesture trajectory alignment" through a comparison of three scenarios.
[0224] The following is an explanation of the figure: 1. First row (a-c): feature maps at the original standing position (5m, 0°). (a) RDM (5m): The original distance-Doppler map when a human makes a gesture at a radial distance of 5m and an azimuth angle of 0°. Bright spots represent the energy distribution of the gesture target in the distance-Doppler dimension.
[0225] (b) RAM (5m): The corresponding original distance-azimuth map, with bright spots reflecting the amplitude characteristics of the gesture in the distance-azimuth dimension.
[0226] (c) REM (5m): The corresponding original distance-pitch map, with bright spots reflecting the phase characteristics of the gesture in the distance-pitch dimension.
[0227] Features: Due to the relatively far position (5m), the spatial location and distribution of the bright spots are significantly offset from the 3m reference scene.
[0228] 2. Second row (d-f): feature maps after mapping (5m→3m). (d) RDM (5m to 3m): The result of transforming the original 5m RDM in the first row to the standard coordinate system (3m, 0°) using the coordinate normalization mapping algorithm.
[0229] (e) RAM (5m to 3m): The result of mapping the original 5m RAM to the 3m standard coordinate system.
[0230] (f) REM (5m to 3m): The result of mapping the original 5m REM to the 3m standard coordinate system.
[0231] Features: The spatial position of the bright spots is now fully aligned with the 3m standard coordinate system compared to the original image in the first row, and the distribution pattern of the bright spots is highly consistent with the 3m reference image in the third row.
[0232] 3. Third row (g-i): feature maps at the reference standing position (3m, 0°). (g) RDM (3m): The original distance-Doppler map of a human making a gesture directly at the position of 3m and 0°, serving as the reference in the standard coordinate system.
[0233] (h) RAM (3m): The corresponding reference distance-azimuth map.
[0234] (i) REM (3m): The corresponding reference distance-pitch map.
[0235] Features: This is the target alignment template of the mapping algorithm, and the spatial position and distribution of its bright spots form the "position-independent" baseline.
[0236] The comparison clearly shows that: 1. Alignment effect before and after mapping: The second row (after mapping) map and the third row (3m baseline) map almost completely overlap in terms of the position and distribution of the bright spots, which verifies that the mapping algorithm can accurately convert the gesture features of any position (such as 5m) into features under the standard coordinate system (3m).
[0237] 2. Achievement of position independence: The original map at the 5m standing position and the 3m reference map are clearly offset from each other, but after mapping the two are aligned, showing that the algorithm eliminates the influence of standing-position differences on trajectory features and achieves "position-independent gesture trajectory generation".
[0238] In this embodiment of the application, after completing the spatial mapping of RDM, RAM, and REM, a single-frame mapping result is obtained at the standard predefined location corresponding to each frame of data. The characteristics of this result include: Each frame's mapping result is a two-dimensional matrix, and each pixel in the matrix contains the amplitude or phase information of the gesture in the standard coordinate system at that moment; the mapping results of all frames have been aligned to the same "radial distance-azimuth" standard coordinate system, which is independent of the original human body position.
[0239] Based on the single-frame mapping result, the following operations can be performed: First, initialize three accumulation matrices with the same dimensions as the single-frame mapping result, corresponding to RDM, RAM, and REM respectively—this is to accumulate gesture information from multiple frames.
[0240] The initial values of the accumulation matrix are all set to 0, and its size is consistent with the resolution of the standard coordinate system to ensure that it can fully carry the information of all mapping frames.
[0241] Then, iterate through all the mapped frame data and perform the following operation on each frame: Read the three mapping result matrices (RDM / RAM / REM) of the current frame respectively.
[0242] For each matrix read, accumulate the value of each pixel in the matrix into the corresponding accumulation matrix. For example, if the amplitude value of the mapping result at coordinates (R0, θ0′) in frame t is $A_t$, the value of the accumulation matrix at that position is updated to $A_{\mathrm{acc}}(R_0, \theta_0') = A_{\mathrm{acc}}(R_0, \theta_0') + A_t$. Repeat the above process until the mapping results of all frames have been accumulated, thus obtaining three accumulation matrices corresponding to RDM, RAM, and REM, respectively. The value of each pixel in an accumulation matrix represents the total energy or phase accumulation at that location throughout the entire gesture.
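The frame-by-frame accumulation can be sketched as below. This is a minimal illustration under the assumption that every mapped frame is already a two-dimensional array of identical shape. The function name `accumulate_frames` is hypothetical, and the same routine would be called once each for the RDM, RAM, and REM frame sequences.

```python
import numpy as np


def accumulate_frames(frames):
    """Sum a sequence of equally shaped single-frame mapping results."""
    frames = [np.asarray(f, dtype=float) for f in frames]
    if not frames:
        raise ValueError("at least one mapped frame is required")

    acc = np.zeros_like(frames[0])  # accumulation matrix initialized to 0
    for frame in frames:
        if frame.shape != acc.shape:
            raise ValueError("all mapped frames must share one resolution")
        acc += frame  # pixel-wise accumulation
    return acc
```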
[0243] In this embodiment, the three accumulation matrices share the following characteristics: they are all based on the same standard coordinate system (radial distance R0 - azimuth angle θ0′, some of which include pitch angle), have completely identical size and resolution, and correspond one-to-one with each pixel.
[0244] The pixel values of the three accumulation matrices have different meanings, corresponding to their respective physical characteristics, but each of them indicates whether a gesture signal is present at a given location: RAM accumulation matrix: pixel value = total amplitude energy (reflects the spatial position contour of the gesture and is the most intuitive); RDM accumulation matrix: pixel value = total Doppler velocity energy (reflects the dynamic motion characteristics of the gesture and excludes static interference); REM accumulation matrix: pixel value = total phase accumulation (reflects the continuity of the gesture's motion trajectory and helps smooth over breakpoints).
[0245] In this embodiment, the pixels in the accumulation matrix are first thresholded to extract pixels with significantly higher energy than the background—for example, pixels with pixel values > a preset threshold of 40. Thus, for the RAM accumulation matrix, a set of "gesture space contour pixels" can be obtained, which initially presents the approximate shape of the gesture; for the RDM accumulation matrix, a set of "dynamic motion pixels" can be obtained; and for the REM accumulation matrix, a set of "continuous motion pixels" can be obtained.
[0246] Finally, a "final effective pixel set" (initially empty) is created, and the coordinates of pixels that simultaneously satisfy the criteria of "gesture space contour pixels", "dynamic motion pixels" and "continuous motion pixels" are added to the "final effective pixel set". The final gesture trajectory can be generated by sorting and connecting the coordinates in the "final set of valid pixels".
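The thresholding and intersection steps can be sketched as follows, assuming the three accumulation matrices share one grid as stated above. The description does not specify how the coordinates are sorted before being connected, so the row-major ordering used here is an assumption. The function name `extract_trajectory` is hypothetical, and the default threshold of 40 follows the example value given above.

```python
import numpy as np


def extract_trajectory(acc_rdm, acc_ram, acc_rem, threshold=40.0):
    """Return the ordered coordinates of the final effective pixel set."""
    contour = np.asarray(acc_ram) > threshold     # gesture space contour pixels
    dynamic = np.asarray(acc_rdm) > threshold     # dynamic motion pixels
    continuous = np.asarray(acc_rem) > threshold  # continuous motion pixels

    # A pixel is kept only if it belongs to all three sets.
    effective = contour & dynamic & continuous

    # np.argwhere lists indices in row-major order (assumed sort rule).
    return [(int(r), int(c)) for r, c in np.argwhere(effective)]
```

The returned list can then be connected point by point to draw the trajectory, and a different ordering rule can be substituted without changing the intersection logic.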
[0247] Since the mapping results of all frames have been unified to the standard coordinate system, the generated gesture trajectory only reflects the dynamic characteristics of the gesture itself and has nothing to do with the original position of the human body.
[0248] The method provided in this application addresses this problem through an ordered process of "interference filtering - precise positioning - multi-dimensional mapping - trajectory fusion": First, an RDM containing only the gesture target is constructed to eliminate non-gesture interference signals at the source; then, based on the unique index of the gesture target in the RDM, the dedicated signal data block is precisely located from the original signal to avoid mixing gesture signals with interference; subsequently, RAM and REM containing only the gesture target are constructed based on the pure signal data block to complete the horizontal and vertical spatial position features of the gesture; finally, the interference-free RDM, RAM, and REM are fused to completely restore the gesture motion features from multiple dimensions such as dynamics, orientation, and pitch, thus completely avoiding the impact of interference on trajectory extraction and achieving accurate acquisition of the gesture trajectory.
[0249] The effectiveness of the method was verified by experimental data. Details are as follows: After interference suppression and position mapping, the parameter map is transformed into a trajectory map of multi-frame gesture actions. Here, the classic trajectory matching algorithm, Dynamic Time Warping (DTW), is used for gesture recognition. The DTW algorithm can obtain the matching distance between two trajectories. Therefore, the gesture dataset is divided into reference data and test data. The DTW algorithm is used to calculate the matching distance between the reference data and the test data, and the action corresponding to the minimum matching distance is taken as the recognition result.
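The recognition step can be illustrated with a textbook DTW implementation. This is a sketch of the classic algorithm, not necessarily the exact variant used in the experiment. The Euclidean point distance, the representation of a trajectory as a list of coordinate tuples, and the names `dtw_distance` and `classify` are assumptions of the example.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two point sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean point distance
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]


def classify(test_trajectory, references):
    """Return the label whose reference trajectory has the minimum DTW distance.

    references: dict mapping a gesture label to a list of reference trajectories.
    """
    best_label, best_distance = None, float("inf")
    for label, trajectories in references.items():
        for ref in trajectories:
            distance = dtw_distance(test_trajectory, ref)
            if distance < best_distance:
                best_label, best_distance = label, distance
    return best_label, best_distance
```

Because DTW tolerates differences in speed and length between two trajectories, a gesture performed more slowly than its reference can still yield a small matching distance.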
[0250] Hand gesture data was collected with a millimeter-wave radar at seven human body positions that differ in distance and azimuth angle. Figure 1o shows the positions of the human target: position 1 (2m, 0°), position 2 (3m, 0°), position 3 (3m, 25°), position 4 (3m, -25°), position 5 (5m, 0°), position 6 (5m, 25°), and position 7 (5m, -25°). Data from positions 1, 3, 4, 5, 6, and 7 were mapped to position 2 using the mapping algorithm. Furthermore, eight hand gestures were designed and collected: forward push (PF), push-pull (PP), left swipe (SL), right swipe (SR), radial circle (CR), upward swipe (SU), left-right swipe (LR), and vertical circle (CL). Finally, hand gesture data were collected from four men and two women of different heights. The six participants were: User A (male, 182cm), User B (male, 177cm), User C (female, 155cm), User D (male, 173cm), User E (female, 165cm), and User F (male, 175cm). To construct a small-sample dataset and avoid the massive data volume required by deep neural networks, the test data for the experiment was collected by Users A-E. Each of them collected 12 samples for each gesture type at position 2 and 6 samples for each gesture type at each of the other positions. In addition, 10 data samples of User E at position 2 for each of the 8 gesture types were collected as reference data. Therefore, the total number of gesture data samples was 5 measurers × 12 samples × 8 gestures × 1 position + 5 measurers × 6 samples × 8 gestures × 6 positions + 1 measurer × 10 samples × 8 gestures × 1 position = 2000 (test and reference data combined). During data acquisition, the test personnel performed hand gestures facing the radar at all positions.
[0251] Of the seven positions in Figure 1o, since part of the data at position 2 is used as reference data in the recognition model, the recognition results of the test data collected at position 2 are considered "intra-domain recognition," while the recognition results of the test data collected at the other positions are considered "cross-domain recognition." The confusion matrix for the recognition accuracy of each gesture type and the recognition accuracy of all gesture types at the seven positions are shown in Figure 1p.
[0252] The average accuracy rates for intra-domain and cross-domain recognition were 97.7% and 95.4%, respectively, with recognition accuracy rates exceeding 90% for each gesture and over 94% for all positions. Position 2 had the highest accuracy rate for intra-domain recognition. The experimental results in Figure 1p validate the effectiveness of the position-independent gesture signal extraction method. Furthermore, the gesture signal processed by this method is a continuous-time gesture trajectory parameter map. Therefore, for gesture recognition, in addition to the DTW algorithm used here, other optimized algorithms with higher accuracy or lower complexity can be used, or a simple neural network model can be trained to improve recognition accuracy and model stability.
Embodiment 2
[0253] To address the problem of how to accurately acquire gesture trajectories in existing technologies, and based on the same inventive concept as the above embodiments of this application, Embodiment 2 of this application provides a gesture trajectory extraction device.
[0254] A schematic diagram of the specific structure of the device is shown in Figure 2. The device includes the following functional units: an RDM construction unit 21, used to construct a corresponding distance-Doppler map RDM containing only the gesture target based on the acquired human gesture digital signal; a data block positioning unit 22, used to locate, in the human gesture digital signal, the signal data block corresponding to the unique index, based on the unique index of the gesture target in the RDM; a RAM and REM construction unit 23, used to construct a corresponding range-azimuth map RAM containing only the gesture target and a range-pitch map REM containing only the gesture target based on the signal data block; and a trajectory determination unit 24, used to determine the trajectory of the gesture target based on the RDM, the RAM, and the REM.
[0255] In one optional implementation, the RDM construction unit 21 can be specifically used to: construct a corresponding multi-frame initial RDM based on multiple frames of the human gesture digital signal; for each frame of the multi-frame initial RDM, perform background noise suppression and interference suppression in sequence to obtain a multi-frame interference-suppressed RDM; and construct the RDM containing only the gesture target based on the multi-frame interference-suppressed RDM.
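The two per-frame suppression steps can be pictured with the following minimal sketch. This paragraph does not state the concrete suppression rules, so the sketch assumes a simple statistical noise floor (mean plus k standard deviations) for background noise suppression and the removal of the zero-Doppler column (static reflectors) for interference suppression. All function names, parameters, and the toy frame are hypothetical.

```python
from statistics import mean, pstdev


def suppress_background(rdm, k=2.0):
    """Assumed rule: zero every cell not above mean + k * std of the frame."""
    cells = [p for row in rdm for p in row]
    floor = mean(cells) + k * pstdev(cells)
    return [[p if p > floor else 0.0 for p in row] for row in rdm]


def suppress_static(rdm, zero_doppler_bin, guard=0):
    """Assumed rule: zero the zero-Doppler column and `guard` bins on each side."""
    lo = max(0, zero_doppler_bin - guard)
    hi = min(len(rdm[0]), zero_doppler_bin + guard + 1)
    return [[0.0 if lo <= d < hi else p for d, p in enumerate(row)]
            for row in rdm]


def clean_frames(frames, zero_doppler_bin, k=2.0, guard=0):
    """Apply both steps in sequence to every frame of the initial RDM."""
    return [suppress_static(suppress_background(f, k), zero_doppler_bin, guard)
            for f in frames]


# Toy 4 (range) x 5 (Doppler) power map: unit noise, one static reflector at
# zero Doppler (bin 2), and one moving hand return.
frame = [[1.0] * 5 for _ in range(4)]
frame[1][2] = 50.0   # static reflector
frame[2][4] = 40.0   # moving hand
cleaned = clean_frames([frame], zero_doppler_bin=2)
print(cleaned[0])
```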
[0256] In one optional implementation, the RDM construction unit 21 can be specifically used to: cluster the data points in the interference-suppressed RDM of each frame to obtain, for each frame, an RDM awaiting target extraction, where the RDM awaiting target extraction contains a data point cluster for each target and the data point clusters are obtained by clustering the data points in the interference-suppressed RDM using a density-based clustering algorithm; merge the data point clusters corresponding to the same target across the RDMs awaiting target extraction of all frames to obtain the cumulative clusters in the continuous time dimension; and, based on the data points contained in the cumulative cluster with the largest variance, set the power value of specified data points in the RDM awaiting target extraction of each frame to 0 to obtain the RDM containing only the gesture target, where the specified data points are the data points contained in the cumulative clusters other than the cumulative cluster with the largest variance.
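The last step, keeping the cumulative cluster with the largest variance and zeroing the others, can be sketched as follows. The sketch assumes that density-based clustering and cross-frame merging have already been done, so each data point arrives with a target identifier. It also assumes that "variance" means the variance of the range and Doppler bin indices accumulated over all frames, which this paragraph does not specify. All names are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def cumulative_variance(points):
    """Assumed measure: range-bin variance plus Doppler-bin variance over time."""
    ranges = [r for _, r, _ in points]
    dopplers = [d for _, _, d in points]
    return pvariance(ranges) + pvariance(dopplers)


def keep_gesture_target(frames, labelled_points):
    """labelled_points: (frame index, range bin, Doppler bin, target id) tuples."""
    clusters = defaultdict(list)  # target id -> cumulative cluster over time
    for frame_idx, r, d, target_id in labelled_points:
        clusters[target_id].append((frame_idx, r, d))
    gesture_id = max(clusters, key=lambda t: cumulative_variance(clusters[t]))
    out = [[row[:] for row in frame] for frame in frames]  # copy the frames
    for target_id, points in clusters.items():
        if target_id == gesture_id:
            continue
        for frame_idx, r, d in points:
            out[frame_idx][r][d] = 0.0  # zero power of non-gesture targets
    return gesture_id, out


# Toy data: target 0 (torso) stays in one cell, target 1 (hand) moves.
frames = [[[0.0] * 5 for _ in range(4)] for _ in range(3)]
points = [(0, 1, 2, 0), (1, 1, 2, 0), (2, 1, 2, 0),
          (0, 2, 1, 1), (1, 2, 3, 1), (2, 3, 4, 1)]
for f, r, d, target in points:
    frames[f][r][d] = 30.0 if target == 0 else 20.0

gesture_id, gesture_only = keep_gesture_target(frames, points)
print(gesture_id)
```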
[0257] In one optional implementation, the trajectory determination unit 24 can be specifically used to: perform coordinate normalization and alignment on the RDM, the RAM, and the REM; and obtain the gesture trajectory by fusing the coordinate normalized and aligned RDM, the coordinate normalized and aligned RAM, and the coordinate normalized and aligned REM.
[0258] In one optional implementation, the trajectory determination unit 24 can be specifically used to: map the RDM, the RAM, and the REM to the space of a predefined location according to a pre-established projection relationship between any location and that same predefined location.
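One way to picture such a projection is a purely geometric mapping of a single measured hand point. The projection relationship itself is not restated in this paragraph, so the following sketch is an assumption: it takes the hand offset relative to the body, rotates it so that the body-to-radar line of sight matches that of the reference position, and re-attaches it at the reference position (taken here as 3 m, 0°, i.e. position 2 of the experiment). Function names and the coordinate convention are hypothetical.

```python
import math


def to_xy(range_m, azimuth_deg):
    """Polar radar coordinates to Cartesian: x lateral, y along boresight."""
    a = math.radians(azimuth_deg)
    return range_m * math.sin(a), range_m * math.cos(a)


def map_to_reference(hand, body, reference=(3.0, 0.0)):
    """Map a hand point (range_m, azimuth_deg) measured with the body at
    `body` to where it would appear with the body at `reference`."""
    hx, hy = to_xy(*hand)
    bx, by = to_xy(*body)
    rx, ry = to_xy(*reference)
    dx, dy = hx - bx, hy - by  # hand offset relative to the body
    # Rotate the offset so that the body's line of sight to the radar
    # coincides with the line of sight at the reference position.
    delta = math.radians(body[1] - reference[1])
    ox = dx * math.cos(delta) - dy * math.sin(delta)
    oy = dx * math.sin(delta) + dy * math.cos(delta)
    px, py = rx + ox, ry + oy
    return math.hypot(px, py), math.degrees(math.atan2(px, py))


# A hand held 0.5 m in front of a body standing at (5 m, 25 deg).
mapped = map_to_reference(hand=(4.5, 25.0), body=(5.0, 25.0))
print(mapped)
```

Under this assumption, a hand held 0.5 m in front of the body at any position maps to a point 0.5 m in front of the reference position, which is the property a position-independent representation needs.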
[0259] The device provided in this application addresses this problem through an ordered process of "interference filtering - precise positioning - multi-dimensional mapping - trajectory fusion". First, an RDM containing only the gesture target is constructed to eliminate non-gesture interference signals at the source. Then, based on the unique index of the gesture target in the RDM, the corresponding signal data block is precisely located in the original signal, so that gesture signals are not mixed with interference. Next, a RAM and an REM containing only the gesture target are constructed from this clean signal data block to supply the horizontal and vertical spatial position features of the gesture. Finally, the interference-free RDM, RAM, and REM are fused to restore the gesture motion features across the dynamics, azimuth, and elevation dimensions, which avoids the impact of interference on trajectory extraction and achieves accurate acquisition of the gesture trajectory.
[0260] Embodiment 4: Based on the same inventive concept as the foregoing embodiments of this application, Embodiment 4 of this application provides a computing device to solve the problem of how to accurately obtain gesture trajectories from scenarios with various interference noises in the prior art.
[0261] As shown in Figure 3, the computing device includes a memory 31 and a processor 32. The memory 31 can be configured to store various other data to support operation on the computing device. Examples of such data include instructions for any application or method operated on the computing device. The memory 31 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.
[0262] The processor 32, coupled to the memory 31, is used to execute the computer program stored in the memory 31 to perform a gesture trajectory extraction method as described in Embodiment 1 of this application.
[0263] When the processor 32 executes the computer program in the memory 31, in addition to the functions described above, it can also perform other functions, as detailed in the descriptions of the preceding embodiments.
[0264] Furthermore, as shown in Figure 3, the computing device also includes other components such as a display 34, a communication component 33, a power supply component 35, and an audio component 36. Figure 3 shows only some components schematically, which does not mean that the computing device includes only the components shown in Figure 3.
[0265] Accordingly, embodiments of this application also provide a computer-readable storage medium storing a computer program, which, when executed by a computer, can implement the methods provided in the above embodiments.
[0266] Accordingly, this application also provides a computer program product, which stores instructions that, when executed by a computer, cause the computer to implement the methods provided in the above embodiments. The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0267] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0268] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.
[0269] The entire process of model training and application involved in this application, including data acquisition and utilization, and the model training process, all comply with ethical and compliance requirements, as detailed below: I. Explanation of whether the data used in the model meets A5 requirements 1. Data Source Legality: All datasets used in training this model were obtained through legal means, covering three categories: publicly authorized data, data authorized by partners, and self-collected compliant data. Publicly authorized data comes from compliant data sources following open-source licenses such as Apache 2.0, with complete copyright attribution and authorization scope clearly marked, and no unauthorized open-source code or data reuse. Data authorized by partners has a formal data usage agreement clearly defining the scope, duration, and confidentiality obligations, with complete authorization chain proof. For self-collected data involving personal information, strict informed consent procedures have been followed, and anonymization processes (including but not limited to field masking, feature anonymization, and differential privacy technology application) have been used to remove personally identifiable information, fully complying with the requirements of the "Interim Measures for the Administration of Generative Artificial Intelligence Services," the "Personal Information Protection Law," and other relevant laws and regulations.
[0270] 2. Data Content Compliance: This dataset has undergone multiple screening and cleaning processes to remove all content that may violate social morality or harm public interests. It contains no obscene, pornographic, violent, discriminatory, or information that endangers national or public safety, nor does it involve the illegal acquisition or use of genetic resources. For data in sensitive fields (such as medical and financial fields), an additional privacy-preserving computation module (including federated learning and secure multi-party computation technologies) ensures that the data is "usable but not visible," avoiding compliance risks during the original data transmission process and ensuring that the data application scenarios and uses comply with public order and good morals and industry regulatory requirements.
[0271] 3. Data Governance Compliance: A complete data traceability system is established to automatically record the source, collection time, annotation process, cleaning rules, and permission allocation of training data, and to generate traceable compliance reports, so that data is verifiable throughout its entire lifecycle. The dataset annotation process is completed by a professional human R&D team, clearly defining the proportion of human creative contributions, avoiding reliance on AI-generated data that has not undergone substantial human modification, and complying with the "human main contribution" examination requirements in AI patent applications.
[0272] II. Explanation of Model Training Process Meeting A5 Requirements 1. Compliance of Training Objectives and Schemes: The training objectives of this model focus on [specific technical scenarios that can be supplemented, such as intelligent driving decision optimization, multimodal information interaction, etc.]. The training scheme and the final output results do not violate any mandatory provisions of laws and administrative regulations, do not harm the public interest or the legitimate rights and interests of others, and do not pose any potential risks of being used for illegal activities, infringing on privacy, or undermining public safety. The model strictly adheres to the ethical principle of "intelligent for good".
[0273] 2. Compliance Management of Training Process: A closed-loop training framework is adopted to ensure compliance and controllability of the training process. The specific process is as follows: First, training samples are obtained through compliant data sources. After the aforementioned data cleaning and desensitization, they are input into the first neural network model to generate preliminary training results. Second, an expert system is introduced to verify the preliminary results. Based on preset rules and human expert experience, the feasibility of the results is evaluated, and outputs that may pose ethical risks or compliance hazards are corrected (such as removing decision logic that violates public order and good morals, and adjusting model parameters that do not comply with safety regulations). Finally, the loss function weights are dynamically optimized based on expert system feedback to strengthen the model's learning of compliant results, avoid overfitting errors or non-compliant labels, and form a closed-loop management system of "data input - model training - expert verification - parameter optimization - result feedback" to ensure that the entire training process complies with A5 ethical review requirements.
[0274] 3. Compliance of Training Environment and Tools: Model training is implemented on a compliant training platform. All open-source frameworks and components used in the training process have obtained the corresponding licenses, and copyright statements and patent citation information are fully retained, with no infringement or reuse. The training environment is built using virtual devices (containers / virtual machines) with fixed random seeds and initial parameter configurations to ensure the reproducibility of the training process. At the same time, through access control and operation log recording, risks such as data leakage and parameter tampering during training are prevented, ensuring the security and compliance of the training process.
[0275] 4. Ethical verification of training results: After the model is trained, it will undergo an additional third-party ethical compliance assessment and algorithm filing review to verify that the model output does not violate social morality or harm public interests. For potentially sensitive scenarios (such as public services and intelligent decision-making), a special result verification mechanism will be established to ensure that the model always complies with A5 and relevant laws and regulations in practical applications.
Claims
1. A method for extracting gesture trajectories, characterized in that it comprises: constructing, based on collected human gesture digital signals, a corresponding range-Doppler map (RDM) containing only the gesture target; locating, based on the unique index of the gesture target in the RDM, the signal data block corresponding to the unique index from the human gesture digital signal; constructing, based on the signal data block, a corresponding range-azimuth map (RAM) containing only the gesture target and a range-elevation map (REM) containing only the gesture target; and determining the trajectory of the gesture target based on the RDM, the RAM, and the REM.
2. The method as described in claim 1, characterized in that constructing, based on the collected human gesture digital signals, the corresponding RDM containing only the gesture target comprises: constructing a corresponding multi-frame initial RDM based on multiple frames of the human gesture digital signal; for each frame of the multi-frame initial RDM, performing background noise suppression and interference suppression in sequence to obtain a multi-frame interference-suppressed RDM; and constructing the RDM containing only the gesture target based on the multi-frame interference-suppressed RDM.
3. The method as described in claim 2, characterized in that constructing the RDM containing only the gesture target based on the multi-frame interference-suppressed RDM comprises: clustering the data points in the interference-suppressed RDM of each frame to obtain, for each frame, an RDM awaiting target extraction, wherein the RDM awaiting target extraction contains a data point cluster for each target; merging the data point clusters corresponding to the same target across the RDMs awaiting target extraction of all frames to obtain the cumulative clusters in the continuous time dimension, wherein the data point clusters are obtained by clustering the data points in the interference-suppressed RDM using a density-based clustering algorithm; and, based on the data points contained in the cumulative cluster with the largest variance, setting the power value of specified data points in the RDM awaiting target extraction of each frame to 0 to obtain the RDM containing only the gesture target; wherein the specified data points are the data points contained in the cumulative clusters other than the cumulative cluster with the largest variance.
4. The method as described in claim 1, characterized in that determining the trajectory of the gesture target based on the RDM, the RAM, and the REM comprises: performing coordinate normalization and alignment on the RDM, the RAM, and the REM; and obtaining the gesture trajectory by fusing the coordinate-normalized and aligned RDM, the coordinate-normalized and aligned RAM, and the coordinate-normalized and aligned REM.
5. The method as described in claim 4, characterized in that performing coordinate normalization and alignment on the RDM, the RAM, and the REM comprises: mapping the RDM, the RAM, and the REM to the space of a predefined location according to a pre-established projection relationship between any location and that same predefined location.
6. A device for extracting gesture trajectories, characterized in that it comprises: an RDM construction unit, used to construct, based on the acquired human gesture digital signal, a corresponding range-Doppler map (RDM) containing only the gesture target; a data block positioning unit, used to locate, based on the unique index of the gesture target in the RDM, the signal data block corresponding to the unique index from the human gesture digital signal; a RAM and REM construction unit, used to construct, based on the signal data block, a corresponding range-azimuth map (RAM) containing only the gesture target and a range-elevation map (REM) containing only the gesture target; and a trajectory determination unit, used to determine the trajectory of the gesture target based on the RDM, the RAM, and the REM.
7. The device as claimed in claim 6, characterized in that the RDM construction unit is specifically used to: construct a corresponding multi-frame initial RDM based on multiple frames of the human gesture digital signal; for each frame of the multi-frame initial RDM, perform background noise suppression and interference suppression in sequence to obtain a multi-frame interference-suppressed RDM; and construct the RDM containing only the gesture target based on the multi-frame interference-suppressed RDM.
8. A computing device, characterized in that it comprises a memory and a processor, wherein the memory is used to store a computer program, and the processor, coupled to the memory, is configured to execute the computer program stored in the memory to perform the method according to any one of claims 1 to 5.
9. A computer-readable storage medium storing a computer program, which, when executed by a computer, enables the implementation of the method according to any one of claims 1 to 5.
10. A computer program product, characterized in that the computer program product stores instructions that, when executed by a computer, cause the computer to perform the method described in any one of claims 1 to 5.