Human-machine interaction method, device and vehicle based on millimeter wave radar

By generating micro-Doppler feature maps using millimeter-wave radar and combining them with an interactive recognition model, the problem of delayed human-machine interaction response during remote takeover of autonomous vehicles was solved, achieving higher real-time performance and reliability.

CN122232649A · Pending · Publication Date: 2026-06-19 · BEIJING AUTOMOBILE RES GENERAL INST

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: BEIJING AUTOMOBILE RES GENERAL INST
Filing Date: 2026-03-31
Publication Date: 2026-06-19

Abstract

This application proposes a human-machine interaction method, device, and vehicle based on millimeter-wave radar. The method includes: acquiring user behavior information and a human motion feature matrix using millimeter-wave radar; generating a micro-Doppler feature map of the user based on the behavior information and the human motion feature matrix; inputting the micro-Doppler feature map into an interaction recognition model to generate the user's interaction intent; and controlling the vehicle based on the interaction intent. Thus, by acquiring human motion data using millimeter-wave radar to generate a micro-Doppler feature map, and combining it with an interaction recognition model to accurately identify the interaction intent to control the vehicle, the real-time performance and reliability of autonomous driving human-machine interaction are improved.

Description

Technical Field

[0001] This application relates to the field of autonomous driving technology, and in particular to a human-machine interaction method, device and vehicle based on millimeter-wave radar.

Background Technology

[0002] In actual driving, autonomous vehicles are prone to sudden abnormal situations such as sensor failure and extreme road conditions. As a core safety redundancy measure for autonomous driving, remote takeover systems can intervene and control the vehicle in a timely manner when the onboard system malfunctions, effectively ensuring driving safety. However, existing remote takeover human-machine interaction methods have obvious shortcomings: physical control response is lagging, visual and voice interaction is easily affected by environmental interference, robustness and real-time performance are poor, and the interaction mode is limited, failing to meet the stringent safety requirements of remote takeover.

Summary of the Invention

[0003] This application aims to at least partially address one of the technical problems in the related art.

[0004] Therefore, the first objective of this application is to propose a human-computer interaction method based on millimeter-wave radar, which can collect human body movements through millimeter-wave radar to generate micro-Doppler feature maps, and combine them with an interaction recognition model to accurately identify interaction intentions in order to control the vehicle, thereby improving the real-time performance and reliability of autonomous driving human-computer interaction.

[0005] The second objective of this application is to propose a human-computer interaction device based on millimeter-wave radar.

[0006] The third objective of this application is to propose a vehicle.

[0007] To achieve the above objectives, the first aspect of this application proposes a human-computer interaction method based on millimeter-wave radar, comprising the following steps: acquiring user behavior information and human motion feature matrix through millimeter-wave radar; generating a micro-Doppler feature map of the user based on the behavior information and human motion feature matrix; inputting the micro-Doppler feature map into an interaction recognition model to generate the user's interaction intent, and controlling the vehicle according to the interaction intent.

[0008] In the human-machine interaction method based on millimeter-wave radar according to the embodiments of this application, user behavior information and a human motion feature matrix are first acquired through millimeter-wave radar. A micro-Doppler feature map of the user is then generated based on the behavior information and the human motion feature matrix. Finally, the micro-Doppler feature map is input into an interaction recognition model to generate the user's interaction intent, and the vehicle is controlled according to the interaction intent. Thus, by acquiring human motion data using millimeter-wave radar to generate a micro-Doppler feature map, and combining it with an interaction recognition model to accurately identify the interaction intent to control the vehicle, the real-time performance and reliability of autonomous driving human-machine interaction are improved.

[0009] In addition, the human-computer interaction method based on millimeter-wave radar according to the above embodiments of this application may also have the following additional technical features. In one embodiment of this application, the behavioral information includes the user's gesture distance, gesture speed, and gesture angle. Generating the user's micro-Doppler feature map based on the behavioral information and the human motion feature matrix includes: performing a Fast Fourier Transform on the behavioral information to obtain the user's gesture information, wherein the gesture information includes the user's gesture distance, gesture speed, and gesture angle; performing Doppler moving target indication (MTI) processing on the human motion feature matrix to obtain the user's Doppler velocity information; and generating the user's micro-Doppler feature map based on the Doppler velocity information, gesture distance, gesture speed, and gesture angle.

[0010] In one embodiment of this application, a micro-Doppler feature map of a user is generated based on Doppler velocity information, gesture distance, gesture speed, and gesture angle, including: processing the gesture distance, gesture speed, and gesture angle based on Doppler velocity information to obtain three-dimensional point cloud data corresponding to the gesture information; and mapping the three-dimensional point cloud data onto a two-dimensional plane to obtain a micro-Doppler feature map.

[0011] In one embodiment of this application, the interaction recognition model includes a first neural network and a second neural network. The process of inputting a micro-Doppler feature map into the interaction recognition model to generate a user's interaction intent includes: performing zero-padding expansion processing on the micro-Doppler feature map using the first neural network to obtain a target feature map; modeling the local continuity information of the target feature map using the first neural network to obtain feature overlap blocks; performing a first transformation on the feature overlap blocks using the second neural network to obtain a two-dimensional feature map; determining a first convolutional layer and a second convolutional layer in the interaction recognition model using the second neural network, and obtaining an interaction intent feature map based on the first convolutional layer, the second convolutional layer, and the two-dimensional feature map; performing a second transformation on the interaction intent feature map using the second neural network to obtain target interaction intent features; and recognizing the target interaction intent features using the interaction recognition model to generate the user's interaction intent.

[0012] In one embodiment of this application, an interaction intent feature map is obtained based on a first convolutional layer, a second convolutional layer, and a two-dimensional feature map, including: obtaining a first feature in the convolutional kernel of the first convolutional layer and a second feature in the convolutional kernel of the second convolutional layer; and aggregating the first feature, the second feature, and the two-dimensional feature map to obtain the interaction intent feature map.

[0013] In one embodiment of this application, the two-dimensional feature map is obtained by the following formula:

[0014] where h and w are the resolution of the original image, d is the dimension of the feature space, and Seq2Img denotes the function that converts the overlapping block sequence into a two-dimensional feature map.

[0015] In one embodiment of this application, the interaction intent feature map is obtained using the following formula:

[0016] where the terms of the formula are: the two-dimensional feature map; the features of the first and second 1×1 convolution kernels; the feature of the depthwise convolution kernel; the dimension expansion ratio; the convolution operation; and the activation function.

[0017] In one embodiment of this application, the target interaction intent feature is obtained by the following formula:

[0018] Here, Img2Seq represents a function that maps image features to a sequence.

[0019] To achieve the above objectives, a second aspect of this application proposes a human-computer interaction device based on millimeter-wave radar, comprising an acquisition module for acquiring user behavior information and a human motion feature matrix via millimeter-wave radar; a first generation module for generating a micro-Doppler feature map of the user based on the behavior information and the human motion feature matrix; and a second generation module for inputting the micro-Doppler feature map into an interaction recognition model to generate the user's interaction intent and control the vehicle based on the interaction intent.

[0020] According to the embodiments of this application, the millimeter-wave radar-based human-machine interaction device first acquires user behavior information and a human motion feature matrix through a millimeter-wave radar via an acquisition module. Then, a first generation module generates a micro-Doppler feature map of the user based on the behavior information and the human motion feature matrix. Finally, a second generation module inputs the micro-Doppler feature map into an interaction recognition model to generate the user's interaction intent and control the vehicle based on the interaction intent. Thus, by acquiring human motion data using millimeter-wave radar to generate a micro-Doppler feature map and combining it with an interaction recognition model to accurately identify the interaction intent and control the vehicle, the real-time performance and reliability of autonomous driving human-machine interaction are improved.

[0021] To achieve the above objectives, a third aspect of this application provides a vehicle comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the human-machine interaction method based on millimeter-wave radar as described above.

[0022] According to the embodiments of this application, when the processor executes the computer program, the human-machine interaction method based on millimeter-wave radar described above is implemented. This method realizes the generation of micro-Doppler feature maps by collecting human body movements through millimeter-wave radar, and the accurate identification of interaction intentions by combining them with an interaction recognition model to control the vehicle, thereby improving the real-time performance and reliability of autonomous driving human-machine interaction.

[0023] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.

Brief Description of the Drawings

[0024] The above and/or additional aspects and advantages of this application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:

Figure 1 is a flowchart of a human-computer interaction method based on millimeter-wave radar according to some embodiments of this application;

Figure 2 is a schematic diagram of overlapping blocks for a human-computer interaction method based on millimeter-wave radar according to some embodiments of this application;

Figure 3 is a flowchart of a human-computer interaction method based on millimeter-wave radar according to a specific embodiment of this application;

Figure 4 is a block diagram of a human-computer interaction device based on millimeter-wave radar according to some embodiments of this application;

Figure 5 is a schematic diagram of the structure of a vehicle according to some embodiments of this application.

Detailed Description

[0025] The embodiments of this application are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain this application, and should not be construed as limiting this application.

[0026] The following description, with reference to the accompanying drawings, describes a human-computer interaction method, apparatus, and vehicle based on millimeter-wave radar according to embodiments of this application.

[0027] As shown in Figure 1, the human-computer interaction method based on millimeter-wave radar in this application embodiment may include the following steps:

Step S1: Obtain user behavior information and a human motion feature matrix through millimeter-wave radar.

[0028] Specifically, a millimeter-wave radar adapted for remote takeover of autonomous driving is selected as the core sensing device. This millimeter-wave radar is fixedly deployed at a designated location on the remote control terminal, ensuring that the radar detection beam completely covers the upper limb movement area of the remote operator and avoiding signal loss due to obstruction. Preferably, the millimeter-wave radar can be a 77 GHz frequency-modulated continuous wave radar, configured with a three-transmit, four-receive antenna array. After the radar is activated, it continuously detects the user's behavioral information and synchronously collects the micro-Doppler effect caused by human movements, generating a corresponding human motion feature matrix that includes the movement speed, direction, acceleration, and posture changes of moving parts, providing rich dynamic information for subsequent action recognition.
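For orientation, the radar figures quoted above fix a few derived quantities. The following is a minimal sketch, assuming the three transmit and four receive antennas are operated as a MIMO array so that each transmit/receive pair forms one virtual element; the source states only the antenna counts, not the MIMO scheme, and the helper name `doppler_shift_hz` is hypothetical.

```python
# Derived quantities for a 77 GHz FMCW radar with 3 TX and 4 RX antennas.
# Assumption: MIMO operation, one virtual element per TX/RX pair.

SPEED_OF_LIGHT_M_S = 299_792_458.0
CARRIER_HZ = 77e9
NUM_TX = 3
NUM_RX = 4

wavelength_m = SPEED_OF_LIGHT_M_S / CARRIER_HZ
virtual_elements = NUM_TX * NUM_RX


def doppler_shift_hz(radial_velocity_m_s: float) -> float:
    """Two-way Doppler shift of a target with the given radial velocity."""
    return 2.0 * radial_velocity_m_s / wavelength_m


print(f"wavelength: {wavelength_m * 1e3:.2f} mm")
print(f"virtual elements: {virtual_elements}")
print(f"Doppler shift at 1 m/s: {doppler_shift_hz(1.0):.0f} Hz")
```

The short wavelength is what makes small hand and finger motions produce measurable Doppler shifts.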

[0029] Step S2: Generate the user's micro-Doppler feature map based on behavioral information and human motion feature matrix.

[0030] Specifically, the data processing module first preprocesses the raw data acquired by the millimeter-wave radar, removing static clutter interference through filtering and clutter suppression algorithms while retaining the motion information of dynamic targets. Then, it transforms and enhances the behavioral information and human action feature matrix, respectively, and fuses the processed data to generate a two-dimensional micro-Doppler feature map for subsequent feature learning and intent recognition by deep learning models. This micro-Doppler feature map presents the energy distribution of user actions in both the time and Doppler frequency dimensions, effectively reflecting the motion patterns and dynamic characteristics of the actions.

[0031] In one embodiment of this application, the behavioral information includes the user's gesture distance, gesture speed, and gesture angle. Generating the user's micro-Doppler feature map based on the behavioral information and the human motion feature matrix includes: performing a Fast Fourier Transform on the behavioral information to obtain the user's gesture information, wherein the gesture information includes the user's gesture distance, gesture speed, and gesture angle; performing Doppler moving target indication (MTI) processing on the human motion feature matrix to obtain the user's Doppler velocity information; and generating the user's micro-Doppler feature map based on the Doppler velocity information, gesture distance, gesture speed, and gesture angle.

[0032] Specifically, a Fast Fourier Transform (FFT) is performed on the behavioral information acquired by the millimeter-wave radar to convert the time-domain signal to the frequency domain, from which the user's gesture distance, gesture speed, and gesture angle are resolved. Doppler moving target indication (MTI) processing is then applied to the human motion feature matrix: a clutter suppression filter is constructed to remove static clutter generated by stationary backgrounds (such as seats and vehicle interiors) while retaining the Doppler velocity information of dynamic targets, thus highlighting the motion characteristics of the user's hands and upper limbs. The Doppler velocity information is then fused with the gesture distance, gesture speed, and gesture angle to form a unified data structure representing the user's actions, from which the micro-Doppler feature map is generated.
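As a non-limiting illustration of the FFT and clutter suppression steps, the sketch below applies a range FFT, a simple MTI filter, and a Doppler FFT to one synthetic radar frame. The source does not specify the filter; mean subtraction across chirps is used here as one common form of MTI. The frame layout (chirps by samples) and the function names `range_doppler_with_mti` and `synth_frame` are assumptions made for this example.

```python
import numpy as np


def range_doppler_with_mti(frame: np.ndarray) -> np.ndarray:
    """Return a range-Doppler magnitude map from one radar frame.

    frame: complex array of shape (num_chirps, num_samples), i.e. slow time
    by fast time. This layout is an assumption of the sketch.
    """
    # Range FFT along fast time: each bin corresponds to a distance.
    range_profiles = np.fft.fft(frame, axis=1)
    # MTI as mean subtraction across chirps: echoes that do not change from
    # chirp to chirp (static clutter) cancel, moving targets remain.
    moving_only = range_profiles - range_profiles.mean(axis=0, keepdims=True)
    # Doppler FFT along slow time, shifted so zero velocity sits in the centre.
    doppler = np.fft.fftshift(np.fft.fft(moving_only, axis=0), axes=0)
    return np.abs(doppler)


def synth_frame(num_chirps: int = 64, num_samples: int = 128) -> np.ndarray:
    """Synthetic frame with one static reflector and one moving reflector."""
    n = np.arange(num_samples)[None, :]  # fast-time sample index
    m = np.arange(num_chirps)[:, None]   # slow-time chirp index
    static = np.exp(2j * np.pi * 20 * n / num_samples) * np.ones((num_chirps, 1))
    moving = 0.5 * np.exp(2j * np.pi * (45 * n / num_samples + 8 * m / num_chirps))
    return static + moving


rd_map = range_doppler_with_mti(synth_frame())
doppler_bin, range_bin = np.unravel_index(rd_map.argmax(), rd_map.shape)
print("strongest cell (doppler_bin, range_bin):", int(doppler_bin), int(range_bin))
```

In the synthetic frame the static reflector sits in range bin 20 and is cancelled by the filter, while the weaker moving reflector in range bin 45 survives and dominates the map.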

[0033] In one embodiment of this application, a micro-Doppler feature map of a user is generated based on Doppler velocity information, gesture distance, gesture speed, and gesture angle, including: processing the gesture distance, gesture speed, and gesture angle based on Doppler velocity information to obtain three-dimensional point cloud data corresponding to the gesture information; and mapping the three-dimensional point cloud data onto a two-dimensional plane to obtain a micro-Doppler feature map.

[0034] Specifically, using Doppler velocity information as a constraint, effective data points belonging to the user's hand movements are extracted from gesture distance, gesture velocity, and gesture angle data. An algorithm such as Euclidean distance segmentation is employed to separate the hand point cloud from the background point cloud, removing interference noise and generating 3D point cloud data containing spatial position and motion information. Each point in this 3D point cloud data includes distance coordinates, velocity vectors, and angular orientation information. The 3D point cloud data is then accumulated over time, projecting the point cloud data from consecutive time frames onto the time-Doppler frequency plane. By reducing the spatial dimension, a two-dimensional micro-Doppler feature map is generated. The energy distribution in this map reflects the dynamic changes of the user's actions in the time-frequency domain, facilitating subsequent feature extraction and recognition.
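The projection onto the time-Doppler plane can be sketched as follows. This is a minimal example under stated assumptions: each frame's point cloud is an array whose columns are range, angle, Doppler velocity, and power (a hypothetical layout), the hand points have already been separated from the background, and the velocity span `v_max` and bin count are illustrative values. The segmentation step itself is not shown.

```python
import numpy as np


def micro_doppler_map(frames, v_max: float = 2.0, num_bins: int = 32) -> np.ndarray:
    """Project per-frame point clouds onto the time-Doppler plane.

    frames: list of arrays, one per time frame, each of shape (num_points, 4)
    with columns (range_m, angle_rad, velocity_m_s, power). The column layout
    is hypothetical. Returns an array of shape (num_bins, num_frames).
    """
    edges = np.linspace(-v_max, v_max, num_bins + 1)
    out = np.zeros((num_bins, len(frames)))
    for t, points in enumerate(frames):
        if len(points) == 0:
            continue  # no detections in this frame: column stays zero
        velocity, power = points[:, 2], points[:, 3]
        # Range and angle are dropped by the projection; power is
        # accumulated per Doppler velocity bin for this time frame.
        hist, _ = np.histogram(velocity, bins=edges, weights=power)
        out[:, t] = hist
    return out


demo_frames = [
    np.array([[0.50, 0.1, 0.55, 1.0], [0.52, 0.1, 0.60, 2.0]]),
    np.empty((0, 4)),
    np.array([[0.48, 0.1, -1.05, 3.0]]),
]
demo_map = micro_doppler_map(demo_frames)
print(demo_map.shape)
```

Each column of the result is one time frame and each row one Doppler velocity bin, so a gesture appears as a trace of energy moving through the map over time.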

[0035] Step S3: Input the micro-Doppler feature map into the interaction recognition model to generate the user's interaction intent, and control the vehicle according to the interaction intent.

[0036] Specifically, the micro-Doppler feature map is input into a pre-trained interaction recognition model. A deep learning network extracts and classifies the features from the map, outputting the user's interaction intent and generating corresponding vehicle control commands based on the recognition results. This interaction recognition model can employ a hybrid neural network architecture, combining the advantages of global and local feature extraction networks to effectively learn the spatiotemporal features of user actions from the micro-Doppler feature map. Interaction intents include, but are not limited to, initiating remote takeover, adjusting driving speed, changing driving route, emergency braking, confirming commands, and canceling commands. After receiving the interaction intent output by the interaction recognition model, the vehicle control module generates corresponding control signals according to a preset command mapping relationship and sends the control commands to the vehicle's drive-by-wire system via the vehicle communication bus, enabling remote control of vehicle steering, braking, acceleration, and other operations.
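The preset command mapping relationship can be pictured as a lookup table from recognized intents to control commands. The sketch below is illustrative only: the intent names follow the list above, while the subsystem and action strings, the `ControlCommand` type, and the `to_command` helper are hypothetical and do not describe any particular drive-by-wire interface.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Intent(Enum):
    START_TAKEOVER = auto()
    ADJUST_SPEED = auto()
    CHANGE_ROUTE = auto()
    EMERGENCY_BRAKE = auto()
    CONFIRM = auto()
    CANCEL = auto()


@dataclass(frozen=True)
class ControlCommand:
    subsystem: str  # hypothetical subsystem name
    action: str     # hypothetical action name


# Preset mapping from interaction intent to control command (all values
# are placeholders chosen for the example).
COMMAND_MAP = {
    Intent.START_TAKEOVER: ControlCommand("supervisor", "enter_remote_mode"),
    Intent.ADJUST_SPEED: ControlCommand("throttle", "set_target_speed"),
    Intent.CHANGE_ROUTE: ControlCommand("steering", "follow_new_route"),
    Intent.EMERGENCY_BRAKE: ControlCommand("brake", "full_stop"),
    Intent.CONFIRM: ControlCommand("supervisor", "commit_pending"),
    Intent.CANCEL: ControlCommand("supervisor", "discard_pending"),
}


def to_command(intent: Intent) -> ControlCommand:
    """Look up the control command for a recognized intent."""
    return COMMAND_MAP[intent]


print(to_command(Intent.EMERGENCY_BRAKE))
```

Keeping the mapping as data rather than branching logic makes it straightforward to check that every intent the model can output has a defined command.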

[0037] In one embodiment of this application, the interaction recognition model includes a first neural network and a second neural network. The process of inputting a micro-Doppler feature map into the interaction recognition model to generate a user's interaction intent includes: performing zero-padding expansion processing on the micro-Doppler feature map using the first neural network to obtain a target feature map; modeling the local continuity information of the target feature map using the first neural network to obtain feature overlap blocks; performing a first transformation on the feature overlap blocks using the second neural network to obtain a two-dimensional feature map; determining a first convolutional layer and a second convolutional layer in the interaction recognition model using the second neural network, and obtaining an interaction intent feature map based on the first convolutional layer, the second convolutional layer, and the two-dimensional feature map; performing a second transformation on the interaction intent feature map using the second neural network to obtain target interaction intent features; and recognizing the target interaction intent features using the interaction recognition model to generate the user's interaction intent.

[0038] Specifically, the interaction recognition model can adopt a collaborative dual-network architecture. As shown in Figure 2, the first neural network first expands the edges of the input micro-Doppler feature map with zero padding, which avoids losing edge information when the map is segmented and yields a complete target feature map. A convolutional kernel with a stride smaller than the image block size then segments the expanded feature map into blocks, so that adjacent image blocks overlap. Each image block passed to the subsequent network layers therefore contains not only its own information but also information from the adjacent blocks above, below, left, and right, which reduces information loss and provides a preliminary model of the local continuity information of the feature map. Next, the second neural network performs the first transformation on the feature overlap blocks: its built-in Seq2Img function reconstructs the discrete sequence of feature overlap blocks output by the first neural network into a regularly sized two-dimensional feature map that meets the input requirements of the convolutional layers. The preset first and second convolutional layers are then called to perform deep feature extraction on the two-dimensional feature map, generating a high-dimensional interaction intent feature map. Preferably, the first convolutional layer is a 1×1 convolutional layer, which replaces the fully connected layer of a traditional feed-forward network for feature transformation, and the second convolutional layer is a 3×3 depthwise convolutional layer, whose kernel aggregates feature information within a local region. The second neural network then performs the second transformation on this feature map: an image-to-sequence conversion layer converts the convolved two-dimensional feature map back into sequence form to obtain the target interaction intent features. Finally, the interaction recognition model recognizes the target interaction intent features: after processing by multiple self-attention layers, a fully connected layer and a Softmax classifier complete the classification and output the final interaction intent.
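The zero padding and overlapping segmentation described above can be illustrated with plain array slicing. This sketch shows only the block geometry; in a real network each block would additionally pass through a learned projection, which is omitted here. The block size 7, stride 4, and padding 3 are illustrative values chosen for the example, not values stated in the source, and `overlapping_patches` is a hypothetical helper name.

```python
import numpy as np


def overlapping_patches(feature_map: np.ndarray, patch: int = 7,
                        stride: int = 4, pad: int = 3):
    """Split a 2D map into overlapping blocks after zero padding.

    Returns (patches, grid): patches has shape (h * w, patch * patch) and
    grid is (h, w), the number of blocks along each axis.
    """
    # Zero-pad the borders so edge pixels are covered by full-size blocks.
    padded = np.pad(feature_map, pad, mode="constant", constant_values=0.0)
    rows = range(0, padded.shape[0] - patch + 1, stride)
    cols = range(0, padded.shape[1] - patch + 1, stride)
    # stride < patch, so neighbouring blocks share (patch - stride) rows/cols.
    blocks = [padded[r:r + patch, c:c + patch].ravel() for r in rows for c in cols]
    return np.stack(blocks), (len(rows), len(cols))


demo_input = np.arange(1, 32 * 32 + 1, dtype=float).reshape(32, 32)
demo_patches, demo_grid = overlapping_patches(demo_input)
print(demo_patches.shape, demo_grid)
```

With these values, horizontally adjacent blocks share three columns, which is the overlap that lets each block carry information about its neighbours.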

[0039] In one embodiment of this application, an interaction intent feature map is obtained based on a first convolutional layer, a second convolutional layer, and a two-dimensional feature map, including: obtaining a first feature in the convolutional kernel of the first convolutional layer and a second feature in the convolutional kernel of the second convolutional layer; and aggregating the first feature, the second feature, and the two-dimensional feature map to obtain the interaction intent feature map.

[0040] Specifically, the first feature is the weight coefficient of a 1×1 convolutional kernel, used to perform linear transformation of the input features along the channel dimension. The second feature is the weight coefficient of a 3×3 depthwise convolutional kernel, used to aggregate features along the spatial dimension while maintaining channel independence. First, the two-dimensional feature map is convolved point-by-point through the first convolutional layer to transform and compress features along the channel dimension. Then, the transformed features are spatially convolved through the depthwise convolutional layer to extract correlation information within local regions. Finally, the channel dimension is transformed again through the first convolutional layer to output the final interactive intent feature map. This aggregation process organically combines local feature extraction with channel feature transformation, enhancing the model's ability to represent local details in the micro-Doppler feature map.
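A minimal NumPy sketch of this pointwise, depthwise, pointwise aggregation is given below. It follows the expand, depthwise, reduce ordering with an expansion ratio of 4 given in paragraph [0046]. Several points are assumptions of the sketch rather than statements of the source: the channels-last layout, the zero padding in the depthwise step, the use of a tanh-approximated GELU as the activation (the source does not name the activation function), and all function names.

```python
import numpy as np


def pointwise_conv(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution: x is (h, w, c_in), weight is (c_in, c_out)."""
    return x @ weight


def depthwise_conv3x3(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """3x3 depthwise convolution with zero padding.

    x is (h, w, c); kernels is (3, 3, c), one spatial kernel per channel,
    so channels are never mixed.
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dr in range(3):
        for dc in range(3):
            out += padded[dr:dr + h, dc:dc + w, :] * kernels[dr, dc, :]
    return out


def gelu(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of GELU (activation choice is an assumption)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def conv_feed_forward(z, w1, wd, w2):
    """Expand channels, aggregate locally per channel, reduce channels."""
    hidden = gelu(pointwise_conv(z, w1))         # d -> gamma * d
    hidden = gelu(depthwise_conv3x3(hidden, wd))  # local spatial aggregation
    return pointwise_conv(hidden, w2)            # gamma * d -> d


rng = np.random.default_rng(0)
d, gamma = 8, 4
z_demo = rng.standard_normal((6, 6, d))
w1_demo = rng.standard_normal((d, gamma * d))
wd_demo = rng.standard_normal((3, 3, gamma * d))
w2_demo = rng.standard_normal((gamma * d, d))
y_demo = conv_feed_forward(z_demo, w1_demo, wd_demo, w2_demo)
print(y_demo.shape)
```

The output has the same spatial size and channel count as the input, so the block can replace a feed-forward layer without changing the surrounding network shapes.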

[0041] In one embodiment of this application, the two-dimensional feature map is obtained by the following formula:

[0042] where h and w are the resolution of the original image, d is the dimension of the feature space, and Seq2Img denotes the function that converts the overlapping block sequence into a two-dimensional feature map.

[0043] Specifically, the input feature X is an overlapping block sequence obtained after processing by the first neural network. Each image block in this sequence has fused the local continuity information of adjacent regions. The image blocks in sequence are rearranged according to their original spatial position relationship using the Seq2Img function to form a two-dimensional feature map with dimensions h×w×d, providing a data organization form that conforms to the spatial structure for subsequent convolution operations.
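The formula referred to in paragraph [0041] is not reproduced in this text. One form consistent with the description in paragraphs [0042] and [0043] is sketched below; the input X and the dimensions h, w, and d come from the description, while the output symbol Z and the sequence shape of X are assumptions.

```latex
\[
  Z = \mathrm{Seq2Img}(X), \qquad
  X \in \mathbb{R}^{(h \cdot w) \times d}, \quad
  Z \in \mathbb{R}^{h \times w \times d}
\]
```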

[0044] In one embodiment of this application, the interaction intent feature map is obtained using the following formula:

[0045] where the terms of the formula are: the two-dimensional feature map; the features of the first and second 1×1 convolution kernels; the feature of the depthwise convolution kernel; the dimension expansion ratio; the convolution operation; and the activation function.

[0046] Specifically, the first 1×1 convolution kernel first raises the channel dimension of the two-dimensional feature map; the 3×3 depthwise convolution kernel then extracts subtle local motion features, compensating for the insufficient local feature extraction of the global network; and the second 1×1 convolution kernel then reduces the channel dimension. The dimension expansion ratio is preferably 4, to balance model recognition accuracy against computational complexity. The activation function implements non-linear feature mapping, ultimately yielding a highly recognizable interaction intent feature map.
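The formula referred to in paragraph [0044] is likewise not reproduced in this text. A form consistent with paragraphs [0045] and [0046] is sketched below. All symbols are assumptions introduced for illustration: Z is the two-dimensional feature map, W_1 and W_2 are the 1×1 convolution kernels, W_d is the 3×3 depthwise convolution kernel, γ is the dimension expansion ratio, ⊛ is the convolution operation, and f is the activation function. The placement of f after the first and the depthwise convolutions is also an assumption.

```latex
\[
  Y = f\bigl( f( Z \circledast W_1 ) \circledast W_d \bigr) \circledast W_2,
\]
\[
  W_1 \in \mathbb{R}^{d \times \gamma d \times 1 \times 1}, \quad
  W_d \in \mathbb{R}^{\gamma d \times 1 \times 3 \times 3}, \quad
  W_2 \in \mathbb{R}^{\gamma d \times d \times 1 \times 1}, \quad
  \gamma = 4
\]
```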

[0047] In one embodiment of this application, the target interaction intent feature is obtained by the following formula:

[0048] Here, Img2Seq represents a function that maps image features to a sequence.

[0049] Specifically, the Img2Seq function reconstructs the two-dimensional interaction intent feature map into target interaction intent features in sequence form. This matches the sequence input requirements of the subsequent self-attention layers and classification layer of the interaction recognition model, converts the image features into recognizable sequence features, ensures that the model can quickly complete intent classification and command matching, and finally outputs an interaction intent that can be used directly for vehicle control.
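The two conversions are inverses of each other and amount to reshaping. The sketch below assumes row-major ordering of the blocks (the source says only that blocks are rearranged according to their original spatial positions); the lowercase function names are hypothetical stand-ins for Seq2Img and Img2Seq.

```python
import numpy as np


def seq2img(tokens: np.ndarray, h: int, w: int) -> np.ndarray:
    """Rearrange a (h * w, d) sequence into an (h, w, d) feature map."""
    n, d = tokens.shape
    if n != h * w:
        raise ValueError(f"sequence length {n} does not match grid {h}x{w}")
    return tokens.reshape(h, w, d)


def img2seq(feature_map: np.ndarray) -> np.ndarray:
    """Flatten an (h, w, d) feature map back into a (h * w, d) sequence."""
    h, w, d = feature_map.shape
    return feature_map.reshape(h * w, d)


tokens_demo = np.arange(12.0).reshape(6, 2)  # 6 blocks, feature dimension 2
image_demo = seq2img(tokens_demo, h=2, w=3)
print(image_demo.shape, img2seq(image_demo).shape)
```

Because the round trip is lossless, the convolutional layers can operate on the spatial form while the self-attention layers continue to receive a sequence.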

[0050] This embodiment first acquires user behavior information and a human motion feature matrix using millimeter-wave radar. Then, based on the behavior information and the human motion feature matrix, it generates a micro-Doppler feature map of the user. Finally, the micro-Doppler feature map is input into an interaction recognition model to generate the user's interaction intent, and the vehicle is controlled according to the interaction intent. Thus, by using millimeter-wave radar to collect human motion data and generate a micro-Doppler feature map, combined with an interaction recognition model, it can accurately identify interaction intent to control the vehicle, improving the real-time performance and reliability of autonomous driving human-machine interaction.

[0051] As a specific embodiment of this application, as shown in Figure 3, the human-computer interaction method based on millimeter-wave radar may include the following steps:

S101: Collect the driver's raw echo signal using millimeter-wave radar.

[0052] S102: Process the raw echo signal to obtain user behavior information and a human motion feature matrix.

[0053] S103: Perform a Fast Fourier Transform on the behavioral information to obtain gesture information.

[0054] S104: Perform Doppler moving target indication (MTI) processing on the human motion feature matrix to obtain Doppler velocity information.

[0055] S105: Generate 3D point cloud data based on the Doppler velocity information, gesture distance, gesture speed, and gesture angle.

[0056] S106: Map the 3D point cloud data onto a 2D plane to generate a micro-Doppler feature map.

[0057] S107: Input the micro-Doppler feature map into the interaction recognition model.

[0058] S108: Expand the micro-Doppler feature map by zero padding through the first neural network to obtain the target feature map.

[0059] S109: Model the local continuity information of the target feature map through the first neural network to obtain the feature overlap blocks.

[0060] S110: Transform the feature overlap blocks through the second neural network to obtain a two-dimensional feature map.

[0061] S111: Process the two-dimensional feature map through the second neural network, based on the first and second convolutional layers, to obtain the interaction intent feature map.

[0062] S112: Transform the interaction intent feature map through the second neural network to obtain the target interaction intent features.

[0063] S113: Recognize the target interaction intent features through the interaction recognition model to generate the user's interaction intent.

[0064] S114: Control the vehicle according to the interaction intent.

[0065] In summary, the human-machine interaction method based on millimeter-wave radar according to the embodiments of this application first acquires user behavior information and a human motion feature matrix using millimeter-wave radar. Then, based on the behavior information and the human motion feature matrix, a micro-Doppler feature map of the user is generated. Finally, the micro-Doppler feature map is input into an interaction recognition model to generate the user's interaction intent, and the vehicle is controlled according to the interaction intent. Therefore, by acquiring human motion data using millimeter-wave radar to generate a micro-Doppler feature map, and combining it with an interaction recognition model to accurately identify the interaction intent to control the vehicle, the real-time performance and reliability of autonomous driving human-machine interaction are improved.

[0066] Corresponding to the above embodiments, this application also proposes a human-computer interaction device based on millimeter-wave radar.

[0067] As shown in Figure 4, the human-computer interaction device 200 based on millimeter-wave radar in this application embodiment includes: an acquisition module 210, a first generation module 220, and a second generation module 230.

[0068] The acquisition module 210 is used to acquire user behavior information and human motion feature matrix through millimeter-wave radar; the first generation module 220 is used to generate a micro-Doppler feature map of the user based on the behavior information and human motion feature matrix; and the second generation module 230 is used to input the micro-Doppler feature map into the interaction recognition model to generate the user's interaction intent and control the vehicle according to the interaction intent.
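
The division of work among the three modules can be sketched as a simple composition of callables. This is only a structural illustration: the class name `InteractionDevice`, its field names, and the split of module 230 into separate `recognize` and `control` callables are assumptions, not part of the source.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class InteractionDevice:
    """Hypothetical wiring of modules 210, 220 and 230."""
    acquire: Callable[[], Tuple[Any, Any]]   # acquisition module 210
    build_map: Callable[[Any, Any], Any]     # first generation module 220
    recognize: Callable[[Any], str]          # second generation module 230 (recognition)
    control: Callable[[str], None]           # second generation module 230 (vehicle control)

    def step(self) -> str:
        """Run one acquire -> feature map -> intent -> control cycle."""
        behavior, motion = self.acquire()
        feature_map = self.build_map(behavior, motion)
        intent = self.recognize(feature_map)
        self.control(intent)
        return intent
```

Keeping each module behind a plain callable makes it possible to exercise the control flow with stub functions before any radar hardware or trained model is available.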

[0069] According to one embodiment of this application, the behavior information includes the user's gesture distance, gesture speed, and gesture angle. The first generation module 220 is configured to generate the micro-Doppler feature map of the user based on the behavior information and the human motion feature matrix by: performing a fast Fourier transform on the behavior information to obtain the user's gesture information, wherein the gesture information includes the user's gesture distance, gesture speed, and gesture angle; performing Doppler moving target indication (MTI) processing on the human motion feature matrix to obtain the user's Doppler velocity information; and generating the micro-Doppler feature map of the user based on the Doppler velocity information, gesture distance, gesture speed, and gesture angle.
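
The FFT and moving-target processing described above can be illustrated with a minimal NumPy sketch on a synthetic radar frame. Several points are assumptions rather than details from the source: the frame layout (chirps by samples), the use of slow-time mean subtraction as the clutter filter, and the application of both steps to one array. Angle estimation, which needs several receive antennas, is left out.

```python
import numpy as np


def range_doppler_map(frame: np.ndarray) -> np.ndarray:
    """Compute a range-Doppler magnitude map from one radar frame.

    frame: complex array of shape (num_chirps, num_samples).
    Rows of the result are Doppler bins (zero velocity at the centre row),
    columns are range bins.
    """
    # Assumed MTI variant: subtract the slow-time mean of every range sample,
    # which removes returns from stationary objects.
    moving_only = frame - frame.mean(axis=0, keepdims=True)
    range_fft = np.fft.fft(moving_only, axis=1)       # fast time -> range
    doppler_fft = np.fft.fft(range_fft, axis=0)       # slow time -> Doppler
    return np.abs(np.fft.fftshift(doppler_fft, axes=0))


def simulate_frame(num_chirps: int = 32, num_samples: int = 64,
                   range_bin: int = 10, doppler_bin: int = 5) -> np.ndarray:
    """Synthetic frame: one moving target plus a strong static reflector."""
    n = np.arange(num_samples)                 # fast-time index
    m = np.arange(num_chirps)[:, None]         # slow-time index
    moving = np.exp(2j * np.pi * (range_bin * n / num_samples
                                  + doppler_bin * m / num_chirps))
    static = 5.0 * np.exp(2j * np.pi * 20 * n / num_samples) * np.ones((num_chirps, 1))
    return moving + static


rd = range_doppler_map(simulate_frame())
doppler_idx, range_idx = np.unravel_index(np.argmax(rd), rd.shape)
print(int(doppler_idx), int(range_idx))
```

In this synthetic case the static reflector is five times stronger than the moving target, yet the peak of the map falls on the moving target because the mean subtraction cancels the stationary return.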

[0070] According to one embodiment of this application, the first generation module 220 is further configured to generate a micro-Doppler feature map of the user based on Doppler velocity information, gesture distance, gesture speed and gesture angle, including: processing the gesture distance, gesture speed and gesture angle based on Doppler velocity information to obtain three-dimensional point cloud data corresponding to the gesture information; and mapping the three-dimensional point cloud data onto a two-dimensional plane to obtain a micro-Doppler feature map.
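
One simple way to map a three-dimensional point cloud onto a two-dimensional plane is to drop one coordinate and accumulate the remaining two into a grid. The sketch below assumes each point is (distance, velocity, angle) and keeps the distance-velocity plane; the choice of plane, the grid size, and the value ranges are illustrative assumptions, since the source does not specify them.

```python
import numpy as np


def project_point_cloud(points, bins=(8, 8),
                        ranges=((0.0, 2.0), (-1.0, 1.0))) -> np.ndarray:
    """Project (distance_m, velocity_mps, angle_deg) points onto a 2D grid.

    The angle column is discarded; each cell holds the fraction of points
    that fall into its distance/velocity interval.
    """
    pts = np.asarray(points, dtype=float)
    grid, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=bins, range=ranges)
    total = grid.sum()
    return grid / total if total > 0 else grid


# Three hypothetical detections: two near and receding, one far and approaching.
cloud = [[0.1, -0.9, 10.0], [0.1, -0.9, -20.0], [1.9, 0.9, 0.0]]
feature_map = project_point_cloud(cloud)
print(feature_map.shape)
```

Normalising by the point count keeps the map comparable between frames that contain different numbers of detections.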

[0071] According to one embodiment of this application, the interaction recognition model includes a first neural network and a second neural network. The second generation module 230 is configured to input the micro-Doppler feature map into the interaction recognition model to generate the user's interaction intent by: performing zero-padding expansion processing on the micro-Doppler feature map using the first neural network to obtain a target feature map; modeling the local continuity information of the target feature map using the first neural network to obtain feature overlap blocks; performing a first transformation on the feature overlap blocks using the second neural network to obtain a two-dimensional feature map; determining a first convolutional layer and a second convolutional layer in the interaction recognition model using the second neural network, and obtaining an interaction intent feature map based on the first convolutional layer, the second convolutional layer, and the two-dimensional feature map; performing a second transformation on the interaction intent feature map using the second neural network to obtain target interaction intent features; and recognizing the target interaction intent features using the interaction recognition model to generate the user's interaction intent.
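
The data flow of zero-padding, overlapping block extraction, and the two sequence/image transformations can be sketched in NumPy as follows. The patch size, stride, and padding width are assumed values, and the helper names `overlapping_patches`, `seq2img`, and `img2seq` are illustrative; the learned layers of the two networks are not modelled here.

```python
import numpy as np


def overlapping_patches(feature_map: np.ndarray, patch: int = 3,
                        stride: int = 2, pad: int = 1):
    """Zero-pad a 2D map and cut it into overlapping patches.

    Because stride < patch, neighbouring patches share pixels, which is how
    local continuity between blocks is preserved. Returns the flattened
    patches (one row per patch) and the (rows, cols) layout of the patches.
    """
    padded = np.pad(feature_map, pad, mode="constant")   # zero-padding expansion
    rows = range(0, padded.shape[0] - patch + 1, stride)
    cols = range(0, padded.shape[1] - patch + 1, stride)
    tokens = np.stack([padded[r:r + patch, c:c + patch].ravel()
                       for r in rows for c in cols])
    return tokens, (len(rows), len(cols))


def seq2img(tokens: np.ndarray, grid) -> np.ndarray:
    """Arrange a (N, C) token sequence as a (rows, cols, C) feature map."""
    rows, cols = grid
    return tokens.reshape(rows, cols, tokens.shape[1])


def img2seq(image: np.ndarray) -> np.ndarray:
    """Flatten a (rows, cols, C) feature map back to a (N, C) sequence."""
    rows, cols, channels = image.shape
    return image.reshape(rows * cols, channels)


tokens, grid = overlapping_patches(np.arange(64.0).reshape(8, 8))
image = seq2img(tokens, grid)
print(tokens.shape, image.shape)
```

With these assumed sizes, an 8 by 8 input yields a 4 by 4 layout of 9-element patches, and `img2seq(seq2img(tokens, grid))` returns the original token sequence unchanged.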

[0072] According to one embodiment of this application, the second generation module 230 is further configured to obtain an interaction intent feature map based on the first convolutional layer, the second convolutional layer, and the two-dimensional feature map, including: obtaining a first feature in the convolutional kernel of the first convolutional layer and a second feature in the convolutional kernel of the second convolutional layer respectively; and aggregating the first feature, the second feature, and the two-dimensional feature map to obtain the interaction intent feature map.
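
A possible reading of this aggregation is a three-stage convolution: a pointwise convolution that expands the channels, a depthwise convolution that mixes neighbouring positions, and a pointwise convolution that projects back. The sketch below implements that reading in NumPy. The 1x1 and 3x3 kernel sizes, the ReLU activation, and the weight names `w1`, `wd`, `w2` are assumptions; the source only states that features of two convolutional layers are aggregated with the two-dimensional feature map.

```python
import numpy as np


def pointwise_conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution: x is (H, W, C_in), w is (C_in, C_out)."""
    return x @ w


def depthwise_conv(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Depthwise convolution with zero 'same' padding.

    x is (H, W, C); k is (kh, kw, C) with odd kh and kw. Each channel is
    filtered by its own kernel (cross-correlation, as in most deep-learning
    libraries).
    """
    kh, kw, _ = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += xp[i:i + x.shape[0], j:j + x.shape[1], :] * k[i, j, :]
    return out


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def intent_feature_map(z, w1, wd, w2) -> np.ndarray:
    """Expand channels, mix locally, then project back to the input width."""
    hidden = relu(pointwise_conv(z, w1))
    hidden = relu(depthwise_conv(hidden, wd))
    return pointwise_conv(hidden, w2)


d, gamma = 4, 2                           # channels and assumed expansion ratio
z = np.ones((4, 4, d))
w1 = np.ones((d, gamma * d))
wd = np.zeros((3, 3, gamma * d))
wd[1, 1, :] = 1.0                         # identity depthwise kernel
w2 = np.ones((gamma * d, d))
y = intent_feature_map(z, w1, wd, w2)
print(y.shape)
```

The output keeps the spatial size and channel count of the input, so it can be flattened back into a sequence of the same length.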

[0073] According to one embodiment of this application, the two-dimensional feature map is obtained by the following formula:

[formula image not reproduced in the source text]

[0074] where the two size symbols denote the resolution of the original image, the feature-space symbol denotes the space in which the feature map lies, and Seq2Img denotes the function that converts the feature overlap block sequence into a two-dimensional feature map.

[0075] According to one embodiment of this application, the interaction intent feature map is obtained by the following formula:

[formula image not reproduced in the source text]

[0076] where the input is the two-dimensional feature map; two of the weight symbols denote convolution kernel features; a third weight symbol denotes the depthwise convolution kernel features; and the remaining symbols denote the dimensional expansion ratio, the convolution operation, and the activation function, respectively. The kernel sizes attached to these symbols are not reproduced in the source text.

[0077] According to one embodiment of this application, the target interaction intent features are obtained by the following formula:

[formula image not reproduced in the source text]

[0078] where Img2Seq denotes the function that maps image features to a sequence.
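
For orientation only, the three relations described in the preceding paragraphs could take the following form. This is an assumed reconstruction chosen to match the verbal symbol descriptions, not the formulas of the application: every symbol name, the 1x1 and k x k kernel sizes, and the placement of the activation function are assumptions.

```latex
% Assumed notation: Z is the feature overlap block sequence, h x w the
% resolution, d the channel dimension, \gamma the dimensional expansion ratio,
% * the convolution operation, \sigma the activation function.
\begin{align}
  Z^{r} &= \mathrm{Seq2Img}(Z), \qquad Z^{r} \in \mathbb{R}^{h \times w \times d} \\
  Y^{r} &= \sigma\bigl(\sigma(Z^{r} * W_{1}) * W_{d}\bigr) * W_{2} \\
  Y     &= \mathrm{Img2Seq}(Y^{r})
\end{align}
% Assumed kernel shapes:
%   W_1 \in \mathbb{R}^{d \times \gamma d \times 1 \times 1}   (pointwise, expands channels)
%   W_d \in \mathbb{R}^{\gamma d \times 1 \times k \times k}   (depthwise)
%   W_2 \in \mathbb{R}^{\gamma d \times d \times 1 \times 1}   (pointwise, projects back)
```

Under this reading, the first relation corresponds to the first transformation, the second to the aggregation of the convolution kernel features with the two-dimensional feature map, and the third to the second transformation.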

[0079] It should be noted that the above explanation of the embodiments and beneficial effects of the human-machine interaction method based on millimeter-wave radar also applies to the human-machine interaction device based on millimeter-wave radar in the embodiments of this application. To avoid redundancy, it is not repeated here.

[0080] In summary, the human-machine interaction device based on millimeter-wave radar according to the embodiments of this application first acquires user behavior information and a human motion feature matrix through a millimeter-wave radar via an acquisition module. Then, a first generation module generates a micro-Doppler feature map of the user based on the behavior information and the human motion feature matrix. Finally, a second generation module inputs the micro-Doppler feature map into an interaction recognition model to generate the user's interaction intent and control the vehicle based on the interaction intent. Therefore, by acquiring human motion data using millimeter-wave radar to generate a micro-Doppler feature map and combining it with an interaction recognition model to accurately identify the interaction intent for vehicle control, the real-time performance and reliability of autonomous driving human-machine interaction are improved.

[0081] Corresponding to the above embodiments, this application also proposes a vehicle.

[0082] Figure 5 is a structural schematic diagram of the vehicle according to an embodiment of this application. As shown in Figure 5, the vehicle 300 includes: a memory 310, a processor 320, and a computer program stored on the memory 310 and executable on the processor 320. The processor 320 executes the program to implement any of the above-described human-machine interaction methods based on millimeter-wave radar.

[0083] According to the embodiments of this application, when the processor executes the computer program, the vehicle implements any of the above-mentioned human-machine interaction methods based on millimeter-wave radar. The method collects human body movements through the millimeter-wave radar to generate a micro-Doppler feature map, and accurately identifies the interaction intent in combination with the interaction recognition model to control the vehicle, thereby improving the real-time performance and reliability of autonomous driving human-machine interaction.

[0084] Specifically, in the embodiments of this application, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this application, "multiple" means at least two, such as two, three, etc., unless otherwise explicitly specified.

[0085] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.

[0086] Although embodiments of this application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting this application. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of this application.

Claims

1. A human-machine interaction method based on millimeter-wave radar, characterized in that it comprises: acquiring user behavior information and a human motion feature matrix through a millimeter-wave radar; generating a micro-Doppler feature map of the user based on the behavior information and the human motion feature matrix; and inputting the micro-Doppler feature map into an interaction recognition model to generate the user's interaction intent, and controlling the vehicle according to the interaction intent.

2. The human-machine interaction method based on millimeter-wave radar according to claim 1, characterized in that the behavior information includes the user's gesture distance, gesture speed, and gesture angle, wherein generating the micro-Doppler feature map of the user based on the behavior information and the human motion feature matrix comprises: performing a fast Fourier transform on the behavior information to obtain the user's gesture information, wherein the gesture information includes the user's gesture distance, gesture speed, and gesture angle; performing Doppler moving target indication (MTI) processing on the human motion feature matrix to obtain the user's Doppler velocity information; and generating the micro-Doppler feature map of the user based on the Doppler velocity information, the gesture distance, the gesture speed, and the gesture angle.

3. The human-machine interaction method based on millimeter-wave radar according to claim 2, characterized in that generating the micro-Doppler feature map of the user based on the Doppler velocity information, the gesture distance, the gesture speed, and the gesture angle comprises: processing the gesture distance, the gesture speed, and the gesture angle based on the Doppler velocity information to obtain three-dimensional point cloud data corresponding to the gesture information; and mapping the three-dimensional point cloud data onto a two-dimensional plane to obtain the micro-Doppler feature map.

4. The human-machine interaction method based on millimeter-wave radar according to claim 2, characterized in that the interaction recognition model includes a first neural network and a second neural network, wherein inputting the micro-Doppler feature map into the interaction recognition model to generate the user's interaction intent comprises: performing zero-padding expansion processing on the micro-Doppler feature map using the first neural network to obtain a target feature map; modeling the local continuity information of the target feature map using the first neural network to obtain feature overlap blocks; performing a first transformation on the feature overlap blocks using the second neural network to obtain a two-dimensional feature map; determining a first convolutional layer and a second convolutional layer in the interaction recognition model using the second neural network, and obtaining an interaction intent feature map based on the first convolutional layer, the second convolutional layer, and the two-dimensional feature map; performing a second transformation on the interaction intent feature map using the second neural network to obtain target interaction intent features; and recognizing the target interaction intent features using the interaction recognition model to generate the user's interaction intent.

5. The human-machine interaction method based on millimeter-wave radar according to claim 4, characterized in that obtaining the interaction intent feature map based on the first convolutional layer, the second convolutional layer, and the two-dimensional feature map comprises: obtaining a first feature in the convolution kernel of the first convolutional layer and a second feature in the convolution kernel of the second convolutional layer, respectively; and aggregating the first feature, the second feature, and the two-dimensional feature map to obtain the interaction intent feature map.

6. The human-machine interaction method based on millimeter-wave radar according to claim 4, characterized in that the two-dimensional feature map is obtained using the following formula: [formula image not reproduced in the source text], where the two size symbols denote the resolution of the original image, the feature-space symbol denotes the space in which the feature map lies, and Seq2Img denotes the function that converts the feature overlap block sequence into a two-dimensional feature map.

7. The human-machine interaction method based on millimeter-wave radar according to claim 5, characterized in that the interaction intent feature map is obtained using the following formula: [formula image not reproduced in the source text], where the input is the two-dimensional feature map; two of the weight symbols denote convolution kernel features; a third weight symbol denotes the depthwise convolution kernel features; and the remaining symbols denote the dimensional expansion ratio, the convolution operation, and the activation function, respectively.

8. The human-machine interaction method based on millimeter-wave radar according to claim 4, characterized in that the target interaction intent features are obtained using the following formula: [formula image not reproduced in the source text], where Img2Seq denotes the function that maps image features to a sequence.

9. A human-machine interaction device based on millimeter-wave radar, characterized in that it comprises: an acquisition module, configured to acquire user behavior information and a human motion feature matrix through a millimeter-wave radar; a first generation module, configured to generate a micro-Doppler feature map of the user based on the behavior information and the human motion feature matrix; and a second generation module, configured to input the micro-Doppler feature map into an interaction recognition model to generate the user's interaction intent, and to control the vehicle according to the interaction intent.

10. A vehicle, characterized in that it comprises: a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor executing the program to implement the human-machine interaction method based on millimeter-wave radar as described in any one of claims 1-8.