A multi-level human-computer interaction intent recognition method for complex task scenarios
By combining hierarchical goal analysis, operation sequence diagrams, 1D-CNN, Bi-LSTM, and dynamic Bayesian networks, the method addresses the accuracy and interpretability problems of multi-level human-computer interaction intent recognition in complex task scenarios.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SOUTHEAST UNIV
- Filing Date
- 2024-08-27
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies struggle to accurately identify multi-level human-computer interaction intentions in complex task scenarios. In particular, changes in human, system, and environmental states during complex tasks limit the usability of intention recognition, and existing methods lack interpretability.
Task analysis is performed by combining hierarchical goal analysis and operation sequence diagrams. Low-level intent recognition is achieved by combining 1D-CNN and Bi-LSTM models, and high-level intent recognition is achieved with a dynamic Bayesian network. The models are trained on multi-level intent feature datasets so that intents are recognized layer by layer, upward from the bottom-level feature data.
It achieves multi-level intent recognition in complex task scenarios and improves recognition accuracy and interpretability. The framework is also general and flexible, since its component methods can be replaced.
Figure CN119150069B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of human-computer interaction intent recognition technology, and in particular to a multi-level human-computer interaction intent recognition method for complex task scenarios.
Background Technology
[0002] With the development of machine intelligence, human-machine interaction has gradually evolved into human-intelligent system interaction (Duric Z, Gray WD, Heishman R, et al. Integrating perceptual and cognitive modeling for adaptive and intelligent human-computer interaction[J]. Proceedings of the IEEE, 2002, 90(7):1272-1289.). This interaction relies on mutual understanding and trust; therefore, realizing the intelligent system's recognition of human intentions is a crucial foundation for intelligent human-computer interaction. Intention recognition is a technology that uses internal system sensors to perceive user behavior, system and environmental contextual information, thereby inferring human intentions or predicting their next action. Currently, intention recognition technology has been widely studied and applied in numerous human-computer interaction scenarios, including ubiquitous computing, human-machine collaboration, intelligent driving, and operator-centric complex systems.
[0003] Many real-world work scenarios involve complex task situations, during which there is close information interaction and mutual influence between people, systems, and the environment. Changes in the state of the person, the system, and the environment—that is, contextual information—can lead to changes in the person's current task intention. Current research on intention recognition in complex task scenarios often focuses on a single scale, limiting its usability. Observing a person's interactive action sequence over a certain period can identify simple interactive intentions, but it cannot characterize the complex task intentions reflected behind those intentions. Complex task intentions inferred from human hand and eye interactions lack interpretability and have a significant risk of misjudgment.
[0004] In fact, in human-computer interaction systems with complex task scenarios, human intentions exhibit a multi-level structure and strong contextual correlation. Contextual features include environmental situational context and task context. Environmental situational context represents machine system data, environmental data, and situational data closely related to the task context, and is the external factor causing the shift in human intention. Task context represents the task process performed by the human, and is related to the task procedure and the human's abilities and preferences. It can be considered the internal factor for the shift in human intention, providing a priori rules for inferring human intention. For multi-level intention recognition in complex tasks, both behavioral features and contextual features are indispensable, both providing the necessary information for intention recognition.
[0005] In existing technologies, task analysis methods suitable for complex tasks include Hierarchical Task Analysis (HTA), Goal-Directed Task Analysis (GDTA), Cognitive Task Analysis (CTA), and Hierarchical Goal Analysis (HGA). HTA (Stanton N A. Hierarchical task analysis: Developments, applications, and extensions[J]. Applied ergonomics, 2006, 37(1):55-79.) decomposes complex tasks into increasingly smaller sub-tasks using a hierarchical structure diagram, making it suitable for task hierarchy analysis across a wide range of tasks. GDTA (Nasser-dine A, A, Lapalme J. Does explicit categorization taxonomy facilitate performing goal-directed task analysis?[J]. IEEE Transactions on Human-Machine Systems, 2021, 51(3): 177-187.) is based on Endsley's three-level situation awareness theory: starting from the target task, it decomposes the goal into sub-goals and then identifies the situation awareness variables under each sub-goal. It focuses on the situation awareness needs of humans performing complex tasks. CTA (Lercel D, Andrews DH. Cognitive task analysis of unmanned aircraft system pilots[J]. The International Journal of Aerospace Psychology, 2021, 31(4): 319-342.) starts from the human cognitive level, decomposing cognitive tasks into low-level task units and analyzing the cognitive needs of each unit. Compared to HTA, HGA (Kobierski B. Hierarchical Goal Analysis and performance modelling for the control of multiple UAVs / UCAVs from an airborne platform[R]. DRDC Technical Report CR 2004-063(DRDC Toronto), 2004.), based on perceptual control theory, emphasizes the hierarchical structure of task objectives rather than breaking down the task itself. Furthermore, HGA can decompose task objectives without assigning operators, so the overall hierarchical structure of task objectives does not need to be modified even if operator roles change. In addition to the aforementioned top-down task analysis methods, some studies have used Operational Sequence Diagrams (OSDs) to conduct task analysis from a bottom-up perspective, outlining the operator's operational sequence throughout the complex task process.
[0006] The input features for multi-level intent recognition mainly come from two sources: human behavioral features and contextual features related to the task environment. Regarding behavioral features, existing research primarily focuses on extracting multimodal human behavioral features. Many researchers have inferred intent from human behavioral information such as eye information (Lukander K, Toivanen M, ... K. Inferring intent and action from gaze in naturalistic behavior: A review[J]. International Journal of Mobile Human Computer Interaction (IJMHCI), 2017, 9(4): 41-57.) and hand interaction actions (Wei D, Chen L, Zhao L, et al. A vision-based measure of environmental effects on inferring human intention during human robot interaction[J]. IEEE Sensors Journal, 2021, 22(5): 4246-4256.). In addition to hand and eye behavior, some researchers use physiological parameters such as electroencephalography (EEG) (Wenbo Huang FA, Changyuan Wang SB, Hongbo Jia TC. Ergonomics analysis based on intention inference[J]. Journal of Intelligent & Fuzzy Systems, 2021, 41(1): 1281-1296.), electromyography (EMG) (Feleke AG, Bi L, Fei W. EMG-based 3D hand motor intention prediction for information transfer from human to robot[J]. Sensors, 2021, 21(4): 1316.), and heart rate (Wang H, Pan T, Si H, et al. Research on influencing factor selection of pilot's intention[J]. International Journal of Aerospace Engineering, 2020, 2020(1): 4294538.) to assist in intention recognition. Regarding contextual features of the task environment, existing research mainly focuses on environmental situational context and task context. Environmental situational context represents environmental data and situational data that are closely related to the task context, and is an external factor that causes shifts in operator intent. Some researchers (Xia J, Chen M, Fang W. Air combat intention recognition with incomplete information based on decision tree and GRU network[J]. Entropy, 2023, 25(4): 671.) have incorporated machine system data as a type of situational context. For example, in recognizing pilot intentions, machine system information such as aircraft altitude, speed, pitch angle, and heading is also included in the recognition model as an important situational context feature. Task context represents the task process performed by the operator and is related to the task procedure and the person's abilities and preferences. It can be considered the internal factor behind shifts in human intention, providing a priori rules for inferring human intentions. Overall, for multi-level intention recognition in complex tasks, behavioral features and contextual features are both indispensable, since each provides information necessary for intention recognition.
[0007] Based on the interpretability of the model, existing intent recognition algorithms can be divided into four types: expert knowledge-based, deep learning-based, machine learning-based, and probability theory-based. Expert knowledge-based intent recognition methods typically employ expert-defined rules and template libraries, using methods such as maximum similarity and D-S evidence theory to infer the degree of matching between feature information and intent templates and thereby determine intent (Tang Z, Li S, Dou Y. Research on Pilot's Intention Reasoning Method Based on D–S Evidence Theory[C]//Man–Machine–Environment System Engineering: Proceedings of the 17th International Conference on MMESE 17. Springer Singapore, 2018: 23-31.). These methods are fully interpretable, but they require domain expert experience to be explicitly organized, abstracted, and described, which makes knowledge acquisition and representation difficult, and they lack the flexibility to adapt to changing environments. Deep learning-based intent recognition methods are purely driven by feature data and aim to train neural networks with intent classification capabilities. Because human intent is context-dependent, researchers widely use recurrent neural network (RNN) methods with temporal processing capabilities, including Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bi-directional Long Short-Term Memory (Bi-LSTM). With the improvement of general computing power, deep learning methods often achieve high intent recognition accuracy, but they are prone to overfitting, struggle to incorporate domain expert knowledge effectively, and lack interpretability. Machine learning-based intent recognition methods are another purely data-driven approach, using typical machine learning classifiers to classify intent. Common classifiers include Support Vector Machine (SVM) and Random Forest (RF). These methods are simpler and consume fewer computational resources than deep learning neural network models, but they often struggle with temporal data and large-scale datasets, and they similarly lack interpretability. Intent recognition methods based on probability theory use Bayesian theory to reason about uncertain intentions. Examples include Hidden Markov Models (HMMs) and Dynamic Bayesian Networks (DBNs). These methods are probabilistic graphical models that use Directed Acyclic Graphs (DAGs) to describe the causal relationships between variables. The network structure of these models can often be defined through expert modeling, and the network parameters can be estimated from data, thereby achieving intent recognition that combines expert knowledge with data-driven learning.
[0008] Accurately identifying multi-level human-computer interaction intentions in complex task scenarios remains an unsolved challenge.
Summary of the Invention
[0009] This application provides a multi-level human-computer interaction intent recognition method for complex task scenarios, the technical purpose of which is to accurately identify multi-level human-computer interaction intents in complex task scenarios.
[0010] The above-mentioned technical objective of this application is achieved through the following technical solution:
[0011] A multi-level human-computer interaction intent recognition method for complex task scenarios includes:
[0012] Step S1: Perform task objective analysis on complex tasks and extract feature categories of multi-level intentions;
[0013] Step S2: Conduct experiments in a simulation task environment and collect multi-level intent feature datasets of human behavior and contextual situation information according to feature categories;
[0014] Step S3: Train and test the intent recognition model using a multi-level intent feature dataset, and use the trained intent recognition model to identify intents from low to high levels layer by layer from the bottom feature data, thereby achieving multi-level human-computer interaction intent recognition.
[0015] Further, step S1 includes:
[0016] The hierarchical goal analysis decomposes the goals of complex tasks layer by layer, and correlates the hierarchical task goals with the intentions of people at multiple levels, thus obtaining the hierarchical goal analysis results. The operation sequence diagram is used to sort out the operation sequence of people in the process of complex tasks, and then the underlying features related to the multi-level intentions are sorted out based on the operation sequence, thus obtaining the operation sequence diagram analysis results.
[0017] Based on the hierarchical goal analysis results, the intentions of high-level tasks and the interaction intentions of low-level tasks are extracted;
[0018] Based on the hierarchical goal analysis results and the operation sequence diagram analysis results, the features relevant to high-level task intent and low-level task interaction intent are extracted to obtain the feature categories. These features include behavioral features and contextual features. The behavioral features include behavioral data, which includes hand behavior data, eye behavior data, and human physiological data. The contextual features include system data and situational data. The system data includes local system data, local motion data, and local alarm data. The situational data includes geographic situational data, spatiotemporal situational data, and environmental situational data.
[0019] Further, step S2 includes:
[0020] Complex task scenarios are built using a 3D engine;
[0021] Based on the feature categories, multi-level intent feature data in complex task scenarios are extracted to obtain a multi-level intent feature dataset.
[0022] Further, step S3 includes:
[0023] Step S31: Perform a first preprocessing on the multi-level intent feature dataset to obtain the first preprocessed data, and divide the first preprocessed data into a first training set and a first test set; train the low-level interactive intent recognition model using the first training set, and test the trained low-level interactive intent recognition model using the first test set, until the final low-level interactive intent recognition model is obtained.
[0024] Step S32: Perform a second preprocessing on the multi-level intent feature dataset to obtain the second preprocessed data, and divide the second preprocessed data into a second training set and a second test set; train the high-level task intent recognition model using the second training set, and test the trained high-level task intent recognition model using the second test set, until the final high-level task intent recognition model is obtained.
[0025] Step S33: Input the recognition result of the final low-level interaction intent recognition model into the final high-level task intent recognition model for further intent recognition, so as to realize multi-level human-computer interaction intent recognition.
[0026] Further, in step S31, the multi-level intent feature dataset undergoes a first preprocessing to obtain preprocessed data, including:
[0027] One-Hot encoding is used on the discrete data in the multi-level intent feature dataset to convert the categorical variables in the discrete data into binary variables, thus obtaining the first feature data;
[0028] The continuous data in the multi-level intent feature dataset is normalized to unify the dimensions of the continuous data, thus obtaining the second feature data.
[0029] A time sliding window is used to divide the first feature data and the second feature data into continuous subsequences of fixed size to obtain the first preprocessed data.
[0030] Further, in step S32, the multi-level intent feature dataset undergoes a second preprocessing to obtain preprocessed data, including:
[0031] The data in the multi-level intent feature dataset is discretized and divided into time windows to obtain the second preprocessed data.
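The second preprocessing described above can be illustrated with a minimal Python sketch. The source states only that the data is discretized and divided into time windows; the bin edges, the non-overlapping windows, and the choice of the most frequent state per window are assumptions made for illustration, and the names `discretize` and `window_mode` are hypothetical.

```python
from collections import Counter


def discretize(value, edges):
    """Return the index of the bin that `value` falls into.

    `edges` are ascending cut points (hypothetical values chosen by hand below).
    """
    return sum(value >= e for e in edges)


def window_mode(states, window):
    """Split a state sequence into non-overlapping windows (an assumption)
    and keep the most frequent state of each window."""
    out = []
    for start in range(0, len(states) - window + 1, window):
        chunk = states[start:start + window]
        out.append(Counter(chunk).most_common(1)[0][0])
    return out


# Hypothetical continuous feature sampled at a fixed frequency.
altitude_m = [120.0, 125.0, 131.0, 210.0, 215.0, 220.0, 330.0, 335.0, 340.0]
edges = [200.0, 300.0]  # hypothetical cut points: low / medium / high

states = [discretize(a, edges) for a in altitude_m]
evidence = window_mode(states, window=3)  # one discrete state per time window
```

Each entry of `evidence` could then serve as the observed state of one node in one time slice of the dynamic Bayesian network.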
[0032] Furthermore, the low-level interactive intent recognition model includes a 1D-CNN module and a Bi-LSTM module, and the high-level task intent recognition model includes a dynamic Bayesian network structure.
[0033] The beneficial effects of this application are as follows:
[0034] (1) This application proposes a multi-level intent recognition framework applicable to complex task scenarios. It combines task analysis methods with machine learning methods, making it both sensitive to human behavior and contextual information and interpretable. Its input data consists of data that can be readily captured by human and machine sensors, so data collection is practical and does not interfere with the human's work process. By integrating human behavior, environmental context, and task context features, it identifies multi-level human intents layer by layer from the bottom-level data. This gives the system a more comprehensive understanding of the true human intent and overcomes the limitation of existing methods that recognize intent at a single level and only from human behavioral characteristics.
[0035] (2) This method can guarantee the accuracy of multi-level intent recognition for more categories, while achieving interpretable modeling of multi-level intents. In the example scenario of a helicopter-UAV wildfire exploration mission, the method described in this application achieves an interaction intent recognition accuracy of 93.96% for 16 types of interaction intents and a task intent recognition accuracy of 97.12% for 5 types of task intents, ensuring the accuracy requirements of intent recognition. In addition, the "context-intent-behavior" dynamic Bayesian network in the method described in this application completes the interpretable modeling of multi-level intents, which is consistent with the intent generation mechanism in cognitive psychology theory, that is: when a person performs a complex task, he perceives the task situation context, generates an intent through cognitive processing in the brain, and the intent drives the person to perform a series of operations and ultimately drives the helicopter's state change.
[0036] (3) The multi-level intent recognition framework of the method described in this application has universality. Based on the basic framework of complex task context analysis, low-level feature data collection and multi-level intent recognition, the task analysis method and the specific algorithm model used for intent recognition are replaceable. In real-world applications, the corresponding method can be selected in combination with the characteristics of each level of intent to achieve the best recognition effect.
Attached Figure Description
[0037] Figure 1 is a framework diagram of a multi-level human-computer interaction intent recognition method for complex task scenarios in the embodiments of this application;
[0038] Figure 2 is a multi-level intent tree for an example of a helicopter-drone wildfire exploration mission in the embodiments of this application;
[0039] Figure 3 is a schematic diagram of multi-level intent recognition related features in the embodiments of this application;
[0040] Figure 4 is a framework diagram of a low-level interactive intent recognition model based on 1D-CNN+Bi-LSTM in the embodiments of this application;
[0041] Figure 5 is a framework diagram of a high-level task intent recognition model based on dynamic Bayesian networks in the embodiments of this application.
Detailed Implementation
[0042] The technical solution of this application will be described in detail below with reference to the accompanying drawings.
[0043] As shown in Figure 1, the multi-level human-computer interaction intent recognition method for complex task scenarios described in this application includes:
[0044] Step S1: Perform task objective analysis on complex tasks and extract feature categories of multi-level intentions.
[0045] Complex task context analysis is the first step in multi-level intent recognition in complex task human-computer interaction. Its purpose is to clarify the standard operating procedures for operators to perform complex tasks, and then analyze the multi-level intents and intent-related features of humans, providing a foundation for subsequent simulation task data acquisition and multi-level intent recognition models. Complex task context analysis includes three parts: complex task analysis, extraction of multi-level intents, and extraction of intent-related features.
[0046] To address the correlation between human intent and complex task objectives, this application employs a combination of Hierarchical Goal Analysis (HGA) and Operational Sequence Diagrams (OSDs) to conduct overall task analysis of complex tasks, parsing the task into a three-layer structure of "task intent - interaction intent - feature data." This application uses a helicopter-UAV wildfire exploration mission as an example scenario for a complex task, and the specific process is described below:
[0047] First, hierarchical goal analysis starts from the overall root goal of the complex system, which is "wildfire exploration" in this example, and decomposes the task goals layer by layer from top to bottom. The hierarchical task goals are correlated with the intentions of people at multiple levels, and controlled variables are added for each type of goal.
[0048] Next, an operation sequence diagram is used to organize the sequence of human operations throughout the task process, supplementing the task analysis details from a bottom-up perspective, and identifying the underlying features related to multi-level intents based on the human operation sequence. The operation sequence diagram organizes the logical sequence of the task along the timeline, describing the task status of each operator through symbols such as information transmission, information reception, operation, inspection, and decision.
[0049] Finally, human factors engineering experts and systems domain experts jointly compile the results of the hierarchical goal analysis and the operation sequence diagram analysis, and sort out the multi-level intents and the classification of intent-related features.
[0050] Multi-level intents are extracted from the hierarchical goal analysis results. This invention simplifies and decomposes human intent into two layers: high-level task intent and low-level task interaction intent. Taking a helicopter-drone wildfire exploration mission as an example of a complex task, the extracted multi-level intents are shown in Figure 2.
[0051] The intent-related features derived from the hierarchical goal analysis and operation sequence diagram analysis results can be broadly categorized into two types: human behavioral features and contextual features. Human behavioral features represent information related to hand, eye, and other interaction modalities during task execution. Contextual features represent human-machine system information and task environment information during complex tasks; this information is often presented through user interface display elements, and specifically includes system data and situational data. Taking a helicopter-UAV wildfire exploration mission as an example of a complex task scenario, the extracted intent-related features are shown in Figure 3. Human behavioral features include operator hand behavior data, eye behavior data, and physiological data during the task. Contextual features include system data such as local system data, local motion data, and local alarm data during the task, as well as situational data such as geographic situational information, spatiotemporal situational information, and environmental situational information.
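As one possible way to record the feature categories listed above for later data collection, the following Python sketch stores them as a nested dictionary. The category names are taken from the description; the dictionary layout, the identifier spellings, and the helper `leaf_paths` are illustrative assumptions rather than part of the method.

```python
# Feature categories from the description, stored as a nested dictionary.
# The layout and identifier spellings are illustrative assumptions.
FEATURE_CATEGORIES = {
    "behavioral": {
        "behavioral_data": ["hand_behavior", "eye_behavior", "physiological"],
    },
    "contextual": {
        "system_data": ["local_system", "local_motion", "local_alarm"],
        "situational_data": ["geographic", "spatiotemporal", "environmental"],
    },
}


def leaf_paths(tree, prefix=()):
    """Flatten the category tree into (type, group, feature) tuples."""
    if isinstance(tree, list):
        return [prefix + (leaf,) for leaf in tree]
    paths = []
    for key, subtree in tree.items():
        paths.extend(leaf_paths(subtree, prefix + (key,)))
    return paths


paths = leaf_paths(FEATURE_CATEGORIES)
for path in paths:
    print(" / ".join(path))
```

A flat list of this kind could be used, for example, to name the columns of the structured dataset collected in step S2.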
[0052] Step S2: Conduct experiments in a simulation task environment and collect multi-level intent feature datasets of human behavior and contextual situation information according to feature categories.
[0053] Low-level feature data collection forms the data foundation for multi-level intent recognition in complex task human-computer interaction. Its purpose is to reproduce the task scenario in a laboratory environment based on the aforementioned complex task analysis, and to gather participants to collect intent-related feature data from operators during task execution, establishing a multi-level intent feature dataset to provide data samples for subsequent algorithm research. Taking a helicopter-drone wildfire exploration mission as an example of a complex task, the low-level data collection process is as follows:
[0054] First, the complex task scenario of helicopter-drone wildfire exploration is built and reproduced using 3D engines such as Unity3D. Then, the engine's built-in C# scripts are used to collect human hand behavior data, helicopter system data, motion data and situational data at a fixed frequency. An external eye tracker is used to collect human eye behavior data at a fixed frequency.
[0055] Next, the preparation phase of the underlying feature data acquisition experiment begins. The experimenter first introduces the experimental procedure to the participants. Participants then repeat the training at least 3 times. The formal experiment begins only after a participant has independently completed the entire task process under the experimenter's supervision and verbally reported full mastery of the operation and task process.
[0056] Then, the formal experimental phase begins. The formal experiment consists of three parts: eye-tracking calibration, simulated flight tasks, and intent labeling. First, participants undergo eye-tracking calibration and start screen recording before the experiment. Next, participants conduct a helicopter-drone collaborative wildfire reconnaissance mission in a specific area. Participants are required to follow the basic task procedures, but the specific details and timing of task implementation, such as planning the drone reconnaissance area, selecting the drone, judging the reconnaissance results, and rescuing trapped personnel, are determined by the participants themselves. Finally, after completing the reconnaissance mission over the entire area, participants immediately review the screen recording and use key bindings to mark the start and end points of high-level task intents and low-level task interaction intents.
[0057] Finally, the experimental data is processed: the data of all subjects are pooled and, after data cleaning steps such as aligning timestamps, merging data, and filling in missing values, compiled into a structured dataset.
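The cleaning steps named above (aligning timestamps, merging data, filling missing values) and the interval-based intent labels can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the source does not specify the alignment rule, so forward filling onto the system-log timeline is assumed, and all field names, sample values, and intent labels are hypothetical.

```python
from bisect import bisect_right


def align(base_times, stream, default=None):
    """For each base timestamp, take the latest stream value recorded at or
    before it (forward fill); use `default` when nothing has been seen yet."""
    times = [t for t, _ in stream]
    out = []
    for t in base_times:
        i = bisect_right(times, t)
        out.append(stream[i - 1][1] if i else default)
    return out


def label_at(t, intervals, default="none"):
    """Return the intent label whose marked [start, end) interval contains t."""
    for start, end, label in intervals:
        if start <= t < end:
            return label
    return default


# Hypothetical logs: (timestamp in seconds, value).
system_log = [(0.0, 120.0), (0.1, 121.0), (0.2, 123.0), (0.3, 126.0)]
eye_log = [(0.05, "map"), (0.25, "uav_panel")]
# Hypothetical start/end marks from the labeling step: (start, end, label).
interaction_marks = [(0.0, 0.2, "plan_area"), (0.2, 0.4, "select_uav")]

base_times = [t for t, _ in system_log]
aoi = align(base_times, eye_log, default="unknown")
merged = [
    {"t": t, "altitude": alt, "aoi": a, "interaction_intent": label_at(t, interaction_marks)}
    for (t, alt), a in zip(system_log, aoi)
]
```

Each dictionary in `merged` corresponds to one row of the structured dataset, with behavior, context, and label columns sharing a single timeline.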
[0058] Step S3: Train and test the intent recognition model using a multi-level intent feature dataset, and use the trained intent recognition model to identify intents from low to high levels layer by layer from the bottom feature data, thereby achieving multi-level human-computer interaction intent recognition.
[0059] Specifically, step S3 includes:
[0060] Step S31: Perform a first preprocessing on the multi-level intent feature dataset to obtain the first preprocessed data, and divide the first preprocessed data into a first training set and a first test set; train the low-level interactive intent recognition model using the first training set, and test the trained low-level interactive intent recognition model using the first test set, until the final low-level interactive intent recognition model is obtained.
[0061] The low-level interactive intent recognition model employs a 1D-CNN+Bi-LSTM deep learning neural network structure to achieve the most accurate multi-class interactive intent classification results. Its framework diagram is shown in Figure 4.
[0062] First, preprocessing is performed on the input data. Discrete data in the structured multi-level intent feature dataset collected in the experiment is encoded using One-Hot encoding, while continuous data is normalized to improve the convergence speed and training efficiency of the neural network. One-Hot encoding transforms categorical variables into binary variables; each unique value of a given categorical variable is represented as a vector of 0s and 1s, with only one position set to 1 and the rest to 0. Normalization transforms feature variables of different dimensions and magnitudes to the same scale between 0 and 1. In this application, Min-Max Normalization is used to unify the dimensions of continuous variables. Finally, the preprocessed feature data is divided into fixed-size continuous subsequences using a time sliding window, so that each subsequence is a segment of continuous temporal information that characterizes the operator's intent.
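The three preprocessing steps above can be sketched in plain Python as follows. The function names, the sample values, the window size, and the stride are illustrative assumptions; the source does not state the window length or overlap used.

```python
def one_hot(values):
    """Encode a categorical column as 0/1 vectors with exactly one 1 per row."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [[1 if index[v] == i else 0 for i in range(len(categories))] for v in values]


def min_max(values):
    """Rescale a continuous column to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def sliding_windows(rows, size, stride=1):
    """Cut the time-ordered rows into fixed-size contiguous subsequences."""
    return [rows[i:i + size] for i in range(0, len(rows) - size + 1, stride)]


# Hypothetical columns sampled at a fixed frequency.
aoi = ["map", "uav_panel", "map", "alarm", "map"]  # discrete feature
speed = [10.0, 20.0, 30.0, 40.0, 50.0]             # continuous feature

rows = [encoded + [scaled] for encoded, scaled in zip(one_hot(aoi), min_max(speed))]
windows = sliding_windows(rows, size=3)  # window size 3 is an assumption
```

In practice the minimum and maximum would normally be computed on the training portion only and reused for the test portion, so that no test-set statistics enter training.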
[0063] Next, the structure of the low-level interactive intent recognition model was determined. To reduce the number of model parameters and the model complexity, the input data needs to be processed by convolution through a computationally efficient unidirectional 1D-CNN layer. To achieve intent recognition of temporal data and satisfy the bidirectional dependency between temporal data, a Bi-LSTM network is selected as the main part of the model. Based on the above analysis, the 1D-CNN module and the Bi-LSTM module are integrated into the low-level interactive intent recognition model, and the forward propagation process of the model is determined as follows: First, the feature data is input into a multi-layer 1D-CNN network, and local features of various feature variables in the time dimension are extracted through convolution operations, and gradient vanishing is avoided by using the ReLU non-linear activation function; then, it is input into a multi-layer Bi-LSTM network to capture the correlation between the time series of feature data and interactive intent from both forward and backward perspectives; finally, the input dimension is mapped to the output dimension through a linear layer, and the output data is normalized into a probability distribution of intent categories from 0 to 1 through a Softmax layer.
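The forward propagation described above can be written compactly as follows. The notation is introduced here for illustration and does not appear in the source: $X$ is one input window with $T$ time steps and $d$ features, $K$ is the number of convolution layers, $T'$ is the sequence length after convolution, and $C$ is the number of interaction intent categories. A single Bi-LSTM layer is shown for brevity, and feeding the last forward state and the first backward state to the linear layer is one common choice that the source does not specify.

```latex
\begin{aligned}
H^{(0)} &= X \in \mathbb{R}^{T \times d} \\
H^{(k)} &= \mathrm{ReLU}\!\left(\mathrm{Conv1D}_k\!\left(H^{(k-1)}\right)\right), \qquad k = 1, \dots, K \\
\overrightarrow{h}_t &= \mathrm{LSTM}_{\mathrm{fwd}}\!\left(H^{(K)}_t,\ \overrightarrow{h}_{t-1}\right), \qquad
\overleftarrow{h}_t = \mathrm{LSTM}_{\mathrm{bwd}}\!\left(H^{(K)}_t,\ \overleftarrow{h}_{t+1}\right), \qquad t = 1, \dots, T' \\
z &= W \left[\, \overrightarrow{h}_{T'} \,;\, \overleftarrow{h}_{1} \,\right] + b \in \mathbb{R}^{C} \\
p(c \mid X) &= \frac{\exp(z_c)}{\sum_{j=1}^{C} \exp(z_j)}, \qquad c = 1, \dots, C
\end{aligned}
```

The last line is the Softmax layer: every $p(c \mid X)$ lies between 0 and 1 and the values sum to 1 over the $C$ categories, and the recognized interaction intent is the category with the largest probability.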
[0064] Finally, the low-level interactive intent recognition model is trained and validated. The basic process is as follows: First, the structured data undergoes the preprocessing described above to obtain time series of feature data. 80% of the time series is used as the training set, and 20% as the test set. Then, the model is trained using the training set. Finally, the test set is input into the trained model to evaluate its performance.
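The 80/20 division and the evaluation step can be sketched as below. The application does not state whether the split is chronological or random; this sketch assumes a chronological split that preserves the order of the windows, and uses accuracy as a hypothetical evaluation metric.

```python
from typing import List, Sequence, Tuple, TypeVar

S = TypeVar("S")


def split_train_test(samples: Sequence[S],
                     train_fraction: float = 0.8) -> Tuple[List[S], List[S]]:
    """Keep the first 80% of the time-ordered samples for training, the rest for testing."""
    cut = int(len(samples) * train_fraction)
    return list(samples[:cut]), list(samples[cut:])


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of test samples whose predicted intent matches the labelled intent."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical stand-ins for ten preprocessed windows.
window_ids = list(range(10))
train_ids, test_ids = split_train_test(window_ids)
```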
[0065] Step S32: Perform a second preprocessing on the multi-level intent feature dataset to obtain the second preprocessed data, and divide the second preprocessed data into a second training set and a second test set; train the high-level task intent recognition model using the second training set, and test the trained high-level task intent recognition model using the second test set, until the final high-level task intent recognition model is obtained.
[0066] The high-level task intent recognition model employs a dynamic Bayesian network structure. This application establishes a "context-intent-behavior" causal model to represent the causal relationships between human intent and both context and behavior, thereby achieving interpretable modeling of intent. Furthermore, this application inputs the recognition results of the low-level interactive intent recognition model into the dynamic Bayesian network to enhance the overall accuracy of task intent recognition. The implementation approach is shown in Figure 5.
[0067] First, the network structure of the dynamic Bayesian network is determined. This application models intent based on the influence relationships among context, intent, and behavior. The Bayesian network within a single time slice mainly includes three types of nodes: context, intent, and behavior. Context nodes include UAV information-related states, local information-related states, task information-related states, and so on; the attribute values of these nodes are obtained directly through observation. Intent nodes include task intent and interaction intent. Task intent is defined as an implicit (hidden) node, while interaction intent is driven by the recognition results of the low-level interaction intent recognition model. Behavior nodes include human hand behavior and eye behavior, specifically key information from HOTAS operation and area-of-interest (AOI) information of eye gaze. These nodes are also defined as observable nodes. In addition, in the embodiments of this application, it is assumed that all nodes satisfy the first-order hidden Markov condition.
[0068] Next, the high-level task intent recognition model is trained and validated in three steps: data processing, parameter learning, and intent inference. First, because the dynamic Bayesian network (DBN), due to technical limitations, only supports discrete inputs, the structured dataset is discretized and divided into time windows, with 80% of the time-series dataset used as the training set and the remaining 20% as the test set. Then, the feature data from the training set are assigned to the corresponding DBN nodes, and parameter learning is performed within the specified time window to calculate the conditional probability distribution of each node from the training set. Finally, the temporal feature states are input to the observable nodes, such as the context nodes and behavior nodes, and the low-level interactive intent recognition results are input to the interaction intent nodes, thereby inferring the task intent at the current moment and evaluating the accuracy of task intent recognition.
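The three steps of discretization, parameter learning, and intent inference can be sketched as follows. The edge structure is an assumption made for illustration, not the network of Figure 5: task intent depends on the previous task intent and the current context, while interaction intent and behavior each depend on the current task intent. The state names, the toy training sequence, the threshold in `discretize`, and the add-one smoothing are likewise hypothetical.

```python
from collections import Counter, defaultdict


def discretize(value, edges):
    """Map a continuous value to the index of the bin it falls into."""
    return sum(value >= e for e in edges)


def learn_cpt(pairs, child_states, alpha=1.0):
    """Estimate P(child | parent) by counting, with add-alpha smoothing."""
    counts = defaultdict(Counter)
    for parent, child in pairs:
        counts[parent][child] += 1
    cpt = {}
    for parent, c in counts.items():
        total = sum(c.values()) + alpha * len(child_states)
        cpt[parent] = {s: (c[s] + alpha) / total for s in child_states}
    return cpt


TASKS = ["search", "attack"]
INTERACTIONS = ["scan", "zoom", "lock", "fire"]
BEHAVIORS = ["gaze_map", "press_key"]

# Toy labelled sequence: (task intent, context bin, interaction intent, behavior).
train = [
    ("search", 0, "scan", "gaze_map"), ("search", 0, "scan", "gaze_map"),
    ("search", 1, "zoom", "gaze_map"), ("attack", 1, "lock", "press_key"),
    ("attack", 1, "lock", "press_key"), ("attack", 1, "fire", "press_key"),
    ("search", 0, "scan", "gaze_map"), ("search", 0, "scan", "gaze_map"),
]

# Parameter learning: one conditional probability table per assumed edge.
transition = learn_cpt(
    [((prev[0], cur[1]), cur[0]) for prev, cur in zip(train, train[1:])], TASKS)
interaction = learn_cpt([(r[0], r[2]) for r in train], INTERACTIONS)
behavior = learn_cpt([(r[0], r[3]) for r in train], BEHAVIORS)


def filter_step(belief, context, inter, behav):
    """One forward-filtering update of the belief over the hidden task intent."""
    uniform = {s: 1.0 / len(TASKS) for s in TASKS}  # fallback for unseen parents
    posterior = {}
    for s in TASKS:
        prior = sum(belief[p] * transition.get((p, context), uniform)[s]
                    for p in TASKS)
        posterior[s] = prior * interaction[s][inter] * behavior[s][behav]
    z = sum(posterior.values())
    return {s: v / z for s, v in posterior.items()}


# Inference: the interaction intent is the label output by the low-level model.
belief = {s: 1.0 / len(TASKS) for s in TASKS}
evidence = [(0.7, "lock", "press_key"), (0.9, "fire", "press_key")]
for raw_context, inter, behav in evidence:
    belief = filter_step(belief, discretize(raw_context, [0.5]), inter, behav)
predicted = max(belief, key=belief.get)
```

Because every factor in the update is an explicit conditional probability table, the contribution of each context, interaction, or behavior observation to the inferred task intent can be inspected directly, which is the kind of interpretability the causal model is intended to provide.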
[0069] Step S33: Input the recognition result of the final low-level interaction intent recognition model into the final high-level task intent recognition model for further intent recognition, so as to realize multi-level human-computer interaction intent recognition.
[0070] The above are exemplary embodiments of this application, and the scope of protection of this application is defined by the claims and their equivalents.
Claims
1. A multi-level human-computer interaction intent recognition method for complex task scenarios, characterized in that it comprises:
Step S1: Perform task objective analysis on complex tasks and extract feature categories of multi-level intentions;
Step S2: Conduct experiments in a simulation task environment and collect multi-level intent feature datasets of human behavior and contextual situation information according to the feature categories;
Step S3: Train and test the intent recognition model using the multi-level intent feature dataset, and use the trained intent recognition model to identify intents layer by layer, from low level to high level, starting from the bottom feature data, thereby achieving multi-level human-computer interaction intent recognition.
Step S1 includes:
Hierarchical goal analysis is used to decompose the goals of complex tasks layer by layer and to correlate the hierarchical task goals with people's intentions at multiple levels, thus obtaining the hierarchical goal analysis results;
An operation sequence diagram is used to sort out the sequence of human operations during complex tasks, and the underlying features related to the multi-level intentions are then sorted out based on that operation sequence, thus obtaining the operation sequence diagram analysis results;
Based on the hierarchical goal analysis results, the high-level task intentions and the low-level interaction intentions are extracted;
Based on the hierarchical goal analysis results and the operation sequence diagram analysis results, relevant features of the high-level task intentions and the low-level interaction intentions are extracted to obtain the feature categories.
The relevant features of the high-level task intentions and the low-level interaction intentions include behavioral features and contextual features. Behavioral features include behavioral data, which includes hand behavior data, eye behavior data, and human physiological data. Contextual features include system data and situational data. System data includes local system data, local motion data, and local alarm data. Situational data includes geographic situational data, spatiotemporal situational data, and environmental situational data.
Step S3 includes:
Step S31: Perform a first preprocessing on the multi-level intent feature dataset to obtain the first preprocessed data, and divide the first preprocessed data into a first training set and a first test set; train the low-level interactive intent recognition model using the first training set, and test the trained low-level interactive intent recognition model using the first test set, until the final low-level interactive intent recognition model is obtained.
Step S32: Perform a second preprocessing on the multi-level intent feature dataset to obtain the second preprocessed data, and divide the second preprocessed data into a second training set and a second test set; train the high-level task intent recognition model using the second training set, and test the trained high-level task intent recognition model using the second test set, until the final high-level task intent recognition model is obtained.
Step S33: Input the recognition result of the final low-level interactive intent recognition model into the final high-level task intent recognition model for further intent recognition, so as to realize multi-level human-computer interaction intent recognition.
2. The multi-level human-computer interaction intent recognition method as described in claim 1, characterized in that step S2 includes: building the complex task scenarios using a 3D engine; and, based on the feature categories, extracting multi-level intent feature data in the complex task scenarios to obtain the multi-level intent feature dataset.
3. The multi-level human-computer interaction intent recognition method as described in claim 2, characterized in that, in step S31, performing the first preprocessing on the multi-level intent feature dataset to obtain the first preprocessed data includes: applying One-Hot encoding to the discrete data in the multi-level intent feature dataset to convert the categorical variables in the discrete data into binary variables, thus obtaining the first feature data; normalizing the continuous data in the multi-level intent feature dataset to unify the dimensions of the continuous data, thus obtaining the second feature data; and using a time sliding window to divide the first feature data and the second feature data into continuous subsequences of fixed size to obtain the first preprocessed data.
4. The multi-level human-computer interaction intent recognition method as described in claim 3, characterized in that, in step S32, performing the second preprocessing on the multi-level intent feature dataset to obtain the second preprocessed data includes: discretizing the data in the multi-level intent feature dataset and dividing it into time windows to obtain the second preprocessed data.
5. The multi-level human-computer interaction intent recognition method as described in claim 4, characterized in that the low-level interactive intent recognition model includes a 1D-CNN module and a Bi-LSTM module, while the high-level task intent recognition model includes a dynamic Bayesian network structure.