Safe flying agent system for executing complex language instructions in high dynamic environments

By employing a high-low layer structure and security function modules in the embodied intelligence system, the problems of task latency and security in highly dynamic environments are solved, enabling the intelligent agent to perform tasks efficiently and securely.

CN119992387B · Active · Publication Date: 2026-06-16 · SHANGHAI JIAOTONG UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SHANGHAI JIAOTONG UNIV
Filing Date: 2025-01-22
Publication Date: 2026-06-16

AI Technical Summary

Technical Problem

Existing embodied intelligence systems suffer from latency issues when executing complex user commands in highly dynamic environments, and lack protection for environmental security and privacy.

Method used

The system employs a high-low layer structure for perception, task management, and planning modules, combined with a high-performance large model and a low-frame-rate perception module, and integrates a security function module for privacy and collision detection, ensuring that the intelligent agent can perform tasks efficiently in highly dynamic environments.

Benefits of technology

It achieves efficient task execution in highly dynamic environments, ensures security and privacy protection within the environment, and avoids collisions and potential risks.

Generated by Eureka AI based on patent content.

Figure CN119992387B_ABST

Abstract

The application provides a safe flight intelligent agent system for executing complex language instructions in a high dynamic environment. The system includes a perception module based on a visual language model, a safety module, a planning module based on a large language model, and a UAV control interface. In the system running phase, the intelligent agent accepts commands from the user, the planning module splits the task and calls the corresponding task memory and code generator to generate the corresponding executable code, calls the perception interface to obtain the task-related perception information, and finally executes the corresponding code to complete the user's instruction after checking by the safety check module. The system combines high-performance low-frame-rate large models in each module, high-frame-rate traditional methods, and safety detection modules related to privacy, potential risks, and collisions to achieve safe and efficient task execution in a high dynamic environment.

Description

Technical Field

[0001] This invention relates to the field of embodied intelligent agents and aircraft systems, and more specifically, to a safe flight intelligent agent system that executes complex language commands in a highly dynamic environment.

Background Technology

[0002] Embodied intelligence refers to the intelligent behavior exhibited by intelligent entities through their physical form (such as robots, virtual avatars, etc.) interacting with their environment. This intelligence not only originates from computation and algorithms but also encompasses multi-dimensional capabilities such as perception, movement, and environmental interaction. Embodied intelligence particularly emphasizes the following core elements:

Interaction between body and environment: Intelligent entities interact with and respond to their surroundings in real time through their sensory and motor systems. For example, robots use cameras, sensors, and other devices to perceive their environment and perform actions through motors, joints, etc.

Combination of perception and action: Embodied intelligent entities transform perceptual information (including vision, hearing, touch, etc.) into concrete actions, thereby completing complex tasks. This fusion of perception and action is a key foundation for achieving intelligent behavior.

Adaptability and learning ability: Embodied intelligent entities can adapt to changes in their environment and continuously optimize their behavior and decision-making through learning and experience. For example, robots can improve their path planning and task execution through repeated trials.

[0003] Embodied intelligent entities operate in the physical world, therefore physical constraints and resource limitations must be considered, such as energy consumption, computing power, and mechanical structure. The research goal of embodied intelligence is to develop intelligent entities that can autonomously, flexibly, and efficiently perform tasks in physical environments, which has broad application prospects in fields such as robotics, autonomous driving, virtual reality, and augmented reality.

[0004] While some methods and systems for embodied intelligence based on aircraft have been proposed, the real-time limitations of large models mean that the ability of intelligent agents to execute complex user commands in highly dynamic environments remains severely lacking. Furthermore, current embodied intelligence systems rarely consider safety assurance in interactive environments.

[0005] Currently, the long inference time and communication speed limitations of large models severely restrict the capabilities of embodied intelligence in highly dynamic scenarios. Due to the massive number of parameters in high-performance models, inference time is difficult to control effectively on existing hardware platforms. For aircraft, lightweight edge computing platforms are insufficient for deploying models with so many parameters, so the models typically require remote cloud deployment and communication with the aircraft. However, perception tasks require transmitting high-resolution images to the model for computation; the model's inference time, coupled with communication latency, further increases system latency. High latency drastically reduces the agent's task performance. Ensuring aircraft performance in highly dynamic environments is a critical but poorly addressed issue. Furthermore, because the aircraft is a highly maneuverable agent, ensuring the privacy and physical security of objects in the environment during task execution is an important but under-considered problem.

Summary of the Invention

[0006] To address the shortcomings of existing technologies, the purpose of this invention is to provide a safe flight intelligent agent system that executes complex language commands in highly dynamic environments.

[0007] According to one aspect of the present invention, a safe flight intelligent agent system for executing complex language commands in a highly dynamic environment is provided, comprising:

[0008] The perception function module adopts a high-low layer structure. The high-level structure is a detection visual language model installed on a cloud platform, and the low-level structure is a perception module installed on the flight agent. The detection visual language model combines the information from the perception module and the observation signals collected by the flight agent to perform open set detection. The perception module performs closed set detection based on the observation signals and the information from the detection visual language model.

[0009] The task manager, installed on a cloud platform, includes a task decomposer, a state manager, and a memory module. The task decomposer receives user instructions, performs semantic understanding, and splits tasks. The state manager manages task status, and the memory module stores process variables, task completion status, and user interaction information.

[0010] The planning function module adopts a high-low layer structure. The high-level structure is a large language model planner installed on the cloud platform, and the low-level structure is a planning module installed on the flight intelligent agent. The large language model planner generates execution code based on the task decomposition results of the task manager, the information of the detection visual language model, and the information of the planning module. The planning module obtains the planned path based on the task decomposition results of the task manager, the information of the perception module, and the execution code of the large language model planner.

[0011] Preferably, the task decomposer receives commands from the user and calls the task memory in the memory module to split the task, specifically as follows:

[0012] $T_t = f_{\text{decomposition}}(I_t, M) = \{i_{1,t}, i_{2,t}, \ldots, i_{n,t}\}$

[0013] where $I_t$ is the user instruction received by the agent at time t, which serves as the instruction input to the task decomposer; $M$ is the memory generated by the agent during operation, including historical perception results and historical task completion status; $f_{\text{decomposition}}$ is the task decomposer model function of the agent; $i_{n,t}$ is the n-th executable subtask into which the user instruction is broken down, the subtask types being search, follow, navigation, patrol, and direct movement of the drone; and $T_t$ is the set of subtasks into which the agent splits the user instruction at time t.
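The decomposition step can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Memory` and `Subtask` classes, the `<kind>: <argument>` line format, the string spellings of the subtask types, and the `fake_llm` stand-in for a real large language model call are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Subtask types named in the text; the string spellings are an assumption.
SUBTASK_TYPES = {"search", "follow", "navigation", "patrol", "direct_move"}


@dataclass
class Memory:
    """M: historical perception results and task completion status."""
    perception_history: List[str] = field(default_factory=list)
    task_history: List[str] = field(default_factory=list)


@dataclass
class Subtask:
    kind: str       # one of SUBTASK_TYPES
    argument: str   # free-text target, e.g. "red car"


def decompose(instruction: str, memory: Memory,
              llm: Callable[[str], str]) -> List[Subtask]:
    """T_t = f_decomposition(I_t, M): split an instruction into subtasks."""
    prompt = (
        "Split the instruction into lines of the form '<kind>: <argument>'.\n"
        f"Allowed kinds: {sorted(SUBTASK_TYPES)}\n"
        f"Completed tasks: {memory.task_history}\n"
        f"Instruction: {instruction}"
    )
    subtasks = []
    for line in llm(prompt).splitlines():
        kind, sep, argument = line.partition(":")
        kind = kind.strip().lower()
        if sep and kind in SUBTASK_TYPES:   # drop anything off-schema
            subtasks.append(Subtask(kind, argument.strip()))
    return subtasks


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real large language model call."""
    return "search: red car\nfollow: red car\nhover: nowhere"


tasks = decompose("Find the red car and follow it", Memory(), fake_llm)
```

Filtering the model output against the fixed set of subtask types keeps anything outside the five executable categories from reaching the planner.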

[0014] Preferably, the large language model planner generates execution code based on the task decomposition results of the task manager, the information of the detection visual language model, and the information of the planning module, specifically as follows:

[0015] $C_{n,t} = f_{\text{planner}}(i_{n,t}, P, L, M)$

[0016] $S_{n,t} = f_{\text{safe\_execute}}(C_{n,t})$

[0017] where $i_{n,t}$ is the n-th executable subtask of the agent at time t; $f_{\text{planner}}$ is a code generator for executable subtasks based on a large language model; $P$ is the prompt corresponding to the large language model code generator for the executable task; and $L$ is the set of callable control and perception interfaces in the code generator, including the visual language model interface and the related interfaces for UAV control. The input of the visual language model interface is the information detected by the visual language model and the target of the planning module, and its output is the target position provided to the large language model planner. The input of the related interfaces for UAV control is the target point of the different tasks of the planning module or the ID of the object to be tracked, and their output is the execution code of the intelligent aircraft. $C_{n,t}$ is the executable Python code generated for the n-th subtask at time t;

[0018] and $f_{\text{safe\_execute}}$ is a safe execution module for Python code. It executes the code of a subtask and returns a Boolean value $S_{n,t}$. When $S_{n,t}$ is True, the code has no syntax problems and can be executed; otherwise, the generated code has syntax problems and needs to be rewritten.
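A minimal Python sketch of such a safe execution step follows. The text only specifies a Boolean result tied to syntax problems; treating runtime exceptions as failures too, exposing only a whitelisted interface dictionary, and the `goto` interface itself are assumptions for illustration. Emptying `__builtins__` is not a real sandbox and is shown only to make the interface set L explicit.

```python
import ast


def safe_execute(code: str, interfaces: dict) -> bool:
    """S_{n,t} = f_safe_execute(C_{n,t}): True if the code parsed and ran."""
    try:
        ast.parse(code)            # syntax check before anything runs
    except SyntaxError:
        return False               # caller should ask the planner to rewrite
    # Expose only the callable interfaces L; no builtins (assumption).
    namespace = {"__builtins__": {}, **interfaces}
    try:
        exec(compile(code, "<subtask>", "exec"), namespace)
    except Exception:
        return False               # runtime failure also counts (assumption)
    return True


calls = []
interfaces = {"goto": lambda x, y, z: calls.append((x, y, z))}  # hypothetical
ok = safe_execute("goto(1.0, 2.0, 3.0)", interfaces)
bad = safe_execute("goto(1.0, 2.0", interfaces)   # unbalanced parenthesis
```

Here `ok` is True and the call is recorded in `calls`, while `bad` is False because the second snippet does not parse.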

[0019] Preferably, the planning module includes:

[0020] The obstacle avoidance planner takes the moving target point and the information from the perception module as input, and outputs the obstacle avoidance path planned in real time.

[0021] The tracking planner takes the ID of the object to be tracked and the information from the perception module as input, and outputs a path that can avoid obstacles in real time and follow the target object.

[0022] Preferably, it also includes a safety function module, which adopts a high-low layer structure. The high layer structure is a safety visual language model installed on a cloud platform, and the low layer structure is a collision detector installed on the flying agent. The safety visual language model performs privacy and security detection and potential security detection based on the observation signals collected by the flying agent. The collision detector performs collision detection based on the observation signals collected by the flying agent and the planned path of the planning function module.

[0023] Preferably, the privacy and security detection specifically includes:

[0024] $pl_t, pr_t = f_{\text{privacy\_check}}(O_t)$

[0025] where $f_{\text{privacy\_check}}$ is a privacy detection module based on a fine-tuned visual language model;

[0026] $O_t$ is the observation signals collected by the flying intelligent agent at time t together with the state of the UAV; and $pl_t$ and $pr_t$ are the privacy level and the reason for the privacy score.

[0027] The potential security detection specifically includes:

[0028] $rl_t, rr_t = f_{\text{risk\_check}}(O_t)$

[0029] where $f_{\text{risk\_check}}$ is a potential risk detection module, and $rl_t$ and $rr_t$ are the potential risk score and the reason for the potential risk evaluation.

[0030] Preferably, the collision detection specifically includes:

[0031] $Co_t = f_{\text{collision\_check}}(O_t, p_t)$

[0032] where $f_{\text{collision\_check}}$ is a collision detection module whose inputs are the observation signal $O_t$ at time t and the path planning information $p_t$ at time t. The collision detection function uses the Kalman filter method to predict the positions of dynamic objects in the environment and performs collision detection against the planned trajectory.
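The idea of predicting a dynamic object with a Kalman filter and testing the prediction against the planned trajectory can be sketched in plain Python. The source does not give the motion model or any parameters, so the constant-velocity model, the per-axis filters, the noise values `q` and `r`, the 2D waypoints, and the `safe_dist` threshold below are all assumptions.

```python
import math


class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate axis."""

    def __init__(self, pos, q=0.01, r=0.05):
        self.x = [pos, 0.0]                 # state: position, velocity
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.q, self.r = q, r               # process / measurement noise

    def predict(self, dt):
        p, v = self.x
        self.x = [p + v * dt, v]
        (a, b), (c, d) = self.P
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r                      # innovation variance, H = [1, 0]
        k0, k1 = a / s, c / s               # Kalman gain
        y = z - self.x[0]                   # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


def collision_check(track, path, dt, safe_dist=1.0):
    """Co_t: True if the predicted object comes within safe_dist of the path.

    track: observed (x, y) positions of one dynamic object, dt apart.
    path:  planned (x, y) UAV waypoints for the next steps, dt apart.
    """
    fx, fy = AxisKalman(track[0][0]), AxisKalman(track[0][1])
    for x, y in track[1:]:
        for f, z in ((fx, x), (fy, y)):
            f.predict(dt)
            f.update(z)
    for wx, wy in path:
        fx.predict(dt)                      # roll the object forward one step
        fy.predict(dt)
        if math.hypot(fx.x[0] - wx, fy.x[0] - wy) < safe_dist:
            return True
    return False


# Object moving along +x at about 1 m/s, observed every 0.1 s.
track = [(0.1 * k, 0.0) for k in range(30)]
hit = collision_check(track, [(4.0, 0.0)] * 20, dt=0.1)    # UAV in its way
clear = collision_check(track, [(4.0, 5.0)] * 20, dt=0.1)  # UAV far aside
```

Each planned waypoint is compared with the object's predicted position at the same future time step, so a path that crosses the object's current position but not its future one is not flagged.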

[0033] Preferably, the security function module performs security assessments and security processing based on privacy detection results, potential security detection results, and collision detection results.

[0034] Preferably, the security assessment specifically includes:

[0035] $pl_t$ and $rl_t$ are integers from 1 to 4; $pr_t$ and $rr_t$ are the reasons for the ratings, in string format;

[0036] 1 indicates no security threat, 2 indicates a minor security threat but within a controllable range, 3 indicates a moderate security threat requiring the user to be informed to terminate the task, and 4 indicates a serious security threat.

[0037] $Co_t$ is a Boolean value representing the collision detection result at time t. When it is True, a collision risk has been detected for the UAV based on the planned path $p_t$ and the observed signal $O_t$ at that time; otherwise, there is no collision risk.

[0038] Preferably, the security process includes:

[0039] When the privacy and potential risk assessment is 3, the security function module directly sends a warning message to the task manager, which then sends it to the user and terminates the task.

[0040] When the risk assessment is 4, the safety processing module controls the flight agent to terminate the mission and return to the previous safe point.

[0041] For collision detection, when the safety function module detects a collision, the collision detector directly notifies the flight agent to immediately perform emergency braking to avoid the collision, and at the same time communicates with the task manager to inform the user.
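The handling rules above amount to a small decision table, sketched here in Python. The `Action` names are hypothetical, and two points are assumptions the text does not settle: the privacy and risk levels are combined by taking the worse of the two, and a detected collision takes precedence over both.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    WARN_USER_AND_TERMINATE = "warn_user_and_terminate"
    RETURN_TO_SAFE_POINT = "return_to_safe_point"
    EMERGENCY_BRAKE = "emergency_brake"


def safety_action(pl: int, rl: int, co: bool) -> Action:
    """Map privacy level pl, risk level rl (1 to 4) and collision flag co."""
    if not (1 <= pl <= 4 and 1 <= rl <= 4):
        raise ValueError("levels must be integers from 1 to 4")
    if co:                        # collision handling comes first (assumption)
        return Action.EMERGENCY_BRAKE
    worst = max(pl, rl)           # worse of the two levels (assumption)
    if worst == 4:
        return Action.RETURN_TO_SAFE_POINT
    if worst == 3:
        return Action.WARN_USER_AND_TERMINATE
    return Action.CONTINUE        # levels 1 and 2 do not interrupt the task


decision = safety_action(pl=2, rl=3, co=False)
```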

[0042] Compared with the prior art, the embodiments of the present invention have at least one of the following beneficial effects:

[0043] The safe flight intelligent agent system for executing complex language commands in a highly dynamic environment, according to embodiments of the present invention, combines a high-performance, low-frame-rate large model in each module with high-frame-rate perception and planning modules, and can ensure efficient task execution in a highly dynamic environment.

[0044] The safe flight intelligent agent system for executing complex language commands in a highly dynamic environment according to this invention addresses the high latency issue by designing a high-low layer structure for each functional module. The higher layer ensures the performance of the intelligent agent, while the lower layer ensures the rapid response of the intelligent agent to the environment.

[0045] The safe flight intelligent agent system for executing complex language commands in a highly dynamic environment according to embodiments of the present invention integrates a large model and a memory module in the task manager and planning function module, respectively, to achieve better environmental understanding and task planning performance.

[0046] The safe flight intelligent agent system for executing complex language commands in a highly dynamic environment according to embodiments of the present invention addresses the safety issues of aircraft by designing a safety function module that integrates a large visual language model to assess the current safety of the UAV, thereby ensuring the safety of people and property in the environment.

Attached Figure Description

[0047] Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:

[0048] Figure 1 is an intelligent agent framework based on a large language model and a visual language model, according to an embodiment of the present invention;

[0049] Figure 2 is a planning module framework based on a large language model, according to an embodiment of the present invention;

[0050] Figure 3 is a security function module framework based on a visual language model, according to an embodiment of the present invention.

Detailed Implementation

[0051] The present invention will now be described in detail with reference to specific embodiments. These embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention. These all fall within the scope of protection of the present invention.

[0052] In one embodiment of the present invention, a safe flight intelligent agent system for executing complex language commands in a highly dynamic environment is provided. As shown in Figure 1, it mainly includes:

[0053] The perception module adopts a high-low structure. The high-level structure is the detection visual language model installed on the cloud platform, and the low-level structure is the perception module installed on the flight agent. The detection visual language model combines the information from the perception module and the observation signals collected by the flight agent to perform open set detection. The perception module performs closed set detection based on the observation signals and the information from the detection visual language model.

[0054] The task manager, installed on the cloud platform, includes a task decomposer, a state manager, and a memory module. The task decomposer receives user commands and performs semantic understanding and task splitting; the state manager manages task status; and the memory module stores process variables, task completion status, and user interaction information.

[0055] The planning function module adopts a high-low structure. The high-level structure is the large language model planner installed on the cloud platform, and the low-level structure is the planning module installed on the flight intelligent agent. The large language model planner generates execution code based on the task decomposition results of the task manager, the information of the detection visual language model, and the information of the planning module. The planning module obtains the planning path based on the task decomposition results of the task manager, the information of the perception module, and the execution code of the large language model planner.

[0056] The above embodiments, by combining the high-performance, low-frame-rate large model in each module with the high-frame-rate perception and planning modules, can execute tasks efficiently in highly dynamic environments.

[0057] The observation signal is acquired through a camera device installed on the flying intelligent agent. To further perceive the information content of this observation signal, a preferred embodiment of the invention provides a preferred perception module, in which the detection visual language model and the perception module are in a high-low hierarchy. The detection visual language model provides high-level open-set detection, while the perception module provides low-level closed-set detection of the 3D position information of objects. They interact to obtain real-time open-set detection results. The perception module's input is the image (observation signal) and the UAV's position information; its output is the closed-set 3D position information and category of pedestrians, bicycles, motorcycles, and vehicles.
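One way to picture the interaction between the two layers is a cache: the on-board detector produces closed-set detections every frame, while open-set labels arrive from the cloud model at a lower rate and are attached to later frames. The Python sketch below illustrates that pattern only. The source does not describe the actual interaction mechanism, and the `Detection` fields, the use of track IDs as the join key, and the class and method names are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Detection:
    track_id: int
    category: str                          # closed-set class, e.g. "vehicle"
    position: Tuple[float, float, float]   # 3D position
    open_label: Optional[str] = None       # open-set description, if known


class HighLowPerception:
    """Fuse per-frame closed-set detections with cached open-set labels."""

    def __init__(self) -> None:
        self._labels: Dict[int, str] = {}

    def on_vlm_result(self, labels: Dict[int, str]) -> None:
        """Low-rate path: store open-set labels returned by the cloud model."""
        self._labels.update(labels)

    def on_frame(self, detections: List[Detection]) -> List[Detection]:
        """High-rate path: attach the latest known label to each detection."""
        for det in detections:
            det.open_label = self._labels.get(det.track_id)
        return detections


perception = HighLowPerception()
perception.on_vlm_result({7: "red pickup truck"})
frame = perception.on_frame([
    Detection(7, "vehicle", (12.0, 3.5, 0.0)),
    Detection(9, "pedestrian", (4.0, -1.0, 0.0)),
])
```

Under this arrangement the high-rate loop never waits on the cloud model; an object simply has no open-set label until the first slow result for it arrives.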

[0058] The above embodiments integrate large models and end-to-end 3D position detection models, which can better understand environmental information and obtain accurate target location and category.

[0059] Through the perception module described above, the current environmental information and detected environmental information of the flying agent can be obtained. At this point, the user issues fuzzy language-based commands to the agent. To ensure correct understanding and analysis of these fuzzy commands, in a preferred embodiment of the invention, the task instruction is decomposed, managed, and stored through a task manager. Specifically, a task decomposer based on a large language model is used to decompose the task, and the decomposed sub-tasks include patrol, search, navigation, following, and language control of UAV movement.

[0060] In some specific embodiments, the task decomposition process of the flight agent can be represented as follows:

[0061] $T_t = f_{\text{decomposition}}(I_t, M) = \{i_{1,t}, i_{2,t}, \ldots, i_{n,t}\}$

[0062] where $I_t$ is the user instruction received by the agent at time t, which serves as the instruction input to the task decomposer; $M$ is the task memory generated by the agent during task execution; $f_{\text{decomposition}}$ is the model function that performs task decomposition in the agent's task decomposer; $i_{n,t}$ is the n-th subtask decomposed from the user instruction at time t, each subtask being one of the aforementioned executable subtask types; and $T_t$ is the set of subtasks into which the agent splits the user instruction at time t.

[0063] Meanwhile, the state manager can switch the current running state of the agent based on the data information fed back by each module, thereby maintaining the stable operation of the agent.

[0064] Note that at each stage, process variables, task completion status, and user interaction information are stored in the memory module. This memory information is provided to the large models in the different modules, allowing the agent to understand its own state and use information from past experience for planning, thereby comprehensively improving the agent's performance.

[0065] After the task manager decomposes the task, executable code is needed for each subtask. Therefore, in a preferred embodiment, executable code is generated using a large language model planner. In this embodiment, the task planner uses the large language model to coordinate calls to the perception module, to the perception information interface based on the visual language model (VLM), and to the dedicated code generators for each subtask, and performs task planning by generating executable Python code. This process can be represented as:

[0066] $c_{n,t} = f_{\text{task\_planner}}(i_{n,t}, P, L, M)$

[0067] where $i_{n,t}$ is the n-th executable subtask command of the agent at time t; $f_{\text{task\_planner}}$ is a task planner based on a large language model; $P$ is the prompt corresponding to the large language model executable code generator; and $L$ is the set of callable control and perception interfaces in the code generator, including the visual language model interface and the related interfaces for UAV control. The input to the visual language model interface is the information detected by the visual language model and the target of the planning module; its output is the target position provided to the large language model planner. The input to the related interfaces for UAV control is the target point of the different tasks in the planning module or the ID of the object to be tracked; their output is the execution code of the intelligent aircraft. $c_{n,t}$ is the executable Python code generated by the task planner for the n-th subtask at time t.

[0068] During code generation, the code generators for each subtask based on the large language model refine the code generation related to the executable subtasks in the task planner, and finally output the complete code. This process can be represented as:

[0069] $C_{n,t} = f_{\text{code\_generator\_}j}(c_{n,t})$

[0070] where $f_{\text{code\_generator\_}j}$ is a large language model-based code generator for the j-th subtask type, j ∈ {search, follow, navigation, patrol, direct movement of drones}. This code generator produces the complete executable code $C_{n,t}$ that conforms to the commands of the corresponding category.

[0071] To ensure the code is error-free, in a preferred embodiment of the present invention, the syntax of the generated complete code is checked, specifically as follows:

[0072] $S_{n,t} = f_{\text{safe\_execute}}(C_{n,t})$

[0073] where $f_{\text{safe\_execute}}$ is a module for safe execution of Python code. It executes the code of a subtask and returns a Boolean value $S_{n,t}$. When $S_{n,t}$ is True, the code has no syntax errors and can be executed; otherwise, the generated code has syntax errors and needs to be rewritten. The code is executed once it has been checked and found to be free of syntax errors.

[0074] Furthermore, in a preferred embodiment of the present invention, after the code is executed, a task completion detection module is used to detect whether the task has been completed. This process is described as follows.

[0075] $F_{n,t}, E_{n,t} = f_{\text{exec\_check}}(C_{n,t}, O_t)$

[0076] where $f_{\text{exec\_check}}$ is the subtask status confirmation module; $O_t$ is the real-time status of the drone and the observation signals from its sensors; and $F_{n,t}$ is a Boolean value indicating whether the task execution failed. A value of True indicates that the subtask failed, which causes the entire task to fail; conversely, a value of False indicates that the subtask succeeded and the next task can proceed. $E_{n,t}$ is a Boolean variable indicating whether a subtask requires further user interaction to ascertain the user's intent. When this variable is True, the system interacts with the user again and reports the current task execution status. After the user confirms their needs, the subtask is replanned based on those needs. As shown in Figure 2, the process maintains interaction with the user until the task is completed and the next command is issued.
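The control flow driven by the two flags can be sketched as a short Python loop. The callables passed in are hypothetical placeholders for the execution, status confirmation, and user interaction steps, and the `max_replans` limit is an assumption added so the loop always terminates.

```python
from typing import Callable, List, Tuple


def run_subtasks(subtasks: List[str],
                 execute: Callable[[str], None],
                 exec_check: Callable[[str], Tuple[bool, bool]],
                 ask_user: Callable[[str], str],
                 max_replans: int = 3) -> bool:
    """Run subtasks in order; return True only if all of them succeed."""
    for task in subtasks:
        for _ in range(max_replans + 1):
            execute(task)
            failed, needs_user = exec_check(task)   # F_{n,t}, E_{n,t}
            if failed:
                return False        # one failed subtask fails the whole task
            if not needs_user:
                break               # subtask done, move on to the next one
            task = ask_user(task)   # replan from the user's clarified intent
        else:
            return False            # still ambiguous after max_replans
    return True


log: List[str] = []
ok = run_subtasks(
    ["search car", "follow"],
    execute=log.append,
    exec_check=lambda t: (t == "crash", t == "search car"),
    ask_user=lambda t: "search red car",
)
```

In this run the ambiguous "search car" subtask triggers one round of user interaction, is replanned as "search red car", and the loop then moves on to "follow".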

[0077] In another preferred embodiment, the planning module consists of an obstacle avoidance planner and a tracking planner. The obstacle avoidance planner takes the moving target point and information from the perception module as input, and outputs a real-time planned obstacle avoidance path. The tracking planner takes the ID of the object to be tracked and information from the perception module as input, and outputs a real-time obstacle avoidance path that follows the target object.

[0078] In the above embodiments, the cloud-based large language model planner invokes the obstacle avoidance planner and the tracking planner according to the task. The cloud-based large language model planner sends the obtained code to the intelligent agent, which then invokes the various modules for execution.

[0079] To ensure overall system security, in a preferred embodiment of the present invention, a security function module is also designed into the system. This security function module is a relatively independent module that monitors, in real time, the privacy security, potential risk security, and planned-path collision safety of the intelligent agent in the environment, as shown in Figure 3. Specifically:

[0080] $pl_t, pr_t = f_{\text{privacy\_check}}(O_t)$

[0081] $rl_t, rr_t = f_{\text{risk\_check}}(O_t)$

[0082] $Co_t = f_{\text{collision\_check}}(O_t, p_t)$

[0083] where $f_{\text{privacy\_check}}$ and $f_{\text{risk\_check}}$ are the privacy detection and potential risk detection modules within the detection module, both based on a fine-tuned visual language model; $O_t$ is the sensor observation signals of the UAV at time t together with the UAV's status; and $pl_t$, $pr_t$ and $rl_t$, $rr_t$ are, respectively, the privacy level, the reason for the privacy score, the potential risk score, and the reason for the potential risk assessment. Of these, $pl_t$ and $rl_t$ are integers from 1 to 4, and $pr_t$ and $rr_t$ are the reasons for the ratings, in string format. $f_{\text{collision\_check}}$ is based on a traditional collision detection method whose inputs are the observation signal $O_t$ at time t and the path planning information $p_t$ at time t. It uses Kalman filtering to predict the positions of dynamic objects in the environment and performs collision detection against the planned trajectory. If a collision is detected, an early warning is issued.

[0084] In this step, the drone's perception information is input into the privacy and security / potential security detection module in real time. The detector can analyze the scene for privacy and potential security issues based on the perception information according to different rules, and provide reasons and scores. At the same time, if the drone is flying or performing a mission, the collision detector will perform real-time collision detection on the flight path and dynamic objects in the environment based on the current perception information to ensure that the drone does not collide with objects in the environment.

[0085] Furthermore, in one preferred implementation, the safety evaluation module assesses the current safety status of the drone and summarizes the safety information before sending it to the safety processing module. For the privacy and potential risk modules, they output a hazard level from 1 to 4. 1 indicates no safety threat, 2 indicates a minor but manageable safety threat, 3 indicates a moderate safety threat requiring user notification to terminate the mission, and 4 indicates a severe safety threat, in which case the drone will forcibly terminate the mission. If the current situation is safe, the drone's operation will not be affected. For collision detection, if a collision is detected, the drone's mission will be immediately stopped.

[0086] Furthermore, in another preferred implementation, the safety processing module controls the drone to maintain a safe state based on the safety assessment results and different safety issues. For privacy and potential risks assessed as 3, a warning message will be sent to the user, instructing them to terminate the mission. For risks assessed as 4, the drone will terminate the mission and return to the previous safe point. Regarding collision detection, when the system detects a collision, the agent immediately performs emergency braking to avoid it and informs the user.

[0087] In the above embodiment, during the system operation phase, the agent receives commands from the user, the planning module breaks down the task and calls the corresponding task memory and code generator to generate the corresponding executable code, calls the perception interface to obtain task-related perception information, and finally executes the corresponding code to complete the user's instructions after passing the security check module. This system achieves safe and efficient task execution in highly dynamic environments by combining high-performance, low-frame-rate large models from various modules, high-frame-rate traditional methods, and using security detection modules that address privacy, potential risks, and collisions.

[0088] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.

Claims

1. A safe flight intelligent agent system for executing complex language commands in a highly dynamic environment, characterized in that it comprises: a perception function module, which adopts a high-low layer structure, wherein the high-level structure is a detection visual language model installed on a cloud platform and the low-level structure is a perception module installed on the flight agent; the detection visual language model combines the information from the perception module and the observation signals collected by the flight agent to perform open set detection, and the perception module performs closed set detection based on the observation signals and the information from the detection visual language model; a task manager, installed on the cloud platform, which includes a task decomposer, a state manager, and a memory module, wherein the task decomposer receives user commands, performs semantic understanding, and splits tasks, the state manager manages task states, and the memory module stores process variables, task completion status, and user interaction information; and a planning function module, which adopts a high-low layer structure, wherein the high-level structure is a large language model planner installed on the cloud platform and the low-level structure is a planning module installed on the flight agent; the large language model planner generates execution code based on the task splitting results of the task manager, the information of the detection visual language model, and the information of the planning module, and the planning module obtains the planned path based on the task splitting results of the task manager, the information of the perception module, and the execution code of the large language model planner.

2. The safe flight intelligent agent system for executing complex language commands in a highly dynamic environment according to claim 1, characterized in that the task decomposer receives commands from the user and calls upon the task memory in the memory module to split the task, specifically as follows: the task decomposer model of the intelligent agent is a function that takes as input the user instruction received by the agent at time t and the task memory M, and outputs the set of executable subtasks into which the agent splits the user instruction at time t; wherein M is the task memory generated by the agent during operation, including historical perception results and historical task completion status; and the executable subtasks include: search, follow, navigation, patrol, and direct movement of the drone.

3. The safe flight intelligent agent system for executing complex language commands in a highly dynamic environment according to claim 2, characterized in that the large language model planner generates execution code based on the task decomposition results of the task manager, the information of the detection visual language model, and the information of the planning module, specifically as follows: for the i-th executable subtask of the agent at time t, a code generator based on a large language model takes that subtask, the prompt P, and the interface set L as input and generates the executable Python code for the i-th subtask at time t; wherein P is the prompt corresponding to the code generator of the large language model for the executable subtask, and L is the set of control and perception interfaces that can be called in the code generator, including the visual language model interface and the UAV control interfaces; the input of the visual language model interface is the information of the visual language model and the target of the planning module, and its output is the target position provided to the large language model planner; the input of the UAV control interfaces is the target point of the planning module for the respective task or the ID of the object to be tracked, and their output is the execution code of the flight agent; and a safe execution module for Python code executes the code of each subtask and returns a Boolean value, wherein a true value indicates that the code has no syntax problems and can be executed, and otherwise the generated executable Python code has syntax problems and needs to be rewritten.
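By way of illustration only, a safe execution module that runs generated Python code and returns a Boolean value may be sketched as follows. The function names `safe_execute` and `generate_until_valid`, the retry limit, and the dictionary used to expose the callable interfaces are all hypothetical. The sketch also returns False on runtime errors, not only on syntax errors, which is an assumption of the sketch. Emptying the builtins hides ordinary built-in functions from the generated code but is not a real security sandbox.

```python
from typing import Callable, Dict, Optional


def safe_execute(code: str, interfaces: Dict[str, Callable]) -> bool:
    """Execute generated code; return True only if it compiles and runs."""
    try:
        compiled = compile(code, "<subtask>", "exec")
    except SyntaxError:
        return False  # syntax problem: the code must be regenerated
    # Only the listed control and perception interfaces are visible.
    namespace = {"__builtins__": {}, **interfaces}
    try:
        exec(compiled, namespace)
    except Exception:
        return False  # assumption: runtime failures also count as invalid
    return True


def generate_until_valid(
    subtask: str,
    generator: Callable[[str], str],
    interfaces: Dict[str, Callable],
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask the (hypothetical) generator again until the code executes."""
    for _ in range(max_attempts):
        code = generator(subtask)
        if safe_execute(code, interfaces):
            return code
    return None  # no valid code within the attempt budget
```

Here `generator` stands in for the code generator based on a large language model, and `interfaces` for the set L of callable control and perception interfaces.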

4. The safe flight intelligent agent system for executing complex language commands in a highly dynamic environment according to claim 3, characterized in that the planning module includes: an obstacle avoidance planner, which takes the moving target point and the information from the perception module as input and outputs an obstacle avoidance path planned in real time; and a tracking planner, which takes the ID of the object to be tracked and the information from the perception module as input and outputs a path that avoids obstacles in real time while following the target object.

5. The safe flight intelligent agent system for executing complex language commands in a highly dynamic environment according to claim 1, characterized in that it also includes a safety function module, which adopts a high-low layer structure, wherein the high-level structure is a safety visual language model installed on the cloud platform and the low-level structure is a collision detector installed on the flight agent; the safety visual language model performs privacy and security detection and potential security detection based on the observation signals collected by the flight agent; and the collision detector performs collision detection based on the observation signals collected by the flight agent and the planned path of the planning function module.

6. The safe flight intelligent agent system for executing complex language commands in a highly dynamic environment according to claim 5, characterized in that the privacy and security detection specifically includes: a privacy detection module takes as input the observation signals collected by the flight agent in real time and the status of the drone, and outputs the privacy level score and the reasons for that privacy evaluation; and the potential security detection specifically includes: a potential risk detection module outputs the potential risk score and the reasons for evaluating the potential risk.

7. The safe flight intelligent agent system for executing complex language commands in a highly dynamic environment according to claim 5, characterized in that the collision detection specifically includes: a collision detection module takes as input the observation signal at time t and the path planning information at time t; the collision detection function uses the Kalman filter method to predict the positions of dynamic objects in the environment and checks them for collisions with the planned trajectory.
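By way of illustration only, Kalman-filter-based prediction of a dynamic object followed by a check against the planned trajectory may be sketched as follows. The sketch assumes a constant-velocity motion model, a planar (x, y) position measurement, a simplified diagonal process noise, and waypoints spaced one time step apart. None of these choices, nor the names `AxisKF`, `track`, and `collision_risk`, nor the noise values and safety radius, are taken from this application.

```python
import math
from dataclasses import dataclass


@dataclass
class AxisKF:
    """Constant-velocity Kalman filter for one axis (position, velocity)."""
    p: float           # position estimate
    v: float = 0.0     # velocity estimate
    p00: float = 1.0   # covariance entries [[p00, p01], [p01, p11]]
    p01: float = 0.0
    p11: float = 1.0

    def predict(self, dt: float, q: float = 0.01) -> None:
        # State transition F = [[1, dt], [0, 1]]; P <- F P F^T + diag(q, q).
        self.p += self.v * dt
        self.p00 += 2 * dt * self.p01 + dt * dt * self.p11 + q
        self.p01 += dt * self.p11
        self.p11 += q

    def update(self, z: float, r: float = 0.05) -> None:
        # Position-only measurement, H = [1, 0], measurement noise r.
        s = self.p00 + r
        k0, k1 = self.p00 / s, self.p01 / s
        innovation = z - self.p
        self.p += k0 * innovation
        self.v += k1 * innovation
        self.p11 -= k1 * self.p01
        self.p01 *= 1 - k0
        self.p00 *= 1 - k0


def track(observations, dt):
    """Filter a sequence of (x, y) detections of one dynamic object."""
    fx, fy = AxisKF(observations[0][0]), AxisKF(observations[0][1])
    for x, y in observations[1:]:
        fx.predict(dt)
        fx.update(x)
        fy.predict(dt)
        fy.update(y)
    return fx, fy


def collision_risk(filters, path, dt, safety_radius=0.5):
    """True if a waypoint comes within safety_radius of the predicted object.

    Assumption: waypoint k is reached k * dt after the last observation.
    """
    fx, fy = filters
    for k, (wx, wy) in enumerate(path, start=1):
        ox = fx.p + fx.v * k * dt  # extrapolated object position
        oy = fy.p + fy.v * k * dt
        if math.hypot(wx - ox, wy - oy) < safety_radius:
            return True
    return False
```

The Boolean returned by `collision_risk` plays the role of the collision detection result, with a true value indicating that a collision risk has been found along the planned path.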

8. The safe flight intelligent agent system for executing complex language commands in a highly dynamic environment according to claim 5, characterized in that the safety function module performs safety assessments and safety actions based on the privacy detection results, the potential security detection results, and the collision detection results.

9. The safe flight intelligent agent system for executing complex language commands in a highly dynamic environment according to claim 8, characterized in that the safety assessment includes: a privacy level and a potential risk level, each an integer from 1 to 4, and the reasons for each rating, in string format, wherein 1 indicates no safety threat, 2 indicates a minor safety threat within a controllable range, 3 indicates a moderate safety threat requiring the user to be informed to terminate the task, and 4 indicates a serious safety threat; and a Boolean value representing the collision detection result at time t, wherein a true value indicates that, based on the path planning information and the observation signals, a collision risk has been detected for the drone, and otherwise there is no collision risk.

10. The safe flight intelligent agent system for executing complex language commands in a highly dynamic environment according to claim 9, characterized in that the safety actions include: when at least one of the privacy and potential risk assessments is 3, the safety function module directly sends a warning message to the task manager, which then sends it to the user and terminates the task; when at least one of the privacy and potential risk assessments is 4, the safety processing module controls the flight agent to terminate the mission and return to the previous safe point; and for collision detection, when the safety function module detects a collision risk, the collision detector directly notifies the flight agent to immediately perform emergency braking to avoid the collision, and at the same time communicates with the task manager to inform the user.