Language instruction analysis and scene control method and system for digital twin platform
By standardizing and logically controlling natural language commands on the water conservancy digital twin platform, the problems of operational complexity and precise control have been solved, achieving efficient and accurate three-dimensional scene control and a user-friendly interactive experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHANGJIANG SPATIAL INFORMATION TECH ENG CO LTD (WUHAN)
- Filing Date
- 2026-02-04
- Publication Date
- 2026-06-19
AI Technical Summary
Existing water conservancy digital twin platforms are complex to operate, making it difficult for non-professional users to simulate specific working conditions and navigate through scenes. Furthermore, general-purpose large models are difficult to accurately control professional 3D simulation scenes.
Through a closed-loop process of standardized function encapsulation, context-aware parsing, logical task decomposition, and precise execution feedback, a large language model is used to parse natural language instructions, and an API mapping library is used to achieve precise control of the 3D scene, including receiving instructions, extracting state information, recognizing intent, verifying parameters, and executing feedback.
It lowers the barrier to user operation, enables high-precision semantic control and automated orchestration of complex tasks, and improves business simulation efficiency and human-computer interaction experience.
Smart Images

Figure CN122240093A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of natural language interactive control technology for water conservancy digital twin platforms. Specifically, it relates to a language command parsing and three-dimensional scene control method and system for water conservancy digital twin platforms. It cross-integrates computer graphics, artificial intelligence and water conservancy information technology, and is suitable for precise control scenarios based on natural language-driven three-dimensional simulation scenarios. Background Technology
[0002] With the rapid development of digital twin technology, 3D panoramic visualization platforms built on high-fidelity rendering engines such as 3D game engines and geographic information systems have become core tools for the water conservancy industry in conducting basin flood control, water conservancy project simulation, and simulation of four domains (water, land, air, and underground). These platforms can integrate massive amounts of geographic information data (GIS), building information models (BIM), and complex water conservancy mechanism models (such as hydrodynamic models and rainfall-runoff models), providing intuitive and scientific basis for water conservancy decision-making.
[0003] However, existing water conservancy digital twin platforms are typically complex, with user interfaces cluttered with numerous menus, parameter input boxes, and technical jargon configuration items. For non-professional users or high-level decision-makers, mastering the operation of such platforms to simulate specific operational conditions, such as simulating the evolution of a once-in-a-century flood in a river basin or performing virtual tours from a specific perspective, often presents a very high learning curve and operational difficulty. Traditional interaction methods primarily rely on mouse clicks and keyboard input, lacking natural and intuitive interaction tools, and are ill-suited to meet the extremely high demands for timeliness and convenience in emergency command scenarios.
[0004] Meanwhile, artificial intelligence technology, particularly large language models (LLMs) represented by the Transformer architecture, has made groundbreaking progress in recent years. This new generation of large models not only possesses powerful natural language understanding and generation capabilities but also demonstrates superior logical reasoning and function calling abilities. Large models can understand complex task descriptions, break them down into specific steps, and invoke external tools to solve problems according to predefined interface specifications. The rise of this agent technology provides new insights into solving the interaction challenges of complex software systems.
[0005] Despite the rapid development of AI technology, its application in the field of water conservancy digital twins is currently mainly concentrated at the knowledge question-answering level based on retrieval-enhanced generation (RAG), primarily addressing the problem of finding information. At the level of deep control—specifically, how to directly drive complex 3D simulations, control scene navigation, and adjust environmental parameters using natural language—mature solutions are still lacking. The main challenges lie in the complexity of water conservancy business logic, with instructions often containing strict spatiotemporal constraints and specialized parameters such as recurrence intervals and water level thresholds; and the significant semantic gap between the underlying API interfaces of 3D platforms and natural language, making it difficult for general-purpose AI models to directly understand and accurately invoke them.
[0006] Chinese invention patent with publication number CN 118822307 A relates to a water conservancy information management method and system based on digital twin technology. The patent revolves around digital twin technology and covers water conservancy information management methods and systems such as multi-source data collection, model generation, and monitoring and analysis. However, it lacks natural language interaction technology and has insufficient interaction convenience.
[0007] Therefore, there is an urgent need to develop a method that can standardize and encapsulate the complex functions of a water conservancy digital twin platform and utilize the reasoning capabilities of large models to achieve accurate mapping of natural language to three-dimensional scene control commands, so as to reduce the user threshold and improve the human-computer interaction experience and business simulation efficiency. Summary of the Invention
[0008] The purpose of this invention is to solve the technical problems of existing water conservancy digital twin platforms, such as complex operation, high interaction threshold, and difficulty in accurately controlling professional 3D simulation scenes using general large models, and to provide a natural language command parsing and 3D scene control method and system for water conservancy digital twin platforms.
[0009] To achieve the above objectives, the present invention provides the following technical solution: A natural language command parsing and 3D scene control method for digital twin platforms is disclosed. This method achieves natural language-driven control of 3D scenes through a closed-loop process of standardized function encapsulation, context-aware parsing, logical task decomposition, and precise execution feedback. The method is based on a pre-built water conservancy function description specification that encapsulates the underlying functional functions of the water conservancy digital twin platform into a standardized API mapping library. This API mapping library establishes a one-to-one mapping between function description templates and the underlying functions of the 3D rendering engine and the water conservancy mechanism model through middleware interfaces. The water conservancy function description specification includes structured function description templates, which define function names, function explanations, required parameters, optional parameters, and parameter data types. The parameter data types cover water conservancy business types. Specifically, the following logically related steps are included: S1: Receives the user's natural language commands and simultaneously extracts the current state information of the water conservancy 3D digital twin scene in real time through the data interface of the 3D engine. The state information is stored in a structured manner as a session context to provide a scene environment basis for command parsing. The natural language commands are received through a voice recognition interface or a text input interface. The state information includes time environment parameters, weather and meteorological parameters, camera view coordinates, and the current simulation working condition status. S2: The natural language instruction and the session context input instruction parsing module; S3: Use the instruction parsing module to identify the user's core business intent and extract water conservancy professional parameters; S4: Based on preset business logic constraints, the core business intent is decomposed into a sequence of atomic operations containing logical dependencies, and an executable toolchain is generated in conjunction with the water conservancy function description specification. S5: For the executable toolchain generated in step S4, perform business logic verification and format conversion on the water conservancy professional parameters to ensure that the parameters meet the calling requirements of the underlying API of the 3D scene. This step is a pre-verification step for toolchain execution; specifically, it determines whether the parameters are within the preset valid value range and whether the geographic entity name exists in the GIS database. If the verification passes, the fuzzy parameters are converted into a precise format that the underlying API can recognize. If the verification fails, feedback information containing the error reason is generated. S6: After completing parameter verification and format conversion in step S5, the executable toolchain is executed using an asynchronous scheduling mechanism. A progress monitoring thread is established for time-consuming water conservancy simulation calculation tasks, and the progress is displayed. This drives the real-time updating of the water conservancy 3D digital twin scene. At the same time, the execution results are fed back to the user through speech synthesis or text, and the session context is updated, forming a closed-loop interaction of "input-parsing-execution-feedback". This provides updated scene information for subsequent multi-round command interactions.
[0010] Preferably, the construction of the water conservancy function description specification specifically includes: defining a structured function description template, the template including function name, function explanation, required parameters, optional parameters, and parameter data type; wherein, the parameter data type at least covers water conservancy business-specific types, including return period probability, water level value, watershed geographic entity, and time span; the API mapping library establishes a middleware interface to map the definitions in the function description template one-to-one with the underlying functions of the 3D rendering engine and water conservancy mechanism model.
[0011] Preferably, step S1 specifically includes: receiving the natural language command through a voice recognition interface or a text input interface; simultaneously, reading the current time environment parameters such as simulation time, lighting conditions, weather parameters such as rainfall, cloud cover, camera view coordinates such as latitude and longitude, elevation, pitch angle, and the current simulation status from the rendering loop in real time through the data interface of the 3D engine; and storing the status information in a structured manner in the session state memory module as the context for subsequent multi-round dialogues.
[0012] Preferably, step S2 specifically includes: based on the natural language instructions and conversation context obtained in step S1, constructing a system prompt containing system role settings, current scene state description and historical dialogue records; combining the natural language instructions and system prompt to form a complete input sequence and transmitting it to the instruction parsing module built on a large language model to achieve collaborative input of instructions and scene context; Preferably, step S3 specifically includes: using the semantic understanding capability of the instruction parsing module to classify user instructions into preset intent categories such as scene roaming, information query, simulation deduction, or environmental control; based on the parameter definitions in the water conservancy function description specification, identifying and extracting key entities and values that match the intent from natural language, such as specific reservoir names, water level elevation values, or rainfall recurrence interval levels.
[0013] Preferably, step S4 specifically includes: based on the business logic constraints preset in the water conservancy function description specification and the core business intent and water conservancy professional parameters extracted in step S3, using the reasoning ability of the instruction parsing module to determine whether the natural language instruction contains multiple sub-tasks; if so, determining the serial or parallel execution order of each sub-task according to business process constraints such as "water conservancy simulation calculation function takes precedence over 3D visualization rendering function call"; then retrieving the functional interface matching each sub-task from the API mapping library; and assembling and generating an executable toolchain containing atomic operation sequences with logical dependencies according to the execution order, thereby realizing the structured decomposition of complex intents.
[0014] Preferably, step S5 specifically includes: determining whether the extracted water conservancy professional parameters are within a preset effective value range, such as whether the water level is lower than the dead water level or higher than the dam crest elevation, and verifying whether the geographic entity name exists in the GIS database; if the parameter passes the verification, the fuzzy parameter described in natural language, such as next year's flood season, is converted into a precise numerical format recognizable by the underlying API of the water conservancy digital twin platform, such as a specific timestamp or floating-point number; if the parameter fails the verification, feedback information containing the reason for the error is generated.
[0015] Preferably, step S6 specifically includes: sequentially calling the functional interfaces in the executable toolchain through an asynchronous scheduling mechanism; for time-consuming hydraulic simulation calculation tasks, establishing a progress monitoring thread to obtain the calculation progress in real time and display it on the front end; after execution, driving the 3D scene to perform corresponding rendering updates such as camera fly-in, water material changes, and dynamic loading of the flow field, and outputting the execution results to the user through speech synthesis or text, while updating the session state memory module.
[0016] This invention also provides a natural language command parsing and 3D scene control system for a water conservancy digital twin platform, comprising: a function encapsulation module for storing water conservancy function description specifications and API mapping libraries; an input and state management module for receiving natural language commands and extracting key state information of the 3D scene; a command parsing and generation module for identifying business intent, extracting parameters, and decomposing complex commands into executable toolchains; and an execution scheduling and feedback module for verifying toolchain parameters and scheduling the water conservancy 3D digital twin scene for updates and feedback.
[0017] This invention discloses an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the natural language instruction parsing and three-dimensional scene control method for a water conservancy digital twin platform.
[0018] This invention discloses a computer-readable storage medium storing a computer program thereon, characterized in that, when the computer program is executed by a processor, it implements the natural language instruction parsing and three-dimensional scene control method for a water conservancy digital twin platform.
[0019] Compared with the prior art, the beneficial effects of the present invention are as follows: 1. Lowering the operational threshold of professional platforms: This invention establishes a mapping mechanism between natural language and the underlying functions of the 3D platform, enabling users to drive complex functions such as flood evolution and flight navigation simply through voice or text, without needing to master complex operation menus and professional parameter configurations. This greatly simplifies the use of the water conservancy digital twin platform, making it particularly suitable for emergency command scenarios. Field tests on three water conservancy digital twin platforms of different scales, with 10 professional technicians and 10 non-professionals as test users, reduced the operation time for non-professional users to complete complex simulation tasks from 60 minutes to 5 minutes, facilitating the application of this invention's method in emergency command scenarios in the engineering field.
[0020] 2. Achieved high-precision semantic control: Unlike general chatbots, this invention, through the construction of a dedicated Water Conservancy Function Description Specification (DDS) and parameter verification mechanism, can accurately identify and convert industry-specific parameters such as once-in-a-century floods and dead water levels, avoiding operational errors caused by the illusions of AI models and ensuring the scientific rigor of simulation. Test data comes from a test set containing 1000 water conservancy business instructions, covering four core intents: scene roaming, information query, simulation, and environmental control. The intent recognition accuracy and parameter extraction F1 score of this invention are superior to those of directly calling general large models. Specific test results are as follows: The intent recognition accuracy and parameter extraction F1 score of this invention are calculated based on the number of correctly identified intents, the precision of parameter extraction, and the recall rate, respectively, in the test set; the comparison data for directly calling general large models are the native output results of the models under the same test set, without any domain adaptation optimization.
[0021] 3. Supports automated orchestration of complex tasks: Leveraging the reasoning capabilities of large models, this invention can automatically decompose user macro-level instructions, such as simulating floods and viewing disaster conditions, into a toolchain of atomic operation sequences containing multiple steps, including simulation calculations, scene rendering, and perspective switching. It also automatically handles dependencies between steps, significantly improving business processing efficiency. Tests on the success rate of complex task orchestration and the accuracy of dependency identification were conducted based on 500 complex instructions containing three or more sub-tasks. These instructions originated from actual business scenarios in the water conservancy industry. The orchestration success rate was the percentage of instructions that successfully generated an executable toolchain, and the dependency identification accuracy was the percentage of instructions that correctly identified sub-task dependencies. No circular dependency misjudgments were found during testing. The correctness of the toolchain's dependency logic was manually verified during the testing process.
[0022] 4. Context-aware multi-turn interaction capability: This invention extracts real-time 3D scene states such as current time, viewpoint, and weather as AI context, enabling the system to understand commands that depend on prior states, such as "make the rain heavier" or "fly closer to see," achieving a truly smooth and continuous human-computer interaction loop. Multi-turn interaction testing was conducted based on 100 multi-turn dialogue scenarios, each containing 3-5 rounds of continuous commands. Context understanding accuracy was measured as the percentage of dialogue groups that correctly responded to commands dependent on prior states. Parameter reuse accuracy was measured as the percentage of dialogue groups that correctly reused known parameters without repeated input. Test data was derived from interaction records and result verification during the dialogue process. Attached Figure Description
[0023] Figure 1 This is a system architecture diagram provided for an embodiment of the present invention.
[0024] Figure 2 This is a flowchart of a method provided in an embodiment of the present invention. Detailed Implementation
[0025] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.
[0026] This invention leverages the reasoning capabilities of a large language model to achieve semantic-driven natural language command parsing and 3D scene control for water conservancy digital twin scenarios. The embodiments of this invention provide a method for natural language command parsing and 3D scene control for water conservancy digital twin platforms. This invention pre-constructs water conservancy function description specifications and encapsulates the underlying function sets of the water conservancy digital twin platform into a standardized API mapping library. The construction of water conservancy function description specifications includes the establishment of a water conservancy professional terminology dictionary containing over 2,000 core terms and semantic explanations. The terms are derived from authoritative water conservancy industry standards and technical documents. A 512-dimensional semantic vector is obtained through Word2Vec model training and embedded into the word embedding layer of the large model. The training process follows the standard Word2Vec model training procedure. Water conservancy business rule constraints are injected into the prompt words, such as "simulation calculation functions take precedence over rendering update function calls" and "water level parameters must satisfy dead water level ≤ numerical value ≤ dam crest elevation." These rules are derived from actual water conservancy business operation procedures and safety regulations. A few-shot learning method is employed, inputting 100 labeled water conservancy instruction-toolchain mapping samples into the model. These samples are derived from manually labeled historical business data, guiding the model to understand the priority relationships and parameter constraints of water conservancy operations. The number of samples for few-shot learning is based on commonly used configurations adapted to the large model domain.
[0027] Constructing a specification for water conservancy function descriptions is fundamental to achieving semantic-driven implementation. Water conservancy digital twin platforms, such as those built on UE5 and Cesium, have a large number of C++ or Blueprint functions at their underlying layer, including `SetWeather()` for weather control, `FlyTo()` for camera control, and `RunSimulation()` for water conservancy model control. Exposing these functions directly to AI is difficult for large models to understand.
[0028] Therefore, this embodiment first defines a set of Domain Description Specification for Hydrology (DDS-Hydrology). This specification uses JSON Schema format to semantically define each callable functional tool. The definition includes: Tool name: e.g., simulate flood evolution.
[0029] Function Description: A natural language description used to tell the large model the purpose of the tool, such as to start a flood evolution simulation of a specified watershed, and to support setting different rainfall return periods.
[0030] Parameter definition: List the parameter name, type, unit, and enumeration value in detail. For example, for the recurrence period parameter, define its type as String, and the valid enumeration values are ["once in 10 years", "once in 20 years", "once in 50 years", "once in 100 years"], etc.
[0031] Simultaneously, an API mapping library is constructed. Using FastMCP or a similar middleware framework, the virtual tools defined in DDS are bound to the underlying functions of the 3D platform. For example, when the system calls the `simulate flood evolution` tool, the middleware automatically triggers the corresponding flood dynamics calculation module in the 3D engine.
[0032] The goal of the API mapping library is to establish an executable binding between the virtual tools defined by the Water Function Description Specification (DDS-Hydrology) and the underlying functions of the 3D digital twin platform. The construction steps include the following seven steps: determining the binding input, selecting an appropriate middleware framework and completing the basic configuration, developing the middleware adaptation interface and binding logic, building the mapping library index and caching mechanism, testing, verification and optimization iteration, and document compilation and maintenance specification formulation.
[0033] Determine the input endpoint: Identify all callable virtual tools in the DDS-Hydrology specification and clarify the unique identifier of each tool. The following information is compiled to create a "Virtual Tool List": ID, function definition, parameter requirements (e.g., required / optional), data type, and valid value range. This ensures that all input objects are included and clearly defined. The output binding is determined by examining the underlying functions of the 3D digital twin platform. This includes core functions of 3D rendering engines like UE5 and Cesium (e.g., camera control, scene rendering, weather adjustment) and calculation functions for water mechanism models like hydrodynamics and rainfall-runoff models. The complete path, input parameter format, output result type, and calling protocol (e.g., RPC, IPC) for each function are compiled into a "Bottom-Level Function List." Core calculation functions for water mechanism models, such as hydrodynamic models and rainfall-runoff models (e.g., iterative calculations of flood evolution and inundation range determination), are existing technologies in the field of water conservancy simulation. The input / output formats and calling protocols (e.g., RPC, IPC) for these functions are industry standards and will not be elaborated upon here. Binding constraints are clearly defined: the core principle of one-to-one mapping is defined—one virtual tool corresponds to one underlying function, avoiding many-to-many ambiguous mappings. Business process constraints are clarified, such as water conservancy simulation calculation functions needing to be bound and called before 3D visualization rendering functions, and technical constraints, such as long-running functions needing to support asynchronous calls.
[0034] Select a suitable middleware framework and complete the basic configuration: Middleware framework selection includes choosing a framework that supports cross-module and cross-language calls, such as FastMCP, gRPC, and Spring Cloud, based on the 3D platform architecture, underlying function types, and business requirements. Prioritize frameworks commonly used in the field of water conservancy digital twins that have strong compatibility. Middleware basic environment setup: Deploy the middleware service, configure network parameters such as IP and port, communication protocols such as HTTP / 2 for gRPC and a custom private protocol for FastMCP, and adapt the connection timeout threshold to water conservancy simulation scenarios with long processing times; it is recommended to set it to 30 minutes to 1 hour. Establish a connection channel between the middleware and the platform: Through the interfaces provided by the 3D engine SDK and the water conservancy model API, realize bidirectional communication connection between the middleware and the 3D rendering engine and the water conservancy mechanism model, and test the connection stability to ensure no packet loss and that latency is within an acceptable range.
[0035] Choose a suitable middleware framework and complete the basic configuration: Establish a tool-function mapping table: clearly define the correspondence between each virtual tool and the underlying function in tabular form. Core fields include: Virtual tool ID, Virtual tool name, Full path of the underlying function, Call type (synchronous or asynchronous), Parameter mapping relationship, and Return result mapping relationship. Define parameter mapping details: For water conservancy professional parameters such as return period probability, water level values, spatial parameters such as latitude and longitude, elevation angle, and time parameters such as next year's flood season, formulate specific mapping rules: Semantic-numerical mapping converts natural language related parameters, such as once-in-a-century events, into values that the underlying function can recognize, such as 0.01; Entity-identifier mapping converts geographical entity names, such as Reservoir A, into unique identifiers supported by the underlying function through GIS database queries, such as BaseinID=1001; Format-type mapping converts JSON format virtual tool parameters into data types supported by the underlying function, such as int, float, and structure. Define return result mapping rules: unify the format of the output results of the underlying functions, and convert the native output of different types of functions, such as C++ structs, Python dictionaries, and engine callback data, into a standardized JSON format, including execution status, result data, and error information if applicable, to ensure that the upper-level module instruction parsing and scenario updates can be uniformly identified.
[0036] Develop middleware adaptation interfaces and binding logic: Develop virtual tool adaptation interfaces: For each tool in the "Virtual Tool List," develop corresponding adaptation interfaces in the middleware. These interfaces must strictly adhere to the parameter requirements defined by DDS, implementing the function of receiving and parsing virtual tool call commands. Develop low-level function call interfaces: Based on the "Low-Level Function List," develop corresponding call interfaces for different types of low-level functions, such as engine functions written in C++ and hydraulic model functions written in Python. Support synchronous / asynchronous call modes. Synchronous call interfaces are suitable for fast-response operations such as perspective switching and weather adjustment, achieving real-time interaction of call-wait-return. Asynchronous call interfaces are suitable for long-running operations such as flood evolution simulation, supporting callback mechanisms and providing real-time feedback on execution progress. Long-running operations are defined as those with an expected execution time exceeding 10 seconds. Minute-long operations require dedicated computing resources; implementation of binding logic code development: write mapping rule execution code in the middleware to automate the entire process of virtual tool call command → adaptation interface reception → parameter mapping conversion → underlying function call → return result encapsulation → result feedback; integration of parameter verification pre-processing logic: embed a parameter verification module in the binding logic to verify the completeness and validity of virtual tool parameters before calling, such as whether the geographic entity exists in the GIS database and whether the water level is within the valid value range. If the verification passes, the mapping and call are executed; if it fails, an error message is returned directly.
[0037] Constructing a mapping library index and caching mechanism: Establishing a mapping relationship index: Based on the "Tool-Function Mapping Table," a memory-level index is built in the middleware, using the virtual tool ID as the key to quickly locate the corresponding underlying function information, parameter mapping rules, and calling interfaces, improving call routing efficiency; Designing a caching mechanism: Frequently called virtual tool-underlying function mapping relationships and commonly used parameter mapping rules, such as recurrence period enumeration values, are cached in the middleware memory, reducing database query and rule parsing time and improving mapping response speed; Supporting hot index updates: A dynamic index update interface is designed. When the DDS specification adds / modifies virtual tools or upgrades / replaces underlying functions, the mapping relationship index can be updated in real time through the interface without restarting the middleware service.
[0038] Testing, Verification, and Optimization Iteration: Unit tests are conducted individually on the binding logic between each virtual tool and the underlying function to verify the accuracy of parameter mapping, call success rate, and completeness of return results, ensuring that no individual binding relationships are abnormal. Integration tests simulate tool call scenarios after natural language command parsing, testing the collaborative working effect of the API mapping library, command parsing module, and 3D platform, verifying the mapping accuracy and process continuity of complex commands such as multi-subtask toolchain calls. Stress tests are conducted for high-frequency call scenarios and long-duration function call scenarios to verify the concurrent processing capability and stability of the mapping library, such as the response time and absence of crashes when simultaneously calling the flood simulation tool 100 times. Optimization Iteration: Based on the test results, mapping rules are optimized, such as adjusting parameter mapping logic to improve accuracy; middleware configuration is optimized, such as timeout thresholds and caching strategies; and interface performance is improved, such as reducing redundant data transmission, ensuring that the API mapping library meets the business needs and performance requirements of the water conservancy digital twin platform.
[0039] Documentation and Maintenance Standards: Compile technical documentation for the API mapping library, detailing its architecture design, middleware configuration information, tool-function mapping table, parameter mapping rules, API call descriptions, and test reports to facilitate future maintenance and expansion; establish maintenance standards to clarify the update process of the mapping library, such as the steps for adding mappings when adding new virtual tools, the mapping adjustment process when upgrading underlying functions, the troubleshooting process, such as log queries when calls fail, problem location methods, and version management rules, to ensure the long-term stable operation of the mapping library.
[0040] like Figure 2 As shown, the method of the present invention mainly includes steps S1 to S6.
[0041] S1: Receives natural language commands from the user and extracts key state information of the current three-dimensional digital twin scene of water conservancy as the session context in real time.
[0042] In this step, the tool used to extract key state information of the current three-dimensional digital twin scene of water conservancy is a device or component that is well-known and directly available in the fields of water conservancy digital twins and computer science, as shown in Table 1: Table 1: Tools used to extract key status information of the current 3D digital twin scenario of water conservancy
[0043] This embodiment supports multimodal input. Users can input voice commands via microphone, such as changing the current weather to heavy rain, which are then converted into text by the ASR (Automatic Speech Recognition) module; or they can directly input text in the dialog box.
[0044] Meanwhile, to enable the AI to understand the current 3D scene, the system extracts scene state information (State Context) from the rendering engine in real time through polling or event-triggered mechanisms. This information includes, but is not limited to: Time environment: Current simulation time, such as 14:00 on July 1, 2024, weather conditions (sunny / rainy / snowy), and light intensity.
[0045] Spatial location: current camera's latitude, longitude, altitude, pitch, and yaw.
[0046] Business status: Whether a model such as flood evolution is currently running, or the currently selected geographic entity such as Reservoir A.
[0047] These status information are formatted as part of a system prompt, such as the current scene status: Time = 2024-07-01, Weather = Sunny, Viewpoint = Above the A reservoir dam.
[0048] S2: The natural language instruction and the session context input instruction parsing module.
[0049] The S2 step constructs a system prompt containing system role settings, a description of the current scene state, and historical dialogue records. Then, it combines the natural language command with the system prompt to form a complete input sequence, which is transmitted to the command parsing module built on a large language model. Specifically, this includes: S21: Determine the construction rules and content of system prompt words. The system roles are set as fixed templates, defining the exclusive roles and capability boundaries of the instruction parsing module to ensure that the large model focuses on water conservancy business scenarios. The system roles are fixed as follows: responsible for identifying the user's water conservancy-related business intentions, including scene roaming, information query, simulation and deduction, and environmental control; extracting water conservancy professional parameters; and decomposing tasks based on water conservancy function description specifications and business logic constraints, without responding to irrelevant instructions.
[0050] The current scene state description is in a structured format: key state information stored in step S1 is extracted from the session state memory module and converted into a natural language description in a hierarchical format of "category-parameter-value". For example, the current 3D scene state is as follows: Time environment = simulation time 2024-07-01 15:00, lighting conditions strong light; Weather = rainfall heavy rain, cloud cover 100%; Spatial location = camera longitude 119.3°, latitude 33.6°, elevation 120m, pitch angle 30°; Business status = current running model flood evolution model, selected entity A reservoir, entity water level 35.5m.
[0051] Historical dialogue record filtering: Extract the historical interaction records of the current session from the session state memory module, filter the last 5 rounds or configure the content related to the current command according to actual needs, and integrate them in the format of "user command - system response", such as "user command = simulate a once-in-a-century flood in Reservoir A; system response = flood evolution simulation has been started, current progress is 40%". System prompt word assembly: The above system role settings, current scene status description, and historical dialogue record filtering are assembled in the order of "role settings → current scene status → historical dialogue records → parsing requirements" to form a complete system prompt word. The parsing requirements are fixed as follows: combining the above scene status and historical dialogue, first identify the user's core business intent, and then extract the water conservancy professional parameters that match the intent. The parameters must conform to the water conservancy function description specifications.
[0052] S22: Preprocessing of Natural Language Instructions Instruction format standardization: The natural language instructions received in step S1 are cleaned to remove invalid characters such as special symbols and redundant modifiers, and uniformly converted into UTF-8 encoded text strings; Command validity check: Determine whether the command is related to water conservancy business, such as excluding irrelevant commands like exiting the system or adjusting volume. If the check passes, proceed to the next step; if it fails, return to the feedback prompt asking for water conservancy-related commands and terminate the subsequent process.
[0053] S23: Construct the complete input sequence Combination rules: The fixed combination order of "system prompt words + separators + preprocessed natural language instructions" is adopted; Standardized format: The combined content is encapsulated into a JSON-formatted input sequence. The core fields include system_prompt (system prompt word), user_instruction (preprocessed natural language instruction), and session_id (unique session identifier used to associate with historical states).
[0054] Format validation: Checks the JSON format of the input sequence for validity, such as closed quotation marks and complete fields. If the validation passes, it proceeds to the transmission stage; otherwise, it automatically corrects format errors, such as adding missing quotation marks before transmission.
[0055] S24: Input sequence is transmitted to instruction parsing module Communication method selection: Select the transmission method based on the deployment mode of the instruction parsing module: local deployment / remote service. Local deployment: The input sequence is read directly using a shared memory mechanism, with a transmission latency of ≤10ms; Remote service: Call the dedicated interface of the instruction parsing module (such as / api / parse / input) via HTTP / 2 protocol, and set the timeout to 500ms; Encryption and verification during transmission: The JSON-formatted input sequence is Base64 encoded for encryption during transmission to prevent data leakage. At the same time, a verification code is added based on the MD5 value calculated from the content of the input sequence to ensure that the data is not tampered with during transmission. Receive confirmation: After receiving the input sequence, the command parsing module returns a response signal indicating successful reception. If no response is received (e.g., a timeout), the module will retry the transmission twice. If the retry fails, the module will notify the user that the command parsing module did not respond and that the user should try again later.
[0056] S25: Temporary storage of the input sequence The successfully transmitted complete input sequence is stored in the current input buffer of the session state memory module, associated with the session ID and the transmission timestamp, for backtracking and retransmission when subsequent instruction parsing fails. The storage period is 10 minutes after the current instruction is parsed to avoid consuming too many storage resources.
[0057] S3: Use the instruction parsing module to identify the user's core business intent and extract water conservancy professional parameters.
[0058] The instruction parsing module is built based on the Qwen-Max large-scale language model and adopts an adaptation method that combines domain fine-tuning and prompt word engineering: a water conservancy professional corpus is constructed, containing 50,000 water conservancy business instructions, 30,000 professional term explanations, and 20,000 historical interaction records. The pre-trained model is fine-tuned with an initial learning rate of 1e-5, 50 training rounds, and a batch size of 32. Water conservancy-specific prompt word templates are designed, embedding semantic constraints of professional terms such as "recurrence period," "dam crest elevation," and "dead water level," and clarifying parameter extraction rules and intent classification standards. A professional term vector library is established, and water conservancy-specific vocabulary is embedded into the vector space through Word2Vec model training to improve the model's semantic understanding accuracy of domain terms. The Transformer architecture attention mechanism of the instruction parsing module is used to semantically encode the input sequence. Attention weights are assigned by calculating the dot product similarity between the query vector and the key vector to enhance the semantic representation of water conservancy terminology. This mechanism is based on the mathematical principles of semantic encoding in deep learning. Intent classification is achieved based on the Softmax function, and the classification loss function adopts cross-entropy loss. Iterative optimization is performed until the loss value is lower than 0.01. The optimization objective is based on the mathematical optimization logic of the classification task. The parameter extraction adopts the Conditional Random Field (CRF) algorithm to perform sequence labeling on named entities, improving the extraction accuracy of numerical and geographic entity parameters. This algorithm is a commonly used technique for named entity recognition in the field of natural language processing, and its principle is based on a probabilistic statistical model.
[0059] Steps S2 and S3 constitute the core parsing process. The instruction parsing module is primarily based on pre-trained large-scale language models with strong reasoning capabilities, such as Qwen-Max. The goal of step S3 is to identify the user's core business intent and extract professional parameters that conform to water conservancy business specifications, providing a clear basis for subsequent task decomposition. Step S3 utilizes the semantic understanding capabilities of the instruction parsing module to classify natural language instructions into preset intent categories such as scene roaming, information query, simulation deduction, or environmental control; based on the parameter definitions in the water conservancy function description specifications, it identifies and extracts key entities and values matching the intent from natural language, such as reservoir name, water level elevation, and rainfall recurrence interval level. The system inputs the user's instruction, such as "Simulate the situation of Reservoir A experiencing a once-in-a-century flood," and the context obtained in step S1 into the model. The model first performs intent recognition to determine whether the user's intent belongs to "information query," "scene roaming," or "simulation deduction." Step S3 specifically includes: S31: Preset Intent Category Definition and Priority Configuration The core definitions and boundaries of the four types of intents are clearly defined: Scene roaming involves adjusting the observation perspective or position of a 3D scene, such as flying to the A reservoir dam or zooming in to view the spillway, without involving simulation model operation or parameter modification. Information querying involves obtaining existing scene data or model results, such as querying the current water level of A reservoir or displaying the inundation area of a 100-year flood, without changing the scene state. Simulation simulation involves starting, adjusting, or terminating the hydraulic simulation model, such as simulating a 100-year flood at A reservoir or adjusting the rainfall intensity to 50 mm / h, which will generate new simulation data. Environmental control involves adjusting the environmental parameters of the 3D scene, such as changing the weather to heavy rain or setting the simulation time to next year's flood season, without involving the core calculations of the hydraulic model.
[0060] Set intent recognition priority: Set priorities in the order of simulation simulation > information query > environmental control > scene roaming. When an instruction contains multiple types of intents at the same time, such as simulating a flood and flying to the deepest part of the flood, the high-priority intent is determined as the core intent, and the low-priority intent is treated as a related sub-intent.
[0061] S32: Constructing an Intent-Parameter Mapping Table Based on the Hydraulic Function Description Specification From the water conservancy function description specification constructed in this invention, the intent attribution and parameter requirements of all virtual tools are extracted, and a standardized mapping table is constructed. The core fields are shown in Table 2: Table 2: Intent-Parameter Mapping Table
[0062] The mapping table is stored in the local cache of the instruction parsing module, which supports quick retrieval of associated parameters by intent category, ensuring that parameter extraction is fully aligned with the water conservancy function description specifications.
[0063] S33: Identification and Determination of Core Business Intent Instruction semantic segmentation and keyword extraction: The preprocessed natural language instructions transmitted in step S2 are segmented using a water conservancy-specific word segmentation dictionary, which includes professional terms such as recurrence period, water level, and river basin. Core keywords are extracted, such as the original instruction = "Simulate the once-in-a-century flood in the Yangtze River basin next year" → segmentation result = "simulate, Yangtze River basin, next year's flood season, once-in-a-century, flood" → core keywords = "simulate, Yangtze River basin, once-in-a-century, flood"; Intent matching and core determination: Based on the association rules between keywords and preset intents, if the keywords include "simulation," "deduction," or "run," then "simulation deduction" is matched; if the keywords include "query," "display," or "view," then "information query" is matched, thus initially identifying candidate intents. Based on intent priority, determine the core business intent: if the candidate intent includes multiple ones, such as simulating a flood and querying the inundation range simulation and information query, the simulation is determined to be the core intent and the information query is the related sub-intent according to priority. Output core intent identifiers such as "simulation" to correspond to simulation deduction and associated sub-intents, ensuring that subsequent parameter extraction focuses on core requirements.
[0064] S34: Extraction and Verification of Water Conservancy Professional Parameters Parameter extraction logic: Based on the core intent, determine the types of parameters to be extracted from the intent-parameter mapping table, including required parameters and optional parameters, and then extract the corresponding parameters from the instructions. Required parameters must be extracted: If no parameters are extracted, such as when the simulated flood does not mention the name of the watershed, a follow-up question will be generated, such as "Please specify the name of the watershed to be simulated", and feedback will be given to the user and the subsequent process will be paused. Optional parameters can be extracted as needed: If not mentioned in the instructions, the default values defined in the water conservancy function description specifications will be used, such as the simulation duration being 24 hours by default; Example of parameter extraction: Command = "Simulate a once-in-a-century flood in the Yangtze River basin next year" → Core intent = Simulation and deduction → Associated tool = simulate flood evolution → Required parameters = Basin name (extract a certain basin), Rainfall recurrence period (extract once-in-a-century) → Optional parameters = Simulation duration (default 24 hours), Initial water level (default current water level); Preliminary parameter validation: Type validation checks whether the data type of the extracted parameters conforms to the specifications (e.g., the recurrence period should be a string indicating a once-in-a-century occurrence or a numerical value of 0.01). Relevance validation checks whether the parameters match the core intent (e.g., the flight speed parameter should not appear in the simulation intent). Result processing: If the validation passes, proceed to the next step; if it fails, the reason for the error is reported (e.g., the flight speed parameter does not match the simulation intent).
[0065] S35: Parameter Standardization and Structured Output The extracted and verified parameters will be standardized according to the requirements of the water conservancy function description specification: Semantic conversion: Parameters described in natural language are converted into standardized expressions, such as "next year's flood season" → "2025-06-01 to 2025-09-30", and "once in a century" → "once in 100 years". Format standardization: All parameters are encapsulated in JSON format, with core fields including core intention, related intention, required params, and optional params. The structured parameters are then transmitted to the S4 step and synchronously stored in the session state memory module for subsequent toolchain generation and traceability.
[0066] S36: Special handling for multiple intents or / and multiple parameters Multi-intent processing: When an instruction contains related sub-intents, such as simulating a flood and flying to the deepest point of the flood, while outputting the core intent and parameters, the related sub-intents and required parameters are marked, such as the target location required by the sub-intent scene roaming, and the results of the simulation are marked. Multi-parameter conflict handling: When multiple parameters extracted from the command have logical conflicts, such as simulating a 50-year flood and a 100-year flood in the Yangtze River Basin, the principle of priority of the parameter mentioned later shall be used to determine the conflict, or a conflict prompt shall be fed back. For example, if the recurrence period parameter in the command conflicts, please specify a single recurrence period. Fuzzy parameter handling: When the parameter description is vague but can be supplemented by the context, such as the instruction to simulate the flood of the basin, combined with the scene status of step S1, the currently selected entity = Yellow River Basin, the parameter is automatically supplemented, such as basin_name = Yellow River Basin, and the output is noted that the parameter comes from the currently selected entity in the scene.
[0067] The pre-trained large-scale language model extracts water conservancy-related parameters from instructions. In this embodiment of the invention, the model identifies the geographical entity as "Reservoir A" and the simulation condition parameter as "once-in-a-century". Thanks to the definition in the Water Conservancy Function Description Specification 1, the model can understand that "once-in-a-century" is a key input parameter, rather than an ordinary adjective.
[0068] S4: Based on preset business logic constraints, the core business intent is decomposed into a sequence of atomic operations containing logical dependencies, and an executable toolchain is generated in conjunction with the water conservancy function description specification.
[0069] This is a key step in demonstrating the "intelligent agent" characteristics of this invention. For complex instructions, such as "simulate a flood and take me to see the deepest part of the flood," a single API call cannot complete the task. The instruction parsing module decomposes the task according to preset business process constraints: Task breakdown: The model determines that the instruction contains two subtasks: (1) Run a flood simulation; (2) Move the camera's perspective.
[0070] Dependency analysis: The model deduces through logical reasoning that the results of the flood simulation (i.e., the generation of inundation depth data) are required before the "deepest inundation point" can be found. Therefore, task (1) is a prerequisite for task (2).
[0071] Toolchain generation: The model selects the corresponding tools from the API mapping library to generate the following toolchain: Op1: call tool 'simulate_flood(return_period="100 years")' Op2: Call the tool 'get_max_inundation_location()' (depends on the output of Op1) Op3: call tool 'fly_to_location(target=Op2.result)' Step S4 utilizes the reasoning capabilities of the instruction parsing module to determine whether a natural language instruction contains multiple subtasks. If so, the execution order is determined according to business process constraints. Then, matching functional interfaces are retrieved from the API mapping library and assembled in sequence to generate an executable toolchain. The business process constraints include the pre-dependencies between water conservancy simulation calculation and 3D visualization rendering.
[0072] Step S4 involves breaking down the abstract core business intent into an executable sequence of tool calls with logical dependencies, ensuring precise adaptation to the water conservancy business processes and API mapping library. The specific steps are as follows: S41: Define atomic operations and business process constraints Atomic operation definition: An atomic operation is a complete call to a single virtual tool in the API mapping library. It is indivisible and its core elements include: virtual tool ID, input parameter set, execution priority, and output data type. Improve business process constraint rules: Pre-dependent constraints: Simulation calculation operations, such as flood simulation, are prerequisites for data-dependent operations, such as querying the inundation range and jumping the view to the inundation area; environmental adjustment operations, such as setting rainstorms, can be executed in parallel with scene roaming operations, such as flight view. Execution order constraints: Serial execution requires sequential dependencies, such as simulation → query → roaming; parallel execution has no dependencies and can be executed synchronously, such as setting weather + adjusting lighting. Resource usage constraints: Long-running operations, such as simulations exceeding 10 minutes, are allocated independent computing resources to avoid affecting the response speed of real-time operations, such as viewpoint switching.
[0073] S42: Subtask Identification and Decomposition Subtask determination rules: Based on the core intent and associated sub-intents output by step S3, if there are operations corresponding to different intents, such as core intent simulation and deduction + associated sub-intent information query, it is determined to be multiple subtasks. Under a single intent, if multiple virtual tools need to be called to complete the task, such as flying to the target location and adjusting the viewpoint height in scene roaming, it is also determined to be multiple subtasks. Subtask breakdown process: Based on the structured output of step S3, extract the core intent and the corresponding operation objectives of related sub-intents. For example, the instruction to simulate a once-in-a-century flood in the Yangtze River basin and fly to the deepest flooded area has the core intent of simulation and the operation objective of generating flood data. The related sub-intent is information query, and the operation objective of obtaining the maximum flooded location. The operation objective of scene roaming is to jump to the target location. Based on the principle of one subtask corresponding to one operational objective, a list of subtasks is obtained: Subtask 1: flood simulation; Subtask 2: querying the maximum flooding location; Subtask 3: perspective jump.
[0074] S43: Subtask Logical Dependency Analysis and Execution Order Sorting Dependency type identification: Data dependency means that the later subtask needs to use the output data of the previous subtask, such as subtask 2 needing flood data from subtask 1, and subtask 3 needing location data from subtask 2; No dependency means that there is no data relationship between subtasks, and they can be executed in parallel, such as setting the weather and adjusting the camera tilt angle. Execution order determination: Serial sorting arranges subtasks with data dependencies in reverse order from dependent to dependent, such as subtask 1 → subtask 2 → subtask 3. Parallel sorting marks subtasks without dependencies as a parallel group, which can be triggered synchronously. For example, subtask A sets up a rainstorm, and subtask B adjusts lighting; these are marked as a parallel group and executed simultaneously. Output sorting results: Generate a list of subtasks including execution order, dependencies, and priorities.
[0075] S44: Subtask and API mapping library functional interface matching Interface matching rules: Based on the operation goal and intent category of the subtask, a unique matching virtual tool is retrieved from the API mapping library. If the intent is simulation, then the simulation tool is matched. If the operation goal is flood evolution, then the simulate flood evolution tool is matched precisely. Matching verification: If multiple candidate tools are retrieved, such as "query data" corresponding to multiple query tools, they are filtered according to the similarity between the tool function and the sub-task target ≥90%. The similarity is calculated by semantic matching of the functional explanation in the water conservancy function description specification. Interface parameter binding: The water conservancy professional parameters extracted in step S3 are bound to the matching virtual tools according to the required / optional parameters in the water conservancy function description specification. For example, subtask T1 is bound to the simulate flood evolution tool, with parameters such as basin name = a certain basin and return period = once in 100 years.
[0076] S45: Assembly and formatting of the executable toolchain Toolchain structure definition: The toolchain is an ordered array in JSON format, with each element corresponding to an atomic operation virtual tool call. The core fields include: atomic operation ID, virtual tool ID, parameter set, dependent atomic operation ID, execution type, and timeout threshold. Parallel operation assembly: If parallel subtasks exist, mark them as parallel group nodes in the toolchain; Toolchain Validation: Checks if all virtual tool IDs in the toolchain exist in the API mapping library, if dependencies form a closed loop (e.g., no circular dependencies), and if parameters are complete. If the validation passes, proceed to the next step; otherwise, it returns a toolchain assembly failure message with the specific reason, such as the virtual tool ID not existing.
[0077] S46: Toolchain Storage and Indexing The formatted toolchain is associated with the session ID and core intent identifier and stored in the toolchain cache module. It is persistently stored in JSON file format and cached in memory to improve the efficiency of subsequent calls. Establish a toolchain index: using the session ID plus the core intent as the index key, it supports fast query and tracing. The index information includes the toolchain length, estimated execution time, and core dependencies, which facilitates the rapid identification of long-running operations and dependencies when S6 steps are scheduled for execution.
[0078] The toolchain dependency analysis employs a Directed Acyclic Graph (DAG) construction method: a unique node ID is assigned to each subtask, and a node association matrix is constructed based on parameter dependencies. A matrix element of 1 indicates the existence of a dependency, and 0 indicates no dependency. The matrix construction rules refer to common standards in the field of data dependency analysis. A topological sorting algorithm (Kahn's algorithm) is used to sort the nodes, ensuring that the processing order conforms to the rule of "data-dependent subtasks are chained together, and non-dependent subtasks are chained in parallel." This algorithm is commonly used in directed graph sorting and can effectively avoid circular dependencies. Nodes without dependencies are marked as parallel groups and executed in parallel using a thread pool. The number of threads is dynamically allocated according to the number of CPU cores, with a maximum concurrency of no more than 8. This configuration is based on the balance between mainstream server hardware configuration and task execution efficiency. Circular dependency detection uses a depth-first search (DFS) algorithm. If a circular dependency is detected, an error feedback is generated, prompting the user to split conflicting instructions. The detection logic is designed based on the principle of circular path identification in graph theory. The topological sorting algorithm (Kahn's algorithm) and the depth-first search (DFS) algorithm are commonly used techniques in the field and will not be elaborated upon here.
[0079] S5: Perform business logic validation and format conversion on the parameters in the executable toolchain of step S4.
[0080] In order to prevent AI from making erroneous API calls due to "illusions" after the large model generation toolchain, such as generating a non-existent reservoir name or setting a negative water level, this invention introduces a deterministic parameter verification module.
[0081] This module checks whether the extracted parameters are within the valid range defined by DDS. For example, it checks whether "Reservoir A" exists in the system's GIS database. If the verification passes, the module will also perform formatting conversion, converting the natural language parameters into a format that the underlying code can recognize. For example, it converts "once in a century" into the model input parameter p=0.01, or converts the time description "tomorrow afternoon" into a specific timestamp.
[0082] The parameter formatting conversion uses the following specific algorithms: Recurrence period conversion: Based on the hydraulic formula P=1 / T (P is the exceedance probability, T is the recurrence period), "once in a century" is converted to P=0.01, "once in 50 years" is converted to P=0.02, and the fuzzy expression "approximately once in a century" is processed as P=0.01±0.002; Time conversion: "next year's flood season" is automatically mapped to June-September based on the basin's historical flood season data, converted to a timestamp range [1746336000, 1756972800], and "heavy rain period" is converted to a continuous 24-hour time window; Geographic entity conversion: Through spatial index query in the GIS database, "a certain reservoir" is converted to a unique identifier ID, while verifying whether the entity is located within the target basin boundary. The R-tree index algorithm is used to improve query efficiency, with a query time ≤50ms. The R-tree index algorithm is an existing technology for geospatial queries and will not be elaborated upon here.
[0083] Step S5 performs business logic verification and formatting transformation on the water conservancy professional parameters in the executable toolchain generated in step S4, specifically including the following sub-steps: S51: Parameter Validation Preparation and Validation Dimension Definition The preset "Specification of Effective Value Range for Water Conservancy Professional Parameters" is retrieved from the session state memory module. This specification contains the verification standards for all water conservancy-specific parameters. For example, numerical parameters include water level, rainfall, and return period probability. The effective value range for these numerical parameters is clearly defined. For example, water level values must meet the requirement of "dead water level ≤ water level ≤ dam crest elevation", and rainfall must meet the requirement of "0 mm / h ≤ rainfall ≤ 500 mm / h". Geographic entity parameters, such as reservoir name, basin name, and geographic coordinates, are clearly verified based on the system's built-in GIS database, such as the entity directory and spatial range of PostGIS. For example, the reservoir name must exist in the "Water Conservancy Projects - Reservoirs" data table in the GIS database, and the geographic coordinates must fall within the spatial boundary of the target basin.
[0084] Extract the parameter sets of each atomic operation in the executable toolchain, classify and organize them into "numerical type, geographic entity type, time type, and enumeration type", generate a list of parameters to be verified, and associate the list with the atomic operation ID, parameter name, parameter original value, and the tool ID to which the parameter belongs.
[0085] S52: Multi-dimensional business logic validation execution. Validate each type of parameter one by one in the following order. If all dimensions pass the validation, the parameter is considered valid; if any dimension fails the validation, the parameter is considered invalid. Integrity verification: Check whether the necessary parameters of each atomic operation are missing according to the definition of necessary parameters in the water conservancy function description specification. For example, if the necessary parameters of the flood evolution simulation tool, such as the basin name and rainfall recurrence period, are not extracted, the parameters are directly determined to be invalid. Validity verification: For numerical parameters, the original value is verified to be within the range specified in the "Specification for Valid Value Range of Water Conservancy Professional Parameters". For example, if the water level value is 35m, and the corresponding dead water level of the reservoir is 10m and the dam crest elevation is 40m, then the verification passes; if the water level value is 45m, the verification fails. For geographic entity parameters, the existence of the parameter is verified through the GIS database query interface. For example, if a query is performed to see if a reservoir exists in the database, the verification passes if it returns that the reservoir exists. Consistency check: Check whether the associated parameters of cross-atomic operations in the same toolchain are logically consistent. For example, if atomic operation 1 sets the rainfall parameter of rainstorm weather to 100mm / h, and atomic operation 2 simulates the rainfall return period of flood evolution with a rainfall of 80mm / h, the parameter is deemed invalid if the two conflict. Relevance verification: Verify whether the parameter is related to the core business intent and the current scenario status. For example, in the H River Basin scenario, if the parameter is a Yangtze River reservoir that is not related to the H River Basin, the relevance verification will fail.
[0086] S53: Validation result processing. If validation passes, proceed to the parameter formatting and conversion stage to execute S54. If validation fails, generate structured error feedback information, including: atomic operation ID, invalid parameter name, error type (e.g., value out of domain, geographic entity not found, parameter missing), error reason (e.g., water level 35m exceeds the dam crest elevation of a reservoir by 30m, a watershed does not exist in the GIS database), and correction suggestions (e.g., please enter a water level value within the range of 10-30m, please select a watershed name that exists in the database). Transmit the error feedback information to the execution scheduling and feedback module, which outputs it to the user via text or voice. Terminate the execution of the currently executable toolchain, and store the error information and toolchain number in the system log for subsequent problem tracing.
[0087] S54: Parameter Formatting and Conversion. For parameters that have passed verification, standardized format conversion is performed according to the input requirements of the underlying API of the water conservancy digital twin platform to ensure that the parameters can be directly called by the underlying functions. The specific conversion rules are as follows: Numerical parameter conversion: converting fuzzy numerical descriptions to precise values, such as converting high water levels to 25m based on the normal water level of the corresponding reservoir, and large rainfall to 150mm / h based on the basin-wide rainstorm standard; converting percentage / probability descriptions to decimals, such as converting a 100-year return period to 0.01, and 50% rainfall intensity to 0.5; and unifying unit conversions, etc. Geographic entity parameter conversion: Entity name is converted to a unique identifier ID, such as a reservoir queried from a GIS database, which is converted to ReservoirID=2001; Natural language spatial description is converted to coordinate values, such as "top of the reservoir dam" is converted to latitude and longitude (119.32°, 33.65°) + elevation 40m; the middle and lower reaches of the watershed are converted to the boundary coordinate set {(x1,y1),(x2,y2),...}). Time-related parameter conversion: Fuzzy time descriptions are converted to precise timestamps, such as converting next year's flood season to 2025-06-01 00:00:00 to 2025-09-30 23:59:59; time formats are uniformly converted to Unix timestamps, such as 2025-07-15 14:30:001752609000; Enumeration class parameter conversion: Natural language descriptions are converted into enumeration codes supported by underlying functions, such as rainstorms are converted into rainstorms, and flood evolution is converted into floods. Post-conversion validation: Verify whether the format of the converted parameters fully matches the requirements of the underlying API, such as data type int / float / string and field length meeting the limit. If the converted parameters do not meet the requirements, return a parameter conversion failure feedback and terminate the toolchain execution.
[0088] S55: Encapsulation and transmission of converted parameters All transformed parameters are encapsulated according to atomic operation dimensions to generate a "standardized parameter package" corresponding to the executable toolchain. The parameter package adopts JSON format and contains the following fields: atomic operation ID, tool ID, standardized parameter set, transformation timestamp, and verification pass indicator. By using middleware interfaces such as FastMCP, standardized parameter packages can be bound to the original executable toolchain to form an executable toolchain with standardized parameters. The bound toolchain is transmitted to the execution scheduling and feedback module, and simultaneously stored in the verified parameter cache of the session state memory module for a period of 30 minutes after the toolchain is executed, for use in execution backtracking and re-execution.
[0089] S6: Execute the executable toolchain to drive the real-time update of the water conservancy 3D digital twin scene and provide feedback on the execution results.
[0090] Finally, the generated toolchain is executed through a Multimodal Service Scheduler (MCP Scheduler).
[0091] For time-consuming operations such as hydrodynamic model calculations, the scheduler will start asynchronous tasks and display a progress bar on the front-end interface.
[0092] For real-time operations such as perspective switching, the scheduler directly calls the 3D engine interface to smoothly move the camera.
[0093] Step S6 executes the executable toolchain verified and transformed in step S5, driving the real-time update of the water conservancy 3D digital twin scene and synchronously feeding back the execution results. This includes the following sub-steps: S61: Toolchain Scheduling Preparation and Execution Strategy Configuration The executable toolchain with standardized parameters is read from the session state memory module. The core attributes of the toolchain are parsed, including the number of atomic operations, the execution type of each operation (synchronous or asynchronous), and dependencies. Execution scheduling strategies, such as resource allocation strategies, are configured. High-priority operations are prioritized for independent computing threads (e.g., ≥2 CPU cores, ≥4GB memory) to avoid resource contention with low- and medium-priority operations such as perspective switching. Low- and medium-priority operations share a thread pool with a maximum concurrency of 10. Dependency execution strategies execute sequential operations according to the atomic operation IDs in the toolchain; the next operation is triggered only after the previous operation completes and returns a success status. Parallel operations are started in parallel through the thread pool, with the progress of each operation monitored synchronously. Components required for scheduling are initialized: the Multimodal Service Scheduler (MCP Scheduler), scene update monitoring module, and result feedback generation module are started. Communication connections are established between the scheduling unit and the 3D rendering engine and hydraulic mechanism model, using the gRPC protocol.
[0094] S62: Scheduled execution of executable toolchains The multimodal service scheduling unit executes the toolchain according to the following process: traversing atomic operations in the toolchain, generating an execution sequence queue according to dependencies, marking the execution order of serial operations such as Op1→Op2→Op3, and the parallel groups of parallel operations such as OpA and OpB; synchronous operation execution: for atomic operations marked as synchronous, such as view switching and weather adjustment, the corresponding underlying interface of the 3D rendering engine is directly triggered through local function calls, blocking the current thread until the operation is completed and the result is returned; asynchronous operation execution: for atomic operations marked as asynchronous, such as flood evolution simulation and rainfall runoff calculation, an independent asynchronous task is created through a thread pool, a standardized parameter package is passed in, the main thread is released after the task is started, and the execution status is fed back by the task callback function.
[0095] Execution process monitoring: Status monitoring uses callback interfaces of the 3D rendering engine SDK and status feedback APIs of the hydraulic mechanism model to obtain the execution status of each atomic operation in real time, such as not started / in execution / success / failure; Progress monitoring: For asynchronous high-level operations, a separate progress monitoring thread is started to query the task execution progress at 100ms intervals, such as the number of iterations of simulation calculation and the amount of data processed. The progress data is stored in the format of percentage completed and description of the current stage; Error interception: If an error code is returned during the operation execution process, such as API call failure or parameter format mismatch, the error is immediately intercepted and recorded, and subsequent operations are terminated according to the dependency relationship.
[0096] S63: Real-time updating of 3D digital twin scenes for water conservancy projects Scene update triggering mechanism: Synchronous operation triggering means that after a synchronous operation such as FlyTo viewpoint switching is completed, the 3D rendering engine's instant update interface is triggered directly, without waiting for other operations; Asynchronous operation triggering means that asynchronous operations such as flood simulation are triggered in a phased update and final update mode. Phased updates, such as updating dynamic data in the scene, such as water level height and flooded area boundaries, are triggered synchronously every time 10% of the simulation progress is completed. The final update triggers a full scene rendering refresh after the operation is completed; Data transmission protocol: Update data is compressed and transmitted in Protocol Buffers (Protobuf) format, including scene update type such as viewpoint / weather / water body / terrain / entity status, update data content such as camera coordinates, water body material parameters, flooded area vector data, and timestamps.
[0097] Scene update implementation details: Viewpoint updates utilize the camera control interface of the 3D rendering engine, such as UE5's SetActorLocationAndRotation, to smoothly adjust the camera position based on standardized parameters like latitude, longitude, elevation, and pitch. Linear interpolation is used with a 500ms movement time to avoid abrupt viewpoint changes. Environment updates involve setting meteorological parameters like rainfall and cloud cover through the weather system interface, triggering / adjusting particle effects for rain and snowfall, and simultaneously updating light intensity and skybox materials. Water body updates are based on water level data output from the hydraulic model, using the water body rendering interface to adjust the water height field. The material shader updates water transparency in real-time, with lower transparency as the submerged depth increases, and dynamically loaded flow field textures such as water velocity vector maps are loaded simultaneously. Terrain and entity updates are based on submerged area data, using the terrain shading interface to mark submerged areas and updating the status identifiers of hydraulic engineering entities such as dams and spillways.
[0098] S64: Execution Result Feedback and Session State Update Standardized integration of results data: Successful scenarios: Integration by operation-level results and overall results format. Operation-level results include the ID of each atomic operation, execution status, execution time, output data such as simulation result file path, and coordinates after view update; overall results include the total execution time of the toolchain and the scene update completion status; Failed scenarios: Integration of error information, including failed operation ID, error type, error reason, and recovery suggestions.
[0099] Multimodal feedback output: Text feedback includes generating standardized JSON format text results, which are displayed through pop-up windows on the platform's interactive interface; Voice feedback involves synthesizing text results into speech (TTS) and broadcasting them in natural language, such as "Toolchain execution successful," "Completion of the 100-year flood simulation and scenario update for Reservoir A," etc.; Progress feedback is for long-running asynchronous operations, displaying a progress bar and a description of the current stage on the front-end interface.
[0100] Session state update: Store the toolchain execution results and the latest state after scene update, such as the current camera coordinates, weather parameters, and simulation working condition status, into the session state memory module, overwriting the original scene state information.
[0101] S65: Abnormal Scenario Handling and Backup Mechanism Toolchain execution failure: If a single atomic operation fails but does not affect other operations, such as the failure of weather adjustment in parallel operations, then the operation is marked as failed, and other operations continue to be executed. The feedback result will indicate that part of the operation failed. If a core operation, such as the simulation calculation in a series operation, fails, then the entire toolchain execution will be terminated, and a complete error message will be fed back. Scene update timeout: If no update completion callback is received from the rendering engine within 5000ms after the scene update is triggered, it is determined that the update has timed out. The update will be automatically retried once. If the retry fails, a scene update timeout will be reported. Please refresh the page to try again. At the same time, the update type and trigger data of the timed-out scene will be recorded.
[0102] like Figure 1 As shown in the figure, this invention also provides a natural language command parsing and 3D scene control system for a water conservancy digital twin platform. Its logical architecture adopts a top-down, progressive, and bidirectional interactive four-layer design. Data flow and functional collaboration between each layer and internal modules are achieved through standardized interfaces. The specific logical relationships are as follows: The Water Resources Digital Twin Functional Encapsulation Layer serves as the underlying foundation and capability support layer of the entire system. Its core comprises two modules: a water resources functional description specification module and a functional API mapping library. The water resources functional description specification module defines function templates, parameter types, and validation rules specific to water resources operations, providing a unified semantic standard for upper-layer modules. The functional API mapping library, through middleware interfaces, binds the virtual tools defined in the specification to the underlying functions of the 3D rendering engine and water resources mechanism model, transforming business functions into executable code. The output of this layer is a standardized tool call interface, a fundamental prerequisite for upper-layer modules to implement scene control; the execution of all upper-layer module functions depends on the encapsulation capabilities provided by this layer.
[0103] Command Reception and State Management Layer: As the system's perception end, it undertakes user interaction and environmental perception functions, including a multimodal command reception module and a 3D scene state extraction module. Its logical input comes from user operations and real-time 3D scene data, and its output is structured commands plus contextual data, directly connecting to the next level. The multimodal command reception module receives user requests through interfaces such as voice and text and converts them into standardized text; the 3D scene state extraction module synchronously collects real-time state information such as time environment, spatial location, and business operation from the digital twin platform and stores it in the session state memory module. The two work together to form complete interactive input data, providing dual evidence of user intent and scene background for command parsing, realizing the linkage of understanding commands and understanding the environment.
[0104] The instruction parsing and toolchain generation layer takes as its logical input the instructions passed from the previous layer plus structured contextual data. Its core module is an instruction parsing module built on a Large Language Model (LLM), which also integrates sub-modules such as parameter extraction and task decomposition. This layer uses the semantic understanding capabilities of LLM to identify the user's core business intent and extracts professional parameters based on the water conservancy function description specifications. Then, leveraging the reasoning capabilities of LLM, it decomposes complex intents into sequences of atomic operations with logical dependencies, matches corresponding interfaces from the function API mapping library, and assembles them into an executable toolchain. Its output is a toolchain carrying standardized parameters, serving as the hub connecting perception and execution, enabling intelligent transformation between thought intent and task orchestration. Furthermore, this layer needs to call the specifications and mapping library of the water conservancy function encapsulation layer in real time during the parsing process to ensure the professionalism and executability of the parsing results.
[0105] 3D Scene Execution and Feedback Layer: As the system's execution end, its logical input is the executable toolchain generated by the previous layer. The core includes the MCP service scheduling module and the execution feedback module, directly connecting to the 3D rendering engine and hydraulic mechanism model of the water conservancy digital twin platform. The MCP service scheduling module, based on the toolchain's logical dependencies (e.g., serial or parallel connections and resource occupancy levels), calls the underlying APIs asynchronously or synchronously to drive the 3D scene to perform operations such as perspective adjustment, environment updates, and simulation. The execution feedback module monitors the execution progress and results in real time. On one hand, it synchronizes the scene update status to the session state memory module of the instruction receiving and state management layer to complete context updates; on the other hand, it provides feedback to the user through voice, text, etc. The output of this layer is the real-time scene update effect and user feedback information, forming a closed loop of execution-feedback-context update. Simultaneously, its execution status provides data support to the upper-layer modules, such as the updated scene state for the next round of instruction parsing.
[0106] The core logical relationship of the four-layer architecture of this invention is as follows: the functional encapsulation layer provides basic capability support; the instruction receiving and state management layer collects input and environmental data; the instruction parsing and toolchain generation layer completes intelligent decision-making; the 3D scene execution and feedback layer implements operations and performs closed-loop interaction; the execution scheduling and feedback layer transmits tool call requests, including virtual tool IDs and standardized parameter packages, to the functional encapsulation layer; the functional encapsulation layer returns call results, including execution status and simulation data / rendering instructions, to the execution scheduling and feedback layer; the interaction frequency is synchronized according to the execution rhythm of atomic operations; the triggering condition is the completion of tool call request reception / generation of execution results; the functional progression between each layer is achieved through unidirectional flow of input-processing-output; and the context linkage across layers is achieved through the session state memory module to ensure the coordination and interaction continuity of the entire system.
[0107] This invention fully leverages the powerful semantic understanding and logical reasoning capabilities of modern large-scale models, deeply integrating them with professional water conservancy digital twin operations. Through standardized functional encapsulation and intelligent instruction parsing, it successfully solves the problems of complex operation and rigid interaction in traditional water conservancy digital twin platforms, providing a new, efficient, and intelligent solution for the digital transformation of the water conservancy industry.
[0108] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for language instruction parsing and scene control for a digital twin platform, the method being based on a pre-constructed water conservancy function description specification that encapsulates the underlying functional functions of the water conservancy digital twin platform into a standardized API mapping library, characterized in that... The method includes: S1: Receive user natural language commands, extract the current state information of the water conservancy 3D digital twin scene in real time through the data interface of the 3D engine, and store the state information in a structured manner as the session context; S2: Input the natural language instruction and the conversation context into the instruction parsing module. The instruction parsing module is built based on a large language model and has semantic understanding and logical reasoning capabilities. S3: Utilizing the semantic understanding capability of the instruction parsing module, based on the constructed water conservancy function description specification, identify the user's core business intent, and extract water conservancy professional parameters matching the intent from natural language; S4: Based on preset business logic constraints, the core business intent is decomposed into a sequence of atomic operations containing logical dependencies. By analyzing the data dependencies between subtasks and determining the serial and parallel execution order, an executable toolchain is constructed using an ordered array in JSON format, and the parallel group nodes of parallel operations are marked. S5: Perform business logic validation and format conversion on the parameters in the executable toolchain described in step S4; S6: After completing parameter verification and format conversion in step S5, execute the executable toolchain, establish a progress monitoring thread for the time-consuming water conservancy simulation calculation task and display the progress, drive the water conservancy 3D digital twin scene to be updated in real time, and provide feedback on the execution results.
2. The language instruction parsing and scene control method for a digital twin platform according to claim 1, characterized in that: The water conservancy function description specification includes a structured function description template, which defines the function name, function explanation, required parameters, optional parameters and parameter data types. The parameter data types cover water conservancy business types.
3. The language instruction parsing and scene control method for a digital twin platform according to claim 1, characterized in that, In step S2, the system prompts include system role settings, current scene status descriptions, and historical dialogue records. After natural language instruction preprocessing, the system prompts are encapsulated together as a JSON format input sequence. The input sequence is transmitted to the instruction parsing module via local memory sharing or HTTP / 2 protocol.
4. The language instruction parsing and scene control method for a digital twin platform according to claim 1, characterized in that, Step S3 includes: S31: Predefine the preset intents, including the definitions and boundaries of scene roaming, information query, simulation, and environmental control, and set the priority of each intent recognition; S32: Based on the constructed water conservancy function description specification, extract the intent attribution and parameter requirements of all virtual tools from the specification, construct an intent-parameter mapping table and store it in the local cache of the instruction parsing module; S33: Perform semantic word segmentation on the preprocessed natural language instructions transmitted in step S2, extract core keywords using a word segmentation dictionary specifically for the water conservancy field, and preliminarily identify candidate intentions by combining the association rules between keywords and preset intentions; S34: Based on the business intent, determine the parameter types to be extracted from the S32 step intent-parameter mapping table, and extract key entities and values from the natural language instructions in a targeted manner; S35: Validate the parameters extracted in step S34, including checking whether the parameter data type conforms to the specification and whether it is associated with and matches the intent. After the validation is passed, convert the parameters described in natural language into a standardized expression, then encapsulate them into structured parameter data in JSON format and store them in the session state memory module.
5. The language instruction parsing and scene control method for a digital twin platform according to claim 1, characterized in that, The S4 step includes: Pre-build an atomic operation definition and business process constraint system: clearly define an atomic operation as a complete call to a single virtual tool in the API mapping library, and set business process constraints; Based on the core business intent and related sub-intents output by the S3 steps, identify and split multiple sub-tasks. If the intent contains multiple operation goals or a single intent needs to call multiple virtual tools, split it according to the principle of one operation goal corresponding to one sub-task to form a sub-task list. Analyze the logical dependencies between subtasks, identify data dependencies and no dependencies, determine the execution order of each subtask according to the rule of serializing data-dependent subtasks and parallelizing no-dependency subtasks, and generate a list of subtasks containing dependencies and priorities. Retrieve functional interfaces that match each subtask from the API mapping library, match virtual tools based on the operation goals and intent categories of the subtasks, and then bind the water conservancy professional parameters extracted in step S3 to the corresponding tool interfaces according to the specifications. Assemble and format the executable toolchain, including encapsulating atomic operations into ordered arrays in JSON format according to execution order, and marking parallel group nodes of parallel operations for subsequent scheduling and execution.
6. The language instruction parsing and scene control method for a digital twin platform according to claim 1, characterized in that, Step S5 includes: Verification preparation: Retrieve the preset water conservancy function description specifications from the session state memory module to clarify the verification standards for various parameters; extract the parameter sets of each atomic operation in the executable toolchain, organize them by category to generate a list of parameters to be verified, and associate key information such as atomic operation ID and parameter name; Verify each parameter, including checking whether necessary parameters are missing, whether numerical parameters are within the valid value range, whether geographic entity parameters exist in the GIS database, whether the related parameters of cross-atomic operations in the same toolchain are logically consistent, and whether the parameters are related to the core business intent and the current scenario status. Verification result processing: If the verification passes, the parameter formatting and conversion stage will begin; if the verification fails, a structured error feedback message will be generated, transmitted to the execution scheduling and feedback module for output to the user, and the execution of the current executable toolchain will be terminated and recorded in the system log. Parameter formatting and conversion: For parameters that pass the verification, perform conversion according to the input requirements of the underlying API of the water conservancy digital twin platform, converting fuzzy numerical descriptions into precise numerical values, geographic entity names into unique identifiers (IDs), fuzzy time descriptions into precise timestamps, and natural language enumeration descriptions into enumeration codes; after conversion, verify whether the parameter format matches the underlying API requirements, and if it fails, return a conversion failure message and terminate the process. Conversion parameter encapsulation and transmission: The converted parameters are encapsulated into standardized parameter packages in JSON format according to the atomic operation dimension. The standardized parameter packages are then bound to the original executable toolchain through the middleware interface to form an executable toolchain with standardized parameters. The toolchain is then transmitted to the execution scheduling and feedback module and stored in the verified parameter cache area of the session state memory module.
7. The language instruction parsing and scene control method for a digital twin platform according to claim 1, characterized in that, Step S6 includes: Scheduling preparation: Read the executable toolchain with standardized parameters from the session state memory module, parse the number of atomic operations, execution type and dependencies, configure the execution scheduling strategy, initialize the multimodal service scheduling unit, scene update monitoring module and result feedback generation module, and establish communication connection between the multimodal service scheduling unit and the 3D rendering engine and hydraulic mechanism model through gRPC protocol; Toolchain execution: Traverse the atomic operations in the toolchain, generate an execution sequence queue according to dependencies, and mark the serial order and parallel group; for synchronous operations, trigger the underlying interface through local function calls, blocking the thread until execution is completed; for asynchronous operations, create independent tasks through a thread pool, pass in standardized parameter packages, release the main thread, and let the callback function report the execution status. Real-time updates of 3D scenes: Synchronous operations are triggered immediately by the engine's update interface upon completion; asynchronous operations are triggered by two nodes: phased and final completion. Execution result feedback: Successful scenarios are output to the user via text pop-up and speech synthesis; failure scenarios integrate error information and provide feedback. Session state update: Store the toolchain execution results and the latest state after the scene update in the session state memory module, overwriting the original scene state information, and provide the updated context for subsequent multi-round command interactions.
8. A language instruction parsing and scene control system for a digital twin platform, characterized in that, The system includes: The functional encapsulation module is used to store water conservancy function description specifications and standardized API mapping library. The water conservancy function description specifications define structured function description templates and water conservancy business-specific parameter types. The API mapping library establishes a one-to-one mapping between function description templates and the underlying functions of the 3D rendering engine and water conservancy mechanism model through middleware interfaces, providing a unified function call standard and semantic support for upper-layer modules. The input and status management module has a data input terminal that connects to the user interaction device and the digital twin platform. It is used to receive natural language commands input by the user through the voice or text interface, and at the same time extracts status information in real time from the rendering loop of the digital twin platform. The natural language commands and status information are structured and stored as a conversation context and output to the command parsing and generation module. The instruction parsing and generation module, whose data input end is connected to the output end of the input and state management module, is used to receive the natural language instructions and the session context. Based on the water conservancy function description specifications provided by the function encapsulation module, it identifies the user's core business intent and extracts water conservancy professional parameters. Then, according to the preset business logic constraints, it decomposes the complex intent into a sequence of atomic operations with logical dependencies, retrieves matching function interfaces from the API mapping library, assembles and generates an executable toolchain, and outputs it to the parameter verification and conversion module. The parameter verification and conversion module, whose data input end is connected to the output end of the instruction parsing and generation module, is used to perform business logic verification and format conversion on the parameters in the executable toolchain to ensure that the parameters meet the calling requirements of the underlying API of the 3D scene. After the verification is passed, an executable toolchain with standardized parameters is output. The execution scheduling and feedback module connects its data input end to the output end of the parameter verification and conversion module, and forms a two-way data interaction with the function encapsulation module. It is used to call the API mapping library interface in the function encapsulation module through an asynchronous scheduling mechanism to drive the 3D digital twin scene to update in real time. At the same time, it monitors the scene rendering status and model calculation progress, provides feedback on the execution results to the user in the form of voice or text, and sends the updated scene status back to the input and status management module to update the session context, forming a closed-loop interaction.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the program, it implements the method of claim 1.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the computer program is executed by a processor, it implements the method of claim 1.