A terminal control method and system for an intercom based on an AI agent

By deploying an AI intelligent agent control module in the walkie-talkie terminal, natural language interactive control is achieved, solving the problems of cumbersome operation and poor scene adaptability in existing technologies, and providing a professional functional control solution with high reliability and fast response.

CN122201279A · Pending · Publication Date: 2026-06-12 · SHANGHAI SHUGUO TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: SHANGHAI SHUGUO TECH CO LTD
Filing Date: 2026-01-19
Publication Date: 2026-06-12

Patent Timeline

19 Jan 2026: Application
12 Jun 2026: Publication (CN122201279A)

IPC: G10L15/22; G10L15/18; G10L15/26; G06F40/30

AI Tagging

Application Domain

Semantic analysis; Speech recognition

AI Technical Summary

⚠Technical Problem

Existing smart walkie-talkies suffer from cumbersome terminal control, poor adaptability to different scenarios, and a deep disconnect from professional functions. General voice assistants cannot meet the high reliability requirements, and rapid response and safe control are especially hard to achieve in complex professional scenarios.

⚗Method used

An AI intelligent agent control module is deployed locally on the walkie-talkie terminal. It performs semantic parsing and intent recognition on natural language commands, generates executable commands using a local command mapping rule base, and executes operations through predefined, reliability-verified control interfaces to ensure high reliability and real-time performance in unstable network environments.

🎯Benefits of technology

It enables efficient, accurate, and reliable control of various professional functions of the walkie-talkie through natural language without the need for a touchscreen or memorizing complex buttons. It is especially suitable for professional operating environments that require rapid response, high concentration, or limited network conditions.

✦ Generated by Eureka AI based on patent content.

Abstract

The application relates to the technical field of intercom control and discloses an intercom terminal control method and system based on an AI agent. The method comprises the following steps: S1, receiving a natural language instruction input by a user through an AI agent control module deployed locally on an intercom terminal, the AI agent control module being configured to preferentially operate in an offline mode, and the natural language instruction comprising at least one of a voice instruction and a text instruction. By deploying an AI agent control module operating in an offline-first mode locally on the intercom terminal, the application constructs a complete, closed-loop natural language interaction control process. It is particularly suitable for professional working environments requiring quick response, high concentration, or limited network conditions, and overcomes defects of the prior art such as complicated operation, poor scene adaptability, and the deep split between general voice assistants and professional functions, which leave high reliability requirements unmet.

Description

Technical Field

[0001] This application relates to the technical field of walkie-talkie control, and in particular to a terminal control method and system for walkie-talkies based on AI intelligent agents.

Background Technology

[0002] With the deep integration of communication technology and smart terminal technology, Android-based smart walkie-talkies have become core communication and operational tools in professional fields such as public safety, transportation, industrial manufacturing, and emergency command. Modern smart walkie-talkies, while inheriting the traditional group calling function, have deeply integrated rich smart terminal capabilities, including multimedia processing, multiple business applications, complex network connections, and high-definition displays. This has transformed walkie-talkies from simple voice communication devices into integrated mobile intelligent information platforms. Consequently, the complexity of terminal control has significantly increased, requiring users to manage numerous functions such as volume, mode, applications, network connections, and display settings. This places new and higher demands on human-computer interaction methods.

[0003] Currently, terminal control of these Android walkie-talkies relies mainly on traditional physical buttons, touchscreen gestures, and multi-level graphical menu navigation. These interaction methods face significant limitations in complex professional scenarios. First, the operation path is lengthy: users must memorize specific button combinations or search for the target function through a series of menus, which is inefficient when there are many functions and hinders rapid response in emergencies. Second, the interaction methods place high demands on the user's hands and the environment. Touch or precise button operation becomes difficult, and may even pose safety risks, when hands are occupied (such as during equipment operation or material handling), in dim lighting, or in operations requiring high visual attention. Third, the existing control methods have fragmented function calls. For example, adjusting the call volume and launching a work order application usually require switching between different interfaces, so natural cross-function operation cannot be achieved. Although general voice assistants have been widely adopted in consumer electronics, their instruction sets and control depth are designed mainly around consumer-level application scenarios. They lack understanding of and support for the functions unique to professional walkie-talkies (such as proprietary communication mode switching, dispatch command issuance, and hardware-level security control). Moreover, their cloud-based architecture struggles to meet the high-reliability, low-latency control requirements of private network environments that have poor network conditions or that emphasize privacy and real-time performance.

Summary of the Invention

[0004] To address the problems of cumbersome operation, poor adaptability to various scenarios, deep disconnect from professional functions, and inability of general-purpose voice assistants to meet the high reliability requirements of private networks in the existing technologies, this application provides a terminal control method and system for walkie-talkies based on AI intelligent agents.

[0005] In a first aspect, this application provides a terminal control method and system for walkie-talkies based on AI intelligent agents, employing the following technical solution: A terminal control method for walkie-talkies based on AI intelligent agents, the method comprising:
S1. The AI intelligent agent control module deployed locally on the walkie-talkie terminal receives natural language commands input by the user. The AI intelligent agent control module is configured to operate preferentially in offline mode. The natural language commands include at least one of voice commands and text commands.
S2. The AI intelligent agent control module uses an integrated natural language understanding model to perform semantic parsing and user intent recognition on the received natural language instructions to obtain structured intents and parameters, and outputs a confidence evaluation value for the recognition result; wherein the intent is the category of control operation that the user expects to perform, and the parameters are the specific attributes and target values of the control operation.
S3. The intents and parameters are matched and mapped using a pre-stored, extensible instruction mapping rule base to generate executable terminal control instructions compatible with the Android operating system or specific industry applications within the walkie-talkie.
S4. The executable terminal control instructions are executed by calling a predefined behavior-determined control interface to achieve at least one function control of the walkie-talkie terminal. The predefined behavior-determined control interface refers to a software interface for which, in the specific hardware and software environment of the walkie-talkie terminal, there is a stable mapping relationship between the calling behavior and the system response. The function control includes volume control, audio playback and tone control, application lifecycle management control, dedicated walkie-talkie working mode setting, terminal hardware and connection status control, display font adjustment, and screen display parameter adjustment.
S5. After the executable terminal control instruction is executed, the execution result status is fed back to the user through the local voice synthesis module or screen display module of the walkie-talkie terminal.
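As an illustration only, the S1 to S5 flow can be sketched in a few lines of Python. Everything named here is hypothetical and not taken from the application: the keyword matcher `parse` stands in for the natural language understanding model, `RULES` for the instruction mapping rule base, and `Device.call` for the behavior-determined control interfaces, whose real form on an Android terminal is not specified in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Intent:
    name: str                # category of control operation (S2 output)
    params: Dict[str, str]   # attributes and target values of the operation
    confidence: float        # confidence evaluation value (S2 output)


def parse(text: str) -> Optional[Intent]:
    """S2 stand-in: a keyword matcher in place of a real NLU model."""
    words = text.lower().split()
    if "volume" in words:
        level = next((w for w in words if w.isdigit()), None)
        if level is not None:
            return Intent("set_volume", {"level": level}, 0.97)
    if "emergency" in words:
        return Intent("set_mode", {"mode": "emergency_alarm"}, 0.99)
    return None


# S3 stand-in: an extensible rule base mapping an intent name to a
# builder that returns (interface name, argument). Names are hypothetical.
RULES: Dict[str, Callable[[Dict[str, str]], Tuple[str, str]]] = {
    "set_volume": lambda p: ("audio.set_call_volume", p["level"]),
    "set_mode": lambda p: ("radio.set_mode", p["mode"]),
}


class Device:
    """S4 stand-in: models two control interfaces with in-memory state."""

    def __init__(self) -> None:
        self.state: Dict[str, object] = {}

    def call(self, interface: str, arg: str) -> bool:
        if interface == "audio.set_call_volume":
            self.state["call_volume"] = int(arg)
            return True
        if interface == "radio.set_mode":
            self.state["mode"] = arg
            return True
        return False


def handle(text: str, device: Device) -> str:
    """Run S1 to S5 for one command and return the feedback text."""
    intent = parse(text)                                  # S1 + S2
    if intent is None or intent.name not in RULES:
        return "Sorry, command not understood."
    interface, arg = RULES[intent.name](intent.params)   # S3
    ok = device.call(interface, arg)                      # S4
    return "Done." if ok else "Command failed."           # S5
```

In this sketch, `handle("set volume to 7", Device())` walks all five steps and returns the feedback string that a speech synthesis or display module would present.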

[0006] By adopting the above technical solution, a complete natural language control method with local AI intelligent agent as the core and offline priority is constructed, realizing full-process intelligent control from instruction parsing to function execution, covering touchless control of various key functions of walkie-talkies.

[0007] Optionally, steps S1 to S3 are all completed locally and offline on the walkie-talkie terminal. When the local instruction mapping rule base cannot find a match, or when the confidence level of the local natural language understanding model is lower than a preset threshold, the AI intelligent agent control module initiates a network request to obtain auxiliary parsing data.
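The offline-first decision can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold value 0.8, the function names, and the behavior when no network is available (returning `None` so the caller can ask the user to rephrase) are all hypothetical and are not specified in the application.

```python
from typing import Callable, Optional

CONFIDENCE_THRESHOLD = 0.8  # hypothetical preset threshold


def needs_network_assist(confidence: float, rule_matched: bool,
                         threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """True when the rule base found no match or confidence is too low."""
    return (not rule_matched) or confidence < threshold


def resolve(local_result: str, confidence: float, rule_matched: bool,
            network_available: bool,
            remote_parse: Callable[[], Optional[str]]) -> Optional[str]:
    """Prefer the local result; contact the network only as a fallback."""
    if not needs_network_assist(confidence, rule_matched):
        return local_result            # normal path: no network traffic
    if network_available:
        remote = remote_parse()        # auxiliary parsing data
        if remote is not None:
            return remote
    return None                        # assumption: caller asks user to rephrase
```

The point of the structure is that `remote_parse` is never invoked on the normal path, so a confident local match incurs no network latency.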

[0008] By adopting the above technical solution, the principle of processing the core process entirely locally offline was established, and network requests were used only as a backup auxiliary means, which effectively ensured the reliability, real-time performance and privacy security of operations in unstable network environments.

[0009] Optionally, in step S2, the semantic parsing and user intent recognition process includes the following. The AI intelligent agent control module uses its integrated natural language understanding model to perform deep semantic understanding of the instruction text. The deep semantic understanding includes identifying and extracting, from the instruction text, the intent categories related to terminal control and the operation parameters associated with those intents. Through computational analysis, the natural language understanding model outputs a confidence assessment of the specific intent expressed by the current instruction and its associated parameters. The AI intelligent agent control module determines, based on the confidence assessment result, whether the preset reliability standard is met. When the preset reliability standard is met, the intent category and operation parameters output by the model are adopted to form the structured intent and parameters, and step S3 is executed.

[0010] By adopting the above technical solution, a reliability judgment mechanism for identification results based on confidence assessment is introduced. The intention is adopted only when the preset standard is met, thereby significantly reducing the error rate and improving the accuracy of control.

[0011] Optionally, in step S2, the AI intelligent agent control module performs semantic parsing and user intent recognition on the natural language instructions, and the structured output may contain one or more intents and parameters. When the structured result contains two or more intents and parameters, the method further includes the following steps before step S3: performing local task planning and dynamic priority sorting on the multiple intents and parameters, wherein the real-time execution priority of the k-th intent and its parameters is calculated using a formula [formula and symbols not reproduced in this text] whose terms are: the base weight assigned to the k-th intent and parameters based on predefined rules; the estimated time required to execute the operation corresponding to the k-th intent and parameters; the maximum system response latency allowed in the current scenario; the coefficient representing the safety criticality level of the operation corresponding to the k-th intent and parameters; and three weighting coefficients. Based on the calculated priority order, step S3 is executed sequentially for each intent and parameter to generate the corresponding executable terminal control instructions.

[0012] By adopting the above technical solution, an intelligent task scheduling method for processing complex instructions is provided. By dynamically calculating the priority of multiple sub-intents through a quantification formula and executing them in order, the rational and orderly processing of complex instructions is achieved.

[0013] Optionally, the predefined behavior-determined control interface undergoes reliability verification for the specific hardware and software environment of the walkie-talkie terminal before being deployed on it. In simulated or actual operational scenarios, the candidate control interface is called multiple times for testing, and the total number of calls and the number of calls that successfully trigger the expected system state change are recorded. Based on the recorded test data, the call success rate of the interface is calculated, defined as the proportion of calls that successfully trigger the expected state change to the total number of calls. The calculated call success rate is compared with a preset reliability threshold. When the call success rate is higher than the preset reliability threshold, the candidate control interface is adopted and used to execute the executable terminal control instruction in step S4.

[0014] By adopting the above technical solution, a pre-verification screening mechanism for the control interface was established. Success rate testing ensures that the behavior of each interface is deterministic and reliable in a specific environment, thus guaranteeing the stability of instruction execution from the bottom layer.

[0015] Optionally, the method optimizes the entire processing flow consisting of steps S1 to S4 to ensure that the total delay experienced from receiving user instructions to the corresponding function control taking effect meets the real-time requirements of professional operation scenarios. The total latency is obtained by summing up four parts: instruction input time, semantic parsing and intent recognition time, instruction mapping and generation time, and control instruction execution time. Wherein, the instruction input time refers to the time consumed from the user's input to the AI intelligent agent control module fully receiving the natural language instruction; the semantic parsing and intent recognition time refers to the time consumed in completing step S2; the instruction mapping and generation time refers to the time consumed in completing step S3; and the control instruction execution time refers to the time consumed in completing step S4. By employing local processing, optimizing interface calls, and pre-allocating system resources, the total latency under typical operating conditions is lower than the maximum allowable latency time set for specific professional operation scenarios, while ensuring that the overall success rate of instruction execution is higher than the minimum reliability requirements set for that scenario.
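The four-part latency decomposition and the joint latency and success-rate condition can be written down directly. The sketch below is illustrative; the class and function names and the millisecond figures in the usage note are assumptions, not values from the application.

```python
from dataclasses import dataclass


@dataclass
class StageTimes:
    input_ms: float     # instruction input time
    parse_ms: float     # semantic parsing and intent recognition (S2)
    map_ms: float       # instruction mapping and generation (S3)
    execute_ms: float   # control instruction execution (S4)

    def total_ms(self) -> float:
        """Total latency is the sum of the four stage times."""
        return self.input_ms + self.parse_ms + self.map_ms + self.execute_ms


def meets_requirements(times: StageTimes, max_latency_ms: float,
                       success_rate: float, min_success_rate: float) -> bool:
    """Latency must stay below the scenario maximum while the overall
    success rate stays above the scenario minimum."""
    return (times.total_ms() < max_latency_ms
            and success_rate > min_success_rate)
```

For example, `StageTimes(120, 180, 5, 60)` totals 365 ms, which would fit a hypothetical 500 ms budget but not a 300 ms one.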

[0016] By adopting the above technical solution, the end-to-end latency is decomposed into four measurable stages, and a quantitative target for optimizing latency while ensuring success rate is clearly defined, providing clear guidance for system performance design.

[0017] Optionally, the method further includes:
S6. Locally on the walkie-talkie terminal, continuously collect and analyze the performance data generated along the instruction processing chain; the performance data includes at least the parsing time of step S2, the mapping and generation time of step S3, the execution time of step S4, the call success rate of each control interface, and the instruction correction operation records initiated by the user.
S7. Based on the analysis of historical performance data, dynamically adjust the local processing resource allocation strategy of the AI intelligent agent control module, and iteratively update the mapping paths of high-frequency instructions in the instruction mapping rule base, so as to minimize the average total instruction latency based on long-term statistics while meeting the preset minimum requirement for the overall instruction execution success rate. The overall instruction execution success rate is defined as the proportion of successfully executed instructions to the total number of instructions within the statistical period. The average total instruction latency is defined as the average of the total processing time from receiving an instruction to the function control taking effect within a statistical period.
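One way to read S6 and S7 is as a constrained selection: among candidate configurations, keep those whose overall success rate meets the floor and pick the one with the lowest average total latency. The sketch below illustrates that reading only. The `strategy` label, the record layout, and the selection rule are assumptions; the application does not specify how the adjustment is carried out.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass
class Record:
    strategy: str      # hypothetical label for a resource allocation strategy
    total_ms: float    # time from receiving the instruction to taking effect
    succeeded: bool    # whether the instruction executed successfully


def summarize(records: List[Record]) -> Dict[str, Tuple[float, float]]:
    """Per strategy: (overall success rate, average total latency in ms)."""
    grouped: Dict[str, List[Record]] = {}
    for r in records:
        grouped.setdefault(r.strategy, []).append(r)
    return {
        name: (sum(r.succeeded for r in rs) / len(rs),
               mean(r.total_ms for r in rs))
        for name, rs in grouped.items()
    }


def pick_strategy(records: List[Record],
                  min_success_rate: float) -> Optional[str]:
    """Lowest average latency among strategies meeting the success floor."""
    eligible = {name: latency
                for name, (rate, latency) in summarize(records).items()
                if rate >= min_success_rate}
    return min(eligible, key=eligible.get) if eligible else None
```

The success-rate floor acts as a hard constraint and latency as the objective, which mirrors the wording "while meeting the preset minimum requirement ... minimize the average total instruction latency".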

[0018] By adopting the above technical solutions, the system is equipped with self-optimization capabilities based on local performance data and user feedback, and can dynamically adjust strategies to continuously improve processing success rate and efficiency.

[0019] Optionally, the dedicated intercom working mode setting includes switching between group call mode, one-way call mode, broadcast mode, monitoring mode, silent mode, emergency alarm mode, and power saving mode. The application lifecycle management control includes control over the startup, switching, background operation, and shutdown of dispatch console software, an inspection management system, or a work order processing application.

[0020] By adopting the above technical solution, the control range of the dedicated intercom mode and industry applications is specifically defined, realizing direct voice control of professional core functions and demonstrating the domain-specificity of the solution.

[0021] Optionally, the terminal hardware and connection status control includes enabling or disabling the Bluetooth module, Wi-Fi module, cellular mobile data module, GPS positioning module, physical alarm button, and dedicated service channel. The volume control supports independent adjustment of media volume, call volume, ringtone volume, and private network notification tone volume. The screen display parameter adjustment includes automatically adjusting the screen brightness and color temperature based on ambient light sensor data or instructions.

[0022] By adopting the above technical solutions, the ability to comprehensively and precisely control underlying parameters such as hardware connection, audio, and display has been refined, meeting the diverse management needs of device status in professional scenarios.

[0023] In a second aspect, this application provides a terminal control system for walkie-talkies based on AI intelligent agents, which adopts the following technical solution: A walkie-talkie terminal control system based on an AI agent, used to implement the walkie-talkie terminal control method based on an AI agent as described in any one of the above, the system comprising:
A user interaction module, which provides a natural language command input interface and a command execution result feedback interface;
A local AI intelligent agent control module, deployed locally on the terminal as the core processing unit and connected to the user interaction module, which includes: a speech recognition unit for converting voice commands into text locally; a natural language understanding unit for local semantic parsing and intent recognition; and a local instruction mapping and generation unit, which connects to a local, extensible instruction mapping rule base and generates executable instructions;
A behavior-determined control interface adaptation layer, connected to the local AI agent control module, which encapsulates a set of reliability-verified system-level and application-level control interfaces for executing the generated executable instructions;
A local feedback generation module, connected to the behavior-determined control interface adaptation layer and the user interaction module, for generating and outputting feedback information;
A performance monitoring and self-optimization module for monitoring the performance indicators of the instruction processing chain in real time and dynamically adjusting system parameters.

[0024] By adopting the above technical solution, a dedicated system for implementing the aforementioned method is provided. Through modular design, it integrates local processing, reliable interfaces, and self-optimization capabilities, forming a complete hardware implementation solution.

[0025] In summary, this application includes at least one of the following beneficial technical effects: This invention constructs a complete, closed-loop natural language interactive control process by deploying an AI intelligent agent control module locally on the walkie-talkie terminal, operating in an offline-first mode. First, it receives the user's voice or text commands and parses them using a local model, transforming them into explicit control intentions. Then, by querying the local instruction mapping rule base, the intentions are mapped into executable instructions that can directly call system or application interfaces. Finally, the operation is executed through a set of predefined, reliable control interfaces, and clear feedback is provided to the user. This method addresses the shortcomings of existing technologies, such as cumbersome operation, poor scenario adaptability (e.g., hands-free operation, harsh environments), and the deep separation between general voice assistants and professional functions, which leave high reliability requirements unmet. It achieves a significant improvement by enabling efficient, accurate, and reliable control of various professional walkie-talkie functions through natural language, without the need for touchscreens or memorizing complex buttons. It is particularly suitable for professional operating environments requiring rapid response, high concentration, or limited network conditions.

Attached Figure Description

[0026] Figure 1 is a flowchart illustrating a terminal control method for walkie-talkies based on AI intelligent agents proposed in this invention.

[0027] Figure 2 is a schematic block diagram of a terminal control system for walkie-talkies based on an AI intelligent agent, as proposed in this invention.

Detailed Implementation

[0028] The embodiments of this application are described in detail below, and examples of the embodiments are shown in the accompanying drawings.

[0029] In the description of this specification, the references to "certain embodiments," "one embodiment," "some embodiments," "illustrative embodiment," "example," "specific example," or "some examples" refer to specific features, structures, materials, or characteristics described in connection with the described embodiment or example, which are included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0030] This application discloses a terminal control method for a walkie-talkie based on an AI agent. Referring to Figure 1, the method includes:
S1. The AI intelligent agent control module deployed and running locally on the walkie-talkie terminal device receives natural language commands input by the user. The AI intelligent agent control module is normally configured to run in an offline mode that prioritizes completing processing tasks independently on the terminal side. The natural language commands it can receive include at least one of voice commands and text commands.
S2. The AI intelligent agent control module uses an integrated natural language understanding model to perform semantic parsing and user intent recognition on the received natural language instructions to obtain structured intents and parameters, and outputs a confidence evaluation value for the recognition result; wherein the intent is the category of control operation that the user expects to perform, and the parameters are the specific attributes and target values of the control operation.
S3. The extensible instruction mapping rule base pre-stored locally on the walkie-talkie terminal is queried, and the structured intents and parameters obtained in step S2 are matched and mapped. The mapping rule base defines the conversion relationship from intents and parameters to specific system call instructions, and generates executable terminal control instructions that can be recognized and executed by the underlying interface of the Android operating system or specific industry applications in the walkie-talkie.
S4. The executable terminal control instructions are executed by calling a predefined behavior-determined control interface to achieve at least one function control of the walkie-talkie terminal. The predefined behavior-determined control interface refers to a software interface for which, in the specific hardware and software environment of the walkie-talkie terminal, there is a stable mapping relationship between the calling behavior and the system response. The function control includes volume control, audio playback and tone control, application lifecycle management control, dedicated walkie-talkie working mode setting, terminal hardware and connection status control, display font adjustment, and screen display parameter adjustment.
S5. After the executable terminal control instruction is executed, the execution result status of the current instruction is fed back to the user, either by voice broadcast through the local speech synthesis module of the walkie-talkie terminal or by a text prompt in a designated interface area through the screen display module.

[0031] Through the above technical solution, this embodiment provides a terminal control method for walkie-talkies based on AI intelligent agents. The method constructs a complete, closed-loop natural language interactive control process by deploying an AI intelligent agent control module locally on the walkie-talkie terminal, operating in an offline-first mode. The method first receives and uses a local model to parse the user's voice or text commands, converting them into explicit control intentions. Subsequently, by querying the local instruction mapping rule base, the intentions are mapped into executable instructions that can directly call system or application interfaces. Finally, the operation is executed through a set of pre-determined and reliable control interfaces, and clear feedback is given to the user. This method addresses the shortcomings of existing control methods, such as cumbersome operation, poor scene adaptability (e.g., hands-free operation, harsh environment), and the deep separation between general voice assistants and professional functions, which cannot meet high reliability requirements. It achieves significant progress in controlling various professional functions of walkie-talkies efficiently, accurately, and reliably through natural language without the need for touchscreens or memorizing complex buttons. It is particularly suitable for professional operating environments that require rapid response, high concentration, or limited network conditions.

[0032] In one embodiment, steps S1 to S3 are all completed locally and offline on the walkie-talkie terminal. When the local instruction mapping rule base cannot match the intent and parameters, or when the confidence evaluation value output by the natural language understanding model is lower than the preset confidence threshold, the AI intelligent agent control module initiates a network request to obtain auxiliary parsing data from a cloud knowledge base or model, and fuses the auxiliary results returned by the network with the local results, or selects between them, to complete the instruction parsing and mapping. The preset confidence threshold is usually determined during the system development stage through performance evaluation of the model on a validation set. For example, by plotting the precision-recall curve, a threshold is selected that filters out most low-quality recognition results while ensuring a high instruction execution success rate (e.g., >95%).
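One possible way to pick such a threshold from validation data is to scan candidate cut-offs and keep the lowest one whose accepted predictions reach the target precision. This is a simplified stand-in for reading a threshold off a precision-recall curve, not the procedure used in the application; the function name and the sample data in the usage note are hypothetical.

```python
from typing import List, Optional, Tuple


def select_threshold(samples: List[Tuple[float, bool]],
                     target_precision: float = 0.95) -> Optional[float]:
    """samples: (confidence, prediction_was_correct) pairs from validation.

    Returns the lowest cut-off whose accepted set (confidence >= cut-off)
    reaches target_precision, or None if no cut-off does. Choosing the
    lowest such cut-off keeps as many instructions as possible on the
    local path.
    """
    for threshold in sorted({conf for conf, _ in samples}):
        accepted = [ok for conf, ok in samples if conf >= threshold]
        precision = sum(accepted) / len(accepted)
        if precision >= target_precision:
            return threshold
    return None
```

With validation pairs such as `(0.99, True), (0.95, True), (0.90, True), (0.80, False)`, the scan rejects every cut-off that still admits the incorrect 0.80 prediction and settles on 0.90.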

[0033] In step S2, the semantic parsing and user intent recognition process includes the following. The AI intelligent agent control module uses its integrated natural language understanding model to perform deep semantic understanding of the instruction text. The deep semantic understanding includes using the sequence labeling and classification capabilities of the model to identify and extract the intent category labels related to control of the walkie-talkie terminal in the instruction text, as well as the specific operation parameter entities associated with the intent. Through its internal forward propagation, the natural language understanding model outputs one or more candidate intent categories, the corresponding parameter entities, and a confidence assessment value for the combination of specific intent and associated parameters expressed by the current instruction. This value reflects the model's confidence in its own recognition results. The confidence assessment value comes directly from the output layer of the natural language understanding model (such as a Transformer-based classifier or a sequence-to-sequence model). For classification tasks, it is usually the highest category probability after normalization by the Softmax function; for sequence labeling tasks, it can be an aggregate (such as the average or minimum) of the probabilities of all entity labels. It is a quantitative measure of the uncertainty in the model's current inference result. High confidence means that the model's judgment of the current input is clear, with the probability distribution concentrated on the target intent and parameters; low confidence means that the model's judgment is ambiguous or that the input differs greatly from the training data.
The AI intelligent agent control module determines, based on the confidence assessment result, whether the preset reliability standard is met. When the preset reliability standard is met, the intent category and operation parameters output by the model are adopted to form the structured intent and parameters, and step S3 is executed; otherwise, the local parsing result is considered unreliable, which may trigger the network request process disclosed in claim 2.
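The two confidence definitions mentioned above (highest Softmax probability for classification, and an average or minimum over entity label probabilities for sequence labeling) can be illustrated with plain Python. The function names are hypothetical, and the logits and probabilities would in practice come from the model's output layer.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def intent_confidence(logits: Sequence[float]) -> Tuple[int, float]:
    """Return (index of the top intent class, its softmax probability)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def entity_confidence(label_probs: Sequence[float], how: str = "min") -> float:
    """Aggregate per-entity label probabilities into one confidence value."""
    if how == "min":
        return min(label_probs)
    if how == "mean":
        return sum(label_probs) / len(label_probs)
    raise ValueError(f"unknown aggregation: {how}")
```

The minimum is the more conservative aggregate: a single uncertain parameter entity pulls the whole instruction below the threshold, whereas the mean can mask it.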

[0034] In step S2, the AI intelligent agent control module performs semantic parsing and user intent recognition on the natural language instruction, and the structured output may contain one or more intents and parameters. When the structured result contains two or more intents and parameters, the method further includes the following steps before step S3: local task planning and dynamic priority sorting are performed on the multiple intents and parameters, where the real-time execution priority of the k-th intent and its parameters is calculated by a formula [formula and symbols not reproduced in the source text] from the following quantities. The base weight of the k-th intent and its parameters is assigned according to predefined rules; it is statically preset based on business rules and reflects the inherent importance of different function types. For example, the weight of "switching to emergency mode" is higher than that of "adjusting volume". The estimated time required to execute the operation corresponding to the k-th intent and its parameters can be obtained through historical analysis of the system operation log, or by presetting an empirical value based on the complexity of the operation. The maximum system response latency allowed in the current scenario is a preset constant based on the urgency of the scenario; it can be set in the system configuration file or manually by the user according to the work scenario (such as daily or emergency), for example 2 seconds for daily inspection and 500 milliseconds for emergency response. The safety criticality coefficient represents the safety criticality level of the operation corresponding to the k-th intent and its parameters; it can be graded and assigned according to the degree to which the function affects personal safety and communication security. For example, operations involving alarms or communication interruptions are treated as high-safety-level operations and assigned a larger coefficient. The three weighting coefficients balance the influence of base weight, time urgency, and safety criticality on the final priority; they can be determined during the system debugging phase through methods such as grid search, expert experience, or reinforcement learning, so as to optimize overall task completion efficiency and safety in specific scenarios. Based on the calculated priority order, step S3 is executed sequentially for each intent and its parameters to generate the corresponding executable terminal control instructions.
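The priority formula itself is not reproduced in this text, so the sketch below does not claim to be the patented calculation. It only illustrates how the named quantities (base weight, estimated execution time, maximum allowed latency, safety criticality coefficient, and three weighting coefficients) could be combined. The linear form, the urgency term `1 - t/t_max`, the default coefficient values, and all identifiers are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Intent:
    name: str
    base_weight: float   # static weight preset from business rules
    est_time_s: float    # estimated execution time, in seconds
    safety_coeff: float  # safety criticality coefficient


def priority(intent: Intent, t_max_s: float,
             alpha: float = 0.4, beta: float = 0.2, gamma: float = 0.4) -> float:
    """Assumed linear score; NOT the formula from the source."""
    # Assumed urgency term: operations that use less of the latency budget score higher.
    urgency = 1.0 - min(intent.est_time_s / t_max_s, 1.0)
    return alpha * intent.base_weight + beta * urgency + gamma * intent.safety_coeff


def plan(intents: List[Intent], t_max_s: float) -> List[Intent]:
    """Order intents from highest to lowest priority before step S3."""
    return sorted(intents, key=lambda i: priority(i, t_max_s), reverse=True)


requested = [
    Intent("adjust_volume", base_weight=0.3, est_time_s=0.1, safety_coeff=0.1),
    Intent("switch_to_emergency_mode", base_weight=0.9, est_time_s=0.3, safety_coeff=1.0),
]
ordered = plan(requested, t_max_s=0.5)  # 500 ms budget, as in the emergency example
```

Under these assumed values the emergency-mode intent is scheduled ahead of the volume adjustment, which matches the ordering the text gives as an example of base weights.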

[0035] Before being deployed on the walkie-talkie terminal, the predefined behavior-determined control interface undergoes reliability verification for the terminal's specific hardware and software environment. In simulated or actual operational scenarios, the candidate control interface is called multiple times, and both the total number of calls and the number of calls that successfully trigger the expected system state change are recorded. From the recorded test data, the call success rate of the interface is calculated, defined as the ratio of the number of calls that successfully trigger the expected state change to the total number of calls. The calculated call success rate is then compared with a preset reliability threshold. The threshold can be determined according to the security integrity level of the walkie-talkie application and industry standards; for example, it can be set to 0.99 for non-critical functions and to 0.999 or higher for critical communication or security functions. The threshold is the minimum quality bar an interface must clear before it can be put into use: only interfaces that pass this test are considered able to provide deterministic behavior in the corresponding scenario. When the call success rate is higher than the preset reliability threshold, the candidate control interface is adopted and used to execute the executable terminal control instruction in step S4.
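The admission test described above amounts to a ratio and a comparison. A minimal sketch follows, in which `measure`, `call_success_rate`, and `qualifies` are hypothetical helper names, and a candidate interface is modeled as a callable that returns `True` when the expected state change was observed.

```python
from typing import Callable, Tuple


def measure(candidate: Callable[[], bool], trials: int) -> Tuple[int, int]:
    """Call the candidate interface `trials` times; return (successes, total)."""
    successes = sum(1 for _ in range(trials) if candidate())
    return successes, trials


def call_success_rate(successes: int, total: int) -> float:
    """Proportion of calls that triggered the expected state change."""
    if total <= 0:
        raise ValueError("at least one test call is required")
    return successes / total


def qualifies(successes: int, total: int, threshold: float) -> bool:
    """Adopt the interface only if its success rate is higher than the threshold."""
    return call_success_rate(successes, total) > threshold


# Thresholds taken from the examples in the text.
NON_CRITICAL_THRESHOLD = 0.99
CRITICAL_THRESHOLD = 0.999
```

For instance, an interface that succeeded in 9,995 of 10,000 test calls would pass both thresholds, while one that succeeded in 985 of 1,000 would fail even the non-critical one.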

[0036] Through the above technical solutions, this embodiment provides a high-reliability assurance method for walkie-talkie systems. First, an offline-first dynamic processing strategy, combined with filtering of recognition results by quantified confidence, ensures autonomous, rapid, and accurate instruction parsing in most scenarios and intercepts low-quality recognition results at the source. Second, for complex instructions, a multi-factor scheduling algorithm that integrates functional importance, time urgency, and safety criticality sequences and executes multiple sub-intents in a reasonable and safe order. Finally, admission screening of control interfaces against a preset call success rate threshold ensures the stability and determinism of each operation at the execution layer. With these interconnected steps, the method builds a reliability enhancement chain that runs from intent understanding through task planning to instruction execution. It systematically addresses the stringent requirements of professional scenarios for operational efficiency, complex instruction processing, environmental adaptability, and execution determinism, achieving reliable, intelligent control that differs fundamentally from general-purpose solutions.

[0037] In one embodiment, the method optimizes the entire processing flow consisting of steps S1 to S4 to ensure that the total delay experienced from receiving user instructions to the corresponding function control taking effect meets the real-time requirements of professional operation scenarios. The total latency is obtained by summing up four parts: instruction input time, semantic parsing and intent recognition time, instruction mapping and generation time, and control instruction execution time. Wherein, the instruction input time refers to the time consumed from the user's input to the AI intelligent agent control module fully receiving the natural language instruction; the semantic parsing and intent recognition time refers to the time consumed in completing step S2; the instruction mapping and generation time refers to the time consumed in completing step S3; and the control instruction execution time refers to the time consumed in completing step S4. By employing local processing, optimized interface calls, and pre-allocated system resources, the total latency under typical operating conditions is lower than the maximum allowable latency time set for a specific professional operation scenario. At the same time, it ensures that the overall success rate of instruction execution is higher than the minimum reliability requirement set for that scenario. The two key performance indicators, the maximum allowable latency time and the minimum reliability requirement, are usually set based on industry standards, user experience research, or specific business contract requirements. They are objective quantitative benchmarks for measuring whether the method meets the needs of a specific professional scenario.
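The two acceptance conditions in this embodiment (total latency below the scenario maximum, overall success rate above the scenario minimum) can be checked as in the sketch below. `StageTimes` and `meets_scenario` are hypothetical names, and the sample timings are invented for illustration; only the four-part sum and the two comparisons come from the text.

```python
from dataclasses import dataclass


@dataclass
class StageTimes:
    input_s: float    # instruction input time
    parse_s: float    # semantic parsing and intent recognition time (step S2)
    mapping_s: float  # instruction mapping and generation time (step S3)
    execute_s: float  # control instruction execution time (step S4)

    def total(self) -> float:
        """Total latency is the sum of the four parts."""
        return self.input_s + self.parse_s + self.mapping_s + self.execute_s


def meets_scenario(times: StageTimes, success_rate: float,
                   max_latency_s: float, min_reliability: float) -> bool:
    """True when latency is below the maximum and success rate above the minimum."""
    return times.total() < max_latency_s and success_rate > min_reliability


# Invented sample measurement for illustration.
sample = StageTimes(input_s=0.12, parse_s=0.18, mapping_s=0.02, execute_s=0.08)
```

With a 500 millisecond budget (the emergency response example given earlier in the description) this sample passes; with a 300 millisecond budget it would not.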

[0038] The method further includes: S6. Locally on the walkie-talkie terminal, continuously collecting and analyzing the performance data generated along the instruction processing chain; the performance data includes at least the parsing time of step S2, the mapping and generation time of step S3, the execution time of step S4, the call success rate of each control interface, and records of instruction correction operations initiated by the user. S7. Based on the analysis of historical performance data, dynamically adjusting the local processing resource allocation strategy of the AI intelligent agent control module and iteratively updating the mapping paths of high-frequency instructions in the instruction mapping rule base, so as to minimize the long-term average total instruction latency while meeting the preset minimum requirement for the overall instruction execution success rate. The overall instruction execution success rate is defined as the proportion of successfully executed instructions to the total number of instructions within a statistical period (usually a preset time window or instruction-count window, which can be set by system default or adjusted according to the learning rate). The average total instruction latency is defined as the average, within a statistical period, of the total processing time from receiving an instruction to the function control taking effect.
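The two statistics defined for step S7 can be computed over an instruction-count window as sketched below. `PerformanceWindow` is a hypothetical class. Averaging latency only over successful instructions is an assumption, made because the definition measures time up to the function control taking effect; the source does not say how failed instructions are treated.

```python
from collections import deque


class PerformanceWindow:
    """Keeps the most recent `size` instruction records (instruction-count window)."""

    def __init__(self, size: int) -> None:
        self._records = deque(maxlen=size)  # older records are dropped automatically

    def record(self, total_latency_s: float, succeeded: bool) -> None:
        self._records.append((total_latency_s, succeeded))

    def success_rate(self) -> float:
        """Successfully executed instructions divided by all instructions in the window."""
        if not self._records:
            raise ValueError("no instructions recorded")
        return sum(1 for _, ok in self._records if ok) / len(self._records)

    def average_latency(self) -> float:
        """Mean total latency of successful instructions (assumed interpretation)."""
        latencies = [t for t, ok in self._records if ok]
        if not latencies:
            raise ValueError("no successful instructions recorded")
        return sum(latencies) / len(latencies)


window = PerformanceWindow(size=3)
for latency, ok in [(0.2, True), (0.4, True), (1.0, False), (0.6, True)]:
    window.record(latency, ok)
```

An optimizer for step S7 could then treat `success_rate()` as the constraint and `average_latency()` as the quantity to minimize.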

[0039] The dedicated walkie-talkie working mode setting includes switching between group call mode, one-call mode, broadcast mode, monitoring mode, silent mode, emergency alarm mode, and power saving mode. The application lifecycle management control includes control over the startup, switching, background operation, and shutdown of dispatch console software, an inspection management system, or a work order processing application.

[0040] The terminal hardware and connection status control includes enabling or disabling the Bluetooth module, Wi-Fi module, cellular mobile data module, GPS positioning module, physical alarm button, and dedicated service channel. The volume control supports independent adjustment of media volume, call volume, ringtone volume, and private network notification tone volume. The screen display parameter adjustment includes adjusting the screen brightness and color temperature automatically based on ambient light sensor data, or in response to instructions.

[0041] Through the above technical solutions, this embodiment provides a complete method integrating performance optimization, self-learning, deep function integration, and refined control. This method defines and monitors quantitative indicators and performance thresholds for end-to-end latency, providing clear engineering benchmarks and optimization directions for the real-time performance and reliability of target scenarios. Furthermore, by constructing a closed-loop self-optimizing system based on local performance data collection and analysis, the system can continuously adapt to actual usage patterns, dynamically improving success rate and efficiency. This method explicitly covers the control of various communication modes and core industry applications unique to professional walkie-talkies, as well as comprehensive and refined management of various hardware connections, independent audio channels, and screen display parameters. These solutions work together to not only systematically solve the problems of cumbersome operation and fragmented function calls in existing technologies, but also achieve operational efficiency, scenario adaptability, and control depth far exceeding fixed menus or general assistants through quantitative optimization and adaptive learning, fully meeting the stringent requirements of professional fields for terminal control.

[0042] This application also discloses a terminal control system for walkie-talkies based on an AI intelligent agent. Referring to Figure 2, the system is used to implement the AI-agent-based terminal control method for walkie-talkies according to any one of the methods described above, and comprises: a user interaction module, which provides a natural language command input interface and a command execution result feedback interface; a local AI intelligent agent control module, deployed locally on the terminal as the core processing unit and connected to the user interaction module, which includes a speech recognition unit for converting voice commands into text locally, a natural language understanding unit for local semantic parsing and intent recognition, and a local instruction mapping and generation unit that is connected to a local, extensible instruction mapping rule base and generates executable instructions; a behavior-determined control interface adaptation layer, connected to the local AI intelligent agent control module, which encapsulates a set of system-level and application-level control interfaces that have been verified for reliability and executes the generated executable instructions; a local feedback generation module, connected to the behavior-determined control interface adaptation layer and to the user interaction module, which generates and outputs feedback information; and a performance monitoring and self-optimization module, which monitors the performance indicators of the instruction processing chain in real time and dynamically adjusts system parameters.
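The module wiring described above can be sketched as a small pipeline for text commands. Everything here is hypothetical: the class names, the keyword-based `toy_understand` stand-in for the natural language understanding unit, and the instruction name `audio.set_media_volume` are invented for illustration and are not taken from the source or from any real platform API. Speech recognition and performance monitoring are omitted for brevity.

```python
from typing import Callable, Dict, Optional, Tuple


class RuleBase:
    """Local, extensible mapping from intent names to executable instruction names."""

    def __init__(self) -> None:
        self._rules: Dict[str, str] = {}

    def register(self, intent: str, instruction: str) -> None:
        self._rules[intent] = instruction

    def map(self, intent: str) -> Optional[str]:
        return self._rules.get(intent)


class InterfaceAdapter:
    """Adaptation layer holding control interfaces that passed reliability verification."""

    def __init__(self) -> None:
        self._interfaces: Dict[str, Callable[[dict], bool]] = {}

    def register(self, instruction: str, handler: Callable[[dict], bool]) -> None:
        self._interfaces[instruction] = handler

    def execute(self, instruction: str, params: dict) -> bool:
        return self._interfaces[instruction](params)


class AgentControl:
    """Understand the command, map it, execute it, and build feedback text."""

    def __init__(self, understand: Callable[[str], Tuple[str, dict]],
                 rules: RuleBase, adapter: InterfaceAdapter) -> None:
        self.understand, self.rules, self.adapter = understand, rules, adapter

    def handle(self, text: str) -> str:
        intent, params = self.understand(text)
        instruction = self.rules.map(intent)
        if instruction is None:
            return f"No mapping for intent '{intent}'"
        ok = self.adapter.execute(instruction, params)
        return f"{instruction}: {'done' if ok else 'failed'}"


def toy_understand(text: str) -> Tuple[str, dict]:
    """Keyword stand-in for the natural language understanding unit."""
    if "volume" in text.lower():
        return "set_volume", {"level": 5}
    return "unknown", {}


def build_demo() -> Tuple[AgentControl, dict]:
    """Wire the modules around an in-memory device state."""
    state: dict = {}
    rules, adapter = RuleBase(), InterfaceAdapter()
    rules.register("set_volume", "audio.set_media_volume")
    # The handler records the new level and reports success.
    adapter.register("audio.set_media_volume",
                     lambda p: state.update(volume=p["level"]) is None)
    return AgentControl(toy_understand, rules, adapter), state
```

Calling `build_demo()` and then `handle("Set volume to five")` on the returned agent updates the in-memory state and produces feedback text, which stands in for the local feedback generation module.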

[0043] Through the above technical solution, this embodiment provides a terminal control system for walkie-talkies based on an AI intelligent agent. By modularly integrating a local AI intelligent agent control core, a reliable behavior-determined control interface adaptation layer, and a performance monitoring and self-optimization module, the system forms a collaborative hardware and software architecture that extends from natural language input to underlying function execution and is capable of continuous learning and optimization. The system fully embodies and implements the method described above. By processing all key capabilities locally, making interface behavior deterministic, and making system status monitorable, it gives walkie-talkies an intuitive, responsive, reliable, and continuously self-improving intelligent control solution for complex professional scenarios. It fundamentally overcomes the systemic defects of existing technologies, such as strong dependence on physical operation, separation from professional functions, and the insufficient reliability of general-purpose solutions.

[0044] Although embodiments of this application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting this application. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of this application.

Claims

1. A terminal control method for a walkie-talkie based on an AI intelligent agent, characterized in that the method includes:
S1. The AI intelligent agent control module deployed locally on the walkie-talkie terminal receives natural language commands input by the user. The AI intelligent agent control module is configured to operate in offline mode first. The natural language commands include at least one of voice commands and text commands.
S2. The AI intelligent agent control module uses an integrated natural language understanding model to perform semantic parsing and user intent recognition on the received natural language instructions to obtain structured intent and parameters, and outputs the confidence evaluation value of the recognition result; wherein the intent is the category of control operation that the user expects to perform, and the parameters are the specific attributes and target values of the control operation.
S3. Match and map intents and parameters using a pre-stored, extensible instruction mapping rule base to generate executable terminal control instructions compatible with the Android operating system or specific industry applications within the walkie-talkie.
S4. Execute executable terminal control instructions by calling a predefined behavior-determined control interface to achieve at least one function control of the walkie-talkie terminal. The predefined behavior-determined control interface refers to a software interface in a specific hardware and software environment of the walkie-talkie terminal where there is a stable mapping relationship between the calling behavior and the system response. The function control includes volume control, audio playback and tone control, application lifecycle management control, dedicated walkie-talkie working mode setting, terminal hardware and connection status control, display font adjustment, and screen display parameter adjustment.
S5. After the executable terminal control command is executed, the system will provide feedback on the command execution result status to the user through the local voice synthesis module or screen display module of the walkie-talkie terminal.

2. The terminal control method for a walkie-talkie based on an AI agent according to claim 1, characterized in that steps S1 to S3 are all completed offline locally on the walkie-talkie terminal; when the local instruction mapping rule base cannot match, or when the confidence level of the local natural language understanding model is lower than a preset threshold, the AI intelligent agent control module initiates a network request to obtain auxiliary parsing data.

3. The terminal control method for a walkie-talkie based on an AI agent according to claim 2, characterized in that, in step S2, the semantic parsing and user intent recognition process includes: the AI intelligent agent control module utilizes its integrated natural language understanding model to perform deep semantic understanding of the instruction text; the deep semantic understanding includes identifying and extracting the intent categories related to terminal control and the operation parameters associated with those intents from the instruction text; the natural language understanding model outputs a confidence assessment of the specific intent expressed by the current instruction and its associated parameters through computational analysis; the AI intelligent agent control module determines whether the preset reliability standard is met based on the confidence assessment result; when the preset reliability standard is met, the intent category and operation parameters output by the model are adopted to form a structured intent and parameters, and step S3 is executed.

4. The terminal control method for a walkie-talkie based on an AI agent according to claim 3, characterized in that, in step S2, the AI intelligent agent control module performs semantic parsing and user intent recognition on the natural language instructions, and the structured output may contain one or more intents and parameters; when the structured result contains two or more intents and parameters, the method further includes the following steps before step S3: performing local task planning and dynamic priority sorting on the multiple intents and parameters, wherein the real-time execution priority of the k-th intent and its parameters is calculated by a formula [formula and symbols not reproduced in the source text] from: the base weight assigned to the k-th intent and parameters based on predefined rules; the estimated time required to execute the operation corresponding to the k-th intent and parameters; the maximum system response latency allowed in the current scenario; the coefficient representing the safety criticality level of the operation corresponding to the k-th intent and parameters; and three weighting coefficients; based on the calculated priority order, step S3 is executed sequentially for each intent and parameter to generate the corresponding executable terminal control instructions.

5. The terminal control method for a walkie-talkie based on an AI agent according to claim 4, characterized in that the predefined behavior-determined control interface undergoes reliability verification for the specific hardware and software environment of the walkie-talkie terminal before being deployed thereon; in simulated or actual operational scenarios, the candidate control interface is called multiple times for testing, and the total number of calls and the number of calls that successfully trigger the expected system state change are recorded; based on the recorded test data, the call success rate of the interface is calculated, the call success rate being defined as the proportion of the number of calls that successfully trigger the expected state change to the total number of calls; the calculated call success rate is compared with a preset reliability threshold; when the call success rate is higher than the preset reliability threshold, the candidate control interface is adopted and used to execute the executable terminal control command in step S4.

6. The terminal control method for a walkie-talkie based on an AI agent according to claim 5, characterized in that the method optimizes the entire processing flow consisting of steps S1 to S4 to ensure that the total delay experienced from receiving user instructions to the corresponding function control taking effect meets the real-time requirements of professional operation scenarios. The total latency is obtained by summing up four parts: instruction input time, semantic parsing and intent recognition time, instruction mapping and generation time, and control instruction execution time; wherein the instruction input time refers to the time consumed from the user's input to the AI intelligent agent control module fully receiving the natural language instruction; the semantic parsing and intent recognition time refers to the time consumed in completing step S2; the instruction mapping and generation time refers to the time consumed in completing step S3; and the control instruction execution time refers to the time consumed in completing step S4. By employing local processing, optimizing interface calls, and pre-allocating system resources, the total latency under typical operating conditions is lower than the maximum allowable latency time set for specific professional operation scenarios, while ensuring that the overall success rate of instruction execution is higher than the minimum reliability requirements set for that scenario.

7. The terminal control method for a walkie-talkie based on an AI agent according to claim 6, characterized in that the method further includes: S6. Locally on the walkie-talkie terminal, continuously collecting and analyzing the performance data generated along the instruction processing chain; the performance data includes at least the parsing time of step S2, the mapping and generation time of step S3, the execution time of step S4, the call success rate of each control interface, and the instruction correction operation records initiated by the user. S7. Based on the analysis of historical performance data, dynamically adjusting the local processing resource allocation strategy of the AI intelligent agent control module, and iteratively updating the mapping paths of high-frequency instructions in the instruction mapping rule base, so as to minimize the long-term average total instruction latency while meeting the preset minimum requirement for the overall instruction execution success rate; the overall success rate of instruction execution is defined as the proportion of the number of successfully executed instructions to the total number of instructions within the statistical period, and the average total instruction latency is defined as the average, within a statistical period, of the total processing time from receiving an instruction to the function control taking effect.

8. The terminal control method for a walkie-talkie based on an AI agent according to claim 1, characterized in that the dedicated walkie-talkie working mode setting includes switching between group call mode, one-call mode, broadcast mode, monitoring mode, silent mode, emergency alarm mode, and power saving mode; the application lifecycle management control includes control over the startup, switching, background operation, and shutdown of dispatch console software, an inspection management system, or a work order processing application.

9. The terminal control method for a walkie-talkie based on an AI agent according to claim 1, characterized in that the terminal hardware and connection status control includes enabling or disabling the Bluetooth module, Wi-Fi module, cellular mobile data module, GPS positioning module, physical alarm button, and dedicated service channel; the volume control supports independent adjustment of media volume, call volume, ringtone volume, and private network notification tone volume; the screen display parameter adjustment includes adjusting the screen brightness and color temperature automatically based on ambient light sensor data, or in response to instructions.

10. A terminal control system for a walkie-talkie based on an AI intelligent agent, characterized in that the system is used to implement the terminal control method for a walkie-talkie based on an AI agent according to any one of claims 1-9, the system comprising: a user interaction module, which provides a natural language command input interface and a command execution result feedback interface; a local AI intelligent agent control module, deployed locally on the terminal as the core processing unit and connected to the user interaction module, which includes a speech recognition unit for converting voice commands into text locally, a natural language understanding unit for local semantic parsing and intent recognition, and a local instruction mapping and generation unit that is connected to a local, extensible instruction mapping rule base and generates executable instructions; a behavior-determined control interface adaptation layer, connected to the local AI intelligent agent control module, which encapsulates a set of system-level and application-level control interfaces that have been verified for reliability and executes the generated executable instructions; a local feedback generation module, connected to the behavior-determined control interface adaptation layer and to the user interaction module, which generates and outputs feedback information; and a performance monitoring and self-optimization module, which monitors the performance indicators of the instruction processing chain in real time and dynamically adjusts system parameters.