Human-computer collaborative adaptive regulation method, system and electronic device

By decoding the user's cognitive state in real time and intervening dynamically, and by using POMDP and contextual multi-armed bandit algorithms to select interventions and generate composite prompt word templates, the method addresses cognitive dependence and uncontrolled streaming output in human-computer collaborative learning, achieving safe, dynamically adaptive control while keeping user interaction smooth.

CN122241694A · Pending · Publication Date: 2026-06-19 · TIANJIN NORMAL UNIVERSITY

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: TIANJIN NORMAL UNIVERSITY
Filing Date: 2026-05-21
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing technologies in human-machine collaborative learning suffer from problems such as cognitive dependence, expertise reversal effect, neglect of emotional dimension, and inability to achieve real-time secure control of streaming output.

Method used

By decoding the user's cognitive engagement in real time, using POMDP and contextual multi-armed bandit algorithms to select intervention actions, generating composite prompt word templates, and intercepting and asynchronously rewriting the text before streaming output, dynamic adaptive control of the cognitive and emotional dimensions is achieved.

Benefits of technology

It achieves real-time secure control without revealing the original answer, avoids cognitive outsourcing and expertise reversal, maintains user interaction motivation, and provides low-latency secure intervention text.



Abstract

This invention relates to the field of human-computer interaction technology, providing a human-computer collaborative adaptive control method, system, and electronic device. The method involves: acquiring human-computer interaction data and decoding it in real time to obtain a probability sequence of cognitive engagement states; applying POMDP-based confidence smoothing to determine the state and its duration, and outputting a control signal when a threshold is reached; in response to the control signal, extracting interaction features, using a contextual multi-armed bandit to pre-select an intervention action, and cross-validating it; generating a composite prompt word template constrained in the cognitive and emotional dimensions; monitoring the streaming output of the main model, and intercepting and caching it when the intervention is non-silent; after the stream ends, calling the auxiliary model to asynchronously rewrite the cached text according to the template, generating safe text and sending it according to the Server-Sent Events protocol; and subsequently collecting feature increments to calculate rewards and updating the bandit's weight vector and covariance matrix. This achieves real-time secure intervention and dynamic adaptive control of streaming output.

Description

Technical Field

[0001] This application relates to the field of human-computer interaction technology, and more specifically, to a human-computer collaborative adaptive control method, system, and electronic device.

Background Technology

[0002] With generative large language models deeply integrated into educational scenarios, human-machine collaborative learning has become an important paradigm for intelligent knowledge construction. However, AI systems with strong generative capabilities can easily induce cognitive dependence in users, causing them to abandon independent review and verification of the output content, thus leading to cognitive outsourcing. To avoid these risks, existing technologies mostly use static, fixed prompt word scaffolds to intervene in user interaction.

[0003] However, in practical applications, the existing solutions have the following significant drawbacks:

[0004] First, static prompts do not make fine-grained distinctions between levels of the user's real-time cognitive engagement, which can easily trigger an expertise reversal effect. For users who are already in a state of deep active construction, forced delayed questions or redundant prompts not only fail to provide effective support, but also interrupt their cognitive process and add to their extraneous cognitive load.

[0005] Secondly, existing prompting technologies are mainly limited to metacognitive regulation, neglecting the constraints of the emotional dimension. The system output generally exhibits high information density and low interactivity, lacking a sense of social presence and making it difficult to maintain users' intrinsic motivation and long-term exploration intentions.

[0006] Third, modern large language models generally use the Server-Sent Events (SSE) protocol for word-by-word streaming output. Conventional output review is usually performed only after the entire text has been generated, so the original solution is already displayed in real time on the front end. By that point, intervention is ineffective, and millisecond-level real-time security control is impossible.

[0007] To address the aforementioned issues, existing technologies urgently need improvement.

Summary of the Invention

[0008] The purpose of this application is to provide a human-machine collaborative adaptive control method, system, and electronic device, which has the advantages of being able to decode the user's cognitive engagement state in real time and implement dynamic adaptive intervention, avoiding cognitive outsourcing and expertise reversal effects, and achieving real-time secure control of streaming output without disclosing the original answer.

[0009] In a first aspect, this application provides a human-machine collaborative adaptive control method, including:

[0010] Acquire human-computer interaction data, decode the human-computer interaction data in real time, and obtain a probability sequence of the user's cognitive engagement state;

[0011] The probability sequence is smoothed with confidence based on POMDP to determine the current cognitive engagement state and the duration of the state. When the duration reaches a preset threshold, a control signal is output.

[0012] In response to the control signal, the interaction feature vector within the current sliding time window is extracted, and a predicted intervention action is selected from the preset intervention action set using the contextual multi-armed bandit algorithm. The predicted intervention action is cross-validated with the current cognitive engagement state and the interaction feature vector to determine the final intervention action.

[0013] A composite prompt word template is generated based on the final intervention action. The composite prompt word template includes cognitive dimension constraint rules and emotional dimension constraint rules.

[0014] Monitor the streaming output of the main model used to generate the original text. When the final intervention action is a non-silent intervention, intercept and cache the streaming output to block real-time delivery.

[0015] After the streaming output is completed, the auxiliary model is invoked to asynchronously rewrite the cached text according to the composite prompt word template, generate security intervention text, and send it to the user terminal according to the streaming protocol.

[0016] Furthermore, the confidence smoothing of the probability sequence based on POMDP includes:

[0017] Based on the confidence distribution of the previous time step, the observation probability of the current time step, and the state transition probability, the confidence distribution of the current time step is recursively updated through Bayesian inference.

[0018] The state with the highest confidence is extracted as the current cognitive engagement state, and the duration of the current cognitive engagement state is tracked; the current cognitive engagement state is either a high dependency probability state or a high construction probability state;

[0019] When the current cognitive engagement state is a high dependency probability state and the duration reaches a first preset threshold, the control signal is output; when the current cognitive engagement state is a high construction probability state and the duration reaches a second preset threshold, a silent protection signal is output.

[0020] Furthermore, the preset intervention action set includes outputting step-level guidance information, outputting rhetorical question guidance information, outputting concept-level guidance information, and directly delivering the original output; the interaction feature vector is composed of temporal rhythm features, semantic evolution features, and text reconstruction features.

[0021] The selection of the predicted intervention action from the preset intervention action set using the contextual multi-armed bandit algorithm includes:

[0022] The interaction feature vector is multiplied by the weight parameter vector corresponding to each action in the preset intervention action set to obtain the expected reward value of each action. The action with the largest expected reward value is selected as the predicted intervention action.

[0023] Furthermore, the step of cross-validating the predicted intervention action with the current cognitive engagement state and the interaction feature vector includes:

[0024] When the predicted intervention action is to output rhetorical question guidance information, if the current cognitive engagement state is not a high dependency probability state, or the interaction feature vector does not indicate both that the text copy ratio exceeds the first preset ratio and that semantic evolution is stagnant, then the final intervention action is downgraded to outputting step-level guidance information.

[0025] When the interactive feature vector indicates that the autonomous rewriting ratio exceeds the second preset ratio and the semantic span change rate is higher than the preset change rate, and the duration reaches the second preset threshold, the final intervention action will be forcibly overwritten as executing the original output directly.

[0026] Furthermore, the cognitive dimension constraint rules include:

[0027] When the final intervention action is to output step-level guidance information, the output text is restricted to include step-by-step operation instructions and conclusive content is blocked; when the final intervention action is to output rhetorical question guidance information, the output text is restricted to include conditional hypothetical questions and declarative sentences are blocked; when the final intervention action is to output concept-level guidance information, the output text is restricted to include principle explanations and operational step-level content is blocked.

[0028] Furthermore, the emotional dimension constraint rules include: a forced shift rule for person perspective, requiring the use of first-person narration; an upper limit control rule for academic terminology density, requiring the terminology density to not exceed a preset ratio; and a pre-injection rule for emotional reassurance corpus, which injects a preset reassurance text template into the first sentence of the reply when the duration of continuous interaction pauses exceeds a preset pause threshold.
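For illustration only, the cognitive and emotional constraint rules above could be assembled into a composite prompt word template roughly as follows. This is a minimal Python sketch under stated assumptions, not the template of this application: the action identifiers, the rule wording, the function name `build_template`, and its parameters are all hypothetical.

```python
from typing import Optional

# Hypothetical action identifiers and rule wording; the source gives them only in prose.
COGNITIVE_RULES = {
    "step_guidance": "Give step-by-step operation instructions only. Do not state any conclusion.",
    "question_guidance": "Use conditional hypothetical questions only. Do not use declarative sentences.",
    "concept_guidance": "Explain the underlying principles only. Do not give operational steps.",
}


def build_template(action: str, max_term_density: float, pause_seconds: float,
                   pause_threshold: float, reassurance: str) -> Optional[str]:
    """Return a composite prompt template, or None when no rewrite is needed."""
    if action not in COGNITIVE_RULES:
        # Assumption: direct delivery of the original output needs no template.
        return None
    lines = [
        "Rewrite the cached answer under the following constraints.",
        f"Cognitive rule: {COGNITIVE_RULES[action]}",
        "Emotional rule: write in the first person.",
        f"Emotional rule: keep academic terms at or below {max_term_density:.0%} of all words.",
    ]
    # Reassurance is injected only after a long pause in continuous interaction.
    if pause_seconds > pause_threshold:
        lines.append(f'Emotional rule: begin the reply with "{reassurance}"')
    return "\n".join(lines)
```

Keeping the rules as data rather than as branches means a new intervention action only needs a new dictionary entry.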

[0029] Furthermore, the output of the main model is streamed via the Server-Sent Events (SSE) protocol;

[0030] The interception and caching of the streaming output includes: taking over the streaming output through an asynchronous coroutine, blocking the push of streaming data to the downstream user terminal, and continuously concatenating the text content of each data packet into a hidden buffer;

[0031] The process of generating security intervention text and sending it to the user terminal according to a streaming protocol includes: dividing the security intervention text into fine-grained data blocks, encapsulating them in the Server-Sent Events protocol format, and sending them to the user terminal block by block through an asynchronous iterator.
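For illustration only, the interception, hidden buffering, and block-by-block re-delivery described above can be sketched with Python's `asyncio`. This is a minimal sketch under stated assumptions, not the implementation of this application: `fake_main_model` and `fake_rewrite` are hypothetical stand-ins for the main and auxiliary models, and wrapping each block in a JSON object is an assumed payload format chosen so that newlines never break the SSE `data:` line.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable


def sse_event(text: str) -> str:
    """Frame one block as an SSE event (a data line followed by a blank line)."""
    payload = json.dumps({"text": text})  # assumed payload shape
    return "data: " + payload + "\n\n"


async def intercept(main_stream: AsyncIterator[str]) -> str:
    """Consume the main model's stream into a hidden buffer; forward nothing."""
    hidden_buffer = []
    async for chunk in main_stream:
        hidden_buffer.append(chunk)
    return "".join(hidden_buffer)


async def regulated_stream(main_stream: AsyncIterator[str],
                           rewrite: Callable[[str], Awaitable[str]],
                           silent: bool, chunk_size: int = 8) -> AsyncIterator[str]:
    if silent:
        # Silent intervention: pass the original stream straight through.
        async for chunk in main_stream:
            yield sse_event(chunk)
        return
    cached = await intercept(main_stream)   # original answer never leaves the server
    safe_text = await rewrite(cached)       # auxiliary model rewrites after the stream ends
    for start in range(0, len(safe_text), chunk_size):
        yield sse_event(safe_text[start:start + chunk_size])
        await asyncio.sleep(0)              # let the event loop run between blocks


async def fake_main_model() -> AsyncIterator[str]:
    for token in ["The ", "answer ", "is ", "42."]:
        yield token


async def fake_rewrite(cached: str) -> str:
    return "What would you try first?"


async def demo(silent: bool) -> str:
    """Collect the delivered events and return the text the front end would render."""
    events = [e async for e in regulated_stream(fake_main_model(), fake_rewrite, silent)]
    return "".join(json.loads(e[len("data: "):])["text"] for e in events)
```

Calling `asyncio.run(demo(False))` returns only the rewritten text, while `asyncio.run(demo(True))` returns the original answer unchanged. In both cases the client receives the same event framing.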

[0032] Furthermore, after generating the security intervention text and sending it to the user terminal according to a streaming protocol, the method further includes:

[0033] Collect incremental data of interaction features within the next sliding time window, calculate the reward signal based on the feature changes before and after the intervention, and use the reward signal to update the weight parameter vector and covariance matrix in the contextual multi-armed bandit algorithm.

[0034] In a second aspect, this application also proposes a human-machine collaborative adaptive control system based on real-time decoding of cognitive engagement, comprising:

[0035] The data acquisition module is used to acquire human-computer interaction data, decode the human-computer interaction data in real time, and obtain a probability sequence of the user's cognitive engagement state.

[0036] The smoothing module is used to perform confidence smoothing on the probability sequence based on POMDP, determine the current cognitive engagement state and the duration of the state, and output a control signal when the duration reaches a preset threshold.

[0037] The decision module is used to respond to the control signal, extract the interaction feature vector within the current sliding time window, select the predicted intervention action from the preset intervention action set using the contextual multi-armed bandit algorithm, cross-validate the predicted intervention action with the current cognitive engagement state and the interaction feature vector, and determine the final intervention action.

[0038] A generation module is used to generate a composite prompt word template based on the final intervention action. The composite prompt word template includes cognitive dimension constraint rules and emotional dimension constraint rules.

[0039] The interception module is used to monitor the streaming output of the main model used to generate the original text. When the final intervention action is a non-silent intervention, the streaming output is intercepted and cached to block real-time delivery.

[0040] The delivery module is used to call the auxiliary model to asynchronously rewrite the cached text according to the composite prompt word template after the streaming output is completed, generate security intervention text, and deliver it to the user terminal according to the streaming protocol.

[0041] In a third aspect, this application also proposes an electronic device comprising: one or more processors, and a memory for storing one or more computer programs; the computer programs are configured to be executed by the one or more processors, and the programs include steps for performing the human-machine collaborative adaptive control method as described in the first aspect.

[0042] As shown above, this application solves the problem of answer leakage during large language model streaming generation: by intercepting the Server-Sent Events (SSE) stream and accumulating it in a hidden buffer, it completes real-time interception and asynchronous rewriting before the original answer reaches the front end. At the same time, based on a partially observable Markov decision process (POMDP), Bayesian confidence smoothing is applied to the probability sequence over multiple consecutive time steps and combined with bidirectional duration-threshold adjudication to determine the intervention timing accurately. The contextual multi-armed bandit dynamically selects the predicted intervention action, and finite-state-machine cross-validation performs safety downgrading and forced overriding, avoiding the expertise reversal effect while adaptively matching the intervention intensity. Furthermore, the system applies cognitive and emotional constraints together through composite prompt word templates, providing a tiered cognitive scaffold while maintaining user interaction motivation. It also calculates reward signals from the incremental interaction features collected after intervention, updating the weight parameter vector and covariance matrix of the contextual multi-armed bandit in real time to achieve closed-loop self-optimization. The safe text rewritten by the auxiliary model is sent block by block according to the Server-Sent Events protocol, so front-end rendering is indistinguishable from the main model's native streaming output, and users receive safe intervention while retaining a low-latency interactive experience.

Description of the Drawings

[0043] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of the present invention and should not be regarded as a limitation on the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0044] Figure 1 is a flowchart illustrating the steps of the human-machine collaborative adaptive control method disclosed in an embodiment of the present invention;

[0045] Figure 2 is a schematic diagram of the structure of the human-machine collaborative adaptive control system based on real-time decoding of cognitive engagement disclosed in an embodiment of the present invention.

Detailed Implementation

[0046] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which these embodiments belong; the terminology used herein and in the specification of the application is for the purpose of describing particular embodiments only and is not intended to limit these embodiments; the terms "comprising" and "having," and any variations thereof, in the specification of these embodiments and the foregoing drawings, are intended to cover non-exclusive inclusion. The terms "first," "second," etc., in the specification of these embodiments and the foregoing drawings are used to distinguish different objects, not to describe a particular order.

[0047] The implementation details of the technical solution in this embodiment are described in detail below:

[0048] This application proposes a human-machine collaborative adaptive control method. As shown in Figure 1, the method includes:

[0049] S101, acquire human-computer interaction data, decode the human-computer interaction data in real time, and obtain a probability sequence of the user's cognitive engagement state.

[0050] In this embodiment, the system front-end continuously collects raw interaction data during the human-computer interaction process, including but not limited to user input text, keyboard editing time sequence records, text paste events, semantic query logs, and cursor hover trajectories. A pre-decoding model (e.g., a binary classification model based on a bidirectional long short-term memory network) processes this raw interaction data and outputs a probability sequence of the user's cognitive engagement state. The probability sequence contains the probabilities of the high construction probability state and the high dependency probability state, with one binary probability distribution output at each time step.

[0051] S102, perform confidence smoothing on the probability sequence based on POMDP, determine the current cognitive engagement state and the duration of the state, and output a control signal when the duration reaches a preset threshold;

[0052] In S102, the confidence smoothing of the probability sequence based on POMDP includes the following steps:

[0053] Based on the confidence distribution of the previous time step, the observation probability of the current time step, and the state transition probability, the confidence distribution of the current time step is recursively updated through Bayesian inference.

[0054] The state with the highest confidence is extracted as the current cognitive engagement state, and the duration of the current cognitive engagement state is tracked; the current cognitive engagement state is either a high dependency probability state or a high construction probability state;

[0055] When the current cognitive engagement state is a high dependency probability state and the duration reaches a first preset threshold, the control signal is output; when the current cognitive engagement state is a high construction probability state and the duration reaches a second preset threshold, a silent protection signal is output.

[0056] Specifically, in this embodiment, to reduce prediction noise at a single time step, the system deploys a time-series sliding window tracking module in a Python-based server-side microservice architecture, maintaining a sliding time window with a fixed length of 120 seconds and a fixed step size of 30 seconds. For each user session, the system maintains a sliding time window queue in the in-memory database Redis, records the timestamps of core state transitions, and initializes a POMDP (Partially Observable Markov Decision Process) state transition probability matrix over a two-dimensional state space. Whenever a new 30-second data packet arrives, the policy engine uses Bayesian inference, combining the historical confidence distribution with the current observation probability, to recursively update its estimate of the user's true cognitive state. The core advantage of the POMDP lies in its noise-resistant smoothing: rather than relying on the probability extremes of a single time step, it filters out brief prediction fluctuations by tracking the confidence distribution and dwell time, ensuring accurate intervention timing and avoiding false triggers and missed triggers.

[0057] In the process of smoothing the probability sequence based on POMDP, the system recursively updates the confidence distribution at the current moment through Bayesian inference based on the confidence distribution at the previous moment, the observation probability at the current moment, and the state transition probability; it extracts the state with the highest confidence as the current cognitive engagement state and tracks the duration of the current cognitive engagement state.

[0058] The confidence $b_t(s')$ of the current state is calculated as follows:

[0059] $$b_t(s') = \frac{1}{\eta}\, O(o_t \mid s') \sum_{s \in S} T(s' \mid s)\, b_{t-1}(s)$$

[0060] Wherein, the normalization constant $\eta$ is calculated as:

[0061] $$\eta = \sum_{s' \in S} O(o_t \mid s') \sum_{s \in S} T(s' \mid s)\, b_{t-1}(s)$$

[0062] In the above formulas: $t$ is the current sliding window step time and $t-1$ is the previous moment; $S$ is the set of true latent states defined by the system (including active construction and cognitive outsourcing); $s'$ is the user's true state as estimated by the system at the current moment $t$, with $s' \in S$; $s$ is the user's true state at the previous moment $t-1$, with $s \in S$; $b_t(s')$ is the latest probability confidence that the user is in state $s'$ at the current moment $t$; $b_{t-1}(s)$ is the historical probability confidence that the user was in state $s$ at the previous moment $t-1$; $o_t$ is the observed feature, i.e., the actual output of the pre-decoding model at the current moment; $O(o_t \mid s')$ is the observation probability, representing the probability that the pre-decoding model outputs the observation $o_t$ when the true state is $s'$, and is used to quantify the prediction error of the model; $T(s' \mid s)$ is the state transition probability, representing the prior probability that the user's state naturally evolves from $s$ at the previous time step to $s'$ at the current time step; $\eta$ is the normalization constant, which ensures that the probabilities of all possible states in the state space sum to 1 at the current time.
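For illustration only, one step of the recursive Bayesian confidence update can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of this embodiment: the state names, the transition probabilities, and the use of the pre-decoding model's output as the observation likelihood are all hypothetical choices made for the example.

```python
# Hypothetical labels for the two latent states described in the text.
STATES = ("construction", "dependency")


def update_belief(prev_belief, obs_likelihood, transition):
    """Return the confidence distribution for the current time step.

    prev_belief[s]      confidence that the user was in state s at the previous step
    obs_likelihood[s2]  probability of the current observation given true state s2
    transition[s][s2]   probability of moving from state s to state s2
    """
    unnormalized = {}
    for s2 in STATES:
        # Prediction: where the previous confidence flows under the transition model.
        predicted = sum(transition[s][s2] * prev_belief[s] for s in STATES)
        # Correction: weight the prediction by how well s2 explains the observation.
        unnormalized[s2] = obs_likelihood[s2] * predicted
    eta = sum(unnormalized.values())  # normalization constant
    if eta == 0.0:
        # Assumption: keep the previous confidence if the observation is impossible.
        return dict(prev_belief)
    return {s2: value / eta for s2, value in unnormalized.items()}


# Illustrative numbers only: a "sticky" transition model and a uniform prior.
transition = {
    "construction": {"construction": 0.9, "dependency": 0.1},
    "dependency": {"construction": 0.1, "dependency": 0.9},
}
belief = {"construction": 0.5, "dependency": 0.5}
belief = update_belief(belief, {"construction": 0.2, "dependency": 0.8}, transition)
current_state = max(belief, key=belief.get)
```

With a sticky transition model, a single contradictory observation shifts the confidence only partially, which is the smoothing effect the text attributes to this step.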

[0063] After completing the confidence update, the system extracts the state with the highest probability confidence as the current cognitive engagement state and tracks the duration for which this state remains unchanged in Redis. The current cognitive engagement state includes a high dependency probability state and a high construction probability state. Based on this, the system performs bidirectional dynamic threshold adjudication: when the current cognitive engagement state is a high dependency probability state and the duration reaches a first preset threshold (e.g., three consecutive time windows, i.e., 90 seconds), the system determines that the user has substantially fallen into cognitive dependency and immediately outputs the control signal downstream; when the current cognitive engagement state is a high construction probability state and the duration reaches a second preset threshold (e.g., four consecutive time windows, i.e., 120 seconds), the system determines that the user is in a deep self-knowledge reconstruction process. To avoid negative effects caused by teaching intervention, it immediately outputs a silent protection signal, physically blocking all explicit pop-ups or prompts. If the current cognitive engagement state changes during the tracking process, the duration counter is reset to zero, and silent monitoring continues.
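For illustration only, the bidirectional threshold adjudication can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class name `DwellTracker`, the state labels, and the signal names are hypothetical, the counter is held in memory rather than in Redis, and the window in which a state first appears is assumed to count toward its duration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

STEP_SECONDS = 30             # one sliding-window step
DEPENDENCY_THRESHOLD = 90     # example first preset threshold (three windows)
CONSTRUCTION_THRESHOLD = 120  # example second preset threshold (four windows)


@dataclass
class DwellTracker:
    state: Optional[str] = None
    duration: int = 0

    def observe(self, belief: Dict[str, float]) -> Optional[str]:
        """Feed one confidence distribution; return a signal name or None."""
        current = max(belief, key=belief.get)
        if current != self.state:
            # State changed: reset the duration counter and keep monitoring.
            self.state = current
            self.duration = 0
        self.duration += STEP_SECONDS
        if current == "dependency" and self.duration >= DEPENDENCY_THRESHOLD:
            return "control"
        if current == "construction" and self.duration >= CONSTRUCTION_THRESHOLD:
            return "silent_protection"
        return None
```

Because the comparison uses `>=`, the signal keeps being emitted on every later window until the state changes. A caller that wants a one-shot trigger would need to track that separately.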

[0064] S103, in response to the control signal, extract the interaction feature vector within the current sliding time window, select the predicted intervention action from the preset intervention action set using the contextual multi-armed bandit algorithm, cross-validate the predicted intervention action with the current cognitive engagement state and the interaction feature vector, and determine the final intervention action.

[0065] In this embodiment, when the system receives the control signal output by S102, it immediately activates the intervention strategy allocation process.

[0066] In S103, the preset intervention action set includes outputting step-level guidance information, outputting rhetorical question guidance information, outputting concept-level guidance information, and directly delivering the original output; the interaction feature vector is composed of temporal rhythm features, semantic evolution features, and text reconstruction features.

[0067] In S103, the step of selecting a predicted intervention action from the preset intervention action set using the contextual multi-armed bandit algorithm includes: computing the dot product of the interaction feature vector and the weight parameter vector corresponding to each action in the preset intervention action set to obtain the expected reward value of each action, and selecting the action with the largest expected reward value as the predicted intervention action.

[0068] In this embodiment, the temporal rhythm features include input interval duration, text editing pause rate, and continuous interaction pause duration; the semantic evolution features include query concept depth and semantic span change rate; and the text reconstruction features include external text copy ratio and autonomous rewriting ratio.

[0069] To balance exploration and exploitation of strategies in complex intervention contexts, the system employs the Linear Upper Confidence Bound (LinUCB) algorithm to calculate the upper bound of the confidence interval for the expected return of each action. For any candidate action $a$ in the action space $\mathcal{A}$, the upper bound $UCB_{t,a}$ of the expected-return confidence interval in the current context $x_t$ is calculated as follows:

[0070] $$UCB_{t,a} = \theta_a^{\top} x_t + \alpha \sqrt{x_t^{\top} A_a^{-1} x_t}$$

[0071] Its decision formula is:

[0072] $$a_t^{*} = \arg\max_{a \in \mathcal{A}} UCB_{t,a}$$

[0073] In the above formulas: $t$ is the time step at which the multi-armed bandit makes the current decision; $\mathcal{A}$ is the set of discrete intervention actions defined by the system; $a$ is a candidate action in the action space $\mathcal{A}$; $x_t$ is the $d$-dimensional context feature vector extracted and concatenated from the sliding time window at the current moment $t$; $\theta_a$ is the weight parameter vector of action $a$ learned online by the system, representing the linear contribution of each underlying feature to the expected return of the action; $\theta_a^{\top} x_t$ is the mean expected reward that the model predicts for action $a$ in the current context $x_t$; $A_a$ is the feature covariance matrix of action $a$, which records the distribution of historical features; $A_a^{-1}$ is the inverse of the covariance matrix $A_a$; $\alpha$ is a hyperparameter controlling the degree of exploration, where a larger value makes the system more inclined to explore intervention actions with higher current uncertainty (i.e., fewer selections); $\alpha \sqrt{x_t^{\top} A_a^{-1} x_t}$ is the exploration bonus, representing the model's current uncertainty in predicting the reward for this action; $UCB_{t,a}$ is the final upper bound of the expected return for action $a$, combining the mean estimate and the exploration uncertainty; $a_t^{*}$ is the predicted intervention action output by the multi-armed bandit model under the maximization principle.

[0074] The system iterates through the preset set of intervention actions and selects the candidate action with the largest upper bound of the expected-return confidence interval as the predicted intervention action. For the cold-start phase of new users, the covariance matrix of each action is initialized as an identity matrix, and the weights are warmed up either by introducing population-based prior weight parameters or by decoding historical data beforehand.
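For illustration only, LinUCB selection and the subsequent reward update can be sketched in pure Python as follows. This is a minimal sketch under stated assumptions, not the implementation of this embodiment: the class and function names are hypothetical, the feature vector and reward value are made-up numbers, and the inverse covariance matrix is maintained directly with the Sherman-Morrison rank-one update instead of being re-inverted at each step.

```python
import math


class LinUCBArm:
    """Per-action statistics: inverse covariance matrix and reward-weighted feature sum."""

    def __init__(self, dim):
        # Cold start: covariance matrix is the identity, so its inverse is too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def ucb(self, x, alpha):
        theta = self._a_inv_times(self.b)                 # weight parameter vector
        mean = sum(t * xi for t, xi in zip(theta, x))     # expected reward
        variance = sum(xi * v for xi, v in zip(x, self._a_inv_times(x)))
        return mean + alpha * math.sqrt(max(variance, 0.0))

    def update(self, x, reward):
        # Sherman-Morrison: inverse of (A + x x^T) from the inverse of A.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(n):
            self.b[i] += reward * x[i]


def select_action(arms, x, alpha):
    """Return the action name with the largest upper confidence bound."""
    return max(arms, key=lambda name: arms[name].ucb(x, alpha))


# Hypothetical action identifiers and a two-dimensional toy feature vector.
ACTIONS = ("step_guidance", "question_guidance", "concept_guidance", "direct_delivery")
arms = {name: LinUCBArm(dim=2) for name in ACTIONS}
x = [1.0, 0.0]
arms["step_guidance"].update(x, reward=1.0)  # reward value is illustrative only
chosen = select_action(arms, x, alpha=1.0)
```

A larger `alpha` widens the exploration bonus, so rarely tried actions win more often. In this toy run the rewarded action already has the higher bound.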

[0075] In S103, the step of cross-validating the predicted intervention action with the current cognitive engagement state and the interaction feature vector includes: when the predicted intervention action is to output rhetorical question guidance information, if the current cognitive engagement state is not a high dependency probability state, or the interaction feature vector does not indicate both that the text copy ratio exceeds the first preset ratio and that semantic evolution is stagnant, then the final intervention action is downgraded to outputting step-level guidance information; when the interaction feature vector indicates that the autonomous rewriting ratio exceeds the second preset ratio and the semantic span change rate is higher than the preset change rate, and the duration reaches the second preset threshold, the final intervention action is forcibly overwritten to direct delivery of the original output.

[0076] Specifically, in this embodiment, after outputting the predicted intervention action, to prevent the machine learning model from outputting inappropriate instructions in the early exploration stage, the system introduces a cross-validation mechanism based on a finite state machine and an expert verification rule base. The policy engine comprehensively considers the predicted intervention action, the duration of the state, and the extreme value performance of the interaction feature vector to perform mandatory safety checks.

[0077] In this embodiment, after the system outputs a predicted intervention action through the contextual multi-armed bandit algorithm, a cross-validation mechanism based on a finite state machine and an expert verification rule base is applied before control commands are formally issued. The strategy engine integrates the predicted intervention action output by the multi-armed bandit, the state duration tracked by the POMDP, and the real-time interaction feature vector, and performs strict verification to generate the final intervention action.

[0078] I. Safety Downgrade Verification for High-Intensity Intervention

[0079] When the predicted intervention action is to output rhetorical question guidance information, the system triggers a high-intensity intervention safety verification process. The policy engine forcibly verifies the following two joint conditions:

[0080] First, the current cognitive engagement state must be a high dependency probability state. The system reads the current cognitive engagement state identifier output by the POMDP module in Example 1. If the identifier is not a high dependency probability state, it is directly determined that the prerequisite for high-intensity intervention is not met.

[0081] Secondly, the interactive feature vector must simultaneously indicate that the text copying ratio exceeds a first preset ratio and semantic evolution stagnates. The system analyzes the text reconstruction features and semantic evolution features within the current sliding time window: it calculates the external text copying ratio; if this ratio does not exceed the first preset ratio (e.g., preset to 0.6), it is determined that the condition is not met; simultaneously, it detects the semantic span change rate; if the change rate is lower than a preset stagnation threshold (e.g., the change rate is lower than 0.05 within two consecutive time windows), it is determined that semantic evolution stagnates. Only when both sub-features are simultaneously satisfied can the high-intensity instruction of outputting the rhetorical question guidance information be allowed.

[0082] If either of the above conditions is not met, i.e., the current cognitive engagement state is not a high-probability-of-dependence state, or the interaction feature vector does not indicate both that the text copy ratio exceeds the first preset ratio and that semantic evolution stagnates, the system forcibly downgrades the action and overwrites the final intervention action as outputting step-level guidance information, to avoid excessive intervention that interrupts the user's normal cognitive process.

[0083] II. Forced Override Verification for Absolute Silence

[0084] The system performs absolute-silence safety checks in parallel. The strategy engine monitors the combination of deep autonomous construction features in the interaction feature vector and checks three conditions: the autonomous rewriting ratio in the text reconstruction features exceeds the second preset ratio (e.g., preset to 0.7); the semantic span change rate in the semantic evolution features is higher than the preset change rate threshold (e.g., greater than 0.2); and the duration of the high construction probability state tracked in Example 1 reaches the second preset threshold (e.g., 120 seconds). When all three conditions hold, regardless of which predicted intervention action the multi-armed slot machine previously output, the system determines that continued intervention may cause negative effects, and the rule engine triggers the highest-priority forced override, overwriting the final intervention action to direct delivery of the original output and blocking any explicit pop-ups or prompts that may interrupt the user's mental flow.
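The two verification rules above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of this embodiment: the `Features` container, the state labels `"high_dependency"` and `"high_construction"`, and the function name `cross_validate` are hypothetical, and the default thresholds are the example values quoted in the text (0.6, 0.05, 0.7, 0.2, 120 seconds).

```python
from dataclasses import dataclass

# Action names follow the enumeration values listed in this document.
STEP = "output_step_guide"
QUESTION = "output_rhetorical_question"
CONCEPT = "output_concept_guide"
PASS = "pass_through"


@dataclass
class Features:
    """Hypothetical container for the interaction features used by the rules."""
    copy_ratio: float          # external text copying ratio
    rewrite_ratio: float       # autonomous rewriting ratio
    span_change_rates: tuple   # semantic span change rate, one value per recent window


def cross_validate(predicted, state, duration_s, f,
                   first_ratio=0.6, stagnation=0.05,
                   second_ratio=0.7, change_rate=0.2, second_threshold_s=120):
    """Return the final action after the override and downgrade checks."""
    # Rule II (highest priority): deep autonomous construction, stay silent.
    # Assumption: duration_s is the duration of the current state.
    if (state == "high_construction"
            and f.rewrite_ratio > second_ratio
            and f.span_change_rates[-1] > change_rate
            and duration_s >= second_threshold_s):
        return PASS
    # Rule I: a rhetorical question needs dependence, copying and stagnation.
    if predicted == QUESTION:
        recent = f.span_change_rates[-2:]
        stagnant = len(recent) == 2 and all(r < stagnation for r in recent)
        if not (state == "high_dependency"
                and f.copy_ratio > first_ratio and stagnant):
            return STEP
    return predicted
```

Checking the override before the downgrade reflects the statement that the forced override has the highest priority; any action other than a rhetorical question passes through Rule I unchanged.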

[0085] III. State Machine Deployment and Command Issuance

[0086] The aforementioned cross-validation logic is implemented using a finite state machine combined with an expert verification rule base, and deployed in the downstream streaming output interception middleware of the system. The state machine maintains four state nodes: initial monitoring state, security degradation state, forced privilege escalation state, and direct release state. Based on the input predicted intervention action type and feature verification results, the state machine completes state transitions within milliseconds.

[0087] After cross-validation, the policy engine encapsulates the finalized intervention action into a standard JSON control instruction package. This package includes action enumeration values (such as "output_step_guide", "output_rhetorical_question", "output_concept_guide", and "pass_through"), a status verification result identifier, a timestamp, and a session ID. The system asynchronously pushes this JSON control instruction package to the downstream rule engine module via RPC or a RabbitMQ message queue for subsequent use by the compound prompt word template generation and streaming interception modules.

[0088] S104, Generate a compound prompt word template based on the final intervention action, the compound prompt word template including cognitive dimension constraint rules and emotional dimension constraint rules;

[0089] In this embodiment, after determining the final intervention action through the upstream cross-validation module, the system enters the rule engine module to execute the generation and instantiation process of the composite prompt word template. The composite prompt word template consists of a subset of cognitive dimension constraint rules and a subset of emotional dimension constraint rules, employing a decoupled design and aggregating them into a complete set of constraints through a logical union operation. Specifically, the system defines the complete set of intervention rule constraints as the composite prompt word template. The cognitive dimension constraint mapping function is responsible for generating logical deduction control instructions based on different final intervention actions, while the emotional dimension constraint mapping function is responsible for generating social presence rendering instructions based on historical multimodal interaction contexts. The logical union operation ensures that the auxiliary model must simultaneously satisfy the rule constraints of both dimensions when generating a response.

[0090] In S104, the cognitive dimension constraint rules include: when the final intervention action is to output step-level guidance information, the output text is restricted to include step-by-step operation instructions and conclusive content is blocked; when the final intervention action is to output rhetorical question guidance information, the output text is restricted to include conditional hypothetical questions and declarative sentences are blocked; when the final intervention action is to output concept-level guidance information, the output text is restricted to include principle explanations and operational step-level content is blocked.

[0091] In this embodiment, the system pre-configures a cognitive dimension rule base in the backend strategy configuration center, storing the logical constraint entries corresponding to each final intervention action as key-value pairs. When the final intervention action is to output step-level guidance information, the rule engine retrieves the corresponding entry from the rule base to generate cognitive dimension constraint rules. These rules restrict the output text to step-by-step operation instructions and block conclusive content. Specifically, the rule engine adds structured constraints to the prompt word template: numbered steps, intermediate variable calculation processes, and boundary condition descriptions are allowed; final numerical results, final conclusion sentences, and direct answer strings are prohibited. The rule engine matches conclusive keywords (such as "therefore the answer is", "the final result is", "in summary") using regular expressions. If the content generated by the auxiliary model matches any of these keywords, the engine truncates the content and triggers a regeneration mechanism, so that the final answer cannot be generated.
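The keyword matching and truncation described above can be sketched in a few lines of Python. This is a minimal sketch: the function name `truncate_conclusive` and its return convention are assumptions, the pattern contains only the three example phrases quoted in the text, and the regeneration mechanism itself is represented only by a flag.

```python
import re

# Only the example keywords quoted in the text; a real rule base would hold more.
CONCLUSIVE = re.compile(
    r"therefore the answer is|the final result is|in summary",
    re.IGNORECASE,
)


def truncate_conclusive(text):
    """Return (kept_text, needs_regeneration) for an auxiliary-model draft."""
    match = CONCLUSIVE.search(text)
    if match is None:
        return text, False
    # Keep only what precedes the first conclusive phrase.
    return text[:match.start()].rstrip(), True
```

For example, `truncate_conclusive("Step 1: expand. Therefore the answer is 4.")` keeps `"Step 1: expand."` and sets the regeneration flag.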

[0092] When the final intervention action is to output rhetorical question guidance information, the system retrieves the corresponding entry from the rule base to generate cognitive dimension constraint rules. These rules restrict the output text to conditional hypothetical questions and block declarative sentences. Specifically, the rule engine enforces syntactic structure constraints: each interrogative sentence must be introduced by a conditional conjunction (such as "if" or "suppose") and must end with a question mark. The rule engine detects declarative sentence markers through dependency parsing; if a complete subject-verb-object structure is detected without interrogative particles, the sentence is determined to be declarative, triggering backtracking and rewriting. The question direction constraint requires the question to point towards forward reasoning prediction (e.g., "If the premises are reversed, what will happen to this logical chain?"), so that the user must predict the consequence rather than receive it.
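A simplified version of the syntactic check can be sketched as follows. This sketch does not perform dependency parsing: it swaps in a plain surface heuristic (leading conditional conjunction plus trailing question mark), and the function names and the sentence splitter are assumptions made for illustration.

```python
import re

# Conditional conjunctions quoted in the text; the list is illustrative.
CONDITIONAL_START = re.compile(r"^\s*(if|suppose)\b", re.IGNORECASE)


def is_conditional_question(sentence):
    """True if the sentence opens with a conditional and ends with '?'."""
    s = sentence.strip()
    return bool(CONDITIONAL_START.match(s)) and s.endswith("?")


def needs_rewrite(text):
    """True if any sentence in the draft is not a conditional question."""
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
    return any(not is_conditional_question(s) for s in sentences)
```

A heuristic of this kind is cheaper than a parser but will misclassify sentences such as a declarative that merely begins with "If"; the dependency-parsing check described above is what closes that gap.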

[0093] When the final intervention action is to output concept-level guidance information, the system retrieves the corresponding entry from the rule base to generate cognitive dimension constraint rules. These rules restrict the output text to principle-level explanations and block operational step-level content. Specifically, core concept definitions, explanations of physical or mathematical principles, and causal mechanism interpretations are allowed; specific numerical substitution steps, code-level instructions, and tool operation procedures are prohibited. The rule engine detects whether the text contains ordinal markers ("first step," "second step") or operational verbs ("click," "input," "calculate," "substitute"). If any are present, filtering is triggered, instructing the auxiliary model to move from detailed step-level content to abstract principle-level explanations.

[0094] When the final intervention action is to execute the original output directly, the cognitive dimension constraint rules are an empty set, and the system does not impose any artificially set constraints on the cognitive dimension in order to preserve the original natural output path of the main model.

[0095] In S104, the emotional dimension constraint rules include: a forced shift rule for person perspective, requiring the use of first-person expression; an upper limit control rule for academic term density, requiring the term density to not exceed a preset ratio; and a pre-injection rule for emotional reassurance corpus, which injects a preset reassurance text template into the first sentence of the reply when the duration of continuous interaction pauses exceeds a preset pause threshold.

[0096] Specifically, in this embodiment, the system superimposes the emotional dimension constraints independently of the cognitive action. The forced conversion rule for person perspective requires first-person expression and prohibits both mechanical self-references such as "this system" or "this platform" and the passive voice. In specific implementation, the rule engine performs person detection during the text rewriting post-processing stage: if the output text contains prohibited forms of address ("system," or "you" in a specific context), it triggers the pronoun substitution pipeline, converting the second-person "you" into the inclusive first-person "we," and converting "the system believes" into "I think," thereby reducing psychological distance.

[0097] The academic terminology density upper limit control rule requires that the terminology density not exceed a preset ratio. In specific implementation, the rule engine maintains an academic terminology lexicon, performs word segmentation and lexicon matching on the auxiliary model's single-response text, and calculates the terminology density. The terminology density is defined as the ratio of the size of academic or geek terminology vocabulary to the total vocabulary size of a single response, satisfying the formula:

[0098] $$\rho = \frac{N_{\text{term}}}{N_{\text{total}}} \le \tau$$

[0099] where $N_{\text{term}}$ is the number of academic or geek terms contained in a single response from the large language model; $N_{\text{total}}$ is the total number of words in a single reply; and $\tau$ is a dynamically configurable density upper limit threshold (e.g., preset to 0.05, i.e., 5%). If this threshold is exceeded, the rule forces the model to simplify the explanation by using analogies from everyday life. In addition, if the interaction context indicates that the user has experienced prolonged attempts and failures, the engine triggers "pre-injection of emotional reassurance corpus," forcing the large model to place reassuring text in the first sentence of the response (for example, "I understand this logic can indeed be confusing"). Finally, the server-side policy configuration center encapsulates the parallel-instantiated cognitive dimension constraint rules and emotional dimension constraint rules in a structured JSON configuration file. This composite file is registered as a soft constraint parameter in the context of the backend streaming middleware, and is called as a core system-level instruction in the next stage (i.e., the asynchronous rewriting by the lightweight auxiliary large language model, Fast LLM), thereby defining the boundaries within which the auxiliary model may reshape the text.
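As a worked illustration of the density rule, the following Python sketch computes the ratio of lexicon hits to total words in one reply and compares it with the upper limit. The regex-based tokenizer stands in for the word segmentation step and is an assumption suited to English text; the function names and the toy lexicon in the usage note are hypothetical.

```python
import re


def term_density(reply, lexicon):
    """Ratio of words found in the terminology lexicon to all words in the reply."""
    # Assumption: a simple regex tokenizer replaces real word segmentation.
    words = re.findall(r"[a-z']+", reply.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in lexicon)
    return hits / len(words)


def exceeds_limit(reply, lexicon, limit=0.05):
    """True if the reply should be rewritten with everyday analogies."""
    return term_density(reply, lexicon) > limit
```

With the toy lexicon `{"gradient"}`, the reply "the gradient points uphill" has one term in four words, a density of 0.25, which exceeds the 0.05 example threshold.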

[0100] The rule for pre-injecting emotional reassurance text requires that when the duration of a continuous interaction pause exceeds a preset pause threshold, a preset reassurance text template be injected into the first sentence of the response. Specifically, the time rhythm feature module tracks the difference between the user's most recent input timestamp and the current moment to calculate the duration of the continuous interaction pause. The rule engine maintains a hierarchical reassurance corpus, including a general-level corpus (e.g., "This is a very tricky problem, I understand your current confusion") and a domain-level corpus (targeted reassurance phrases for vertical fields such as mathematics and programming), and dynamically selects the appropriate corpus based on the current task type. The reassurance text template is injected only at the first sentence of the response and is separated from the subsequent cognitive guidance content by a newline character, so that emotional text and logical content are not mixed and the user receives emotional buffering alongside cognitive guidance.
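The pause-triggered injection can be sketched as follows. This is a hedged illustration: the 90-second default threshold, the `programming` corpus entry, and the function name `inject_reassurance` are assumptions introduced for the example; only the general-level sentence is taken from the text.

```python
CORPUS = {
    # General-level example quoted in the text.
    "general": "This is a very tricky problem, I understand your current confusion.",
    # Hypothetical domain-level entry, added only for illustration.
    "programming": "Bugs like this trip up everyone, so let's take it slowly.",
}


def inject_reassurance(reply, last_input_ts, now, task_type, pause_threshold_s=90.0):
    """Prepend a reassurance sentence when the interaction pause is too long."""
    pause = now - last_input_ts
    if pause <= pause_threshold_s:
        return reply
    opener = CORPUS.get(task_type, CORPUS["general"])
    # A newline keeps emotional text separate from the cognitive guidance.
    return opener + "\n" + reply
```

Passing the timestamps in explicitly, rather than reading the clock inside the function, keeps the rule deterministic and easy to test.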

[0101] Finally, the system encapsulates the aforementioned parallel-instantiated cognitive and emotional constraint rules into a structured JSON configuration file in the server-side policy configuration center. This JSON configuration file contains the following top-level fields: `cognitive_constraint` (a subset of cognitive constraint rules, including action type, allowed output mode, prohibited output mode, and trigger regeneration condition), `emotional_constraint` (a subset of emotional constraint rules, including person mode, term density upper limit, soothing corpus, and pause threshold), and `template_version` (the rule template version number, used for policy rollback during the online learning phase). This JSON configuration file, as a soft constraint parameter, is registered in the context of the backend streaming middleware, specifically for subsequent auxiliary models to call as core system-level instructions during the asynchronous rewriting phase, thereby thoroughly defining the text reshaping boundaries of the auxiliary model.
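The shape of such a configuration file might look like the output of the sketch below. The three top-level field names come from the text; every nested key name, the default values, and the function name `build_template` are assumptions chosen for illustration.

```python
import json


def build_template(action, allowed, prohibited, regenerate_on,
                   density_limit=0.05, pause_threshold_s=90.0, version="v1"):
    """Assemble the composite template as a JSON-serializable dict."""
    # Direct delivery carries no cognitive constraints (an empty set).
    cognitive = {} if action == "pass_through" else {
        "action_type": action,
        "allowed_output": allowed,
        "prohibited_output": prohibited,
        "regenerate_on": regenerate_on,
    }
    emotional = {
        "person_mode": "first_person",
        "term_density_limit": density_limit,
        "reassurance_corpus": "general",
        "pause_threshold_s": pause_threshold_s,
    }
    return {
        "cognitive_constraint": cognitive,
        "emotional_constraint": emotional,
        "template_version": version,
    }


if __name__ == "__main__":
    template = build_template(
        "output_step_guide",
        allowed=["numbered steps"],
        prohibited=["final numerical result"],
        regenerate_on=["therefore the answer is"],
    )
    print(json.dumps(template, indent=2))
```

Keeping the two constraint subsets under separate keys mirrors the decoupled design: either subset can be revised or rolled back by version without touching the other.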

[0102] S105, monitor the streaming output of the main model used to generate the original text, and when the final intervention action is a non-silent intervention, intercept and cache the streaming output to block real-time delivery.

[0103] Furthermore, the main model streams its output via the Server-Sent Events (SSE) protocol;

[0104] In S105, intercepting and caching the streaming output includes: taking over the streaming output through an asynchronous coroutine, blocking the streaming data push operation to the downstream user terminal, and continuously concatenating the text entities of each data packet into a hidden buffer.

[0105] Specifically, in this embodiment, the system applies an adaptive intervention strategy to the output of the main model. The system deploys a streaming-aware interceptor on the backend to continuously monitor the streaming output of the main model used to generate the original text. Upon receiving the data stream initiation signal, the interceptor reads the final intervention action determined by the upstream hybrid decision module and executes routing control accordingly. If the final intervention action is to directly send the original output, the system determines that the original output of the current underlying main model is safe, and the streaming output is directly transmitted through the server to the front-end interactive interface, maintaining a native low-latency streaming response experience. If the final intervention action is to output step-level guidance information, output question guidance information, or output concept-level guidance information, it is determined to be a non-silent intervention, and the system immediately activates the interception buffer mode to intercept and cache the streaming output, blocking real-time transmission.

[0106] The main model outputs data in a streaming manner via the Server-Sent Events (SSE) protocol. The main model breaks down the generated raw text into a fine-grained sequence of character data packets. Each data packet is encapsulated according to the SSE format, including an event identifier field, a data payload field, and a newline delimiter. The system uses a stream-aware interceptor coroutine developed in the backend on the Python asynchronous framework asyncio. This interceptor acts as an asynchronous generator, taking over the server-side event stream from the main model and iterating through the sequence of consecutive character data packets using an `async for` loop.

[0107] The interception and buffering of the streaming output includes taking over the streaming output through an asynchronous coroutine. Specifically, the interceptor performs conditional dispatch by reading the current_action variable in memory and flips the system state according to the type of the final intervention action. When it is determined to be a non-silent intervention, the interceptor state immediately flips to the intercept buffer mode, taking over the underlying data stream at the physical level, and all streaming character data packets generated by the main model are no longer passed downstream.

[0108] In intercept buffer mode, the system uses conditional dispatch logic to block the `yield` operation that would send data to the downstream user end. Specifically, the interceptor iterates through the stream data using an `async for` loop. When the action state belongs to the non-silent intervention set, it uses a `continue` statement to skip the `yield` operation towards the downstream frontend, preventing the real-time leakage of sensitive answers. At this time, the original text generated by the main model does not reach the user end; the frontend interface remains in a waiting state, and no new characters are rendered. In Python, `yield` is a keyword: a function that contains it is a generator function, and an `async def` function that contains it is an asynchronous generator.
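The routing behaviour described above can be sketched with an asyncio asynchronous generator. This is a minimal sketch, not the middleware of this embodiment: `fake_main_model` is a hypothetical stand-in for the main model's stream, and the hidden buffer is modelled as a plain list supplied by the caller.

```python
import asyncio

# Actions that count as non-silent interventions in this document.
NON_SILENT = {"output_step_guide", "output_rhetorical_question", "output_concept_guide"}


async def intercept(stream, current_action, hidden_buffer):
    """Forward packets downstream, or cache them when intervention is non-silent."""
    async for packet in stream:
        if current_action in NON_SILENT:
            hidden_buffer.append(packet)  # cache instead of forwarding
            continue                      # skips the yield below
        yield packet


async def fake_main_model():
    """Hypothetical stand-in for the main model's token stream."""
    for token in ["The ", "answer ", "is ", "42."]:
        yield token


async def collect(action):
    """Return (packets seen by the frontend, text held in the hidden buffer)."""
    buffer = []
    seen = [p async for p in intercept(fake_main_model(), action, buffer)]
    return seen, "".join(buffer)


if __name__ == "__main__":
    print(asyncio.run(collect("pass_through")))
    print(asyncio.run(collect("output_step_guide")))
```

In the pass-through case the frontend receives every token and the buffer stays empty; in the step-guide case the frontend receives nothing and the buffer holds the full original text.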

[0109] To block real-time synchronization, the system concatenates the text entities of each data packet into a hidden buffer. The system allocates a hidden buffer in server memory, continuously intercepting and accumulating all streaming character data packets generated by the main model. The real-time state update of the hidden buffer follows this formula:

[0110] $$B_{\text{hidden}}^{(t)} = B_{\text{hidden}}^{(t-1)} \oplus c_t$$

[0111] In the above formula: $t$ is the sequence number and time step of the currently received streaming data packet; $c_t$ is the $t$-th character data packet (token) generated by the main model; $B_{\text{hidden}}^{(t)}$ is the current text string state in the hidden buffer after the $t$-th character has been received; $B_{\text{hidden}}^{(t-1)}$ is the historical text string state accumulated in the hidden buffer before the $t$-th character is received; and $\oplus$ is the string concatenation operator. Through this asynchronous accumulation operation, the middleware continuously takes over the underlying data until it detects the end-of-stream marker of the main model's output.

[0112] When the middleware captures the [DONE] stream end marker in the Server-Sent Events stream, it determines that the original solution text has been fully collected in the hidden buffer and suspends the current main data reception process. At this time, the variable $B_{\text{hidden}}$ holds the complete original solution text generated by the main model, providing a complete reference context for the asynchronous rewriting performed by the downstream auxiliary model. After caching, the system performs a state machine reset operation, clears the hidden buffer variables, resets the route interception state, and enters the next round of interactive monitoring cycle.
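The accumulation up to the end marker can be sketched as a simple loop over SSE lines. This is a synchronous simplification for clarity, with a hypothetical function name; it assumes each token arrives on its own `data:` line and that `[DONE]` is sent as a data payload, as the text describes.

```python
def accumulate(sse_lines):
    """Concatenate data payloads into the hidden buffer until [DONE] arrives."""
    b_hidden = ""
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # ignore event identifiers and blank separator lines
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]  # SSE drops one leading space after the colon
        if payload == "[DONE]":
            break  # end of the main model's stream
        b_hidden = b_hidden + payload  # string concatenation step
    return b_hidden
```

For instance, the lines `"data: Hel"`, `"data: lo"`, `"data: [DONE]"` leave `"Hello"` in the buffer, and anything after the marker is ignored.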

[0113] S106, after the streaming output is completed, the auxiliary model is called to asynchronously rewrite the cached text according to the compound prompt word template, generate security intervention text, and send it to the user terminal according to the streaming protocol.

[0114] In S106, generating security intervention text and sending it to the user terminal according to a streaming protocol includes: dividing the security intervention text into fine-grained data blocks, encapsulating them according to the server sending event protocol format, and sending them to the user terminal block by block through an asynchronous iterator.

[0115] In S106, after generating the security intervention text and sending it to the user terminal according to the streaming protocol, the method further includes: collecting incremental data of interactive features within the next sliding time window, calculating a reward signal based on the feature changes before and after the intervention, and using the reward signal to update the weight parameter vector and covariance matrix in the context multi-armed slot machine algorithm.

[0116] In this embodiment, when the middleware captures the [DONE] streaming end marker in the server's event protocol, it determines that the original answer text has been collected in the hidden buffer and immediately suspends the current main data reception process. The system immediately calls the internally deployed lightweight auxiliary model (Fast LLM) to initiate the asynchronous rewrite protocol. The system uses the original text in the hidden buffer as a reference context and injects a combination of system prompts generated by the upstream rule engine module, which includes cognitive and emotional constraint rules, to generate security intervention text. The auxiliary model quickly performs text reshaping in the background, ensuring that the rewritten content strictly follows the cognitive constraints specified in the composite prompt template, such as step-by-step operation guidance, conditional hypothetical questions, or principle explanations, while also adding emotional constraints such as first-person expression, terminology density control, and emotional reassurance corpus.

[0117] After the security intervention text is generated, the system enters the streaming link reconstruction phase. In order not to disrupt the user's expectation of low-latency, word-by-word output from generative AI, the system does not send the entire text to the front end at once. Instead, the middleware pushes the generated security intervention text back into the network data stream for front-end rendering, perfectly disguising and reconstructing the streaming output experience of the main model.

[0118] The middleware segments the security intervention text into a fine-grained sequence of data blocks, strictly formatting and encapsulating them according to the server's event sending protocol specifications. Each data block contains an event identifier field, a data payload field, and a newline delimiter. Subsequently, the system, through an asynchronous iterator on the backend, sends the data blocks word by word or phrase by phrase to the front-end interactive interface at a preset sending frequency. After the push is complete, the system performs a state machine reset operation, clears hidden buffer variables, resets the routing interception state, and enters the next round of interactive monitoring cycle.
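The re-streaming step can be sketched as below. This is an illustrative sketch under assumptions: splitting on spaces, the `message` event name, and JSON-encoding each payload (so that spaces and newlines survive SSE framing) are design choices made here, not details given in the text, and the function names are hypothetical.

```python
import asyncio
import json


def to_sse_chunks(text, event="message"):
    """Split text into word-level SSE frames with JSON-encoded payloads."""
    words = text.split(" ")
    chunks = []
    for i, word in enumerate(words):
        # Re-attach the separating space to every word except the last.
        piece = word if i == len(words) - 1 else word + " "
        chunks.append(f"event: {event}\ndata: {json.dumps(piece)}\n\n")
    return chunks


async def stream_sse(text, interval_s=0.0):
    """Asynchronous iterator that emits the frames at a preset pace."""
    for chunk in to_sse_chunks(text):
        yield chunk
        await asyncio.sleep(interval_s)
```

A frontend that decodes each `data:` payload with a JSON parser and concatenates the results recovers the safe intervention text exactly, while `interval_s` controls the perceived typing speed.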

[0119] After generating the security intervention text and sending it to the user's end via a streaming protocol, the system enters the closed-loop evaluation phase of the adaptive strategy. In the server-side backend architecture, a separate online learning evaluation microservice is deployed. Within the next sliding time window, the system continuously collects incremental interaction feature data, including temporal rhythm feature increments, semantic evolution feature increments, and text reconstruction feature increments, used to quantify changes in user behavior after receiving intervention.

[0120] The system pre-configures an implicit feedback mechanism based on learning outcomes. It constructs a multi-objective reward evaluation function based on feature evolution gradients to quantify the intervention effect by comparing feature changes before and after the intervention. The reward scalar $R_t$ is calculated as follows:

[0121] $$R_t = \sum_{i=1}^{K} w_i \left[ \phi_i(x_{t+1}) - \phi_i(x_t) \right]$$

[0122] where $K$ is the total number of core feature dimensions participating in the reward evaluation (such as the text self-rewriting rate, problem logic depth, etc.); $w_i$ is the preset weight coefficient of the $i$-th dimensional feature in the reward evaluation, subject to a preset condition on the weights; $x_t$ and $x_{t+1}$ are the high-dimensional contextual feature vectors extracted before the intervention (time $t$) and after the intervention (time $t+1$), respectively; $\phi_i(\cdot)$ is the feature extraction function, used to extract the specific quantitative score of the $i$-th dimension from the feature vector; and $\phi_i(x_{t+1}) - \phi_i(x_t)$ represents the positive or negative gradient of the user's cognitive indicators after receiving the intervention.
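The reward described above is a weighted sum of per-feature changes between the pre-intervention and post-intervention windows, which the following sketch computes. Representing the feature vectors as dictionaries keyed by feature name, and the feature names and weights in the usage note, are assumptions made for the example.

```python
def reward(before, after, weights):
    """Weighted sum of per-feature changes across the intervention."""
    # Each dictionary lookup plays the role of the feature extraction function.
    return sum(w * (after[name] - before[name]) for name, w in weights.items())
```

With weights 0.7 and 0.3, a self-rewriting rate that rises from 0.2 to 0.6 and a logic depth that falls from 0.5 to 0.4 give 0.7 × 0.4 + 0.3 × (−0.1) = 0.25, a positive reward overall.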

[0123] After the true reward signal is calculated, the system uses it as feedback data and asynchronously pushes it to the multi-armed slot machine model in step S2, updating in real time the weight parameter vector and covariance matrix corresponding to the intervention action. Through this closed-loop mechanism, the system can continuously correct itself over a large volume of human-computer interactions, matching the intervention intensity to users with different prior knowledge at the appropriate time with increasing accuracy.
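The text does not give the update equations, so the sketch below assumes a ridge-regression update in the style of LinUCB, one common way to maintain a weight vector and covariance matrix per action. The class and function names are hypothetical, the inverse covariance is maintained with the Sherman-Morrison identity, and action selection is the greedy maximum of the expected reward as stated in claim 3.

```python
class LinearArm:
    """Per-action linear model: weight vector plus inverse covariance matrix."""

    def __init__(self, dim):
        # The covariance matrix starts as the identity, so its inverse does too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim      # reward-weighted sum of feature vectors
        self.theta = [0.0] * dim  # weight parameter vector

    def expected_reward(self, x):
        return sum(t * xi for t, xi in zip(self.theta, x))

    def update(self, x, r):
        n = len(x)
        # Sherman-Morrison: inverse of (A + x x^T) from the inverse of A.
        ax = [sum(self.a_inv[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(xi * axi for xi, axi in zip(x, ax))
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        self.b = [bi + r * xi for bi, xi in zip(self.b, x)]
        self.theta = [sum(self.a_inv[i][j] * self.b[j] for j in range(n))
                      for i in range(n)]


def select(arms, x):
    """Pick the action whose weight vector gives the largest expected reward."""
    return max(arms, key=lambda name: arms[name].expected_reward(x))
```

After one update with feature vector (1, 0) and reward 1, the arm's weight vector becomes (0.5, 0), so that arm is preferred over an untrained arm for the same context.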

[0124] Secondly, this embodiment also proposes a human-machine collaborative adaptive control system based on real-time decoding of cognitive engagement, as shown in Figure 2, comprising:

[0125] The acquisition module 201 is used to acquire human-computer interaction data, decode the human-computer interaction data in real time, and obtain a probability sequence of the user's cognitive engagement state.

[0126] The smoothing module 202 is used to perform confidence smoothing on the probability sequence based on POMDP, determine the current cognitive engagement state and the duration of the state, and output a control signal when the duration reaches a preset threshold.

[0127] Decision module 203 is used to respond to the control signal, extract the interaction feature vector within the current sliding time window, select the predicted intervention action from the preset intervention action set using the context multi-armed slot machine algorithm, cross-validate the predicted intervention action with the current cognitive engagement state and the interaction feature vector, and determine the final intervention action.

[0128] The generation module 204 is used to generate a composite prompt word template based on the final intervention action. The composite prompt word template includes cognitive dimension constraint rules and emotional dimension constraint rules.

[0129] Interception module 205 is used to monitor the streaming output of the main model used to generate the original text. When the final intervention action is a non-silent intervention, it intercepts and caches the streaming output to block real-time delivery.

[0130] The delivery module 206 is used to call the auxiliary model to asynchronously rewrite the cached text according to the composite prompt word template after the streaming output is completed, generate security intervention text, and deliver it to the user terminal according to the streaming protocol.

[0131] This system can be used to execute the human-machine collaborative adaptive control method described in the first aspect, which will not be elaborated further here.

[0132] Thirdly, this embodiment also proposes an electronic device comprising: one or more processors, and a memory for storing one or more computer programs; characterized in that the computer programs are configured to be executed by the one or more processors, and the programs include steps for performing the human-machine collaborative adaptive control method as described in the first aspect.

[0133] The above description is merely an embodiment of this application and is not intended to limit the scope of protection of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.

Claims

1. A human-machine collaborative adaptive control method, characterized in that it comprises: acquiring human-computer interaction data, and decoding the human-computer interaction data in real time to obtain a probability sequence of the user's cognitive engagement state; performing confidence smoothing on the probability sequence based on POMDP to determine the current cognitive engagement state and the duration of the state, and outputting a control signal when the duration reaches a preset threshold; in response to the control signal, extracting the interaction feature vector within the current sliding time window, selecting a predicted intervention action from a preset intervention action set using the contextual multi-armed slot machine algorithm, and cross-validating the predicted intervention action with the current cognitive engagement state and the interaction feature vector to determine the final intervention action; generating a composite prompt word template based on the final intervention action, the composite prompt word template including cognitive dimension constraint rules and emotional dimension constraint rules; monitoring the streaming output of the main model used to generate the original text, and, when the final intervention action is a non-silent intervention, intercepting and caching the streaming output to block real-time delivery; and, after the streaming output is completed, invoking the auxiliary model to asynchronously rewrite the cached text according to the composite prompt word template, generating security intervention text, and sending it to the user terminal according to the streaming protocol.

2. The human-machine collaborative adaptive control method according to claim 1, characterized in that the confidence smoothing of the probability sequence based on POMDP includes: based on the confidence distribution of the previous time step, the observation probability of the current time step, and the state transition probability, recursively updating the confidence distribution of the current time step through Bayesian inference; extracting the state with the highest confidence as the current cognitive engagement state, and tracking the duration of the current cognitive engagement state, the current cognitive engagement state including a high dependency probability state and a high construction probability state; when the current cognitive engagement state is the high dependency probability state and the duration reaches a first preset threshold, outputting the control signal; and when the current cognitive engagement state is the high construction probability state and the duration reaches a second preset threshold, outputting a silent protection signal.

3. The human-machine collaborative adaptive control method according to claim 1, characterized in that, The preset intervention action set includes output step-level guidance information, output question guidance information, output concept-level guidance information, and execution of raw output direct delivery; the interaction feature vector is composed of temporal rhythm features, semantic evolution features, and text reconstruction features; The selection of predicted intervention actions from a preset set of intervention actions using a context-based multi-armed slot machine algorithm includes: The interaction feature vector is multiplied by the weight parameter vector corresponding to each action in the preset intervention action set to obtain the expected reward value of each action. The action with the largest expected reward value is selected as the predicted intervention action.

4. The human-machine collaborative adaptive control method according to claim 3, characterized in that the step of cross-validating the predicted intervention action against the current cognitive engagement state and the interaction feature vector includes:
when the predicted intervention action is outputting rhetorical question guidance information, if the current cognitive engagement state is not the high-dependency probability state, or the interaction feature vector does not indicate both that the text copy ratio exceeds a first preset ratio and that the semantic evolution is stagnant, downgrading the final intervention action to outputting step-level guidance information; and
when the interaction feature vector indicates that the autonomous rewriting ratio exceeds a second preset ratio and the semantic span change rate is higher than a preset change rate, and the duration reaches the second preset threshold, forcibly overriding the final intervention action to directly delivering the original output.
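
The two cross-validation rules of claim 4 can be written as a small guard function. This is a sketch only: the `Features` fields, the string identifiers, and all default threshold values are hypothetical stand-ins for the preset ratios and thresholds that the claim leaves unspecified.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Hypothetical summary of the interaction feature vector."""
    copy_ratio: float                  # share of text copied from model output
    semantic_stagnant: bool            # True when semantic evolution has stalled
    rewrite_ratio: float               # share of text rewritten autonomously
    semantic_span_change_rate: float


def cross_validate(predicted: str, state: str, feats: Features, duration: int, *,
                   first_ratio: float = 0.6, second_ratio: float = 0.5,
                   change_rate: float = 0.3, second_threshold: int = 3) -> str:
    """Apply the downgrade rule, then the forced override rule."""
    final = predicted
    if predicted == "rhetorical_question_guidance":
        dependency_evidence = feats.copy_ratio > first_ratio and feats.semantic_stagnant
        if state != "high_dependency" or not dependency_evidence:
            final = "step_level_guidance"
    if (feats.rewrite_ratio > second_ratio
            and feats.semantic_span_change_rate > change_rate
            and duration >= second_threshold):
        # Strong evidence of autonomous construction: deliver the original output.
        final = "direct_delivery"
    return final
```

The override is checked last so that it takes precedence over any predicted or downgraded action, which is one reading of "forcibly" in the claim.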

5. The human-machine collaborative adaptive control method according to claim 1, characterized in that the cognitive dimension constraint rules include: when the final intervention action is outputting step-level guidance information, restricting the output text to step-by-step operation instructions and blocking conclusive content; when the final intervention action is outputting rhetorical question guidance information, restricting the output text to conditional hypothetical questions and blocking declarative sentences; and when the final intervention action is outputting concept-level guidance information, restricting the output text to explanations of principles and blocking operational step-level content.

6. The human-machine collaborative adaptive control method according to claim 5, characterized in that the emotional dimension constraint rules include: a forced grammatical-person shift rule, requiring the use of first-person narration; an upper-limit control rule for academic terminology density, requiring the terminology density not to exceed a preset ratio; and a pre-injection rule for emotional reassurance text, which injects a preset reassurance text template into the first sentence of the reply when the duration of a continuous interaction pause exceeds a preset pause threshold.
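
The composite prompt word template of claims 5 and 6 can be assembled from one cognitive rule pair plus the three emotional rules. The sketch below is illustrative only: the rule wording, the function name `build_template`, the default density and pause values, and the reassurance sentence are all assumptions, since the claims specify the constraints but not their textual form.

```python
from typing import Dict, Optional, Tuple

# Hypothetical wording: (what the rewrite must contain, what it must block).
COGNITIVE_RULES: Dict[str, Tuple[str, str]] = {
    "step_level_guidance": (
        "Give step-by-step operation instructions only.",
        "Do not state the final conclusion or answer.",
    ),
    "rhetorical_question_guidance": (
        "Respond only with conditional hypothetical questions.",
        "Do not use declarative sentences.",
    ),
    "concept_level_guidance": (
        "Explain the underlying principle only.",
        "Do not give operational steps.",
    ),
}


def build_template(action: str, pause_seconds: float, *,
                   max_term_density: float = 0.10,
                   pause_threshold: float = 30.0,
                   reassurance: str = "It is fine to take your time here.") -> Optional[str]:
    """Return a rewrite prompt, or None for actions that need no rewrite."""
    if action not in COGNITIVE_RULES:
        return None  # e.g. direct delivery of the original output
    allow, block = COGNITIVE_RULES[action]
    lines = [
        "Rewrite the cached answer under these constraints:",
        f"- {allow}",
        f"- {block}",
        "- Use first-person narration.",
        f"- Keep academic terms at or below {max_term_density:.0%} of the words.",
    ]
    if pause_seconds > pause_threshold:
        # Pre-inject the reassurance sentence when the user has paused for long.
        lines.append(f'- Begin the reply with this sentence: "{reassurance}"')
    return "\n".join(lines)
```

Keeping the cognitive rules in a lookup table keyed by action makes the mapping from final intervention action to constraint pair explicit and easy to audit.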

7. The human-machine collaborative adaptive control method according to claim 1, characterized in that the streaming output of the main model is transmitted via the Server-Sent Events (SSE) protocol;
the interception and caching of the streaming output includes: taking over the streaming output through an asynchronous coroutine, blocking the push of streaming data to the downstream user terminal, and continuously concatenating the text content of each data packet into a hidden buffer; and
the generating of the security intervention text and sending it to the user terminal according to a streaming protocol includes: dividing the security intervention text into fine-grained data blocks, encapsulating them in the Server-Sent Events protocol format, and sending them to the user terminal block by block through an asynchronous iterator.
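
The intercept, rewrite, and re-stream flow of claim 7 can be sketched with `asyncio` alone. The main model and the auxiliary model are replaced here by hypothetical stubs (`fake_main_model`, `stub_rewrite`), the chunk size is arbitrary, and no real HTTP transport is involved; only the `data:` framing follows the Server-Sent Events format.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List


async def fake_main_model() -> AsyncIterator[str]:
    """Stand-in for the main model's token stream (hypothetical)."""
    for piece in ["The answer ", "is 42."]:
        await asyncio.sleep(0)
        yield piece


async def intercept(stream: AsyncIterator[str]) -> str:
    """Drain the stream into a hidden buffer; nothing is forwarded downstream."""
    buffer: List[str] = []
    async for piece in stream:
        buffer.append(piece)
    return "".join(buffer)


def sse_event(chunk: str) -> str:
    """Frame one chunk as an SSE event: one 'data:' line per text line, then a blank line."""
    return "".join(f"data: {line}\n" for line in chunk.split("\n")) + "\n"


async def deliver(text: str, chunk_size: int = 8) -> AsyncIterator[str]:
    """Yield the rewritten text block by block through an asynchronous iterator."""
    for start in range(0, len(text), chunk_size):
        await asyncio.sleep(0)
        yield sse_event(text[start:start + chunk_size])


async def stub_rewrite(cached: str, template: str) -> str:
    """Stand-in for the auxiliary model call (hypothetical)."""
    return "What would you try first?"


async def handle(stream: AsyncIterator[str], template: str,
                 rewrite: Callable[[str, str], Awaitable[str]]) -> List[str]:
    cached = await intercept(stream)          # original text never reaches the user
    safe_text = await rewrite(cached, template)
    return [event async for event in deliver(safe_text)]


events = asyncio.run(handle(fake_main_model(), "template text", stub_rewrite))
```

In a deployment the events would be written to the HTTP response as they are yielded; collecting them in a list here only keeps the sketch self-contained.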

8. The human-machine collaborative adaptive control method according to claim 1, characterized in that, after generating the security intervention text and sending it to the user terminal according to a streaming protocol, the method further includes: collecting incremental interaction feature data within the next sliding time window, calculating a reward signal based on the feature changes before and after the intervention, and using the reward signal to update the weight parameter vector and the covariance matrix in the contextual multi-armed bandit algorithm.
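
The claim does not specify the update rule or the reward formula. One common choice for linear contextual bandits is the ridge-regression (LinUCB-style) update, sketched here as an assumption: the matrix A accumulates x xᵀ, the vector b accumulates reward-weighted features, and the weights are A⁻¹b, with A⁻¹ maintained through the Sherman-Morrison identity. The class `ArmModel` and the `reward_from_delta` formula are hypothetical.

```python
from typing import List, Sequence


class ArmModel:
    """Per-action linear model holding A^-1, b, and weights = A^-1 b."""

    def __init__(self, dim: int, ridge: float = 1.0) -> None:
        self.dim = dim
        # A starts as ridge * I, so its inverse is (1 / ridge) * I.
        self.a_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim
        self.weights = [0.0] * dim

    def update(self, x: Sequence[float], reward: float) -> None:
        n = self.dim
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x),
        # valid here because A^-1 stays symmetric.
        ax = [sum(self.a_inv[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(x[i] * ax[i] for i in range(n))
        self.a_inv = [[self.a_inv[i][j] - ax[i] * ax[j] / denom for j in range(n)]
                      for i in range(n)]
        self.b = [self.b[i] + reward * x[i] for i in range(n)]
        self.weights = [sum(self.a_inv[i][j] * self.b[j] for j in range(n))
                        for i in range(n)]


def reward_from_delta(copy_before: float, copy_after: float,
                      rewrite_before: float, rewrite_after: float) -> float:
    """Hypothetical reward: less copying and more autonomous rewriting score higher."""
    return (copy_before - copy_after) + (rewrite_after - rewrite_before)
```

Only the model of the action that was actually delivered is updated after each intervention; the other actions keep their current weights.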

9. A human-machine collaborative adaptive control system based on real-time decoding of cognitive engagement, characterized in that it comprises:
a data acquisition module, used to acquire human-computer interaction data and decode the human-computer interaction data in real time to obtain a probability sequence of the user's cognitive engagement state;
a smoothing module, used to perform confidence smoothing on the probability sequence based on POMDP, determine the current cognitive engagement state and the duration of that state, and output a control signal when the duration reaches a preset threshold;
a decision module, used to respond to the control signal, extract the interaction feature vector within the current sliding time window, select a predicted intervention action from a preset intervention action set using a contextual multi-armed bandit algorithm, and cross-validate the predicted intervention action against the current cognitive engagement state and the interaction feature vector to determine the final intervention action;
a generation module, used to generate a composite prompt word template based on the final intervention action, the composite prompt word template including cognitive dimension constraint rules and emotional dimension constraint rules;
an interception module, used to monitor the streaming output of the main model used to generate the original text and, when the final intervention action is a non-silent intervention, intercept and cache the streaming output to block real-time delivery; and
a delivery module, used to invoke an auxiliary model, after the streaming output is completed, to asynchronously rewrite the cached text according to the composite prompt word template, generate security intervention text, and deliver it to the user terminal according to a streaming protocol.

10. An electronic device, comprising: one or more processors; and a memory for storing one or more computer programs; characterized in that the one or more computer programs are configured to be executed by the one or more processors and include instructions for performing the human-machine collaborative adaptive control method according to any one of claims 1 to 8.