Method and device for controlling dialogue reasoning exception of large language model, computer device and medium

By applying multi-level collaborative processing at the neural network level, the functional module level, and the output level of a large language model, anomalies are detected and intervened on in real time. This addresses the formation of anomaly chains during the reasoning process of a large language model, improves the stability and accuracy of the generated content, and provides interpretability and adaptive security protection.

CN122242785A · Pending · Publication Date: 2026-06-19 · BEIJING REALAI TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: BEIJING REALAI TECH CO LTD
Filing Date: 2026-05-25
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing technologies cannot fundamentally prevent the formation of abnormal chains in the reasoning process of large language models, resulting in logical errors and security risks in the generated content. Existing methods also suffer from problems such as intervention lag and insufficient detection generalization ability.

Method used

Real-time detection and intervention of neuron activation anomalies at the neural network level of the large language model; detection of functional anomalies in candidate inference results at the functional module level; and detection of semantic conflicts at the output level. Safety intervention and correction are achieved through multi-level collaborative processing at the micro, meso, and macro levels.

Benefits of technology

It enables real-time perception and control of the internal reasoning process of large language models, avoids the spread of local errors, improves the stability and reliability of generated content, enhances the quality and accuracy of generated content, and possesses interpretability and adaptive security protection capabilities.

Generated by Eureka AI based on patent content.


Abstract

This invention provides a method, apparatus, computer device, and medium for controlling anomalies in dialogue reasoning within a large language model, relating to the field of artificial intelligence technology. The method includes: during the generation of dialogue reasoning using a large language model, real-time detection of neuron activation anomalies at the neural network level of the large language model, and first security intervention on the output of abnormal neurons; real-time detection of functional anomalies in candidate reasoning results output by functional modules at the functional module level of the large language model, and correction of the functionally abnormal candidate reasoning results; and real-time detection of semantic conflicts in received candidate reasoning results at the output level, and second security intervention on candidate reasoning results with semantic conflicts. This solution can fundamentally block the formation of abnormal reasoning chains, significantly improving the stability and reliability of the internal representation of the large language model, and ensuring high reliability of the large language model output at the functional, logical, and factual levels.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and in particular to a method, apparatus, computer device, and medium for controlling anomalies in dialogue reasoning of large language models.

Background Technology

[0002] With the continuous improvement of computing power and the accumulation of large-scale corpora, generative artificial intelligence technology based on deep learning has developed rapidly, especially general artificial intelligence models represented by large language models, which have made breakthroughs in many areas such as natural language understanding, text generation, dialogue interaction, and complex problem solving. Current large language models are usually built based on large-scale pre-training and fine-tuning mechanisms. By learning the statistical correlations between words in massive amounts of text data, they can model language structure, semantic patterns, and some implicit knowledge, thereby possessing strong language generation and reasoning capabilities.

[0003] In practical applications, large language models have been widely deployed in open-domain dialogue systems, intelligent customer service, content creation platforms, knowledge-based question-answering systems, and professional decision-making support scenarios in fields such as healthcare, finance, and law. However, with the continuous expansion of their application scope, large language models have exposed a series of significant security and reliability issues during the reasoning and generation processes. In particular, the problem of inference anomalies has gradually become one of the core technical bottlenecks restricting their large-scale application.

[0004] Reasoning anomalies are outputs generated by a model that violate factual accuracy, logical consistency, value orientation, or safety norms. These anomalies are not simple grammatical errors or inconsistencies. They often appear as content that is structurally complete and semantically coherent on the surface, yet deviates seriously in its underlying logic, factual basis, or contextual consistency. The model may generate factual statements that seem reasonable but do not exist, offer contradictory definitions and conclusions about the same concept within one round or across multiple rounds of dialogue, or even construct causal chains in complex reasoning that are self-consistent but completely flawed. These phenomena are often referred to as hallucination problems, but in essence they result from abnormal deviations in the model's internal reasoning path.

[0005] From a technical perspective, large language models do not possess human-like comprehension capabilities; their generation process is essentially based on sequence prediction using probability distributions. In complex reasoning tasks, the model needs to perform multi-layered nonlinear mappings within a high-dimensional representation space, with different neurons and attention heads activated at different stages to perform different semantic or logical functions. During this process, the model is highly susceptible to the influence of long-tailed distributions in the training data, incomplete knowledge, implicit biases, and adversarial inducements in user input, leading to abnormal activation of some neurons or functional pathways. Once such anomalies occur early in the reasoning process, they can amplify along network layers, eventually evolving into obvious logical errors, factual errors, or security risks.

[0006] Most existing governance methods focus on the input or output ends. This reactive or soft-constraint security strategy often fails to identify the root cause of problems in a timely manner when facing complex inference tasks and new attack methods. It cannot effectively intervene during the formation stage of the inference chain, and therefore cannot fundamentally solve the problem of inference anomalies. As the model size continues to increase and the complexity of inference continues to rise, relying solely on external filtering or prompting guidance is no longer sufficient to meet the actual needs of high-security and high-reliability application scenarios.

[0007] For example, existing technical solution one primarily employs a black-box output filtering strategy. This approach typically involves independently training one or more binary or multi-class discriminant models, often referred to as security filters or content moderation models. After a large language model generates a complete text response, the text is input into the discriminant model for scanning. The discriminant model is trained by supervised learning on labeled safe and unsafe corpora, and analyzes the surface features and semantic information of the text to determine whether it contains logically incoherent content. If the risk score output by the discriminant model exceeds a preset threshold, the system intercepts the response and either replaces it with a standard rejection message or forwards it to manual review.

[0008] Existing technical solution one therefore suffers from intervention lag. Security interception occurs only after the large language model has completed all reasoning and text generation. By then, the model has already consumed computational resources to form a complete erroneous reasoning process. The system cannot prevent erroneous chains of thought from forming during reasoning, nor can it trace the internal source of the reasoning anomalies. The solution also detects only surface features. Its judgment is based solely on the surface features and overall semantic performance of the final generated text, making it difficult to identify deep logical inconsistencies hidden within the text or factual conflicts that depend on long-distance context. When the generated content remains linguistically coherent and contains no explicit sensitive features, contradictions between its conclusions and the preceding assumptions or implicit premises are difficult to detect. In addition, the method generalizes poorly: its detection performance depends heavily on the training data and rule coverage, and it is prone to failure against role-playing inducements, logical trap attacks, and harmful content with mutated expressions. Finally, the relatively strict filtering strategy adopted to improve security tends to misjudge normal content, reducing the usefulness of the generated results.

[0009] As another example, existing technical solution two guides safe generation through input-side prompt engineering. Specific methods include prepending system-level instructions (system prompts) to the user input, explicitly requiring the model to self-censor, adhere to ethical guidelines, or ensure the truthfulness of the answer. A more advanced approach uses chain-of-thought prompting, which requires the model to explicitly output its reasoning steps before providing the final answer. The aim is to improve logical rigor by exposing the intermediate process, and to make it easier for humans or simple rule scripts to check whether that process is reasonable.

[0010] However, existing technical solution two relies entirely on the model's ability to follow instructions, guiding the model to self-regulate through prompt engineering or system instructions. This soft guidance mechanism is prone to failure when the model itself has cognitive biases or encounters carefully designed jailbreak prompts. The solution also requires the model to explicitly output a complete chain of thought, which significantly increases the length of the generated content, the number of reasoning steps, computational resource consumption, and response latency, making it unsuitable for dialogue scenarios with strict real-time requirements. Even if the model outputs a complete chain of thought, existing technologies still lack automated means to verify each reasoning step in real time. When the model generates a chain of thought that appears structurally complete but contains factual errors or logical deviations, the system struggles to identify and correct it.

[0011] Therefore, existing security methods have the drawback of failing to prevent the formation of abnormal reasoning chains at their source.

Summary of the Invention

[0012] In view of this, embodiments of the present invention provide a method for controlling anomalies in dialogue reasoning of large language models, to solve the technical problem in the prior art that it is impossible to fundamentally block the formation of abnormal reasoning chains. The method includes:

[0013] During the process of generating dialogue reasoning using a large language model, abnormal activation of neurons is detected in real time at the neural network level of the large language model, and a first security intervention is performed on the output of abnormal neurons. At the functional module level of the large language model, functional anomalies in the candidate inference results output by the functional modules are detected in real time, and the candidate inference results with functional anomalies are corrected. At the output level, semantic conflicts in the received candidate inference results are detected in real time, and a second security intervention is performed on candidate inference results with semantic conflicts.

[0014] This invention also provides a control device for abnormal dialogue reasoning in large language models, to solve the technical problem in the prior art that it is impossible to fundamentally block the formation of abnormal reasoning chains. The device includes: a micro-gating unit, used to detect abnormal activation of neurons in real time at the neural network level of the large language model during the process of generating dialogue reasoning, and to perform a first security intervention on the output of abnormal neurons; a meso-level gating unit, used to detect functional anomalies in the candidate inference results output by the functional modules in real time at the functional module level of the large language model, and to correct the candidate inference results with functional anomalies; and a macro-gating unit, used to detect semantic conflicts in the received candidate inference results in real time at the output level, and to perform a second security intervention on the candidate inference results with semantic conflicts.

[0015] This invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements any of the above-mentioned methods for controlling anomalies in dialogue reasoning of large language models, thereby solving the technical problem in the prior art that the formation of abnormal reasoning chains cannot be blocked at the source.

[0016] This invention also provides a computer-readable storage medium storing a computer program that executes the above-described method for controlling anomalies in large language model dialogue reasoning, thereby solving the technical problem in the prior art that the formation of abnormal reasoning chains cannot be blocked from the root.

[0017] Compared with existing technologies, the beneficial effects achieved by at least one of the above technical solutions adopted in the embodiments of this specification include at least the following. A multi-level, real-time, dynamic, layer-by-layer anomaly detection method is proposed for the reasoning generation process inside a large language model, covering the micro level (i.e., the neuron level of the neural network layers), the meso level (i.e., the functional level of the functional modules), and the macro level (i.e., the semantic level of the output layer). Corresponding security interventions or corrections are performed for anomalies at each level. This provides multi-dimensional, fine-grained anomaly detection, prevents anomalies at an individual level from propagating and amplifying between network layers, and prevents local errors from gradually evolving into global reasoning biases. At the same time, multi-level collaborative processing identifies and corrects potential conflicts at different levels of abstraction, providing effective real-time perception and control of the reasoning process inside the large language model. This can block the formation of abnormal reasoning chains at the source, thereby significantly improving the stability and reliability of the internal representations of the large language model, ensuring high reliability of its output at the functional, logical, and factual levels, and ultimately improving the quality and accuracy of the generated content.

Description of the Drawings

[0018] To more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a flowchart of a method for controlling anomalies in dialogue reasoning of a large language model provided in an embodiment of the present invention;
Figure 2 is a schematic diagram of the overall architecture of a method for controlling anomalies in dialogue reasoning of a large language model provided in an embodiment of the present invention;
Figure 3 is a schematic diagram of the internal interaction of the large language model in the method for controlling anomalies in dialogue reasoning of a large language model provided in an embodiment of the present invention;
Figure 4 is a schematic diagram illustrating the processing principle of meso-level reasoning anomalies provided in an embodiment of the present invention;
Figure 5 is a schematic diagram illustrating the construction and updating of a dynamic concept graph provided in an embodiment of the present invention;
Figure 6 is a structural block diagram of a computer device provided in an embodiment of the present invention;
Figure 7 is a structural block diagram of a control device for abnormal dialogue reasoning in a large language model provided in an embodiment of the present invention.

Detailed Implementation

[0020] The embodiments of this application will now be described in detail with reference to the accompanying drawings.

[0021] The following specific examples illustrate the implementation of this application. Those skilled in the art can easily understand other advantages and effects of this application from the content disclosed in this specification. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. This application can also be implemented or applied through other different specific embodiments, and the details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of this application. It should be noted that, in the absence of conflict, the following embodiments and features in the embodiments can be combined with each other. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0022] In this embodiment of the invention, a method for controlling anomalies in dialogue reasoning of a large language model is provided. As shown in Figure 1, the method includes:
Step S101: During the process of generating dialogue reasoning in the large language model, detect abnormal activation of neurons in real time at the neural network level of the large language model, and perform a first security intervention on the output of abnormal neurons.
Step S102: At the functional module level of the large language model, detect functional anomalies in the candidate inference results output by the functional modules in real time, and correct the candidate inference results with functional anomalies.
Step S103: At the output level, detect semantic conflicts in the received candidate inference results in real time, and perform a second security intervention on the candidate inference results with semantic conflicts.

[0023] In practical implementation, this embodiment detects neuron activation anomalies in real time at the neural network layer level, realizing anomaly detection at the micro level. The aim is to intervene early, before anomalous signals propagate and amplify between network layers, and to prevent local errors from gradually evolving into global inference biases. This reduces the risk of the large language model generating hallucinations or unfounded conclusions and significantly improves the stability and reliability of its internal representations. For example, for each neural network layer to be detected, the current activation distribution or current activation value of neurons is calculated continuously in real time. The current activation distribution or current activation value is compared with a preset activation standard, and the abnormal activation status of the neuron is determined based on the comparison result.

[0024] In practice, the neural network layer to be detected can be a neural network layer of each layer within a large language model or a neural network layer of multiple key layers.

[0025] In practice, abnormal activation of neurons at the neural network level can be detected based on either the current activation distribution or the current activation value of the neurons. For example, detection based on the current activation distribution proceeds as follows. First, the expected activation distribution of key neurons or attention heads in the current neural network layer is calculated (in this case, the preset activation standard is the expected activation distribution). The expected activation distribution can be calculated from the pre-training statistics of the large language model on a massive secure corpus or from the state of the previous time step. Second, the activation difference between the current activation distribution and the expected activation distribution of neurons is calculated in real time. When the activation difference of a neuron cluster exceeds the set adaptive threshold, a micro-level anomaly (i.e., a neuron activation anomaly) is determined, such as a potential hallucination trigger point or a hypersensitive reaction.
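
The distribution-based check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not name the measure used for the "activation difference", so KL divergence is assumed here, and the function names `normalize`, `kl_divergence`, and `is_micro_anomaly` are hypothetical. The adaptive threshold is simply passed in as a parameter.

```python
import math


def normalize(magnitudes):
    """Turn non-negative activation magnitudes into a probability distribution."""
    total = sum(magnitudes)
    if total <= 0:
        raise ValueError("activation magnitudes must sum to a positive value")
    return [m / total for m in magnitudes]


def kl_divergence(current, expected, eps=1e-12):
    """KL(current || expected). Assumed difference measure; eps guards log(0)."""
    return sum(
        c * math.log((c + eps) / (e + eps))
        for c, e in zip(current, expected)
        if c > 0
    )


def is_micro_anomaly(current_acts, expected_acts, threshold):
    """Flag a neuron cluster whose activation distribution drifts past the threshold."""
    current = normalize([abs(a) for a in current_acts])
    expected = normalize([abs(a) for a in expected_acts])
    return kl_divergence(current, expected) > threshold
```

A cluster whose activations match the expected profile yields a divergence of zero, while a cluster dominated by a single neuron yields a large divergence and is flagged.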

[0026] Detection based on the current activation value proceeds as follows. The current activation value of a neuron is compared with a standard deviation threshold. If the value exceeds the threshold, the neuron is considered abnormally activated. For example, the threshold can be set at three times the standard deviation (in this case, the preset activation standard is the standard deviation threshold). Although this method has slightly lower detection accuracy, it significantly reduces the amount of computation, making it suitable for applications with low computing power or strict real-time requirements. This type of statistical method does not rely on complex probabilistic modeling or additional prediction networks, and can complete basic security monitoring tasks with low system overhead, making it suitable for edge devices or embedded inference environments.
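
A minimal sketch of the value-based check, assuming the baseline mean and standard deviation of the layer's activations are already available (for example from a calibration run, which the source does not describe). The function name `flag_abnormal` and its parameters are hypothetical.

```python
def flag_abnormal(activations, baseline_mean, baseline_std, k=3.0):
    """Return indices of neurons deviating from the baseline mean by more than k standard deviations."""
    limit = k * baseline_std
    return [
        index
        for index, value in enumerate(activations)
        if abs(value - baseline_mean) > limit
    ]


# Example: with a baseline of mean 0.0 and standard deviation 1.0,
# only the neuron at index 2 exceeds the three-sigma limit.
abnormal = flag_abnormal([0.1, -0.2, 5.0, 0.3], baseline_mean=0.0, baseline_std=1.0)
```

The check is a single comparison per neuron, which is consistent with the low overhead the paragraph above attributes to this method.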

[0027] Specifically, by continuously statistically analyzing the activation distribution of different neural network layers, and by dynamically adjusting the standard deviation threshold based on the running status of the large language model, the anomaly detection can maintain a stable performance under different input complexities.
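
One way to keep the threshold tracking the model's running state is to maintain running statistics per layer. The sketch below uses Welford's online algorithm, which is an assumption on my part; the source only says the threshold is adjusted dynamically and does not specify how. The class name `RunningStats` is hypothetical.

```python
import math


class RunningStats:
    """Online mean and variance (Welford's algorithm) for one layer's activations."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def std(self):
        """Population standard deviation of the values seen so far."""
        return math.sqrt(self.m2 / self.count) if self.count else 0.0

    def threshold(self, k=3.0):
        """Current k-sigma threshold, recomputed from the latest statistics."""
        return k * self.std
```

Calling `update` on each observed activation lets `threshold` follow the input complexity without storing the activation history.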

[0028] In practical implementation, to prevent local errors from accumulating and amplifying in subsequent layers and gradually evolving into global reasoning biases, this embodiment proposes a security intervention for abnormal neuron activation. For example, performing the first security intervention on the output of abnormal neurons includes applying a temporary sparse suppression mask to the abnormal neurons to weaken or mask the weights of their outputs, for example by multiplying their output weights by a decay coefficient, thereby reducing their impact on subsequent layers. In other words, a corresponding suppression strategy is applied to abnormal neuron activation to stop errors from spreading at the source.
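
The masking step can be illustrated with a short sketch. It assumes the abnormal neuron indices have already been identified, and the function name `apply_suppression` and the default decay value are hypothetical. A decay coefficient of 0.0 masks the output entirely, and values between 0 and 1 weaken it.

```python
def apply_suppression(outputs, abnormal_indices, decay=0.1):
    """Scale the outputs of abnormal neurons by a decay coefficient; leave the rest unchanged."""
    mask = [1.0] * len(outputs)
    for index in abnormal_indices:
        mask[index] = decay  # sparse: only the flagged positions are changed
    return [value * weight for value, weight in zip(outputs, mask)]


# Example: weaken neuron 2 to a quarter of its output.
suppressed = apply_suppression([1.0, 2.0, 8.0], abnormal_indices=[2], decay=0.25)
```

Because the mask is rebuilt on each call, the suppression is temporary and does not alter the model's stored weights.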

[0029] In specific implementation, as shown in Figures 2 and 3, the detection of and intervention in neuron activation anomalies at the neural network level can be implemented inside the large language model using micro-gating units. The micro-gating units are embedded in the neural network layers to be detected. In each micro-gating unit, a distribution predictor calculates the expected activation distribution of key neurons or attention heads in the current neural network layer, or calculates the current activation distribution or current activation value of neurons. An error calculator compares the current activation distribution or current activation value with the preset activation standard and, based on the comparison result, determines whether any neurons are abnormally activated. A mask generator generates a sparse suppression mask, and the temporary sparse suppression mask is applied to the abnormal neurons, weakening or masking the weights of their outputs.

[0030] In practice, the aforementioned micro-gating units are embedded in the existing Transformer architecture in a differentiable and low-overhead manner for training and inference.

[0031] In practical implementation, forward propagation proceeds layer by layer during the reasoning generation process of the large language model. When the information flow reaches the functional module level in the high-level semantic part of the large language model, this embodiment proposes anomaly detection and intervention at the functional module level, realizing anomaly detection and intervention at the meso level, in order to comprehensively improve the quality and accuracy of the generated content. For example, detecting functional anomalies in the candidate inference results output by the functional modules in real time and correcting the abnormal candidate inference results includes the following. For each functional module to be detected, multiple functional analyses are performed in parallel in real time based on the output and input data of the functional module. These functional analyses include fact retrieval analysis, logical reasoning analysis, and sentiment analysis. Each functional analysis is scored based on its results, the context, preset security rules, and its confidence level. The functional analysis with the lowest score is determined to have a functional anomaly, the content related to that anomaly in the candidate inference results output by the functional module is corrected, and the corrected candidate inference results are output to the output level.

[0032] In specific implementation, as shown in Figures 2, 3, and 4, the detection of functional anomalies and the corresponding intervention at the functional module level can be implemented inside the large language model through meso-level gating units. When information flows from the neural network layers to the functional modules in the high-level semantic part, multiple virtual functional subnetworks are activated to perform parallel multi-functional analysis on candidate inference results from different functional paths, based on the output and input data of the functional modules. These functional subnetworks can be specific subsets of model parameters or lightweight modules obtained through adapter fine-tuning. As shown in Figure 4, they specifically include a fact retrieval subnetwork that focuses on accurate knowledge recall, a logical reasoning subnetwork that focuses on causal relationship deduction, and an emotional safety analysis subnetwork that focuses on tone and potential risk assessment.

[0033] Then, the routing decision network scores each functional analysis based on its results, context, preset security rules, and confidence level. The analysis with the lowest score is identified as having functional anomalies, and the candidate inference results from the functional modules are corrected for these anomalies before being output to the output level. This creates competition among functional modules within the same inference cycle. This mechanism prevents a single functional module from dominating due to insufficient local information or internal biases, creating effective checks and balances among the fact retrieval module, logical reasoning module, and language generation module. This reduces the probability of biased reasoning and error amplification, improving the overall performance of the final generated content in terms of factual accuracy, logical completeness, and expressive rationality.
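
The scoring step of the routing decision network can be sketched as a weighted sum over per-analysis signals. This is an illustrative assumption: the source names the inputs to scoring (results, context, security rules, confidence) but not how they are combined, so the fields of `Analysis`, the weights, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    name: str               # e.g. "fact_retrieval", "logical_reasoning", "sentiment"
    consistency: float      # agreement with the dialogue context, in [0, 1]
    rule_compliance: float  # agreement with preset security rules, in [0, 1]
    confidence: float       # the analysis's own confidence, in [0, 1]


def score(analysis, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three signals (assumed combination rule)."""
    w_context, w_rules, w_conf = weights
    return (
        w_context * analysis.consistency
        + w_rules * analysis.rule_compliance
        + w_conf * analysis.confidence
    )


def find_functional_anomaly(analyses):
    """Return the lowest-scoring analysis, which is treated as functionally anomalous."""
    return min(analyses, key=score)


candidates = [
    Analysis("fact_retrieval", 0.9, 1.0, 0.8),
    Analysis("logical_reasoning", 0.4, 1.0, 0.5),
    Analysis("sentiment", 0.8, 0.9, 0.9),
]
anomalous = find_functional_anomaly(candidates)  # the logical_reasoning analysis
```

The content tied to the returned analysis would then be corrected before the candidate inference result is passed to the output level; the correction itself is outside this sketch.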

[0034] In practical implementation, during the process of detecting functional anomalies in the meso-level detection module, instead of directly employing a multi-parallel functional sub-network approach, a single functional sub-network can be used. By adding different Prompt prefixes to the dialogue input data, the computational paths of different functional sub-networks within the meso-level gating unit are activated, enabling different functions such as fact retrieval and logical reasoning, and generating corresponding candidate outputs. Then, a routing decision network scores and corrects the candidate content according to preset scoring rules. This approach eliminates the need to build multiple parallel sub-networks, effectively reducing the overall model parameter count and deployment cost. Through the design of the Prompt prefix form and content, functional expansion can be achieved without modifying model parameters, giving the system high flexibility across different tasks. The routing decision network can comprehensively evaluate candidate outputs by combining consistency scores, fact matching degrees, and security risk indicators, ensuring the final result better meets the expected security and quality requirements.
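
The prompt-prefix variant can be sketched as a single callable invoked once per function with a different prefix. The prefix strings, the `PREFIXES` table, and `run_functional_paths` are hypothetical, and `subnetwork` stands in for whatever model call the deployment actually uses.

```python
# Hypothetical prefixes; the source does not specify their form or content.
PREFIXES = {
    "fact_retrieval": "[FACT] ",
    "logical_reasoning": "[LOGIC] ",
    "emotional_safety": "[SAFETY] ",
}


def run_functional_paths(dialogue_input, subnetwork):
    """Run one shared subnetwork under each prefix and collect the candidate outputs."""
    return {
        name: subnetwork(prefix + dialogue_input)
        for name, prefix in PREFIXES.items()
    }


# Usage with a stub that simply echoes its input in place of a real model call.
candidates = run_functional_paths("Is the claim consistent?", lambda text: text)
```

Adding a new function then only requires a new entry in the prefix table, which matches the paragraph's point that no model parameters need to change.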

[0035] In specific implementation, the goal is explainable and traceable advanced security protection: protection that no longer relies on simple keyword matching or static rules but makes judgments based on the reasoning structure itself, so that interception and intervention have a clear logical basis and complex semantic risks are easier to identify. To this end, this embodiment proposes detecting semantic conflicts at the output level, realizing anomaly detection at the macro level, as follows:
Load the dynamic concept graph of the current dialogue reasoning, map the words or phrases in the candidate reasoning results to the dynamic concept graph, and detect whether there is a logical closed-loop conflict between the nodes or edges generated by the candidate reasoning results and the existing paths in the dynamic concept graph. The dynamic concept graph includes nodes and edges between nodes. The nodes are entities in the dialogue input information and/or entities obtained through reasoning, and include the attributes of those entities. The edges are semantic, temporal, or logical relationships between entities.
Compare the triple content in the candidate reasoning results with a trusted knowledge base to detect whether there is a factual conflict. The triple content includes entities, attributes, and relationships between entities.
Match the content of the candidate reasoning results against a predefined unsafe pattern library to detect whether there are any security compliance anomalies.
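
The three checks above can be sketched as follows. Several assumptions are made and should be read as such: a "logical closed-loop conflict" is interpreted here as a new directed edge that would close a cycle in a relation expected to be acyclic (such as "before" or "causes"); the trusted knowledge base is modeled as a dictionary keyed by (entity, relation); and the unsafe pattern library is a list of regular expressions. All function names are hypothetical.

```python
import re


def reaches(graph, start, goal):
    """Depth-first search: is goal reachable from start along existing edges?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


def has_closed_loop_conflict(graph, new_edge):
    """A new edge src -> dst conflicts if dst already leads back to src."""
    src, dst = new_edge
    return reaches(graph, dst, src)


def has_factual_conflict(triple, knowledge_base):
    """Conflict if the knowledge base holds a different value for (entity, relation)."""
    entity, relation, value = triple
    known = knowledge_base.get((entity, relation))
    return known is not None and known != value


def has_compliance_anomaly(text, unsafe_patterns):
    """Match the candidate text against each pattern in the unsafe pattern library."""
    return any(re.search(p, text, re.IGNORECASE) for p in unsafe_patterns)
```

For example, with existing edges A -> B and B -> C, a candidate edge C -> A would be reported as a closed-loop conflict, while A -> C would not. Triples absent from the knowledge base are not treated as conflicts in this sketch.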

[0036] In practical implementation, to achieve secure intervention and control over semantic conflicts at the output level, a second security intervention is performed on candidate reasoning results with semantic conflicts, including: if the candidate reasoning result passes the semantic conflict detection, the candidate reasoning result is determined as the reasoning generated content and the dynamic concept graph is updated, so that generation continues with the next token; if the candidate reasoning result has a logical closed-loop conflict and/or a factual conflict (e.g., there is a slight logical deviation or factual ambiguity), an implicit corrective hint vector is generated and injected into the context of the large language model to guide the large language model to recalculate the candidate reasoning result of the current step; if the candidate reasoning result exhibits a security compliance anomaly (e.g., a malicious attack response, a serious factual error, or an ethical violation is detected), the circuit breaker mechanism is triggered: the current reasoning path of the candidate reasoning result is terminated and its content is discarded. At this point, an appropriate rejection or guidance text can be selected from the security response template library, or the request can be marked and transferred to a manual review process.
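The three-way decision above can be sketched as a small dispatch function. The `Action` names and the precedence rule (a security compliance anomaly overrides the other conflicts when several are present) are assumptions made for illustration; the original text does not state which condition takes priority.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"                # commit the candidate and update the concept graph
    RECOMPUTE = "recompute"          # inject a corrective hint and redo the current step
    CIRCUIT_BREAK = "circuit_break"  # terminate the path and discard the content


def second_intervention(loop_conflict: bool, fact_conflict: bool,
                        compliance_anomaly: bool) -> Action:
    """Map the macro-level detection results to an intervention."""
    # Assumed precedence: compliance anomalies are handled before other conflicts.
    if compliance_anomaly:
        return Action.CIRCUIT_BREAK
    if loop_conflict or fact_conflict:
        return Action.RECOMPUTE
    return Action.ACCEPT
```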

[0037] In specific implementation, as shown in Figures 3 and 5, at the start of dialogue reasoning, a text sequence input by the user is received. The large language model, combined with a graph database, performs input parsing and graph initialization based on the text sequence. First, semantic parsing is performed to extract key entities, entity attributes, and relationships between entities. Simultaneously, the dynamic concept graph for the current session is initialized or loaded. It is determined whether the current round is the first round of dialogue. If it is, an empty graph is created (i.e., it temporarily has no nodes or edges). If it is a multi-round dialogue, the historical dynamic concept graph is loaded (that is, the dynamic concept graph updated based on the reasoning generated content of previous rounds of dialogue). Logical closed-loop conflict detection is performed on candidate reasoning results based on the dynamic concept graph. If a candidate reasoning result passes semantic conflict detection (i.e., no logical closed-loop conflict, factual conflict, or security compliance anomaly exists, and consistency is achieved), the dynamic concept graph is updated based on the current candidate reasoning result, for example by merging newly generated nodes and edges, to obtain the dynamic concept graph for the current session.
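The initialize-or-load step and the merge step can be sketched with an in-memory store standing in for the graph database. `SessionGraphs`, `load_or_create`, and `merge` are hypothetical names, and node attributes are omitted for brevity; edges are stored as plain triples.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str, str]  # (head entity, relation, tail entity)


class SessionGraphs:
    """In-memory stand-in for the graph database mentioned in the text."""

    def __init__(self) -> None:
        self._store: Dict[str, Set[Edge]] = {}

    def load_or_create(self, session_id: str) -> Set[Edge]:
        # First round: nothing is stored yet, so an empty graph is created.
        # Later rounds: the graph accumulated in earlier rounds is returned.
        return self._store.setdefault(session_id, set())

    def merge(self, session_id: str, new_edges: Set[Edge]) -> Set[Edge]:
        """Merge edges from an accepted candidate into the session graph."""
        graph = self.load_or_create(session_id)
        graph |= new_edges  # in-place union, so the stored graph is updated
        return graph
```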

[0038] As can be seen, by constructing a dynamic concept graph that can be updated in real time during the reasoning process, macro-level logical verification and security circuit breaker control of the generated content are achieved. This dynamic concept graph provides a structured representation of entities, attributes, events, and their relationships involved in the reasoning process, and continuously verifies the consistency between newly generated content and the existing concept structure in the graph. When factual conflicts, logical contradictions, or violations of contextual premises occur, the system can locate the specific concept node or relationship path that triggers the anomaly and perform correction, weight reduction, or suspension of the reasoning operation according to preset rules.

[0039] In specific implementation, as shown in Figures 2 and 3, within the large language model, the process of semantic conflict detection and security intervention at the output level can be implemented through macro-gating units. For example, a logic inference engine can be used to detect whether there is a logical closed-loop conflict in the candidate reasoning results by combining dynamic concept graphs; an external knowledge base interface can be used to compare the triple content in the candidate reasoning results with a trusted knowledge base to detect factual conflicts; a negative feedback producer can be used to generate an implicit corrective hint vector and inject it into the context of the large language model; and a security circuit breaker can be used to terminate the current reasoning path of the candidate reasoning results and discard the content of the candidate reasoning results.

[0040] In practical implementation, existing security intervention mechanisms are often unexplainable black-box operations that keep no record of the basis for their decisions, which hinders the optimization and auditing of security strategies and the establishment of user trust. This application proposes to establish a security decision memory. After the current round of dialogue is completed, all relevant data from this reasoning process (including activation deviation values at the micro level, module scores at the meso level, and conflict types and final decisions at the macro level) are recorded in the security decision memory. The system periodically uses the data in the memory to update the threshold parameters of each layer of gating and the scoring weights of the router through reinforcement learning algorithms, thereby achieving adaptive evolution of security defense capabilities.

[0041] Specifically, as shown in Figure 3, all gating unit decisions are associated with interpretable metadata (such as activation deviation values, scores, or logical conflict paths) in the security decision memory, forming a complete intervention log and supporting adaptive threshold adjustment based on historical data. The security decision memory records and archives the background information, decision basis, and intervention results for each gating trigger, forming a log database, making the system operation traceable and analyzable. Through statistical analysis of historical intervention data by the parameter optimization engine, the trigger thresholds and decision parameters of different gating levels can be dynamically adjusted, allowing the system to gradually adapt to new input characteristics and application environment changes during long-term operation, avoiding misjudgments or omissions caused by fixed thresholds, thereby continuously improving the overall security protection effect.
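A minimal sketch of such a memory and of a threshold update follows. It uses a plain statistical rule on the false-positive rate rather than the reinforcement learning update mentioned in the text, and it assumes that a higher threshold makes a gate fire less often. The record fields, the `was_correct` audit label, and the `target` and `step` values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InterventionRecord:
    level: str         # "micro", "meso" or "macro"
    metric: float      # e.g. activation deviation value or module score
    threshold: float   # threshold in force when the gate fired
    decision: str      # e.g. "inhibit", "correct", "circuit_break"
    was_correct: bool  # hypothetical label obtained from a later audit


@dataclass
class SecurityDecisionMemory:
    records: List[InterventionRecord] = field(default_factory=list)

    def log(self, record: InterventionRecord) -> None:
        self.records.append(record)

    def false_positive_rate(self, level: str) -> float:
        """Share of logged interventions at this level that were judged wrong."""
        hits = [r for r in self.records if r.level == level]
        if not hits:
            return 0.0
        return sum(not r.was_correct for r in hits) / len(hits)


def adjust_threshold(current: float, fp_rate: float,
                     target: float = 0.05, step: float = 0.1) -> float:
    """Raise the threshold if the gate fires wrongly too often, lower it if rarely."""
    if fp_rate > target:
        return current * (1 + step)
    if fp_rate < target / 2:
        return current * (1 - step)
    return current
```

Because every record keeps the metric, threshold, and decision together, the same log can serve both the audit trail and the periodic parameter update.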

[0042] In practical implementation, a large language model employing the aforementioned method for controlling anomalies in dialogue reasoning can be applied to scenarios such as dialogue generation, content creation, and question-answering systems, and can be further extended to professional fields with stringent requirements for factual accuracy and logical consistency. For example, in a medical question-answering system, the macro-level concept graph verification of this application can be used to verify the logical coherence of medical advice in real time, preventing inconsistencies and deviations from medical common sense. Through structured modeling of diseases, symptoms, treatment plans, and contraindications, implicit logical conflicts can also be identified, reducing the probability of generating high-risk erroneous advice. In the financial investment advisory scenario, the neuron-level monitoring and inhibition capabilities of micro-gating can prevent the model from hallucinating in response to market noise. At the same time, the module selection mechanism of meso-gating ensures that the model's output is based on the logical reasoning and fact retrieval modules rather than on sentiment analysis modules susceptible to subjective influence, supporting the objectivity and professionalism of investment analysis and advice. This mechanism helps avoid biases caused by over-interpreting short-term fluctuations or unstructured information. In the context of legal document generation, the full-process monitoring and verification provided by the micro, meso, and macro triple gating of this application covers multiple dimensions, from neuron activation and internal computation path selection to the semantic logic level. It can identify and block abnormal reasoning such as circular definitions and contradictory clauses, so that the generated clauses are rigorous, logically consistent, and meet the professional writing requirements of legal documents, reducing the cost of manual review.

[0043] In practical implementation, the above-mentioned method for controlling anomalies in large language model dialogue reasoning has the following beneficial effects: 1. Improved stability and reliability of the internal representation of the large language model: A micro-level neuron-level gating monitoring and inhibition mechanism (i.e., detecting and safely intervening in abnormal neuron activation) is introduced during the reasoning stage of the large language model, significantly improving the stability and reliability of the internal representation. This mechanism, through fine-grained monitoring of neuron activation intensity, activation distribution patterns, and cross-layer propagation trends, can identify abnormal activation states that deviate from normal reasoning patterns and apply targeted inhibition or attenuation operations to limit the accumulation and amplification of abnormal signals in subsequent layers. This technical approach controls potential hallucination triggers at an early stage, preventing erroneous reasoning dependencies from forming closed loops within the model, reducing unfounded inferences and semantic drift caused by local abnormal activation, and ensuring higher consistency and controllability of the system output under different input conditions.

[0044] 2. Improved Quality and Accuracy of Generated Content: By adopting a meso-gating architecture based on module competition (i.e., the process of detecting and correcting functional anomalies in candidate inference results output by functional modules), the quality and accuracy of generated content are comprehensively improved. This architecture evaluates the intermediate outputs of multiple candidate inference modules in parallel and selects or suppresses them based on their matching degree with the current dialogue goal, contextual constraints, and security conditions (i.e., preset security rules), creating a competitive relationship among functional modules within the same inference cycle. This mechanism prevents a single functional module from dominating due to insufficient local information or internal biases, effectively balancing the fact retrieval module, logical reasoning module, and language generation module, reducing the probability of biased reasoning and error amplification, and improving the overall performance of the final generated content in terms of factual correctness, logical completeness, and expressive rationality.

[0045] 3. Achieved interpretable and traceable advanced security protection: By constructing a dynamic concept graph that updates in real time during the reasoning process, macro-level logical verification and security circuit breaker control of generated content are realized. This dynamic concept graph structurally represents the entities, attributes, events, and their relationships involved in the reasoning process, and continuously verifies the consistency between newly generated content and the existing concept structure in the dynamic concept graph. When factual conflicts, logical contradictions, or violations of contextual premises occur, the system can locate the specific concept node or relationship path that triggers the anomaly and execute correction, weight reduction, or suspension of reasoning operations according to preset rules. This approach makes security protection no longer dependent on simple keyword matching or static rules, but based on the reasoning structure itself, giving interception and intervention behaviors a clear logical basis and improving the system's ability to identify complex semantic risks.

[0046] 4. Possesses long-term evolutionary security protection potential: By introducing a security decision memory and supporting an adaptive optimization mechanism based on historical intervention data, the system is endowed with long-term evolutionary security protection capabilities. This memory continuously records different types of inference anomalies, gating trigger conditions, and their intervention results, and statistically analyzes the effects of various intervention strategies to adjust threshold settings and decision parameters at the micro, meso, and macro gating levels. This mechanism enables the system to gradually adapt to new application scenarios and attack forms, avoiding the failure of fixed thresholds and static rules in long-term operation. The system's security performance exhibits a cumulative enhancement characteristic over time.

[0047] In this embodiment, a computer device is provided. As shown in Figure 6, it includes a memory 601, a processor 602, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements any of the above-mentioned methods for controlling anomalies in large language model dialogue reasoning.

[0048] Specifically, the computer device can be a computer terminal, a server, or a similar computing device.

[0049] In this embodiment, a computer-readable storage medium is provided, which stores a computer program for executing any of the above-described methods for controlling anomalies in large language model dialogue reasoning.

[0050] Specifically, computer-readable storage media include permanent and non-permanent, removable and non-removable media, and can store information using any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer-readable storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable storage media do not include transitory media, such as modulated data signals and carrier waves.

[0051] Based on the same inventive concept, this invention also provides a control device for anomalies in large language model dialogue reasoning, as described in the following embodiments. Since the control device works on a principle similar to that of the control method described above, its implementation can refer to the implementation of the control method, and repeated details are not elaborated further. As used below, the terms "unit" or "module" can refer to a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

[0052] Figure 7 is a structural block diagram of a control device for dialogue reasoning anomalies in a large language model according to an embodiment of the present invention. The device is deployed within the large language model. As shown in Figure 7, the device includes: a micro-gating unit 701, used to detect abnormal activation of neurons in real time at the neural network level of the large language model during the process of generating dialogue reasoning in the large language model, and to perform a first safety intervention on the output of abnormal neurons; a meso-level gating unit 702, used to detect functional anomalies in the candidate inference results output by the functional modules in real time at the functional module level of the large language model, and to correct the candidate inference results with functional anomalies; and a macro-gating unit 703, used to detect semantic conflicts in the received candidate inference results in real time at the output level, and to perform a second security intervention on the candidate inference results with semantic conflicts.

[0053] In one embodiment, the micro-gating unit is used to continuously calculate the current activation distribution or current activation value of neurons in real time for each neural network layer to be detected; compare the current activation distribution or current activation value with a preset activation standard, and determine the abnormal activation status of neurons based on the comparison result.
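One possible reading of "comparing the current activation value with a preset activation standard" is a per-neuron deviation test against stored baseline statistics. The sketch below assumes that reading; the function name, the baseline mean and standard deviation, and the `z_threshold` value are hypothetical and are not given in the original text.

```python
from typing import List


def abnormal_neurons(activations: List[float],
                     baseline_mean: List[float],
                     baseline_std: List[float],
                     z_threshold: float = 3.0) -> List[int]:
    """Return indices of neurons whose activation deviates from the preset standard."""
    flagged = []
    for i, (a, mu, sigma) in enumerate(zip(activations, baseline_mean, baseline_std)):
        # Deviation in units of the baseline standard deviation (guarded against zero).
        z = abs(a - mu) / max(sigma, 1e-8)
        if z > z_threshold:
            flagged.append(i)
    return flagged
```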

[0054] In one embodiment, a micro-gating unit is used to apply a temporary sparse inhibition mask to the abnormal neuron, thereby weakening or masking the weights of the abnormal neuron's output.
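The inhibition step can be sketched as an element-wise mask applied to one layer's outputs. The function name and the `attenuation` parameter are assumptions for illustration: a value of 0.0 masks a flagged output completely, while a value between 0 and 1 only weakens it. The function returns a new list, so the mask is temporary and the model weights are untouched.

```python
from typing import List


def apply_inhibition_mask(outputs: List[float],
                          abnormal: List[int],
                          attenuation: float = 0.0) -> List[float]:
    """Scale the outputs of flagged neurons; all other outputs pass through unchanged."""
    flagged = set(abnormal)
    return [x * attenuation if i in flagged else x for i, x in enumerate(outputs)]
```

The mask is sparse in the sense that only the flagged indices are modified, which keeps the intervention local to the abnormal neurons.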

[0055] In one embodiment, the meso-level gating unit is used to perform multiple functional analyses in parallel in real time based on the output and input data of each functional module to be detected. The multiple functional analyses include fact retrieval analysis, logical reasoning analysis, and sentiment analysis. For each functional analysis, a score is given based on the results, context, preset security rules, and confidence level of each functional analysis. If the functional analysis with the lowest score is determined to have a functional anomaly, the content related to the functional anomaly in the candidate reasoning results output by the functional module is corrected, and the corrected candidate reasoning results are output to the output level.
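The selection of the lowest-scoring functional analysis can be sketched as below. The text does not say how the lowest-scoring analysis is "determined to have a functional anomaly", so the sketch assumes a simple cutoff; `anomaly_threshold` and the function name are hypothetical.

```python
from typing import Dict, Optional


def lowest_scoring_analysis(scores: Dict[str, float],
                            anomaly_threshold: float = 0.5) -> Optional[str]:
    """Return the lowest-scoring analysis if it falls below the cutoff, else None."""
    name = min(scores, key=scores.get)
    return name if scores[name] < anomaly_threshold else None
```

For example, with scores for fact retrieval, logical reasoning, and sentiment analysis, only the content tied to the returned analysis would be passed on for correction.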

[0056] In one embodiment, a macro-gating unit is used to load the dynamic concept graph of the current dialogue reasoning, map the lexical units or phrases in the candidate reasoning results to the dynamic concept graph, and detect whether there is a logical closed-loop conflict between the nodes or edges generated by the candidate reasoning results and the existing paths in the dynamic concept graph. The dynamic concept graph includes nodes and edges between nodes, where the nodes are entities in the dialogue input information and / or entities obtained through reasoning, the edges are semantic, temporal, or logical relationships between entities, and the nodes include entity attributes. The unit also compares the triple content in the candidate reasoning results with a trusted knowledge base to detect whether there is a factual conflict. The triple content includes entities, attributes, and relationships between entities. Finally, the unit matches the content of the candidate reasoning results with a preset insecure pattern library to detect whether there are security compliance anomalies.

[0057] In one embodiment, the macro-gating unit is configured to: if the candidate inference result passes semantic conflict detection, determine the candidate inference result as the inference-generated content and update the dynamic concept graph; if the candidate inference result has logical closed-loop conflict and / or factual conflict, generate an implicit corrective hint vector and inject it into the context of the large language model to guide the large language model to recalculate the candidate inference result of the current step; if the candidate inference result has security compliance anomalies, terminate the current inference path of the candidate inference result and discard the content of the candidate inference result.

[0058] The embodiments of this invention achieve the following technical effects: They propose a multi-level, real-time, dynamic, layer-by-layer anomaly detection system within a large language model, encompassing micro-levels (neuronal level of neural network layers), meso-levels (functional level of functional modules), and macro-levels (semantic level of output layers). Based on anomalies at different levels, corresponding security interventions or corrections are implemented. This also achieves multi-dimensional and fine-grained anomaly detection capabilities, preventing the propagation and amplification of anomalies at individual levels between network layers and avoiding the gradual evolution of local errors into global reasoning biases. Simultaneously, through multi-level collaborative processing, potential conflicts can be identified and corrected at different abstraction levels, providing effective real-time perception and control capabilities for the reasoning process within the large language model. This can fundamentally block the formation of abnormal reasoning chains, significantly improving the stability and reliability of the representation within the large language model. This ensures that the output of the large language model maintains high reliability at the functional, logical, and factual levels, thereby improving the quality and accuracy of the content generated by the large language model.

[0059] Obviously, those skilled in the art should understand that the modules or steps of the above-described embodiments of the present invention can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they can be implemented using computer-executable program code, thereby storing them in a storage device for execution by a computing device. In some cases, the steps shown or described can be performed in a different order than those presented here, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, the embodiments of the present invention are not limited to any particular combination of hardware and software.

[0060] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, various modifications and variations can be made to the embodiments of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A method for controlling anomalies in dialogue reasoning within a large language model, characterized in that the method comprises: during the process of generating dialogue reasoning using the large language model, detecting abnormal activation of neurons in real time at the neural network level of the large language model, and performing a first safety intervention on the output of abnormal neurons; at the functional module level of the large language model, detecting in real time functional anomalies in the candidate inference results output by the functional modules, and correcting the candidate inference results with functional anomalies; and at the output level, detecting in real time semantic conflicts in the received candidate inference results, and performing a second security intervention on candidate inference results with semantic conflicts.

2. The method as described in claim 1, characterized in that detecting abnormal activation of neurons in real time comprises: for each neural network layer to be detected, continuously calculating in real time the current activation distribution or current activation value of the neurons; and comparing the current activation distribution or the current activation value with a preset activation standard, and determining the abnormal activation status of the neurons based on the comparison result.

3. The method as described in claim 1, characterized in that performing the first safety intervention on the output of abnormal neurons comprises: applying a temporary sparse inhibition mask to the abnormal neurons to weaken or mask the weights of their outputs.

4. The method as described in claim 1, characterized in that detecting in real time functional anomalies in the candidate inference results output by the functional modules, and correcting the candidate inference results with functional anomalies, comprises: for each functional module to be detected, performing multiple functional analyses in parallel in real time based on the output and input data of the functional module, the multiple functional analyses including fact retrieval analysis, logical reasoning analysis, and sentiment analysis; for each functional analysis, assigning a score based on the results, context, preset security rules, and confidence level of that functional analysis; and, if the functional analysis with the lowest score is determined to have a functional anomaly, correcting the content related to the functional anomaly in the candidate inference results output by the functional module, and outputting the corrected candidate inference results to the output level.

5. The method according to any one of claims 1 to 4, characterized in that detecting in real time semantic conflicts in the received candidate inference results comprises: loading the dynamic concept graph of the current dialogue reasoning, mapping the words or phrases in the candidate reasoning results to the dynamic concept graph, and detecting whether there is a logical closed-loop conflict between the nodes or edges generated by the candidate reasoning results and the existing paths in the dynamic concept graph, wherein the dynamic concept graph includes nodes and edges between nodes, the nodes are entities in the dialogue input information and/or entities obtained through reasoning, the edges are semantic, temporal, or logical relationships between entities, and the nodes include the attributes of the entities; comparing the triple content in the candidate reasoning results with the trusted knowledge base to detect whether there is a factual conflict, wherein the triple content includes entities, attributes, and relationships between entities; and matching the content of the candidate inference results against a preset unsafe pattern library to detect whether there are any security compliance anomalies.

6. The method as described in claim 5, characterized in that performing the second security intervention on candidate inference results with semantic conflicts comprises: if the candidate reasoning result passes the semantic conflict detection, determining the candidate reasoning result as the reasoning generated content and updating the dynamic concept graph; if the candidate reasoning result has a logical closed-loop conflict and/or a factual conflict, generating an implicit corrective hint vector and injecting it into the context of the large language model to guide the large language model to recalculate the candidate reasoning result of the current step; and, if the candidate inference result has a security compliance anomaly, terminating the current inference path of the candidate inference result and discarding the content of the candidate inference result.

7. A control device for anomalies in dialogue reasoning within a large language model, characterized in that the control device is deployed within the large language model and includes: a micro-gating unit, used to detect abnormal activation of neurons in real time at the neural network level of the large language model during the process of generating dialogue reasoning in the large language model, and to perform a first safety intervention on the output of abnormal neurons; a meso-level gating unit, used to detect functional anomalies in the candidate inference results output by the functional modules in real time at the functional module level of the large language model, and to correct the candidate inference results with functional anomalies; and a macro-gating unit, used to detect semantic conflicts in the received candidate inference results in real time at the output level, and to perform a second security intervention on the candidate inference results with semantic conflicts.

8. The device as claimed in claim 7, characterized in that the micro-gating unit is used to continuously calculate in real time, for each neural network layer to be detected, the current activation distribution or current activation value of the neurons, to compare the current activation distribution or the current activation value with a preset activation standard, and to determine the abnormal activation status of the neurons based on the comparison result.

9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the method for controlling anomalies in large language model dialogue reasoning as described in any one of claims 1 to 6.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method for controlling anomalies in large language model dialogue reasoning according to any one of claims 1 to 6.