An Adaptive Repair Method and System for Structured Output of a Large Language Model

By acquiring tightness and budget indicators in real time, dynamically determining risk types, and employing latent space optimization or boundary control techniques, the problems of distribution truncation and structural non-closure in the hard constraint decoding process of large language models are solved, thereby improving the output quality and reliability of the model.

CN122308808A · Pending · Publication Date: 2026-06-30 · SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI
Filing Date
2026-03-26
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies cannot diagnose distribution truncation faults and budget non-closure faults within a single framework during hard-constraint decoding, nor repair each with a targeted method. This makes it difficult to balance semantic quality and structural integrity in the structured output of large language models.

Method used

Tightness and budget indicators are acquired in real time and used to determine the risk type dynamically. Latent space optimization or boundary control is then applied as a targeted repair. For distribution truncation risk, a perturbation term is superimposed on the hidden state vector to reconstruct the probability distribution. For structural non-closure risk, a positive bias value is applied to closure-related tokens to guide the generation process.

Benefits of technology

It achieves unified diagnosis and adaptive repair of distribution truncation and structural non-closure faults, improves the format compliance and semantic quality of large language models, and ensures the structural integrity and reliability of the output.



Abstract

This application discloses an adaptive repair method and system for structured output of a large language model. The method includes: during structured decoding based on a finite state machine, acquiring in real time the tightness index and budget index of the current decoding state; determining the current risk type based on the real-time values of the indices; if the tightness index indicates a risk of distribution truncation, reconstructing the probability distribution by superimposing a perturbation term on the hidden state vector; if the budget index indicates a risk of structural non-closure, applying a positive bias value to the output probability of specific closure-related lexical units; sampling and generating the current lexical unit based on the processed probability distribution, and iteratively executing the above steps until structured output is completed. This invention solves the problem that two heterogeneous faults, distribution truncation faults and budget non-closure faults, cannot be diagnosed in a unified way and repaired in a targeted way during hard-constraint decoding, and it improves semantic quality and structural integrity while ensuring format compliance.

Description

Technical Field

[0001] This application relates to the fields of artificial intelligence and natural language processing technology, and in particular to an adaptive repair method and system for the structured output of a large language model.

Background Technology

[0002] Large language models, based on the Transformer architecture, acquire rich linguistic knowledge and context modeling capabilities through pre-training on massive corpora. In practical applications, large language models not only need to generate fluent natural language but are also often required to output content in specific structured formats to meet the parsing and processing needs of downstream systems. These structured output scenarios are wide-ranging; for example, application programming interface (API) calls require models to output strictly formatted JSON for server parsing, medical question-answering systems require output conforming to predefined patterns to ensure information integrity, and code generation systems require output code snippets that pass syntax checks. In these scenarios, output format compliance is as important as semantic quality, both jointly determining the model's usability in real-world business systems.

[0003] To ensure the structured output capability of large language models, the industry commonly employs hard-constraint decoding techniques. The core idea of this technique is to compile the output format specification, such as a JSON Schema, a regular expression, or a context-free grammar, into a finite state machine used during decoding. At each step of generating a new token, the set of all legal tokens is calculated based on the current finite state machine state. The log probability of every token outside this set is set to negative infinity, and the remaining tokens are then renormalized and sampled. Representative implementations include the Outlines tool, which compiles JSON Schema into regular expressions and then into a finite state machine; the Guidance framework, which supports embedding format constraints during the generation process; and XGrammar, which achieves fast constrained decoding through efficient grammar compilation.
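
The masking step described above can be illustrated with a minimal sketch. It is not taken from Outlines, Guidance, or XGrammar; the function name `mask_and_renormalize` and the toy logits are assumptions made only for illustration.

```python
import math

def mask_and_renormalize(logits, legal_ids):
    """Hard-constraint step: illegal tokens get -inf, the rest are renormalized."""
    if not legal_ids:
        raise ValueError("the finite state machine must allow at least one token")
    masked = [z if i in legal_ids else -math.inf for i, z in enumerate(logits)]
    peak = max(masked)                           # finite because legal_ids is non-empty
    exps = [math.exp(z - peak) for z in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-token vocabulary in which only tokens 1 and 3 are legal.
probs = mask_and_renormalize([2.0, 1.0, 0.5, -1.0], legal_ids={1, 3})
```

Token 0 had the highest original logit, but after masking it receives zero probability, so sampling is restricted to tokens 1 and 3.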

[0004] However, in practical applications, hard-constraint decoding suffers from two distinct types of failures. The first type occurs in scenarios with extremely strict constraints, such as when only 2-10 tokens are allowed per step in certain parts of code generation. In this case, the sum of the probabilities of the valid tokens approaches zero, forcing the model to sample from the low-quality tail of the probability distribution, which leads to a sharp decline in the semantic quality of the generated content or even complete collapse. The second type of failure stems from the presence of unbounded fields in the output pattern, such as arrays, strings, or numbers of unlimited length. The model may keep expanding such a field until the generation budget is exhausted and the output is forcibly truncated, resulting in an unclosed JSON structure and missing required fields. The mechanisms behind these two types of failures are completely different. The former is a probability distribution truncation problem, while the latter is a structural closure problem under budget constraints, so they require entirely different repair methods. Currently, there is no unified capability for diagnosing these two types of failures. In practice, only undifferentiated remedies such as retrying the whole generation or uniformly increasing the temperature are available, which incur high computational costs and often fail to resolve the specific type of failure.

Summary of the Invention

[0005] This application provides an adaptive repair method and system for the structured output of a large language model. It solves the problem that two heterogeneous faults, distribution truncation faults and budget non-closure faults, cannot be diagnosed in a unified way and repaired in a targeted way under the same architecture during hard-constraint decoding. It distinguishes the two types of faults in real time and handles each adaptively, improving semantic quality and structural integrity while ensuring format compliance.

[0006] This application provides an adaptive repair method and system for the structured output of a large language model, the method comprising: during structured decoding of a large language model based on a finite state machine, obtaining in real time the tightness index and budget index of the current decoding state, where the tightness index represents the degree of constraint on the probability distribution and the budget index represents the remaining generation resources; determining the current risk type based on the real-time values of the tightness index and the budget index; if the tightness index indicates a risk of distribution truncation, reconstructing the probability distribution by superimposing a perturbation term on the hidden state vector of the large language model; if the budget index indicates a risk of structural non-closure, applying a positive bias value to the output probability of specific closure-related lexical units; and sampling and generating the current lexical unit based on the probability distribution after reconstruction or after applying the positive bias value, then returning to the step of obtaining the tightness index and budget index of the current decoding state in real time, until the structured output is completed.

[0007] Optionally, the step of obtaining the tightness index of the current decoding state in real time includes: obtaining the probability values of all legal transition tokens in the model's original output probability distribution under the current finite state machine state; summing the probability values of all the legal transition tokens to obtain the feasible mass value; and determining the feasible mass value as the tightness index characterizing the degree of constraint on the probability distribution. The smaller the feasible mass value, the more severely the hard-constraint mask truncates the model's original probability distribution.

[0008] Optionally, the step of obtaining the budget index includes: obtaining the length of the currently generated sequence and the preset maximum generation length, and calculating the ratio between the two as the first budget component; calculating the minimum number of steps required to reach any terminal state from the current finite state machine state, comparing the minimum number of steps with the remaining available generation length, and using the comparison result as the second budget component; and detecting whether the current generation context contains a continuous expansion signal of an unbounded field, where, if a numeric field is continuously generating decimal places, an array field is continuously adding elements, or a string field is continuously generating characters without being closed, a third budget component representing the risk of structural non-closure is generated. At least one of the first budget component, the second budget component, and the third budget component is determined as the current budget index.

[0009] Optionally, the step of reconstructing the probability distribution by superimposing a perturbation term onto the hidden state vector of the large language model includes: Extract the hidden state vector of the last layer of the large language model decoder and construct an initialized, optimizable perturbation vector; Construct a loss function that includes an interval loss term and a distribution offset constraint term, wherein the interval loss term is used to calculate the log probability difference between the target legal word and the illegal word with the highest probability to increase the difference, and the distribution offset constraint term is used to calculate the degree of deviation between the reconstructed probability distribution and the original probability distribution to limit the degree of deviation; The optimizable perturbation vector is iteratively updated using a gradient optimization algorithm until the loss function is minimized. The optimized perturbation vector is then superimposed onto the hidden state vector to obtain the reconstructed hidden state. Based on the reconstructed hidden state, the reconstructed probability distribution is recalculated.

[0010] Optionally, the step of applying a positive bias value to the output probability of specific closure-related lexical units includes: identifying the content domain type of the current generation context and classifying it as either a literal content domain or a string content domain; if a literal content domain is identified, selecting at least one of the comma, right curly brace, and right square bracket as the target closing-symbol token, and applying a first positive bias value to the output probability of the target closing-symbol token; if a string content domain is identified, selecting the closing quotation mark as the target closing-symbol token, and applying a second positive bias value to its output probability while keeping the output probabilities of the other closing-symbol tokens unchanged; and dynamically adjusting the first positive bias value or the second positive bias value based on the real-time value of the budget index, thereby completing the guidance of the output probability.

[0011] Optionally, the method further includes a static pattern preprocessing step, comprising: receiving the structured output pattern description supplied by the user, and scanning it to identify unbounded fields for which no length or precision limit is set; performing bounded compilation on the identified unbounded fields, where a maximum number of elements is added to array-type fields, a discrete set of enumeration values is added to numeric-type fields, and a maximum character length is added to string-type fields; and constructing the finite state machine from the bounded pattern for use in the subsequent decoding process.
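
As an illustration of the bounded compilation described above, the following sketch adds limits to a JSON Schema held as a Python dictionary. `maxItems`, `maxLength`, and `enum` are standard JSON Schema keywords; the function name `bound_schema`, the default limits, and the example schema are assumptions and are not specified in the application.

```python
import copy

def bound_schema(schema, max_items=16, max_length=256, number_enum=None):
    """Return a copy of the schema with limits added to unbounded fields."""
    out = copy.deepcopy(schema)
    kind = out.get("type")
    if kind == "array":
        out.setdefault("maxItems", max_items)        # cap the number of elements
        if isinstance(out.get("items"), dict):
            out["items"] = bound_schema(out["items"], max_items, max_length, number_enum)
    elif kind == "string":
        out.setdefault("maxLength", max_length)      # cap the character length
    elif kind in ("number", "integer"):
        if number_enum is not None:
            out.setdefault("enum", list(number_enum))  # discrete value set
    elif kind == "object" and isinstance(out.get("properties"), dict):
        out["properties"] = {
            name: bound_schema(sub, max_items, max_length, number_enum)
            for name, sub in out["properties"].items()
        }
    return out

# Hypothetical schema with an unbounded array, number, and string.
schema = {
    "type": "object",
    "properties": {
        "tags": {"type": "array", "items": {"type": "string"}},
        "score": {"type": "number"},
        "name": {"type": "string", "maxLength": 10},
    },
}
bounded = bound_schema(schema, number_enum=[0, 0.5, 1])
```

Limits that the user already set, such as `maxLength` on `name`, are left unchanged because `setdefault` only fills in missing keys.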

[0012] Optionally, the method further includes a format interference diagnosis step, comprising: before inference begins, scanning the input prompt to identify and remove or replace placeholder templates, regular expression templates, or specification text that the model might reproduce literally, thereby generating a cleaned prompt; during structured decoding based on the cleaned prompt, monitoring the generated text in real time for leakage of specification text; and, if a leak is detected, recording diagnostic information or triggering an interrupt, and reinitializing the decoding process with a further cleaned prompt.

[0013] Optionally, the method further includes a post-hoc quality verification step, comprising: after the structured output is completed, extracting a structural feature vector from the generated structured output text, where the structural feature vector includes the number of brackets, the proportion of punctuation marks, the total character length, and the type-token ratio; inputting the structural feature vector into a pre-trained classification model, which outputs a judgment result of either a normal category or a failure category; and, if the judgment result is the failure category, automatically triggering a retry generation mechanism or a degradation processing strategy to generate an alternative output.
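
The four structural features named above can be extracted as in the following sketch. The whitespace tokenization used for the type-token ratio and the dictionary keys are assumptions; the application does not specify them, and the pre-trained classifier itself is not shown.

```python
import string

def structural_features(text):
    """Compute the four structural features from a finished output string."""
    brackets = sum(text.count(ch) for ch in "{}[]")
    punct = sum(1 for ch in text if ch in string.punctuation)
    tokens = text.split()  # assumption: whitespace tokenization
    return {
        "bracket_count": brackets,
        "punct_ratio": punct / len(text) if text else 0.0,
        "char_length": len(text),
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }

features = structural_features('{"a": [1, 2]}')
looping = structural_features("a a a a")
```

A low type-token ratio, as in the `looping` example, is the kind of signal such a classifier could use to flag degenerate repetition.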

[0014] Optionally, the step of determining the current risk type based on the real-time values of the tightness index and the budget index further includes: checking whether the tightness index indicates a risk of distribution truncation and whether the budget index indicates a risk of structural non-closure; when both distribution truncation risk and structural non-closure risk are detected, first performing the step of reconstructing the probability distribution by superimposing a perturbation term on the hidden state vector of the large language model, and suspending the step of applying a positive bias value; and, when the tightness index does not indicate a risk of distribution truncation, performing the step of applying a positive bias value to the output probability of specific closure-related lexical units.

[0015] Furthermore, to achieve the above objectives, embodiments of the present invention also provide an adaptive repair system for the structured output of a large language model, the system comprising: an indicator acquisition module, used to acquire the tightness index and budget index of the current decoding state in real time during structured decoding of a large language model based on a finite state machine; a risk assessment and repair execution module, used to determine the current risk type based on the real-time values of the tightness index and the budget index, where, if the tightness index indicates a risk of distribution truncation, the probability distribution is reconstructed by superimposing a perturbation term on the hidden state vector of the large language model, and, if the budget index indicates a risk of structural non-closure, a positive bias value is applied to the output probability of specific closure-related lexical units; and a sampling and loop control module, used to sample and generate the current lexical unit based on the probability distribution after reconstruction or after applying the positive bias value, and to return to the step of obtaining the tightness index and budget index of the current decoding state in real time, until the structured output is completed.

[0016] One or more technical solutions provided in the embodiments of this application have at least the following technical effects or advantages:

1. By acquiring diagnostic indicators in two orthogonal dimensions, tightness and budget, in real time, it is possible to distinguish between two completely different anomalies: distribution truncation faults and structural non-closure faults. Based on this, targeted repairs can be carried out by dynamically selecting latent space optimization or boundary control according to the risk type, avoiding the limitations of a one-size-fits-all repair strategy.

[0017] 2. This application employs latent space optimization to address the risk of distribution truncation. By superimposing a learnable perturbation vector onto the model's hidden state, the probability mass of legal tokens is increased while the distribution shift is kept under control, so that the model is not forced to sample from the low-quality tail of the distribution. Compared with relying on hard-constraint decoding alone, this method mitigates semantic quality degradation and collapse while maintaining a high compliance rate, achieving a balance between format compliance and semantic quality.

[0018] 3. This application employs context-aware boundary control to address the risk of structural non-closure. Different positive biases are applied to closing-symbol lexical units depending on the content domain type, guiding the model to complete structural closure within a finite number of generation steps. In addition, static pattern preprocessing eliminates the structural risks arising from unbounded fields before inference, so that the structural integrity of the output is protected both by prevention in advance and by guidance at runtime. This reduces truncation and field loss caused by budget exhaustion.

Attached Figure Description

[0019] Figure 1 is a schematic diagram of the framework of the adaptive repair method for the structured output of the large language model in this application. Figure 2 is a flowchart illustrating the adaptive repair method for the structured output of the large language model in this application. Figure 3 is a schematic diagram of the framework of the adaptive repair system for the structured output of the large language model in this application. Figure 4 is a schematic diagram of the terminal structure of the hardware operating environment involved in one embodiment of this application.

Detailed Implementation

[0020] During hard-constraint decoding, two heterogeneous faults, distribution truncation faults and budget non-closure faults, cannot be diagnosed in a unified way and repaired in a targeted way under a single architecture, which makes it difficult to avoid both semantic quality degradation and loss of structural integrity in structured output. This application proposes an adaptive repair method and system for structured output of a large language model. During structured decoding based on a finite state machine, a tightness index representing the degree of constraint on the probability distribution and a budget index representing the remaining generation resources are acquired in real time. The current risk type is determined based on the real-time values of these two indices. When distribution truncation risk exists, a perturbation term is superimposed on the hidden state vector of the large language model to reconstruct the probability distribution and increase the probability mass of legal lexical units. When structural non-closure risk exists, a positive bias value is applied to the output probability of specific closure-related lexical units to guide the model to complete structural closure within a finite number of steps. The current lexical unit is sampled from the processed probability distribution, and the above steps are executed iteratively until structured output is completed. Through this technical solution, this application achieves unified diagnosis and adaptive repair of the two types of heterogeneous faults, improving semantic quality and structural integrity while ensuring format compliance.

[0021] To better understand the above technical solutions, exemplary embodiments of this application will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of this application are shown in the drawings, it should be understood that this application can be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided to enable a more thorough understanding of this application and to fully convey the scope of this application to those skilled in the art.

[0022] To better understand the above technical solutions, the following will provide a detailed explanation of the technical solutions in conjunction with the accompanying drawings and specific implementation methods.

[0023] Example 1: In this embodiment, an adaptive repair method for the structured output of a large language model is provided.

[0024] Referring to Figure 1 and Figure 2, the adaptive repair method for the structured output of the large language model in this embodiment includes the following steps. Step S1: During the structured decoding of a large language model based on a finite state machine, the tightness index and budget index of the current decoding state are obtained in real time. The tightness index represents the degree of constraint on the probability distribution, and the budget index represents the remaining generation resources. In this embodiment, during token-by-token generation, at each decoding step the hard-constraint backend calculates the set of all legal tokens based on the current finite state machine state. Based on this, a tightness index and a budget index are calculated. By calculating these two indices in real time, the two types of fault risk that may occur during decoding can be perceived dynamically: distribution truncation risk and structural non-closure risk.

[0025] As an optional implementation, the tightness index is obtained as follows: the probability values of all legal transition tokens in the model's original output probability distribution under the current finite state machine state are obtained; the sum of these probability values is calculated to obtain the feasible mass value; and the feasible mass value is determined as the tightness index characterizing the degree of constraint on the probability distribution. The smaller the feasible mass value, the more severely the hard-constraint mask truncates the model's original probability distribution.

[0026] Specifically, the tightness index measures the degree to which the hard constraints truncate the model's original probability distribution, and it is calculated as the Feasible Mass (FM). Let the current decoding step be $t$, the set of legal tokens given by the finite state machine be $A_t$, and the model's original token probability distribution be $p_t = \mathrm{softmax}(z_t)$, where $z_t$ is the log probability value output by the model at step $t$. The feasible mass $\mathrm{FM}(t)$ is then defined as:

$$\mathrm{FM}(t) = \sum_{v \in A_t} p_t(v)$$

It is the sum of the original probabilities of all legal tokens. The smaller the value of $\mathrm{FM}(t)$, the lower the probability mass of the legal tokens, and the more severe the truncation of the original distribution by the hard constraint.
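
The feasible mass can be computed directly from the unmasked logits, as in this minimal sketch. The function names and the toy logits are assumptions chosen for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def feasible_mass(logits, legal_ids):
    """FM(t): total original probability assigned to the legal token set A_t."""
    probs = softmax(logits)
    return sum(probs[i] for i in legal_ids)

logits = [2.0, 1.0, 0.5, -1.0]            # hypothetical 4-token vocabulary
fm_loose = feasible_mass(logits, {0, 1})  # constraint agrees with the model
fm_tight = feasible_mass(logits, {3})     # constraint allows only an unlikely token
```

Here `fm_tight` is about 0.03, so a hard mask would discard roughly 97% of the model's original probability mass at this step.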

[0027] As an alternative implementation, the budget metric measures the degree of matching between the current generation resources and the resources required to complete the structured output. In one implementation, the budget metric considers several factors: the proportion of generation steps consumed, the minimum number of steps required to reach the termination state, and whether a continuous expansion signal of the unbounded field is detected.

[0028] Optionally, the budget index is determined by combining the following three components. The first budget component is the budget consumption ratio, specifically the ratio of the length of the currently generated sequence $L_{\text{cur}}$ to the preset maximum generation length $L_{\max}$, that is, $r = L_{\text{cur}} / L_{\max}$. When $r$ approaches 1, the generation resources are about to be exhausted.

[0029] The second budget component assesses the termination feasibility of the finite state machine. Specifically, the minimum number of steps $d_{\min}$ required to reach any terminal state from the current finite state machine state is calculated and compared with the remaining available generation length $L_{\text{rem}} = L_{\max} - L_{\text{cur}}$, where $L_{\max}$ is the preset maximum generation length and $L_{\text{cur}}$ is the length of the currently generated sequence. When $d_{\min} > L_{\text{rem}}$, the remaining resources are insufficient to complete the structural closure. Alternatively, dynamic programming can be used to estimate the probability of reaching a terminal state within the remaining steps, and an early warning can be triggered when this probability falls below a preset probability threshold.
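
One way to obtain the minimum number of steps to a terminal state is a breadth-first search over the finite state machine. The sketch below assumes the machine is available as a dictionary mapping each state to its `{token: next_state}` transitions; that representation, the state names, and the function name are illustrative assumptions, not part of the application.

```python
from collections import deque

def min_steps_to_terminal(transitions, start, terminals):
    """Breadth-first search; returns None if no terminal state is reachable."""
    if start in terminals:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, dist = queue.popleft()
        for nxt in transitions.get(state, {}).values():
            if nxt in terminals:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Toy machine: inside a JSON string, a closing quote and then "}" finish the object.
fsm = {
    "in_string": {"char": "in_string", '"': "after_value"},
    "after_value": {"}": "done"},
}
d_min = min_steps_to_terminal(fsm, "in_string", {"done"})
remaining = 1                                   # hypothetical remaining budget
closure_at_risk = d_min is None or d_min > remaining
```

In practice the distances could be precomputed once per compiled machine, since they depend only on the state and not on the generated text.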

[0030] The third budget component refers to unbounded domain loop detection. By monitoring the current generation context in real time, it detects whether the following loop signals exist: the number of decimal places generated in a numeric field exceeds a preset step threshold, the number of elements generated in an array field exceeds a preset threshold, and the number of characters generated in a string field exceeds a preset threshold without a closing quotation mark. When any of the above signals are detected, a third budget component representing the risk of structural non-closure is generated.

[0031] The three components mentioned above can be used individually or in any combination. For example, the first budget component and the second budget component can be weighted and combined, with the weights preset according to the actual application scenario; alternatively, the first budget component and the third budget component can be combined to increase the sensitivity of the first budget component when a loop signal is detected. When any component or the combined index reaches the preset warning condition, it is determined that there is a risk of structural non-closure.
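
A sketch of one possible combination follows: the loop signal lowers the warning threshold of the consumption ratio, and the termination-feasibility check acts as an independent trigger. The thresholds, the `loop_boost` factor, and the `BudgetSignal` structure are assumptions; the application leaves these to the application scenario.

```python
from dataclasses import dataclass

@dataclass
class BudgetSignal:
    ratio: float       # first component: consumed length / maximum length
    infeasible: bool   # second component: minimum steps exceed remaining steps
    looping: bool      # third component: an unbounded field keeps expanding
    at_risk: bool      # combined structural non-closure warning

def budget_indicator(generated_len, max_len, d_min, run_length,
                     ratio_threshold=0.8, loop_threshold=32, loop_boost=0.5):
    ratio = generated_len / max_len
    remaining = max_len - generated_len
    infeasible = d_min is None or d_min > remaining
    looping = run_length > loop_threshold
    # A detected loop makes the ratio component more sensitive (lower threshold).
    effective_threshold = ratio_threshold * (loop_boost if looping else 1.0)
    at_risk = infeasible or ratio >= effective_threshold
    return BudgetSignal(ratio, infeasible, looping, at_risk)

calm = budget_indicator(50, 100, d_min=3, run_length=5)
loop = budget_indicator(50, 100, d_min=3, run_length=40)
late = budget_indicator(98, 100, d_min=3, run_length=0)
```

With half the budget consumed, `calm` raises no warning, while `loop` does because the lowered threshold is 0.4; `late` warns because three steps are needed but only two remain.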

[0032] Step S2: Based on the real-time values of the tightness index and the budget index, determine the current risk type. If the tightness index indicates a risk of distribution truncation, reconstruct the probability distribution by superimposing a perturbation term on the hidden state vector of the large language model. If the budget index indicates a risk of structural non-closure, apply a positive bias value to the output probability of specific closure-related lexical units. In this embodiment, the risk type is determined based on the index values.

[0033] As an optional implementation, the tightness index is the feasible mass FM(t) described in step S1. When FM(t) is lower than a preset threshold, a risk of distribution truncation is determined. This preset threshold can be set according to the actual application scenario, for example to 0.1, meaning that repair is triggered when the total probability of the legal tokens is less than 10%. The choice of threshold needs to balance computational overhead against repair effectiveness, and can be adjusted for the specific task.
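
The threshold test, together with the priority rule of paragraph [0014] (truncation repair first when both risks are present), can be sketched as a small dispatch function. The enum and function names are assumptions; the default threshold of 0.1 is the example value given above.

```python
from enum import Enum

class Risk(Enum):
    NONE = "none"
    TRUNCATION = "distribution_truncation"
    NON_CLOSURE = "structural_non_closure"

def classify_risk(fm, budget_at_risk, fm_threshold=0.1):
    """Pick the repair path for the current decoding step."""
    if fm < fm_threshold:
        # Truncation repair takes priority; the closure bias is suspended.
        return Risk.TRUNCATION
    if budget_at_risk:
        return Risk.NON_CLOSURE
    return Risk.NONE
```

For example, `classify_risk(0.05, True)` selects the latent space repair even though the budget warning is also active.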

[0034] Alternatively, the tightness index can also be calculated by the difference between the highest probability of a legally transferred word and the highest probability of an illegally transferred word. The smaller the difference, the weaker the relative advantage of the legal word and the higher the risk of truncation.

[0035] As an alternative implementation, when a risk of distribution truncation is identified, latent space optimization is used to repair it. A learnable perturbation vector is superimposed onto the hidden state of the last layer of the model, and optimizing this perturbation vector increases the probability mass of the legal tokens while controlling the degree of distribution shift.

[0036] Specifically, latent space optimization includes the following steps. Step S210: Extract the hidden state vector of the last layer of the large language model decoder and construct an initialized optimizable perturbation vector. Specifically, the hidden state vector of the last layer of the decoder, $h_t \in \mathbb{R}^d$, is extracted, where $d$ is the dimension of the model's hidden layer. At the same time, an optimizable perturbation vector $\delta \in \mathbb{R}^d$ is constructed and initialized as a zero vector.

[0037] Step S220: Construct a loss function that includes an interval loss term and a distribution offset constraint term, wherein the interval loss term is used to calculate the log probability difference between the target legal token and the illegal token with the highest probability so as to increase the difference, and the distribution offset constraint term is used to calculate the degree of deviation between the reconstructed probability distribution and the original probability distribution so as to limit that deviation. Specifically, the loss function is constructed as:

$$\mathcal{L} = \mathcal{L}_{\text{margin}} + \lambda \, \mathcal{L}_{\text{KL}}$$

where $\mathcal{L}_{\text{margin}}$ is the interval loss term, $\mathcal{L}_{\text{KL}}$ is the distribution offset constraint term, and $\lambda$ is the KL divergence weighting coefficient, used to balance the relative importance of the two loss terms.

[0038] The goal of the interval loss term is to increase the log probability difference between the target legal token and the illegal token with the highest probability. Let the target legal token be $y^*$, and let the escaped token $e$ be the token with the highest log probability value that is not currently in the allowed set. The interval loss is then defined as:

$$\mathcal{L}_{\text{margin}} = \max\bigl(0,\ \gamma - (\tilde{z}_{y^*} - \tilde{z}_{e})\bigr)$$

where $\tilde{z}$ is the log probability value after perturbation, i.e. the value recomputed from the perturbed hidden state, and $\gamma$ is a preset interval threshold. This loss encourages the log probability of the target legal token to exceed that of the escaped token by at least $\gamma$. Once the interval has been reached, the loss is 0.

[0039] The target token $y^*$ can be selected using various strategies. In one implementation, a confidence-based strategy is used, selecting the token with the second-highest original log probability value from the set of legal tokens as the target token. In another implementation, a greedy strategy is used, selecting the token with the highest original probability among the legal tokens. In yet another implementation, an oracle strategy can be employed, using the log probabilities of an external instruction-fine-tuned model as a reference for selecting the target token.

[0040] The objective of the distribution offset constraint is to limit the deviation between the reconstructed probability distribution and the original probability distribution, and it is calculated using KL divergence: $\mathcal{L}_{\text{KL}} = D_{\text{KL}}\left(p' \,\|\, p\right)$, where $p'$ is the probability distribution after the perturbation and $p$ is the original probability distribution. This term ensures that the optimization process does not cause drastic changes in the probability distribution, thereby preserving the model's semantic generation capabilities.
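
The two loss terms described in paragraphs [0038] and [0040] can be sketched as follows. This is only an illustration under stated assumptions: the interval loss is taken to be a hinge, the KL direction is taken to be KL(p' || p), and the function names and the defaults `m=1.0` and `lam=0.1` are hypothetical rather than taken from the application. Because log-softmax shifts every logit by the same constant, the difference of two logits equals the difference of the corresponding log probabilities.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    mx = max(logits)
    lse = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return [z - lse for z in logits]

def margin_loss(pert_logits, target, allowed, m=1.0):
    """Hinge-style interval loss: zero once the target leads the escape token by m."""
    # Escape token: highest-scoring token outside the allowed set.
    # Assumes at least one token is outside the allowed set.
    escape = max((i for i in range(len(pert_logits)) if i not in allowed),
                 key=lambda i: pert_logits[i])
    return max(0.0, m - (pert_logits[target] - pert_logits[escape]))

def kl_term(pert_logits, orig_logits):
    """KL(p' || p) between perturbed and original distributions (assumed direction)."""
    lp_new = log_softmax(pert_logits)
    lp_old = log_softmax(orig_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp_new, lp_old))

def total_loss(pert_logits, orig_logits, target, allowed, m=1.0, lam=0.1):
    """Interval loss plus the KL term weighted by the coefficient lam."""
    return (margin_loss(pert_logits, target, allowed, m)
            + lam * kl_term(pert_logits, orig_logits))
```

With logits `[0.0, 1.0, -1.0]` and only token 0 allowed, the escape token is token 1 and the interval loss is 2.0; once the target leads by at least `m`, the loss drops to 0.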

[0041] Step S230: Iteratively update the optimizable perturbation vector using the gradient optimization algorithm until the loss function is minimized. Superimpose the optimized perturbation vector onto the hidden state vector to obtain the reconstructed hidden state. Recalculate the reconstructed probability distribution based on the reconstructed hidden state.

[0042] Specifically, the gradient optimization algorithm is used to iteratively update the optimizable perturbation vector until the aforementioned loss function is minimized. A decreasing learning rate is used during the optimization process, for example, gradually decreasing from an initial 0.1 to 0.01. A maximum optimization step limit and a KL divergence limit are set as safety constraints. The optimization can be terminated early when the interval of the target word is met, i.e. when the log probability of the target word exceeds that of the escape word by at least the preset interval threshold.

[0043] The optimized perturbation vector $\delta^*$ is superimposed on the original hidden state vector $h$ to obtain the reconstructed hidden state $h' = h + \delta^*$. Based on this reconstructed hidden state, the log probability values $z'$ are recalculated, and the reconstructed probability distribution $p' = \mathrm{softmax}(z')$ is obtained after softmax normalization.
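
The optimization loop of step S230 can be sketched on a toy model as follows. Everything here is an assumption made for illustration: the output head is a plain linear map `W`, gradients are approximated by central finite differences (a real system would use automatic differentiation), the learning rate decays linearly from 0.1 to 0.01, and all names and default values are hypothetical.

```python
import math

def head(W, h):
    """Assumed linear output head: one logit per vocabulary row of W."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def softmax(z):
    mx = max(z)
    exps = [math.exp(v - mx) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def evaluate(W, h, delta, p_orig, target, allowed):
    """Return (gap to the escape token, KL(p' || p), p') for a perturbation."""
    z = head(W, [a + b for a, b in zip(h, delta)])
    escape = max((i for i in range(len(z)) if i not in allowed),
                 key=lambda i: z[i])
    p_new = softmax(z)
    kl = sum(a * math.log(a / b) for a, b in zip(p_new, p_orig))
    return z[target] - z[escape], kl, p_new

def optimize_perturbation(W, h, target, allowed, m=1.0, lam=0.01,
                          lr_start=0.1, lr_end=0.01, max_steps=100,
                          kl_max=2.0, eps=1e-5):
    """Gradient descent on delta; returns (delta, reconstructed distribution)."""
    p_orig = softmax(head(W, h))
    delta = [0.0] * len(h)

    def loss(d):
        gap, kl, _ = evaluate(W, h, d, p_orig, target, allowed)
        return max(0.0, m - gap) + lam * kl

    for step in range(max_steps):
        gap, kl, _ = evaluate(W, h, delta, p_orig, target, allowed)
        if gap >= m or kl > kl_max:
            break  # interval reached (early stop) or KL safety limit hit
        lr = lr_start + (lr_end - lr_start) * step / max(1, max_steps - 1)
        grad = []
        for j in range(len(delta)):
            plus = delta[:j] + [delta[j] + eps] + delta[j + 1:]
            minus = delta[:j] + [delta[j] - eps] + delta[j + 1:]
            grad.append((loss(plus) - loss(minus)) / (2 * eps))
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta, evaluate(W, h, delta, p_orig, target, allowed)[2]
```

For example, with `W = [[1, 0], [0, 1], [-1, -1]]`, `h = [0, 1]` and only token 0 allowed, the loop moves the hidden state until token 0 leads the best illegal token by the interval threshold, then stops.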

[0044] As another optional implementation, when a risk of structural non-closure is determined, boundary control is used for repair. That is, based on the remaining generation resources and the current context type, a positive probability bias is applied to specific closure-related lexical units, guiding the model to select, within the limited remaining budget, lexical units that promote structural closure.

[0045] Optionally, boundary control includes the following steps. Step S240: Identify the content domain type of the current generation context and classify it as a literal content domain or a string content domain. In this embodiment, the literal content domain refers to a context in which literal content such as a numerical value, boolean value, or null value is currently being generated; the string content domain refers to the content part of a string value that is currently being generated, that is, a context in which an opening quotation mark has been generated but its closing quotation mark has not yet been generated.

[0046] Step S241: If the context is identified as a literal content domain, select at least one of the comma ",", the right curly brace "}", and the right square bracket "]" as the target closing symbol word, and apply a first positive bias value to the output probability of the target closing symbol word.

[0047] Optionally, the first positive bias value is a fixed value, such as 8.0.

[0048] Optionally, the first positive bias value is an adaptive value that increases linearly as the remaining budget decreases.

[0049] Step S242: If the context is identified as a string content domain, select the closing quotation mark as the target closing symbol word, and apply a second positive bias value to its output probability, while keeping the output probabilities of other closing symbol words unchanged. In this embodiment, only the closing double quotation mark is selected as the target closing symbol in the string content domain; the output probabilities of other closing symbols such as commas and brackets remain unchanged. This differential processing aims to avoid generating illegal closing symbols within string content, ensuring syntactic correctness.

[0050] Step S243: Dynamically adjust the magnitude of the first positive bias value or the second positive bias value according to the real-time value of the budget indicator, thereby completing the guidance processing of the output probability.
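
The content-domain classification of steps S240 to S242 can be sketched as a scan over the JSON text generated so far. This is a simplified illustration: any position outside an unterminated string is treated as the literal content domain, and the function names are hypothetical.

```python
def content_domain(prefix):
    """Return 'string' if the JSON prefix ends inside an open string, else 'literal'."""
    in_string = False
    escaped = False
    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False      # character after a backslash is consumed
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False    # unescaped quote closes the string
        elif ch == '"':
            in_string = True
    return "string" if in_string else "literal"

def target_closing_symbols(prefix):
    """Closing symbols eligible for a positive bias in the current context."""
    if content_domain(prefix) == "string":
        return ['"']                 # only the closing quotation mark
    return [",", "}", "]"]
```

Inside a string such as `{"a": "hel`, only the closing quotation mark is returned, so commas and brackets inside string content are never promoted.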

[0051] In this embodiment, the applied bias value is dynamically adjusted based on the real-time value of the budget indicator. For example, the smaller the remaining budget, the larger the bias value, forming gradual closure guidance.

[0052] Optionally, the bias value is linearly negatively correlated with the remaining budget: $b = \alpha \cdot r$, where $b$ is the bias value, $r$ is the budget consumption ratio, and $\alpha$ is an adjustable parameter.

[0053] Optionally, a step function can be used, with different bias values applied when the remaining budget falls into different warning intervals.
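
The two bias schedules mentioned above (linear and step function) might look as follows. The linear form `alpha * r`, the scale `alpha=8.0`, and the warning thresholds in `levels` are illustrative assumptions, not values stated in the application.

```python
def linear_bias(consumed_ratio, alpha=8.0):
    """Bias grows linearly with the budget consumption ratio r, clamped to [0, 1]."""
    r = min(1.0, max(0.0, consumed_ratio))
    return alpha * r

def step_bias(consumed_ratio, levels=((0.9, 8.0), (0.75, 4.0), (0.5, 2.0))):
    """Piecewise-constant bias: the highest warning threshold reached decides the value."""
    for threshold, bias in levels:   # levels are ordered from highest threshold down
        if consumed_ratio >= threshold:
            return bias
    return 0.0
```

Both functions take the consumption ratio rather than the remaining budget, so a shrinking remaining budget yields a larger bias.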

[0054] Optionally, the above bias value is superimposed on the original logarithmic probability value to obtain the corrected logarithmic probability distribution, and then sampling is performed within the set of legal lexical units to obtain the output lexical unit of the current step.
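
A minimal sketch of this superposition and sampling step is given below. Token ids, the function names, and the use of `random.Random` for sampling are assumptions made for the example.

```python
import math
import random

def biased_distribution(logits, allowed, targets, bias):
    """Add `bias` to target tokens, then renormalise over the legal set only."""
    adjusted = {i: logits[i] + (bias if i in targets else 0.0)
                for i in sorted(allowed)}
    mx = max(adjusted.values())
    exps = {i: math.exp(v - mx) for i, v in adjusted.items()}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def sample_token(dist, rng):
    """Draw one token id from a {token_id: probability} mapping."""
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]

# Usage: token 3 is illegal in the current state, token 1 is the closing symbol.
logits = [2.0, 0.0, 1.0, 5.0]
dist = biased_distribution(logits, allowed={0, 1, 2}, targets={1}, bias=8.0)
token = sample_token(dist, random.Random(0))
```

Illegal tokens never appear in the returned mapping, so sampling stays inside the legal set regardless of the bias magnitude.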

[0055] Step S3: Based on the probability distribution after reconstruction or after applying a positive bias value, sample to generate the current word, and return to the step of obtaining the tightness index and budget index of the current decoding state in real time, until the structured output is completed.

[0056] In this embodiment, through the iterative process of the above steps, risks can be perceived in real time, repair strategies can be dynamically selected, and targeted interventions can be carried out at each decoding step, thereby effectively avoiding two types of faults, distribution truncation and structural non-closure, and improving the quality and reliability of the structured output of the large language model.

[0057] Embodiment 2: Based on Embodiment 1, another embodiment of this application is proposed, which further includes a static pattern preprocessing step to optimize the structured output pattern before inference begins.

[0058] Specifically, the static pattern preprocessing steps include: Step S010: Receive the structured output pattern description input by the user, and scan and identify unbounded fields that are not subject to length or precision limits; Step S011: Perform bounded compilation processing on the identified unbounded fields, where a maximum number of elements is added to array-type fields, a discrete enumeration value set is added to numeric-type fields, and a maximum character length is added to string-type fields; Step S012: Construct and generate the finite state machine based on the pattern data after bounded compilation processing, for use in the subsequent decoding process.

[0059] In this embodiment, a structured output pattern description, such as a JSON Schema definition, input from the user is received, and the types, constraints, and nesting structures of all fields are parsed. Unbounded fields without length or precision limits are scanned and identified. The criteria for determining unbounded fields are: array-type fields without a `maxItems` attribute; numeric-type fields without `minimum` and `maximum` attributes, or with no upper limit on their allowed length; and string-type fields without a `maxLength` attribute and without a restrictive regular expression pattern.

[0060] Bounded compilation processing is performed on the identified unbounded fields.

[0061] Optionally, the bounded compilation process includes, for array-type fields, automatically calculating and supplementing the maxItems value based on the generation budget k using a preset scaling function. For example, the scaling function is: maxItems=4 when k=256, maxItems=8 when k=512, and linear interpolation for intermediate values.

[0062] For numeric fields, compile them into a discrete enumeration type. For example, compile a numeric field representing a probability into the discrete set of values {0.0, 0.1, 0.2, ..., 1.0}, which preserves semantic equivalence and avoids endless generation of decimal digits.

[0063] For string type fields, set the maxLength upper limit proportionally to the budget.
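
The bounded compilation of paragraphs [0061] to [0063] can be sketched for a flat JSON Schema as follows. Several choices are assumptions made for the example: budgets outside 256 to 512 are clamped, numeric fields default to the range 0.0 to 1.0 in eleven steps, `max_length_ratio=0.25` is an arbitrary proportion, and nested schemas are not handled.

```python
import copy

def max_items_for_budget(k, k_lo=256, items_lo=4, k_hi=512, items_hi=8):
    """Scaling function: 4 items at k=256, 8 at k=512, linear in between."""
    if k <= k_lo:
        return items_lo
    if k >= k_hi:
        return items_hi
    frac = (k - k_lo) / (k_hi - k_lo)
    return int(items_lo + frac * (items_hi - items_lo))

def compile_bounded(schema, k, max_length_ratio=0.25):
    """Return a copy of the schema with bounds added to unbounded top-level fields."""
    out = copy.deepcopy(schema)
    for spec in out.get("properties", {}).values():
        kind = spec.get("type")
        if kind == "array" and "maxItems" not in spec:
            spec["maxItems"] = max_items_for_budget(k)
        elif kind == "number" and "enum" not in spec:
            lo, hi = spec.get("minimum", 0.0), spec.get("maximum", 1.0)
            spec["enum"] = [round(lo + (hi - lo) * i / 10, 10) for i in range(11)]
        elif kind == "string" and "maxLength" not in spec and "pattern" not in spec:
            spec["maxLength"] = max(1, int(k * max_length_ratio))
    return out
```

The input schema is left untouched, so the original pattern description remains available for diagnostics.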

[0064] A finite state machine is constructed and generated based on the bounded compiled pattern data for use in the subsequent decoding process. At the same time, the size of the allowed word set and the estimated probability mass for each field within the finite state machine can be calculated, providing a reference for subsequent real-time diagnostics.

[0065] For example, for the question-and-answer task pattern {"answer": string, "evidence": string, "certainty": number, "reasoning": string}, the static pattern compiler compiles the certainty field from a number type to enum: [0.0, 0.1, ..., 1.0], and adds maxItems: 8 (when budget k=512) to the possible additional array field reasoning_steps. After compilation, the truncation rate decreases from 0.93 to 0.035, and the JSON validity rate increases from 0.07 to 0.965.

[0066] By using static pattern preprocessing, this embodiment eliminates the structural risks that may be caused by unbounded domains before inference begins. Combined with the runtime dynamic control in Embodiment 1, it forms a dual guarantee mechanism of pre-optimization and real-time intervention, which can more effectively improve the success rate and quality of structured output.

[0067] Embodiment 3: Based on the above embodiments, a formal interference diagnosis step is further included to address formal interference issues such as canonical text leakage.

[0068] Specifically, the formal interference diagnosis steps include: Step S020: Before inference begins, scan the input prompt, identify and remove or replace placeholder templates, regular expression templates, or canonical text content that the model may reproduce literally, and generate a purified prompt. Optionally, the specific operations include scanning the JSON Schema example section of the prompt, identifying placeholder text such as "..." or "example_value", which is template content that the model easily copies into its output, and removing it or replacing it with semantically clear explanatory text.

[0069] At the same time, the regular expression template in the pattern definition, i.e., the `pattern` attribute, is scanned to identify regular expression text that the model may reproduce literally. For tool invocation scenarios, a natural-language formatting specification can be provided in place of the regular expression itself, or the `pattern` attribute can be removed directly from the pattern definition.
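
The prompt purification of step S020 might be sketched as below. The placeholder list, the replacement wording, and the function names are hypothetical; only quoted placeholders are replaced so that ordinary ellipses in prose are left alone, and a property that happens to be named `pattern` is preserved.

```python
PLACEHOLDERS = ("...", "example_value")

def purify_prompt(prompt, replacement="<a value matching the field description>"):
    """Replace quoted placeholder values that a model may copy verbatim."""
    for placeholder in PLACEHOLDERS:
        prompt = prompt.replace(f'"{placeholder}"', f'"{replacement}"')
    return prompt

def strip_patterns(node, in_properties=False):
    """Return a copy of a JSON Schema with every `pattern` attribute removed."""
    if isinstance(node, dict):
        return {key: strip_patterns(value, in_properties=(key == "properties"))
                for key, value in node.items()
                # keys directly under "properties" are field names, not attributes
                if in_properties or key != "pattern"}
    if isinstance(node, list):
        return [strip_patterns(item) for item in node]
    return node
```

Applied to a schema whose `id` field carries `"pattern": "^[a-z]+$"`, `strip_patterns` drops only that attribute and keeps the field itself.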

[0070] Step S021: During structured decoding based on the purified prompt, monitor in real time whether the currently generated text content exhibits canonical text leakage. Optionally, canonical text leakage refers to the model literally writing canonical text from the pattern definition, such as regular expressions, field description text, or example content, into the value domain. The detection method performs substring matching or vector similarity calculation between the currently generated value domain text and the canonical text in the pattern definition. If the matching degree exceeds a preset threshold, the text is marked as canonical text leakage.
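
The substring-matching variant of this detection can be sketched with the standard library's `difflib`. The scoring rule (longest common substring divided by the canonical text length) and the threshold of 0.8 are assumptions chosen for the example; the application does not specify them.

```python
from difflib import SequenceMatcher

def leakage_score(value_text, canonical_text):
    """Longest common substring length divided by the canonical text length."""
    if not value_text or not canonical_text:
        return 0.0
    matcher = SequenceMatcher(None, value_text, canonical_text, autojunk=False)
    match = matcher.find_longest_match(0, len(value_text), 0, len(canonical_text))
    return match.size / len(canonical_text)

def is_leak(value_text, canonical_texts, threshold=0.8):
    """Flag the value if it overlaps any canonical text beyond the threshold."""
    return any(leakage_score(value_text, text) > threshold
               for text in canonical_texts)
```

A value that contains the schema's regular expression verbatim scores 1.0 and is flagged, whereas a value that merely satisfies the expression shares only isolated characters with it and is not.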

[0071] Step S022: If a leak is detected, record the diagnostic information, or trigger an interrupt and reinitialize the decoding process based on a further purified prompt.

[0072] Optionally, if a leak is detected, diagnostic information is recorded for subsequent analysis. If a serious canonical text leak is detected, such as multiple consecutive outputs of text that highly overlap with the pattern definition, an interrupt can be triggered, and the decoding process can be reinitialized based on a further purified prompt.

[0073] For example, for classification tasks, prompt purification improved accuracy from 0.050 to 0.230, an increase of 18.0 percentage points. For tool invocation tasks, removing the `pattern` attribute significantly restored the success rate.

[0074] Through formal interference processing, this embodiment effectively avoids literal output caused by the model being interfered with by canonical text in the prompt, thereby further improving the semantic quality and format compliance of the structured output.

[0075] Embodiment 4: Building upon the above embodiments, a post-processing quality verification step is further included to perform quality checks and fallback processing on the final generated structured output. The post-processing quality verification step includes: Step S410: After the structured output is completed, extract a structural feature vector from the generated structured output text. The structural feature vector includes the number of brackets, the proportion of punctuation marks, the total character length, and the type-token ratio. Optionally, the structural feature vector includes, but is not limited to, total character length, number of square brackets, number of curly braces, proportion of punctuation marks, proportion of uppercase letters, total number of words, number of newline characters, proportion of numeric characters, type-token ratio, number of root-level JSON keys, etc.
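
A possible extraction of the structural features listed in step S410 is sketched below. The feature names, the whitespace-based word split, and the convention that unparsable output has zero root-level keys are assumptions made for the example.

```python
import json
import string

def structure_features(text):
    """Compute simple structural features of a generated output string."""
    n = len(text)
    words = text.split()

    def share(predicate):
        # Fraction of characters satisfying the predicate (0.0 for empty text).
        return sum(1 for ch in text if predicate(ch)) / n if n else 0.0

    try:
        parsed = json.loads(text)
        root_keys = len(parsed) if isinstance(parsed, dict) else 0
    except json.JSONDecodeError:
        root_keys = 0  # truncated or malformed output has no parsable root object

    return {
        "char_length": n,
        "square_brackets": text.count("[") + text.count("]"),
        "curly_braces": text.count("{") + text.count("}"),
        "punctuation_ratio": share(lambda ch: ch in string.punctuation),
        "uppercase_ratio": share(str.isupper),
        "word_count": len(words),
        "newline_count": text.count("\n"),
        "digit_ratio": share(str.isdigit),
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        "root_json_keys": root_keys,
    }
```

The resulting dictionary can be flattened into a vector in a fixed key order before it is passed to a classifier.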

[0076] Step S411: Input the structural feature vector into the pre-trained classification model, which outputs a judgment result that is either a normal category or a failure category. Optionally, the classification model is a random forest classifier with 200 decision trees that categorizes the output into five classes: normal, structurally flawed, semantically flawed, aligned, and pattern-broken. Training samples for the classification model can be drawn from historically generated data and labeled through manual annotation or automated rule-based annotation.

[0077] Step S412: If the determination result is a failure category, the retry generation mechanism is automatically triggered or a degradation processing strategy is executed to generate alternative output results.

[0078] Optionally, retry strategies may include re-executing the entire generation process, adjusting generation parameters, or switching to a backup model. Degradation strategies may include returning simplified error messages or returning partial generation results.

[0079] For example, the random forest classifier achieved a macro-averaged ROC-AUC of approximately 0.961 on 8272 test samples, with an accuracy of 0.996 for the structural failure category.

[0080] This embodiment provides a final quality defense for structured output through post-processing quality verification, which can effectively identify and intercept outputs with serious faults, prevent them from entering downstream systems, and improve the overall reliability of the system.

[0081] Embodiment 5: Based on the above embodiments, a priority routing rule is further proposed for risk assessment based on the tightness and budget indicators. This includes the following steps: Step S201: Check whether the tightness indicator indicates a risk of distribution truncation and whether the budget indicator indicates a risk of structural non-closure; Step S202: When both distribution truncation risk and structural non-closure risk are detected, the step of reconstructing the probability distribution by superimposing a perturbation term on the hidden state vector of the large language model is selected first, and the step of applying a positive bias value is suspended; Step S203: When the tightness indicator does not indicate a risk of distribution truncation, the step of applying a positive bias value to the output probability of specific closure-related words is selected.

[0082] Optionally, when both distribution truncation risk and structural non-closure risk are detected simultaneously, the step of reconstructing the probability distribution by superimposing perturbation terms on the hidden state vector of the large language model (i.e., latent space optimization) is prioritized, and the step of applying positive bias values (i.e., boundary control) is paused. The step of applying positive bias values to the output probabilities of specific closure-related lexical units is only selected when the tightness index does not indicate the presence of distribution truncation risk.

[0083] It should be noted that distribution truncation is an immediate fault at the lexical level. If it is not corrected in the current step, the erroneous lexical unit will cascade to all subsequent steps, causing an irreversible decline in the overall output quality. In contrast, the risk of structural non-closure accumulates gradually and can be corrected step by step in subsequent steps through boundary control. Therefore, tightness issues have a higher priority for correction.

[0084] In actual operation, latent space optimization and boundary control may be triggered alternately in different decoding steps. For example, latent space optimization may be mainly triggered in the first half of the generation process, i.e., at the positions of tightly constrained structural delimiters, while boundary control may be mainly triggered in the second half, i.e., when approaching the budget limit. A global state table can be maintained through the routing decision module to record the cumulative number of triggers for each rule, which can be used for post-event analysis and system tuning.
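
The tightness-first routing rule and the trigger-count table can be sketched as follows. The class name, the action labels, and the threshold of 0.1 are hypothetical; the comparison direction (tightness below a threshold signals truncation risk) follows the description of the risk assessment unit.

```python
from collections import Counter

class Router:
    """Tightness-first routing with a cumulative trigger count per rule."""

    def __init__(self, tightness_threshold=0.1):
        self.tightness_threshold = tightness_threshold
        self.trigger_counts = Counter()

    def route(self, tightness, budget_at_risk):
        if tightness < self.tightness_threshold:
            # Truncation risk wins even if the budget is also at risk.
            action = "latent_space_optimization"
        elif budget_at_risk:
            action = "boundary_control"
        else:
            action = "none"
        self.trigger_counts[action] += 1
        return action
```

After a decoding run, `trigger_counts` shows how often each repair path fired, which is the kind of record the text suggests keeping for later analysis and tuning.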

[0085] Embodiment 6: Based on the same inventive concept, this application also provides a system corresponding to the method in Embodiment 1, namely an adaptive repair system for the structured output of a large language model. Referring to Figure 3, the system includes: an indicator acquisition module, a risk assessment and repair execution module, and a sampling and loop control module. These three modules work together to form a complete closed-loop control architecture.

[0086] The indicator acquisition module is used to acquire the tightness index and budget index of the current decoding state in real time during the structured decoding of a large language model based on a finite state machine. The tightness index represents the degree of constraint of the probability distribution, and the budget index represents the remaining generation resources.

[0087] Optionally, the indicator acquisition module contains the following sub-units: The tightness index calculation unit is used to calculate the tightness index of the current decoding step. In one optional implementation, this unit obtains the probability values of all legal transition words in the original output probability distribution of the model under the current finite state machine state, calculates the sum of these probability values to obtain a feasible mass value, and determines this feasible mass value as the tightness index. The smaller the feasible mass value, the higher the degree of truncation of the hard constraint mask on the original probability distribution of the model. In another implementation, this unit can also use the ratio of the size of the set of legal transition words to the entire vocabulary size as the tightness index to reduce computational overhead.
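
Both variants of the tightness index can be computed in a few lines. The function names are hypothetical, and the probabilities are assumed to come from a softmax over the model's raw logits.

```python
import math

def feasible_mass(logits, allowed):
    """Total original probability assigned to the legal transition tokens."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    return sum(exps[i] for i in allowed) / sum(exps)

def allowed_fraction(vocab_size, allowed):
    """Cheaper proxy: share of the vocabulary that is legal in this state."""
    return len(allowed) / vocab_size
```

With four equally likely tokens and one legal token, the feasible mass is 0.25; a value near zero means the hard constraint mask discards almost all of the model's original probability.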

[0088] The budget metric calculation unit is used to calculate the budget metric for the current decoding step. This unit further includes three sub-component calculators: The first budget component calculator obtains the length of the currently generated sequence and the preset maximum generation length, and calculates the ratio between the two as the first budget component.

[0089] The second budget component calculator calculates the minimum number of steps required to reach any terminating state based on the current finite state machine state. It then compares this minimum number of steps with the remaining available generation length, using the comparison result as the second budget component. This comparison result essentially assesses the terminating feasibility of the finite state machine; when the minimum number of steps is greater than the remaining steps, it indicates a higher risk of non-closure.

[0090] The third budget component calculator detects whether there are continuous expansion signals of unbounded fields in the current generation context. Specifically, it detects whether numeric fields continuously generate more decimal places than a preset step threshold, whether the number of elements generated in array fields exceeds a preset threshold, and whether the number of characters generated in string fields exceeds a preset threshold without a closing quotation mark. If any of the above conditions are detected, a third budget component representing the risk of structural non-closure is generated.

[0091] The budget indicator calculation unit determines the current budget indicator by combining at least one of the above-mentioned first budget component, second budget component and third budget component.
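
The three budget components and their combination can be sketched as below. The sketch assumes that the finite state machine is given as a mapping from state to `{token: next_state}`, that each transition costs one generation step, and that the third component is reduced to a single open-string length check; the names, `warn_ratio=0.8`, and `max_open_chars=64` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

def min_steps_to_accept(transitions, state, accepting):
    """Breadth-first search; None if no accepting state is reachable."""
    seen = {state}
    queue = deque([(state, 0)])
    while queue:
        current, dist = queue.popleft()
        if current in accepting:
            return dist
        for nxt in transitions.get(current, {}).values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

@dataclass
class BudgetIndicator:
    consumed_ratio: float       # first component
    closure_infeasible: bool    # second component
    unbounded_expansion: bool   # third component

    def at_risk(self, warn_ratio=0.8):
        return (self.consumed_ratio > warn_ratio
                or self.closure_infeasible
                or self.unbounded_expansion)

def budget_indicator(transitions, state, accepting, generated_len, max_len,
                     open_string_chars=0, max_open_chars=64):
    remaining = max_len - generated_len
    dist = min_steps_to_accept(transitions, state, accepting)
    return BudgetIndicator(
        consumed_ratio=generated_len / max_len,
        closure_infeasible=dist is None or dist > remaining,
        unbounded_expansion=open_string_chars > max_open_chars,
    )
```

Any single component is enough to raise the non-closure risk, which mirrors the "at least one of" wording used for the budget indicator.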

[0092] The risk assessment and repair execution module is used to determine the current risk type based on the real-time values of the tightness index and the budget index. If the tightness index indicates a risk of distribution truncation, the probability distribution is reconstructed by superimposing a perturbation term on the hidden state vector of the large language model. If the budget index indicates a risk of structural non-closure, a positive bias value is applied to the output probability of specific closure-related words. Optionally, the risk assessment and repair execution module includes the following sub-units: The risk assessment unit determines the current risk type based on indicator values. This unit compares the tightness indicator with a preset threshold; if the tightness indicator is below the threshold, a distribution truncation risk is identified. At the same time, this unit compares the budget indicator with preset warning conditions; if the budget consumption ratio exceeds the warning value, the shortest closure distance is greater than the remaining steps, or an unbounded loop signal is detected, a structural non-closure risk is identified. When both distribution truncation and structural non-closure risks exist simultaneously, this unit prioritizes the distribution truncation risk according to a tightness-first priority rule.

[0093] A latent space optimization controller is used to perform remedial operations when a risk of distribution truncation is detected. This controller first extracts the hidden state vector of the last layer of the large language model decoder and constructs an initial, optimizable perturbation vector. Then, a loss function is constructed, including an interval loss term and a distribution offset constraint term. The interval loss term is used to calculate the log probability difference between the target legal word and the highest-probability illegal word to increase this difference, while the distribution offset constraint term is used to calculate the deviation between the reconstructed probability distribution and the original probability distribution to limit this deviation. Next, a gradient optimization algorithm is used to iteratively update the optimizable perturbation vector until the loss function is minimized. The optimized perturbation vector is then superimposed onto the hidden state vector to obtain the reconstructed hidden state, and the reconstructed probability distribution is recalculated based on the reconstructed hidden state.

[0094] The boundary control component performs remedial operations when a structural non-closure risk is detected. This component first identifies the content domain type of the current generation context, classifying it into literal content domains or string content domains. If identified as a literal content domain, it selects at least one of commas, right curly braces, and right square brackets as target closure symbols, applying a first positive bias to the output probabilities of these target symbols. If identified as a string content domain, it selects only closing quotation marks as target closure symbols, applying a second positive bias to the output probability of these symbols while keeping the output probabilities of other closure symbols unchanged. This component also dynamically adjusts the applied bias based on real-time budget metrics; for example, the bias increases as the remaining budget decreases, forming a gradual closure guidance.

[0095] The sampling and loop control module is used to sample and generate the current word based on the probability distribution after reconstruction or after applying a positive bias value, and return to the step of obtaining the tightness index and budget index of the current decoding state in real time, until the structured output is completed.

[0096] Optionally, the sampling and loop control module includes the following sub-units: The sampling unit is used to sample words based on the repaired probability distribution. Within the set of legal words, this unit generates the output word for the current step from either the reconstructed probability distribution output by the latent space optimization controller or the bias-adjusted probability distribution output by the boundary control component, using strategies such as random sampling or greedy sampling.

[0097] The state update unit is used to update the state of the finite state machine after generating the current word, so that it enters the next valid state and prepares for the next decoding step.

[0098] The loop control unit is used to determine whether the decoding process has terminated. When the finite state machine reaches the termination state or the number of generation steps reaches the preset upper limit, the unit terminates the loop and outputs the complete structured text; otherwise, the unit triggers the indicator acquisition module to enter the next decoding indicator acquisition process, forming a complete closed loop of indicator acquisition - risk assessment - repair execution - sampling generation - state update - continued loop.
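
The closed loop described above can be sketched with injected callables. The toy vocabulary (0 is "{", 1 is "}", 2 is "x"), the toy model that always prefers "x", the greedy selection, and all function names are assumptions made for the example; the `repair` hook stands in for whichever repair path the risk assessment selects.

```python
def decode(next_logits, allowed_tokens, fsm_step, is_accepting,
           repair, start_state, max_steps):
    """Indicator/repair/sample/update loop; returns (token ids, closed?)."""
    state, output = start_state, []
    for step in range(max_steps):
        if is_accepting(state):
            break
        logits = next_logits(output)
        allowed = allowed_tokens(state)
        logits = repair(logits, allowed, step, max_steps)
        token = max(allowed, key=lambda i: logits[i])  # greedy within legal set
        output.append(token)
        state = fsm_step(state, token)
    return output, is_accepting(state)

TRANSITIONS = {"start": {0: "open"}, "open": {2: "open", 1: "done"}}

def toy_logits(output):
    return [0.0, 0.0, 5.0]  # the toy model always prefers "x"

def toy_allowed(state):
    return set(TRANSITIONS.get(state, {}))

def toy_step(state, token):
    return TRANSITIONS[state][token]

def toy_accepting(state):
    return state == "done"

def no_repair(logits, allowed, step, max_steps):
    return logits

def closing_bias(logits, allowed, step, max_steps):
    biased = list(logits)
    if 1 in allowed:
        biased[1] += 8.0 * step / max_steps  # bias grows as the budget is used
    return biased
```

Without repair the toy model keeps emitting "x" until the step limit and never closes the structure; with the growing closing bias it emits "}" once the bias outweighs the model's preference.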

[0099] In addition to the modules mentioned above, the adaptive repair system for the structured output of large language models may also include the following modules to further enhance the overall performance of the system.

[0100] Optionally, the system includes a static pattern compiler as an optional preprocessing module before inference, used to perform bounded compilation of the structured output pattern before decoding begins. This module receives a user-input structured output pattern description, scans and identifies unbounded fields without length or precision limits, and performs bounded compilation processing on the identified unbounded fields, including adding a maximum element limit to array-type fields, adding a discrete enumeration set to numeric-type fields, and adding a maximum character length limit to string-type fields. Then, based on the bounded compilation data, a finite state machine is constructed and generated for use by the indicator acquisition module.

[0101] Optionally, the system includes a prompt purification component as an optional preprocessing module before inference, used to decontaminate input prompts. Before inference begins, this component scans the input prompt, identifies and removes or replaces any canonical text content that the model may reproduce literally, such as example text content or regular expression templates, and generates a purified prompt to reduce formal interference errors at the source.

[0102] Optionally, the system includes a post-processing blocking module as an optional module for post-generation review, used to perform quality checks on the final output structured text. After the structured output is completed, this module extracts structural feature vectors from the generated structured output text, including the number of brackets, the proportion of punctuation marks, the total character length, and the type-token ratio. The structural feature vectors are then input into a pre-trained classification model, which outputs a judgment result. If the judgment result is a failure, a retry generation mechanism is automatically triggered or a degradation processing strategy is executed to generate an alternative output result.

[0103] Through the collaborative work of the three core modules (the indicator acquisition module, the risk assessment and repair execution module, and the sampling and loop control module) and the enhancement of optional auxiliary modules, this system can achieve real-time diagnosis and adaptive repair of distribution truncation faults and budget non-closure faults, significantly improving the overall reliability, semantic quality, and engineering deployability of the structured output of large language models.

[0104] Since the system described in Embodiment 6 of this application is a system used to implement the method of Embodiment 1 of this application, those skilled in the art can understand the specific structure and variations of the system based on the method described in Embodiment 1 of this application, and therefore they will not be described again here. All systems used in the method of Embodiment 1 of this application fall within the scope of protection of this application.

[0105] Embodiment 7: In this embodiment of the application, an adaptive repair device for the structured output of a large language model is proposed.

[0106] Referring to Figure 4, Figure 4 is a schematic diagram of the terminal structure of the hardware operating environment involved in one embodiment of this application.

[0107] As shown in Figure 4, the control terminal may include: a processor 1001, such as a CPU, a network interface 1003, a memory 1004, and a communication bus 1002. The communication bus 1002 is used to enable communication between these components. The network interface 1003 may optionally include a standard wired interface or a wireless interface (such as a Wi-Fi interface). The memory 1004 may be high-speed RAM or stable non-volatile memory, such as disk storage. Alternatively, the memory 1004 may be a storage device independent of the aforementioned processor 1001.

[0108] Those skilled in the art will understand that the terminal structure shown in Figure 4 does not constitute a limitation on the terminal, which may include more or fewer components than shown, combine certain components, or have a different component arrangement.

[0109] As shown in Figure 4, the memory 1004, which serves as a computer storage medium, may include an operating system, a network communication module, and an adaptive repair program for the structured output of a large language model.

[0110] In the hardware structure of the adaptive repair device for the structured output of the large language model shown in Figure 4, the processor 1001 can call the adaptive repair program for the structured output of the large language model stored in the memory 1004 and perform the following operations: During structured decoding of a large language model based on a finite state machine, obtain the tightness index and budget index of the current decoding state in real time, where the tightness index represents the degree of constraint of the probability distribution and the budget index represents the remaining generation resources; Determine the current risk type based on the real-time values of the tightness index and the budget index; if the tightness index indicates a risk of distribution truncation, reconstruct the probability distribution by superimposing a perturbation term on the hidden state vector of the large language model; if the budget index indicates a risk of structural non-closure, apply a positive bias value to the output probability of specific closure-related words; Sample to generate the current lexical unit based on the probability distribution after reconstruction or after applying a positive bias value, and return to the step of obtaining the tightness index and budget index of the current decoding state in real time until the structured output is completed.

[0111] Optionally, the processor 1001 may invoke the adaptive repair program for the structured output of the large language model stored in the memory 1004, and further perform the following operations: obtain the probability values of all legal transition tokens in the model's original output probability distribution under the current finite state machine state; sum the probability values of all the legal transition tokens to obtain a feasible mass value; and determine the feasible mass value as the tightness index characterizing the degree to which the probability distribution is constrained, where a smaller feasible mass value means that the hard constraint mask truncates more of the model's original probability distribution.
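As a small worked illustration of this index, the sketch below sums the unmasked softmax probability over the tokens the finite state machine currently allows. The names `softmax` and `feasible_mass` and the four-token vocabulary are assumptions made for the example, not part of the original text.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feasible_mass(logits, legal_ids):
    """Probability mass the unmasked model assigns to FSM-legal tokens.

    A value near 1 means the mask removes little; a value near 0 means
    the mask truncates most of the model's original distribution.
    """
    probs = softmax(logits)
    return sum(probs[i] for i in set(legal_ids))


logits = [2.0, 0.0, 0.0, 0.0]          # the model strongly prefers token 0
mass_top_legal = feasible_mass(logits, [0, 1])    # mask keeps the preferred token
mass_top_illegal = feasible_mass(logits, [2, 3])  # mask removes the preferred token
```

Here `mass_top_illegal` is much smaller than `mass_top_legal`, which is the situation the tightness index is meant to flag.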

[0112] Optionally, the processor 1001 may invoke the adaptive repair program for the structured output of the large language model stored in the memory 1004, and further perform the following operations: obtain the length of the currently generated sequence and the preset maximum generation length, and calculate the ratio between the two as the first budget component; calculate, from the current finite state machine state, the minimum number of steps required to reach any terminal state, compare this minimum number of steps with the remaining available generation length, and use the comparison result as the second budget component; detect whether the current generation context contains a continuous-expansion signal of an unbounded field, and if a numeric field is detected to be continuously generating decimal places, an array field to be continuously adding elements, or a string field to be continuously generating characters without closing, generate a third budget component representing the risk of structural non-closure; and determine at least one of the first budget component, the second budget component, and the third budget component as the current budget index.
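The three budget components can be sketched as follows. This is an illustrative reading, with several assumptions: the finite state machine is represented as a plain dictionary from state to successor states, the second component is expressed as a slack value (remaining length minus the shortest distance to a terminal state), and the third component is approximated by a hypothetical `run_length` counter with an arbitrary threshold of 16.

```python
from collections import deque
from dataclasses import dataclass


def min_steps_to_accept(transitions, state, accepting):
    """Breadth-first distance from `state` to the nearest accepting state.

    `transitions` maps a state to an iterable of successor states.
    Returns None when no accepting state is reachable.
    """
    seen = {state}
    queue = deque([(state, 0)])
    while queue:
        current, dist = queue.popleft()
        if current in accepting:
            return dist
        for nxt in transitions.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


@dataclass
class Budget:
    used_ratio: float        # first component: generated / maximum length
    slack: int               # second component: remaining minus steps needed
    unbounded_growth: bool   # third component: a field keeps expanding


def budget_index(generated_len, max_len, steps_needed, run_length, run_threshold=16):
    """Assemble the three components; run_threshold is a hypothetical tuning value."""
    remaining = max_len - generated_len
    return Budget(
        used_ratio=generated_len / max_len,
        slack=remaining - steps_needed,
        unbounded_growth=run_length >= run_threshold,
    )


# Toy FSM: state 1 loops on itself (an unbounded field) before it can close.
fsm = {0: [1], 1: [1, 2], 2: [3], 3: []}
needed = min_steps_to_accept(fsm, 1, {3})
budget = budget_index(generated_len=90, max_len=100, steps_needed=needed, run_length=20)
```

A small or negative `slack` means the output can barely, or can no longer, be closed within the remaining length.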

[0113] Optionally, the processor 1001 may invoke the adaptive repair program for the structured output of the large language model stored in the memory 1004, and further perform the following operations: extract the hidden state vector of the last layer of the large language model decoder and construct an initialized, optimizable perturbation vector; construct a loss function that includes a margin loss term and a distribution shift constraint term, where the margin loss term computes the log-probability difference between the target legal token and the highest-probability illegal token so as to increase that difference, and the distribution shift constraint term computes the deviation between the reconstructed probability distribution and the original probability distribution so as to limit that deviation; iteratively update the optimizable perturbation vector with a gradient optimization algorithm until the loss function is minimized; superimpose the optimized perturbation vector onto the hidden state vector to obtain the reconstructed hidden state; and recompute the reconstructed probability distribution from the reconstructed hidden state.
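The optimization described here can be illustrated on a toy linear output head. The sketch below is one possible instantiation under stated assumptions, not the patented implementation: the margin term is taken to be a hinge on the logit gap (which equals the log-probability gap, because the softmax normalizer cancels), the distribution shift term is taken to be KL(p0 || p), gradients are derived by hand (the gradient of that KL with respect to the logits is p - p0), and `margin`, `lam`, `lr`, and `steps` are hypothetical hyperparameters. The code also assumes at least one illegal token exists.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def logits_of(W, h):
    """Linear output head: one row of W per vocabulary token."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def repair_hidden_state(W, h, legal_ids, target_id,
                        margin=1.0, lam=0.5, lr=0.1, steps=200):
    """Gradient descent on a perturbation delta added to hidden state h."""
    legal = set(legal_ids)
    illegal = [i for i in range(len(W)) if i not in legal]
    p0 = softmax(logits_of(W, h))          # original distribution
    delta = [0.0] * len(h)                 # optimizable perturbation
    for _ in range(steps):
        z = logits_of(W, [a + b for a, b in zip(h, delta)])
        p = softmax(z)
        # Distribution shift term: d KL(p0 || p) / d z = p - p0.
        grad_z = [lam * (pi - qi) for pi, qi in zip(p, p0)]
        # Margin term: hinge on the gap to the strongest illegal token.
        worst = max(illegal, key=lambda i: z[i])
        if z[target_id] - z[worst] < margin:
            grad_z[target_id] -= 1.0
            grad_z[worst] += 1.0
        # Chain rule through the linear head: grad_delta = W^T grad_z.
        grad_delta = [sum(grad_z[k] * W[k][j] for k in range(len(W)))
                      for j in range(len(h))]
        delta = [d - lr * g for d, g in zip(delta, grad_delta)]
    return [a + b for a, b in zip(h, delta)]


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h = [0.0, 0.0, 2.0]                        # illegal token 2 dominates
h_new = repair_hidden_state(W, h, legal_ids=[0, 1], target_id=0)

z_before, z_after = logits_of(W, h), logits_of(W, h_new)
gap_before = z_before[0] - z_before[2]
gap_after = z_after[0] - z_after[2]
mass_before = sum(softmax(z_before)[i] for i in (0, 1))
mass_after = sum(softmax(z_after)[i] for i in (0, 1))
```

In a real decoder the gradients would normally come from an automatic differentiation framework rather than a hand-derived formula; the hand derivation is used here only to keep the example dependency-free.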

[0114] Optionally, the processor 1001 may invoke the adaptive repair program for the structured output of the large language model stored in the memory 1004, and further perform the following operations: identify the content domain type of the current generation context and classify it as either a literal content domain or a string content domain; if the result is a literal content domain, select at least one of the comma, the right curly brace, and the right square bracket as the target closing-symbol token, and apply a first positive bias value to the output probability of the target closing-symbol token; if the result is a string content domain, select the closing quotation mark token as the target closing-symbol token, and apply a second positive bias value to the output probability of the target closing-symbol token while keeping the output probabilities of the other closing-symbol tokens unchanged; and dynamically adjust the first positive bias value or the second positive bias value based on the real-time value of the budget index, thereby completing the guidance of the output probability.
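A minimal sketch of this domain-dependent bias follows. It makes several assumptions that are not in the original text: the bias is added to logits (equivalent to multiplying the unnormalized probability by a constant factor), the budget index is summarized as a `pressure` value in [0, 1] that scales the bias linearly, and the base values `literal_bias` and `string_bias` as well as the token-id table `closer_ids` are hypothetical.

```python
def apply_closing_bias(logits, domain, closer_ids, pressure,
                       literal_bias=2.0, string_bias=4.0):
    """Return a copy of `logits` with closing tokens boosted.

    domain     -- "literal" or "string"
    closer_ids -- maps a closing symbol to its (hypothetical) token id
    pressure   -- budget pressure in [0, 1]; higher means closer to the limit
    """
    pressure = min(max(pressure, 0.0), 1.0)
    if domain == "literal":
        targets = [closer_ids[s] for s in (",", "}", "]") if s in closer_ids]
        bias = literal_bias * pressure
    elif domain == "string":
        targets = [closer_ids['"']]        # only the closing quote is boosted
        bias = string_bias * pressure
    else:
        raise ValueError(f"unknown content domain: {domain!r}")
    out = list(logits)
    for token_id in targets:
        out[token_id] += bias
    return out


closer_ids = {",": 0, "}": 1, "]": 2, '"': 3}
base = [0.0, 0.0, 0.0, 0.0]
in_string = apply_closing_bias(base, "string", closer_ids, pressure=0.5)
in_literal = apply_closing_bias(base, "literal", closer_ids, pressure=0.5)
```

Inside a string only the quote token moves, so a comma or brace that is legitimately part of the string content is not pushed forward as a structural closer.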

[0115] Optionally, the processor 1001 may invoke the adaptive repair program for the structured output of the large language model stored in the memory 1004, and further perform the following operations: receive a structured output schema description from user input, and scan it to identify unbounded fields for which no length or precision limit is set; perform bounded compilation on the identified unbounded fields, where a maximum number of elements is added to array-type fields, a discrete set of enumerated values is added to numeric-type fields, and a maximum character length is added to string-type fields; and construct the finite state machine from the bounded compilation result for use in the subsequent decoding process.
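As an illustration of bounded compilation, the sketch below assumes the schema description is a JSON-Schema-style dictionary, which the original text does not state, and uses the standard JSON Schema keywords `maxItems`, `maxLength`, and `enum` to express the bounds. The function name `bound_schema` and the default limits are hypothetical.

```python
import copy


def bound_schema(schema, max_items=32, max_length=256, number_enum=None):
    """Return a copy of `schema` with bounds added to unbounded fields.

    Fields that already carry a bound are left untouched. Numeric fields
    are only bounded when the caller supplies `number_enum`.
    """
    bounded = copy.deepcopy(schema)

    def visit(node):
        if not isinstance(node, dict):
            return
        kind = node.get("type")
        if kind == "array":
            node.setdefault("maxItems", max_items)
            visit(node.get("items"))
        elif kind == "string":
            if "enum" not in node:
                node.setdefault("maxLength", max_length)
        elif kind in ("number", "integer"):
            if "enum" not in node and number_enum is not None:
                node["enum"] = list(number_enum)
        elif kind == "object":
            for child in node.get("properties", {}).values():
                visit(child)

    visit(bounded)
    return bounded


schema = {
    "type": "object",
    "properties": {
        "tags": {"type": "array", "items": {"type": "string"}},
        "score": {"type": "number"},
        "title": {"type": "string", "maxLength": 40},
    },
}
bounded = bound_schema(schema, number_enum=[0, 0.5, 1])
```

The bounded schema would then be handed to whatever component compiles the finite state machine; that compilation step is outside the scope of this sketch.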

[0116] Optionally, the processor 1001 may invoke the adaptive repair program for the structured output of the large language model stored in the memory 1004, and further perform the following operations: before inference begins, scan the input prompt to identify, and remove or replace, placeholder templates, regular expression templates, or specification text that the model might reproduce literally, thereby generating a cleaned prompt; during structured decoding based on the cleaned prompt, monitor the generated text in real time for leakage of the specification text; and if a leak is detected, record diagnostic information or trigger an interrupt, and reinitialize the decoding process based on a further cleaned prompt.
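The two halves of this step, cleaning before inference and leak monitoring during decoding, can be sketched as below. The sketch covers only placeholder templates, and the two placeholder shapes it recognizes (`<NAME>` and `{{name}}`) are assumptions chosen for the example; the pattern `PLACEHOLDER` and the helper names are hypothetical.

```python
import re

# Hypothetical placeholder shapes: <NAME> and {{name}}.
PLACEHOLDER = re.compile(r"<[A-Za-z_][A-Za-z0-9_ ]*>|\{\{[^{}]*\}\}")


def clean_prompt(prompt):
    """Strip placeholder templates; return the cleaned prompt and what was removed."""
    removed = PLACEHOLDER.findall(prompt)
    cleaned = PLACEHOLDER.sub("", prompt)
    return cleaned, removed


def leaked_fragments(generated, fragments):
    """Fragments from the prompt template that reappear verbatim in the output."""
    return [f for f in fragments if f and f in generated]


prompt = "Fill in <NAME> and {{age}} as JSON."
cleaned, removed = clean_prompt(prompt)
leaks = leaked_fragments('{"name": "<NAME>", "age": 30}', removed)
```

A non-empty `leaks` list would be the trigger for recording diagnostics or restarting decoding with a further cleaned prompt.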

[0117] Optionally, the processor 1001 may invoke the adaptive repair program for the structured output of the large language model stored in the memory 1004, and further perform the following operations: after the structured output is completed, extract a structural feature vector from the generated structured output text, where the structural feature vector includes the number of brackets, the proportion of punctuation marks, the total character length, and the type-token ratio; input the structural feature vector into a pre-trained classification model, which outputs a judgment result of either a normal category or a failure category; and if the judgment result is the failure category, automatically trigger a retry generation mechanism or a degradation strategy to generate an alternative output.
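The feature extraction part of this step can be sketched as follows. The choices made here are assumptions: only curly and square brackets are counted, punctuation is taken from Python's `string.punctuation`, and the type-token ratio is computed over whitespace-separated tokens. The pre-trained classifier itself is not shown, since the original text gives no details about it.

```python
import string


def structural_features(text):
    """Return [bracket_count, punctuation_ratio, char_length, type_token_ratio]."""
    bracket_count = sum(text.count(c) for c in "{}[]")
    length = len(text)
    punctuation = sum(1 for c in text if c in string.punctuation)
    punctuation_ratio = punctuation / length if length else 0.0
    tokens = text.split()
    type_token_ratio = len(set(tokens)) / len(tokens) if tokens else 0.0
    return [bracket_count, punctuation_ratio, length, type_token_ratio]


healthy = structural_features('{"a": [1, 2]}')
runaway = structural_features("0 0 0 0 0 0 0 0")
```

A degenerate, repetitive output such as the second example yields a very low type-token ratio and no brackets, which is the kind of signal a downstream classifier could use to flag a failure.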

[0118] Optionally, the processor 1001 may invoke the adaptive repair program for the structured output of the large language model stored in the memory 1004, and further perform the following operations: check whether the tightness index indicates a risk of distribution truncation and whether the budget index indicates a risk of structural non-closure; when both the distribution truncation risk and the structural non-closure risk are detected, preferentially perform the step of reconstructing the probability distribution by superimposing a perturbation term on the hidden state vector of the large language model, and suspend the step of applying a positive bias value; and when the tightness index does not indicate a risk of distribution truncation, perform the step of applying a positive bias value to the output probability of the specific closure-related tokens.
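The priority rule in this paragraph reduces to a small decision function. In the sketch below the tightness check is expressed as a comparison against a hypothetical `mass_threshold`, and the returned action labels are illustrative names only.

```python
def select_repair(feasible_mass, closure_risk, mass_threshold=0.1):
    """Pick one repair action per step; truncation repair takes priority.

    feasible_mass  -- tightness index (probability mass on legal tokens)
    closure_risk   -- True when the budget index signals non-closure risk
    mass_threshold -- hypothetical cut-off below which truncation is assumed
    """
    truncation_risk = feasible_mass < mass_threshold
    if truncation_risk:
        # Closing bias is suspended while the distribution is being repaired.
        return "latent_repair"
    if closure_risk:
        return "closing_bias"
    return "none"


both = select_repair(0.02, closure_risk=True)
only_budget = select_repair(0.80, closure_risk=True)
neither = select_repair(0.80, closure_risk=False)
```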

[0119] Furthermore, to achieve the above objectives, embodiments of the present invention also provide an adaptive repair system for the structured output of a large language model, the system comprising: an index acquisition module, used to obtain the tightness index and the budget index of the current decoding state in real time during finite-state-machine-based structured decoding of a large language model; a risk assessment and repair execution module, used to determine the current risk type based on the real-time values of the tightness index and the budget index, to reconstruct the probability distribution by superimposing a perturbation term on the hidden state vector of the large language model if the tightness index indicates a risk of distribution truncation, and to apply a positive bias value to the output probability of specific closure-related tokens if the budget index indicates a risk of structural non-closure; and a sampling and loop control module, used to sample and generate the current token based on the reconstructed or positively biased probability distribution, and to return to the step of obtaining the tightness index and the budget index of the current decoding state in real time, until the structured output is completed.

[0120] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0121] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0122] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0123] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0124] It should be noted that any reference signs placed between parentheses in the claims should not be construed as limiting the claims. The word "comprising" does not exclude the presence of components or steps not listed in the claims. The word "a" or "an" preceding a component does not exclude the presence of a plurality of such components. This application can be implemented by means of hardware comprising several different components and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by the same item of hardware. The use of the words first, second, third, etc., does not indicate any order. These words can be interpreted as names.

[0125] Although preferred embodiments of this application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of this application.

[0126] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of the invention. Therefore, if these modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include these modifications and variations.

Claims

1. An adaptive repair method for the structured output of a large language model, characterized in that the method comprises: during finite-state-machine-based structured decoding of a large language model, obtaining the tightness index and the budget index of the current decoding state in real time, where the tightness index represents the degree to which the probability distribution is constrained and the budget index represents the remaining generation resources; determining the current risk type based on the real-time values of the tightness index and the budget index; if the tightness index indicates a risk of distribution truncation, reconstructing the probability distribution by superimposing a perturbation term on the hidden state vector of the large language model; if the budget index indicates a risk of structural non-closure, applying a positive bias value to the output probability of specific closure-related tokens; and generating the current token based on the reconstructed or positively biased probability distribution, then returning to the step of obtaining the tightness index and the budget index of the current decoding state in real time, until the structured output is completed.

2. The method as described in claim 1, characterized in that the step of obtaining the tightness index of the current decoding state in real time comprises: obtaining the probability values of all legal transition tokens in the model's original output probability distribution under the current finite state machine state; summing the probability values of all the legal transition tokens to obtain a feasible mass value; and determining the feasible mass value as the tightness index characterizing the degree to which the probability distribution is constrained, where a smaller feasible mass value means that the hard constraint mask truncates more of the model's original probability distribution.

3. The method as described in claim 2, characterized in that the step of obtaining the budget index comprises: obtaining the length of the currently generated sequence and the preset maximum generation length, and calculating the ratio between the two as the first budget component; calculating, from the current finite state machine state, the minimum number of steps required to reach any terminal state, comparing this minimum number of steps with the remaining available generation length, and using the comparison result as the second budget component; detecting whether the current generation context contains a continuous-expansion signal of an unbounded field, and if a numeric field is detected to be continuously generating decimal places, an array field to be continuously adding elements, or a string field to be continuously generating characters without closing, generating a third budget component representing the risk of structural non-closure; and determining at least one of the first budget component, the second budget component, and the third budget component as the current budget index.

4. The method as described in claim 1, characterized in that the step of reconstructing the probability distribution by superimposing a perturbation term on the hidden state vector of the large language model comprises: extracting the hidden state vector of the last layer of the large language model decoder and constructing an initialized, optimizable perturbation vector; constructing a loss function that includes a margin loss term and a distribution shift constraint term, where the margin loss term computes the log-probability difference between the target legal token and the highest-probability illegal token so as to increase that difference, and the distribution shift constraint term computes the deviation between the reconstructed probability distribution and the original probability distribution so as to limit that deviation; iteratively updating the optimizable perturbation vector with a gradient optimization algorithm until the loss function is minimized; superimposing the optimized perturbation vector onto the hidden state vector to obtain the reconstructed hidden state; and recomputing the reconstructed probability distribution from the reconstructed hidden state.

5. The method as described in claim 4, characterized in that the step of applying a positive bias value to the output probability of specific closure-related tokens comprises: identifying the content domain type of the current generation context and classifying it as either a literal content domain or a string content domain; if the result is a literal content domain, selecting at least one of the comma, the right curly brace, and the right square bracket as the target closing-symbol token, and applying a first positive bias value to the output probability of the target closing-symbol token; if the result is a string content domain, selecting the closing quotation mark token as the target closing-symbol token, and applying a second positive bias value to the output probability of the target closing-symbol token while keeping the output probabilities of the other closing-symbol tokens unchanged; and dynamically adjusting the first positive bias value or the second positive bias value based on the real-time value of the budget index, thereby completing the guidance of the output probability.

6. The method as described in claim 1, characterized in that the method further comprises a static schema preprocessing step, which comprises: receiving a structured output schema description from user input, and scanning it to identify unbounded fields for which no length or precision limit is set; performing bounded compilation on the identified unbounded fields, where a maximum number of elements is added to array-type fields, a discrete set of enumerated values is added to numeric-type fields, and a maximum character length is added to string-type fields; and constructing the finite state machine from the bounded compilation result for use in the subsequent decoding process.

7. The method as described in claim 1, characterized in that the method further comprises a formal interference diagnosis step, which comprises: before inference begins, scanning the input prompt to identify, and remove or replace, placeholder templates, regular expression templates, or specification text that the model might reproduce literally, thereby generating a cleaned prompt; during structured decoding based on the cleaned prompt, monitoring the generated text in real time for leakage of the specification text; and if a leak is detected, recording diagnostic information or triggering an interrupt, and reinitializing the decoding process based on a further cleaned prompt.

8. The method as described in claim 1, characterized in that the method further comprises a post-processing quality verification step, which comprises: after the structured output is completed, extracting a structural feature vector from the generated structured output text, where the structural feature vector includes the number of brackets, the proportion of punctuation marks, the total character length, and the type-token ratio; inputting the structural feature vector into a pre-trained classification model, which outputs a judgment result of either a normal category or a failure category; and if the judgment result is the failure category, automatically triggering a retry generation mechanism or a degradation strategy to generate an alternative output.

9. The method according to any one of claims 1-8, characterized in that the step of determining the current risk type based on the real-time values of the tightness index and the budget index further comprises: checking whether the tightness index indicates a risk of distribution truncation and whether the budget index indicates a risk of structural non-closure; when both the distribution truncation risk and the structural non-closure risk are detected, preferentially performing the step of reconstructing the probability distribution by superimposing a perturbation term on the hidden state vector of the large language model, and suspending the step of applying a positive bias value; and when the tightness index does not indicate a risk of distribution truncation, performing the step of applying a positive bias value to the output probability of the specific closure-related tokens.

10. An adaptive repair system for the structured output of a large language model, characterized in that the system comprises: an index acquisition module, used to obtain the tightness index and the budget index of the current decoding state in real time during finite-state-machine-based structured decoding of a large language model; a risk assessment and repair execution module, used to determine the current risk type based on the real-time values of the tightness index and the budget index, to reconstruct the probability distribution by superimposing a perturbation term on the hidden state vector of the large language model if the tightness index indicates a risk of distribution truncation, and to apply a positive bias value to the output probability of specific closure-related tokens if the budget index indicates a risk of structural non-closure; and a sampling and loop control module, used to sample and generate the current token based on the reconstructed or positively biased probability distribution, and to return to the step of obtaining the tightness index and the budget index of the current decoding state in real time, until the structured output is completed.