A method for automatic code generation based on constraint decoding

A constraint-decoding-based code generation method that combines a dual-encoder architecture with a symbolic finite state machine (SFSM) constraint decoding operator, addressing the limited adaptability and accuracy of code generation in existing technologies and enabling more efficient code generation.

CN122308809A · Pending · Publication Date: 2026-06-30 · CHONGQING UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: CHONGQING UNIV OF POSTS & TELECOMM
Filing Date: 2026-03-30
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Existing code generation methods adapt poorly to complex business logic and multi-language development environments, which makes them difficult to extend and maintain quickly. Furthermore, learning-based models without explicit syntactic and semantic constraints are prone to generating illegal tokens, or invalid symbols outside the range of predefined variable values, leading to syntax errors and unexecutable code.

Method used

An automated code generation method based on constraint decoding is adopted. A dual-encoder architecture extracts and fuses multi-source heterogeneous semantic features, and a constraint decoding operator based on a symbolic finite state machine performs dynamic mask pruning during decoding, ensuring that the generated code conforms to syntax specifications and variable constraints.

Benefits of technology

It improves the accuracy and contextual understanding of code generation and outperforms existing methods in BLEU score and accuracy, with reported improvements of 3.13% and 2.41% on the Shellcode_IA32 and Django datasets, respectively.

✦ Generated by Eureka AI based on patent content.

Figure CN122308809A_ABST (abstract figure)

Abstract

This invention belongs to the field of code generation and specifically relates to an automated code generation method based on constraint decoding (GFM-SFSM). The method includes: constructing a template-based natural language intent parser to parse the input natural language and extract both the original natural language intent features and the semantic features obtained after template parsing; inputting the original intent features and the template parsing features into a dual-encoder model, and using a gated fusion mechanism to adaptively fuse the dual-source features and filter out redundancy, yielding high-quality semantic representations; in the decoding stage, introducing a constraint-aware decoding operator based on a symbolic finite state machine, which uses dynamic logit masking to restrict the candidate token space so that the generated code sequences conform to grammatical rules; and using the trained automated code generation model to process test data and obtain the code generation results. This invention achieves fast and accurate automated code generation.

Description

Technical Field

[0003] This invention belongs to the interdisciplinary field of artificial intelligence and program synthesis and specifically relates to an automated code generation method based on constraint decoding. It is particularly suitable for scenarios in which structured program generation is strongly coupled with semantic constraints.

Background Technology

[0005] Code generation refers to the process by which computer systems automatically generate executable code, scripts, interface logic, or software modules from natural language descriptions, functional requirements, design rules, or existing contextual information. It is a crucial technology for improving software development efficiency and reducing the cost of manual programming. Code generation systems analyze and understand input requirements, business rules, and contextual data, and output program code that conforms to syntax standards and functional requirements. Current code generation methods fall mainly into two categories: rule- and template-based methods, and intelligent learning-based methods.

[0006] Traditional rule- and template-based code generation methods typically map input parameters to corresponding code snippets through predefined code templates, syntax rules, and transformation logic, automating development in specific scenarios. However, these methods rely heavily on manually constructed templates and rule bases and therefore adapt poorly. Faced with complex business logic, multi-language development environments, and frequently changing requirements, they are difficult to extend and maintain quickly, and their ability to handle unstructured requirements is limited.

[0007] To improve the adaptability of code generation systems to complex development tasks and dynamic requirements, code generation algorithms based on intelligent learning have gradually become a research focus. These algorithms are primarily implemented with machine learning, deep learning, and large-scale pre-trained models. Although such models generalize well, the probability search space of their autoregressive generation process is typically open. In the absence of explicit syntactic and semantic constraints, the model is prone during decoding to generating illegal tokens that violate the language specification, or invalid symbols outside the range of predefined variable values, leading to syntax errors, inconsistent variable references, and unexecutable code.

[0008] With the rapid growth of software development needs and the increasing intelligence of development processes, there is an urgent need for a code generation method with high accuracy and good context understanding that can automate code generation for various development tasks, thereby improving software development efficiency and reducing manual writing and maintenance costs.

Summary of the Invention

[0010] To address the shortcomings of existing technologies, this invention proposes an automated code generation method based on constraint decoding, which includes:

[0011] S1: Obtain the training dataset and divide it into training and test sets;

[0012] S2: Preprocess the dataset and perform intent parsing to extract the original semantic features and template parsing features respectively;

[0013] S3: Input the original semantic features and template parsing features into the model, and use a gated fusion mechanism to adaptively filter and fuse the features to obtain high-quality semantic representations;

[0014] S4: Use a constraint decoding operator based on a symbolic finite state machine to decode the high-quality semantic representations and search for the final code generation result;

[0015] S5: Use the trained model to process the test set and obtain the code generation result.

[0016] Preferably, the preprocessing of the dataset includes:

[0017] S21: Perform stop word filtering and word segmentation on the input sequence;

[0018] S22: Identify and extract normalizable tokens from natural language intents using an intent parser, and filter out non-normalizable tokens using a language keyword dictionary.

[0019] S23: Replace the selected standardized tokens with placeholders to build a slot mapping table and obtain the standardized template parsing input;

[0020] S24: Using a dual encoder architecture, the original semantic features are obtained by extracting features from the original natural language intent through the original input encoder, and the template parsing features are obtained by encoding the standardized template parsing input through the preprocessing parsing input encoder.

[0021] Preferably, the process of fusing the outputs of two encoders using a gating mechanism includes:

[0022] In the model, the semantic vectors obtained from the two encoders are first efficiently fused. Specifically, this fusion layer models the input vector through three different feature combination methods, characterizing semantic relationships from three perspectives: concatenation, difference, and similarity. The calculation process is shown in the formula below:

[0023] $m_1 = \mathrm{FFN}_1([h_o; h_t])$

[0024] $m_2 = \mathrm{FFN}_2(h_o - h_t)$

[0025] $m_3 = \mathrm{FFN}_3(h_o \odot h_t)$

[0026] where $h_o$ is the semantic feature vector output by the original input encoder and $h_t$ is the semantic feature vector output by the template parsing encoder; $\mathrm{FFN}_1$, $\mathrm{FFN}_2$ and $\mathrm{FFN}_3$ denote single-layer feedforward neural networks with three independent sets of parameters; $[\cdot;\cdot]$ denotes vector concatenation; $-$ denotes vector difference, used to highlight the differences between the two semantic vectors; and $\odot$ denotes element-wise multiplication, used to characterize the similarity between the vectors.

[0027] Subsequently, the three intermediate representation vectors $m_1$, $m_2$ and $m_3$ are concatenated and fed into another feedforward neural network to obtain the final output of the fusion layer:

[0028] $\mathrm{Fusion}(h_o, h_t) = \mathrm{FFN}_4([m_1; m_2; m_3])$

[0029] For brevity in the subsequent formulas, $\mathrm{Fusion}(h_o, h_t)$ is abbreviated as $h_f$.

[0030] The original semantic feature $h_o$ and the initial fusion feature $h_f$ are concatenated to calculate the gating weight $g = \sigma(W_g [h_o; h_f] + b_g)$;

[0031] Based on the gating weight $g$, the features are dynamically weighted and fused, and L2 normalization is then applied to obtain the high-quality semantic representation.

[0032] Preferably, the automated code generation model consists of a cascaded encoding layer (composed of two CodeBERT encoders), a semantic attention layer, a gating fusion layer, and a constraint decoding layer.

[0033] Preferably, the process of decoding the high-quality semantic representations with the constraint decoding operator based on a symbolic finite state machine includes:

[0034] S41: Extract variable constraints, structural constraints, and parameter constraints from the constraint template, and combine them with the slot mapping table built at the front end to construct a symbolic finite state machine;

[0035] S42: At each time step $t$ of the decoding phase, generate a binary validity indicator vector based on the current syntax state of the symbolic finite state machine, marking whether each token in the candidate vocabulary is legal;

[0036] S43: Construct a dynamic feature mask matrix from the binary validity indicator vector, and add it to the original logits output by the model to obtain the reconstructed logits;

[0037] S44: Normalize the reconstructed logits into probabilities with the Softmax function, and use a beam search strategy to retain the candidate paths with the highest cumulative log probability until the end-of-sequence symbol is generated, then output the code.

[0038] Furthermore, the reconstructed logits and the output probability distribution are calculated as follows:

[0039] $\tilde{z}_t = z_t + M_t$

[0040] $P(y_t = w_i \mid y_{<t}, H) = \dfrac{\exp(\tilde{z}_{t,i})}{\sum_{j=1}^{|V|} \exp(\tilde{z}_{t,j})}$

[0041] where $\tilde{z}_t$ denotes the reconstructed logits; $z_t$ denotes the original unnormalized logits; $M_t$ denotes the dynamic feature mask from S43; $P(y_t = w_i \mid y_{<t}, H)$ denotes the probability of generating the $i$-th candidate word $w_i$ at step $t$; $y_{<t}$ denotes the previously generated sequence; $H$ denotes the high-quality semantic representation; and $|V|$ denotes the size of the vocabulary.
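Steps S42 to S44 can be illustrated with a minimal constrained beam search. This is a sketch under stated assumptions, not the patented implementation: `step_logits` is a hypothetical stand-in for the decoder (one raw logit per vocabulary entry), `allowed_fn` is a hypothetical stand-in for the symbolic finite state machine (the set of legal tokens for a given prefix), and the end-of-sequence token `"</s>"`, the beam width and the maximum length are illustrative choices.

```python
import math

NEG_INF = float("-inf")

def constrained_beam_search(step_logits, allowed_fn, vocab,
                            beam_width=2, eos="</s>", max_len=10):
    """Beam search over SFSM-masked log-probabilities (illustrative sketch)."""
    beams = [([], 0.0)]  # (token sequence, cumulative log probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            allowed = allowed_fn(seq)  # assumed non-empty for every prefix
            # S43: illegal tokens get a -inf mask added to their logits.
            masked = [z if tok in allowed else NEG_INF
                      for z, tok in zip(step_logits(seq), vocab)]
            # S44: log-softmax, subtracting the max for numerical stability.
            m = max(masked)
            log_norm = m + math.log(sum(math.exp(z - m) for z in masked))
            for tok, z in zip(vocab, masked):
                if z != NEG_INF:
                    candidates.append((seq + [tok], score + z - log_norm))
        # Keep the beam_width best extensions; completed ones leave the beam.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    best_seq, _ = max(finished + beams, key=lambda c: c[1])
    return best_seq

# Usage: the toy "model" always prefers "a", but the toy SFSM never allows it
# and forces the end symbol once two tokens have been generated.
result = constrained_beam_search(
    step_logits=lambda seq: [2.0, 1.0, 0.0],
    allowed_fn=lambda seq: {"</s>"} if len(seq) >= 2 else {"b", "</s>"},
    vocab=["a", "b", "</s>"],
)
```

Because masked tokens are dropped before ranking, "a" can never appear in `result` no matter how high its raw logit is; no length normalization is applied in this sketch.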

[0042] The beneficial effects of this invention are as follows:

[0043] This invention proposes an automated code generation method based on constraint decoding (GFM-SFSM). Building on structured template parsing of natural language intent, it adopts an improved dual-stream pre-trained encoder architecture and optimizes the gated fusion and adaptive filtering of multi-source heterogeneous semantic features. At the same time, it employs a constraint decoding algorithm based on symbolic finite state machines to dynamically mask and prune the autoregressive probability search space. Experiments on the Shellcode_IA32 and Django datasets show that this invention outperforms existing methods, improving BLEU score and accuracy by 3.13% and 2.41%, respectively, over the baseline model.

Attached Figure Description

[0045] Figure 1 is a schematic diagram of the automatic code generation process of this invention;

[0046] Figure 2 is a schematic diagram of the automatic code generation model architecture of this invention;

[0047] Figure 3 is a schematic diagram of the gating fusion layer of this invention.

Detailed Implementation

[0049] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0050] This invention proposes an automated code generation method based on constraint decoding. As shown in Figure 1, the method includes the following steps:

[0051] S1: Obtain the training dataset and divide it into training and test sets.

[0052] Preferably, the Shellcode_IA32 and Django datasets can be used as the training datasets.

[0053] S2: The dataset is preprocessed and intent parsed to extract the original semantic features and template parsing features, respectively.

[0054] S21: Perform stop word filtering and word segmentation on the input sequence.

[0055] A pre-compiled custom stop-word set (e.g., the, each, onto) is removed so that only the data relevant to machine translation is retained. Tokenization is then performed: the NLTK lexical analyzer tokenizes the natural language intent sequence, and Python's tokenize package tokenizes the code snippet sequence, converting the input string into a byte representation and breaking it down into a sequence of subword tokens;
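A minimal sketch of S21, assuming a toy stop-word set built only from the three examples above. A simple regular expression stands in for the NLTK tokenizer so the sketch needs only the standard library; `tokenize_code` uses Python's standard `tokenize` module, which reads the snippet as bytes. The function names are hypothetical.

```python
import io
import re
import tokenize

STOP_WORDS = {"the", "each", "onto"}  # hypothetical; only these appear in the text

def tokenize_intent(intent: str) -> list[str]:
    """Split an intent into word/punctuation tokens and drop stop words.

    The regex is a stand-in for NLTK's tokenizer.
    """
    tokens = re.findall(r"\w+|[^\w\s]", intent)
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]

def tokenize_code(snippet: str) -> list[str]:
    """Tokenize a Python snippet with the standard tokenize module.

    tokenize.tokenize expects a bytes readline, so the string is encoded first;
    layout-only tokens (encoding marker, newlines, end marker) are dropped.
    """
    readline = io.BytesIO(snippet.encode("utf-8")).readline
    skip = {tokenize.ENCODING, tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    return [tok.string for tok in tokenize.tokenize(readline) if tok.type not in skip]
```

For example, `tokenize_code("x = foo(1)")` yields the name, operator and number tokens of the snippet, while `tokenize_intent` removes "the", "each" and "onto" regardless of case.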

[0056] S22: Identify and extract normalizable tokens from natural language intents using an intent parser, and filter out non-normalizable tokens using a language keyword dictionary.

[0057] Regular expressions are used to identify hexadecimal values, strings enclosed in quotation marks or square brackets, variable names in various naming conventions (such as camelCase or underscore naming), function names, mathematical expressions, and byte arrays in the sequences; in addition, WordNet is used to identify alphabetic strings that are not valid English words. The intent parser extracts a dictionary of normalizable tokens composed of specific values, label names, and parameters, and non-normalizable tokens are filtered out using pre-built dictionaries of assembly language keywords and Python language keywords;
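S22 can be sketched with a handful of regular expressions. The patterns below, the tiny `ENGLISH_WORDS` set (a stand-in for the WordNet lookup) and the `ASM_KEYWORDS` set (a stand-in for the assembly keyword dictionary) are illustrative assumptions; the patent does not publish its exact rules. Python keywords are filtered with the standard `keyword` module.

```python
import keyword
import re

PATTERNS = [
    r"0x[0-9a-fA-F]+",                 # hexadecimal values
    r"\"[^\"]*\"|'[^']*'|\[[^\]]*\]",  # strings in quotes or square brackets
    r"[a-z]+(?:[A-Z][a-z0-9]*)+",      # camelCase names
    r"[A-Za-z]+(?:_[A-Za-z0-9]+)+",    # underscore_separated names
]
NORMALIZABLE = re.compile("|".join(f"(?:{p})" for p in PATTERNS))

ENGLISH_WORDS = {"with", "and", "call", "on", "the", "into"}  # hypothetical WordNet stand-in
ASM_KEYWORDS = {"mov", "cmp", "xor", "not", "db"}              # hypothetical keyword dictionary

def extract_normalizable(intent: str) -> list[str]:
    """Return candidate tokens to normalize, with language keywords filtered out."""
    found = [m.group(0) for m in NORMALIZABLE.finditer(intent)]
    # Alphabetic strings that are not English words (e.g. register names).
    for word in re.findall(r"\b[A-Za-z]+\b", intent):
        if word.lower() not in ENGLISH_WORDS and word not in found:
            found.append(word)
    # Drop non-normalizable tokens: Python and assembly keywords.
    return [tok for tok in found
            if not keyword.iskeyword(tok) and tok.lower() not in ASM_KEYWORDS]
```

On the intent "xor 0x32 with key_byte and call decodeLoop on esi", the sketch keeps the hex value, the two identifiers and the register name, while "xor" is rejected as an assembly keyword.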

[0058] S23: Replace the selected standardized tokens with placeholders to build a slot mapping table and obtain the standardized template parsing input;

[0059] Replace the selected standardized tokens in the intent and code snippets with placeholders in the format "var#" (where # represents a number sequence number), and store the original tokens and their corresponding standardized placeholders in a mapping dictionary for use in the post-processing stage;
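The placeholder substitution and the reverse post-processing step can be sketched as follows. The helper names `normalize` and `denormalize` are hypothetical, and the sketch stores the map as placeholder to original token; the plain string replacement is a simplification (a real implementation would replace at token boundaries).

```python
def normalize(text, tokens):
    """Replace normalizable tokens with var# placeholders; return text and slot map."""
    slot_map = {}  # placeholder -> original token
    for tok in tokens:
        if tok not in slot_map.values():
            slot_map[f"var{len(slot_map)}"] = tok
    # Replace longer tokens first so a token that is a substring of another stays intact.
    for placeholder, tok in sorted(slot_map.items(), key=lambda kv: len(kv[1]), reverse=True):
        text = text.replace(tok, placeholder)
    return text, slot_map

def denormalize(code, slot_map):
    """Post-processing: restore the original tokens in generated code."""
    # Longest placeholders first, so that "var10" is not rewritten via "var1".
    for placeholder in sorted(slot_map, key=len, reverse=True):
        code = code.replace(placeholder, slot_map[placeholder])
    return code

template, slots = normalize("xor byte [esi], 0x32", ["[esi]", "0x32"])
```

Here `template` becomes "xor byte var0, var1", and passing generated code such as "mov al, var1" through `denormalize` with the same `slots` restores "0x32".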

[0060] S24: Using a dual encoder architecture, the original semantic features are obtained by extracting features from the original natural language intent through the original input encoder, and the template parsing features are obtained by encoding the standardized template parsing input through the preprocessing parsing input encoder for post-processing.

[0061] Preferably, the core of the model consists of two independent CodeBERT encoders (RoBERTa architecture): a raw input encoder and a preprocessing parsing input encoder. This allows features to be extracted separately from the two input formats, balancing semantic integrity and domain specificity. The specific design is as follows:

[0062] 1. Raw Input Encoder: Receives the raw natural language input S, which contains the user's natural expression of their needs and the complete intent logic. The raw encoder is initialized from a RoBERTa pre-trained model and extracts semantic features from the raw input through tokenization, encoding, and multi-layer attention. Its core function is to preserve the original expression logic and semantic integrity of the input, capture the original semantic information of the intent, and provide a foundation for subsequent feature fusion.

[0063] 2. Preprocessing parsing input encoder (temp_encoder): Receives the standardized input S' obtained by preprocessing and parsing the raw input S. The preprocessing and parsing process mainly uses rule-based parsing tools to extract key domain features from the original intent and then standardizes and structures them to eliminate redundant expressions and ambiguities in the original input. `temp_encoder` is also initialized from a RoBERTa pre-trained model and is responsible for extracting the domain semantic features of this structured input. Its core function is to strengthen the expression of the core information of the intent, highlight key domain features, and improve feature relevance.

[0064] Both encoders incorporate layer attention mechanisms to enhance the extraction of key layer features and ensure the quality of output features. At the same time, the hyperparameter settings of the two encoders are kept consistent to avoid the feature fusion effect being affected by parameter differences, thus providing a well-adapted input for the subsequent multi-layer semantic feature fusion module.

[0065] S3: Input the original semantic features and template parsing features into the model, and use a gated fusion mechanism to adaptively filter and fuse the features to obtain high-quality semantic representations;

[0066] The intent-gated fusion module is the core solution to the feature redundancy problem of the original model. It introduces an intent attention gating layer that uses a gating mechanism and L2 normalization to filter relevant core features and enhance semantics, guiding the model to focus on the target code. The specific design is as follows:

[0067] In the model, the semantic vectors obtained from the two encoders are first fused. Specifically, this fusion layer models the input vectors through three different feature combinations, characterizing their semantic relationship from three perspectives: concatenation, difference, and element-wise product. The calculation formulas are as follows:

[0068] $m_1 = \mathrm{FFN}_1([h_o; h_t])$

[0069] $m_2 = \mathrm{FFN}_2(h_o - h_t)$

[0070] $m_3 = \mathrm{FFN}_3(h_o \odot h_t)$

[0071] where $h_o$ and $h_t$ are the semantic feature vectors output by the original input encoder and the template parsing encoder, respectively; $\mathrm{FFN}_1$, $\mathrm{FFN}_2$ and $\mathrm{FFN}_3$ denote single-layer feedforward neural networks with three independent sets of parameters; $[\cdot;\cdot]$ denotes vector concatenation; $-$ denotes vector difference, used to highlight the differences between the two semantic vectors; and $\odot$ denotes element-wise multiplication, used to characterize the similarity between the vectors.

[0072] Subsequently, the three intermediate representation vectors $m_1$, $m_2$ and $m_3$ are concatenated and fed into another feedforward neural network to obtain the final output of the fusion layer:

[0073] $\mathrm{Fusion}(h_o, h_t) = \mathrm{FFN}_4([m_1; m_2; m_3])$

[0074] For brevity in the subsequent formulas, $\mathrm{Fusion}(h_o, h_t)$ is abbreviated as $h_f$.
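The three-way fusion layer can be sketched in plain Python. Here `h_o` is the original-input encoder output and `h_t` the template-parsing encoder output; the tanh activation, the random initialization and the 4-dimensional toy vectors are assumptions, since the text specifies only "single-layer feedforward neural networks".

```python
import math
import random

def init_ffn(in_dim, out_dim, rng):
    """Random weights and zero bias for one single-layer FFN (illustrative init)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim

def ffn(params, x):
    """Single-layer feedforward network: tanh(Wx + b); tanh is an assumption."""
    weights, bias = params
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def fusion_layer(h_o, h_t, params):
    """Combine two encoder outputs by concatenation, difference and element-wise product."""
    m1 = ffn(params["ffn1"], h_o + h_t)                          # [h_o; h_t]
    m2 = ffn(params["ffn2"], [a - b for a, b in zip(h_o, h_t)])  # h_o - h_t
    m3 = ffn(params["ffn3"], [a * b for a, b in zip(h_o, h_t)])  # h_o ⊙ h_t
    return ffn(params["ffn4"], m1 + m2 + m3)                     # h_f

d = 4
rng = random.Random(0)
params = {
    "ffn1": init_ffn(2 * d, d, rng),
    "ffn2": init_ffn(d, d, rng),
    "ffn3": init_ffn(d, d, rng),
    "ffn4": init_ffn(3 * d, d, rng),
}
h_f = fusion_layer([0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1], params)
```

The fused vector `h_f` keeps the encoder dimension `d`, which is what allows it to be gated against `h_o` element by element in the next step.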

[0075] The core function of the gating fusion layer is to calculate the correlation between features and intent, retaining relevant core features, weakening irrelevant and redundant ones, and computing dynamic gating weights. The original feature $h_o$ and the fusion feature $h_f$ are concatenated, and the allocation ratio is calculated with the Sigmoid function. The core calculation formula is as follows:

[0076] $g = \sigma(W_g [h_o; h_f] + b_g)$

[0077] Feature filtering and fusion: instead of adding the features directly, the computed gating weight $g$ is used to perform a smooth weighted summation.

[0078] The formula is as follows:

[0079] $h_c = g \odot h_o + (1 - g) \odot h_f$

[0080] When the input contains strong domain identifiers (such as hexadecimal addresses), the model adaptively increases the corresponding gating weights, achieving dynamic intent focusing. The parameters in the formulas are explained below:

[0081] $g$: the gating weight, with values in [0, 1], computed with the Sigmoid activation function $\sigma$ and used to measure the relevance between features and intent. The closer a value is to 1, the stronger the relevance to the intent; the closer it is to 0, the more redundant and irrelevant the feature.

[0082] $W_g$, $b_g$: the trainable weight matrix and bias term of the gating layer, through which the model adaptively learns the rules for selecting intent-related features during training.

[0083] $[h_o; h_f]$: the concatenated output features, which provide complete feature information for the gating weight calculation.

[0084] $\odot$: element-wise multiplication, which applies the gating weights to the features element by element, thereby filtering out redundant features.

[0085] $h_c$: the core features obtained after gating, which retain the core information of the two encoder output features while removing redundant and irrelevant content, thus focusing on the intent.
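A minimal sketch of the gating step, in which `h_o` is the original feature, `h_f` the fused feature, and `w_g`, `b_g` the gate parameters. The convex combination `g*h_o + (1-g)*h_f` is this sketch's reading of the "smooth weighted summation" and should be treated as an assumption.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def gated_fusion(h_o, h_f, w_g, b_g):
    """g = sigmoid(W_g [h_o; h_f] + b_g); h_c = g*h_o + (1 - g)*h_f, element-wise."""
    concat = h_o + h_f  # [h_o; h_f]
    g = [sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
         for row, b in zip(w_g, b_g)]
    return [gi * o + (1.0 - gi) * f for gi, o, f in zip(g, h_o, h_f)]
```

With zero weights and biases every gate is 0.5, so the result is the average of the two inputs; a large positive bias drives the gate toward 1 and the output toward `h_o`.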

[0086] To improve the stability and discriminative power of the features and to avoid interference from features at different scales during subsequent decoding, L2 normalization is applied to the core features output by the gating layer, yielding a normalized vector representation that mitigates the impact of differing feature scales. The normalization formula is as follows:

[0087] $\hat{h} = \dfrac{h_c}{\lVert h_c \rVert_2}$

[0088] where $\lVert h_c \rVert_2$ denotes the L2 norm of the feature $h_c$. The normalized feature $\hat{h}$ keeps the same dimensionality while its stability and comparability are improved, helping the subsequent decoding process focus accurately on the target code.
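A worked example of the normalization: the vector (3, 4) has L2 norm 5, so it maps to (0.6, 0.8), which has unit length and the same direction. The small `eps` guard against a zero vector is an addition of this sketch, not something the text specifies.

```python
import math

def l2_normalize(h, eps=1e-12):
    """Divide a vector by its L2 norm; eps guards against a zero vector (assumption)."""
    norm = math.sqrt(sum(x * x for x in h))
    return [x / max(norm, eps) for x in h]

h_hat = l2_normalize([3.0, 4.0])  # norm is 5, so each component is divided by 5
```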

[0089] S4: Use a constraint decoding operator based on a symbolic finite state machine to decode the high-quality semantic representations and search for the final code generation result.

[0090] This invention defines the SFSM as a six-tuple $\mathcal{M} = (Q, \Sigma, \delta, q_0, F, \Gamma)$.

[0091] Here, $Q$ is a finite set of states, partitioned by language domain into $Q = Q_{asm} \cup Q_{py}$; $\Sigma$ is the input alphabet, i.e., the model's vocabulary $V$; $\delta$ is the state transition function; $q_0$ is the initial state (Start_State); $F$ is the set of termination states (EOF_State); and $\Gamma$ is the contextual memory stack, used to handle nested parentheses or consecutive indentation levels.

[0092] 1. Assembly domain state subspace

[0093] For assembly datasets, the model enters the assembly-domain state subspace at the beginning of the decoding phase. The core states include: the labeling state, which monitors instruction anchors such as D1:, F1:, and decode:; the opcode state, which restricts the candidate set to valid x86 instruction mnemonics (such as mov, cmp, not, xor); the operand state, which constrains register matching to avoid illegal operand pairs with conflicting bit widths, such as cmp ax, bl; and the pure data state: when the db (Define Byte) directive is detected, the state machine unconditionally transitions to this state, and its permitted set is strictly limited to {0-9, a-f, A-F, x, ',', whitespace}, eliminating spurious characters in encoded shellcode sequences (such as encodedshellcode: db 0x32,0x51...).
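The assembly-domain states can be sketched as a small token-level state machine. The state names, the opcode subset and the simplified transition rules are assumptions made for illustration, and operand bit-width checking is omitted.

```python
import string

OPCODES = {"mov", "cmp", "not", "xor", "jmp", "db"}  # hypothetical subset; db is a directive
# Characters permitted in the pure data state: {0-9, a-f, A-F, x, ',', whitespace}.
DATA_CHARS = set(string.hexdigits) | {"x", ","} | set(string.whitespace)

class AsmStateMachine:
    """Toy assembly-domain SFSM: LABEL -> OPCODE -> OPERAND, or DATA after 'db'."""

    def __init__(self):
        self.state = "LABEL"

    def is_valid(self, token):
        if self.state == "LABEL":   # a label anchor such as "decode:" or an opcode
            return token.endswith(":") or token in OPCODES
        if self.state == "OPCODE":
            return token in OPCODES
        if self.state == "DATA":    # only hex digits, 'x', ',' and whitespace
            return bool(token) and set(token) <= DATA_CHARS
        return True                 # OPERAND: register/bit-width checks omitted

    def step(self, token):
        if token == "db":
            self.state = "DATA"     # unconditional transition on Define Byte
        elif token == "\n":
            self.state = "LABEL"    # a new line starts a new instruction
        elif self.state == "LABEL" and token.endswith(":"):
            self.state = "OPCODE"
        elif self.state in ("LABEL", "OPCODE"):
            self.state = "OPERAND"
```

After "encodedshellcode:" and "db", a byte list such as "0x32,0x51" is still accepted, while a stray mnemonic or a non-hex character is rejected.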

[0094] 2. Python Domain State Subspace

[0095] For Python code, the state machine enters the Python-domain state subspace. The core states include: the indentation state, which, under the control of the context stack, forces the generation of the corresponding level of indentation tokens when a newline character \n is encountered; and the slot injection state: when the context semantics trigger parameter passing, the state machine queries ... and forces the decoder to choose from a limited set of candidate variables (such as target_ip, payload_buf), ensuring consistent references to context symbols.
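The Python-domain behaviour can be sketched with an indentation stack and a slot set. This is a toy under explicit assumptions: a ":" token is treated as a block opener, an opening parenthesis is used as a crude trigger for "parameter passing", and dedenting (popping the stack) is omitted.

```python
class PyStateMachine:
    """Toy Python-domain SFSM with an indentation stack and slot injection."""

    def __init__(self, slot_vars, indent="    "):
        self.stack = [0]            # contextual memory stack of indentation levels
        self.slot_vars = set(slot_vars)
        self.indent = indent
        self.state = "FREE"

    def is_valid(self, token):
        if self.state == "INDENT":  # must emit exactly the current indentation
            return token == self.indent * self.stack[-1]
        if self.state == "SLOT":    # must pick a variable from the slot map
            return token in self.slot_vars
        return True

    def step(self, token):
        if self.state in ("INDENT", "SLOT"):
            self.state = "FREE"     # the forced token has been emitted
        elif token == ":":
            self.stack.append(self.stack[-1] + 1)  # block opener: one level deeper
        elif token == "\n" and self.stack[-1] > 0:
            self.state = "INDENT"
        elif token == "(":
            self.state = "SLOT"     # crude stand-in for "parameter passing"
```

After the tokens "if", "ready", ":" and a newline, only four spaces are legal; after "send" and "(", only `target_ip` or `payload_buf` can follow.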

[0096] Constraint templates are structured descriptions of code containing key information such as valid variable definitions, structural specifications, and parameter ranges, and they are the core basis for constraint extraction. Based on the preprocessed parsed input S' in the model input and a constraint template library, this invention automatically extracts three types of core constraints and constructs a set of allowed tokens (allowed_tokens). The specific extraction rules are as follows:

[0097] Variable constraint extraction: all predefined legal variables are extracted from the constraint template to construct a variable constraint set. For example, if the template defines var0 as the payload, var1 as the decoder stub, and var2 as the trigger function, the variable constraint set is {"var": ["var0", "var1", "var2"]}, which explicitly restricts the model to generating variable names within this set and prohibits illegal variables.

[0098] Structural constraint extraction: the code structure of the constraint template is analyzed to extract valid code snippets, statement formats, and logical sequences, from which structural constraints are constructed. Examples include the fixed "payload definition, trigger, result return" logic in the Shellcode constraint template and the "request construction, parameter injection, execution" statement order in the Django template; the code generated by the model must conform to these structural specifications.

[0099] Parameter constraint extraction: explicit parameter ranges and valid values are extracted from the constraint template to construct parameter constraints, ensuring that the parameters generated by the model meet the requirements of the actual scenario.

[0100] The constraint extraction process combines rule-based parsing with regular expression matching to ensure that the extracted constraints are consistent with the preprocessed input S' of the input encoding module, and it supports dynamic updates to adapt to the template constraint requirements of different scenarios. After extraction, the constraint information is stored in a structured constraint rule base to support subsequent constraint decoding.
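Variable-constraint extraction can be sketched with a single regular expression over a constraint template. The template text, the helper names, and the idea of merging variables with a hypothetical set of structural tokens into `allowed_tokens` are assumptions for illustration only.

```python
import re

def extract_variable_constraints(template):
    """Collect var# placeholders in a constraint template, in numeric order."""
    names = set(re.findall(r"\bvar\d+\b", template))
    return {"var": sorted(names, key=lambda name: int(name[3:]))}

def build_allowed_tokens(constraints, structural_tokens):
    """allowed_tokens = structural/keyword tokens plus the legal variables."""
    return set(structural_tokens) | set(constraints["var"])

template = "var0 = payload\nvar1 = decoder_stub(var0)\nvar2(var1)"  # hypothetical template
constraints = extract_variable_constraints(template)
allowed = build_allowed_tokens(constraints, ["=", "(", ")"])
```

Sorting numerically rather than lexically keeps "var10" after "var2", and any variable outside the set (for example "var3" here) is simply absent from `allowed`.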

[0101] Enforcing the boundary constraints: during the decoder's forward propagation for each generated token, let the logits output by the last layer of the model be $z_t \in \mathbb{R}^{|V|}$.

[0102] Based on its current state $q_t$, the SFSM generates a binary validity indicator vector $v_t \in \{0, 1\}^{|V|}$: if the $i$-th token in the vocabulary is deemed valid by the state machine, then $v_{t,i} = 1$; otherwise $v_{t,i} = 0$.

[0103] This invention constructs a dynamic penalty masking operator, as follows:

[0104] $M_{t,i} = \begin{cases} 0, & v_{t,i} = 1 \\ -\infty, & v_{t,i} = 0 \end{cases}$

[0105] where $t$ denotes the current decoding time step, i.e., the step at which the model generates the $t$-th token, and $i$ denotes the $i$-th candidate word in the model vocabulary.

[0106] Applying the mask to the original logits yields the constrained reconstructed logits:

[0107] $\tilde{z}_{t,i} = z_{t,i} + M_{t,i}$
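The validity vector, mask and logit reconstruction can be traced on a four-token toy vocabulary. Using `float("-inf")` for illegal tokens is this sketch's reading of the penalty mask (a large finite negative constant would behave similarly), and the vocabulary and logits are made up for illustration.

```python
NEG_INF = float("-inf")

def validity_vector(vocab, allowed_tokens):
    """v_t: 1 if the SFSM allows the token in the current state, else 0."""
    return [1 if tok in allowed_tokens else 0 for tok in vocab]

def dynamic_mask(v):
    """M_t: 0 for legal tokens, -inf for illegal ones."""
    return [0.0 if vi == 1 else NEG_INF for vi in v]

def reconstruct_logits(z, mask):
    """Reconstructed logits: z_t + M_t, element-wise."""
    return [zi + mi for zi, mi in zip(z, mask)]

vocab = ["var0", "var1", "payload", "mov"]
v = validity_vector(vocab, {"var0", "var1"})
z_tilde = reconstruct_logits([1.0, 2.0, 5.0, 3.0], dynamic_mask(v))
# "payload" had the highest raw logit but is now -inf, so the argmax moves to "var1".
best = vocab[max(range(len(vocab)), key=lambda i: z_tilde[i])]
```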

[0108] Finally, the Softmax layer computes the output probability distribution at step $t$:

[0109] $P(y_t = w_i \mid y_{<t}, H) = \dfrac{\exp(\tilde{z}_{t,i})}{\sum_{j=1}^{|V|} \exp(\tilde{z}_{t,j})}$

[0110] where $P(\cdot)$ denotes probability; $t$ is the current time step; $y_t$ is the token the model generates at step $t$ (a random variable); $w_i$ is a specific candidate word, the $i$-th entry of the model vocabulary, so that $y_t = w_i$ means "the word generated at the current step is exactly the $i$-th word in the vocabulary"; $y_{<t} = (y_1, \ldots, y_{t-1})$ is the sequence of all previously generated tokens, reflecting that in autoregressive generation each prediction depends strictly on previously generated content; $H$ is the contextual feature representation, typically the hidden-state matrix output by the encoder after encoding the input sequence, which carries the semantic information of the input text; $\tilde{z}_{t,i}$ is the unnormalized score computed by the model for candidate word $w_i$ at step $t$ after masking; $\exp(\cdot)$ is the natural exponential function, which maps every finite score to a positive number and amplifies the differences between scores, so that higher-scoring words gain a more pronounced advantage once converted to probabilities; $|V|$ is the vocabulary size, i.e., the total number of distinct tokens the model can recognize and generate; $j$ is the summation index, running over the vocabulary from the first word to the $|V|$-th; and the denominator $\sum_{j=1}^{|V|} \exp(\tilde{z}_{t,j})$ is the normalization constant (partition function), the sum of the exponentiated scores of all candidate words.
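The softmax over reconstructed logits can be computed as below. Subtracting the maximum logit leaves the result mathematically unchanged while avoiding overflow; because `math.exp(float("-inf"))` is `0.0`, masked tokens receive exactly zero probability. At least one logit must be finite.

```python
import math

def softmax(logits):
    """exp(z_i) / sum_j exp(z_j), computed with max subtraction for stability."""
    m = max(logits)  # requires at least one finite logit
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, float("-inf")])  # the third token was masked by the SFSM
```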

Claims

1. A method for automatically generating code based on constraint decoding, characterized in that it includes the following steps: S1: obtaining a training dataset and dividing it into training and test sets; S2: preprocessing the dataset and performing intent parsing to extract the original semantic features and the template parsing features, and constructing the constraint template; S3: inputting the original semantic features and the template parsing features into the model and using a gated fusion mechanism to adaptively filter and fuse the features, which includes constructing intermediate representation vectors through three different combinations and calculating gating weights for dynamic weighted fusion, to obtain high-quality semantic representations; S4: abstracting the syntax specifications of the target code and the slot mapping table built at the front end into a symbolic finite state machine, calculating the legal candidate space in real time at each decoding time step, and constructing a dynamic logits mask operator to constrain the candidate word space and obtain the final code; S5: using the trained model to process the test set and obtain the test results.

2. The method as described in claim 1, characterized in that step S2 specifically includes the following steps: S21: performing stop-word filtering and word segmentation on the input sequence; S22: identifying and extracting normalizable tokens from natural language intents using an intent parser, and filtering out non-normalizable tokens using a language keyword dictionary; S23: replacing the selected standardized tokens with placeholders to build a slot mapping table and obtain the standardized template parsing input; S24: using a dual-encoder architecture, obtaining the original semantic features by extracting features from the original natural language intent with the original input encoder, and obtaining the template parsing features by encoding the standardized template parsing input with the preprocessing parsing input encoder.

3. The method for automatically generating code based on constraint decoding according to claim 1, characterized in that the process of adaptive feature selection and fusion using the gated fusion mechanism includes: in the model, the semantic vectors obtained from the two encoders are first fused; specifically, the fusion layer models the input vectors through three different feature combinations, characterizing semantic relationships from three perspectives: concatenation, difference, and similarity, as follows: $m_1 = \mathrm{FFN}_1([h_o; h_t])$; $m_2 = \mathrm{FFN}_2(h_o - h_t)$; $m_3 = \mathrm{FFN}_3(h_o \odot h_t)$; where $h_o$ is the semantic feature vector output by the original input encoder; $h_t$ is the semantic feature vector output by the template parsing encoder; $\mathrm{FFN}_1$, $\mathrm{FFN}_2$ and $\mathrm{FFN}_3$ denote single-layer feedforward neural networks with three independent sets of parameters; $[\cdot;\cdot]$ denotes vector concatenation; $-$ denotes vector difference, used to highlight the differences between the two semantic vectors; $\odot$ denotes element-wise multiplication, used to characterize the similarity between the vectors; subsequently, the three intermediate representation vectors $m_1$, $m_2$ and $m_3$ are concatenated and fed into another feedforward neural network to obtain the final output of the fusion layer, $\mathrm{Fusion}(h_o, h_t) = \mathrm{FFN}_4([m_1; m_2; m_3])$, which is abbreviated as $h_f$ in the subsequent formulas; the original semantic feature $h_o$ and the initial fusion feature $h_f$ are concatenated to calculate the gating weight $g$; based on the gating weight $g$, the features are dynamically weighted and fused, and L2 normalization is then performed to obtain the high-quality semantic representation.

4. The method for automatically generating code based on constraint decoding according to claim 3, characterized in that the gating weight $g$ is calculated as $g = \sigma(W_g [h_o; h_f] + b_g)$; and feature selection and fusion are performed by weighting the features with the gating weight, using the formula $h_c = g \odot h_o + (1 - g) \odot h_f$.

5. The method for automatically generating code based on constraint decoding according to claim 1, characterized in that the code generation model consists of a cascaded encoding layer (composed of two CodeBERT pre-trained models), a semantic attention layer, a gating fusion layer, and a constraint decoding layer.

6. The method for automatically generating code based on constraint decoding according to claim 1, characterized in that the process of decoding the high-quality semantic representations with the constraint decoding operator based on a symbolic finite state machine includes: S41: extracting variable constraints, structural constraints, and parameter constraints from the constraint template, and combining them with the slot mapping table built at the front end to construct a symbolic finite state machine; S42: at each time step of the decoding phase, generating a binary validity indicator vector based on the current syntax state of the symbolic finite state machine to mark whether each token in the candidate vocabulary is legal; S43: constructing a dynamic feature mask matrix from the binary validity indicator vector, and adding it to the original logits output by the model to obtain the reconstructed logits; S44: normalizing the reconstructed logits into probabilities with the Softmax function, and using a beam search strategy to retain the candidate paths with the highest cumulative log probability until the end-of-sequence symbol is generated, then outputting the code.

7. The method for automated code generation based on constraint decoding according to claim 6, characterized in that the dynamic feature mask matrix is constructed as follows: $M_{t,i} = \begin{cases} 0, & v_{t,i} = 1 \\ -\infty, & v_{t,i} = 0 \end{cases}$; where $M_{t,i}$ denotes the mask value of the $i$-th candidate word in the vocabulary at time step $t$, and $v_{t,i}$ is the $i$-th component of the binary validity indicator vector generated by the symbolic finite state machine: if the $i$-th candidate word is valid, $v_{t,i} = 1$; otherwise $v_{t,i} = 0$.

8. The method for automatically generating code based on constraint decoding according to claim 7, characterized in that the reconstructed logits and the output probability distribution are calculated as follows: $\tilde{z}_{t,i} = z_{t,i} + M_{t,i}$; $P(y_t = w_i \mid y_{<t}, H) = \frac{\exp(\tilde{z}_{t,i})}{\sum_{j=1}^{|V|} \exp(\tilde{z}_{t,j})}$; where $\tilde{z}_{t,i}$ denotes the reconstructed logit; $z_{t,i}$ denotes the original unnormalized logit; $P(y_t = w_i \mid y_{<t}, H)$ denotes the probability of generating the $i$-th candidate word $w_i$ at step $t$; $y_{<t}$ denotes the previously generated sequence; $H$ denotes the high-quality semantic representation; and $|V|$ denotes the vocabulary size.