Verifiable code data synthesis method and system based on atomization decomposition and reorganization

By atomizing and recombining code tasks, constructing an atomic element space and performing information theory optimization, logically coherent new element combinations are generated. This solves the problems of logical topological limitations and insufficient verification rigor in existing technologies, achieves high-quality code dataset generation, and improves the effectiveness of reinforcement learning.

CN122240500A · Pending · Publication Date: 2026-06-19 · INST OF SOFTWARE - CHINESE ACAD OF SCI

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
INST OF SOFTWARE - CHINESE ACAD OF SCI
Filing Date
2026-04-15
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies suffer from limitations in logical topology, difficulty in expanding training value, and insufficient rigor in verification during code generation, leading to data scarcity and reward cheating in reinforcement learning.

Method used

The method is based on atomization decomposition and recombination. Atomic elements are extracted from seed instances and the task pattern is optimized to construct an atomic element space, with utility indices quantified by information-theoretic signals guiding the optimization. The atomic elements are then recombined into logically coherent new element combinations. Finally, adversarial solution space refinement yields a code dataset with high originality, high difficulty, and rigorous verification capability.

Benefits of technology

It achieves original logic generation that transcends the boundaries of seed data distribution, improves the scalability of training value and the rigor of verification, solves the data bottleneck in reinforcement learning, and avoids model overfitting and reward cheating.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a method and system for synthesizing verifiable code data based on atomization decomposition and recombination, belonging to the field of large-scale reinforcement learning data synthesis technology. To address the problems of code data scarcity and reward cheating caused by limited logical topology and insufficient verification rigor in existing data synthesis schemes, this invention constructs an atomic element space by extracting atomic elements and optimizing patterns from seed instances; logically recombines these atomic elements to obtain new element combinations; synthesizes tasks based on these new element combinations to generate code task descriptions; verifies the validity of these descriptions to obtain verifiable code synthesis data; and finally, refines the solution space to obtain test-enhanced code synthesis data. This invention enables original logical generation that transcends the distribution boundaries of seed data, significantly improves the discriminative power of the test set, ensures the rigor and robustness of the synthesized data verification, and effectively solves the data bottleneck in code generation tasks in reinforcement learning.

Description

Technical Field

[0001] This invention belongs to the field of large-scale reinforcement learning data synthesis technology, specifically relating to a method and system for synthesizing verifiable code data based on atomization decomposition and recombination.

Background Technology

[0002] In the context of the deep integration of artificial intelligence and software engineering, the code generation and logical reasoning capabilities of Large Language Models (LLMs) have become core indicators for measuring the intelligence level of models. To further unleash the potential of these models, Reinforcement Learning with Verifiable Rewards (RLVR) has evolved into a key paradigm for improving model performance on complex programming tasks. RLVR leverages the inherent executable nature of code, providing objective and real-time reward signals to the model through deterministic unit tests, thereby optimizing the model's thought process during reinforcement learning.

[0003] Current RLVR frameworks rely heavily on large-scale, high-quality, and rigorously tested verifiable code datasets. To address the scarcity of high-quality data, existing technologies primarily employ automated synthesis schemes. One approach is heuristic seed expansion, which uses an LLM as a rewriter to extend existing seed tasks. For example, Evol-Instruct (see WizardCoder: Empowering Code Large Language Models with Evol-Instruct) generates data through multiple recursive iterations, while KodCode (see KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding) leverages the in-context learning capabilities of LLMs, guiding the model to perform analogical expansion by showing it code examples. Another approach is knowledge-based expansion, which guides synthesis by introducing external knowledge bases or structured signals. For example, OSS-Instruct and Package Instruct collect real code snippets from open-source platforms or analyze technical documentation, and use a teacher model to reverse engineer them into instruction pairs that capture real library call scenarios.

[0004] However, existing technologies suffer from the following problems in practical applications. First, there are limitations in logical topology. Existing methods are constrained by the heuristic extension paradigm itself. Whether through semantic rewriting or rule-based constraint addition, they essentially preserve the logical combination structure of the original seed task, so the logical topology of the synthesized tasks stays confined to the distribution boundaries of the seed data and no novel algorithmic challenges are generated. This lack of logical diversity in the dataset can cause the model to quickly fall into local optima and overfit during RLVR. Second, the training value is difficult to extend. Because the synthesized data is limited in logical difficulty and novelty, it often fails to reach the model's capability boundaries, leading to premature reward saturation during training, and the training value fails to grow proportionally with the scale of synthesis. Finally, validation rigor is insufficient. Most solutions rely on test sets generated by the model in a single pass, which typically contain only common inputs and ignore boundary conditions and hidden logical flaws. In RLVR scenarios, such low-resolution testing can lead to reward cheating (commonly called reward hacking), where erroneous code generated by the model passes because the test cases are incomplete, thus misleading the optimization direction of reinforcement learning.

Summary of the Invention

[0005] The purpose of this invention is to address the problems of data scarcity and reward cheating in reinforcement learning caused by the limited logical topology, difficulty in expanding training value, and insufficient verification rigor of existing synthetic data schemes. This invention proposes a verifiable code data synthesis method and system based on atomized decomposition and recombination. Through atomic element extraction and pattern optimization, controlled element recombination, and adversarial solution space refinement, this method achieves the effect of exceeding the distribution boundary of seed data and generating code datasets with high originality, high difficulty, and rigorous verification capabilities, thereby solving the data bottleneck in code generation tasks in reinforcement learning.

[0006] To achieve the above objectives, the present invention adopts the following technical solution.

[0007] A method for synthesizing verifiable code data based on atomization decomposition and recombination includes the following steps: extracting atomic elements from seed instances and optimizing the pattern to construct an atomic element space; logically recombining the atomic elements in the atomic element space to obtain new element combinations; performing task synthesis based on the new element combinations to generate a code task description; verifying the validity of the code task description to obtain verifiable code synthesis data; and performing solution space refinement on the verifiable code synthesis data to obtain test-enhanced code synthesis data.

[0008] Furthermore, extracting atomic elements from seed instances and optimizing the pattern to construct an atomic element space includes: decomposing the seed instance according to the task mode and extracting the multiple atomic elements that constitute the code task; quantifying the utility index of the atomic elements using information-theoretic signals, where the utility index includes a first index characterizing element diversity and a second index characterizing the correlation between elements; and, based on the utility index, performing optimization operations on the current task mode and iteratively updating it until the first index converges or a preset condition is reached, thereby constructing the atomic element space.

[0009] Furthermore, the utility index of the atomic element is quantified using information-theoretic signals, including: The Shannon entropy of the atomic element is calculated as the first indicator; The conditional mutual information of atomic elements under a given task is calculated as the second index.

[0010] Furthermore, based on the utility metric, optimization operations are performed on the current task mode and iteratively updated, including: The information theory signal is used to identify redundant elements, fuzzy elements, or highly discriminative elements among the atomic elements. The task mode corresponding to the atomic element is mapped and adjusted based on the identification results.

[0011] Furthermore, logically recombining the atomic elements in the atomic element space to obtain new element combinations includes: selecting a core element from the atomic element space as an anchor point; retrieving existing successful combination patterns from the element pool; and, based on the anchor point and with reference to the successful combination patterns, generating a logically coherent new element combination using a probability distribution.

[0012] Furthermore, performing task synthesis based on the new element combination to generate a code task description includes: providing a structured template that includes a problem description, input / output formats, and constraints; and injecting the new element combination into the structured template to generate a code task description with linguistic diversity.

[0013] Furthermore, verifying the validity of the code task description to obtain verifiable code synthesis data includes: generating a corresponding reference solution and test case generator for the code task description; running the test case generator to generate a test case set; and running the reference solution in an isolated sandbox environment to process the test case set, judging the validity of the code task description based on the consistency of the running results, and filtering to obtain verifiable code synthesis data.

[0014] Furthermore, the validity of the code task description is verified to obtain verifiable code synthesis data, which also includes: The reference scheme was logically verified using static code analysis tools. By combining the logical verification results and the execution verification results, verifiable code synthesis data is obtained.

[0015] Furthermore, performing solution space refinement on the verifiable code synthesis data to obtain test-enhanced code synthesis data includes: generating a set of near-error solutions with logical flaws for the verifiable code synthesis data; evaluating the near-error solutions using the current test case set corresponding to the verifiable code synthesis data and calculating their failure rate; and iteratively updating the test case generator based on the failure rate until the generated test cases can distinguish between the reference solution and the near-error solutions, thus obtaining test-enhanced code synthesis data.

[0016] A verifiable code data synthesis system based on atomization decomposition and recombination, comprising: an element extraction module, used to extract atomic elements from seed instances, optimize the pattern, and construct an atomic element space; an element recombination module, used to logically recombine the atomic elements in the atomic element space to obtain new element combinations; a task synthesis module, used to synthesize tasks based on the new element combinations and generate code task descriptions; a task verification module, used to verify the validity of the code task descriptions and obtain verifiable code synthesis data; and a data refinement module, used to perform solution space refinement on the verifiable code synthesis data to obtain test-enhanced code synthesis data.

[0017] The present invention has achieved the following beneficial effects.

[0018] 1. This invention employs a controlled element recombination mechanism. By selecting core elements as anchor points and conditionally sampling and recombining other elements, it breaks through the logical topological limitations of traditional heuristic expansion, achieves original logical generation that transcends the distribution boundaries of seed data, and solves the problem of model overfitting caused by the uniform logic of synthesized tasks.

[0019] 2. This invention introduces information-oriented pattern optimization, which uses Shannon entropy and conditional mutual information to quantify the diversity and redundancy of atomic elements. Based on feedback signals, it drives the large model to automatically iterate and update the task pattern, achieving rapid adaptation and high-quality atomic element space construction with extremely low manual input.

[0020] 3. This invention employs an adversarial solution space refinement technique, which actively generates logically similar near-fault solutions and iteratively enhances the rigor of test cases, significantly improving the discriminative power of the test set. This fundamentally eliminates reward cheating in reinforcement learning and ensures the rigor and robustness of synthetic data verification.

[0021] 4. This invention ensures the validity and solvability of code task descriptions by performing execution-guided validity verification and sandbox environment filtering, enabling the training value of synthetic data to continuously increase with scale, effectively solving the scarcity and scalability bottleneck of verifiable code data in reinforcement learning training.

Attached Figure Description

[0022] Figure 1 is an overall flowchart of the verifiable code data synthesis method based on atomization decomposition and recombination in the embodiments; Figure 2 is a schematic diagram of element extraction and pattern optimization in the embodiments; Figure 3 is a schematic diagram of adversarial solution space refinement in the embodiments; Figure 4 is a block diagram of the verifiable code data synthesis system based on atomization decomposition and recombination in the embodiments; Figure 5 shows the Pass@8 (%) performance of Qwen2.5-Coder-7B-Instruct on the LCB-v5 dataset.

Detailed Implementation

[0023] To make the various technical features, advantages, or effects of the present invention more apparent and understandable, detailed descriptions are provided below through embodiments.

[0024] This invention proposes a method for synthesizing verifiable code data based on atomized decomposition and recombination. As shown in Figure 1, the overall process includes the following steps:

[0025] Step S1: Extract atomic elements and optimize the pattern of the seed instance to construct the atomic element space.

[0026] Specifically, this invention defines a code task as a combination of a series of atomic elements. The task mode is defined as:

$$\mathcal{S} = \{e_1, e_2, \ldots, e_n\}, \quad e_i = (n_i, d_i, v_i)$$

where $\mathcal{S}$ denotes the task mode; $e_i$ denotes the $i$-th atomic element; $n_i$ denotes the element name; $d_i$ denotes a description of the element's content; and $v_i$ denotes the dimension along which the element can vary.

[0027] By sampling a set of high-quality seed instances and decomposing them into constituent elements according to the task mode, semantic consistency within the element space can be ensured, which is beneficial for subsequent recombination and synthesis.

[0028] In an optional embodiment of the present invention, the atomic elements in the task mode can be specifically divided into core background, algorithm core, data structure, and constraints in the field of algorithm programming. The core background describes the real or simulated scenario in which the task occurs; the algorithm core describes the key algorithmic logic required to solve the problem, such as prefix sums and sliding windows; the data structure serves as the main carrier of task operations, including binary trees and discretized arrays; and the constraints include time limits, space limits, and specific input value ranges.
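As an illustration only, the task mode and its four element categories could be represented with a small data structure. The class names `AtomicElement` and `TaskSchema`, the field names, and the example strings below are hypothetical and are not taken from the patent; this is a minimal sketch, assuming each element carries a name, a content description, and a dimension of variation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class AtomicElement:
    """One atomic element of a task mode (hypothetical representation)."""
    name: str          # element name
    description: str   # description of the element's content
    variation: str     # dimension along which the element can vary


@dataclass
class TaskSchema:
    """A task mode: an ordered collection of atomic elements."""
    elements: List[AtomicElement] = field(default_factory=list)

    def names(self) -> List[str]:
        return [e.name for e in self.elements]


# Example schema for algorithm programming tasks (illustrative values).
ALGO_SCHEMA = TaskSchema([
    AtomicElement("core_background", "scenario in which the task occurs", "domain"),
    AtomicElement("algorithm_core", "key algorithmic logic", "technique, e.g. prefix sums"),
    AtomicElement("data_structure", "main carrier of task operations", "structure type"),
    AtomicElement("constraints", "time, space, and input-range limits", "tightness"),
])
```

A decomposed seed instance would then be a mapping from each element name to a concrete value, which is the form the later sketches assume.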

[0029] In an optional embodiment of the present invention, as shown in Figure 2, step S1 may include: Step S11: Decompose the seed instance according to the task mode and extract multiple atomic elements that constitute the code task.

[0030] Specifically, this process utilizes a large language model to structure seed instances according to a pre-defined task pattern. As an alternative, atomic elements can also be obtained from a pre-built code knowledge graph within the domain documentation. In this approach, atomic elements are represented as entities and relationships within the knowledge graph, where entities include application interfaces, algorithm nodes, or error types.

[0031] Step S12: Quantify the utility index of the atomic element using information theory signals. The utility index includes a first index for characterizing element diversity and a second index for characterizing the correlation between elements.

[0032] In an optional embodiment of the present invention, step S12 may include: Step S121: Calculate the Shannon entropy of the atomic element as the first index.

[0033] Step S122: Calculate the conditional mutual information of atomic elements under a given task as the second index.

[0034] Specifically, to support rapid task adaptation with minimal human expert input, this invention introduces information-driven pattern optimization. The process begins with a coarse initial task pattern generated by a large language model. Using a small number of seed instances, this invention extracts elements and quantifies their utility using two information-theoretic signals: Shannon entropy $H(e_i)$ and conditional mutual information $I(e_i; Q \mid e_j)$.

[0035] Shannon entropy $H(e_i)$ is used to quantify the diversity and discriminative power of elements. Specifically, this invention uses the all-MiniLM-L6-v2 embedding model to vectorize text elements and maps them to a discrete semantic label space through K-Means clustering. It then estimates the probability distribution $p(c)$ from the frequency of each cluster label $c$ in the dataset and calculates the Shannon entropy:

$$H(e_i) = -\sum_{c} p(c) \log p(c)$$
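The entropy estimate described above can be sketched in a few lines. The sketch assumes the embedding and K-Means steps have already produced one cluster label per extracted element value, and it uses a base-2 logarithm, which the text does not specify; the function name is illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def shannon_entropy(labels: Sequence[Hashable]) -> float:
    """Entropy (in bits) of the empirical distribution of cluster labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # p(c) is estimated as the relative frequency of each cluster label.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

An element whose values all fall into one cluster has entropy 0 and carries little discriminative power, while values spread evenly over k clusters give the maximum of log2(k) bits.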

[0036] Conditional mutual information $I(e_i; Q \mid e_j)$ is used to identify redundancy or complementarity between elements. Specifically, this invention collects statistics over triples $(e_i, e_j, Q)$ to estimate the joint distribution and the marginal distributions, and then calculates the conditional mutual information, which measures the additional information that a new element $e_i$ provides for generating the problem $Q$ given the known element $e_j$.
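A plug-in estimate of conditional mutual information from observed triples can be sketched as follows. It uses the standard identity I(X;Y|Z) = Σ p(x,y,z) log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ] with empirical frequencies. Treating X as the new element's label, Y as the problem's label, and Z as the known element's label is an assumption about how the triples are arranged, and the function name is illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple

Triple = Tuple[Hashable, Hashable, Hashable]  # (x, y, z)


def conditional_mutual_information(triples: Sequence[Triple]) -> float:
    """Plug-in estimate of I(X; Y | Z) in bits from (x, y, z) observations."""
    n = len(triples)
    if n == 0:
        return 0.0
    c_xyz = Counter(triples)
    c_xz = Counter((x, z) for x, _, z in triples)
    c_yz = Counter((y, z) for _, y, z in triples)
    c_z = Counter(z for _, _, z in triples)
    cmi = 0.0
    for (x, y, z), c in c_xyz.items():
        # The 1/n factors cancel, so raw counts can be used inside the log.
        ratio = (c_z[z] * c) / (c_xz[(x, z)] * c_yz[(y, z)])
        cmi += (c / n) * math.log2(ratio)
    return cmi
```

A value near zero suggests the new element is redundant given the known one, while a large value suggests it is complementary.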

[0037] Step S13: Optimize the current task mode based on the utility index and iteratively update it until the first index converges or reaches a preset condition, and construct the atomic element space.

[0038] In an optional embodiment of the present invention, step S13 may include: Step S131: Use information theory signals to identify redundant elements, fuzzy elements, or highly discriminative elements among the atomic elements.

[0039] Step S132: Adjust the mapping of the task mode corresponding to the atomic element based on the recognition result.

[0040] Specifically, this process is information-driven pattern optimization. First, a rough initial task pattern is generated by a large language model. Guided by the utility index signals, the large language model generates a series of discrete optimization operations $O_t$ organized in a JSON structure. These operations are then applied automatically through code, mapping the current task mode $\mathcal{S}_t$ to the optimized version $\mathcal{S}_{t+1}$. This process iterates until the average element entropy $\bar{H}$ converges to a stable state or the number of iterations $t$ reaches a predefined threshold $T_{\max}$.

[0041] In an optional embodiment of the present invention, the optimization operation can be selected from any of the following: addition, deletion, splitting, merging, or redefinition. If a knowledge graph-based construction scheme is adopted, the reorganization process can be achieved through graph random walks or subgraph mining, thereby generating task combinations in the semantic network.
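The iterative update can be sketched as below, assuming the task mode is held as a mapping from element name to description. The JSON layout (`op`, `name`, `description`, `into`, `names`) is a hypothetical format chosen for this sketch; the patent states only that the operations are organized in a JSON structure. `propose_ops` stands in for the large language model and `mean_entropy` for the entropy signal, and both are assumed callables.

```python
import json
from typing import Callable, Dict

Schema = Dict[str, str]  # element name -> description


def apply_operations(schema: Schema, ops_json: str) -> Schema:
    """Apply add / delete / split / merge / redefine operations to a copy."""
    out = dict(schema)
    for op in json.loads(ops_json):
        kind = op["op"]
        if kind == "add":
            out[op["name"]] = op["description"]
        elif kind == "delete":
            out.pop(op["name"], None)
        elif kind == "redefine":
            if op["name"] in out:
                out[op["name"]] = op["description"]
        elif kind == "split":
            out.pop(op["name"], None)
            for part in op["into"]:
                out[part["name"]] = part["description"]
        elif kind == "merge":
            for name in op["names"]:
                out.pop(name, None)
            out[op["into"]["name"]] = op["into"]["description"]
        else:
            raise ValueError(f"unknown operation: {kind}")
    return out


def optimize_schema(schema: Schema,
                    propose_ops: Callable[[Schema], str],
                    mean_entropy: Callable[[Schema], float],
                    max_iters: int = 5,
                    tol: float = 1e-3) -> Schema:
    """Iterate until the mean entropy stabilizes or max_iters is reached."""
    prev = mean_entropy(schema)
    for _ in range(max_iters):
        schema = apply_operations(schema, propose_ops(schema))
        cur = mean_entropy(schema)
        if abs(cur - prev) < tol:
            break
        prev = cur
    return schema
```

Keeping the operations as data rather than free-form text means each proposed change can be validated and applied deterministically, with unknown operations rejected.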

[0042] In addition to using the aforementioned information theory signals, the model's actual loss function or training gain during the reinforcement learning phase can also be introduced as feedback signals to guide the optimization direction of element patterns.

[0043] In another embodiment of the present invention, the iterative optimization of the task mode can also use the actual loss function or training gain of the model in the reinforcement learning stage as a feedback signal, and guide the optimization direction of the element mode by introducing performance fluctuation data of model training.

[0044] Step S2: Reorganize the atomic elements in the atomic element space to obtain a new element combination.

[0045] Specifically, to avoid generating logically conflicting tasks, this invention employs a controlled element recombination mechanism. By selecting specific elements as logical base points, the model is guided to innovate based on existing successful patterns, thereby generating novel and coherent logical intersections.

[0046] In an optional embodiment of the present invention, step S2 may include: Step S21: Select a core element from the atomic element space as an anchor point.

[0047] Specifically, the core element $e_{core}$ is typically selected as the core algorithm objective. The selection principle is that the core element should carry a large amount of information and have minimal coupling with other elements, thereby providing a richer combination space for the free recombination of the remaining atomic elements and supporting the generation of more diverse and challenging task structures. During the definition and optimization of task patterns, the large language model autonomously determines the selection of the core element based on the pattern structure and information distribution characteristics. As a variation of the recombination strategy, in addition to using a single core element as an anchor point, multi-center joint recombination or random topological recombination can also be adopted to further explore more extreme combinatorial scenarios.

[0048] Step S22: Retrieve existing successful combination patterns from the element pool.

[0049] Specifically, the existing successful combination patterns retrieved from the element pool are denoted $\mathcal{C}$. These patterns are used to provide logical references for subsequent generation processes.

[0050] Step S23: Based on the anchor point and referring to the successful combination pattern, generate a logically coherent new element combination using the probability distribution.

[0051] Specifically, the new element combination $E'$ is generated according to the following distribution:

$$E' \sim P_\theta(\cdot \mid e_{core}, \mathcal{C})$$

where $E'$ denotes the generated new element combination; $e_{core}$ denotes the core element that serves as the anchor point and belongs to the task mode $\mathcal{S}$; $\mathcal{C}$ denotes the existing successful combination patterns retrieved from the element pool; and $P_\theta$ denotes the probability distribution of the large language model with parameters $\theta$. By using existing successful combination patterns as guiding information, the model can be guided to generate logical intersections that are both novel and coherent.
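One way to picture the anchor-and-retrieve step is as prompt construction for the large language model that does the actual sampling. The sketch below covers only that preparation: the retrieval rule (exact match on the anchor value), the prompt wording, and all function names are assumptions made for illustration, and the model call itself is left out.

```python
from typing import Dict, List

Combination = Dict[str, str]  # element name -> concrete value


def retrieve_patterns(pool: List[Combination], anchor_name: str,
                      anchor_value: str, k: int = 2) -> List[Combination]:
    """Return up to k stored combinations that share the anchor value."""
    hits = [c for c in pool if c.get(anchor_name) == anchor_value]
    return hits[:k]


def build_recombination_prompt(anchor_name: str, anchor_value: str,
                               patterns: List[Combination],
                               element_names: List[str]) -> str:
    """Compose the conditioning text: fixed anchor plus reference patterns."""
    lines = [f"Anchor element ({anchor_name}): {anchor_value}",
             "Reference combinations that worked before:"]
    for i, pattern in enumerate(patterns, 1):
        body = "; ".join(f"{k}={v}" for k, v in sorted(pattern.items()))
        lines.append(f"{i}. {body}")
    free = [n for n in element_names if n != anchor_name]
    lines.append("Propose a new, logically coherent value for each of: "
                 + ", ".join(free))
    return "\n".join(lines)
```

The anchor stays fixed while the remaining elements are left free, which matches the idea of conditioning generation on the core element and the retrieved patterns.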

[0052] Step S3: Based on the new element combination, perform task synthesis to generate a code task description.

[0053] In an optional embodiment of the present invention, step S3 may include: Step S31: Provide a structured template that includes a problem description, input / output format, and constraints.

[0054] Specifically, the structured template T specifies the necessary fields for the synthesis task. Template-driven approaches ensure that the generated task description maintains high fidelity for new element combinations while allowing for controlled language variations.

[0055] Step S32: Inject the new element combination into the structured template to generate a code task description with linguistic diversity.

[0056] Specifically, a complete task description Q is generated through a large language model, and the generation process satisfies:

$$Q \sim P_\theta(\cdot \mid E', T)$$

where Q denotes the final synthesized programming task description; $E'$ denotes the generated element combination; and T denotes the structured template. This step ensures that the programming task description Q maintains high fidelity to the element combination $E'$ while allowing controlled language variations.
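The role of the structured template can be illustrated with a small rendering helper that checks the required fields before producing the task text. The field names and layout are hypothetical; the source states only that the template covers the problem description, input / output format, and constraints. The field values are assumed to come from the language model.

```python
from string import Template
from typing import Dict, List

REQUIRED_FIELDS = ("description", "input_format", "output_format", "constraints")

TASK_TEMPLATE = Template(
    "Problem description:\n$description\n\n"
    "Input format:\n$input_format\n\n"
    "Output format:\n$output_format\n\n"
    "Constraints:\n$constraints\n"
)


def missing_fields(fields: Dict[str, str]) -> List[str]:
    """List required template fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]


def render_task(fields: Dict[str, str]) -> str:
    """Render a task description, refusing incomplete field sets."""
    missing = missing_fields(fields)
    if missing:
        raise ValueError(f"missing template fields: {missing}")
    return TASK_TEMPLATE.substitute(fields)
```

Rejecting incomplete outputs at this stage keeps malformed task descriptions out of the later verification steps.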

[0057] Step S4: Verify the validity of the code task description to obtain verifiable code synthesis data.

[0058] Specifically, to ensure the effectiveness of the task and provide reliable feedback signals, this invention employs an execution-guided verification mechanism. Through automated execution and verification, it filters out semantically ambiguous or unsolvable composite items. In addition to execution-based verification, the verification mechanism can be extended to multi-dimensional rigorous verification, such as by combining formal verification or static code analysis tools.

[0059] In an optional embodiment of the present invention, step S4 may include: Step S41: Generate a corresponding reference scheme and test case generator for the code task description.

[0060] Specifically, a reference solution (sol) and a test case generator (test) are generated synchronously for task Q using a large language model. The process is as follows:

$$(sol, test) \sim P_\theta(\cdot \mid Q)$$

where sol denotes the reference scheme; test denotes the test case generator; Q denotes the programming task description; and $P_\theta$ denotes the probability distribution of the large language model with parameters $\theta$.

[0061] Step S42: Run the test case generator to generate a test case set. Specifically, the generator produces a set of test cases:

$$T = \{(x_i, y_i)\}_{i=1}^{N}$$

where T denotes the initial test set; $x_i$ denotes the i-th original input data; $y_i$ denotes the standard output corresponding to the i-th original input data; and N denotes the total number of test cases.

[0062] Step S43: Run the reference scheme in an isolated sandbox environment to process the test case set, judge the validity of the code task description based on the consistency of the running results, and filter to obtain verifiable code synthesis data.

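[0063] Specifically, the validity verification function is defined as:

$$Valid(D) = \begin{cases} 1, & \forall (x_i, y_i) \in T:\ sol(x_i) = y_i \\ 0, & \text{otherwise} \end{cases}$$

where Valid(D) denotes the validation result function and $sol(x_i)$ denotes the result of running the reference scheme on the input $x_i$.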

[0064] This function indicates that if, for all test cases $(x_i, y_i)$ in test case set T, the execution result $sol(x_i)$ of the reference scheme equals $y_i$, the result is 1 and the task is considered a valid programming task; otherwise the result is 0, indicating invalidity. Semantically ambiguous or unsolvable synthesized items are thereby filtered out.
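The validity check can be sketched as a function that returns 1 only when the reference solution reproduces every expected output. For brevity the sketch calls the solution in-process as a Python callable, whereas the method described above runs it in an isolated sandbox; treating a raised exception as a failure is also an assumption.

```python
from typing import Any, Callable, Iterable, Tuple

TestCase = Tuple[Any, Any]  # (input, expected output)


def is_valid(reference_solution: Callable[[Any], Any],
             test_cases: Iterable[TestCase]) -> int:
    """Return 1 if the reference solution matches every expected output."""
    for x, expected in test_cases:
        try:
            if reference_solution(x) != expected:
                return 0
        except Exception:
            # A crash on any input is treated as an inconsistent result.
            return 0
    return 1
```

Tasks for which this returns 0 would be dropped from the verifiable code synthesis data.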

[0065] In another optional embodiment of the present invention, step S4 may further include: Step S44: Use a static code analysis tool to perform logical verification on the reference scheme.

[0066] Step S45: Combine the logical verification results with the execution verification results to filter and obtain verifiable code synthesis data.

[0067] Step S5: Perform solution space refinement on the verifiable code synthesis data to obtain test-enhanced code synthesis data.

[0068] Specifically, this invention introduces an adversarial solution space refinement process. By generating logically similar solutions that contain subtle errors, and using these as the target to optimize test cases, the robustness and challenge of the synthetic data are improved. This process effectively narrows the space of feasible solutions and enhances the ability of test cases to distinguish between correct and incorrect solutions.

[0069] In an optional embodiment of the present invention, as shown in Figure 3, step S5 may include: Step S51: Generate a set of near-error solutions with logical errors for the verifiable code synthesis data.

[0070] Specifically, embodiments of this invention introduce an adversarial solution space refinement procedure to improve test coverage and the robustness of synthetic data. This invention first prompts a large language model to generate a set of near-error solutions V, represented as:

$$V = \{v_1, v_2, \ldots, v_K\}$$

where V denotes the set of near-error solutions and $v_k$ denotes the k-th candidate solution. These candidate solutions appear logically reasonable but contain subtle errors in key details.

[0071] Step S52: Evaluate the near-error solution using the current test case set corresponding to the verifiable code synthesis data, and calculate its failure rate.

[0072] Specifically, the near-error rate R measures the failure level of the near-error solutions on the current test case set T and is calculated using the following formula:

$$R = \frac{1}{|V|} \sum_{v_k \in V} \mathbb{1}\left[v_k \text{ fails on } T\right]$$

where V denotes the set of near-error solutions; |V| denotes the number of near-error solutions; T denotes the current test case set; and $\mathbb{1}[\cdot]$ is an indicator function whose value is 1 if the near-error solution $v_k$ fails under test case set T, and 0 otherwise.
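The failure rate can be sketched directly from its description: count the near-error solutions that fail at least one test case and divide by the number of solutions. Solutions are modeled as plain Python callables, and a raised exception counts as a failure; both are simplifying assumptions for this sketch.

```python
from typing import Any, Callable, List, Sequence, Tuple

TestCase = Tuple[Any, Any]  # (input, expected output)


def fails(solution: Callable[[Any], Any], test_cases: Sequence[TestCase]) -> bool:
    """True if the solution gives a wrong answer or crashes on any case."""
    for x, expected in test_cases:
        try:
            if solution(x) != expected:
                return True
        except Exception:
            return True
    return False


def near_error_failure_rate(near_errors: List[Callable[[Any], Any]],
                            test_cases: Sequence[TestCase]) -> float:
    """Fraction of near-error solutions rejected by the test set."""
    if not near_errors:
        return 0.0
    return sum(fails(v, test_cases) for v in near_errors) / len(near_errors)
```

A rate below 1 means at least one flawed solution still passes, so the test set cannot yet separate it from the reference solution.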

[0073] Step S53: Iteratively update the test case generator based on the failure rate until the generated test cases can distinguish between the reference solution and the near-fault solution, thereby obtaining test-enhanced code synthesis data.

[0074] Specifically, this invention updates the test case generator iteratively. To maximize the near-error rate R, the generator is prompted to produce more discriminative test cases. When the generated test cases can accurately identify subtle errors in near-error solutions and ensure that the reference solution passes while all near-error solutions are intercepted, the solution space is refined, and the final test-enhanced code synthesis data is output.
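The refinement loop can be sketched as repeatedly asking for new inputs until no near-error solution survives. Here `propose_inputs` is a hypothetical stand-in for prompting the test case generator, expected outputs are taken from the reference solution, and `max_rounds` is an assumed safety bound that the text does not mention.

```python
from typing import Any, Callable, List, Sequence, Tuple

TestCase = Tuple[Any, Any]  # (input, expected output)
Solution = Callable[[Any], Any]


def fails(solution: Solution, test_cases: Sequence[TestCase]) -> bool:
    """True if the solution gives a wrong answer or crashes on any case."""
    for x, expected in test_cases:
        try:
            if solution(x) != expected:
                return True
        except Exception:
            return True
    return False


def refine_tests(reference: Solution,
                 near_errors: List[Solution],
                 tests: Sequence[TestCase],
                 propose_inputs: Callable[[List[Solution], List[TestCase]], List[Any]],
                 max_rounds: int = 5) -> List[TestCase]:
    """Grow the test set until every near-error solution is rejected."""
    refined = list(tests)
    for _ in range(max_rounds):
        surviving = [v for v in near_errors if not fails(v, refined)]
        if not surviving:
            break  # all near-error solutions are now distinguished
        for x in propose_inputs(surviving, refined):
            # The reference solution defines the expected output.
            refined.append((x, reference(x)))
    return refined
```

Because every added expected output comes from the reference solution, the reference keeps passing by construction while the set of passing flawed solutions can only shrink.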

[0075] This invention also provides a verifiable code data synthesis system based on atomization decomposition and recombination. As shown in Figure 4, it includes: an element extraction module, used to extract atomic elements from seed instances, optimize the pattern, and construct an atomic element space; an element recombination module, used to logically recombine the atomic elements in the atomic element space to obtain new element combinations; a task synthesis module, used to synthesize tasks based on the new element combinations and generate code task descriptions; a task verification module, used to verify the validity of the code task descriptions and obtain verifiable code synthesis data; and a data refinement module, used to perform solution space refinement on the verifiable code synthesis data to obtain test-enhanced code synthesis data.

[0076] Method performance testing: 1. Significant improvement in multi-dimensional data quality: In experimental evaluation, the ADR method proposed in this invention improves on four key quality metrics. The synthesized tasks are compared with mainstream baseline datasets such as KodCode and Educational Instruct (see OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models), and the experimental data demonstrate the advantages of ADR.

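[0077] Originality: This metric measures the degree of novelty of the synthetic tasks relative to a reference dataset. A synthetic sample is counted as novel when its maximum cosine similarity to any sample in the reference set, measured in the representation space, is lower than a preset threshold. The calculation formula is:

$$\text{Originality} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}\left( \max_{r \in \mathcal{D}_{\text{ref}}} \cos\big(\mathbf{e}(x_i), \mathbf{e}(r)\big) < \tau \right)$$

where N denotes the total number of samples in the synthetic task set; $x_i$ denotes the i-th synthetic sample; $\mathcal{D}_{\text{ref}}$ denotes the reference set; $\mathbf{e}(\cdot)$ denotes the embedding in the representation space; $\tau$ denotes the preset threshold; and $\mathbb{I}(\cdot)$ is an indicator function that takes the value 1 when the condition within the parentheses is true, and 0 otherwise.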
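The originality metric can be sketched in plain Python as below. The sketch assumes that samples have already been embedded as numeric vectors; the embedding model, the threshold value, and the function names are not given in the source and are illustrative only.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; defined as 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def originality(synthetic: Sequence[Vector],
                reference: Sequence[Vector],
                threshold: float) -> float:
    """Fraction of synthetic samples whose closest reference sample is below the threshold."""
    if not synthetic or not reference:
        raise ValueError("both sample sets must be non-empty")
    novel = 0
    for s in synthetic:
        max_sim = max(cosine(s, r) for r in reference)
        if max_sim < threshold:
            novel += 1
    return novel / len(synthetic)


# Toy embeddings: the first synthetic sample duplicates a reference sample,
# the second lies between the two reference directions.
reference_set = [[1.0, 0.0], [0.0, 1.0]]
synthetic_set = [[1.0, 0.0], [1.0, 1.0]]
score = originality(synthetic_set, reference_set, threshold=0.9)
```

With the assumed threshold of 0.9, only the second sample (maximum similarity about 0.707) counts as novel, so `score` is 0.5.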

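[0078] Task difficulty: Defined as the average performance loss of a set of representative reference models $\mathcal{M}$ on the synthetic task set $\mathcal{D}_{\text{syn}}$, which measures the challenge the tasks pose to the models' capability boundaries. The formula is:

$$\text{Difficulty} = \frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} \big(1 - S_m(\mathcal{D}_{\text{syn}})\big)$$

where $S_m(\mathcal{D}_{\text{syn}})$ denotes the performance score of reference model m on the task set $\mathcal{D}_{\text{syn}}$.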
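As a small worked example of the difficulty metric, the sketch below reads "performance loss" as one minus the score and assumes scores are fractions in [0, 1]; both the interpretation and the model names are assumptions, since the source formula did not survive extraction.

```python
from typing import Mapping


def task_difficulty(scores: Mapping[str, float]) -> float:
    """Average of (1 - score) over the reference models; scores assumed in [0, 1]."""
    if not scores:
        raise ValueError("at least one reference model score is required")
    return sum(1.0 - s for s in scores.values()) / len(scores)


# Hypothetical scores of two reference models on the same synthetic task set.
difficulty = task_difficulty({"model_a": 0.5, "model_b": 0.25})
```

For these two hypothetical models the losses are 0.5 and 0.75, giving a difficulty of 0.625.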

[0079] Diversity: Measures the uniformity of the data distribution in the representation space. It is defined through the coefficient of variation (the ratio of the standard deviation to the mean) of the nearest-neighbor distances between synthetic samples:

$$\mathrm{CV} = \frac{\sigma_d}{\mu_d}$$

where $d_i$ denotes the distance between the i-th sample and its nearest neighbor in the representation space, and $\mu_d$ and $\sigma_d$ denote the mean and standard deviation of the $d_i$. A higher diversity value indicates a more uniform distribution.
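The coefficient of variation of nearest-neighbor distances can be computed as in the sketch below. Several details are assumptions because the source does not specify them: Euclidean distance, the population standard deviation, and the function names. The source also does not state how the coefficient is mapped to the final diversity score, so the sketch stops at the coefficient itself.

```python
import math
import statistics
from typing import List, Sequence

Point = Sequence[float]


def nearest_neighbor_distances(points: Sequence[Point]) -> List[float]:
    """For each point, the Euclidean distance to its closest other point."""
    if len(points) < 2:
        raise ValueError("at least two points are required")
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0.0:
        raise ValueError("mean distance is zero; coefficient is undefined")
    return statistics.pstdev(values) / mean


# Evenly spaced points versus a set with one tight cluster.
even = [(0.0,), (1.0,), (2.0,), (3.0,)]
clustered = [(0.0,), (0.1,), (5.0,), (9.0,)]

cv_even = coefficient_of_variation(nearest_neighbor_distances(even))
cv_clustered = coefficient_of_variation(nearest_neighbor_distances(clustered))
```

For the evenly spaced points every nearest-neighbor distance is 1, so `cv_even` is 0.0, while the clustered set gives a strictly positive coefficient.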

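[0080] Test quality: Measures the effectiveness of the test cases in verifying the correctness of tasks. A "Model-as-a-Judge" approach is used to generate evaluation code, which is executed to obtain a continuous coverage score for each task:

$$\text{TestQuality} = \frac{1}{N} \sum_{i=1}^{N} c(x_i)$$

where N denotes the number of tasks in the synthetic task set and $c(x_i)$ denotes the test case coverage score of task $x_i$.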

[0081] Table 1: Data quality assessment results of the ADR dataset and the baseline datasets across multiple dimensions.

As shown in Table 1, ADR scores 4 to 16 times higher than existing methods on originality, which demonstrates the ability of the atomization recombination paradigm to go beyond the seed data distribution and generate new logical topologies. Meanwhile, test quality rises from around 30 to 81.36, substantially reducing the risk of reward hacking during reinforcement learning.

[0082] 2. The verification signal is more rigorous and reliable: By applying the Adversarial Solution Space Refinement (ASSR) technique to 5,000 tasks synthesized by ADR, the average number of test cases was increased from 14.75 to 34.78 (an increase of 135.8%), while the test quality was improved from 72.91 to 81.36 (an increase of 11.6%).
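The percentage increases quoted above follow from the usual relative-change formula, and the short check below reproduces them from the reported before and after values. The function name is illustrative.

```python
def relative_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (after - before) / before * 100.0


# Values reported for 5,000 ADR tasks before and after ASSR.
test_count_gain = round(relative_increase(14.75, 34.78), 1)    # average test cases
test_quality_gain = round(relative_increase(72.91, 81.36), 1)  # test quality score
```

Both results agree with the figures in the text: 135.8 for the number of test cases and 11.6 for test quality.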

[0083] 3. Improved learning outcomes: In practical large-scale reinforcement learning with verifiable rewards (RLVR) training, the data synthesized by this invention significantly expands the capability boundaries of the model. The complete experimental data for different base models and task domains are given below.

[0084] Table 2: Pass@1 (%) comparison across multiple benchmarks and representative base models on algorithm tasks.

Table 3: Pass@1 (%) comparison between ADR-based models and baseline models on tool-use and data science tasks.

The experiments show the following. (1) Significant performance improvement: Previous synthetic data methods were limited to heuristic extensions of real-world data and often struggled to surpass the performance ceiling of the original data. Thanks to the element decomposition and recombination mechanism, ADR synthetic data achieves better overall performance. For example, as shown in Table 2, on LCB-v5, ADR gives Qwen2.5 a 9.20% performance gain, reaching an accuracy of 25.37%, whereas the baseline synthetic data methods only maintain a level comparable to that of real data.

[0085] (2) Generalizable performance improvement: ADR demonstrates strong cross-model adaptability, achieving stable performance enhancements on various base models. For example, as shown in Table 2, the ADR method brings a 7.57% performance improvement on the Qwen2.5-Coder-7B-Instruct model. On the Llama-3.1-8B-Instruct and Qwen3-8B models, the improvement is still significant, reaching 7.38% and 11.77% respectively, both exceeding the best baseline method.

[0086] (3) Substantive enhancement of intrinsic reasoning ability: As shown in Figure 5, on the Pass@8 metric, which reflects the model's deeper reasoning ability, ADR achieves a performance gain that surpasses all baseline methods. First, the gain is large: ADR raises the model from an initial 28.74% to 33.53% (an improvement of 4.79 percentage points). In contrast, the strongest baseline (TACO) moves only from 28.74% to 29.34%, a net improvement of 0.60 percentage points, while other baselines (such as KodCode and Educational Instruct) even decline or stagnate in the later stages of training. Second, ADR leads throughout the entire training run: from the early phase (100 steps) onward, ADR quickly widens the gap with the other methods, its performance curve remains on top, and the shaded area representing the improvement keeps expanding. This indicates that ADR, through atomic recombination, synthesizes highly challenging data at the edge of the model's capabilities, which not only improves the model's final score but also continuously draws out its logical reasoning potential during training.
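The source does not state how Pass@8 is estimated. A common choice, assumed here and not confirmed by the source, is the unbiased estimator 1 - C(n - c, k) / C(n, k), computed per task from n sampled solutions of which c pass, then averaged over tasks. The function names and the toy counts below are illustrative.

```python
import math
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n is correct, given c correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when fewer than k incorrect samples exist,
    # which makes the estimate exactly 1.0 in that case.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, each given as (n samples, c correct)."""
    if not results:
        raise ValueError("at least one task result is required")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Three hypothetical tasks with 8 samples each: none, one, and all correct.
example = mean_pass_at_k([(8, 0), (8, 1), (8, 8)], k=8)
```

With k equal to n, a task counts as solved as soon as one of its samples passes, so two of the three hypothetical tasks are solved and `example` is about 0.667.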

[0087] (4) Strong cross-domain generalization ability: As shown in Table 3, ADR also brought stable performance gains in tool invocation (BigCodeBench) and data science (DS-1000) tasks (improving by 3.37% and 6.16% respectively), proving the universality of the reorganization framework.

[0088] Although the present invention has been disclosed above with reference to embodiments, it is not intended to limit the present invention. Appropriate modifications or equivalent substitutions made by those skilled in the art to the technical solutions of the present invention should be covered within the protection scope of the present invention, which is defined by the claims.

Claims

1. A method for synthesizing verifiable code data based on atomized decomposition and recombination, characterized in that it includes the following steps: performing atomic element extraction and pattern optimization on seed instances to construct an atomic element space; logically recombining the atomic elements in the atomic element space to obtain new element combinations; performing task synthesis based on the new element combinations to generate a code task description; verifying the validity of the code task description to obtain verifiable code synthesis data; and performing solution space refinement on the verifiable code synthesis data to obtain test-enhanced code synthesis data.

2. The method as described in claim 1, characterized in that performing atomic element extraction and pattern optimization on seed instances to construct an atomic element space includes: decomposing the seed instances according to the task mode and extracting multiple atomic elements constituting the code task; quantifying utility indices of the atomic elements using information-theoretic signals, the utility indices including a first index characterizing element diversity and a second index characterizing the correlation between elements; and, based on the utility indices, performing optimization operations on the current task mode and iteratively updating it until the first index converges or a preset condition is reached, thereby constructing the atomic element space.

3. The method as described in claim 2, characterized in that quantifying the utility indices of the atomic elements using information-theoretic signals includes: calculating the Shannon entropy of the atomic elements as the first index; and calculating the conditional mutual information of the atomic elements under a given task as the second index.

4. The method as described in claim 2, characterized in that performing optimization operations on the current task mode based on the utility indices and iteratively updating it includes: using the information-theoretic signals to identify redundant elements, ambiguous elements, or highly discriminative elements among the atomic elements; and mapping and adjusting the task mode corresponding to the atomic elements based on the identification results.

5. The method as described in claim 1, characterized in that logically recombining the atomic elements in the atomic element space to obtain new element combinations includes: selecting a core element from the atomic element space as an anchor point; retrieving existing successful combination patterns from the element pool; and, based on the anchor point and with reference to the successful combination patterns, generating a logically coherent new element combination using a probability distribution.

6. The method as described in claim 1, characterized in that performing task synthesis based on the new element combinations to generate a code task description includes: providing a structured template that includes a problem description, input/output formats, and constraints; and injecting the new element combination into the structured template to generate a code task description with linguistic diversity.

7. The method as described in claim 1, characterized in that verifying the validity of the code task description to obtain verifiable code synthesis data includes: generating a corresponding reference solution and test case generator for the code task description; running the test case generator to generate a test case set; running the reference solution on the test case set in an isolated sandbox environment; and judging the validity of the code task description based on the consistency of the running results, and filtering accordingly to obtain the verifiable code synthesis data.

8. The method as described in claim 7, characterized in that verifying the validity of the code task description to obtain verifiable code synthesis data further includes: performing logical verification of the reference solution using static code analysis tools; and combining the logical verification results with the execution verification results to obtain the verifiable code synthesis data.

9. The method as described in claim 1, characterized in that performing solution space refinement on the verifiable code synthesis data to obtain test-enhanced code synthesis data includes: generating a set of near-error solutions with logical flaws for the verifiable code synthesis data; evaluating the near-error solutions using the current test case set corresponding to the verifiable code synthesis data and calculating their failure rate; and iteratively updating the test case generator based on the failure rate until the generated test cases can distinguish the reference solution from the near-error solutions, thereby obtaining test-enhanced code synthesis data.

10. A verifiable code data synthesis system based on atomized decomposition and recombination, characterized in that it includes: an element extraction module, used to perform atomic element extraction and pattern optimization on seed instances and to construct an atomic element space; an element recombination module, used to logically recombine the atomic elements in the atomic element space to obtain new element combinations; a task synthesis module, used to perform task synthesis based on the new element combinations and generate code task descriptions; a task verification module, used to verify the validity of the code task descriptions and obtain verifiable code synthesis data; and a data refinement module, used to perform solution space refinement on the verifiable code synthesis data to obtain test-enhanced code synthesis data.
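As an illustration of the two information-theoretic indices named in claim 3, the sketch below estimates Shannon entropy and conditional mutual information from empirical counts. It is a minimal sketch under stated assumptions: the claims do not say which variables enter the computation, so treating two element types as X and Y and the task category as Z, the plug-in (frequency) estimator, the base-2 logarithm, and all function names are choices made for the example.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def shannon_entropy(values: Sequence[Hashable]) -> float:
    """H(X) in bits, estimated from observed frequencies."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one observation is required")
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_mutual_information(xs: Sequence[Hashable],
                                   ys: Sequence[Hashable],
                                   zs: Sequence[Hashable]) -> float:
    """I(X; Y | Z) in bits, estimated from observed frequencies.

    Uses I = sum p(x,y,z) * log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) );
    the sample size cancels inside the logarithm, leaving raw counts.
    """
    n = len(xs)
    if n == 0 or not (len(ys) == n and len(zs) == n):
        raise ValueError("xs, ys and zs must be non-empty and of equal length")
    n_xyz = Counter(zip(xs, ys, zs))
    n_xz = Counter(zip(xs, zs))
    n_yz = Counter(zip(ys, zs))
    n_z = Counter(zs)
    total = 0.0
    for (x, y, z), c in n_xyz.items():
        total += (c / n) * math.log2((n_z[z] * c) / (n_xz[(x, z)] * n_yz[(y, z)]))
    return total


# Hypothetical element labels observed across four seed tasks of one category.
algorithms = ["dp", "greedy", "dp", "greedy"]
structures_same = ["array", "heap", "array", "heap"]    # fully determined by algorithm
structures_indep = ["array", "array", "heap", "heap"]   # unrelated to algorithm
task = ["t", "t", "t", "t"]

h = shannon_entropy(algorithms)
cmi_dependent = conditional_mutual_information(algorithms, structures_same, task)
cmi_independent = conditional_mutual_information(algorithms, structures_indep, task)
```

In this toy data the entropy of the algorithm labels is 1 bit, the fully dependent pair has a conditional mutual information of 1 bit, and the independent pair has 0 bits.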