A prompt word iterative optimization method and system based on a large language model

By constructing an automated closed loop within a large language model and utilizing diverse task input switching and multi-model evaluation optimization, the problems of low efficiency and overfitting in prompt word construction are solved, achieving efficient and automated prompt word optimization.

CN122242770A · Pending · Publication Date: 2026-06-19 · CHENGDU QIFENG SHUNSHI TECHNOLOGY CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: CHENGDU QIFENG SHUNSHI TECHNOLOGY CO LTD
Filing Date: 2026-05-13
Publication Date: 2026-06-19

Application Information

IPC: G06N5/04; G06F16/3329; G06F40/284; G06F16/35

AI Tagging

Application Domain

Natural language data processing; Inference methods

AI Technical Summary

Technical Problem

Existing prompt word construction methods are inefficient, struggle to guarantee consistency and optimality, cannot achieve automated closed-loop optimization of prompt words, and suffer from overfitting.

Method used

By introducing reasoning, evaluation, and optimization modules into the large language model, an automated closed loop is formed. By utilizing diverse task input switching mechanisms and multi-model evaluation and optimization, a complete closed loop of generation-reasoning-evaluation-optimization is constructed, automatically improving the quality of prompt words.

Benefits of technology

It achieves automated closed-loop iterative optimization of prompt words, improving the quality of prompt words and their generalization ability across samples and scenarios, and avoiding overfitting.

Abstract

This invention discloses a method and system for iterative optimization of prompt words based on a large language model, relating to the field of artificial intelligence technology. The method includes: inputting the current inference input text into an inference large language model and outputting the current inference result; inputting the current task input and the current inference result into an evaluation large language model for iterative evaluation and inference, and outputting an evaluation result that includes an overall score; determining whether the overall score reaches a preset score threshold or whether the current iteration count has reached the maximum iteration count; if so, outputting the current prompt word as the optimized prompt word; otherwise, combining the current prompt word, the current task input, the current inference result, and the current evaluation result to generate an optimized prompt input text, inputting the optimized prompt input text into an optimization large language model, and outputting the improved prompt word for the next round of iteration. This invention achieves automated closed-loop iterative optimization of prompt words.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and in particular to a method and system for iterative optimization of prompt words based on a large language model.

Background Technology

[0002] Large language models are a class of deep neural network language models trained on massive amounts of text data. Through pre-training and instruction fine-tuning, they have achieved powerful text understanding and generation capabilities. In recent years, large language models, represented by GPT-4, Claude, and LLaMA, have made significant progress in natural language processing, code generation, and knowledge question answering, becoming an important infrastructure for the application of artificial intelligence.

[0003] In practical engineering deployments, guiding large language models to output high-quality results that meet expectations is the core issue limiting their effectiveness. Prompt word engineering, as a technique that guides model behavior without modifying model weights, has received widespread attention from academia and industry.

[0004] Existing methods for constructing prompt words rely primarily on manual work, which has several drawbacks: designing high-quality prompt words by hand is difficult and inefficient, and consistency and optimality are hard to guarantee.
There are also automated prompt word optimization methods based on gradients or discrete search. However, these methods require access to model gradient information, are unsuitable for closed-source models, produce unreadable prompt words, fix the optimization target to a specific task, are prone to overfitting, and cannot achieve automated closed-loop optimization.
Large language model-assisted iterative prompt word optimization methods have their own drawbacks: the evaluation phase relies on a fixed evaluation set, making overfitting difficult to prevent; only a single model is involved, leading to homogeneity bias; and the optimization process is executed serially, resulting in low efficiency and failing to achieve automated closed-loop optimization.
Methods for automatic generation and tuning of large model prompt words based on fixed evaluation sets also have drawbacks. Throughout the optimization process, evaluation and optimization are based on a fixed set of task inputs: each iteration uses the same task inputs to measure prompt quality and drive the optimization. As a result, the final prompt words perform well only on those specific task inputs and generalize significantly worse on diverse task inputs, i.e., "prompt overfitting." These methods propose no technical means to proactively introduce task input diversity within the iterative optimization loop to prevent overfitting, and thus fail to achieve automated closed-loop optimization of prompt words.
Furthermore, prompt word optimization methods based on candidate sets and iterative elimination have the following drawbacks: candidate generation relies on randomness and lacks an inherent understanding of the semantic quality of the prompt words; the elimination mechanism introduced during candidate set construction may prematurely discard potentially high-quality candidates; and the optimization process is coupled with task inference execution and cannot be separated from it, which hinders automated closed-loop optimization of prompt words.

[0005] Therefore, there is an urgent need for an automated and systematic method for self-iterative optimization of prompt words.

[0006] Therefore, a method and system for iterative optimization of prompt words based on large language models are developed to solve the above problems.

Summary of the Invention

[0007] The present invention proposes a method and system for iterative optimization of prompt words based on large language models to solve the problems of low efficiency, difficulty in ensuring consistency and optimality, and inability to achieve automated closed-loop optimization of prompt words in existing prompt word construction methods.

[0008] The present invention achieves the above object through the following technical solutions. The present invention provides a method for iterative optimization of prompt words based on large language models, including:
Obtain the current task input of the current target task in the task sample library, where the current task input includes the natural language description text of the current target task. Input the current task input into the inference large language model to generate a matching current prompt word. Concatenate the current prompt word and the current task input in a preset format to obtain the current inference input text. Input the current inference input text into the inference large language model and output the current inference result;
Input the current task input and the current inference result into the evaluation large language model for iterative evaluation and inference, and output the evaluation result, which includes an overall score. Determine whether the overall score reaches the preset score threshold or whether the current iteration count reaches the maximum iteration count. If so, output the current prompt word as the optimized prompt word. If not, proceed to the next step;
Combine the current prompt word, the current task input, the current inference result, and the current evaluation result to generate an optimized prompt input text. Input the optimized prompt input text into the optimization large language model and output the improved prompt word for the next round of iteration.
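The three steps above can be sketched as a single loop. This is a minimal illustration, not the patented implementation: the callables `infer`, `evaluate`, and `optimize` are hypothetical stand-ins for the inference, evaluation, and optimization models, and the text layouts used for concatenation are assumed formats. The defaults of 90 points and 10 rounds follow the values recommended later in the description.

```python
from typing import Any, Callable, Dict, Tuple


def optimize_prompt(
    task_input: str,
    infer: Callable[[str], str],                     # hypothetical inference-model call
    evaluate: Callable[[str, str], Dict[str, Any]],  # hypothetical evaluation-model call
    optimize: Callable[[str], str],                  # hypothetical optimization-model call
    threshold: float = 90.0,
    max_iters: int = 10,
) -> Tuple[str, int]:
    """Return (final prompt word, number of optimization rounds performed)."""
    # Generate a matching prompt word from the task input.
    prompt = infer(task_input)
    iteration = 0
    while True:
        # Concatenate prompt word and task input (the layout is an assumed format).
        inference_text = f"{prompt}\n\nTask input:\n{task_input}"
        result = infer(inference_text)
        evaluation = evaluate(task_input, result)
        # Stop when the score threshold or the iteration limit is reached.
        if evaluation["score"] >= threshold or iteration >= max_iters:
            return prompt, iteration
        # Combine prompt, task input, result, and evaluation for the optimizer.
        optimization_text = (
            f"Current prompt:\n{prompt}\n\nTask input:\n{task_input}\n\n"
            f"Inference result:\n{result}\n\nEvaluation:\n{evaluation}"
        )
        prompt = optimize(optimization_text)
        iteration += 1
```

With stub callables in place of real models, the loop can be exercised offline before any model API is wired in.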

[0009] Furthermore, the method also includes triggering a task switching mechanism according to the total number of valid samples S in the task sample library, including:
When S≥50, adopt a strategy of switching target tasks round by round;
When 20<S<50, adopt a fixed-round strategy of switching target tasks according to a preset interval of iteration rounds;
Obtain the target task complexity index of the target task. The target task complexity index includes a structural dimension index and a preset business performance requirement. The structural dimension index includes the number of built-in branch rules of the task, the number of business scenario classifications, the dimension of task input variables, and the number of output format and compliance constraints. The preset business performance requirement includes the core scenario accuracy rate and the cross-scenario generalization qualification rate.
When the target task meets any one of the following conditions, adopt a strategy of switching target tasks in stages:
Condition 1: The number of built-in branch rules of the task is ≥10 and the number of business scenario classifications is ≥5;
Condition 2: The dimension of task input variables is ≥8 and the number of output format and compliance constraints is ≥3;
Condition 3: The preset business performance requirement is a core scenario accuracy rate ≥95% and a cross-scenario generalization qualification rate ≥90%.
The phased switching strategy has higher priority than the round-by-round strategy and the fixed-round strategy.
The strategy of switching target tasks in stages includes:
Phase 1: For the first 50% of the total iteration rounds, use the task with the highest scenario coverage and the greatest business weight in the task sample library as the target task after the switch. The task with the highest scenario coverage and the greatest business weight is pre-set in the task sample library.

[0010] Phase 2: For the remaining 50% of the iteration rounds, execute the strategy of switching target tasks round by round.
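The selection rules above can be read as a small decision function. The sketch below is illustrative only: the class name, field names, and strategy labels are assumptions, percentages are expressed as fractions, and the round index is taken to be 0-based. It returns `None` for S ≤ 20 because this passage does not name a base strategy for that range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskComplexity:
    n_branch: int               # built-in branch rules
    n_scene: int                # business scenario classifications
    d_in: int                   # task input variable dimension
    n_constraint: int           # output format and compliance constraints
    core_accuracy: float        # required core-scenario accuracy, as a fraction
    generalization_rate: float  # required cross-scenario qualification rate, as a fraction


def needs_phased_switching(c: TaskComplexity) -> bool:
    """True when any one of Conditions 1 to 3 holds."""
    condition_1 = c.n_branch >= 10 and c.n_scene >= 5
    condition_2 = c.d_in >= 8 and c.n_constraint >= 3
    condition_3 = c.core_accuracy >= 0.95 and c.generalization_rate >= 0.90
    return condition_1 or condition_2 or condition_3


def select_strategy(s: int, c: TaskComplexity) -> Optional[str]:
    """Pick a base switching strategy; phased switching takes priority."""
    if needs_phased_switching(c):
        return "phased"
    if s >= 50:
        return "round_by_round"
    if 20 < s < 50:
        return "fixed_round"
    # No base strategy is named for S <= 20 in this passage.
    return None


def phase_for_round(round_index: int, total_rounds: int) -> int:
    """Phase 1 covers the first 50% of rounds (0-based index), Phase 2 the rest."""
    return 1 if round_index < total_rounds * 0.5 else 2
```

Checking the phased conditions first mirrors the stated priority of the phased strategy over the two sample-count strategies.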

[0011] Furthermore, the task switching mechanism triggered based on the total number S of valid samples in the task sample library also includes: When S≤20, the adaptive saturation detection strategy is superimposed on the current strategy; When S>20, the adaptive saturation detection strategy can be enabled by manual configuration as an optional overlay rule. The adaptive saturation detection strategy includes: monitoring the improvement margin of the current round, which is the overall score of the current round minus the overall score of the previous round; when the improvement margin stays below the preset saturation threshold for the preset number of consecutive rounds, the optimization of the current task input is determined to have approached saturation, and the system immediately switches to the next task input; when the improvement margin has been below the preset saturation threshold for two consecutive rounds but the preset number of rounds has not yet been reached, the current task input continues to be used for iteration until either the saturation judgment condition or the normal switching condition of the basic strategy is met, whichever triggers first.
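The saturation check described above might look like the following sketch. The function and parameter names are hypothetical; the improvement margin is computed as the current score minus the previous score, in line with the definition of Δei given later in the description.

```python
from typing import List


def is_saturated(scores: List[float], saturation_threshold: float,
                 required_rounds: int) -> bool:
    """True when the last `required_rounds` improvement margins are all below the threshold."""
    if required_rounds < 1:
        raise ValueError("required_rounds must be at least 1")
    # Improvement margin of each round: current score minus previous score.
    margins = [cur - prev for prev, cur in zip(scores, scores[1:])]
    if len(margins) < required_rounds:
        return False  # not enough history yet, keep iterating on this task input
    return all(m < saturation_threshold for m in margins[-required_rounds:])
```

Returning `False` while the history is too short corresponds to the case where low margins have been seen but the preset number of rounds has not been reached.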

[0012] Furthermore, when switching the target task input, the new task input can be selected in any of the following ways:
Random sampling: Randomly select a different task input from the task sample library;
Polling selection: Select samples sequentially in the preset order of the task sample library to ensure even traversal;
Difficulty increment selection: Select samples from low to high difficulty according to the preset difficulty tags of the tasks;
Domain-based selection: Prioritize inputs that differ most from the current task input in terms of theme and domain.
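A rough sketch of the four selection modes follows. The `TaskSample` fields are assumed for illustration. The domain-based selector is a deliberate simplification: it picks the first sample whose domain label differs from the current one, whereas the text calls for the input that differs most, which would need a similarity measure the text does not specify.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskSample:
    text: str
    difficulty: int  # assumed preset difficulty tag; lower means easier
    domain: str      # assumed theme/domain label


def select_random(library: List[TaskSample], current: int, rng: random.Random) -> int:
    """Random sampling: any index other than the current one (needs at least 2 samples)."""
    return rng.choice([i for i in range(len(library)) if i != current])


def select_polling(library: List[TaskSample], current: int) -> int:
    """Polling: walk the library in its preset order, wrapping at the end."""
    return (current + 1) % len(library)


def select_by_difficulty(library: List[TaskSample], current: int) -> Optional[int]:
    """Difficulty increment: next sample in ascending difficulty, or None at the top."""
    order = sorted(range(len(library)), key=lambda i: library[i].difficulty)
    position = order.index(current)
    return order[position + 1] if position + 1 < len(order) else None


def select_by_domain(library: List[TaskSample], current: int) -> int:
    """Domain-based (simplified): first sample with a different domain label."""
    current_domain = library[current].domain
    for i, sample in enumerate(library):
        if sample.domain != current_domain:
            return i
    return select_polling(library, current)  # fallback when all domains match
```

Each selector returns an index into the library so that the caller can track which samples have already been used.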

[0013] Furthermore, the input variable dimension is the number of mutually independent variable fields in a single task input; the built-in branching rule is the number of condition processing branches embedded in the task workflow definition; the business scenario classification is the number of different business scenario categories covered by the task sample library; and the output format and compliance constraints are the total number of format specifications and compliance requirements that the task output must meet.

[0014] Furthermore, after the natural language description text of the initial target task is received, it is input into the inference large language model, which automatically generates initial prompt words adapted to the target task. At the same time, the parameters are initialized: the size of the task sample library and the complexity index of the target task are read, the task switching mechanism is triggered according to the total number of valid samples S in the task sample library, and the maximum number of iterations is set. After initialization, the iteration round number is set to its initial value of 0, and the initial task input is selected from the task sample library.

[0015] Furthermore, the current prompts include the core execution rules of the target task, output format specifications, constraints, and precautions.

[0016] The core execution rules of the target task include the specific execution process of the task; the output format specifications define the style of the model output, such as outputting in Markdown format; the constraints refer to the boundaries of the model inference process, such as prohibiting the fabrication of times, locations, task information, and the like; the precautions contain supplementary explanations for the task, for example: "Note: the input may contain ambiguous references, and inference should be performed strictly according to the input."

[0017] Furthermore, the inference large language model, the evaluation large language model, and the optimization large language model are different large language models from one another, and both the evaluation large language model and the optimization large language model are composed of several large language models.
Accordingly, inputting the current task input and the current inference result into the evaluation large language model for iterative evaluation and inference includes: feeding the inference result and the task input into each model of the evaluation large language model. Each model outputs its evaluation result independently, and the overall score is obtained according to the following formula:
Score_i = Σ_k (w_k × e_i(k).score);
where w_k is the preset weight of the k-th model, taken as that model's evaluation accuracy on the validation set, and e_i(k).score is the score in the evaluation result of the k-th model.
Combining the current prompt word, current task input, current inference result, and current evaluation result to generate an optimized prompt input text, inputting it into the optimization large language model, and outputting the improved prompt word for the next round includes: inputting the current evaluation result, current prompt word, current inference result, and current task input into each model in the optimization model group. Each model independently generates an optimized prompt word candidate version. The following fusion operations are performed on the prompt word candidate versions:
Winning candidate selection: Each candidate prompt word is combined with the current task input and fed into the inference large language model for rapid inference. The corresponding evaluation large language model then scores each inference result, and the candidate prompt word with the highest score is selected as the improved prompt word for the next round. If scores are the same, the candidate prompt word with the smaller version number is preferred. For example, if the xth-generation and the (x+1)th-generation prompt words have the same score, the xth-generation prompt word is selected.
Comprehensive optimization and fusion: Input the complete text of each candidate prompt word into the inference large language model, and add the instruction "Comprehensively analyze the advantages and disadvantages of each version and generate an optimal prompt word version that integrates the advantages of each version and avoids the defects of each version". The inference large language model outputs the fused version as the improved prompt word for the next round.
Strictest constraint merging and fusion: All constraints appearing in the candidate prompt word versions are unioned, duplicates are removed, and the result is merged into a unified constraint clause. The task execution logic description with the highest confidence level (i.e., the one that occurs most frequently) is retained as the optimal execution logic. The unified constraint clause and the optimal execution logic are used to construct the improved prompt word for the next round. "Constraints" appear as a separate field in the prompt words.
Once the fusion is complete, the next iteration continues based on the improved prompt word for the next round.
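The weighted score and two of the fusion operations above can be sketched as follows. All function names are hypothetical. The sketch assumes that candidates are listed in version order (so the earlier item has the smaller version number) and that two constraints count as duplicates when they match after trimming and lower-casing; neither detail is stated in the text. Note also that the text does not say whether the accuracy-derived weights are normalized; the example uses weights that sum to 1 so the result stays on the 100-point scale.

```python
from collections import Counter
from typing import Callable, List, Sequence


def overall_score(weights: Sequence[float], scores: Sequence[float]) -> float:
    """Score_i = sum over k of w_k * e_i(k).score."""
    if len(weights) != len(scores):
        raise ValueError("one weight is needed per evaluation model")
    return sum(w * s for w, s in zip(weights, scores))


def select_winner(candidates: List[str], score_of: Callable[[str], float]) -> str:
    """Return the highest-scoring candidate; ties keep the earlier version."""
    best = candidates[0]
    best_score = score_of(best)
    for candidate in candidates[1:]:
        score = score_of(candidate)
        if score > best_score:  # strict comparison keeps the smaller version on ties
            best, best_score = candidate, score
    return best


def merge_constraints(per_candidate: List[List[str]]) -> List[str]:
    """Union of all constraints with duplicates removed, first wording kept."""
    merged: List[str] = []
    seen = set()
    for constraints in per_candidate:
        for text in constraints:
            key = text.strip().lower()  # assumed notion of "duplicate"
            if key not in seen:
                seen.add(key)
                merged.append(text.strip())
    return merged


def most_frequent_logic(descriptions: List[str]) -> str:
    """Keep the execution-logic description that occurs most often."""
    return Counter(descriptions).most_common(1)[0][0]
```

In practice `score_of` would wrap a rapid inference call followed by the multi-model evaluation; here it is left as a plain callable so the selection logic can be tested in isolation.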

[0018] The present invention also provides a system for the aforementioned iterative optimization method for prompt words based on a large language model, comprising:
An inference module, which is used to obtain the current task input of the current target task from the task sample library, where the current task input includes the natural language description text of the current target task; to input the current task input into the inference large language model to generate a matching current prompt word; to concatenate the current prompt word and the current task input according to a preset format to obtain the current inference input text; and to input the current inference input text into the inference large language model to output the current inference result.
An evaluation module, which is used to input the current task input and the current inference result into the evaluation large language model for iterative evaluation and inference, and to output the evaluation result, which includes an overall score. It determines whether the overall score reaches a preset score threshold or whether the current iteration number reaches the maximum iteration number. If so, the current prompt word is output as the optimized prompt word; otherwise, the process proceeds to the next step.
An optimization module, which is used to combine the current prompt word, the current task input, the current inference result, and the current evaluation result to generate an optimized prompt input text, input the optimized prompt input text into the optimization large language model, and output the improved prompt word for the next round of iteration.

[0019] Furthermore, the system also includes a task switching module, which is used to trigger a task switching mechanism according to the total number of valid samples S in the task sample library, including:
When S≥50, adopt a strategy of switching target tasks round by round;
When 20<S<50, adopt a fixed-round strategy of switching target tasks according to a preset interval of iteration rounds;
Obtain the target task complexity index of the target task. The target task complexity index includes a structural dimension index and a preset business performance requirement. The structural dimension index includes the number of built-in branch rules of the task, the number of business scenario classifications, the dimension of task input variables, and the number of output format and compliance constraints. The preset business performance requirement includes the core scenario accuracy rate and the cross-scenario generalization qualification rate.
When the target task meets any one of the following conditions, adopt a strategy of switching target tasks in stages:
Condition 1: The number of built-in branch rules of the task is ≥10 and the number of business scenario classifications is ≥5;
Condition 2: The dimension of task input variables is ≥8 and the number of output format and compliance constraints is ≥3;
Condition 3: The preset business performance requirement is a core scenario accuracy rate ≥95% and a cross-scenario generalization qualification rate ≥90%.
The phased switching strategy has higher priority than the round-by-round strategy and the fixed-round strategy.
The strategy of switching target tasks in stages includes:
Phase 1: For the first 50% of the total iteration rounds, use the task with the highest scenario coverage and the greatest business weight in the task sample library as the target task after the switch;
Phase 2: For the remaining 50% of the iteration rounds, execute the strategy of switching target tasks round by round.

[0020] The beneficial effects of the present invention are as follows: the method and system for iteratively optimizing prompt words based on a large language model proposed by the present invention realize automated closed-loop iterative optimization of prompt words. By utilizing the generation, execution, and evaluation abilities of the large language model itself, a complete closed loop of "generation - inference - evaluation - optimization" is constructed, and the quality of prompt words can be improved automatically without manual intervention.

Brief Description of the Drawings

[0021] Figure 1 is a flowchart of the iterative optimization method for prompt words based on a large language model described in Embodiment 1 of this application. Figure 2 is a flowchart of the iterative optimization method for prompt words based on a large language model described in Embodiment 2 of this application.

Detailed Implementation

[0022] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.

[0023] Therefore, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are within the scope of protection of the invention.

[0024] The specific embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0025] Definitions of terms in this application: Large language models refer to pre-trained language models with a parameter scale typically exceeding several billion, possessing capabilities such as text understanding, text generation, logical reasoning, and following natural language instructions. In this invention, the large language model performing the core task is denoted as Model A.

[0026] Prompt words: Natural language text instructions input to the large language model to guide the model in generating expected output results for a specific task. In this invention, the prompt word is denoted as p, and the prompt words for different iterative versions are denoted as p0, p1, ..., pn.

[0027] Task input: Specific input data for a particular task, such as a question-and-answer exercise, a piece of text to be translated, or a piece of code to be analyzed, denoted as t. Multiple task inputs of the same task type constitute the task sample library, and the total number of valid samples in the library is denoted as S. The structure of the task input can be characterized by the following dimensions, which are used to evaluate task complexity (see the definition of "Task complexity"):
Input variable dimension: The number of mutually independent variable fields in a single task input, denoted as D_in;
Built-in branching rules: The number of condition processing branches embedded in the task workflow definition, denoted as N_branch;
Business scenario classification: The number of different business scenario categories covered by the task sample library, denoted as N_scene;
Output format and compliance constraints: The total number of format specifications and compliance requirements that the task output must meet, denoted as N_constraint.

[0028] Task complexity: Used to evaluate the structural complexity of the target task during the system initialization phase. The value is derived from the structural dimension indicators of the task input mentioned above. The structural dimension indicators include D_in, N_branch, N_scene, N_constraint, as well as the preset business performance requirements. The business performance requirements include the core scene accuracy and the cross-scene generalization pass rate.

[0029] Reasoning result: The prompt word p is combined with the task input t and then input into the large language model. The resulting text output by the model is denoted as r.

[0030] Evaluation Results: The quality assessment of the inference result r by the large language model or other evaluation model is denoted as e, using a 100-point scale. It should include at least: an overall quality score, detailed deductions and root causes for each evaluation dimension, and targeted improvement suggestions for the deficiencies in this round. The score difference between two adjacent rounds of evaluation is denoted as Δei = ei − e(i − 1), used to determine the optimization gain of a single round; when Δei is lower than a preset threshold, it is determined that the optimization of the current task input is approaching saturation.
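One possible in-memory shape for the evaluation result e and the score difference Δei is sketched below. The class and field names are illustrative assumptions; only the three required parts (overall score, per-dimension deductions with root causes, improvement suggestions) and the formula Δei = ei − e(i − 1) come from the definition above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EvaluationResult:
    score: float  # overall quality score on the 100-point scale
    # evaluation dimension -> deduction detail and root cause
    deductions: Dict[str, str] = field(default_factory=dict)
    # targeted improvement suggestions for this round
    suggestions: List[str] = field(default_factory=list)


def score_delta(current: EvaluationResult,
                previous: Optional[EvaluationResult]) -> Optional[float]:
    """Delta e_i = e_i - e_(i-1); undefined (None) for the first round."""
    if previous is None:
        return None
    return current.score - previous.score
```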

[0031] Prompt overfitting refers to the phenomenon where prompts are over-adjusted for specific task inputs due to long-term evaluation and optimization using fixed task inputs, resulting in a significant decline in performance on diverse, unseen task inputs.

[0032] N_switch: In a fixed-round switching strategy, the number of rounds in which the same task input is used consecutively (i.e., how many iterations are needed before switching the task input). The recommended value is Nmax / S, which must satisfy 1≤N_switch≤Nmax / S to ensure that each task input is used at least once in the entire process.
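As a worked illustration of the recommended value, the sketch below computes N_switch from Nmax and S. Rounding Nmax / S down to an integer is an assumption, since the text does not say how a fractional quotient is handled. When Nmax < S the stated range 1≤N_switch≤Nmax / S is empty, so the sketch raises an error rather than returning a value.

```python
def recommended_n_switch(n_max: int, s: int) -> int:
    """Recommended N_switch = Nmax / S, floored (flooring is an assumption)."""
    if s <= 0:
        raise ValueError("the task sample library must not be empty")
    if n_max < s:
        # 1 <= N_switch <= Nmax / S has no integer solution in this case.
        raise ValueError("Nmax is too small for every task input to be used once")
    return n_max // s
```

For example, with Nmax = 60 and S = 30, each task input would be kept for 2 consecutive rounds.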

[0033] Embodiment 1: As shown in Figure 1, the present invention provides an iterative optimization method for prompt words based on a large language model, comprising the following steps:
S101: Automatic Generation of Initial Prompt Words. The system receives the natural language description text of the target task and inputs it into the large language model A. Model A automatically generates initial prompt words p0 adapted to the target task. The initial prompt words p0 must include at least the core execution rules of the target task, output format specifications, constraints, and precautions. Simultaneously, the system completes initialization: it reads the task sample library size S and the target task complexity indicators, namely D_in, N_branch, N_scene, N_constraint, and the business performance requirements. Based on the automatic matching rules in S106, it determines the task switching strategy and sets Nmax accordingly, which must satisfy the hard constraints in S104. After initialization, it sets the iteration round number i=0 and selects the first task input t0 from the task sample library.

[0034] S102: Single-sample inference input construction and execution. The current task input ti and the current prompt word pi are concatenated according to a preset format to generate a complete inference input text; the inference input text is input into the large language model A to perform inference and obtain the inference result ri for this round.

[0035] S103: Refined Evaluation of Multi-Dimensional Reasoning Quality. The task input ti and the reasoning result ri of this round are input into the large language model A. Model A performs a comprehensive quality evaluation of ri based on preset multi-dimensional evaluation criteria, outputting the quantitative evaluation result ei of this round. The evaluation result ei includes at least: (a) an overall quality quantitative score (out of 100); (b) detailed deduction points and root causes for each evaluation dimension; and (c) targeted improvement suggestions for the deficiencies of this round. When i ≥ 1, the improvement magnitude Δei = ei − e(i − 1) is calculated simultaneously.

[0036] S104: Multi-dimensional Iteration Termination Condition Judgment. Based on the current evaluation result ei, the following two-condition termination judgment is performed: First judgment condition: whether the overall quality score in ei has reached the preset termination threshold (the recommended default is 90 points, adjustable according to task requirements); Second judgment condition: whether the current iteration round i has reached the preset maximum number of iterations Nmax.

[0037] If either condition is met, the current prompt word pi is output as the optimal prompt word p*, and the process terminates; if neither condition is met, proceed to step S105.

[0038] During the system initialization phase, the Nmax adaptive determination method is executed: first, set the initial Nmax, with a recommended initial value of 10, run the optimization process and monitor the overall score change curve in real time; if Δei is below 5 points for 3 to 5 consecutive iterations, the evaluation result is determined to be stable and the current Nmax is reasonable; if the score continues to rise when the initial Nmax is reached, increase Nmax by a step size of 5 and run again.
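One possible reading of the adaptive Nmax rule is sketched below, for illustration only. It assumes that "the score continues to rise" means the most recent improvements have not yet all dropped below the stability threshold; the function name next_n_max, the parameter names, and the choice of 3 stable rounds (the lower end of the 3 to 5 range) are assumptions made for this sketch.

```python
from typing import List


def next_n_max(scores: List[float], n_max: int, step: int = 5,
               stable_rounds: int = 3, delta: float = 5.0) -> int:
    """Return the Nmax to use for the next run.

    scores: overall scores of a finished run, oldest first.
    The run counts as stable when the last `stable_rounds` improvements
    are all below `delta`; otherwise Nmax grows by `step`.
    """
    deltas = [b - a for a, b in zip(scores, scores[1:])]
    recent = deltas[-stable_rounds:]
    stable = len(recent) == stable_rounds and all(d < delta for d in recent)
    return n_max if stable else n_max + step


# Still improving by 10 points per round: Nmax is raised from 10 to 15.
print(next_n_max([50, 60, 70, 80, 90], 10))
```

The result must still be checked against the hard constraints on Nmax before it is used.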

[0039] The Nmax hard constraint rule is as follows: depending on the selected switching strategy, Nmax must satisfy the following conditions: in strategy 1, Nmax ≥ S; in strategy 2, Nmax ≥ S × N_switch; in strategy 3, the constraint remains unchanged when superimposed; in strategy 4, Nmax ≥ S + 20, ensuring sufficient rounds in both phases. The above constraints are enforced when Nmax is set during system initialization and do not depend on dynamic adjustments during the iteration process.
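The hard constraints above can be expressed as a lower bound on Nmax per basic strategy. The following is a minimal sketch; the function names min_n_max and check_n_max are hypothetical, and Strategy 3 is treated as an overlay that adds no bound of its own.

```python
def min_n_max(strategy: int, s: int, n_switch: int = 1) -> int:
    """Smallest Nmax allowed for a basic strategy (1, 2 or 4)."""
    if strategy == 1:
        return s                 # Nmax >= S
    if strategy == 2:
        return s * n_switch      # Nmax >= S x N_switch
    if strategy == 4:
        return s + 20            # Nmax >= S + 20
    raise ValueError("Strategy 3 is an overlay and leaves the bound unchanged")


def check_n_max(n_max: int, strategy: int, s: int, n_switch: int = 1) -> bool:
    """True when the configured Nmax satisfies the hard constraint."""
    return n_max >= min_n_max(strategy, s, n_switch)
```

For example, with S = 30 and Strategy 4 the bound is 50, so an initial Nmax of 10 would have to be raised during initialization.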

[0040] S105: Single-round closed-loop prompt word targeted optimization. The current prompt word pi, task input ti, inference result ri, and quantitative evaluation result ei are combined, where ei includes the overall score and various evaluation details, to generate optimized prompt input text. The optimized prompt input text is input into the large language model A. Model A optimizes the current prompt word pi based on the root causes of defects and improvement suggestions in the evaluation results, generating the improved prompt word p(i+1) for the next round.

[0041] S106: Dynamic switching of task input to prevent overfitting. This step is a key technical feature that distinguishes this solution from existing technologies. Based on the switching strategy selected during the system initialization phase in S101, it is determined whether the task input needs to be switched in this iteration, and the task input t(i+1) for the next round is determined.

[0042] The specific steps are as follows: I. Core Switching Logic

[0043] When the preset switching conditions are met, a new task input different from the current task input ti is selected from the task sample library and used as the task input t(i+1) for the next round; if the switching conditions are not met, the current task input ti is used in the next round. By dynamically adjusting and optimizing the task samples used, overfitting of prompt words to a single task input is avoided, and the cross-sample and cross-scene generalization ability of the optimized prompt words is greatly improved.

[0044] II. Switching Strategy.

[0045] Prerequisite parameters for the strategy (all determined during system initialization and not changed during iteration): Nmax: Total number of iterations throughout the entire process (determined by the S104 adaptive method, and must satisfy the S104 hard constraints). S: Total number of valid samples in the task sample library (read during the initialization phase). N_switch: Number of consecutive iterations for a single task in the fixed-round switching strategy (recommended value: Nmax / S, must satisfy 1 ≤ N_switch ≤ Nmax / S). M: Number of consecutive observation rounds for adaptive saturation detection (recommended value: 5, minimum value: 3). Δthres: Saturation determination threshold. When Δei < Δthres, it is determined that there is no significant benefit in this round of optimization (recommended value: 5 points).

[0046] Strategy 1: Round-by-round switching strategy, the default generalization-priority strategy.

[0047] Exclusive adaptation scenario: Large sample library, full-scenario generalization priority. The core goal is to maximize the traversal of all business scenarios and avoid prompt overfitting from the root cause.

[0048] Precise switching rule: After each complete iteration (steps S102 to S107 count as one iteration), immediately switch the task input once, strictly ensuring that different task inputs are used in each iteration. After the full traversal of the sample library, it is executed in a loop in the original order until Nmax is reached.

[0049] Automatic adaptation trigger condition: When S ≥ 50, the system automatically selects Strategy 1.

[0050] Strategy 2: Fixed-round switching strategy, depth-first guarantee strategy.

[0051] Exclusive adaptation scenario: Small sample library, depth optimization priority. Solve the problem of insufficient depth of single-task optimization caused by round-by-round switching when the sample size is small.

[0052] Precise switching rule: After using the same task input continuously for N_switch iterations, switch the task input once. Each task input is used at least once in the whole process.

[0053] Automatic adaptation trigger condition: When 20 < S < 50, the system automatically selects Strategy 2, and the recommended value of N_switch is Nmax / S.
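For illustration, the round-by-round rule (Strategy 1) and the fixed-round rule (Strategy 2) can both be written as a mapping from the iteration round i to an index into the task sample library. This sketch assumes polling selection in library order with wrap-around after a full traversal; the function name sample_index is hypothetical.

```python
def sample_index(i: int, s: int, n_switch: int = 1) -> int:
    """Index of the task input used in round i.

    n_switch = 1 gives Strategy 1 (a new input every round);
    n_switch > 1 gives Strategy 2 (each input kept for n_switch rounds).
    """
    return (i // n_switch) % s


# Strategy 1 with S = 3, then Strategy 2 with S = 2 and N_switch = 2.
print([sample_index(i, 3) for i in range(5)])
print([sample_index(i, 2, 2) for i in range(6)])
```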

[0054] Strategy 3: Adaptive saturation detection superposition rule, which can be superimposed on any stage of Strategy 1, 2, or 4.

[0055] Function positioning: Strategy 3 is not an independent basic strategy, but an early switching trigger mechanism that can be superimposed on any basic strategy to detect whether the current task input has reached optimization saturation, thus avoiding ineffective iterations. When S ≤ 20, the system automatically forces Strategy 3 to be superimposed on Strategy 2.

[0056] Precise switching rules: In addition to the normal switching conditions of the current basic strategy, the trend of Δei is monitored in real time. When Δei is lower than Δthres for M consecutive iterations, it is determined that the optimization of the current task input has reached saturation, and the system immediately switches to the next task input without waiting for the normal switching cycle of the basic strategy. When Δei < Δthres for 2 consecutive rounds but fewer than M rounds, it is determined that further optimization is needed, and iteration continues with the current task input until either the saturation judgment condition or the normal switching condition of the basic strategy is met, whichever comes first.

[0057] Automatic adaptation trigger conditions: (a) When S≤20, the system will force the overlay of strategy 3 onto the current strategy (either strategy 2 or strategy 4); (b) When S>20, strategy 3 will be enabled manually as an optional overlay rule.
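A minimal sketch of the saturation judgment of Strategy 3 follows. It assumes that the score history passed in covers only the rounds run on the current task input, which the text does not state explicitly; the function name is_saturated and its parameter names are hypothetical.

```python
from typing import List


def is_saturated(scores: List[float], m: int = 5,
                 delta_thres: float = 5.0) -> bool:
    """True when the last m improvements are all below delta_thres.

    scores: overall scores on the current task input, oldest first.
    """
    deltas = [b - a for a, b in zip(scores, scores[1:])]
    return len(deltas) >= m and all(d < delta_thres for d in deltas[-m:])


# With M = 3: improvements of 2, 1 and 1 points trigger an early switch.
print(is_saturated([70, 72, 73, 74], m=3))
```

With only two low-gain rounds and M = 3 the function returns false, which corresponds to continuing on the current task input.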

[0058] Strategy 4: Phased switching strategy.

[0059] Dedicated adaptation scenario: High-complexity tasks that are optimized in stages, with in-depth optimization first and scope expansion second. The core objective is to first solidify the basic capabilities of the prompt words in core business scenarios, and then expand the generalization boundaries across scenarios, solving the problem of diverging optimization directions caused by directly traversing all scenarios for high-complexity tasks.

[0060] Precise switching rules: The entire optimization process is divided into two stages, and the system switches stages automatically. In the first stage (the first 50% of the total iterations, with a minimum of 20 iterations), the representative core task inputs with the highest scenario coverage and the greatest business weight in the task sample library are used and optimized in a concentrated manner, so that the prompt words reach the preset basic quality pass line in the core task scenarios (recommended threshold: 80 points). In the second stage (the remaining 50% of the iterations), Strategy 1 is strictly followed: all task inputs in the sample library are fully traversed, covering edge and special scenarios, to test and improve the cross-scenario generalization ability of the prompt words.

[0061] Strategy 3 can be superimposed on both stages of Strategy 4 to further improve iteration efficiency.
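The stage boundary of Strategy 4 may be computed as sketched below. Rounding 50% down and capping the first stage at Nmax are assumptions of this sketch, as are the function names phase_one_rounds and phase_of.

```python
def phase_one_rounds(n_max: int) -> int:
    """Rounds in the first stage: 50% of Nmax, but at least 20."""
    return min(n_max, max(20, n_max // 2))


def phase_of(i: int, n_max: int) -> int:
    """Stage (1 or 2) that iteration round i belongs to."""
    return 1 if i < phase_one_rounds(n_max) else 2


# Nmax = 30: the 20-round minimum applies, leaving 10 rounds for stage 2.
print(phase_one_rounds(30), phase_of(19, 30), phase_of(20, 30))
```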

[0062] Automatic adaptation trigger conditions: When the system detects during the initialization phase that the target task meets any one of the following conditions, Strategy 4 is matched automatically, with higher priority than the S-based matching of Strategy 1 / 2: Condition 1: The number of built-in branch rules N_branch in the task is greater than or equal to 10 and the number of business scenario categories N_scene is greater than or equal to 5; Condition 2: The dimension of the task input variable D_in ≥ 8 and the number of output format and compliance constraints N_constraint ≥ 3; Condition 3: The preset business requirements are that the accuracy rate (Acc_core) in the core scenario is ≥95% and the pass rate (Pass_gen) in the cross-scenario generalization is ≥90% (this condition indicates that the business has extremely high quality requirements and needs to ensure that the core capabilities meet the standards in stages before expanding generalization).

[0063] Strategy Combination Rules and Priorities: (1) Strategy 1 and Strategy 2 are mutually exclusive (automatically determined by the size of S) and cannot be used simultaneously.

[0064] (2) Strategy 3 can be superimposed on any stage of Strategy 1, Strategy 2 or Strategy 4, only providing an additional triggering mechanism for "early switching" without changing the normal switching logic of the basic strategy.

[0065] (3) Strategy 4 has a higher priority than Strategy 1 / 2; when Strategy 4 is triggered, the system automatically ignores the automatic matching results of Strategy 1 / 2 based only on S.

[0066] (4) The priority of manually configured strategy selection is always higher than the system's automatic matching result.

[0067] (5) The system supports real-time detection of scene changes during the iteration process; if the sample library size or task complexity changes, the system will automatically push a strategy rematch prompt, which will be executed after manual confirmation.
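The automatic matching and priority rules above can be summarized in a short sketch. The class TaskProfile, the function names, and the representation of accuracy requirements as fractions are assumptions made for illustration; the sketch also assumes that a manual configuration overrides only the basic strategy, not the forced overlay for S ≤ 20.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TaskProfile:
    s: int                   # valid samples in the task sample library
    n_branch: int = 0        # built-in branch rules
    n_scene: int = 0         # business scenario categories
    d_in: int = 0            # task input variable dimension
    n_constraint: int = 0    # output format and compliance constraints
    acc_core: float = 0.0    # required core-scenario accuracy (0..1)
    pass_gen: float = 0.0    # required generalization pass rate (0..1)


def is_high_complexity(p: TaskProfile) -> bool:
    """Any one of the three Strategy 4 trigger conditions."""
    return ((p.n_branch >= 10 and p.n_scene >= 5)
            or (p.d_in >= 8 and p.n_constraint >= 3)
            or (p.acc_core >= 0.95 and p.pass_gen >= 0.90))


def select_strategy(p: TaskProfile, manual: Optional[int] = None,
                    enable_overlay: bool = False) -> Tuple[int, bool]:
    """Return (basic strategy, whether Strategy 3 is superimposed)."""
    if manual is not None:            # rule (4): manual choice wins
        base = manual
    elif is_high_complexity(p):       # rule (3): Strategy 4 outranks 1 / 2
        base = 4
    elif p.s >= 50:
        base = 1
    else:
        base = 2
    overlay = p.s <= 20 or enable_overlay
    return base, overlay
```

For instance, a library of 10 samples with no complexity trigger yields Strategy 2 with Strategy 3 superimposed.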

[0068] III. Rules for Selecting Task Input. When switching task inputs, the new task input (which must differ from the current input) is selected by one of the following methods: Random sampling: randomly select one different task input from the task sample library; Polling selection (recommended): select samples sequentially in the preset order of the task sample library to ensure even traversal; Difficulty increment selection: select according to preset difficulty labels from low to high, guiding the prompt words to gradually adapt to more complex scenarios; Domain-based selection: prioritize inputs that differ most from the current task input in terms of topic and domain to maximize the generalization test effect.

[0069] After the entire sample library has been traversed, the selected rules are executed repeatedly until Nmax is reached.
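Three of the selection methods above are sketched below for illustration; domain-based selection is omitted because it depends on a topic or domain distance that the text does not define. All function names are hypothetical, and the random variant assumes the library holds at least two samples.

```python
import random
from typing import List, Sequence


def pick_polling(samples: Sequence, current_idx: int) -> int:
    """Next index in library order, wrapping after a full traversal."""
    return (current_idx + 1) % len(samples)


def pick_random(samples: Sequence, current_idx: int, rng=random) -> int:
    """Random index that differs from the current one."""
    choices = [j for j in range(len(samples)) if j != current_idx]
    return rng.choice(choices)


def order_by_difficulty(difficulty: List[int]) -> List[int]:
    """Sample indices sorted from the lowest to the highest difficulty label."""
    return sorted(range(len(difficulty)), key=difficulty.__getitem__)
```

Passing a seeded random.Random instance as rng makes the random variant reproducible.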

[0070] S107: Iteration loop trigger. Based on the next round task input t(i+1) determined in step S106, update the iteration round number i to i+1, return to step S102, and enter the next round of iteration until the termination condition of step S104 is met.

[0071] Key innovations of this embodiment: Innovation Point 1: It integrates four capabilities—prompt word generation, task reasoning, quality assessment, and prompt word optimization—within the same large language model A, forming a fully automated closed loop without the need for manual intervention.

[0072] Innovation Point 2: An active task input switching mechanism (S106) is introduced into the iterative optimization loop. Task diversity is treated as an intrinsic constraint in the optimization process, ensuring that the evaluation signal for each iteration comes from diverse task scenarios. This eliminates the conditions for overfitting of prompt words at the algorithmic level. Compared to existing technologies, which use a fixed set of task inputs for each iteration and optimize for the optimal performance of fixed samples rather than the universality of diverse scenarios, the S106 mechanism of this invention is a specific technical means not disclosed in existing technologies.

[0073] Innovation Point 3: Four task switching strategies and one superposition rule are proposed. In particular, the S threshold (S<50 / S≤20) of Strategy 2 / 3 provides an objective triggering mechanism based on the sample database size; the saturation judgment of Strategy 3 (based on Δei being lower than Δthres for M consecutive rounds) transforms the switching decision from manual experience judgment to automatic execution by the algorithm; the triggering conditions of Strategy 4 (quantitative indicators such as N_branch, D_in, and Acc_core) regularize the identification of high-complexity tasks, improving the engineering operability and determinism of the method.

[0074] Embodiment 2 The main generation and inference model (Model A) is responsible for generating initial prompts based on the task description and performing target task inference using the final optimized prompts. A general-purpose model with a large number of parameters (such as an open-source model like Qwen) can be used.

[0075] The evaluation model group (models B1, B2, ..., Bk) consists of K large language models (K≥1; when K=1, it degenerates into a single evaluation model, losing the advantages of multiple models). Each model is responsible for independently evaluating the prompt word inference results and outputting its own evaluation result. Evaluation models and Model A should ideally be from different vendors, have different architectures, or be of different sizes to introduce diversity in evaluation perspectives. The weights of each model are determined through validation set evaluation during the system initialization phase (see S201).

[0076] The optimization model group (models C1, C2, ..., Cl) consists of L large language models (L≥1), responsible for iteratively optimizing the prompt words based on the comprehensive evaluation results of the evaluation model group. The optimization models can share the same group as the evaluation models, or different models can be selected independently.

[0077] As shown in Figure 2, the present invention provides an iterative optimization method for prompt words based on a large language model, comprising the following steps: S201: System Initialization and Initial Prompt Generation. The target task description is input into model A, which generates the initial prompt p0. Simultaneously, the system initializes the weights of the evaluation model group: each model Bk in the evaluation model group evaluates a standardized validation sample set with known correct answers (a test set with known ground truth, recommended to contain at least 10 samples). The evaluation accuracy of each model on the validation set (the degree of agreement between its evaluation results and the ground truth, expressed as a percentage) is used as its initial weight wk. All weights are then normalized so that Σwk=1. If no validation set is available, the initial weights of all models are set equal (wk=1 / K). Set the iteration round number i=0, and set the maximum number of iterations Nmax (recommended initial value 20, adjustable according to task requirements).
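The weight initialization of S201 may be sketched as follows. The function name init_weights is hypothetical, and falling back to equal weights when all accuracies are zero is an assumption not stated in the text.

```python
from typing import List, Optional


def init_weights(accuracies: Optional[List[float]] = None,
                 k: Optional[int] = None) -> List[float]:
    """Normalized evaluator weights wk with sum equal to 1.

    accuracies: validation-set accuracy of each evaluation model, or None
    when no validation set is available (then k equal weights are used).
    """
    if accuracies is None:
        return [1.0 / k] * k
    total = sum(accuracies)
    if total == 0:                     # assumption: degenerate case
        return [1.0 / len(accuracies)] * len(accuracies)
    return [a / total for a in accuracies]


# Three evaluators with 90%, 60% and 50% validation accuracy.
print(init_weights([0.9, 0.6, 0.5]))
```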

[0078] S202: Inference Execution. Combine the current prompt word pi with the task input t, input it into model A to perform inference, and obtain the inference result ri.

[0079] S203: Multi-model cross-evaluation and aggregation of comprehensive results. The inference result ri and task input t are respectively input into each model B1, B2, ..., Bk in the evaluation model group. Each model independently outputs evaluation results ei(1), ei(2), ..., ei(k). Each evaluation result includes an overall score (out of 100) and evaluation details. The comprehensive evaluation result Ei is obtained by aggregation as follows: The overall score Score_i = Σ(wk × ei(k).score), where wk is the normalized weight initialized in step S201; if the weights need to be dynamically adjusted during the iteration process (optional), then every N_update rounds the accuracy of each model is re-verified on the validation set and wk is updated (the recommended value of N_update is 10); the overall evaluation details Feedback_i are the union of the evaluation details of each model, in which identical defect points are sorted by frequency of occurrence so that higher-frequency defects are reflected in the improvement suggestions first.
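The aggregation in S203 can be illustrated with the sketch below. It assumes each evaluator returns its overall score together with a list of defect strings, and that identical defects are recognized by exact string match; the function name aggregate is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def aggregate(weights: List[float],
              evaluations: List[Tuple[float, List[str]]]
              ) -> Tuple[float, List[str]]:
    """Return (Score_i, Feedback_i).

    Score_i is the weighted sum of the evaluator scores; Feedback_i is
    the union of reported defects, most frequently reported first.
    """
    score = sum(w * s for w, (s, _) in zip(weights, evaluations))
    counts = Counter(
        d for _, defects in evaluations for d in dict.fromkeys(defects))
    feedback = [d for d, _ in counts.most_common()]
    return score, feedback


score, feedback = aggregate(
    [0.5, 0.3, 0.2],
    [(80, ["missing format", "too long"]),
     (90, ["missing format"]),
     (70, ["tone"])],
)
```

Here the weighted score is 0.5 × 80 + 0.3 × 90 + 0.2 × 70 = 81, and "missing format" is listed first because two evaluators reported it.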

[0080] S204: Termination condition judgment. If Score_i reaches the preset termination threshold (recommended default value of 90 points) or the current iteration i reaches Nmax, then termination is determined, and the current pi is used as the optimal prompt word p*, and proceed to step S207; otherwise, continue to the next step.

[0081] S205: Multi-model cross-optimization. The comprehensive evaluation result Ei (including the comprehensive score Score_i and the ranked improvement suggestions Feedback_i), the current prompt word pi, the inference result ri, and the task input t are respectively input into each model C1, C2, ..., Cl in the optimization model group. Each model independently generates optimized prompt word candidate versions p_cand(1), p_cand(2), ..., p_cand(l).

[0082] S206: Fusion of optimization results and generation of the next round of prompt words. Perform the following fusion operation on the L candidate prompt word versions (the three methods are listed in order of priority, with method (a) being used first): Method (a) Selection of winning candidates (recommended, lowest computational cost): Perform fast reasoning (a single inference) with the task input t for each of the L candidate prompts p_cand(1)...p_cand(l), and have the evaluation model group score each reasoning result. Select the candidate version with the highest score as p(i+1). When the scores are the same, the version with the smaller number is selected first (i.e., C1 is preferred over C2), because the optimization model with the smaller number is usually larger or has a higher weight in this task.
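Method (a), including its tie-breaking rule, may be sketched as follows. The scoring callable stands in for the single inference by model A followed by scoring by the evaluation model group; the function name select_winner is hypothetical.

```python
from typing import Callable, List


def select_winner(candidates: List[str],
                  score_fn: Callable[[str], float]) -> str:
    """Pick the highest-scoring candidate; ties go to the lower index.

    candidates are ordered as p_cand(1), p_cand(2), ..., so the lower
    index corresponds to the optimization model with the smaller number.
    """
    scores = [score_fn(c) for c in candidates]
    # max() returns the first maximal element, which implements the tie rule.
    best = max(range(len(candidates)), key=lambda j: scores[j])
    return candidates[best]


toy_scores = {"p1": 85.0, "p2": 90.0, "p3": 90.0}
winner = select_winner(["p1", "p2", "p3"], toy_scores.get)
```

In the toy example the second and third candidates tie at 90, and the second is chosen.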

[0083] Method (b) Model A Comprehensive Selection (Recommended, when L≥3 and computational resources are sufficient): Input the complete text of all L candidate versions into Model A, with the additional instruction "Comprehensively analyze the advantages and disadvantages of each version and generate an optimal prompt word version that integrates the advantages of each version and avoids the defects of each version", and Model A outputs the integrated version as p(i+1).

[0084] Method (c) Strictest constraint merging (suitable for scenarios with extremely high compliance requirements): Take the union of all constraints appearing in the L candidate versions, remove duplicates and merge them into a unified constraint clause; retain the task execution logic description with the highest confidence (most frequent occurrence); construct p(i+1) with the merged constraint + optimal execution logic.
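Method (c) can be illustrated as below, under the simplifying assumption that each candidate has already been split into an execution-logic description and a list of constraint clauses, and that duplicates are detected by exact string match (semantic deduplication would require a further model call). The function name merge_strictest is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def merge_strictest(candidates: List[Tuple[str, List[str]]]
                    ) -> Tuple[str, List[str]]:
    """Return (most frequent execution logic, deduplicated constraint union).

    candidates: one (execution_logic, constraints) pair per candidate version.
    """
    constraints = list(dict.fromkeys(
        c for _, cs in candidates for c in cs))
    logic = Counter(l for l, _ in candidates).most_common(1)[0][0]
    return logic, constraints


logic, constraints = merge_strictest([
    ("classify then summarize", ["JSON only", "no PII"]),
    ("summarize only", ["no PII", "max 100 words"]),
    ("classify then summarize", ["JSON only"]),
])
```

The merged constraints and the retained execution logic are then combined into p(i+1).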

[0085] After fusion is complete, increment i by 1, return to step S202, and continue to the next iteration.

[0086] S207: Final Inference Output. When the termination condition of step S204 is met, the pi at this point is the optimal prompt word p* (i.e., the pi of the current iteration when S204 terminates; this prompt word has been optimized based on the comprehensive evaluation results in S205-S206, but has not yet undergone a new round of inference). The optimal prompt word p* is combined with the actual business input of the target task, and model A performs the final task inference to output the final result.

[0087] Key innovations of this embodiment: Innovation Point 1: Multi-model cross-evaluation mechanism. Multiple heterogeneous large language models are introduced to independently evaluate the inference results. Weighted aggregation is performed based on weights determined by the accuracy of the validation set, which overcomes the subjective bias problem of single-model evaluation. The dynamic weight update mechanism further ensures the objectivity of the aggregation results.

[0088] Innovation Point 2: Multi-model cross-optimization mechanism. Multiple optimization models independently generate optimized candidate versions based on the comprehensive evaluation results. Three fusion methods (winning selection / model comprehensive optimization / most stringent constraint merging) are used to integrate optimization suggestions from multiple perspectives, introducing optimization diversity and avoiding local optimality caused by being trapped in a single model preference.

[0089] Innovation Point 3: Decoupling of the main inference model and the evaluation / optimization models. Model A focuses on task inference, and evaluation and optimization are carried out by other models, achieving professional division of functional roles; the weights of the evaluation models are objectively determined by the accuracy of the validation set instead of being assigned subjectively by humans.

[0090] Embodiment 3 This embodiment provides a system for the above-mentioned method for iteratively optimizing prompts based on a large language model, including: An inference module, which is used to obtain the current task input of the current target task in the task sample library. The current task input includes the natural language description text of the current target task. The current task input is input into the inference large language model to generate a matching current prompt. The current prompt and the current task input are concatenated in a preset format to obtain the current inference input text. The current inference input text is input into the inference large language model to output the current inference result; An evaluation module, which is used to input the current task input and the current inference result into the evaluation large language model for iterative evaluation inference, and output an evaluation result. The evaluation result includes an overall score. It is judged whether the overall score reaches the preset score threshold or whether the current iteration number reaches the maximum iteration number. If so, the current prompt is output as the optimal prompt. If not, it proceeds to the next step; An optimization module, which is used to combine the current prompt, the current task input, the current inference result, and the current evaluation result to generate an optimized prompt input text. The optimized prompt input text is input into the optimization large language model to output the improved prompt for the next round and perform the next round of iteration.

[0091] The system further includes a task switching module, which is also used to trigger a task switching mechanism according to the total number of valid samples S in the task sample library, including: When S≥50, adopt a strategy of switching the target task round by round; When 20<S<50, adopt a fixed-round target task switching strategy according to the preset number of iteration-round intervals; Obtain the target task complexity metrics, which include structural dimension metrics and preset business performance requirements. Structural dimension metrics include the number of built-in branch rules, the number of business scenario categories, the dimension of task input variables, and the number of output format and compliance constraints. Preset business performance requirements include core scenario accuracy and cross-scenario generalization pass rate. When the target task meets any one of the following conditions, a phased target task switching strategy is adopted: Condition 1: The number of built-in branch rules in the task is ≥10 and the number of business scenario categories is ≥5; Condition 2: The number of input variables for the task is ≥8 and the number of output format and compliance constraints is ≥3; Condition 3: The preset business requirement is that the accuracy rate in the core scenario is ≥95% and the cross-scenario generalization pass rate is ≥90%. A phased switching strategy has higher priority than a strategy that switches the target task in rounds or a fixed number of rounds. The strategy of switching target tasks in stages includes: Phase 1: The first 50% of the total iterations will use the task with the highest scenario coverage and the greatest business weight from the task sample library as the target task after the switch. Phase 2: For the remaining 50% of iterations, implement a strategy of switching target tasks round by round.

[0092] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the technical principles of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A method for iterative optimization of prompt words based on a large language model, characterized in that, It includes: Obtain the current task input of the current target task in the task sample library. The current task input includes the natural language description text of the current target task. Input the current task input into the inference large language model to generate a matching current prompt. Concatenate the current prompt and the current task input in a preset format to obtain the current inference input text. Input the current inference input text into the inference large language model and output the current inference result; Input the current task input and the current inference result into the evaluation large language model for iterative evaluation and inference, and output the evaluation result. The evaluation result includes an overall score. Determine whether the overall score reaches the preset score threshold or whether the current iteration count reaches the maximum iteration count. If so, output the current prompt as the optimized prompt. If not, proceed to the next step; Combine the current prompt, the current task input, the current inference result, and the current evaluation result to generate an optimized prompt input text. Input the optimized prompt input text into the optimization large language model and output the improved prompt for the next round for the next round of iteration.

2. The prompt word iterative optimization method based on a large language model according to claim 1, characterized in that, It also includes triggering a task switching mechanism based on the total number of valid samples S in the task sample library, including: When S≥50, adopt a strategy of switching target tasks round by round; When 20<S<50, adopt a strategy of switching target tasks at fixed rounds according to the preset number of iteration round intervals; Obtain the target task complexity index of the target task. The target task complexity index includes a structural dimension index and a preset business performance requirement. The structural dimension index includes the number of built-in task branch rules, the number of business scenario classifications, the dimension of task input variables, and the number of output format and compliance constraints. The preset business performance requirement includes the core scenario accuracy rate and the cross-scenario generalization qualification rate. 
When the target task meets any one of the following conditions, adopt a strategy of switching target tasks in stages: Condition 1: The number of built-in task branch rules ≥ 10 and the number of business scenario classifications ≥ 5; Condition 2: The dimension of task input variables ≥ 8 and the number of output format and compliance constraints ≥ 3; Condition 3: The preset business requirement is that the core scenario accuracy rate ≥ 95% and the cross-scenario generalization qualification rate ≥ 90%; The priority of adopting the strategy of switching target tasks in stages is higher than that of adopting the strategy of switching target tasks round by round and the strategy of switching target tasks at fixed rounds; Among them, the strategy of switching target tasks in stages includes: The first stage: The first 50% of the total iteration rounds, fixedly use the task with the highest scenario coverage rate and the largest business weight in the task sample library as the switched target task; The second stage: The remaining 50% of the iteration rounds, execute the strategy of switching target tasks round by round.

3. The method for iterative optimization of prompt words based on a large language model according to claim 2, characterized in that, Triggering the task switching mechanism based on the total number of valid samples S in the task sample library also includes: When S≤20, superimpose the adaptive saturation detection strategy on the current strategy; When S>20, use the adaptive saturation detection strategy as an optional superimposed rule, which is enabled by manual configuration; The adaptive saturation detection strategy includes: monitoring the improvement margin of the current round, which is the difference between the overall score of the previous round and the overall score of the current round; when the improvement margin of consecutive preset rounds is lower than the preset saturation threshold, it is determined that the optimization of the current task input has approached saturation, and the system immediately switches to the next task input; when the improvement margin of two consecutive rounds is less than the preset saturation threshold but has not reached the preset number of rounds, the current task input is used to continue iterating until the saturation judgment condition or the normal switching condition of the basic strategy is met, whichever triggers first.

4. The prompt word iterative optimization method based on a large language model according to claim 3, characterized in that, When switching target task inputs, the new task input can be selected using any of the following methods: Random sampling method: Randomly select a different task input from the task sample library; Polling selection: Select samples sequentially in the preset order of the task sample library to ensure even traversal; Difficulty increment selection: Select from low to high difficulty tags according to the task's preset difficulty level; Domain-based selection: Prioritize inputs that differ most from the current task input in terms of theme and domain.

5. The prompt word iterative optimization method based on a large language model according to claim 3, characterized in that, The input variable dimension is the number of independent variable fields in a single task input; the built-in branching rules are the number of condition processing branches embedded in the task workflow definition; the business scenario classification is the number of different business scenario categories covered by the task sample library; and the output format and compliance constraints are the total number of format specifications and compliance requirements that the task output must meet.

6. The method for iterative optimization of prompt words based on a large language model according to claim 3, characterized in that, After receiving the natural language description text of the initial target task, the natural language description text of the initial target task is input into the inference large language model. The inference large language model automatically generates initial prompt words that are adapted to the target task. At the same time, the parameters are initialized: read the size of the task sample library and the complexity index of the target task, trigger the task switching mechanism according to the total number of valid samples S in the task sample library, and set the maximum number of iterations. After initialization, the iteration round number is set, the initial iteration round number is 0, and the initial task input is selected from the task sample library.

7. The method for iterative optimization of prompt words based on a large language model according to claim 1, characterized in that, The current prompts include the core execution rules of the target task, output format specifications, constraints, and precautions.

8. The prompt word iterative optimization method based on a large language model according to claim 7, characterized in that, the inference large language model, the evaluation large language model, and the optimization large language model are mutually different large language models, and the evaluation large language model and the optimization large language model each consist of several large language models. Inputting the current task input and the current inference result into the evaluation large language model for iterative evaluation inference includes: inputting the inference result and the task input into each model of the evaluation large language model, each model independently outputting its evaluation result, and obtaining the overall score according to the following formula: Score_i = Σ_k (w_k × e_i^(k).score); where w_k is the preset weight of the k-th model, the evaluation accuracy of each model on the validation set being used as that model's weight, and e_i^(k).score is the evaluation result of the k-th model. Combining the current prompt word, the current task input, the current inference result, and the current evaluation result to generate an optimized prompt input text, and inputting the optimized prompt input text into the optimization large language model to output the improved prompt word for the next round, includes: inputting the current evaluation result, the current prompt word, the current inference result, and the current task input into each model in the optimization model group, each model independently generating a candidate version of the optimized prompt word; and performing the following fusion operations on all candidate versions of the prompt word:
Winning candidate selection fusion operation: for each candidate version of the prompt word, independently run a quick inference on the current task input with the inference large language model, and have the corresponding evaluation large language model score each inference result; select the candidate version with the highest score as the improved prompt word for the next round; when scores are equal, preferentially select the candidate version with the smaller version number;
Comprehensive optimal selection fusion operation: input the complete text of each candidate version of the prompt word into the inference large language model, append the instruction "Comprehensively analyze the advantages and disadvantages of each of the above versions, and generate an optimal version of the prompt that combines the advantages of each version and avoids the defects of each version", and have the inference large language model output the fused version as the improved prompt word for the next round;
Strictest constraint merging fusion operation: take the union of all constraints that appear in the candidate versions of the prompt word, remove duplicates, and merge them into a unified constraint clause; retain the task execution logic description with the highest confidence, that is, the one that appears most frequently, as the optimal execution logic; and construct the improved prompt word for the next round from the unified constraint clause and the optimal execution logic.
After the fusion is completed, the next round of iteration continues with the improved prompt word for the next round.
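
The weighted overall score and two of the three fusion operations above lend themselves to a short sketch. The version below is illustrative only: `score_candidate` is a hypothetical callable standing in for "quick inference plus evaluation", candidates are assumed to be listed in version-number order, and the `Candidate` record that splits a prompt word into execution logic and constraints is an assumption. The comprehensive optimal selection operation is a single model call with the quoted instruction and is not sketched here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence


def overall_score(weights: Sequence[float], scores: Sequence[float]) -> float:
    """Score_i = sum over k of w_k * e_i^(k).score, exactly as stated."""
    if len(weights) != len(scores):
        raise ValueError("one weight is required per evaluation model")
    return sum(w * s for w, s in zip(weights, scores))


def pick_winner(candidates: Sequence[str],
                score_candidate: Callable[[str], float]) -> str:
    """Winning candidate selection: highest score, ties go to the lower index."""
    scored = [score_candidate(c) for c in candidates]
    # Index i doubles as the version number; -i makes earlier versions win ties.
    best = max(range(len(candidates)), key=lambda i: (scored[i], -i))
    return candidates[best]


@dataclass
class Candidate:
    """Hypothetical decomposition of a candidate prompt word."""
    execution_logic: str
    constraints: List[str]


def merge_strictest(candidates: Sequence[Candidate]) -> Candidate:
    """Strictest constraint merging: union of constraints, most frequent logic."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    merged = list(dict.fromkeys(
        c for cand in candidates for c in cand.constraints
    ))
    logic, _ = Counter(
        cand.execution_logic for cand in candidates
    ).most_common(1)[0]
    return Candidate(execution_logic=logic, constraints=merged)
```

The formula as stated does not say whether the weights are normalized, so the sketch leaves them as given; dividing by the sum of the weights would keep the overall score on the same scale as the individual scores.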

9. A system for the prompt word iterative optimization method based on a large language model according to any one of claims 1-8, characterized in that, the system includes: an inference module, which is used to obtain the current task input of the current target task from the task sample library, the current task input including the natural language description text of the current target task; to input the current task input into the inference large language model to generate a matching current prompt word; to concatenate the current prompt word and the current task input in a preset format to obtain the current inference input text; and to input the current inference input text into the inference large language model to output the current inference result; an evaluation module, which is used to input the current task input and the current inference result into the evaluation large language model for iterative evaluation inference and to output an evaluation result that includes an overall score; and to determine whether the overall score reaches the preset score threshold or whether the current number of iterations reaches the maximum number of iterations; if so, the current prompt word is output as the optimized prompt word; if not, processing proceeds to the next step; and an optimization module, which is used to combine the current prompt word, the current task input, the current inference result, and the current evaluation result to generate an optimized prompt input text, and to input the optimized prompt input text into the optimization large language model to output the improved prompt word for the next round and perform the next round of iteration.
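
The closed loop formed by the three modules can be sketched as below. The four callables (`generate_prompt`, `infer`, `evaluate`, `improve`) are hypothetical wrappers around the respective models, and joining the prompt word and task input with a blank line is only an assumed "preset format".

```python
from typing import Callable, Tuple


def optimize_prompt(task_input: str,
                    generate_prompt: Callable[[str], str],
                    infer: Callable[[str], str],
                    evaluate: Callable[[str, str], float],
                    improve: Callable[[str, str, str, float], str],
                    threshold: float,
                    max_iterations: int) -> Tuple[str, float, int]:
    """Run inference, evaluation, and optimization until a stop condition holds."""
    prompt = generate_prompt(task_input)   # inference module: initial prompt word
    iteration = 0
    while True:
        # Inference module: concatenate prompt word and task input, then infer.
        result = infer(f"{prompt}\n\n{task_input}")
        # Evaluation module: overall score for the current inference result.
        score = evaluate(task_input, result)
        if score >= threshold or iteration >= max_iterations:
            return prompt, score, iteration
        # Optimization module: produce the improved prompt word for the next round.
        prompt = improve(prompt, task_input, result, score)
        iteration += 1
```

Task switching between rounds is left out here to keep the loop readable; it would replace `task_input` before the next inference call.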

10. The system for the prompt word iterative optimization method based on a large language model according to claim 9, characterized in that, the system further includes a task switching module, which is used to trigger a task switching mechanism according to the total number of valid samples S in the task sample library, including: when S≥50, adopting a strategy of switching the target task round by round; when 20<S<50, adopting a fixed-round target task switching strategy according to a preset iteration round interval; and obtaining the target task complexity metrics, which include structural dimension metrics and preset business performance requirements. The structural dimension metrics include the number of built-in branch rules, the number of business scenario categories, the task input variable dimension, and the number of output format and compliance constraints. The preset business performance requirements include the core scenario accuracy and the cross-scenario generalization pass rate. When the target task meets any one of the following conditions, a phased target task switching strategy is adopted: Condition 1: the number of built-in branch rules of the task is ≥10 and the number of business scenario categories is ≥5; Condition 2: the number of input variables of the task is ≥8 and the number of output format and compliance constraints is ≥3; Condition 3: the preset business requirement is that the core scenario accuracy is ≥95% and the cross-scenario generalization pass rate is ≥90%. The phased switching strategy has higher priority than the round-by-round switching strategy and the fixed-round switching strategy. The phased target task switching strategy includes:
Phase 1: for the first 50% of the total iterations, the task with the highest scenario coverage and the greatest business weight in the task sample library is used as the target task after switching.
Phase 2: for the remaining 50% of the iterations, the strategy of switching the target task round by round is applied.
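
The strategy selection described in this claim can be sketched as a small decision function. The record and function names are hypothetical, performance requirements are assumed to be given as fractions (0.95 for 95%), and iteration rounds are assumed to be counted from 0. The claim does not say what happens when S≤20, so the sketch returns `None` for that case rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Complexity:
    branch_rules: int
    scenario_categories: int
    input_variables: int
    output_constraints: int
    core_accuracy_required: float     # e.g. 0.95 for 95%
    generalization_required: float    # e.g. 0.90 for 90%


def needs_phased(c: Complexity) -> bool:
    """True when any one of Conditions 1 to 3 holds."""
    cond1 = c.branch_rules >= 10 and c.scenario_categories >= 5
    cond2 = c.input_variables >= 8 and c.output_constraints >= 3
    cond3 = c.core_accuracy_required >= 0.95 and c.generalization_required >= 0.90
    return cond1 or cond2 or cond3


def choose_strategy(sample_count: int, c: Complexity) -> Optional[str]:
    """Pick the switching strategy; phased switching takes priority."""
    if needs_phased(c):
        return "phased"
    if sample_count >= 50:
        return "round_by_round"
    if 20 < sample_count < 50:
        return "fixed_interval"
    return None  # S <= 20 is not covered by this claim


def phased_action(iteration: int, max_iterations: int) -> str:
    """Phase 1 keeps the anchor task; phase 2 switches every round."""
    return "anchor_task" if iteration < max_iterations / 2 else "round_by_round"
```

Here `"anchor_task"` stands for the task with the highest scenario coverage and greatest business weight; how that task is identified is outside the sketch.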