A Method and System for Incremental Differential Modeling of Large-Scale Process Recipes Based on Tool Constraints

By adopting a tool-constrained incremental differential modeling method for large-scale process formulations, this paper solves the problem of difficulty in identifying parameter variation patterns in process formulation analysis. It realizes structured indexing and parameter differential statistics for JSON process formulations, improves the accuracy and interpretability of analysis results, and supports streaming output and context-constrained stable analysis.

CN122308280APending Publication Date: 2026-06-30WUHAN ZHIXIAN FUTURE TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
WUHAN ZHIXIAN FUTURE TECHNOLOGY CO LTD
Filing Date
2026-03-16
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies lack specific analytical logic for the structural characteristics of formulations in process formulation analysis, making it difficult to adapt to the step-by-step changes in parameters. They have not built a dedicated set of analytical tools for process formulations, and their ability to identify parameter change patterns and perform differential statistics is insufficient, failing to meet the needs of quickly locating parameter changes and accurately identifying key differences.

Method used

The incremental differential modeling method for large-scale process recipes based on tool constraints generates node identity identifiers through path normalization and stable hash functions, constructs structural signature snapshots and sparse incremental differential trees, and combines multi-layer analysis tools and constrained inference control mechanisms to achieve structured indexing and parameter differential statistics of JSON process recipes, outputting structured JSON results.

Benefits of technology

It achieves a complete understanding of the structure of process formulations, reduces the context dependence of large language models, accurately extracts parameter change patterns, improves the accuracy and interpretability of analysis results, supports streaming output and context-constrained stable analysis, and reduces the cost of manual analysis.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122308280A_ABST
    Figure CN122308280A_ABST
Patent Text Reader

Abstract

This invention discloses a method and system for incremental differential modeling of large-scale process recipes based on tool constraints. For JSON-formatted process recipe files, it performs structural parsing and path standardization on the recipe data, generating normalized paths and calculating stable node identifiers. It constructs structural signature snapshots according to process steps, generates leaf node hash sets based on node identifiers and parameter values, and aggregates these to obtain structural signature hash values. For parameters with the same node identifier, it performs cross-step aggregation, constructing a sparse incremental value sequence that only records the step points where values ​​change, and calculates the minimum and maximum values ​​of different value quantities, change step points, and numerical parameters based on this sequence. The analytical capabilities are encapsulated into callable tools that output structured JSON results. Driven by system prompts containing tool descriptions, the large model calls the tool in a closed loop of reasoning-action-observation to generate interpretable conclusions, supporting streaming output and context compression.
Need to check novelty before this filing date? Find Prior Art

Description

Technical Field

[0001] This invention belongs to the field of process formulation data processing and intelligent analysis, specifically involving a method and system for incremental differential modeling of large-scale process formulations based on tool constraints. Background Technology

[0002] Process formulations are widely used in manufacturing, chemical, and semiconductor industries. They are stored in structured formats such as JSON, containing multi-dimensional process steps and massive amounts of parameter information. The parameter design and variation patterns of the formulation directly affect product quality, production yield, and process stability. Traditional process formulation analysis methods often rely on manual step-by-step parameter analysis and formulation comparison, resulting in low analysis efficiency, high dependence on human experience, and difficulty in accurately capturing the coupling variation patterns between parameters. Furthermore, when faced with complex nested JSON formulation files, incomplete parameter extraction and incomplete analysis of variation characteristics are prone to occur. At the same time, there is a lack of standardized analysis tools and interpretable analysis conclusions, making it difficult to meet the high-efficiency requirements of rapid process formulation optimization and root cause localization in modern production.

[0003] To address the shortcomings of traditional methods, some improved technologies have emerged in related fields. For example, patent CN119003737A discloses a method for attribution analysis of index fluctuations that integrates a large language model. This method determines the index to be attributed based on user queries, generates a semantic analysis request by combining text templates and attribution dimension metadata, generates and validates a DSL object through a large language model, maps the attribution analysis request to the target attribution analysis tool to obtain the results, and then summarizes the attribution results by the large language model. However, this method focuses primarily on the attribution analysis of index fluctuations, without designing dedicated analysis dimensions for the step-by-step parameter structure of process formulations. It lacks the ability to aggregate and differentially analyze formulation parameters step by step, and its use of the large language model is limited to semantic parsing and result summarization. It does not build a dedicated set of analysis tools based on the structured characteristics of process formulations, making it difficult to adapt to the needs of analyzing the step-by-step variation patterns of parameters in process formulations.

[0004] Another patent, CN121009870A, proposes a method for automatically compiling aircraft manufacturing process instructions guided by a large language model. Its core is to extract structured knowledge from unstructured process text, construct a graph-structured expert database based on a large language model, extract non-geometric manufacturing information by combining it with component object models, digitize geometric features through point cloud sampling, and then enhance the process instructions through semantic retrieval. While this method constructs a process knowledge graph, it focuses on the automated compilation of process instructions rather than the parameter variation analysis and difference mining of process recipes. It does not perform differential statistics and change pattern determination for the step-by-step sequence of recipe parameters. Furthermore, the construction process of the graph-structured expert database is complex and lacks adaptability to structured data such as process recipes, failing to accurately extract the variation characteristics and key differences of parameters in the recipe.

[0005] Patent CN121009900A discloses a method for unstructured text parsing and question answering based on a large language model. This method acquires unstructured text periodically, extracts key entities and data structures using a large language model, dynamically generates or updates JSON structure templates, converts the text into structured data stored in a time-series database, and then generates question-and-answer or trend analysis results based on user natural language queries. While primarily focused on unstructured text parsing and time-series data querying, and supporting dynamic generation of JSON structure templates, this method lacks a deep structural index for process formulation JSON files. It also lacks the ability to aggregate formulation parameters by path and perform differential analysis along steps. Furthermore, the application of the large language model is limited to text parsing and result generation, failing to encapsulate dedicated parameter analysis tools to meet the actual needs of process formulation analysis, making it difficult to deeply explore the changing patterns and coupling relationships of formulation parameters.

[0006] In summary, while existing patented technologies have improved the automation level of data processing and instruction compilation to some extent, they still have significant shortcomings in process formulation analysis scenarios: They lack dedicated analysis logic for the structured characteristics of formulations, making it difficult to adapt to the step-by-step changes in parameters; they have not built a dedicated set of analysis tools for process formulations, resulting in insufficient ability to identify parameter change patterns and perform differential statistics; and the invocation of large language models lacks a constraint mechanism that matches the formulation structure, leading to insufficient relevance and interpretability of analysis results, failing to fully meet the practical needs of quickly locating parameter changes and accurately identifying key differences in process formulation analysis. Therefore, there is an urgent need for a process formulation analysis technology that can adapt to JSON format process formulations, possess step-by-step parameter analysis capabilities, and rely on tool constraints to improve analysis accuracy, in order to address the deficiencies of existing technologies. Summary of the Invention

[0007] The purpose of this invention is to address the shortcomings of the prior art by proposing a method and system for incremental differential modeling of large-scale process recipes based on tool constraints. This method can establish a structural index and construct a parameter tree for JSON process recipes, complete parameter differential statistics, toolify analytical capabilities, and output structured JSON results. Driven by system prompts, the large language model calls the tool in an alternating reasoning and action mode to generate interpretable analytical conclusions, and supports streaming output as well as stable output when the context is limited.

[0008] The technical solution to achieve the purpose of this invention is:

[0009] A method for incremental differential modeling of large-scale process formulations based on tool constraints, the method comprising:

[0010] Step S1: Obtain a process recipe file containing multiple process steps and offset dimensions. The process recipe file is in JSON format. Perform path normalization processing on the process recipe file, convert the parameter name of each parameter item in the JSON data into a unique normalized path string, and generate a node identity identifier that is independent of the process steps based on the normalized path using a stable hash function, so as to ensure that the same parameter has a consistent identity identifier under different process steps.

[0011] Step S2: Construct a structural signature snapshot for each process step. For each process step: Construct a leaf node hash set based on the node identity and corresponding parameter values, where each leaf node hash value is generated by the node identity and parameter values; Sort the leaf node hash set according to the node identity to generate the structural signature hash value for the corresponding process step, where the structural signature hash value is the hash value obtained by performing aggregate hash calculation on the sorted leaf node hash set; Store the leaf node hash set and the structural signature hash value as a structural signature snapshot of the process step for fast consistency determination between subsequent steps.

[0012] Step S3: Construct a cross-step sparse incremental difference structure based on node identity identifiers. Traverse the data of each process step in the order of process steps and maintain the mapping relationship between node identity identifiers and the most recent record value. When the value of a node in the current process step is inconsistent with the value of the most recent record, only the node identity identifier corresponding to that process step and its value change information are recorded. In this way, a sparse incremental sequence containing only the process step points with value changes is constructed for each node, forming a structured incremental difference tree.

[0013] Step S4: Generate statistical summary information based on the structured incremental difference tree. For each node in the structured incremental difference tree, calculate the number of different values ​​based on its sparse incremental sequence, determine the process step points where the values ​​change, and calculate the minimum and maximum values ​​for numerical parameters. The statistical summary information is calculated based on the pre-constructed sparse incremental sequence without retracing the original JSON data, thereby reducing the computational complexity of the difference analysis.

[0014] Step S5: Construct a multi-layered analysis tool based on a structured parameter tree and differential index, and establish a tool capability registration and invocation constraint mechanism. Specifically, this includes: creating structured capability declaration objects for the analysis tools within the multi-layered analysis tool; the capability declaration objects include the tool name, functional semantic description, input field structure, output JSON schema, applicable analysis type, and dependent data source; registering the capability declaration objects as the capability boundary space that the large language model can invoke, and enabling the analysis tools to output structured JSON results based on the data structure formed in steps S1–S4; and constraining the tool invocation scope of the large language model based on the capability declaration objects.

[0015] Step S6: Construct a constrained reasoning control mechanism based on structural pattern constraints and tool capability constraints to drive the large language model to perform recipe difference analysis in a closed-loop manner of reasoning-action-observation. Specifically, this includes: injecting a tool list and corresponding capability declaration objects into the system prompt words, so that the large language model aligns the tool input field structure when generating tool call requests and ensures that the action input is legal and the tool output can be structured and parsed; limiting the boundary of the large language model's reasoning behavior, so that the analysis is based on the structured JSON results returned by the analysis tool and that no fictitious or unverified data is allowed, and calling path parsing tools to determine the normalized path or node identity before performing parameter analysis to generate interpretable analysis results.

[0016] Furthermore, the path normalization process includes: not expanding array indices when traversing JSON data, and uniformly abstracting array nodes into array placeholder markers path[] in the normalized path string.

[0017] Furthermore, the node identity identifier is generated from a hash input containing the normalized path string and parameter type information.

[0018] Furthermore, the fast consistency determination includes: when the structural signature hash values ​​of adjacent process steps are the same, it is determined that the parameter sets corresponding to the adjacent process steps are consistent.

[0019] Furthermore, the value change information recorded in the sparse incremental sequence includes the identifier of the changed process step, the node identity identifier, and the changed parameter value.

[0020] Furthermore, the process step points in the statistical summary information that have undergone value changes are the set of process step points in the sparse incremental sequence that have value records; the different number of values ​​is calculated based on the set of values ​​in the sparse incremental sequence.

[0021] Furthermore, the multilayer analysis tool includes:

[0022] The first type of analytical tool is used to generate value sequences, value change summaries, or trend summaries for a single parameter.

[0023] The second type of analytical tool is used to generate differential results or statistical information on hot spots of change between process steps;

[0024] The third type of analysis tool is used to generate global summary information or differential pattern recognition results based on the differential results of multiple process steps.

[0025] Furthermore, the structured capability declaration object also includes a data source identifier for the output structured JSON result, which is used to indicate that the output field corresponds to the normalized path, node identity identifier, sparse incremental sequence, and statistical summary information in steps S1–S4.

[0026] Furthermore, the restricted inference control mechanism includes: when the tool call request generated by the large language model does not satisfy the input field structure, refusing to execute the corresponding tool call and returning a structured error message to prompt it to regenerate a valid tool call request.

[0027] A large-scale process formulation incremental differential modeling system based on tool constraints, the system comprising:

[0028] The formula data access module is used to acquire and parse process formula data files, convert the raw structured data into a unified data representation form within the system, and identify and classify the formula data according to the process step dimension and offset dimension to generate an ordered dimension index structure.

[0029] The path normalization and node identity generation module is used to perform path normalization processing on the parameter names in the process formula data, generate a unique normalized path string, and generate a node identity identifier that is independent of the process steps based on the normalized path through a stable hash function.

[0030] The structure signature snapshot construction module is used to construct a leaf node hash set for each process step based on the node identity identifier and the corresponding parameter value, and to generate the structure signature hash value of the corresponding process step by sorting the leaf node hash set by the node identity identifier. The leaf node hash set and the structure signature hash value are stored as a structure signature snapshot of the process step for rapid consistency determination between steps.

[0031] The multi-step parameter incremental difference tree construction module is used to traverse the data of each process step in the order of process steps, maintain the mapping relationship between node identity and the most recent recorded value, and record the node identity and value change information of the corresponding process step when the node value changes, so as to construct a sparse incremental sequence containing only the process step points with value changes for each node, forming a structured incremental difference tree.

[0032] The parameter change statistics and difference analysis module is used to calculate the number of different values ​​based on the sparse incremental sequence of the structured incremental difference tree, determine the process step points where the values ​​change, and calculate the minimum and maximum values ​​of numerical parameters. The statistical calculation is completed based on the pre-constructed sparse incremental sequence without retracing the original structured data.

[0033] The analysis tool system module is used to construct a multi-layered analysis tool system and establish a tool capability registration and invocation constraint mechanism. It creates a structured capability declaration object for the analysis tool. The capability declaration object includes the tool name, functional semantic description, input field structure, output JSON schema, applicable analysis type, and dependent data source. The capability declaration object is registered as a capability boundary space that can be invoked by the large language model, so that the analysis tool outputs structured JSON results based on the data structure formed in steps S1–S4.

[0034] The prompt word and restricted reasoning control module is used to generate preset system prompt words and inject a tool list and corresponding capability declaration objects into the system prompt words. Based on structural pattern constraints and tool capability constraints, it drives the large language model to call the analysis tool in a closed loop of reasoning-action-observation. It limits the analysis to be generated based on the structured JSON results returned by the analysis tool and does not allow the creation of unverified data. Before performing parameter analysis, it calls the path parsing tool to determine the normalized path or node identity.

[0035] The output module is used to output the analysis results;

[0036] The context processing module is used to compress historical information when the context of a large language model is exceeded, and to generate the final analysis results based on the analysis tools.

[0037] Compared with the prior art, the present invention, employing the above technical solution, has the following beneficial effects:

[0038] (1) The incremental differential modeling method and system for large-scale process recipes based on tool constraints proposed in this invention establishes a structured index for JSON-formatted process recipe files through depth-first traversal. The index is only built for objects and array nodes, and array nodes are abstracted as array placeholders. The step number and array index are not expanded during traversal. At the same time, complete structural information such as path, depth, and type are recorded for each node. This not only realizes the complete structural understanding of the recipe file, but also provides low-token-consumption structural pattern information for large language models. It avoids large models directly processing the full recipe data, greatly reduces their dependence on context, and effectively solves the context over-limit problem when large models analyze the full recipe data.

[0039] (2) This invention constructs a parameter tree with process steps as the core analysis dimension, aggregates parameters of the same path to form a value sequence from step to value, and completes differential analysis based on the value sequence to accurately extract statistical features such as the number of different values ​​and the number of changing steps. It can also abstract and determine parameter change patterns such as stable, gradual change and sudden change based on this, and transform the differences in complex nested JSON recipe data into parameter-level and engineering-level interpretable change information, providing accurate and efficient parameter data support for semiconductor process yield analysis and defect root cause location, and greatly reducing the cost of manual step-by-step and parameter-by-parameter analysis.

[0040] (3) This invention encapsulates parameter analysis capabilities into an analysis tool that returns structured JSON results. By using system prompts to constrain the large language model to call the tool in ReAct mode, the large model illusion problem is avoided from the root. At the same time, the analysis process has a clear thinking, action, and observation chain, which greatly improves the accuracy and interpretability of process formulation analysis results.

[0041] (4) This invention supports streaming output of analysis process and conclusion, and designs an automatic compression processing mechanism for context overrun, which effectively solves the reasoning failure problem caused by context overflow of large models, ensures the system operation stability in complex process formula analysis scenarios, and the modular architecture design makes the solution have good engineering implementation. Attached Figure Description

[0042] Figure 1 This is a flowchart of the incremental differential modeling method for large-scale process formulations based on tool constraints proposed in this invention.

[0043] Figure 2 This is a schematic diagram of the incremental differential modeling system for large-scale process formulations based on tool constraints proposed in this invention.

[0044] Figure 3 This is a schematic diagram of the streaming output method in an embodiment of the present invention;

[0045] Figure 4 This is a schematic diagram of context overload compression processing in an embodiment of the present invention. Detailed Implementation

[0046] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0047] A method for incremental differential modeling of large-scale process formulations based on tool constraints, the method comprising:

[0048] Step S1: Obtain a process recipe file containing multiple process steps and offset dimensions. The process recipe file is in JSON format. Perform path normalization processing on the process recipe file, convert the parameter name of each parameter item in the JSON data into a unique normalized path string, and generate a node identity identifier that is independent of the process steps based on the normalized path using a stable hash function, so as to ensure that the same parameter has a consistent identity identifier under different process steps.

[0049] Step S2: Construct a structural signature snapshot for each process step. For each process step: Construct a leaf node hash set based on the node identity and corresponding parameter values, where each leaf node hash value is generated by the node identity and parameter values; Sort the leaf node hash set according to the node identity to generate the structural signature hash value for the corresponding process step, where the structural signature hash value is the hash value obtained by performing aggregate hash calculation on the sorted leaf node hash set; Store the leaf node hash set and the structural signature hash value as a structural signature snapshot of the process step for fast consistency determination between subsequent steps.

[0050] Step S3: Construct a cross-step sparse incremental difference structure based on node identity identifiers. Traverse the data of each process step in the order of process steps and maintain the mapping relationship between node identity identifiers and the most recent record value. When the value of a node in the current process step is inconsistent with the value of the most recent record, only the node identity identifier corresponding to that process step and its value change information are recorded. In this way, a sparse incremental sequence containing only the process step points with value changes is constructed for each node, forming a structured incremental difference tree.

[0051] Step S4: Generate statistical summary information based on the structured incremental difference tree. For each node in the structured incremental difference tree, calculate the number of different values ​​based on its sparse incremental sequence, determine the process step points where the values ​​change, and calculate the minimum and maximum values ​​for numerical parameters. The statistical summary information is calculated based on the pre-constructed sparse incremental sequence without retracing the original JSON data, thereby reducing the computational complexity of the difference analysis.

[0052] Step S5: Construct a multi-layered analysis tool based on a structured parameter tree and differential index, and establish a tool capability registration and invocation constraint mechanism. Specifically, this includes: creating structured capability declaration objects for the analysis tools within the multi-layered analysis tool; the capability declaration objects include the tool name, functional semantic description, input field structure, output JSON schema, applicable analysis type, and dependent data source; registering the capability declaration objects as the capability boundary space that the large language model can invoke, and enabling the analysis tools to output structured JSON results based on the data structure formed in steps S1–S4; and constraining the tool invocation scope of the large language model based on the capability declaration objects.

[0053] Step S6: Construct a constrained reasoning control mechanism based on structural pattern constraints and tool capability constraints to drive the large language model to perform recipe difference analysis in a closed-loop manner of reasoning-action-observation. Specifically, this includes: injecting a tool list and corresponding capability declaration objects into the system prompt words, so that the large language model aligns the tool input field structure when generating tool call requests and ensures that the action input is legal and the tool output can be structured and parsed; limiting the boundary of the large language model's reasoning behavior, so that the analysis is based on the structured JSON results returned by the analysis tool and that no fictitious or unverified data is allowed, and calling path parsing tools to determine the normalized path or node identity before performing parameter analysis to generate interpretable analysis results.

[0054] Furthermore, the path normalization process includes: not expanding array indices when traversing JSON data, and uniformly abstracting array nodes into array placeholder markers path[] in the normalized path string.

[0055] Furthermore, the node identity identifier is generated from a hash input containing the normalized path string and parameter type information.

[0056] Furthermore, the fast consistency determination includes: when the structural signature hash values ​​of adjacent process steps are the same, it is determined that the parameter sets corresponding to the adjacent process steps are consistent.

[0057] Furthermore, the value change information recorded in the sparse incremental sequence includes the identifier of the changed process step, the node identity identifier, and the changed parameter value.

[0058] Furthermore, the process step points in the statistical summary information that have undergone value changes are the set of process step points in the sparse incremental sequence that have value records; the different number of values ​​is calculated based on the set of values ​​in the sparse incremental sequence.

[0059] Furthermore, the multilayer analysis tool includes:

[0060] The first type of analytical tool is used to generate value sequences, value change summaries, or trend summaries for a single parameter.

[0061] The second type of analytical tool is used to generate differential results or statistical information on hot spots of change between process steps;

[0062] The third type of analysis tool is used to generate global summary information or differential pattern recognition results based on the differential results of multiple process steps.

[0063] Furthermore, the structured capability declaration object also includes a data source identifier for the output structured JSON result, which is used to indicate that the output field corresponds to the normalized path, node identity identifier, sparse incremental sequence, and statistical summary information in steps S1–S4.

[0064] Furthermore, the restricted inference control mechanism includes: when the tool call request generated by the large language model does not satisfy the input field structure, refusing to execute the corresponding tool call and returning a structured error message to prompt it to regenerate a valid tool call request.

[0065] A large-scale process formulation incremental differential modeling system based on tool constraints, the system comprising:

[0066] The formula data access module is used to acquire and parse process formula data files, convert the raw structured data into a unified data representation form within the system, and identify and classify the formula data according to the process step dimension and offset dimension to generate an ordered dimension index structure.

[0067] The path normalization and node identity generation module is used to perform path normalization processing on the parameter names in the process formula data, generate a unique normalized path string, and generate a node identity identifier that is independent of the process steps based on the normalized path through a stable hash function.

[0068] The structure signature snapshot construction module is used to construct a leaf node hash set for each process step based on the node identity identifier and the corresponding parameter value, and to generate the structure signature hash value of the corresponding process step by sorting the leaf node hash set by the node identity identifier. The leaf node hash set and the structure signature hash value are stored as a structure signature snapshot of the process step for rapid consistency determination between steps.

[0069] The multi-step parameter incremental difference tree construction module is used to traverse the data of each process step in the order of process steps, maintain the mapping relationship between node identity and the most recent recorded value, and record the node identity and value change information of the corresponding process step when the node value changes, so as to construct a sparse incremental sequence containing only the process step points with value changes for each node, forming a structured incremental difference tree.

[0070] The parameter change statistics and difference analysis module is used to calculate the number of different values ​​based on the sparse incremental sequence of the structured incremental difference tree, determine the process step points where the values ​​change, and calculate the minimum and maximum values ​​of numerical parameters. The statistical calculation is completed based on the pre-constructed sparse incremental sequence without retracing the original structured data.

[0071] The analysis tool system module is used to construct a multi-layered analysis tool system and establish a tool capability registration and invocation constraint mechanism. It creates a structured capability declaration object for the analysis tool. The capability declaration object includes the tool name, functional semantic description, input field structure, output JSON schema, applicable analysis type, and dependent data source. The capability declaration object is registered as a capability boundary space that can be invoked by the large language model, so that the analysis tool outputs structured JSON results based on the data structure formed in steps S1–S4.

[0072] The prompt word and restricted reasoning control module is used to generate preset system prompt words and inject a tool list and corresponding capability declaration objects into the system prompt words. Based on structural pattern constraints and tool capability constraints, it drives the large language model to call the analysis tool in a closed loop of reasoning-action-observation. It limits the analysis to be generated based on the structured JSON results returned by the analysis tool and does not allow the creation of unverified data. Before performing parameter analysis, it calls the path parsing tool to determine the normalized path or node identity.

[0073] The output module is used to output the analysis results;

[0074] The context processing module is used to compress historical information when the context of a large language model is exceeded, and to generate the final analysis results based on the analysis tools.

[0075] The system prompts and user prompts in this invention are used to drive a large language model to complete process formulation analysis under tool constraints. The content of the prompts can be adjusted or replaced according to the formulation structure, tool set, and application scenario, and does not constitute a limitation on the scope of protection of this invention. The prompts preferably include formulation structure pattern information and tool description information, and can be injected by the system at runtime using placeholders, such as injecting current formulation structure pattern information, available tool set information, dialogue history, and user questions.

[0076] The prompt words are preferably used to constrain the large language model to perform analysis in an alternating process of reasoning and action. This process includes at least four stages: thinking, action, observation, and answer. The action stage triggers tool invocation, the observation stage corresponds to the structured results returned by the tool, and the answer stage generates the final conclusion based on the observation stage results. To improve controllability and verifiability, the prompt words preferably specify evidence constraint rules, meaning the final conclusion is generated solely based on data returned by the tool or confirmed observations. When the required information is neither in the obtained tool results nor obtainable from available tools, further inference is stopped, and the user is prompted to supplement the missing data.

[0077] To reduce ambiguity and ensure project reproducibility, prompts preferably specify key-value fidelity rules: when referencing recipe fields, parameter names, key names, or values, retain their original string form and case from the original recipe data, without translating or rewriting key names or values; if explanation is needed, additional descriptions can be added while retaining the original key names. To avoid looping tool calls and invalid queries, prompts preferably specify stopping conditions and anti-loop rules: when all relevant data for the target path has been fully acquired and the remaining tasks are only filtering, sorting, judging, or summarizing, the tool is not called again and the final conclusion is output directly; when it is detected that the same tool and the same input parameters are about to be repeated, the tool call is stopped and the conclusion is output or missing information is indicated.

[0078] To facilitate understanding of the implementation of the above methods and systems, the following examples illustrate the construction of the structure index, the parameter tree difference statistics, the tool invocation, and the inference process.

[0079] Example 1: Structured parsing and indexing of process recipe JSON:

[0080] (1) Input data format.

[0081] In this embodiment, the process recipe is stored in JSON format, and its typical structure may include dimensions such as step and offset, for example:

[0082] {

[0083] "step": {

[0084] "1": [ ...

[0086] {

[0087] "name": "Heater_Position",

[0088] "valueType": "string",

[0089] value: "Process"

[0090] },

[0091] {

[0092] "name": "Gap:A",

[0093] "valueType": "double",

[0094] value: "9.3"

[0095] }, ...

[0097] ],

[0098] "2": [ ...

[0100] {

[0101] "name": "Heater_Position",

[0102] "valueType": "string",

[0103] value: "Process"

[0104] },

[0105] {

[0106] "name": "Gap:A",

[0107] "valueType": "double",

[0108] value: "9.3"

[0109] }, ...

[0111] ],

[0112] },

[0113] "offset": { ...

[0115] }

[0116] }

[0117] Where: step represents the process step dimension; each step contains multiple process parameter nodes; parameter nodes can be numerical values, strings, or nested objects.

[0118] (2) JSON structure indexing method.

[0119] The system performs a depth-first search (DFS) traversal on the process recipe JSON. During the traversal:

[0120] First, a structural index is established for object nodes and array nodes; second, the array or step number is not expanded into a specific index value, but uniformly abstracted as a structural path; finally, the following information is recorded for each structural node:

[0121] Node path, node type (object / array / value), hierarchy depth, and number of child nodes.

[0122] For example, for the path: Root → step → 1 → H2_1B RampRate, the structure index only retains H2_1BRampRate and not the specific step number.

[0123] This significantly reduces the number of tokens required for structural description, preserves the complete semantic structure of process parameters, and provides a stable schema for subsequent large language model analysis.

[0124] Furthermore, to ensure consistency with the structural signature snapshot in step S2, in this embodiment, the system can extract all parameter leaf nodes for any process step, generate a leaf node hash value for each leaf node based on "node identity identifier and parameter value", and form a leaf node hash set; then sort the leaf node hash set according to the node identity identifier, perform aggregate hash calculation on the sorted leaf node hash set to obtain the structural signature hash value of the process step, and store the leaf node hash set and the structural signature hash value as a structural signature snapshot for fast consistency determination between steps.

[0125] Example 2: Modeling the step dimension parameter change based on Tree Delta.

[0126] The system uses the step as the analysis dimension, constructing a parameter tree node for each step and aggregating parameters along the same path to form a step→value sequence. Each parameter node includes at least the parameter path, the step→value sequence, and difference statistics (delta_summary). An example data structure is shown below:

[0127] {

[0128] "path": "H2_1B RampRate",

[0129] "series": [(1, 0.0), (2, 0.0)]

[0130] }

[0131] Tree Delta analysis method:

[0132] Based on this, the system performs difference analysis on the values ​​of the same path parameter across multiple steps, calculating the number of different values ​​(distinct_count), the number of steps where changes occurred (change_points), the minimum value (min), and the maximum value (max). Example analysis results are as follows:

[0133] {

[0134] "path": "HF_power:A",

[0135] "distinct_count": 6,

[0136] "change_points": [1, 12, 14, 15, 16, 17, 18],

[0137] "min": 0.0,

[0138] "max": 1400.0

[0139] }

[0140] Through the above method, the system transforms complex JSON differences into parameter-level semantic change features, enabling engineers to directly locate which parameters change, where, and how frequently. This provides structured evidence fields for subsequent tool-level hotspot identification, trend summarization, and large language model interpretation. Preferably, to be consistent with steps S3-S4, the system can record only the steps where the value of the same node identifier changes, forming a sparse incremental value sequence. The statistical summary information such as distinct_count, change_points, min, and max can be directly calculated based on this sparse incremental value sequence, thus avoiding re-traversing the original JSON data.

[0141] Example 3: Step-to-Step Difference Analysis.

[0142] The system supports differential analysis on any two steps, preferably by first aligning the parameter sets of the two steps by path, then comparing the corresponding parameter values ​​and generating difference records. In some embodiments, each difference record can also be appended with a stability identifier (e.g., node_id) for auditing and playback. The output can be structured JSON; an example of the output is shown below.

[0143] {

[0144] "path": "Step_Name",

[0145] "from": "Pump 1",

[0146] "to": "Pump 2"

[0147] }

[0148] This tool is used to accurately locate process changes between steps and can be directly used as evidence input for subsequent large language model interpretation, thereby reducing ambiguity caused by relying solely on natural language descriptions.

[0149] Example 4: Parameter trend analysis and time series abstraction.

[0150] Based on the `series`, `distinct_count`, and `change_points` fields output in Example 2, the system summarizes the changing trends of parameters along the `step` dimension and can abstract change patterns such as stable, gradual, or abrupt changes. The output can be structured JSON; an example output is provided below.

[0151] {

[0152] "path": "H2_1B RampRate",

[0153] "series": [(1, 0.0)],

[0154] "summary": {

[0155] "distinct_count": 1,

[0156] "change_points": [1],

[0157] "min": 0.0,

[0158] "max": 0.0

[0159] }

[0160] }

[0161] By compressing long sequences into change patterns and key indicators, the system reduces the cost of manual step-by-step inspection and provides more stable and interpretable evidence for large language models. The system can also provide coupling hints based on the co-occurrence relationships of change points across multiple parameters to help locate potential linked changes.

[0162] Example 5: Parameter path parsing and fuzzy positioning.

[0163] To lower the barrier for users to understand the underlying JSON structure, when users only provide parameter names or keywords instead of complete paths, the system performs string matching or fuzzy matching on the path set based on the structure index built in Implementation Example 1, returning a set of candidate paths that can be sorted by relevance. Subsequent tools then use the accurate paths for further analysis, or the large language model selects the most reasonable candidate under tool constraints. This implementation significantly reduces engineering problems caused by incorrect paths leading to analysis failures and avoids the large language model generating non-existent paths out of thin air.

[0164] Example 6: Large Language Model Analysis Process Based on Tool Calls (Inference-Action-Observation Closed Loop).

[0165] The system encapsulates capabilities such as differencing, trend analysis, and path resolution into multiple callable tools. Each tool preferably returns structured JSON results and retains directly referable evidence fields (e.g., path, distinct_count, change_points, from / to, etc.), enabling the large language model to generate a strongly bound conclusion-evidence output during the Answer phase. When a user asks a question, the large language model operates in ReAct mode under the constraints of system prompts: first, the required evidence type is defined in the Thought phase; then, the tool is invoked in the Action phase to obtain the Observation; finally, a conclusion is generated solely based on the Observation, and evidence fields are cited to support the explanation, reducing illusions and improving reproducibility and auditability. In particular, the large language model's reasoning and conclusion generation use the structured JSON returned by the tool and confirmed observations as evidence sources, avoiding unconstrained inference based directly on the full original recipe text.

[0166] Example 7: Streaming Reasoning and Context Control.

[0167] The system uses a streaming approach to return the analysis process and results, including information on stages such as thought, action, observation, and answer, allowing users to see the analysis progress and intermediate evidence in real time. Figure 3 As shown, the streaming output module preferably outputs thinking, action, observation and answer segment by segment in the order of stages. The action stage triggers tool calls and the observation stage outputs structured JSON evidence returned by the tool, so that the analysis process is traceable and verifiable.

[0168] When a risk of exceeding the context limit in a large language model is detected, the system performs context control and compression processing. For example... Figure 4As shown, the context processing module preferably first checks whether the context length / token consumption exceeds the threshold, and then compresses it with "field-level evidence" as the smallest retention unit: retaining the key structure schema, key path, and differential statistical fields (such as distinct_count, change_points, min, max, etc.) and the evidence returned by the tool. If necessary, the analysis tool is called again or the final analysis result is generated based on the retained evidence, so as to maintain output stability and ensure the consistency of the evidence chain under context-constrained conditions.

[0169] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for tool constraint based large model process recipe delta differential modeling, characterized in that, The method includes: Step S1: Obtain a process recipe file containing multiple process steps and offset dimensions. The process recipe file is in JSON format. Perform path normalization processing on the process recipe file, convert the parameter name of each parameter item in the JSON data into a unique normalized path string, and generate a node identity identifier that is independent of the process steps based on the normalized path using a stable hash function, so as to ensure that the same parameter has a consistent identity identifier under different process steps. Step S2: Construct a structural signature snapshot for each process step. For each process step: Construct a leaf node hash set based on the node identity and corresponding parameter values, where each leaf node hash value is generated by the node identity and parameter values; Sort the leaf node hash set according to the node identity to generate the structural signature hash value for the corresponding process step, where the structural signature hash value is the hash value obtained by performing aggregate hash calculation on the sorted leaf node hash set; Store the leaf node hash set and the structural signature hash value as a structural signature snapshot of the process step for fast consistency determination between subsequent steps. Step S3: Construct a cross-step sparse incremental difference structure based on node identity identifiers. Traverse the data of each process step in the order of process steps and maintain the mapping relationship between node identity identifiers and the most recent record value. When the value of a node in the current process step is inconsistent with the value of the most recent record, only the node identity identifier corresponding to that process step and its value change information are recorded. In this way, a sparse incremental sequence containing only the process step points with value changes is constructed for each node, forming a structured incremental difference tree. Step S4: Generate statistical summary information based on the structured incremental difference tree. For each node in the structured incremental difference tree, calculate the number of different values ​​based on its sparse incremental sequence, determine the process step points where the values ​​change, and calculate the minimum and maximum values ​​for numerical parameters. The statistical summary information is calculated based on the pre-constructed sparse incremental sequence without retracing the original JSON data, thereby reducing the computational complexity of the difference analysis. Step S5: Construct a multi-layered analysis tool based on a structured parameter tree and differential index, and establish a tool capability registration and invocation constraint mechanism. Specifically, this includes: creating structured capability declaration objects for the analysis tools within the multi-layered analysis tool; the capability declaration objects include the tool name, functional semantic description, input field structure, output JSON schema, applicable analysis type, and dependent data source; registering the capability declaration objects as the capability boundary space that the large language model can invoke, and enabling the analysis tools to output structured JSON results based on the data structure formed in steps S1–S4; and constraining the tool invocation scope of the large language model based on the capability declaration objects. Step S6: Construct a constrained reasoning control mechanism based on structural pattern constraints and tool capability constraints to drive the large language model to perform recipe difference analysis in a closed-loop manner of reasoning-action-observation. Specifically, this includes: injecting a tool list and corresponding capability declaration objects into the system prompt words, so that the large language model aligns the tool input field structure when generating tool call requests and ensures that the action input is legal and the tool output can be structured and parsed; limiting the boundary of the large language model's reasoning behavior, so that the analysis is based on the structured JSON results returned by the analysis tool and that no fictitious or unverified data is allowed, and calling path parsing tools to determine the normalized path or node identity before performing parameter analysis to generate interpretable analysis results.

2. The tool constraint based large model process recipe delta differential modeling method of claim 1, wherein, The path normalization process includes: not expanding array indices when traversing JSON data, and uniformly abstracting array nodes into array placeholder markers path[] in the normalized path string.

3. The incremental differential modeling method for large-scale process formulations based on tool constraints according to claim 1, characterized in that, The node identity identifier is generated by a hash input containing the normalized path string and parameter type information.

4. The incremental differential modeling method for large-scale process formulations based on tool constraints according to claim 1, characterized in that, The fast consistency determination includes: when the structure signature hash values ​​of adjacent process steps are the same, it is determined that the parameter sets corresponding to the adjacent process steps are consistent.

5. The incremental differential modeling method for large-scale process formulations based on tool constraints according to claim 1, characterized in that, The sparse incremental sequence records the value change information, including the identifier of the changed process step, the node identity identifier, and the changed parameter value.

6. The incremental differential modeling method for large-scale process formulations based on tool constraints according to claim 1, characterized in that, The process step points in the statistical summary information that have undergone value changes are the set of process step points in the sparse incremental sequence that have value records; the different number of values ​​are calculated based on the set of values ​​in the sparse incremental sequence.

7. The incremental differential modeling method for large-scale process formulations based on tool constraints according to claim 1, characterized in that, The multi-layer analysis tool includes: The first type of analytical tool is used to generate value sequences, value change summaries, or trend summaries for a single parameter. The second type of analytical tool is used to generate differential results or statistical information on hot spots of change between process steps; The third type of analysis tool is used to generate global summary information or differential pattern recognition results based on the differential results of multiple process steps.

8. The incremental differential modeling method for large-scale process formulations based on tool constraints according to claim 1, characterized in that, The structured capability declaration object also includes a data source identifier for the output structured JSON result. The data source identifier is used to indicate that the output field corresponds to the normalized path, node identity identifier, sparse incremental sequence, and statistical summary information in steps S1–S4.

9. The incremental differential modeling method for large-scale process formulations based on tool constraints according to claim 1, characterized in that, The restricted inference control mechanism includes: when the tool call request generated by the large language model does not meet the input field structure, refusing to execute the corresponding tool call and returning a structured error message to prompt it to regenerate a valid tool call request.

10. A large-scale process formulation incremental differential modeling system based on tool constraints, characterized in that, The system includes: The formula data access module is used to acquire and parse process formula data files, convert the raw structured data into a unified data representation form within the system, and identify and classify the formula data according to the process step dimension and offset dimension to generate an ordered dimension index structure. The path normalization and node identity generation module is used to perform path normalization processing on the parameter names in the process formula data, generate a unique normalized path string, and generate a node identity identifier that is independent of the process steps based on the normalized path through a stable hash function. The structure signature snapshot construction module is used to construct a leaf node hash set for each process step based on the node identity identifier and the corresponding parameter value, and to generate the structure signature hash value of the corresponding process step by sorting the leaf node hash set by the node identity identifier. The leaf node hash set and the structure signature hash value are stored as a structure signature snapshot of the process step for rapid consistency determination between steps. The multi-step parameter incremental difference tree construction module is used to traverse the data of each process step in the order of process steps, maintain the mapping relationship between node identity and the most recent recorded value, and record the node identity and value change information of the corresponding process step when the node value changes, so as to construct a sparse incremental sequence containing only the process step points with value changes for each node, forming a structured incremental difference tree. The parameter change statistics and difference analysis module is used to calculate the number of different values ​​based on the sparse incremental sequence of the structured incremental difference tree, determine the process step points where the values ​​change, and calculate the minimum and maximum values ​​of numerical parameters. The statistical calculation is completed based on the pre-constructed sparse incremental sequence without retracing the original structured data. The analysis tool system module is used to construct a multi-layered analysis tool system and establish a tool capability registration and invocation constraint mechanism. It creates a structured capability declaration object for the analysis tool. The capability declaration object includes the tool name, functional semantic description, input field structure, output JSON schema, applicable analysis type, and dependent data source. The capability declaration object is registered as a capability boundary space that can be invoked by the large language model, so that the analysis tool outputs structured JSON results based on the data structure formed in steps S1–S4. The prompt word and restricted reasoning control module is used to generate preset system prompt words and inject a tool list and corresponding capability declaration objects into the system prompt words. Based on structural pattern constraints and tool capability constraints, it drives the large language model to call the analysis tool in a closed loop of reasoning-action-observation. It limits the analysis to be generated based on the structured JSON results returned by the analysis tool and does not allow the creation of unverified data. Before performing parameter analysis, it calls the path parsing tool to determine the normalized path or node identity. The output module is used to output the analysis results; The context processing module is used to compress historical information when the context of a large language model is exceeded, and to generate the final analysis results based on the analysis tools.