Workflow batch execution and lifecycle facts repository construction method and system

A dual-plane architecture, consisting of a table-based control plane and a lifecycle fact base plane, addresses four problems in enterprise business scenarios: a high interaction threshold, missing batch processing and status tracking, the difficulty of building complex workflows with low code, and the lack of context sharing and reuse. By structuring and retrieving the full lifecycle data of a workflow, the invention supports data sharing and reuse across steps and batches, improving work efficiency and user experience.

CN122240616A · Pending · Publication Date: 2026-06-19 · SHANGHAI SHENGCHUN SUHUIYUN TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHANGHAI SHENGCHUN SUHUIYUN TECHNOLOGY CO LTD
Filing Date
2026-03-17
Publication Date
2026-06-19

Abstract

This invention discloses a method and system for batch execution of workflows and construction of a lifecycle fact base. Through a dual-plane architecture of a spreadsheet control plane and a lifecycle fact base plane, it transforms a spreadsheet into an AI batch processing console. The method includes: receiving a target workflow and extracting input parameter pattern definitions; dynamically generating a table structure containing input columns, result columns, and status columns; monitoring changes in table rows to trigger task creation; retrieving historical lifecycle data based on association identifiers and assembling context before execution; submitting tasks carrying context information to the execution engine for parallel execution; obtaining execution status and results through real-time communication and backfilling them into the table; persisting table data and process data to a data storage system and establishing associations to support context reuse for subsequent tasks. This invention lowers the barrier to AI use and achieves structured accumulation and reuse of batch processing, status tracking, and lifecycle data.

Description

Technical Field

[0001] This invention belongs to the field of computer data processing technology, specifically relating to a method and system for batch execution of workflows and construction of a lifecycle fact base.

Background Technology

[0002] With the popularization of artificial intelligence technology, enterprises and business personnel have an increasing demand for using AI capabilities in office scenarios. However, existing AI tools have significant interaction barriers when facing business personnel. Existing technologies mainly fall into the following categories: The first category is conversational AI tools, such as ChatGPT and Claude chat interfaces. These tools use a single-turn dialogue mode, making it difficult to process structured data in batches and unable to handle the batch analysis needs of hundreds or thousands of data points; the second category is Excel AI plugins, such as Microsoft Copilot and WPS AI. These tools have simple functions and can only call a single model to complete simple tasks, unable to run complex multi-step workflows; the third category is RPA (Robotic Process Automation) tools, such as UiPath and Power Automate. These tools have a high learning curve, requiring the configuration of complex flowcharts and cannot be directly connected to AI agents; the fourth category is traditional ETL (Extract, Transform, Load) tools, such as Apache Airflow. These tools run purely in the background without a front-end interactive interface, making them inaccessible to business personnel.

[0003] Existing technologies suffer from the following technical shortcomings. First, there is a disconnect between business users and AI tools. Business personnel are accustomed to processing data in Excel or other spreadsheet software but lack programming skills and cannot configure complex workflows. Data analysts are familiar with Jupyter Notebook, but it requires programming fundamentals and has a steep learning curve. Operations personnel are accustomed to batch processing spreadsheets, but existing AI tools are chat-based and cannot perform batch operations. Second, batch processing and status tracking are missing. In a scenario that requires analyzing public opinion about 1000 companies, existing solutions fall short: ChatGPT can only handle one row per query rather than a batch; API calls require writing loop code; it is unclear which tasks are running or completed; and results are scattered and difficult to aggregate. Third, complex workflows are difficult to build with low code. Business personnel are unfamiliar with JSON / YAML syntax, and existing low-code platforms are overly complex, costly to learn, and lack WYSIWYG configuration interfaces. Fourth, context sharing and reuse are not supported. Existing solutions lack a structured mechanism for storing and sharing workflow execution data: context is scattered across table cells, logs, or temporary caches, making it difficult to retrieve and reuse data across multiple steps and batches. Fifth, structured data persistence and searchable associations are missing. Without structured associations between table rows, instances, steps, and artifacts, data cannot be retrieved, audited, rerun, or reused.

[0004] Therefore, providing a method and system for batch execution of workflows and construction of a lifecycle fact base to solve the above problems is a technical problem that urgently needs to be solved by those skilled in the art.

Summary of the Invention

[0005] The technical problem to be solved by this invention is to provide a method and system for batch execution of workflows and construction of lifecycle fact bases using spreadsheets as the entry point, so as to overcome the technical defects in the prior art, such as the disconnect between business users and AI tools, lack of batch processing and status tracking, difficulty in low-code complex workflows, lack of context sharing management and reuse, and lack of structured data persistence and searchable association.

[0006] To address the aforementioned technical challenges, this invention employs a dual-plane architecture of "table control plane + lifecycle fact base plane": on the table side, rows are used as task units to implement batch declaration, triggering, state machine, and backfilling; on the data side, the entire lifecycle data of workflow instances is structurally deposited into the data storage system and searchable associations are established, supporting contextual retrieval assembly, thereby enabling data reuse across steps and batches.

[0007] The lifecycle fact base is a logical functional module responsible for the structured accumulation, association establishment, and contextual retrieval of workflow lifecycle data. The data storage system is the implementation carrier of the lifecycle fact base and can be a relational database, a non-relational database, or a hybrid storage system, preferably a relational database. It persistently stores tabular data and the lifecycle data generated during workflow execution. The tabular data includes at least column definitions, row records, cell values, and status column values, together with their version change records. The data storage system establishes searchable associations and indexes between table row identifiers, workflow instance identifiers, step identifiers, and product identifiers, so that context can be retrieved and assembled through conditional queries, join queries, and aggregation operations based on the data model supported by the data storage system. The query objects include at least table row records, cell values, workflow instance records, step execution records, and step product records. In this invention, unless otherwise stated, "lifecycle fact base" refers to the logical module implemented by the data storage system, and the two terms can be used interchangeably in the functional description.

[0008] Furthermore, this invention persistently stores the table structure and data of a spreadsheet in a structured manner within the data storage system through the lifecycle fact base, forming a queryable relational data representation; and establishes traceable associations and indexes between table rows and workflow instances, step executions, and step products. Thus, before subsequent workflows are executed, the required context fragments can be directly retrieved, joined, and aggregated based on relational data, and assembled with lifecycle products to form a context package, thereby achieving data sharing and reuse across steps and batches, improving the efficiency of context retrieval and analysis, and reducing costs.

[0009] Specifically, the lifecycle fact base stores table structures, row records, cell values, and the relationships between rows, instances, steps, and products in the data storage system, which serves as the implementation carrier, and establishes an index. Subsequent context assembly can be achieved directly through conditional queries, join queries, and aggregation operations based on the data model supported by the data storage system. This allows for rapid retrieval of the required context fragments at a controllable cost. The query objects include at least table row records, cell values, workflow instance records, step execution records, and step product records. Compared to existing solutions where context is scattered in log files, temporary cell caches, or unstructured object storage, making efficient retrieval difficult, this invention significantly improves context retrieval speed and greatly reduces retrieval costs. It also supports audit traceability and historical task reruns, laying a structured foundation for subsequent cross-step and cross-batch data sharing and management.

[0010] The system enables business personnel to use AI capabilities in batches through a familiar spreadsheet interface without writing code; at the same time, it persists the full lifecycle data of each task in a structured manner to the data storage system and provides a context retrieval and reuse mechanism, so that subsequent steps and tasks can be retrieved, trimmed and injected with historical context.

[0011] According to one aspect of the present invention, a method for batch execution of workflows and construction of a lifecycle fact base is provided, comprising the following steps: S1: Receive the target workflow and extract the input parameter pattern definition. The input parameter pattern definition is extracted from the received target workflow and includes at least one or more of the following for each parameter: name, type, description, whether it is required, and default value.

[0012] S2: Dynamically generate the table column structure. The table column structure is dynamically generated based on the input parameter pattern definition and includes input columns, result columns, and status columns. Specifically, the input columns are generated according to a mapping between input parameter types and table column types: the text type maps to text input columns, the numeric type maps to numeric input columns, the enumeration type maps to dropdown selection columns, and the array type maps to multi-row input columns.
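The type-to-column mapping in S2 can be illustrated with a minimal Python sketch. The names TYPE_TO_COLUMN, Column, and build_columns, the string labels, and the idea of passing output names to create result columns are all assumptions made for illustration; the filing does not prescribe a data structure.

```python
from dataclasses import dataclass

# Parameter type -> column type, following the mapping listed in S2.
# The string labels themselves are hypothetical.
TYPE_TO_COLUMN = {
    "text": "text_input",
    "number": "numeric_input",
    "enum": "dropdown",
    "array": "multi_row_input",
}


@dataclass
class Column:
    name: str
    kind: str  # one of the TYPE_TO_COLUMN values, "result", or "status"
    role: str  # "input", "result", or "status"


def build_columns(input_params, output_names):
    """Build input, result, and status columns from a parameter definition."""
    columns = [
        Column(p["name"], TYPE_TO_COLUMN[p["type"]], "input")
        for p in input_params
    ]
    columns += [Column(name, "result", "result") for name in output_names]
    columns.append(Column("status", "status", "status"))
    return columns


params = [
    {"name": "company", "type": "text"},
    {"name": "region", "type": "enum"},
]
columns = build_columns(params, ["analysis"])
```

A lookup table keeps the mapping declarative, so supporting a further parameter type would only require one more entry.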

[0013] S3: Monitor changes in table row data and trigger the creation of a background task. The system monitors changes in table row data and triggers the creation of a background task when it detects new or modified data in an input column or a status column marked as pending execution. The monitoring includes detecting new or modified data in an input column, detecting a change of the status column from any other value to pending execution, or detecting batch execution operations triggered by the user. Before triggering task creation, the system generates a unique row identifier for each row of data. The row identifier can be generated from the row number, a hash of the input parameters, or a combination key, and uniquely identifies a row's task and all of its associated lifecycle data. A deduplication check based on the row identifier and input parameters then determines whether the task for that row is already executing or completed, which avoids duplicate triggering.
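One possible reading of the row identifier and the deduplication check is sketched below in Python. The combination-key layout, the 16-character digest, and the names row_identifier and TriggerGuard are hypothetical choices, not details taken from the filing.

```python
import hashlib
import json


def row_identifier(table_id, row_number, inputs):
    """Combination key: table id, row number, and a hash of the input parameters."""
    canonical = json.dumps(inputs, sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{table_id}:{row_number}:{digest}"


class TriggerGuard:
    """Remembers row identifiers whose task is executing or completed."""

    def __init__(self):
        self._status = {}  # row identifier -> last known status

    def should_trigger(self, row_id):
        return self._status.get(row_id) not in ("executing", "completed")

    def mark(self, row_id, status):
        self._status[row_id] = status


guard = TriggerGuard()
rid = row_identifier("t1", 3, {"company": "Acme"})
first = guard.should_trigger(rid)    # nothing recorded yet
guard.mark(rid, "executing")
second = guard.should_trigger(rid)   # same row and inputs, already running
changed = row_identifier("t1", 3, {"company": "Acme Ltd"})
```

Because the input hash is part of the key in this sketch, editing a cell yields a new identifier and therefore a new task, while an unchanged row is not triggered again.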

[0014] S4: Context assembly. Before the task is executed, historical lifecycle data is retrieved from the data storage system based on the current row and its associated identifiers, and then processed to form context information. Specifically, at least one of the row identifier, tenant identifier, project identifier, or workflow version identifier is used as the associated identifier. Using the searchable associations and indexes established in the data storage system between table row identifiers, workflow instance identifiers, step identifiers, and product identifiers, the historical lifecycle data fragments associated with the current row are retrieved. The retrieval methods include structured conditional retrieval and / or semantic retrieval based on vector similarity. Structured conditional retrieval includes conditional queries, join queries, and aggregation queries based on the data model supported by the data storage system; the query objects include at least table row records, cell values, workflow instance records, step execution records, and step product records; and the data storage system supports this retrieval through its built-in query engine, indexing mechanism, or an external query layer. The required context fragments are assembled from the data storage system through conditional queries, join queries, and aggregation operations. The retrieved historical lifecycle data fragments are then trimmed, aggregated, and normalized to form a context package with a preset structure. Trimming is based on at least one of a time window, a relevance threshold, or a context budget constraint. The context package includes at least a set of structured fields, a lifecycle data summary, reference pointers, and constraint information.

[0015] The context package is merged with the input parameters of the current row to form enhanced input data. The structured field set in the context package can be mapped directly to workflow input parameters, and the lifecycle data summary serves as additional reference information for each step of the workflow.
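The trimming and merging described in S4 might look like the following Python sketch. It assumes that fragments have already been retrieved, that the context budget is measured in characters, and that values entered in the row override context fields; the fragment layout and the names assemble_context and merge_inputs are hypothetical.

```python
from datetime import datetime, timedelta


def assemble_context(fragments, now, window, budget_chars):
    """Trim fragments by time window and a character budget, newest first."""
    recent = [f for f in fragments if now - f["created_at"] <= window]
    recent.sort(key=lambda f: f["created_at"], reverse=True)
    kept, used = [], 0
    for frag in recent:
        if used + len(frag["text"]) > budget_chars:
            break
        kept.append(frag)
        used += len(frag["text"])
    fields = {}
    for frag in reversed(kept):  # oldest first, so newer values win
        fields.update(frag.get("fields", {}))
    return {
        "fields": fields,                                # structured field set
        "summary": " ".join(f["text"] for f in kept),    # lifecycle data summary
        "references": [f["step_id"] for f in kept],      # reference pointers
        "constraints": {"budget_chars": budget_chars},   # constraint information
    }


def merge_inputs(row_inputs, package):
    """Assumed precedence: values typed into the row override context fields."""
    return {**package["fields"], **row_inputs, "_context_summary": package["summary"]}


now = datetime(2026, 1, 10)
fragments = [
    {"step_id": "s1", "created_at": datetime(2026, 1, 9),
     "text": "risk: low", "fields": {"risk": "low"}},
    {"step_id": "s0", "created_at": datetime(2025, 11, 1),
     "text": "stale", "fields": {"risk": "high"}},
]
package = assemble_context(fragments, now, timedelta(days=30), budget_chars=100)
enhanced = merge_inputs({"company": "Acme"}, package)
```

Here the fragment from November falls outside the 30-day window, so only the recent fragment contributes fields and a reference pointer.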

[0016] S5: Submit tasks for parallel execution and obtain execution status and results. Tasks carrying context information are submitted to the execution engine for parallel execution, and their execution status and results are obtained. Specifically, groups of rows that can be executed in parallel are identified, tasks are submitted in batches, and the number of concurrently executing tasks is controlled. The execution engine is a workflow execution engine that supports complex workflows with multiple steps and models. Execution status and results are obtained through a real-time communication mechanism, which includes at least one of long-connection communication, polling, or server-sent events, and which supports reconnection after disconnection. When communication is interrupted, the system automatically reconnects, caches the status updates and result data generated during the interruption, and pushes all cached data to the client once the connection is restored, ensuring data integrity and consistency.
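Concurrency control for batch submission can be sketched with a semaphore, as in the Python example below. This is only one way to bound the number of in-flight tasks; fake_execute stands in for a call to the workflow execution engine, and every name here is an assumption rather than part of the filing.

```python
import asyncio


async def run_batch(rows, execute, max_concurrency):
    """Run one task per row, with at most max_concurrency tasks in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(row):
        async with semaphore:
            try:
                result = await execute(row)
                return {"row_id": row["row_id"], "status": "completed", "result": result}
            except Exception as exc:
                # A failing row is reported as failed instead of aborting the batch.
                return {"row_id": row["row_id"], "status": "failed", "result": str(exc)}

    return await asyncio.gather(*(run_one(row) for row in rows))


async def fake_execute(row):
    # Hypothetical stand-in for a call to the workflow execution engine.
    await asyncio.sleep(0)
    if row["company"] == "":
        raise ValueError("company is required")
    return f"analysed {row['company']}"


rows = [{"row_id": "r1", "company": "Acme"}, {"row_id": "r2", "company": ""}]
outcomes = asyncio.run(run_batch(rows, fake_execute, max_concurrency=2))
```

asyncio.gather returns results in submission order, which makes it straightforward to match each outcome back to its table row.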

[0017] S6: Backfill execution status and results to the table. The execution status is written to the status column of the corresponding table row, and the execution result is written to the result column of the same row. The update includes at least one of batch backfilling of intermediate results, error backfilling, or incremental update.
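As a rough illustration of S6, the Python sketch below applies status and result updates to an in-memory table. The update message layout (row_id, status, result, partial, error) is an assumption; the filing does not define a message format.

```python
def backfill(table, update):
    """Apply one status/result update to the matching row of an in-memory table."""
    row = table[update["row_id"]]
    row["status"] = update["status"]
    if update.get("error") is not None:
        # Error backfilling: surface the failure in the result column.
        row["result"] = f"ERROR: {update['error']}"
    elif update.get("partial"):
        # Incremental update: append to what is already in the cell.
        row["result"] = row.get("result", "") + update["result"]
    elif "result" in update:
        row["result"] = update["result"]
    return row


table = {
    "r1": {"status": "pending", "result": ""},
    "r2": {"status": "pending", "result": ""},
}
backfill(table, {"row_id": "r1", "status": "executing",
                 "result": "step 1 done; ", "partial": True})
backfill(table, {"row_id": "r1", "status": "completed",
                 "result": "step 2 done", "partial": True})
backfill(table, {"row_id": "r2", "status": "failed", "error": "timeout"})
```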

[0018] S7: Persist lifecycle data and establish relationships. The table column structure, table row data, and process data generated during task execution are persisted to the data storage system. The table row data includes at least row records, cell values, and status column values, as well as their version change records. Simultaneously, a searchable association is established between table row identifiers, workflow instance identifiers, step identifiers, and product identifiers. Specifically, this includes mappings between table row identifiers and workflow instance identifiers, between workflow instance identifiers and step execution records, and between step identifiers and product identifiers. An index is created for these mappings to support conditional queries, join queries, and aggregation operations based on the data model supported by the data storage system, performed by table rows, workflow instances, or execution steps. The query objects include at least table row records, cell values, workflow instance records, step execution records, and step product records, providing a structured retrieval foundation for subsequent task context reuse. The process data includes at least one or more of the following: status transition records, step-level intermediate products, or key decision information.
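The associations and indexes in S7 can be sketched with an in-memory SQLite database from the Python standard library. The table and column names are hypothetical, and the artifacts table plays the role of the step products described above; a real deployment could use any relational or non-relational store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_rows (row_id TEXT PRIMARY KEY, status TEXT, inputs TEXT);
CREATE TABLE instances (instance_id TEXT PRIMARY KEY,
                        row_id TEXT REFERENCES table_rows(row_id));
CREATE TABLE steps (step_id TEXT PRIMARY KEY,
                    instance_id TEXT REFERENCES instances(instance_id),
                    name TEXT);
CREATE TABLE artifacts (artifact_id TEXT PRIMARY KEY,
                        step_id TEXT REFERENCES steps(step_id),
                        content TEXT);
CREATE INDEX idx_instances_row ON instances(row_id);
CREATE INDEX idx_steps_instance ON steps(instance_id);
CREATE INDEX idx_artifacts_step ON artifacts(step_id);
""")

conn.execute("INSERT INTO table_rows VALUES (?, ?, ?)",
             ("r1", "completed", '{"company": "Acme"}'))
conn.execute("INSERT INTO instances VALUES (?, ?)", ("i1", "r1"))
conn.executemany("INSERT INTO steps VALUES (?, ?, ?)",
                 [("s1", "i1", "fetch"), ("s2", "i1", "summarise")])
conn.executemany("INSERT INTO artifacts VALUES (?, ?, ?)",
                 [("a1", "s1", "raw news"), ("a2", "s2", "summary text")])

# Join query: every step product reachable from one table row.
artifacts_for_row = conn.execute("""
    SELECT s.name, a.content
    FROM instances AS i
    JOIN steps AS s ON s.instance_id = i.instance_id
    JOIN artifacts AS a ON a.step_id = s.step_id
    WHERE i.row_id = ?
    ORDER BY s.step_id
""", ("r1",)).fetchall()

# Aggregation query: number of executed steps for the same row.
step_count = conn.execute("""
    SELECT COUNT(*)
    FROM steps AS s
    JOIN instances AS i ON i.instance_id = s.instance_id
    WHERE i.row_id = ?
""", ("r1",)).fetchone()[0]
```

Starting from a row identifier, a single join walks the row to instance to step to product chain, which is the kind of lookup the context assembly step relies on.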

[0019] Furthermore, the method also includes a feedback signal acquisition step: acquiring feedback signals, the feedback signals including at least one of manual confirmation, manual correction, manual scoring, automatic verification results, or downstream system acknowledgments; associating the feedback signals with corresponding row identifiers and workflow instance identifiers and storing them in the data storage system for quality tracking, auditing, or manual verification.

[0020] Another aspect of the present invention provides a workflow batch execution and lifecycle fact base construction system, comprising: a pattern mapping module, used to receive the target workflow, extract the input parameter pattern definition from it, and dynamically generate a table column structure based on that definition, the table column structure including an input column, a result column, and a status column; a row-level triggering module, used to monitor changes in table row data and to trigger the creation of a background task when it detects new or modified data in the input column or a status column marked as pending execution; a context assembly module, used to retrieve historical lifecycle data from the data storage system based on the current row and its associated identifier before the task is executed, and to process that data into context information; a workflow execution engine, used to receive tasks carrying context information and execute them in parallel to obtain their execution status and results; an asynchronous backfill module, used to update the execution status to the status column of the corresponding table row and the execution result to the result column of the corresponding table row; and a lifecycle data persistence module, used to persist the table column structure, the table row data, and the process data generated during task execution to the data storage system, and to establish associations among the persisted data to support context reuse for subsequent tasks.

[0021] Compared with the prior art, the present invention has the following beneficial effects: First, it lowers the barrier to entry. This invention uses a familiar spreadsheet interface, allowing business personnel to utilize AI capabilities without needing to learn coding. Business personnel can start using it directly without training.

[0022] Second, it improves batch processing efficiency. This invention supports large-scale parallel data processing, capable of handling tens of thousands of data entries, significantly improving work efficiency.

[0023] Third, it achieves transparent and controllable status. This invention allows users to view task execution progress and results in real time, providing a clear overview of the status and a good user experience.

[0024] Fourth, it simplifies development and maintenance. The workflow developers of this invention only need to define the input parameter schema once, which can then be used by business users without requiring repeated development of the front-end interface.

[0025] Fifth, it achieves WYSIWYG (What You See Is What You Get). This invention uses tables to intuitively display the input-output mapping relationship, making the configuration process intuitive and easy to understand.

[0026] Sixth, it achieves data traceability and reuse. This invention persists lifecycle data to the data storage system in a structured manner, establishing searchable relationships and indexes between table row identifiers, workflow instance identifiers, step identifiers, and product identifiers. It supports conditional queries, join queries, and aggregation operations based on the data model supported by the data storage system to quickly retrieve and assemble table row records, cell values, workflow instance records, step execution records, and step product records, enabling audit traceability, historical task reruns, and context reuse, and providing a data foundation for subsequent analysis and optimization.

Description of the Drawings

[0027] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0028] Figure 1 is a schematic diagram of the overall architecture of the workflow batch execution and lifecycle fact base construction system provided in an embodiment of the invention; Figure 2 is a flowchart of the workflow batch execution and lifecycle fact base construction method provided in an embodiment of the invention; Figure 3 is a schematic diagram of the state machine transition framework provided in an embodiment of the present invention; Figure 4 is a timing diagram for batch processing in an embodiment of the present invention.

Detailed Implementation

[0029] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0030] The embodiments of this invention are written in a progressive manner.

[0031] Example 1: System Architecture. As shown in Figure 1, this invention provides a workflow batch execution and lifecycle fact base construction system. Its core lies in transforming spreadsheets into an AI batch processing console, using spreadsheet rows to manage the task lifecycle. The system adopts a dual-plane plus execution layer architecture: the spreadsheet control plane, responsible for column generation, row-level triggering, backfilling, and feedback; the lifecycle fact base plane, responsible for structured data accumulation, association establishment, and contextual retrieval; and the execution layer, composed of a workflow execution engine, model services, message queues, etc., responsible for the actual execution of workflow instances.

[0032] Specifically, the system includes the following functional modules: The pattern mapping module, belonging to the table control plane, receives the target workflow and extracts the input parameter pattern definition from it. It then dynamically generates a table column structure based on the input parameter pattern definition, including input, result, and status columns. This module transforms the workflow developer's definition into a table interface that business users can directly manipulate, achieving a "what you see is what you get" configuration experience.

[0033] The row-level triggering module, belonging to the table control plane, is used to monitor changes in table row data. When it detects newly added or modified data in an input column or when a status column is marked as pending, it triggers the creation of a background task. This module supports deduplication detection, batch aggregation, and concurrency control to ensure the accuracy and efficiency of task triggering.

[0034] The context assembly module, belonging to the lifecycle fact base plane, is used to retrieve historical lifecycle data from the data storage system based on the current row and its associated identifiers before task execution, and then process it to form context information. This module extracts reusable context fragments from historical data through structured conditional retrieval and / or semantic similarity retrieval, enabling data sharing across steps and batches. The structured conditional retrieval includes conditional queries, join queries, and aggregation queries based on the data model supported by the data storage system. The query objects include at least table row records, cell values, workflow instance records, step execution records, and step product records.

[0035] The workflow execution engine, belonging to the execution layer, receives tasks with additional context information and executes them in parallel to obtain the task's execution status and results. This engine adopts a microservice architecture, supports horizontal scaling, and uses message queues to achieve asynchronous task scheduling, enabling it to handle complex workflows with multiple steps and models.

[0036] The asynchronous backfill module, belonging to the table control plane, is used to update the execution status to the status column of the corresponding table row and the execution result to the result column of the corresponding table row through a real-time communication mechanism. This module supports batch backfilling of intermediate results, error backfilling, and incremental updates, providing real-time execution progress feedback.

[0037] The lifecycle data persistence module, belonging to the lifecycle fact base plane, is used to persist the table column structure, table row data, and process data generated during task execution to the data storage system. The table row data includes at least row records, cell values, and status column values, together with their version change records. Simultaneously, it establishes searchable associations between table row identifiers, workflow instance identifiers, step identifiers, and product identifiers. Specifically, it maps table row identifiers to workflow instance identifiers, workflow instance identifiers to step execution records, and step identifiers to product identifiers. Indexes are created for these mappings to support conditional queries, join queries, and aggregation operations based on the data model supported by the data storage system, performed by table rows, workflow instances, or execution steps. Query objects include at least table row records, cell values, workflow instance records, step execution records, and step product records to support context reuse for subsequent tasks.

[0038] The data storage system, serving as the implementation carrier of the lifecycle fact base, is used to persistently store table structure data, table row data, lifecycle process data, and various relationships. The table row data includes at least row records, cell values, and status column values, as well as their version change records. The data storage system establishes searchable relationships and indexes between table row identifiers, workflow instance identifiers, step identifiers, and product identifiers. It supports contextual retrieval assembly for conditional queries, join queries, and aggregation operations based on the data model supported by the data storage system. Query objects include at least table row records, cell values, workflow instance records, step execution records, and step product records.

[0039] The six functional modules mentioned above work together to form a complete three-layer architecture: "table control plane - execution layer - lifecycle fact base plane". Users operate through the table interface, and the data flows into the execution layer through row-level triggers. During the execution process, the lifecycle data generated is persisted to the fact base by the persistence module. The context assembly module retrieves historical data from the fact base to support subsequent tasks. The execution results are displayed in the table in real time through the asynchronous backfill module, forming a closed-loop data flow system.

[0040] All modules communicate through well-defined interfaces, ensuring the system's scalability and stability. Row identifiers serve as the core association key, running through table rows, workflow instances, step execution, and product records, enabling traceability throughout the entire process and laying a structured foundation for subsequent auditing, reruns, and context reuse.

[0041] In this embodiment, at the system level, the table front-end layer is implemented using a web front-end framework, supporting dynamic rendering and real-time updates of table components. The back-end engine layer is implemented using a server-side framework, responsible for schema mapping, row-level triggering, and backfilling. The workflow execution engine adopts a microservice architecture, supports horizontal scaling, and uses message queues to achieve asynchronous task scheduling. The long-connection service uses a dedicated server, supporting high-concurrency long connections. The data storage system uses a relational database to store table data and workflow definitions.

[0042] The implementation of the contextual retrieval assembly includes structured conditional retrieval based on the data model supported by the data storage system (retrieval through query, join and aggregation operations) and / or semantic retrieval based on vector similarity; for data storage systems that support vector retrieval, semantic retrieval capabilities can be provided through their built-in vector index extensions, or by a separate vector retrieval component.

[0043] During system operation, data flow strictly adheres to timing and dependency constraints. User workflow selection triggers schema extraction, schema mapping generates a table column structure, user form input triggers row-level actions, triggers create tasks and submit them to the workflow engine, the workflow engine executes the tasks and pushes status updates, the backfill engine receives the results and updates the table, and the persistence module writes lifecycle data to the data storage system to support contextual retrieval and reuse. Row identifiers and entity identifiers are used across table rows, workflow instances, step executions, and artifact records to achieve traceable association. All modules communicate through well-defined interfaces to ensure system scalability and stability.

[0044] Example 2: General Process. As shown in Figure 2, this invention provides a method for batch execution of workflows and construction of a lifecycle fact base, including the following steps: S1: Receive the target workflow and extract the input parameter pattern definition. This step is performed by the pattern mapping module, which receives the target workflow selected by the user through the workflow selection interface. Workflows are predefined by developers and stored in the workflow repository. The workflow definition uses a structured data format and includes a workflow identifier, workflow name, workflow description, input parameter pattern definition, and output parameter pattern definition.

[0045] The input parameter pattern definition is the core data structure of this invention, which includes the name, type, description, whether it is required, default value, and validation rules for each input parameter. Input parameter types include: text type (representing text strings); numeric type (representing numbers); enumeration type (representing enumerated values); array type (representing arrays); and boolean type (representing boolean values).

[0046] S2: Dynamically generate the table column structure. This step is performed by the column generation unit of the pattern mapping module, which generates the corresponding table column type based on the input parameter type. The mapping between input parameter types and table column types is as follows: text types are mapped to text input columns, supporting single-line text input; numeric types are mapped to numeric input columns, supporting numeric input and validation; enumeration types are mapped to dropdown selection columns that display a predefined list of options; array types are mapped to multi-line input columns, supporting multi-line text or comma-separated input.

[0047] The table column structure also includes result columns and status columns. Result columns are generated based on the output parameter pattern definition; common result column types include: analysis result column, confidence column, generated content column, and classification result column. Status columns identify the execution status of each task row, with status values including: blank row, pending execution, executing, completed, and failed.
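The mapping from parameter pattern to table columns described in S1 and S2 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the class names (`InputParam`, `Column`), the column type strings, and the `generate_columns` helper are all hypothetical. Because the text lists no column mapping for boolean parameters, the sketch rejects unmapped types rather than guessing one.

```python
from dataclasses import dataclass, field
from typing import Any

# Parameter type -> table column type, following the mapping listed in S2.
# The string names are illustrative assumptions.
TYPE_TO_COLUMN = {
    "text": "text_input",
    "number": "numeric_input",
    "enum": "dropdown",
    "array": "multiline_input",
}

# Status values listed for the status column.
STATUS_VALUES = ["blank", "pending", "executing", "completed", "failed"]


@dataclass
class InputParam:
    name: str
    type: str
    description: str = ""
    required: bool = False
    default: Any = None
    options: list = field(default_factory=list)  # only used by enum parameters


@dataclass
class Column:
    name: str
    role: str  # "input", "result", or "status"
    column_type: str
    options: list = field(default_factory=list)


def generate_columns(inputs, output_names):
    """Build input columns, then result columns, then one status column."""
    columns = []
    for p in inputs:
        if p.type not in TYPE_TO_COLUMN:
            raise ValueError(f"no column mapping for parameter type {p.type!r}")
        columns.append(Column(p.name, "input", TYPE_TO_COLUMN[p.type], list(p.options)))
    for name in output_names:
        columns.append(Column(name, "result", "read_only"))
    columns.append(Column("status", "status", "status", list(STATUS_VALUES)))
    return columns
```

For example, a workflow with one text parameter and one enumeration parameter plus a single output would yield two input columns, one result column, and the status column, in that order.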

[0048] S3: Monitor changes in table row data and trigger task creation. This step is executed by the row-level triggering module, which internally contains a change monitoring unit and a task creation unit. The change monitoring unit uses the following monitoring mechanisms: cell value change monitoring, which listens for value change events in all input columns; status column change monitoring, which listens for value change events in the status column; and batch operation monitoring, which listens for multiple rows of data selected by the user.

[0049] The task creation unit employs a deduplication mechanism to prevent duplicate triggering of the same row. Before triggering the creation of a background task, the system generates a unique row identifier for each row of data. This row identifier can be generated based on the row number, the hash value of the input parameters, or a combination key, and is used to uniquely identify a row of tasks and all its associated lifecycle data. Deduplication includes: maintaining a set of row identifiers for already triggered tasks; checking whether the row identifier already exists before creating a new task; if it already exists and the task is not yet completed, the row is skipped.
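The deduplication check can be illustrated with a short sketch. It assumes the combination-key option (row number plus a hash of the input parameters) and keeps the set of triggered, unfinished tasks in memory; `row_identifier` and `TaskDeduplicator` are hypothetical names introduced only for this example.

```python
import hashlib
import json


def row_identifier(row_number, inputs):
    """Combination key: row number plus a hash of the input parameters."""
    canonical = json.dumps(inputs, sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{row_number}:{digest}"


class TaskDeduplicator:
    def __init__(self):
        # Row identifiers of tasks that were triggered and have not finished.
        self._active = set()

    def try_trigger(self, row_id):
        """Return True if a task may be created, False if the row is skipped."""
        if row_id in self._active:
            return False
        self._active.add(row_id)
        return True

    def mark_finished(self, row_id):
        """Release the identifier so the row can be re-run later."""
        self._active.discard(row_id)
```

Hashing the canonical JSON form of the inputs means that editing a cell produces a new identifier, so a modified row is treated as a new task while a repeated event for an unchanged row is skipped.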

[0050] S4: Context assembly. Before executing the task, the context assembly module retrieves historical lifecycle data from the data storage system based on the current row and its associated identifiers. The retrieval can employ structured conditional retrieval or semantic retrieval based on vector similarity. Structured conditional retrieval includes conditional queries, join queries, and aggregation queries based on the data model supported by the data storage system; the query objects include at least table row records, cell values, workflow instance records, step execution records, and step output records. The required context fragments are assembled from the results of these queries, then pruned, aggregated, and normalized to form a context package, which serves as additional input for the task.

[0051] The pruning mechanisms include: pruning by time window, retaining only historical data from the most recent N days; pruning by relevance threshold, retaining only data with similarity higher than the threshold; and pruning by context budget, limiting the number of tokens or the amount of data in the context package.

[0052] The aggregation mechanisms include: aggregation by row identifier, which merges multiple batches of historical data in the same row; aggregation by workflow instance, which merges multiple steps of the same instance; and aggregation by rule, which merges data of similar types according to preset rules.

[0053] The standardization process includes: formatting the data into a uniform JSON structure; adding metadata tags; and generating a data summary.
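The pruning, aggregation, and standardization steps can be combined into one small routine. The sketch below is illustrative only: the fragment fields (`row_id`, `created_at`, `score`, `text`), the default thresholds, and the use of character count as a stand-in for a token budget are assumptions rather than details from the specification.

```python
import json
from datetime import datetime, timedelta


def build_context_package(fragments, now, max_age_days=30, min_score=0.5,
                          budget_chars=2000):
    """fragments: dicts with row_id, created_at (datetime), score, text."""
    cutoff = now - timedelta(days=max_age_days)
    # Prune by time window and by relevance threshold.
    kept = [f for f in fragments
            if f["created_at"] >= cutoff and f["score"] >= min_score]
    # Prune by context budget: take the most relevant fragments first and
    # stop at the first one that would exceed the budget.
    kept.sort(key=lambda f: f["score"], reverse=True)
    selected, used = [], 0
    for f in kept:
        if used + len(f["text"]) > budget_chars:
            break
        selected.append(f)
        used += len(f["text"])
    # Aggregate by row identifier.
    by_row = {}
    for f in selected:
        by_row.setdefault(f["row_id"], []).append(f["text"])
    # Standardize: uniform JSON structure, metadata tags, and a short summary.
    package = {
        "metadata": {"generated_at": now.isoformat(),
                     "fragment_count": len(selected)},
        "summary": f"{len(selected)} fragment(s) across {len(by_row)} row(s)",
        "rows": [{"row_id": r, "history": texts} for r, texts in by_row.items()],
    }
    return json.dumps(package, ensure_ascii=False)
```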

[0054] In practical application, before executing a new task, the system retrieves historical lifecycle data fragments from the data storage system based on the associated identifiers of the current row (such as tenant ID, project ID, workflow version). Retrieval can employ structured conditional retrieval (such as by time range or task status) or similarity retrieval (vectorizing historical output text and recalling based on vector similarity). The retrieval results are then pruned (limiting the number of results or tokens) and aggregated (merging according to rules) to form a context package, which is then injected into the workflow instance for execution.
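As a rough illustration of the similarity-retrieval option, the sketch below ranks stored output vectors by cosine similarity against a query vector and keeps those above a threshold. It assumes the vectors have already been produced by some embedding step, which the text does not specify; `cosine` and `recall` are hypothetical helper names, and a production system would more likely rely on the storage system's vector index.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(query_vec, records, threshold=0.8, top_k=5):
    """records: list of (text, vector). Return up to top_k texts, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in records]
    scored = [s for s in scored if s[0] >= threshold]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```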

[0055] S5: Submit tasks for parallel execution, and obtain execution status and results. The workflow execution engine receives tasks with additional context information, identifies row groups that can be executed in parallel, submits tasks in batches, and controls the number of tasks executed concurrently. The task scheduling unit uses a task queue mechanism to support high-concurrency task scheduling. The concurrency control unit uses a rate-limiting mechanism to set a maximum concurrency and control the number of tasks executed simultaneously.
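One common way to cap the number of simultaneously running tasks is a semaphore. The sketch below uses Python's `asyncio` for this purpose; it is an assumption about how the concurrency control unit might be realized, and `run_batch` and its `execute` callback are hypothetical names.

```python
import asyncio


async def run_batch(rows, execute, max_concurrency=10):
    """Run execute(row) for every row, at most max_concurrency at a time."""
    limiter = asyncio.Semaphore(max_concurrency)

    async def guarded(row):
        async with limiter:
            return await execute(row)

    # gather preserves input order. With return_exceptions=True a failed row
    # yields its exception object instead of cancelling the rest of the batch.
    return await asyncio.gather(*(guarded(r) for r in rows),
                                return_exceptions=True)
```

Returning exceptions in place allows a failed row to be marked as failed in the table while the remaining rows continue to execute.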

[0056] The asynchronous backfill module receives the task's execution status and results through a real-time communication mechanism. This mechanism preferably uses long-lived connections, establishing a connection when the table is loaded, with the server proactively pushing updates on the execution status. The pushed content includes: row identifiers, status values, progress percentages, and result data.

[0057] S6: Backfill execution status and results to the table. The asynchronous backfill module writes the received execution status to the status column of the corresponding table row and writes the execution result to the result column of that row. The backfill mechanism includes: batch backfilling of intermediate results, suitable for long-running tasks; error backfilling, which writes error information to the status column; and incremental updates, which update only the cells that have changed.
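The incremental-update and error-backfill behaviors can be sketched with an in-memory table. The message fields mirror the pushed content listed above (row identifier, status value, progress percentage, result data); the dictionary layout, the `status_detail` cell used to hold the error text shown with the status, and the `apply_update` function are assumptions made for illustration.

```python
def apply_update(table, message):
    """Apply one pushed message to the table and return the changed cells."""
    row = table.setdefault(message["row_id"], {})
    incoming = {"status": message["status"]}
    if "progress" in message:
        incoming["progress"] = message["progress"]
    if message["status"] == "failed":
        # Error backfill: keep the error text next to the status value.
        incoming["status_detail"] = message.get("error", "")
    # Result data maps result column names to values.
    incoming.update(message.get("result", {}))
    # Incremental update: write only the cells whose value actually changed.
    changed = {k: v for k, v in incoming.items() if row.get(k) != v}
    row.update(changed)
    return changed
```

Returning only the changed cells lets the caller redraw just those cells, and a repeated push of the same status produces an empty change set.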

[0058] S7: Persist lifecycle data and establish relationships. The lifecycle data persistence module persists the table column structure, table row data, and process data generated during task execution to the data storage system. The table row data includes at least row records, cell values, and status column values, along with their version change records. Process data includes: status transition records, step-level intermediate products, and key decision information. During persistence, a mapping relationship is established between table row identifiers, workflow instance identifiers, step identifiers, and product identifiers. An index is created on the mapping, supporting conditional queries, join queries, and aggregation operations based on the data model supported by the data storage system, performed by table rows, workflow instances, or execution steps. Query objects include at least table row records, cell values, workflow instance records, step execution records, and step product records, providing a structured retrieval foundation for context reuse in subsequent tasks.
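The mappings and indexes described in S7 can be illustrated with a relational schema. The sketch below uses SQLite through Python's standard `sqlite3` module as a stand-in for the data storage system; the table names, column names, and the `history_for_row` query are hypothetical, and version change records are omitted for brevity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE table_rows (row_id TEXT PRIMARY KEY, inputs TEXT, status TEXT);
CREATE TABLE workflow_instances (
    instance_id TEXT PRIMARY KEY, row_id TEXT, started_at TEXT);
CREATE TABLE step_executions (
    step_id TEXT PRIMARY KEY, instance_id TEXT, name TEXT);
CREATE TABLE step_outputs (
    output_id TEXT PRIMARY KEY, step_id TEXT, content TEXT);
-- Indexes on the mapping columns: row -> instance -> step -> output.
CREATE INDEX idx_instance_row ON workflow_instances(row_id);
CREATE INDEX idx_step_instance ON step_executions(instance_id);
CREATE INDEX idx_output_step ON step_outputs(step_id);
"""


def open_store():
    """Create an in-memory store with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def history_for_row(conn, row_id, limit=3):
    """Join row -> instance -> step -> output, newest instance first."""
    cur = conn.execute(
        """SELECT i.instance_id, s.name, o.content
           FROM workflow_instances i
           JOIN step_executions s ON s.instance_id = i.instance_id
           JOIN step_outputs o ON o.step_id = s.step_id
           WHERE i.row_id = ?
           ORDER BY i.started_at DESC
           LIMIT ?""",
        (row_id, limit),
    )
    return cur.fetchall()
```

A query of this shape is what the context assembly step would issue to fetch the most recent outputs for a row before a new task runs.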

[0059] Feedback signal acquisition: In an optional step, the system collects feedback signals, including manual confirmation, manual correction, manual scoring, automatic verification results, or downstream system acknowledgments. These signals are associated with the corresponding row identifiers and workflow instance identifiers and stored in the data storage system for quality tracking, auditing, or manual verification.

[0060] As shown in Figure 3, this embodiment of the invention also provides a state machine transition mechanism that describes the complete lifecycle state transitions of a single-row task. The state machine includes the following states: blank row state, indicating that the user has not entered any data; pending execution state, indicating that the user has entered data or manually marked the row as pending execution; executing state, indicating that the task is being executed, during which a progress bar can be displayed in the table; completed state, indicating that the task was executed successfully and the result is displayed in the result column; and failed state, indicating that the task failed and error information is displayed in the status column.

[0061] The state machine transition mechanism also includes transition rules between states: when a user enters data in the input column, the status column automatically changes from blank row to pending execution; when the system starts executing a task, the status column changes from pending execution to executing; when the task executes successfully, the status column changes from executing to completed; when the task fails, the status column changes from executing to failed; and the user can manually re-mark rows in the failed state as pending execution to trigger a task retry.
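The five states and five transition rules above are small enough to encode as an explicit table of allowed transitions. The following sketch shows one way to do so; `RowStatus`, `ALLOWED`, and `transition` are hypothetical names, and rejecting every transition not listed is an assumption about how strictly the rules would be enforced.

```python
from enum import Enum


class RowStatus(Enum):
    BLANK = "blank"
    PENDING = "pending"
    EXECUTING = "executing"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed (current, target) pairs, one per transition rule in the text.
ALLOWED = {
    (RowStatus.BLANK, RowStatus.PENDING),        # user enters data
    (RowStatus.PENDING, RowStatus.EXECUTING),    # system starts the task
    (RowStatus.EXECUTING, RowStatus.COMPLETED),  # task succeeds
    (RowStatus.EXECUTING, RowStatus.FAILED),     # task fails
    (RowStatus.FAILED, RowStatus.PENDING),       # user re-marks for retry
}


def transition(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if (current, target) not in ALLOWED:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```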

[0062] Example 3: Batch Processing of Public Opinion Analysis. In this embodiment, business personnel paste 1,000 company names into a table, and the system automatically analyzes public opinion in parallel.

[0063] The workflow is configured as "Public Opinion Analysis Agent". The table is automatically generated with the following columns: Company Name column (input column, text input type; the user enters the company name); Analysis Result column (result column; stores the public opinion analysis results returned by the AI); Status column (status column; indicates the execution status); Confidence column (result column; stores the confidence score of the analysis results).

[0064] The execution flow is as follows: 0 seconds: User pastes 1000 company names for batch input; 1 second: System identifies new data, row-level triggers detect changes; 2 seconds: System creates 1000 workflow instances, triggered in parallel; 2 to 30 seconds: Public opinion analysis is performed, 1000 agents run in parallel; 30 to 60 seconds: Results are gradually backfilled, backfilling occurs upon completion of each task; 60 seconds: All tasks are completed, and the status column displays "Completed".

[0065] In the aforementioned scenario of batch processing of public opinion analysis, the specific implementation of step S4, context assembly, is as follows: When multiple analysis requests are initiated for the same company name (e.g., weekly regular analysis), before executing a new task, the system retrieves the company's historical public opinion analysis results from the data storage system based on the company name as an association identifier. The retrieved historical data is then pruned (e.g., only the most recent 3 analysis results are retained) and aggregated (multiple analysis results are merged into trend data) to form a context package, which serves as additional input for the new task. This context package enables the current public opinion analysis to perform comparative analysis (e.g., "public opinion score increased by 5% compared to last month") or trend judgment (e.g., "negative public opinion is on the rise") based on historical data, significantly improving the depth and value of the analysis results.

[0066] The specific implementation of data persistence in step S7 is as follows: During each execution of public opinion analysis, the system persists the following data to the data storage system: (1) Table column structure: column definitions for the company name column, analysis result column, and confidence score column; (2) Table row data: 1000 company names and their corresponding analysis results and confidence scores; (3) Process data: status transition records of each workflow instance (such as "executing → completed"), step-level intermediate products (such as the original summary of the public opinion analysis), and key decision information (such as sentiment classification results). At the same time, the system establishes the following relationships in the data storage system and indexes them: the mapping of company name (row identifier) to workflow instance ID, the mapping of workflow instance ID to public opinion analysis step record, and the mapping of step ID to public opinion summary product. These persisted data and relationships provide a directly searchable and reusable historical context for subsequent batches of public opinion analysis, supporting trend analysis and comparative analysis while meeting the needs of audit traceability and historical task rerun.

[0067] As shown in Figure 4, this embodiment of the invention demonstrates the complete timeline for batch processing 1000 data entries. Taking the above batch public opinion analysis scenario as an example, the timeline is as follows: At time T0, the user pastes 1000 company names, which are batch-entered into the input column of the table. At time T1, the system identifies the new data, and the row-level triggering module detects a change in the input column data. At time T2, the row-level triggering module creates 1000 workflow instances and submits them to the workflow execution engine in parallel. From time T2 to T30, the workflow execution engine executes the 1000 public opinion analysis tasks in parallel, with each task running independently. From time T30 to T60, as tasks complete one after another, the asynchronous backfill module gradually backfills the execution results to the result column of the corresponding table row through the real-time communication mechanism, and the status column is synchronously updated to "Completed". At time T60, all 1000 tasks are completed, the status column for all rows shows "Completed", and the result column displays the complete public opinion analysis results.

[0068] Throughout the process, the state transition of each task follows the state machine mechanism shown in Figure 3, and state changes are reflected in the table in real time, from pending execution to executing, and then to completed or failed. Status column values and their version change records are persistently stored in the data storage system, ensuring traceability of each state change. This persistent data, along with the established relationships and indexes, supports subsequent contextual retrieval and assembly through conditional queries, join queries, and aggregation operations based on the data model supported by the data storage system. Query objects include at least table row records, cell values, workflow instance records, step execution records, and step output records, providing a structured retrieval foundation for contextual reuse in subsequent tasks.

[0069] Example 4: Batch Generation of Product Descriptions. This embodiment is a typical application of the general process shown in Figure 2 to the batch generation of product descriptions. In this embodiment, e-commerce operators generate product descriptions in batches.

[0070] The workflow configuration includes the following columns: Product Name column (input column, text input type); Product Features column (input column, multi-line input type); Target User column (input column, text input type); Generated Description column (result column; stores the AI-generated product descriptions); Status column (status column; indicates the execution status).

[0071] The user pastes data for 100 products and clicks the run button. The system recognizes 100 new rows of data and creates 100 workflow instances to execute in parallel. Each workflow calls a large language model to generate product descriptions based on product names, product characteristics, and target users. The system gradually populates the results, and the user can see the generation progress in real time.

[0072] In the above-mentioned scenario of batch generation of product descriptions, the specific implementation of step S4, context assembly, is as follows: For product description generation tasks under the same product category, before executing a new task, the system retrieves historically generated excellent product descriptions (such as descriptions with human ratings higher than 4.5) from the data storage system based on the product category as an association identifier. The retrieved historical descriptions are then trimmed (e.g., limited to returning 5 examples) and standardized (uniformly formatted as "title + body" structure) to form a context package, which serves as additional input for the new task. This context package provides style references and expression templates for the large language model, enabling newly generated product descriptions to maintain brand consistency while drawing on the expression methods and selling point organization logic of historical successful cases, significantly improving the generation quality.
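The example selection described in this scenario (same category, human rating above 4.5, at most 5 examples, "title + body" format) can be sketched as a simple filter. The record fields and the choice of the product name as the title are assumptions made for illustration; `select_examples` is a hypothetical helper.

```python
def select_examples(history, category, min_rating=4.5, limit=5):
    """Pick the best-rated historical descriptions for one product category."""
    matches = [h for h in history
               if h["category"] == category and h["rating"] > min_rating]
    # Highest-rated examples first, then trim to the configured limit.
    matches.sort(key=lambda h: h["rating"], reverse=True)
    # Standardize each example into a uniform "title + body" structure.
    return [{"title": h["product_name"], "body": h["description"]}
            for h in matches[:limit]]
```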

[0073] The specific implementation of data persistence in step S7 is as follows: During each product description generation process, the system persists the following data to the data storage system: (1) Table column structure: column definitions for product name, product features, target user, and generated description; (2) Table row data: input parameters for 100 products and the generated product descriptions; (3) Process data: status transition records for each workflow instance, input / output pairs for large language model calls, and key decision information (such as generation version number). At the same time, the system establishes the following relationships in the data storage system and indexes them: the mapping between product ID (row identifier) and workflow instance ID, the mapping between workflow instance ID and generation step records, and the mapping between step ID and the generated product (the product description text). In addition, the system associates subsequent manual rating feedback with the corresponding row identifiers and stores it in the data storage system. These persisted data and relationships provide a high-quality historical example library for subsequent batches of product description generation. Throughout the process, the status transition of each task follows the state machine mechanism shown in Figure 3, which ensures the traceability of the task lifecycle, and the status column values and their version change records are persistently stored in the data storage system. The established relationships and indexes support subsequent contextual retrieval and assembly through conditional queries, join queries, and aggregation operations based on the data model supported by the data storage system. The query objects include at least table row records, cell values, workflow instance records, step execution records, and step output records, providing a structured retrieval foundation for the reuse of context in subsequent tasks.

[0074] Example 5: Intelligent Classification of Customer Service Work Orders. This embodiment is a typical application of the general process shown in Figure 2 to the customer service field. In this embodiment, customer service personnel prioritize work orders in batches.

[0075] The workflow configuration includes the following columns: Work Order ID column (input column, text input type); Work Order Content column (input column, multi-line input type); Priority column (drop-down selection column; options: P0 urgent, P1 high, P2 medium, P3 low); Classification Result column (result column; stores the classification criteria returned by the AI); Status column (status column; indicates the execution status).

[0076] A user pastes 500 work order data entries and clicks the run button. The system recognizes 500 new rows of data and creates 500 workflow instances to execute in parallel. Each workflow calls a large language model to analyze the work order content, determine priorities, and provide classification criteria. The system gradually populates the results, and customer service personnel can view the classification progress in real time.

[0077] In the aforementioned intelligent customer service ticket classification scenario, the specific implementation of step S4, context assembly, is as follows: For customer service ticket classification tasks, before executing a new task, the system retrieves the customer's historical ticket processing records or historical classification results of similar issues from the data storage system based on the customer ID or issue type as the association identifier. The retrieved historical data is then pruned (e.g., only the 5 most recent similar tickets are retained) and aggregated (multiple classification results are merged and statistically analyzed) to form a context package, which serves as additional input for the new task. This context package enables the large language model to make judgments based on historical processing experience, such as "this customer submitted a P1 priority ticket for the same issue last month" or "similar payment failure issues are usually classified as P2 priority," thereby improving the accuracy and consistency of classification and avoiding misjudgments due to a lack of context.

[0078] The specific implementation of data persistence in step S7 is as follows: During each work order classification execution process, the system persists the following data to the data storage system: (1) Table column structure: column definitions for the work order ID column, work order content column, priority column, and classification result column; (2) Table row data: input parameters and classification results of 500 work orders; (3) Process data: status transition records of each workflow instance, the classification reasoning process of the large language model, and key decision information (such as classification confidence and basis keywords). At the same time, the system establishes the following associations in the data storage system and indexes them: the mapping of work order ID (row identifier) to workflow instance ID, the mapping of workflow instance ID to classification step record, and the mapping of step ID to classification product (priority and classification basis). In addition, the system stores the subsequent manual confirmation or correction results of customer service personnel as feedback signals in the data storage system. These persisted data and associations provide a historical case library for subsequent batches of work order classification. Throughout the process, the status transition of each task follows the state machine mechanism shown in Figure 3, which ensures the traceability of the task lifecycle, and the status column values and their version change records are persistently stored in the data storage system. The established relationships and indexes support subsequent contextual retrieval and assembly through conditional queries, join queries, and aggregation operations based on the data model supported by the data storage system. The query objects include at least table row records, cell values, workflow instance records, step execution records, and step output records, providing a structured retrieval foundation for the reuse of context in subsequent tasks.

[0079] One or more embodiments in this application are intended to cover all such substitutions, modifications, and variations that fall within the broad scope of this application. Therefore, any omissions, modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of one or more embodiments in this application should be included within the protection scope of this application.

[0080] If a flowchart is used in this application, it is used to illustrate the operations performed by the system according to embodiments of this application. It should be understood that the preceding or following operations are not necessarily performed in exact order. Instead, the steps can be processed in reverse order or simultaneously. Furthermore, other operations can be added to these processes, or one or more steps can be removed from them.

[0081] The foregoing has provided a detailed description of a workflow batch execution and lifecycle fact base construction method and system provided in this application. The above description of the disclosed embodiments enables those skilled in the art to implement or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for batch execution of workflows and construction of a lifecycle fact base, characterized in that it includes the following steps: receiving the target workflow and extracting the input parameter pattern definition from it; dynamically generating a table column structure based on the input parameter pattern definition, the table column structure including an input column, a result column, and a status column; monitoring changes in table row data and, when new or modified data is detected in the input column or the status column is marked as pending execution, triggering the creation of a background task; before executing the task, retrieving historical lifecycle data from the data storage system based on the current row and its associated identifier, wherein the retrieval includes structured conditional retrieval and / or semantic retrieval based on vector similarity, the structured conditional retrieval includes conditional queries, join queries, and aggregation queries based on the data model supported by the data storage system, and the query objects include at least table row records, cell values, workflow instance records, step execution records, and step output records, so as to assemble the required context fragments from the data storage system and form context information after processing; submitting the task with the attached context information to the execution engine for parallel execution to obtain the execution status and execution result of the task; updating the execution status to the status column of the corresponding table row, and updating the execution result to the result column of the corresponding table row; persisting the table column structure, the table row data, and the process data generated during task execution to the data storage system, wherein the table row data includes at least row records, cell values, and status column values together with their version change records, and establishing a searchable association and index between table row identifiers, workflow instance identifiers, step identifiers, and product identifiers to support context reuse for subsequent tasks; wherein the data storage system is a relational database, a non-relational database, or a hybrid storage system, used to persistently store table data and lifecycle data generated during workflow execution, and the data storage system supports structured queries, join queries, and aggregation operations, and has established searchable relationships and indexes.

2. The workflow batch execution and lifecycle fact base construction method as described in claim 1, characterized in that the input parameter pattern definition includes one or more of the following: input parameter name, type, description, whether the parameter is required, or default value; and the dynamic generation of the table column structure specifically involves generating corresponding input columns based on the mapping relationship between input parameter types and table column types, the mapping relationships including: text type mapping to text input columns, numeric type mapping to numeric input columns, enumeration type mapping to drop-down selection columns, and array type mapping to multi-line input columns.

3. The workflow batch execution and lifecycle fact base construction method as described in claim 1, characterized in that the monitoring of changes in table row data includes: detecting newly added or modified data in the input column, detecting changes in the status column from other values to the pending execution status, or detecting batch execution operations triggered by the user; and before triggering the creation of a background task, a deduplication check is also performed: based on the row identifier and input parameters, it is determined whether the task corresponding to the row is already being executed or has been completed, in order to avoid repeated triggering.

4. The workflow batch execution and lifecycle fact base construction method as described in claim 1, characterized in that, The specific steps for retrieving historical lifecycle data from the data storage system based on the current row and its associated identifier are as follows: Using at least one of row identifier, tenant identifier, project identifier, or workflow version identifier as the association identifier, retrieve historical lifecycle data fragments associated with the current row from the data storage system; The retrieval methods include structured conditional retrieval and / or semantic retrieval based on vector similarity; the structured conditional retrieval includes conditional queries, join queries and aggregation operations based on the data model supported by the data storage system, and the query objects include at least table row records, cell values, workflow instance records, step execution records and step output records; The data storage system supports the structured condition retrieval through a query engine or indexing mechanism.

5. The workflow batch execution and lifecycle fact base construction method as described in claim 1, characterized in that forming the context information after processing includes: pruning, aggregating, and normalizing the retrieved historical lifecycle data fragments to form a context package with a pre-defined structure; the pruning is performed based on at least one of a time window, a relevance threshold, or a context budget constraint; and the context package includes at least a set of structured fields, a lifecycle data summary, reference pointers, and constraint information.

6. The workflow batch execution and lifecycle fact base construction method as described in claim 1, characterized in that the step of submitting the task with the attached context information to the execution engine for parallel execution specifically involves: identifying row groups that can be executed in parallel, submitting tasks in batches, and controlling the number of tasks executed concurrently; and the execution engine is a workflow execution engine.

7. The workflow batch execution and lifecycle fact base construction method as described in claim 1, characterized in that, The acquisition of the execution status and execution result of the task is achieved through a real-time communication mechanism; The real-time communication mechanism includes at least one of long-connection communication, polling mechanism, or server-sent events; The step of updating the execution status and execution results to the corresponding table rows includes at least one of batch backfilling of intermediate results, error backfilling, or incremental updates.

8. The workflow batch execution and lifecycle fact base construction method as described in claim 1, characterized in that the process data includes at least one of the following: state transition records, step-level intermediate products, or key decision information; the associations established among the persisted data comprise: a mapping between table row identifiers and workflow instance identifiers, a mapping between workflow instance identifiers and step execution records, and a mapping between step identifiers and product identifiers; indexes are created on the mappings so that conditional queries, join queries, and aggregation operations, based on the data model supported by the data storage system, can be performed by table row, workflow instance, or execution step; and the query objects include at least table row records, cell values, workflow instance records, step execution records, and step output records.
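
By way of non-limiting illustration, the three mappings of claim 8 and the indexes created on them might be sketched as follows, assuming a relational store. The mapping table names, column names, and index names are hypothetical.

```python
import sqlite3

# Hypothetical mapping tables: row -> instance -> step -> product.
# Each primary key indexes the forward direction; the explicit indexes
# cover lookups in the reverse direction.
SCHEMA = """
CREATE TABLE row_instance (
    row_id TEXT NOT NULL, instance_id TEXT NOT NULL,
    PRIMARY KEY (row_id, instance_id));
CREATE TABLE instance_step (
    instance_id TEXT NOT NULL, step_id TEXT NOT NULL, state TEXT,
    PRIMARY KEY (instance_id, step_id));
CREATE TABLE step_product (
    step_id TEXT NOT NULL, product_id TEXT NOT NULL,
    PRIMARY KEY (step_id, product_id));
CREATE INDEX idx_row_instance_instance ON row_instance (instance_id);
CREATE INDEX idx_instance_step_step ON instance_step (step_id);
CREATE INDEX idx_step_product_product ON step_product (product_id);
"""


def open_fact_base() -> sqlite3.Connection:
    """Create an in-memory fact base holding the mapping tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def products_for_row(conn, row_id):
    """Follow the mapping chain from a table row to its step products."""
    sql = """
        SELECT sp.product_id
        FROM row_instance ri
        JOIN instance_step s ON s.instance_id = ri.instance_id
        JOIN step_product sp ON sp.step_id = s.step_id
        WHERE ri.row_id = ?
        ORDER BY sp.product_id
    """
    return [product_id for (product_id,) in conn.execute(sql, (row_id,))]
```

With these mappings in place, a later task for the same row can locate earlier step outputs by joining through the chain rather than by scanning all stored records.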

9. The workflow batch execution and lifecycle fact base construction method as described in claim 1, characterized in that the method further comprises: collecting feedback signals, the feedback signals including at least one of manual confirmation, manual correction, manual scoring, automatic verification results, or downstream system feedback; and associating each feedback signal with the corresponding row identifier and workflow instance identifier and storing it in the data storage system for quality tracking, auditing, or manual verification.
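
By way of non-limiting illustration, the feedback signal of claim 9 might be represented as follows. The `FeedbackKind`, `FeedbackSignal`, and `FeedbackStore` names and the in-memory list standing in for the data storage system are assumptions of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class FeedbackKind(str, Enum):
    MANUAL_CONFIRMATION = "manual_confirmation"
    MANUAL_CORRECTION = "manual_correction"
    MANUAL_SCORING = "manual_scoring"
    AUTOMATIC_VERIFICATION = "automatic_verification"
    DOWNSTREAM_FEEDBACK = "downstream_feedback"


@dataclass(frozen=True)
class FeedbackSignal:
    row_id: str         # table row the feedback refers to
    instance_id: str    # workflow instance that produced the result
    kind: FeedbackKind
    payload: dict       # e.g. a score or a corrected value


class FeedbackStore:
    """In-memory stand-in for the data storage system (hypothetical)."""

    def __init__(self):
        self._items = []

    def record(self, signal):
        # Every signal must be tied to both identifiers before it is stored.
        if not signal.row_id or not signal.instance_id:
            raise ValueError("feedback needs a row_id and an instance_id")
        self._items.append(signal)

    def for_row(self, row_id):
        """Return all feedback recorded for one table row, oldest first."""
        return [s for s in self._items if s.row_id == row_id]
```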

10. A workflow batch execution and lifecycle fact base construction system, characterized in that it comprises: a pattern mapping module, used to receive the target workflow, extract the input parameter pattern definition from it, and dynamically generate a table column structure based on the input parameter pattern definition, the table column structure including an input column, a result column, and a status column; a row-level triggering module, used to monitor changes in table row data and, when it detects that data in the input column has been added or modified or that the status column is marked as pending execution, to trigger the creation of a background task; a context assembly module, used to retrieve historical lifecycle data from the data storage system based on the current row and its association identifier before the task is executed, wherein the retrieval includes structured conditional retrieval and/or semantic retrieval based on vector similarity, the structured conditional retrieval includes conditional queries, join queries, and aggregation queries based on the data model supported by the data storage system, and the query objects include at least table row records, cell values, workflow instance records, step execution records, and step output records, so as to assemble the required context fragments from the data storage system and form context information after processing; a workflow execution engine, used to receive tasks carrying the context information and execute them in parallel to obtain the execution status and results of the tasks; an asynchronous backfill module, used to update the execution status to the status column of the corresponding table row and update the execution result to the result column of the corresponding table row; and a lifecycle data persistence module, used to persist the table column structure, the table row data, and the process data generated during task execution to the data storage system.
The table row data includes at least row records, cell values, and status column values, together with their version change records. The lifecycle data persistence module also establishes searchable associations and indexes among table row identifiers, workflow instance identifiers, step identifiers, and product identifiers to support context reuse for subsequent tasks. The data storage system is a relational database, a non-relational database, or a hybrid storage system, used to persistently store the table data and the lifecycle data generated during workflow execution; the data storage system supports structured queries, join queries, and aggregation operations, and holds the established searchable associations and indexes.
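
By way of non-limiting illustration, the pattern mapping module and the row-level triggering module of claim 10 might be sketched as follows. The shape of the input parameter pattern definition (a mapping from parameter name to type name), the column names `result` and `status`, and the `"pending"` status value are assumptions of this sketch.

```python
def build_columns(input_schema):
    """Derive a table column structure from an input parameter definition.

    `input_schema` maps parameter names to type names (hypothetical shape).
    This sketch assumes no input parameter is itself named "result" or
    "status".
    """
    columns = [{"name": name, "role": "input", "type": type_name}
               for name, type_name in input_schema.items()]
    columns.append({"name": "result", "role": "result", "type": "string"})
    columns.append({"name": "status", "role": "status", "type": "string"})
    return columns


def should_trigger(old_row, new_row, columns):
    """Decide whether a row change should create a background task.

    Triggers when an input cell is added or modified, or when the status
    column is newly marked as pending. `old_row` is None for a new row.
    """
    inputs = [c["name"] for c in columns if c["role"] == "input"]
    if old_row is None:
        return any(new_row.get(n) not in (None, "") for n in inputs)
    if any(old_row.get(n) != new_row.get(n) for n in inputs):
        return True
    return (new_row.get("status") == "pending"
            and old_row.get("status") != "pending")
```

A change that touches only the result column, such as a backfill written by the asynchronous backfill module, does not trigger a new task under this rule, which avoids a feedback loop between backfilling and triggering.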