A production control method and system based on large-scale intelligent agents
By acquiring multi-source data in manufacturing workshops for preprocessing and parsing, and using pre-trained large-scale model agents to generate candidate intentions and perform rapid simulation and formal verification, the real-time performance and security issues in existing technologies are resolved, achieving efficient production control and safety verification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- 中视互联(北京)科技有限公司
- Filing Date
- 2026-04-13
- Publication Date
- 2026-06-30
AI Technical Summary
Existing technologies in manufacturing workshops suffer from insufficient real-time performance, weak verifiability, high go-live risk, and difficulty exploiting short time slots to increase production capacity. In particular, when facing high-frequency dynamic changes, they struggle to guarantee safety while promptly capturing capacity opportunities.
Multi-source data is acquired and preprocessed into a time-series state stream, which is input into a pre-trained large model agent to generate candidate high-level intentions. The intentions are parsed into low-level command sequences, which are rapidly simulated and scored. A formal verifier then judges compliance, and finally the control commands are issued through a hardware sandbox for gating to ensure compliance and security.
A traceable chain of evidence, from large model inference to hardware execution, is constructed to ensure that each candidate command undergoes physical state prediction and constraint compliance verification, providing hardware-level security independent of software state and improving the real-time performance and security of production control.
Figure CN122308300A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of intelligent production control technology, specifically relating to a production control method, system, electronic device, and storage medium based on a large-scale intelligent agent model.
Background Technology
[0002] In manufacturing workshops, with the widespread deployment of computing power and networked sensors, more and more companies are introducing large-scale pre-trained models and model-based intelligent agent technologies into the production site in order to improve production scheduling efficiency, equipment collaboration capabilities, and anomaly response speed. By combining large models with workshop data, high-level scheduling and collaboration suggestions can be generated, providing intelligent decision support for production control, changeover arrangements, and fault handling. In recent years, research and industrial practices have begun to emerge that combine large model decision-making with technologies such as digital twins, edge computing, and online learning, aiming to achieve intelligent control and continuous optimization that is closer to the actual situation on the ground.
[0003] In existing technologies, workshop-level scheduling and multi-agent collaboration problems are computationally complex. As the number of devices, task types, and constraints increase, the computational load for plan generation and conflict coordination grows exponentially. Dynamic factors in the field environment, such as temporary line changes, workpiece placement errors, sensor drift, and equipment failures, cause resource contention, coverage blind spots, or safety risks in the actual execution of the ideal plan calculated in advance. In addition, the output of large models has a certain degree of uncertainty and variability, and its suggestions often lack clear provable safety boundaries and traceable evidence chains, making it difficult to directly use for low-latency, high-reliability field control. Furthermore, many existing solutions rely on offline training and lack the ability to quickly predict and instantly verify behavior within a short time window, thus exhibiting significant deficiencies in real-time performance, verifiability, and robustness to unexpected situations.
[0004] To address these issues, some existing solutions improve decision-making quality by refining optimization algorithms, introducing rule-based filtering, or employing long-term digital twins. Other solutions decentralize key control logic to programmable logic controllers or safety controllers to implement limiting and emergency stop protection at the hardware level. Still others use shadow testing to reduce the risk of directly deploying new strategies, or employ methods such as federated learning and differential privacy to update models while protecting data privacy. However, these improvements often struggle to simultaneously satisfy the close linkage between high-level intelligent planning, short-term verifiable prediction, and hardware-level unbypassable protection. Furthermore, when faced with high-frequency dynamic changes in the workshop, such as micro-time slot utilization, sudden resource conflicts, and sensor anomalies, existing methods still find it difficult to ensure both security and the timely discovery and utilization of short-term capacity improvement opportunities.
Summary of the Invention
[0005] In view of the aforementioned existing problems, the present invention is proposed.
[0006] This invention provides a production control method, system, electronic device, and storage medium based on a large model intelligent agent, aiming to solve one or more problems of existing technologies, such as insufficient real-time performance, weak verifiability, high risk of going live, and difficulty in utilizing short time slots to increase production capacity.
[0007] To solve the above problems, the present invention proposes the following technical solution:
[0008] In a first aspect, embodiments of the present invention provide a production control method based on a large-scale intelligent agent, comprising:
[0009] Acquire multi-source data and preprocess it to obtain a time-series state stream;
[0010] The time-series state stream is input into a pre-trained large model agent to obtain a candidate high-level intent set.
[0011] The candidate high-level intent set is parsed to obtain the low-level command sequence;
[0012] A fast simulation is performed on the low-level command sequence to obtain short-time physical state trajectories; the short-time physical state trajectories are then comprehensively scored to obtain a scoring result.
[0013] A formal verifier is applied to evaluate the scoring results to obtain a compliance determination result;
[0014] Based on the compliance determination result, the control command to be issued is determined; the control command to be issued is issued to the hardware sandbox for gating processing to obtain the gating processing result.
[0015] As a preferred embodiment, the acquisition and preprocessing of multi-source data to obtain a time-series state stream includes:
[0016] The device controller, sensors, and robot control cabinet are subscribed to through the OPC Unified Architecture (OPC UA). At the edge gateway, timestamps are written at a fixed frequency and data is cached so that transmission can resume after an interruption, and PTP time synchronization is used to correct for delays, yielding the raw data stream. The raw data stream is preprocessed, key fields of the preprocessed stream are retained according to the key resource priority rule, and the retained fields are merged by device ID and timestamp to obtain the time-series state stream.
[0017] As a preferred implementation, the step of inputting the temporal state stream into a pre-trained large model agent to obtain a candidate high-level intent set includes:
[0018] By extracting four types of features from the time-series state flow within a fixed sliding window—equipment occupancy, task queue length, lower limit of safe distance, and alarm count—and constructing instruction templates and injecting current process parameters, candidate high-level intentions are generated by inferring from a pre-trained large model. The confidence level is calculated by normalizing the log-likelihood of each intention and superimposing constraint violation penalties, thus obtaining a set of candidate high-level intentions.
[0019] In a preferred embodiment, parsing the candidate high-level intent set to obtain a low-level command sequence includes:
[0020] The parser obtains the action primitive sequence corresponding to each intent by key-value matching and rule binding of the candidate high-level intent set and external benchmark resources; it obtains the instruction template parameter set by resource binding and mutex lock allocation of the action primitive sequence; and it obtains the low-level command sequence by filling the instruction template parameter set into the instruction template.
[0021] As a preferred embodiment, the production control method further includes:
[0022] Resource binding and mutex lock allocation are applied to the action primitive sequence. The time offset is calculated based on the lock table and the expected start time, and the start time is written back to generate an instruction set carrying resource tokens. Synchronization barriers and sequence relationship annotations are applied to parallel actions. A synchronization barrier identifier is written at each synchronization point and a predecessor-successor flag is generated. The barrier is released after the predecessor command is completed and submits its completion status, resulting in an executable action timeline. Based on the executable action timeline, out-of-bounds pre-checks and corrections are performed on pose, velocity, torque, and region boundaries to complete resource scheduling, timing coordination, and constraint correction of the low-level command sequence.
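The lock-table step above can be illustrated with a minimal sketch. It assumes each action needs exactly one exclusive resource and that actions are served in order of expected start time; the `Action` fields, the token format, and the `allocate_locks` helper are hypothetical names introduced only for this illustration, and synchronization barriers are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    resource: str          # exclusive resource the action needs (assumed: one per action)
    expected_start: float  # seconds
    duration: float        # seconds
    start: float = 0.0     # written back by the allocator
    token: str = ""        # resource token carried by the command


def allocate_locks(actions):
    """Give each action the earliest start that respects resource exclusivity."""
    lock_table = {}  # resource -> time at which its lock is released
    ordered = sorted(actions, key=lambda a: a.expected_start)
    for i, act in enumerate(ordered):
        free_at = lock_table.get(act.resource, 0.0)
        offset = max(0.0, free_at - act.expected_start)  # time offset from the lock table
        act.start = act.expected_start + offset          # write the start time back
        act.token = f"{act.resource}#{i}"                # hypothetical token format
        lock_table[act.resource] = act.start + act.duration
    return actions


actions = [
    Action("pick", "robot1", 0.0, 2.0),
    Action("place", "robot1", 1.0, 1.5),
    Action("clamp", "fixture", 0.5, 1.0),
]
allocate_locks(actions)
```

In this example "place" is delayed until "pick" releases `robot1`, while "clamp" runs in parallel because it holds a different resource.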
[0023] As a preferred embodiment, the step of performing fast simulation on the low-level command sequence to obtain short-time physical state trajectories includes:
[0024] Digital twin simulation of short-track edge paths is performed on the low-level command sequence. Given the step size ΔWL, the number of steps WS, and the time window Tw, and starting from the initial state $A_0$, the recursion

$$A_{t+1} = F(A_t, B_t), \quad t = 0, \dots, WS$$

yields the physical state sequence and control sequence, which together form the short-time physical state trajectory. Here $A_{t+1}$ is the next state vector, $A_t$ is the system state vector at step t, F is the state transition function, and $B_t$ is the control input vector for step t.
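The recursive simulation above can be sketched as a simple rollout loop. This is only an illustration: the one-axis model, the step size value `DT`, and the function names are assumptions, since the source does not specify the state transition function.

```python
def rollout(initial_state, controls, transition):
    """Apply next_state = transition(state, control) once per control input."""
    states = [initial_state]
    for control in controls:
        states.append(transition(states[-1], control))
    return states


DT = 0.01  # step size in seconds (assumed value)


def axis_model(state, accel):
    """Hypothetical 1-D axis: state = (position, velocity), control = acceleration."""
    pos, vel = state
    return (pos + vel * DT, vel + accel * DT)


# Five steps of constant acceleration starting from rest.
trajectory = rollout((0.0, 0.0), [1.0] * 5, axis_model)
```

The returned list holds the initial state plus one predicted state per control input, which is the short-time trajectory that the scoring step would consume.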
[0025] As a preferred embodiment, the comprehensive scoring of the short-time physical state trajectory to obtain the scoring result includes:
[0026] By calculating key risk indicators for the physical state sequence and control sequence using a pre-constructed risk function, quantifiable risk metrics are obtained, and non-compliant candidates are screened out accordingly. The algorithmic formula for the risk function is defined as follows:
[0027] T1: for the low-level command sequence Order and the current state, short-track digital twin simulation is performed, and the physical state sequence and control sequence within the future time window are obtained by iterating with the step size ΔWL. The lower limit of the safe distance is calculated for every device and then normalized to generate the safety margin score PSafe, which is used for the executability determination of candidate commands and for the synthesis of the comprehensive score.
[0028] T2: the set All of newly added executable time slots is collected from the simulation trajectory within the simulation window, after filtering by resource exclusivity constraints, synchronization barriers, and safety boundary gating. For each slot j, the parallel availability Useful is calculated from the resource token non-conflict and synchronization alignment results, and the beat improvement information gain Beat is calculated from the difference between the baseline beat (cycle time) and the predicted beat. From these, the capacity gain score BE is generated as follows:
[0029] BE is a summation, over every slot j in All, of terms formed from Useful and Beat together with a weighting factor,
[0030] where BE is the gain score, All is the set of time slots, j is the index of a slot in the time slot set, Useful is the parallel availability, and Beat is the beat improvement information gain;
[0031] T3: a quality score is generated by weighting and combining scores derived from the acceleration sequence, the lower safety distance limit, the torque sequence, and the planning window set obtained from the simulation. The weighted combination formula is as follows:
[0032] $$Q_r = \alpha_1 D_r + \alpha_2 E_r + \alpha_3 G_r + \alpha_4 H_r$$
[0033] where $Q_r$ is the quality score and r is the decision step index; $\alpha_1$ is the weighting coefficient for the trajectory smoothness score $D_r$; $\alpha_2$ is the weighting coefficient for the boundary margin score $E_r$; $\alpha_3$ is the weighting coefficient for the load score $G_r$; and $\alpha_4$ is the weighting coefficient for the plan overlap score $H_r$.
[0034] Based on the weighted synthesis formula, the energy cost score Eng is generated by normalizing the in-step time and energy consumption and then performing a weighted synthesis.
[0035] A penalty score is obtained by applying hinge aggregation to the signed violations of the speed limit, acceleration limit, torque limit, restricted area, resource exclusivity, and collision distance constraints, and is used in synthesizing the comprehensive score. The algorithm formula for hinge aggregation is as follows:
[0036] $$Ser_r = \sum_{u \in U} Pun_u \cdot \max\bigl(0,\; Y_u(A_r, B_r)\bigr)$$
[0037] where $Ser_r$ is the penalty score for decision step r; r is the decision step index; $\sum$ is the summation symbol; u indexes the constraints in the constraint set U; $Pun_u$ is the weight of constraint u; $Y_u$ is the signed violation function of constraint u; $A_r$ is the system state vector at step r; and $B_r$ is the control input vector to be evaluated at step r;
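As a minimal sketch of hinge aggregation, the snippet below sums the weighted positive parts of signed violations. It assumes the convention that a positive value means the constraint is violated and a non-positive value means it is satisfied; the constraint names and weights are made up for the example.

```python
def hinge_penalty(violations, weights):
    """Weighted sum of the positive parts of signed violations.

    violations: dict mapping constraint name -> signed violation
                (assumed convention: > 0 means violated).
    weights:    dict mapping constraint name -> constraint weight.
    """
    return sum(weights[name] * max(0.0, value) for name, value in violations.items())


# Hypothetical values: speed is over its limit by 0.2, torque has 0.5 of
# headroom, and the collision distance sits exactly on its bound.
violations = {"speed": 0.2, "torque": -0.5, "collision_distance": 0.0}
weights = {"speed": 10.0, "torque": 5.0, "collision_distance": 100.0}
penalty = hinge_penalty(violations, weights)
```

Only the speed violation contributes here; satisfied constraints add nothing regardless of how much margin they have, which is what distinguishes the hinge from a plain weighted sum.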
[0038] T4: based on the scores from T1 to T3, a comprehensive score is generated by weighting each score and is used for determining the candidate command set. The comprehensive score is synthesized as follows:
[0039] the comprehensive score $S_r$ is the weighted synthesis of the safety margin score, the capacity gain score, the quality score, the energy cost score, and the penalty score, each with its own weighting coefficient,
[0040] where $S_r$ is the comprehensive score and r is the decision step index; PSafe is the safety margin score, BE is the capacity gain score, Eng is the energy cost score, and Ser is the penalty score for decision step r;
[0041] The candidate command set is determined, and the candidates in the set that meet the determination are sorted from highest to lowest comprehensive score.
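The weighting and ranking step can be sketched as follows. The signs are an assumption made for this illustration (benefit scores add, energy cost and penalty subtract), and the weight values and candidate numbers are hypothetical; the source does not state either.

```python
from dataclasses import dataclass


@dataclass
class CandidateScores:
    name: str
    p_safe: float   # safety margin score
    be: float       # capacity gain score
    quality: float  # quality score
    eng: float      # energy cost score
    ser: float      # penalty score


# Hypothetical weights; the source gives no values.
W = {"p_safe": 0.4, "be": 0.25, "quality": 0.2, "eng": 0.1, "ser": 1.0}


def comprehensive_score(c):
    # Assumption: benefit terms add, cost and penalty terms subtract.
    return (W["p_safe"] * c.p_safe + W["be"] * c.be + W["quality"] * c.quality
            - W["eng"] * c.eng - W["ser"] * c.ser)


def rank(candidates):
    """Sort candidates from highest to lowest comprehensive score."""
    return sorted(candidates, key=comprehensive_score, reverse=True)


candidates = [
    CandidateScores("A", 0.9, 0.2, 0.7, 0.3, 0.0),
    CandidateScores("B", 0.6, 0.8, 0.6, 0.5, 0.0),
    CandidateScores("C", 0.95, 0.9, 0.9, 0.2, 2.0),
]
ranked = [c.name for c in rank(candidates)]
```

Candidate C has the best individual scores but carries a constraint penalty, so under these assumed weights it falls to the bottom of the ranking.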
[0042] As a preferred implementation, the application of a formal validator to determine the scoring results and obtain a compliance determination result includes:
[0043] Each candidate of the evidence-bearing pass set and its simulated trajectory are verified step by step: signed violations are calculated according to the constraint set, and their positive parts are aggregated to obtain the penalty score. Threshold judgments are applied to the safety margin and the comprehensive score using a safety margin threshold PSafemin and a comprehensive score threshold Srmax, and a compliance conclusion is generated when the safety margin score PSafe, the comprehensive score Sr, and the penalty score all satisfy their conditions. For non-compliant candidates, a revised command is generated according to the repair priority; the revised command is re-simulated and the above verification is repeated to obtain the compliance determination result.
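The verify, repair, and re-verify loop can be sketched as below. The comparison directions and threshold values are assumptions (the margin and score must reach their thresholds and the penalty must be zero), and `evaluate` and `repair` are hypothetical callables standing in for re-simulation and the repair-priority logic.

```python
def is_compliant(p_safe, score, penalty, p_safe_min=0.5, score_threshold=0.4):
    # Assumed directions: margin and score must reach their thresholds,
    # and no constraint may be violated (penalty of zero).
    return p_safe >= p_safe_min and score >= score_threshold and penalty == 0.0


def verify_with_repair(command, evaluate, repair, max_repairs=3):
    """evaluate(command) -> (p_safe, score, penalty); repair(command) -> revised command."""
    for attempt in range(max_repairs + 1):
        if is_compliant(*evaluate(command)):
            return command, True
        if attempt < max_repairs:
            command = repair(command)  # revised command is re-evaluated on the next pass
    return command, False


# Hypothetical example: the only constraint is a speed limit of 1.0,
# and each repair scales the commanded speed down by 20 %.
def evaluate(cmd):
    return 0.8, 0.6, max(0.0, cmd["speed"] - 1.0)


def repair(cmd):
    return {"speed": cmd["speed"] * 0.8}


result, ok = verify_with_repair({"speed": 1.5}, evaluate, repair)
```

Bounding the number of repair attempts keeps the loop inside a fixed time budget, which matters when verification must finish within a short decision window.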
[0044] As a preferred implementation, determining the control command to be issued based on the compliance determination result includes:
[0045] Based on the compliance determination result, the control command to be issued is determined from the corresponding low-level command sequence;
[0046] The control command to be issued is sent to the hardware sandbox for gating processing. When an out-of-bounds error is detected, a limiting or blocking process is performed to generate a gating processing result. The gating processing result is recorded to obtain the control command to be issued.
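The gating decision can be illustrated in software, although the source performs it in dedicated hardware. The limit values, the `BLOCK_FACTOR` rule for choosing between limiting and blocking, and the `gate` function are assumptions made for this sketch.

```python
LIMITS = {"speed": 1.0, "acceleration": 2.0, "torque": 50.0}  # hypothetical limits
BLOCK_FACTOR = 1.5  # assumed: block when a value exceeds its limit by more than 50 %


def gate(command):
    """Return (action, gated_command); action is 'pass', 'limit', or 'block'."""
    action, gated = "pass", dict(command)
    for key, limit in LIMITS.items():
        value = abs(command.get(key, 0.0))
        if value > limit * BLOCK_FACTOR:
            return "block", None            # far out of bounds: refuse the command
        if value > limit:
            sign = 1.0 if command[key] >= 0 else -1.0
            gated[key] = sign * limit       # mildly out of bounds: clamp to the limit
            action = "limit"
    return action, gated


gating_log = [gate({"speed": 0.5, "torque": 10.0}),
              gate({"speed": -1.2}),
              gate({"speed": 2.0})]
```

Each returned tuple is the kind of gating processing result that would be recorded alongside the command.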
[0047] As a preferred implementation method,
[0048] In a second aspect, the present invention provides a production control system based on a large-scale intelligent agent, comprising:
[0049] Data acquisition and processing module: used to acquire multi-source data and preprocess it to obtain a time-series state stream;
[0050] High-level intent generation module: used to input the time-series state stream into a pre-trained large model agent to obtain a set of candidate high-level intents;
[0051] Command parsing module: used to parse the candidate high-level intent set to obtain low-level command sequences;
[0052] Simulation and scoring module: used to perform fast simulation on the low-level command sequence to obtain short-time physical state trajectory; and to perform comprehensive scoring on the short-time physical state trajectory to obtain scoring result;
[0053] Formal verification module: used to apply a formal verifier to judge the scoring results and obtain a compliance judgment result;
[0054] Hardware Sandbox and Execution Module: Used to determine the control command to be issued based on the compliance judgment result; and to issue the control command to be issued to the hardware sandbox for gating processing to obtain the gating processing result; also used to open the shadow channel when the new policy is deployed, perform simulation and verification on the shadow policy and continuously compare its output difference with the current policy, and execute policy switching and generate switching record when the preset switching threshold is met.
[0055] In a third aspect, this application provides an electronic device, comprising:
[0056] A processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke instructions stored in the memory, which, when executed by the processor, cause the electronic device to perform the method disclosed in the first aspect above.
[0057] In a fourth aspect, this application provides a computer-readable storage medium storing computer program instructions that, when executed by a processor, cause the method disclosed in the first aspect to be implemented.
[0058] Compared with the prior art, this application has the following beneficial effects:
[0059] 1. In existing technologies, the output of large models directly enters the execution chain, lacking provable security boundaries. This invention connects three stages: rapid simulation using short-track digital twins, quantitative scoring using a five-dimensional risk function, and stepwise deterministic verification using a formal verifier. This ensures that each candidate command undergoes physical state prediction and constraint compliance verification before issuance, generating an evidence-based pass set with accompanying simulation trajectories and verification logs. Thus, a secure verification closed loop with a traceable chain of evidence is constructed between large model inference and hardware execution.
[0060] 2. Existing software-level security protections are susceptible to being bypassed. This invention deploys a hardware sandbox at the execution link end, consisting of an independent microcontroller unit connected in series with a safety relay. This sandbox independently performs item-by-item gating checks on speed, acceleration, torque, and region boundaries at the hardware level. If limits are exceeded, it directly limits the amplitude or performs an emergency disconnection. This process does not rely on upper-layer software triggering and is physically unbypassable. Simultaneously, token bucket rate limiting constrains the command frame transmission frequency and bus bandwidth at the hardware level, thus providing a final physical guarantee for execution security independent of the software state, in addition to upper-layer software verification.
Attached Figure Description
[0061] Figure 1 is a schematic flowchart of the production control method based on a large model intelligent agent provided in an embodiment of the present invention.
[0062] Figure 2 is a schematic framework diagram of a production control system based on a large-scale intelligent agent, provided in an embodiment of the present invention.
[0063] Figure 3 is a comparison chart of the technical effects of the present invention, in which the black bars represent the present invention and the gray bars represent the prior art.
Detailed Implementation
[0064] To make the technical means, creative features, and achieved objectives and effects of this invention easier to understand, the invention is further described below with reference to specific embodiments. However, the following embodiments are merely preferred embodiments of this invention and not all of them. Other embodiments obtained by those skilled in the art based on the embodiments described herein without creative effort are all within the protection scope of this invention. Unless otherwise specified, the experimental methods in the following embodiments are conventional methods, and the materials and reagents used in the following embodiments are commercially available unless otherwise specified.
[0065] Example 1: With reference to Figure 1, this example provides a production control method based on a large-scale intelligent agent. The specific implementation steps are as follows:
[0066] Acquire multi-source data and preprocess it to obtain a time-series state stream: By collecting multi-source data and preprocessing the collected multi-source data, a time-series state stream is generated;
[0067] Specifically, the device controller, sensors, and robot control cabinet are subscribed to through the OPC Unified Architecture (OPC UA). At the edge gateway, timestamps are written at a fixed frequency of 90 Hz and data is cached so that transmission can resume after an interruption, and the Precision Time Protocol (PTP) is used to correct for delays, yielding a raw data stream containing position, speed, acceleration, torque, task, and alarm data. The device controller receives upper-level control commands and drives the corresponding production equipment to act according to the specified parameters. At the same time, it collects the position, speed, acceleration, torque, and operating status data of the equipment in real time and reports them to the edge gateway through the OPC UA interface, serving as the direct source of the equipment operating status data in the time-series state stream.
[0068] The sensors include: position sensors, speed sensors, torque sensors, distance sensors, and environmental sensing sensors, which are used to detect the equipment motion status, workpiece position, safety distance, and environmental parameters in the production site in real time, and report the detection data to the edge gateway at a fixed sampling frequency, serving as the source of environmental sensing data and safety-related data in the time-series state stream;
[0069] The robot control cabinet is used to control and monitor the motion trajectory, joint status and end effector action of industrial robots. It outputs the pose, speed, torque and alarm status of each joint of the robot in real time, and reports to the edge gateway through the OPC UA interface, serving as the source of robot running status data and alarm status data in the time-series state stream.
[0070] The raw data stream refers to the data set consisting of equipment operation status data, task status data, and alarm status data collected by the equipment controller, sensors, and robot control cabinet. It characterizes the original operating state of the production site at the time of data collection and serves as the data basis for subsequent preprocessing to generate the time-series state stream.
Preprocessing of the raw data stream includes unified time synchronization, cleaning, and resampling on the edge side: PTP time synchronization corrects transmission delays, the data is resampled at fixed intervals, missing values are filled by linear interpolation, and noise is suppressed by median filtering.
Key fields are then retained as follows. First, all fields in the preprocessed raw data stream within the current sampling period are read, and each field is marked with a priority according to a pre-configured resource importance level table. Second, the available bandwidth and the total number of fields within the current period are calculated. When the total number of fields is within the bandwidth limit for the sampling period, all data is retained without truncation. When the total number of fields exceeds the bandwidth limit, fields are written to the time-series state stream in descending order of priority until the bandwidth limit is reached, and the remaining low-priority fields are discarded within the same period. When fields of the same priority conflict, fields with a higher timestamp update frequency are retained first; if the frequencies are the same, fields with smaller device IDs are retained first, which makes the retention result deterministic. After field filtering is completed, the retained fields are merged and aligned by device ID and timestamp to generate the time-series state stream Flu.
Through the above process, the position, speed, and torque data of bottleneck equipment on the production line, the occupancy status data of shared fixtures, and the safety distance and alarm data of safety-related sensors are retained with priority in each sampling period, providing complete key state inputs for subsequent large model inference.
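The cleaning steps named above (linear interpolation of missing values followed by median filtering) can be sketched in plain Python. The window size of 3, the use of `None` to mark missing samples, and the sample values are assumptions for this illustration.

```python
from statistics import median


def interpolate_missing(values):
    """Fill None gaps by linear interpolation between the nearest known neighbours.

    Leading or trailing None values have no neighbour on one side and are left as-is.
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = out[a] + (out[b] - out[a]) * (i - a) / (b - a)
    return out


def median_filter(values, k=3):
    """Sliding median with odd window k; the edges use a truncated window."""
    h = k // 2
    return [median(values[max(0, i - h): i + h + 1]) for i in range(len(values))]


# Hypothetical resampled torque channel with gaps and one spike (50.0).
raw = [1.0, None, 3.0, 50.0, 5.0, None, None, 8.0]
filled = interpolate_missing(raw)
clean = median_filter(filled)
```

Interpolating before filtering matters here: the median filter needs a complete series, and it then removes the isolated spike without smearing it into neighbouring samples.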
[0071] The key resource priority rule describes a sorting rule for classifying and prioritizing multi-source data fields according to their resource importance level within each sampling period. Specifically, data fields corresponding to bottleneck equipment, shared fixtures, key workstations, and safety-related sensors are marked as having the highest priority; data fields corresponding to non-bottleneck equipment, auxiliary sensors, and environmental monitoring sensors are marked as having the second highest priority; and redundant status fields and low-frequency update fields are marked as having the lowest priority. In the event of field conflicts or bandwidth constraints, fields are retained in descending order of priority, and low-priority fields are truncated or discarded when resources are insufficient. The key resource priority rule ensures that bottleneck equipment and safety-related data are preferentially written into the time-series state stream within each sampling period, serving as the core state input for subsequent large-scale model inference and simulation scoring.
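The retention rule, including its two tie-breakers, maps naturally onto a composite sort key. In this sketch the bandwidth limit is expressed as a field count, priority 0 is the highest level, and the `Field` structure and sample fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    device_id: int
    priority: int      # 0 = highest priority (assumed encoding)
    update_hz: float   # timestamp update frequency


def retain_fields(fields, capacity):
    """Keep at most `capacity` fields, highest priority first.

    Ties within a priority level: higher update frequency first,
    then smaller device ID, so the result is deterministic.
    """
    ordered = sorted(fields, key=lambda f: (f.priority, -f.update_hz, f.device_id))
    return ordered[:capacity]


fields = [
    Field("env_temp", 7, 1, 1.0),
    Field("robot_torque", 2, 0, 90.0),
    Field("fixture_busy", 5, 0, 90.0),
    Field("spare_status", 3, 2, 0.1),
    Field("safety_distance", 1, 0, 90.0),
]
kept = [f.name for f in retain_fields(fields, 3)]
```

With room for three fields, only the highest-priority fields survive, ordered by device ID because their update frequencies are equal.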
[0072] The multi-source data refers to the data set consisting of equipment operation data, task scheduling data, environmental perception data, and alarm status data collected by equipment controllers, sensors, robot control cabinets, manufacturing execution systems, and upper-level monitoring systems. It is used to jointly characterize the current equipment status, task status, environmental status, and abnormal status of the production site, and serves as the data foundation for constructing a time-series state flow.
[0073] The aforementioned temporal state stream refers to a continuous temporal data set formed by performing time correction, resampling, missing data completion, anomaly removal, and field merging on the multi-source data based on timestamps and device identifiers. It is used to convert the dispersed multi-source data into a unified state expression that can be directly input into the pre-trained large model agent, serving as the input basis for candidate high-level intent generation.
[0074] The time-series state stream is input into a pre-trained large model agent to obtain a candidate high-level intent set.
[0075] Specifically, four types of features are extracted from the time-series state stream Flu within a fixed sliding window W = 2 s: equipment occupancy, task queue length, lower limit of safe distance, and alarm count. An instruction template containing objectives, hard constraints, and prohibition conditions is constructed and injected with the current process parameters.
Equipment occupancy characterizes the occupied state and remaining availability of each piece of equipment within the current sliding window and reflects resource competition on the production floor; the large model agent uses it to determine the direction of resource scheduling and the priority of task allocation when generating candidate high-level intents.
Task queue length characterizes the backlog of tasks to be executed on each piece of equipment and reflects the load distribution at each workstation; the agent uses it to identify production line bottlenecks and to judge the feasibility of task transfer and parallel deployment.
The lower limit of safe distance characterizes the minimum safe distance between pieces of equipment and between equipment and personnel within the current sliding window and reflects the real-time collision risk on the production floor; the agent uses it to apply safety constraints during intent generation and to avoid generating scheduling intents with collision risks.
Alarm count characterizes the cumulative number of alarm events triggered by each device and system within the current sliding window and reflects the frequency of anomalies and the health of equipment on the production site; the agent uses it to identify anomaly risks and to avoid issuing high-load scheduling intents to devices with frequent alarms.
The process parameters are the set of parameters pre-set by the production process specifications that describe the execution conditions and quality requirements of the current production task, including at least cycle time, tooling specifications, recipe number, welding current, welding duration, and temperature range. They are read from the manufacturing execution system at the beginning of each decision cycle, injected into the process parameter field of the instruction template, and concatenated with the state feature input as the process constraint part of the inference input sequence, so that the pre-trained large model agent perceives the current product model and process boundary during inference and generates candidate high-level intents that match the current production task;
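A minimal sketch of the sliding-window feature extraction for a single device follows. The sample layout (`t`, `busy`, `queue_len`, `distance`, `alarm`) and the exact feature definitions (occupancy as the busy fraction, queue length as the latest value) are assumptions; the source names the four features but not how each is computed.

```python
def window_features(samples, now, window_s=2.0):
    """Compute the four features over samples with now - window_s < t <= now.

    samples: time-ordered list of dicts with keys
             t (s), busy (bool), queue_len (int), distance (m), alarm (bool).
    """
    recent = [s for s in samples if now - window_s < s["t"] <= now]
    if not recent:
        return None
    return {
        "occupancy": sum(s["busy"] for s in recent) / len(recent),  # busy fraction
        "queue_len": recent[-1]["queue_len"],                       # latest backlog
        "min_safe_distance": min(s["distance"] for s in recent),    # lower limit
        "alarm_count": sum(s["alarm"] for s in recent),
    }


samples = [
    {"t": 0.5, "busy": True, "queue_len": 3, "distance": 1.2, "alarm": False},
    {"t": 1.5, "busy": True, "queue_len": 4, "distance": 0.9, "alarm": True},
    {"t": 2.5, "busy": False, "queue_len": 4, "distance": 1.1, "alarm": False},
    {"t": 3.0, "busy": True, "queue_len": 5, "distance": 0.8, "alarm": False},
]
features = window_features(samples, now=3.0)
```

The sample at t = 0.5 s falls outside the 2 s window ending at t = 3.0 s, so only the last three samples contribute.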
[0076] The pre-trained large model agent is then invoked for inference. The equipment occupancy, task queue length, lower limit of safe distance, and alarm count extracted within the fixed sliding window are concatenated to form the state feature input. The target field, hard constraint field, prohibition condition field, and current process parameter field are read from the instruction template and concatenated with the state feature input in a preset field order to form the inference input sequence. The inference input sequence is input into the pre-trained large model agent, whose decoder autoregressively outputs intent markers, resource scheduling markers, execution order markers, and time trigger markers step by step; decoding stops when the end marker is generated or the preset output length is reached, producing the candidate high-level intents. The original output score of each intent is log-likelihood normalized, and the constraint violation items corresponding to the intent are read and summed with preset penalty weights to obtain the constraint violation penalty value. The constraint violation penalty value is subtracted from the normalized log-likelihood value to obtain the confidence score of each intent. The intents are sorted from high to low by confidence score, and those with confidence scores above a preset threshold or within a preset retention range are retained, yielding a set of candidate high-level intents with confidence scores, together with their triggering conditions and effective times.
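The confidence calculation can be sketched as follows. The normalization used here is a softmax over the candidates' log-likelihoods, which is an assumption since the source does not name the normalization; the intent names, penalty weights, and threshold are likewise hypothetical.

```python
import math


def confidences(intents, penalty_weights):
    """Return (name, confidence) pairs sorted from highest to lowest confidence.

    intents: list of dicts with 'name', 'loglik', and 'violations'
             (dict mapping constraint name -> violation magnitude).
    """
    m = max(i["loglik"] for i in intents)
    z = sum(math.exp(i["loglik"] - m) for i in intents)
    scored = []
    for i in intents:
        normalized = math.exp(i["loglik"] - m) / z  # assumed: softmax over candidates
        penalty = sum(penalty_weights[k] * v for k, v in i["violations"].items())
        scored.append((i["name"], normalized - penalty))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def select(scored, threshold):
    """Keep intents whose confidence exceeds the preset threshold."""
    return [name for name, conf in scored if conf > threshold]


intents = [
    {"name": "parallel_load", "loglik": -1.0, "violations": {}},
    {"name": "swap_fixture", "loglik": -1.2, "violations": {"safe_distance": 0.5}},
    {"name": "defer_task", "loglik": -2.5, "violations": {}},
]
scored = confidences(intents, {"safe_distance": 1.0})
kept = select(scored, threshold=0.05)
```

The second intent is nearly as likely as the first but is pushed below the threshold by its constraint violation penalty. A retention-range variant would instead slice the sorted list to a fixed length.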
[0077] The instruction template containing objectives, hard constraints, and prohibition conditions is constructed, and the current process parameters are injected into it, as follows: the production objectives, the hard constraints (such as lower safety distance limits, upper torque limits, restricted areas, and exclusive resource access), and the prohibition conditions are expressed in structured form, and the current process parameters (such as cycle time, tooling specifications, and recipe numbers) are injected, thereby obtaining a standardized instruction template that can be directly filled and verified by the pre-trained model.
[0078] The instruction template refers to a predefined structured text framework, which includes target fields, hard constraint fields, prohibition condition fields, process parameter fields, and action parameter fields. It is used to organize production targets, safety constraints, and process parameters in a pre-defined field order into a standardized input form that can be directly filled and verified by a pre-trained large model agent. It also serves as a structural template for parameter filling in the subsequent command parsing stage, and as an intermediate carrier for the conversion from candidate high-level intents to low-level command sequences.
[0079] The pre-trained large-scale intelligent agent refers to a generative decision-making model whose parameters have been trained on historical production control data, task scheduling corpora, and control command samples, and which is used during the online operation phase to generate a set of candidate high-level intentions from the time-series state stream. The pre-trained large-scale intelligent agent includes at least an input encoding layer, a context modeling layer, and an intention decoding layer; wherein the input encoding layer converts the time-series state stream and the instruction template into a unified vector representation, the context modeling layer extracts the state dependencies and constraint relationships between different time steps, and the intention decoding layer outputs the high-level control intention corresponding to the current state.
[0080] The training data for the pre-trained large-scale intelligent agent is constructed from historical production logs, equipment operation records, task scheduling records, and manually confirmed control and disposal records. Historical production logs include at least time-series data on equipment status, task status, environmental status, and alarm status. Equipment operation records include at least equipment action commands, execution order, and execution results. Task scheduling records include at least task allocation results, resource occupancy order, and task switching time. Manually confirmed control and disposal records include at least scheduling decisions, resource transfer decisions, shutdown decisions, or obstacle avoidance decisions confirmed by human engineers or existing control systems under the corresponding production state. The training samples are constructed as follows: continuous state segments are extracted from historical production logs using a fixed-length time window. The corresponding equipment status, task status, environmental status, and alarm status are used as state input samples. The manually confirmed control and disposal records or existing execution records corresponding to these state input samples are read, and the task allocation results, resource occupancy order, action trigger time, and control constraint combinations are used as target intent labels, thereby generating training sample pairs corresponding to the state input and high-level intent labels.
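The construction of training sample pairs in paragraph [0080] can be illustrated with a short sketch. The function name `build_training_pairs`, the stride of one step, and the choice to align each label with the last time step of its window are assumptions made for illustration; the text only states that fixed-length windows are used and that the corresponding confirmed records are read.

```python
def build_training_pairs(log, labels, window):
    """Pair fixed-length state segments with confirmed intent labels.

    log:    list of per-time-step state records (equipment, task, environment, alarm status)
    labels: dict mapping a time-step index to its confirmed intent label
    window: fixed window length in time steps
    """
    pairs = []
    for end in range(window, len(log) + 1):
        label = labels.get(end - 1)       # assumed alignment: label at the window's last step
        if label is None:
            continue                      # no confirmed record for this segment, skip it
        pairs.append((log[end - window:end], label))
    return pairs
```

Segments without a manually confirmed or executed record are skipped here, so every retained pair has a target intent label.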
[0081] When training the pre-trained large model agent, firstly, discretization, numerical normalization, and identifier mapping are performed on each field in the training samples to obtain standardized training samples; then, the standardized training samples are input into the encoding layer for vectorization representation; subsequently, the vectorization result is input into the context modeling layer to extract time-related features and constraint-related features; then, the intent decoding layer outputs the high-level intent prediction result; the high-level intent prediction result is compared with the target intent label, and the training loss is calculated; the model parameters are updated using gradient backpropagation based on the training loss; when the validation set loss no longer decreases within a consecutive preset number of rounds, training is stopped and the model parameters are fixed, resulting in the pre-trained large model agent. The training loss includes at least intent type classification loss, resource allocation loss, execution order loss, and time triggering loss.
[0082] After training is completed, the converged model parameters are solidified and deployed on edge servers or industrial control servers. During the online operation phase, the model parameters are read but not updated. The real-time generated time-series state stream is concatenated with the current instruction template and input into the pre-trained large model agent. The pre-trained large model agent outputs multiple candidate high-level intentions and determines the set of candidate high-level intentions according to the output probability value or log-likelihood value.
[0083] The aforementioned candidate high-level intents are generated by performing inference calculations on the time-series state stream Flu and the instruction template, resulting in a set of candidate high-level intents containing task allocation, resource occupancy order, and time trigger points, which are used as policy inputs for subsequent decoding and verification.
[0084] The candidate high-level intent set is parsed to obtain the low-level command sequence;
[0085] The aforementioned candidate high-level intent set refers to an ordered set of multiple candidate control intents output by a pre-trained large model agent after inference based on the current temporal state flow and instruction template. Each candidate high-level intent includes at least the task allocation direction, resource occupation order, time trigger point, and confidence score. The intents in the set are arranged from high to low confidence scores and are accompanied by corresponding triggering conditions and effective times. The candidate high-level intent set is used to represent the various candidate judgments of the large model agent for subsequent scheduling decisions in the current production state, and serves as the input source for the command parsing module to perform key-value matching and rule binding.
[0086] Specifically, the parser performs key-value matching and rule binding between the candidate high-level intent set and the external benchmark resources to obtain the action primitive sequence corresponding to each intent. Resource binding and mutex lock allocation are then performed on the action primitive sequence to obtain an instruction template parameter set carrying the device ID, fixture ID, and resource occupancy identifier.
[0087] By filling the instruction template parameter set into the instruction template, a parameterized low-level command sequence is generated. The low-level command sequence refers to the command set formed after parsing the candidate high-level intent set, and a parameterized low-level execution command refers to a command entry in that sequence whose field values have been determined by filling the instruction template parameter set into the instruction template. Resource binding and mutex lock allocation are applied to the action primitive sequence; the time offset is calculated based on the lock table and the expected start time, and the start time is written back to generate an instruction set carrying resource tokens. Synchronization barriers and predecessor-successor annotations are then applied to parallel actions: a synchronization barrier identifier is written at each synchronization point and a predecessor-successor sequence is generated, so that the barrier is released only after the preceding commands have completed and submitted their completion status, resulting in an executable action timeline with clear synchronization points and execution order. The synchronization barrier refers to the synchronization control identifier written at the convergence node of parallel actions. It constrains the execution boundary of multiple parallel commands by stipulating that the barrier is released, and the subsequent command allowed to start, only after all predecessor commands have completed and submitted their completion status. This ensures the timing consistency of multi-device collaborative actions and serves as the implementation mechanism for synchronization points in the executable action timeline.
Based on this executable action timeline, out-of-bounds pre-checks and corrections are performed on pose, velocity, torque, and region boundaries to complete resource scheduling, timing coordination, and constraint correction for the low-level command sequence. Resource scheduling refers to applying exclusive token allocation and mutex lock management to the resource requests and time intervals of each command in the low-level command sequence, so that the same resource is not concurrently occupied by multiple commands within the same time period. Timing coordination refers to applying synchronization barriers and sequence annotations to parallel actions in the low-level command sequence, stipulating that a barrier is released only after the preceding commands have completed and submitted their completion status, which ensures the consistency of execution order and timing across multi-device collaborative actions. Constraint correction refers to performing out-of-bounds pre-checks on the pose, velocity, torque, and region boundaries of each command in the low-level command sequence according to the executable action timeline, and sequentially applying numerical limiting and path convergence to the out-of-limit fields so that every command field meets the hard constraints.
[0088] The executable action timeline refers to the action execution sequence with time constraints formed by arranging each low-level command in an orderly manner according to the predecessor-successor relationship and time offset after completing resource token allocation, mutex lock management and synchronization barrier labeling. It includes at least the start time, resource token identifier, synchronization barrier identifier and predecessor-successor dependency relationship of each command, which is used to clarify the execution order and coordination relationship of each command, and serve as the input basis for out-of-bounds pre-detection and correction and subsequent rapid simulation.
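How a lock table, expected start times, and predecessor dependencies could combine into an executable action timeline is sketched below. The `Command` structure, the token format, and the rule that a start time is the latest of the expected start, the lock release time, and the completion of all predecessors are illustrative assumptions consistent with paragraphs [0087] and [0088], not the patented procedure itself.

```python
from dataclasses import dataclass, field

@dataclass
class Command:
    cmd_id: str
    resource: str                 # exclusive resource locked by this command
    duration: float
    expected_start: float
    predecessors: list = field(default_factory=list)  # ids that must complete first

def build_timeline(commands):
    """Commands are assumed to be listed with predecessors before their followers."""
    lock_free_at = {}             # lock table: resource -> time its lock is released
    end_time = {}                 # completion time of every scheduled command
    timeline = []
    for cmd in commands:
        # Synchronization barrier: wait until all predecessors have completed.
        barrier = max((end_time[p] for p in cmd.predecessors), default=0.0)
        start = max(cmd.expected_start, lock_free_at.get(cmd.resource, 0.0), barrier)
        end = start + cmd.duration
        lock_free_at[cmd.resource] = end
        end_time[cmd.cmd_id] = end
        timeline.append({
            "id": cmd.cmd_id,
            "start": start,
            "end": end,
            "offset": start - cmd.expected_start,          # time offset written back
            "token": f"{cmd.resource}#{len(timeline)}",    # hypothetical resource token
        })
    return timeline
```

For example, a weld command that depends on a pick and a clamp and shares the robot with the pick starts only once both predecessors have finished and the robot lock has been released.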
[0089] The action primitive sequence refers to the ordered set of atomic action entries obtained after key-value matching and rule binding for each candidate high-level intent. Each primitive contains at least action type, device identifier, target workstation, resource type and time trigger information. It is used to decompose the abstract high-level intent into the smallest execution unit that can be further executed for resource binding and parameter filling, and serves as the input basis for generating the instruction template parameter set.
[0090] The resource binding refers to the parser establishing a one-to-one correspondence between each action step and the corresponding device resources, tooling resources and execution resources based on the device identifier, fixture identifier and resource type identifier in the action primitive sequence. This is used to clarify the target resources occupied by each low-level command during execution, serving as the resource basis for subsequent mutex lock allocation and timeline generation.
[0091] The mutex lock allocation refers to the parser generating, for each action step that has completed resource binding, a corresponding lock identifier for every resource that cannot be concurrently occupied within the same time period, and writing the lock identifiers into the corresponding command entries. This prevents multiple actions from occupying the same resource at the same time, avoids resource conflicts, and serves as the constraint basis for time offset calculation.
[0092] The parameterized low-level execution command refers to a low-level command entry formed after filling the instruction template parameter set into the instruction template, in which each command field has a definite parameter value. It includes at least pose target, path point, fixture action, duration, speed limit, acceleration limit, and torque limit, and is used to convert candidate high-level intentions into specific command content on which resource scheduling, time binding, and constraint correction can subsequently be performed.
[0093] The parser performs key-value matching and rule binding between the candidate high-level intent set and external benchmark resources. Specifically, it reads the generated device identifier field, action type field, target workstation field, resource type field, and time trigger field from each candidate high-level intent in a preset field order, and writes the read field values into the key extraction buffer in sequence to generate a key set corresponding to each candidate high-level intent. The key set refers to the combination of fields consisting of device identifier, action type, target workstation, resource type, and time trigger information, which is used for subsequent resource matching and rule binding.
[0094] The device identifier field is read from each device record in the device capability table, and the value of the device identifier field is written into the primary key field as a unique identifier. Then, the storage location of the device record is registered in the index area of the device capability table using the primary key field value as an index item, thus constructing the primary key and index of the device capability table. The primary key is the device identifier, and the index is used to directly locate the corresponding device record based on the device identifier.
[0095] By reading the action type field, target workstation field, and resource type field of each primitive record in the control primitive library, and concatenating the values of the action type field, target workstation field, and resource type field in a preset concatenation order, a composite key corresponding to the control primitive library is generated; wherein, the composite key is a combination key composed of action type, target workstation, and resource type, which is used to uniquely identify the action primitive record corresponding to the current candidate high-level intention.
[0096] By reading the device identifier field, action type field, resource type field, and region type field of each template record in the constraint template library, and concatenating the values of the device identifier field, action type field, resource type field, and region type field according to a preset field order, a composite key corresponding to the constraint template library is generated. The composite key is a combination key composed of device identifier, action type, resource type, and region type, which is used to uniquely identify the constraint template record that matches the current candidate high-level intent.
[0097] A hash index is used for constant-time retrieval. Specifically, the hash calculation is first performed on the primary key and each composite key to generate a corresponding hash address. Then, the hash address is used as the retrieval entry point to read the corresponding record position in the device capability table index area, the control primitive library index area, and the constraint template library index area. When the target record position is read, the original record corresponding to that position is further read to obtain the corresponding resource entries. The resource entries refer to the device capability record, control primitive record, and constraint template record that match the current candidate high-level intent.
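The primary-key and composite-key lookups of paragraphs [0094] to [0097] map naturally onto hash tables. The sketch below uses Python dictionaries, whose lookups take constant time on average. All table contents are hypothetical, and the key set is assumed to also carry a region type, which the constraint template key requires but the key set definition in the text does not list.

```python
# All table contents below are hypothetical illustration data.
DEVICE_CAPABILITY = {        # primary key: device identifier
    "R1": {"speed_max": 1.5, "accel_max": 3.0, "torque_max": 40.0, "type": "robot"},
}
CONTROL_PRIMITIVES = {       # composite key: (action type, target workstation, resource type)
    ("move", "WS3", "arm"): {"waypoints": [(0.0, 0.0), (0.4, 0.2)], "duration": 2.0},
}
CONSTRAINT_TEMPLATES = {     # composite key: (device id, action type, resource type, region type)
    ("R1", "move", "arm", "shared"): {"min_safety_distance": 0.5, "exclusive": True},
}

def resolve(key_set):
    """Return (entries, missing): matched records and the tables that had no match."""
    entries = {
        "device": DEVICE_CAPABILITY.get(key_set["device_id"]),
        "primitive": CONTROL_PRIMITIVES.get(
            (key_set["action"], key_set["workstation"], key_set["resource_type"])),
        "constraint": CONSTRAINT_TEMPLATES.get(
            (key_set["device_id"], key_set["action"],
             key_set["resource_type"], key_set["region_type"])),
    }
    missing = [name for name, record in entries.items() if record is None]
    return entries, missing
```

Tuples are used as composite keys so that the field order is fixed by construction, which mirrors the preset concatenation order required by the text.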
[0098] The key set is matched with the primary key and index for equality matching. Specifically, the device identifier field value is first read from the key set, and then the device identifier field value is compared with each primary key field value in the device capability table. When the two are completely consistent, the index position corresponding to the primary key field value is read, and the corresponding device record is read from the index position to obtain the device capability record. The device capability record refers to the data record used to describe the range of actions and operating boundaries of the target device, including at least the speed limit, acceleration limit, torque limit, travel range and device type, which is used to provide capability boundary basis for subsequent instruction template parameter filling and out-of-bounds pre-detection.
[0099] The key set and composite keys are matched for equality. Specifically, the action type field, target workstation field, resource type field, and equipment identifier field are read from the key set. Then, the fields are concatenated according to the same field order as the composite keys in the control primitive library and the constraint template library to generate a combination value to be matched. The combination value to be matched is then compared item by item with each composite key already constructed in the control primitive library and the constraint template library. When the comparison results are completely consistent, the corresponding primitive record and template record are read to obtain the hard constraint set. The hard constraint set refers to the set of constraint records used to limit the execution boundary of the current candidate high-level intent, including at least the lower limit of safety distance, the area prohibition boundary, the resource exclusivity requirement, the synchronization point constraint, and the upper limit of equipment operation, which are used for subsequent constraint injection, timeline construction, and formal verification.
[0100] Rule binding is performed on the matched device capability records, resource entries, and hard constraint sets, specifically:
[0101] Rule R1 capability filling refers to the parser reading the upper limit of speed, upper limit of acceleration, upper limit of torque and travel range in the device capability record, and writing the corresponding parameter bits according to the predefined field correspondence in the instruction template, so that the current command has the same operating boundary as the target device;
[0102] Rule R2 primitive expansion refers to the parser reading the pose target, path point, fixture action and duration from the control primitive record obtained by matching the key set, and writing it into the corresponding fields according to the action parameter order of the instruction template, thereby converting the candidate high-level intent into executable action parameter content.
[0103] Rule R3 constraint injection refers to the parser reading the lower limit of safe distance, the restricted area boundary, the resource exclusive identifier and the synchronization point from the constraint template record and writing them into the constraint field of the instruction template, so that the generated command carries the safe boundary and coordination restriction at the formation stage.
[0104] Rule R4 time binding means that the parser uses the start time of the candidate high-level intent as the time base, and combines the end time of the preceding action and the synchronization point constraint to determine the start time of the current action. At the same time, it records the dependencies between actions, thereby providing a time basis for the generation of the timeline of subsequent executable actions.
[0105] When performing consistency checks on the filled instruction template, it is first checked whether any fields are missing; then whether the numerical range, data type, and unit of each field are consistent with the template definition; and further whether there are any conflicts between the action parameters and the constraint parameters. When the check passes, a structured field set is output according to the field order of the instruction template to generate a structured low-level execution command. When the check fails, the missing key list or conflict key list is returned, and matching and binding are re-executed on device entries of the same type in a fixed priority order until an executable command and timeline entry are obtained.
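The three-stage consistency check of paragraph [0105] can be sketched as below. The template specification format `(type, min, max)`, the field names, and the single action-versus-constraint rule (commanded speed must not exceed the speed limit) are assumptions chosen for illustration; unit checking is omitted for brevity.

```python
def check_command(cmd, spec):
    """Return (missing, conflicts) for a filled template.

    cmd:  dict of filled field values
    spec: dict field -> (expected type, lower bound, upper bound), a hypothetical
          stand-in for the template definition
    """
    # Stage 1: missing fields.
    missing = [key for key in spec if key not in cmd]
    # Stage 2: data type and numerical range against the template definition.
    conflicts = []
    for key, (typ, lo, hi) in spec.items():
        if key in cmd and (not isinstance(cmd[key], typ) or not lo <= cmd[key] <= hi):
            conflicts.append(key)
    # Stage 3: action parameter versus constraint parameter (assumed example rule).
    unusable = set(missing + conflicts)
    if not {"speed", "speed_limit"} & unusable and cmd["speed"] > cmd["speed_limit"]:
        conflicts.append("speed>speed_limit")
    return missing, conflicts
```

An empty pair of lists means the command passes; otherwise the returned lists play the role of the missing key list and conflict key list that trigger re-matching on another device entry of the same type.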
[0106] The external benchmark resources include a device capability table, a control primitive library, and a constraint template library. The device capability table provides the operational capability boundaries of the target device, the control primitive library provides the action parameter content, and the constraint template library provides safety constraints and cooperative constraints.
[0107] The low-level command sequence refers to an ordered set of low-level command entries generated after key-value matching, rule binding, and instruction template parameter filling of the candidate high-level intent set. Each command has definite parameter values for each command field. Each command includes at least device ID, pose target, path point, fixture action, duration, speed limit, acceleration limit, torque limit, resource occupation identifier, and synchronization point information. It also undergoes boundary pre-checking and correction to ensure that the values of each field meet the hard constraints. The low-level command sequence is used to convert the candidate high-level intent into specific execution content that can be directly input for rapid simulation and formal verification, serving as the direct operation object for simulation scoring and subsequent command issuance.
[0108] Specifically, the capability filling is used to write device operation boundary parameters, the primitive expansion is used to write action execution parameters, the constraint injection is used to write safety and cooperation restriction parameters, and the time binding is used to determine the action start time and action dependencies.
[0109] In the aforementioned boundary pre-detection and correction process, the parser compares the execution path region boundary of each low-level command with the travel and speed upper limits in the device capability table and with the restricted area of the constraint template, and makes a judgment. When a boundary violation is detected, numerical limiting (for example, truncating the command value to the corresponding upper or lower limit) and path convergence (for example, bringing an out-of-bounds path point back inside the safe boundary) are applied in sequence, thereby obtaining a corrected command that meets the hard constraints and can be passed directly to simulation and verification.
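Numerical limiting and path convergence can be illustrated as follows. The safe region is assumed to be an axis-aligned rectangle and path convergence is assumed to be per-coordinate clamping of out-of-bounds path points; the text does not fix either choice, and the function and field names are hypothetical.

```python
def clamp(value, lo, hi):
    """Truncate value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))

def correct_command(cmd, capability, safe_box):
    """Return (corrected command, list of fields that were changed)."""
    (xmin, xmax), (ymin, ymax) = safe_box
    fixed = dict(cmd)
    # Numerical limiting against the device capability record.
    fixed["speed"] = clamp(cmd["speed"], 0.0, capability["speed_max"])
    fixed["torque"] = clamp(cmd["torque"], 0.0, capability["torque_max"])
    # Path convergence: pull out-of-bounds path points back inside the safe boundary.
    fixed["waypoints"] = [(clamp(x, xmin, xmax), clamp(y, ymin, ymax))
                          for x, y in cmd["waypoints"]]
    changed = [key for key in fixed if fixed[key] != cmd[key]]
    return fixed, changed
```

The list of changed fields can be kept as part of the evidence attached to the corrected command.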
[0110] A fast simulation is performed on the low-level command sequence to obtain short-time physical state trajectories; the short-time physical state trajectories are then comprehensively scored to obtain a scoring result.
[0111] Specifically, based on the low-level command sequence, a short-track digital twin simulation of the low-level command sequence is performed on the edge side. The step size ΔWL, the number of steps WS, and the time window Tw are set, and starting from the initial state $x_0$ the recursion
$$x_{t+1} = F(x_t, c_t), \quad t = 0, \ldots, WS - 1$$
is applied to obtain the physical state sequence and control sequence for steps t = 0, ..., WS, which form the short-time physical state trajectory. Here $x_{t+1}$ is the next state vector, $x_t$ is the system state vector at step t, F is the state transition function, $c_t$ is the control input vector at step t, and $x_0$ is the initial state; the system state vector with the latest timestamp in the current time-series state stream Flu is taken as the starting state input for the short-track digital twin simulation. A risk function is constructed to calculate the key risk indicators of the physical state sequence and control sequence, obtain quantifiable risk measures, and screen out non-compliant candidates accordingly. This prevents dangerous actions and ensures execution safety before the command is issued. The short-time physical state trajectory refers to the continuous change of physical state formed within a preset short time window after the low-level command sequence is rapidly simulated. It characterizes the state evolution trend of the corresponding low-level command sequence before execution and serves as the trajectory basis for determining the executability, risk level, and scheduling value of the command in the subsequent comprehensive evaluation.
[0112] The physical state sequence refers to the ordered set of system state vectors corresponding to each simulation step t = 0, ..., WS within the preset time window Tw, obtained during short-track digital twin simulation by starting from the initial state and advancing recursively at a fixed step size ΔWL. Each system state vector contains at least the equipment position, velocity, acceleration, torque, and safety distance data for the current step. The sequence characterizes the evolution of the physical execution state of the low-level command sequence within the preset short time window and serves as the source of state data for calculating the safety margin score, quality score, and constraint penalty score.
[0113] The control sequence refers to the ordered set of control input vectors for each simulation step t = 0, ..., WS that corresponds step by step to the physical state sequence during the short-track digital twin simulation. The control input vector of each step contains at least the speed command, acceleration command, and torque command issued to each device in that step. The sequence characterizes the actual control input of the low-level command sequence at each step within the preset short time window and, together with the physical state sequence, serves as the source of input data for calculating each scoring item in the risk function and for the constraint violation verification performed by the formal verifier.
[0114] The aforementioned short-track digital twin simulation refers to a simulation method that uses short-track digital twin recursive calculation on the low-level command sequence at the edge side to complete the physical state prediction within a preset short time window with a fixed step size. Unlike offline long-term simulation, the fast simulation only covers a limited number of steps within the current decision window to meet the constraints of computational latency on the online decision link, and pre-verifies the execution security and feasibility before the command is issued, serving as a data source for comprehensive scoring and formal verification.
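The fixed-step recursion used by the short-track simulation can be sketched as a plain rollout loop. The transition function shown is a one-dimensional double integrator chosen purely for illustration; the actual digital twin model is not specified in the text, and the function names are hypothetical.

```python
def rollout(initial_state, controls, step, transition):
    """Apply next_state = transition(state, control, step) once per control input.

    Returns the physical state sequence, which has len(controls) + 1 entries.
    """
    states = [initial_state]
    for control in controls:
        states.append(transition(states[-1], control, step))
    return states

def double_integrator(state, accel, dt):
    """Illustrative transition: state = (position, velocity), control = acceleration."""
    pos, vel = state
    return (pos + vel * dt, vel + accel * dt)
```

Because only a limited number of steps are simulated, the cost of a rollout grows linearly with the number of steps, which is what keeps it usable inside an online decision window.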
[0115] The algorithmic formula for the risk function is defined as follows:
[0116] T1: taking the low-level command sequence Order and the current state as inputs, short-track digital twin simulation is performed, and the physical state sequence and control sequence within the future time window are obtained by iterating with step size ΔWL. The lower limit of the safety distance, taken over any device at any step, is calculated and normalized to generate the safety margin score PSafe, which is used for the executability determination of candidate commands and for the synthesis of the comprehensive score.
[0117] The safety margin score PSafe is a quantitative index obtained by normalizing the lower limit of the safety distance of any device in any step within the simulation window. The value ranges from 0 to 1. The higher the value, the more sufficient the safety distance between devices during the execution of the candidate command and the lower the risk of collision. The lower the value, the greater the risk of narrowing the safety distance or even violation. PSafe is used to determine the executability of the candidate command and serves as the safety dimension input for the comprehensive score synthesis.
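A possible normalization for PSafe is sketched below. The text only requires a value between 0 and 1 that grows with the safety distance; the linear mapping between the required lower limit `d_min` and a saturation distance `d_ref` is an assumption, as are both parameter names.

```python
def safety_margin_score(distances, d_min, d_ref):
    """Normalize the worst-case safety distance of a simulated trajectory to [0, 1].

    distances: one list per simulation step with the distance of every device pair
    d_min:     required lower limit of the safety distance (score 0 at or below it)
    d_ref:     distance at which the margin is treated as fully sufficient (score 1)
    """
    worst = min(min(step) for step in distances)   # over any device at any step
    return max(0.0, min(1.0, (worst - d_min) / (d_ref - d_min)))
```

A score of 0 then signals that the lower limit was reached or violated at some step, so such a candidate can be rejected before any weighting takes place.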
[0118] T2: the set All of newly executable time slots is obtained by screening the simulation trajectory within the simulation window through resource exclusivity constraints, synchronization barriers, and safety boundary gating. For each slot j, the parallel availability Useful is calculated from the resource token non-conflict and synchronization alignment results, and the beat improvement information gain Beat is calculated from the difference between the baseline beat and the predicted beat. A capacity gain score BE is then generated to measure the candidate's contribution to throughput improvement and to participate in the comprehensive score synthesis. The gain score is calculated as follows:
[0119] $$BE = \sum_{j \in All} \omega_j \cdot Useful_j \cdot Beat_j$$
[0120] where BE is the gain score, All is the set of time slots, j is the index of a slot in the time slot set, $Useful_j$ is the parallel availability of slot j, $Beat_j$ is the beat improvement information gain of slot j, $\omega_j$ is the weighting factor, and $\sum$ is the summation symbol;
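The capacity gain score can be sketched under two stated assumptions: that the score is a weighted sum over the time slots of the product of parallel availability and beat gain, and that the beat gain is the relative reduction of the predicted beat against the baseline beat, floored at zero. Neither form is confirmed by the text, and the field names are hypothetical.

```python
def beat_gain(baseline_beat, predicted_beat):
    """Assumed normalization: relative beat (cycle time) improvement, never negative."""
    return max(0.0, (baseline_beat - predicted_beat) / baseline_beat)

def capacity_gain(slots):
    """Weighted sum over newly executable time slots.

    Each slot is a dict with 'weight', 'useful' (parallel availability in [0, 1]),
    'baseline_beat' and 'predicted_beat'.
    """
    return sum(s["weight"] * s["useful"] * beat_gain(s["baseline_beat"], s["predicted_beat"])
               for s in slots)
```

A slot with zero parallel availability contributes nothing, however large its predicted beat improvement, which matches the gating role of resource tokens and synchronization alignment.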
[0121] T3: a quality score is generated by weighting and synthesizing the acceleration sequence, lower safety distance limit, torque sequence, and planning window set obtained from the simulation. The quality score is used for trajectory quality measurement within the constraints and for the comprehensive score synthesis. The weighted synthesis formula is as follows:
[0122] $$Qua_r = \alpha_1 D_r + \alpha_2 E_r + \alpha_3 G_r + \alpha_4 H_r$$
[0123] where $Qua_r$ is the quality score and r is the decision step index; $\alpha_1$ is the weighting coefficient of the trajectory smoothness score and $D_r$ is the trajectory smoothness score; $\alpha_2$ is the weighting coefficient of the boundary margin score and $E_r$ is the boundary margin score; $\alpha_3$ is the weighting coefficient of the load score and $G_r$ is the load score; $\alpha_4$ is the weighting coefficient of the plan overlap score and $H_r$ is the plan overlap score.
[0124] Based on the weighted synthesis formula, by normalizing the in-step time and energy consumption and performing synthesis weighting, an energy cost score Eng is generated for the synthesis of the comprehensive score.
[0125] By hinge aggregation of the signed violations of constraints such as speed limit, acceleration limit, torque limit, restricted area, resource exclusivity, and collision distance, a penalty score is obtained and used in the synthesis of the comprehensive score. The hinge aggregation formula is as follows:
[0126] $$Ser_r = \sum_{u \in U} Pun_u \cdot \max\bigl(0,\; Y_u(x_r, c_r)\bigr)$$
[0127] where $Ser_r$ is the penalty score for decision step r, r is the decision step index, $\sum$ is the summation symbol, u is the index of a constraint, U is the constraint set, $Pun_u$ is the weight of constraint u, $Y_u$ is the signed violation function of constraint u, $x_r$ is the system state vector at step r, and $c_r$ is the control input vector to be evaluated at step r;
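Hinge aggregation can be sketched as follows: each signed violation counts only when it is positive, and the positive parts are combined with the constraint weights. The three constraints, their sign convention (positive means violated), and all names below are illustrative assumptions.

```python
def signed_violations(state, control, limits):
    """Positive value = violated by that amount, negative value = remaining margin."""
    return {
        "speed": abs(control["speed"]) - limits["speed_max"],
        "torque": abs(control["torque"]) - limits["torque_max"],
        "collision_distance": limits["min_distance"] - state["min_distance"],
    }

def hinge_penalty(state, control, limits, weights):
    """Weighted sum of max(0, violation) over the constraint set."""
    violations = signed_violations(state, control, limits)
    return sum(weights[name] * max(0.0, value) for name, value in violations.items())
```

Because of the hinge, a command that satisfies every constraint receives a penalty of exactly zero regardless of how much margin it leaves, so the penalty never rewards over-conservative commands.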
[0128] T4, based on all scores from T1 to T3, generates a comprehensive score by weighting each score for determining the candidate command set. The algorithm formula for the comprehensive score is as follows:
[0129] ,
[0130] where S is the comprehensive score at decision step r, and r is the decision step index. The formula combines five terms, each with its own weighting coefficient: the safety margin score, the capacity gain score, the quality score, the energy cost score, and the penalty score; the last symbol denotes the constraint weights at decision step r;
[0131] the weighting coefficients are all ≥ 0 and sum to 1. The candidate command set is then judged: the candidate commands that meet the judgment are sorted from high to low by comprehensive score, and the first-ranked candidate command is selected. In this way the short-term physical state trajectory is comprehensively scored: for the five scoring items of safety margin, capacity gain, trajectory quality, energy consumption cost, and constraint penalty, the corresponding state quantities in the short-term physical state trajectory are quantified and assigned values, then synthesized according to the preset weights to generate the scoring result.
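The weighted synthesis and ranking can be sketched in Python as follows. The sign convention is an assumption: the sketch adds the safety margin, capacity gain, and quality terms and subtracts the energy cost and penalty terms, so that a higher comprehensive score is better. The weight values and the candidate scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Scores:
    safe: float      # safety margin score PSafe
    gain: float      # capacity gain score BE
    quality: float   # quality score
    energy: float    # energy cost score Eng
    penalty: float   # penalty score from hinge aggregation


# Hypothetical weights: all >= 0 and summing to 1, as the text requires.
WEIGHTS = {"safe": 0.30, "gain": 0.25, "quality": 0.20, "energy": 0.10, "penalty": 0.15}


def comprehensive_score(s: Scores, w: Dict[str, float] = WEIGHTS) -> float:
    """Weighted synthesis; subtracting cost and penalty is an assumption of this sketch."""
    assert all(v >= 0 for v in w.values()) and abs(sum(w.values()) - 1.0) < 1e-9
    return (w["safe"] * s.safe + w["gain"] * s.gain + w["quality"] * s.quality
            - w["energy"] * s.energy - w["penalty"] * s.penalty)


def rank(candidates: Dict[str, Scores]) -> List[str]:
    """Candidate names sorted from the highest to the lowest comprehensive score."""
    return sorted(candidates, key=lambda n: comprehensive_score(candidates[n]), reverse=True)


candidates = {
    "c1": Scores(safe=0.2, gain=0.5, quality=0.3, energy=0.4, penalty=1.0),
    "c2": Scores(safe=0.7, gain=0.9, quality=0.6, energy=0.5, penalty=0.0),
    "c3": Scores(safe=0.9, gain=0.5, quality=0.9, energy=0.3, penalty=0.0),
}
ranking = rank(candidates)
```

With these hypothetical numbers the candidate with a non-zero penalty ranks last, and the first entry of `ranking` is the one that would proceed to formal verification.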
[0132] After the candidate command set is judged through the comprehensive scoring stage with evidence, a subset of candidate commands whose comprehensive scores meet the preset threshold conditions are selected. Each candidate command is accompanied by its corresponding simulation trajectory, scores of each item and scoring synthesis process data as evidence, which is used to support the formal verifier to perform step-by-step verifiable compliance judgment on each candidate command and to serve as the source of input data for the formal verification stage.
[0133] The scoring results are the evaluation results generated after the short-time physical state trajectory is comprehensively scored according to preset scoring items and preset weights. They are derived from the output of the corresponding candidate command after assigning values to the safety margin, capacity gain, trajectory quality, energy consumption cost and constraint penalty within a preset short time window and then weighting and synthesizing them. The scoring results are used as the input for the formal verifier to perform compliance judgment, to characterize the degree of availability and priority of the current candidate command before execution, and to incorporate the scoring results into the screening process of subsequent control commands to be issued.
[0134] In one embodiment, the calculation of the risk function is illustrated by the following example:
[0135] Taking a welding workshop for automotive parts as an example, the workshop includes 3 welding robots (numbered R1, R2, and R3), 2 shared conveyor belts (numbered C1 and C2), and 1 set of shared fixtures (numbered F1). Within a certain decision cycle, the large model agent generates three candidate high-level intentions based on the current temporal state flow: Intention I1 (R1 takes priority to preempt F1 to complete the welding of the current workpiece), Intention I2 (R2 and R3 work in parallel and share the conveyor belt on C2), and Intention I3 (R1 and R2 complete two welding processes in sequence and then hand over to F1). After parsing the low-level command sequence of the above three candidate intentions, the process enters the rapid simulation and risk function calculation stage. The calculation process of the risk function is as follows:
[0136] Regarding the T1 safety margin calculation, when performing short-track digital twin simulation of the low-level command sequence corresponding to intention I1, the simulation results show that during the preemption of F1, the safe distance between R1 and R3, which is performing the closing maneuver, narrows to below the preset lower limit in the third step, resulting in a low safety margin score PSafe after the safety distance is normalized. In the simulation trajectory corresponding to intention I2, the motion paths of R2 and R3 are independent, and the safe distance remains above the preset lower limit throughout the entire process, resulting in a high safety margin score PSafe. In the simulation trajectory corresponding to intention I3, R1 and R2 are executed sequentially, with no risk of concurrent collision, resulting in the highest safety margin score PSafe among the three.
[0137] Regarding the T2 capacity gain calculation, when statistically analyzing the time slots within the simulation window, Intent I2, due to the parallel operation of R2 and R3, identified two new executable time slots in the simulation trajectory. Furthermore, the resource tokens of the two robot motion paths did not conflict, and the synchronization alignment results met the constraints, resulting in a high parallel availability (Useful). Compared to the baseline beat, Intent I2's predicted beat was shorter, the beat improvement information gain (Beat) was positive, and the capacity gain score (BE) was the highest among the three. Intent I1 and Intent I3, both containing resource waiting or serial waiting steps, had relatively low capacity gain scores (BE).
[0138] Regarding the T3 quality score calculation, Intent I3, owing to the serial execution of the two welding processes, has a smooth acceleration sequence, small torque fluctuations, sufficient safety margin, and no overlapping planning windows, resulting in superior trajectory smoothness, boundary margin, and load scores, and the highest quality score among the three. Intent I2, owing to the parallel high-speed movement of R2 and R3, has a steep acceleration sequence in some steps, which lowers its trajectory smoothness score. Intent I1 has the lowest quality score, because the preemptive action of R1 causes the local torque to exceed the rated range, which lowers its boundary margin.
[0139] Regarding the constraint penalty calculation, Intention I1 exhibited a safety distance violation in step 3, and its penalty score after hinge aggregation was the highest among the three; Intention I2 and Intention I3 did not exhibit any constraint violations throughout the entire simulation trajectory, and their penalty scores were both zero.
[0140] For the T4 comprehensive score synthesis, based on the above sub-item scores: Intent I2 has a significant advantage in capacity gain score but a lower safety margin score than Intent I3; Intent I3 is better than Intent I2 in both safety margin and quality score but has a lower capacity gain score; Intent I1 ranks last in the comprehensive score because of its high penalty score. Under the current weight configuration, Intent I3 ranks first in the comprehensive score and enters the formal verification stage, with its simulation trajectory and sub-item score data forming the evidence-bearing pass set. Intent I1, which ranks last and has a non-zero penalty score, is judged a non-compliant candidate and is screened out at the comprehensive scoring stage.
[0141] Through the risk function calculation described above, the system identifies the collision risk of intention I1 and excludes it before any command is issued. Between intention I2 and intention I3, the comprehensive score selects intention I3, which has better safety and trajectory quality, for subsequent formal verification. This achieves quantitative screening and sorting of candidate commands while ensuring execution safety.
[0142] The formal validator is applied to evaluate the scoring results to obtain a compliance determination result;
[0143] The formal validator refers to a verification module that performs step-by-step deterministic verification of candidate commands and their simulation trajectories based on a predefined set of constraints. It includes at least a constraint violation calculation unit, a threshold determination unit, and a repair generation unit. The constraint violation calculation unit is used to calculate signed violations according to the constraint set and aggregate penalty scores. The threshold determination unit is used to perform threshold comparison between the safety margin and the comprehensive score and generate a compliance or non-compliance conclusion. The repair generation unit is used to generate revised commands for non-compliant candidates according to repair priorities. After the comprehensive score is completed, the formal validator performs deterministic compliance verification on the evidence-based pass set, serving as the final security assurance step before command issuance.
[0144] Specifically, the evidence-bearing pass set and its simulated trajectories are verified step by step: the signed violation of each constraint is calculated according to the constraint set, and the positive parts are aggregated to obtain the penalty score. For the threshold judgment, a safety margin threshold PSafemin and a comprehensive score threshold Srmax are set, and a compliance conclusion is generated when the safety margin score, the comprehensive score, and the penalty score all satisfy their respective threshold conditions. For non-compliant candidates, corrected commands are generated according to the repair priority, that is, the order in which the formal validator attempts repair operations: time offset, velocity scaling, acceleration scaling, trajectory convergence. First, time offset repair shifts the start time of the command without changing its parameter values, to avoid resource conflicts or synchronization point violations. If the command is still non-compliant, velocity scaling repair proportionally reduces the velocity command to within the constraint range, to eliminate the velocity upper limit violation. If it is still non-compliant, acceleration scaling repair proportionally reduces the acceleration command to within the constraint range, to eliminate the acceleration upper limit violation. If it is still non-compliant, trajectory convergence repair (trajectory retraction) pulls path points that exceed the area boundary or fall below the lower limit of the safety distance back towards the safety boundary, to eliminate restricted-area or collision-distance violations.
The four repair operations are attempted in priority order. After each attempt, the revised command is re-simulated and re-verified, until either a compliance conclusion is generated or all repair operations have failed to bring the command to a compliant state. In the latter case, a rollback instruction is generated and a manual confirmation request is initiated. The repair priority is designed to try first the repair with the least impact on command parameters, so that the execution intent of the original command is preserved as far as possible while compliance is ensured. The resulting compliance judgment result, together with the corresponding commands and verification logs, is used for hardware gating and online control.
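The prioritized repair loop can be sketched in Python as follows. This is an illustrative sketch only: the `verify` callback stands in for the re-simulation and formal verification step, repairs are applied cumulatively (an assumption, since the text does not say whether each repair starts from the original command), and the command fields and limit values are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Command = Dict[str, float]

# Repair priority from the text, least intrusive first.
REPAIR_ORDER = ["time_offset", "velocity_scaling",
                "acceleration_scaling", "trajectory_retraction"]


def repair_command(command: Command,
                   repairs: Dict[str, Callable[[Command], Command]],
                   verify: Callable[[Command], bool]
                   ) -> Tuple[str, Optional[Command], List[Tuple[str, bool]]]:
    """Try each repair in priority order, re-verifying after every attempt."""
    log: List[Tuple[str, bool]] = []
    if verify(command):
        return "compliant", command, log
    current = command
    for name in REPAIR_ORDER:
        current = repairs[name](current)      # apply the next repair
        ok = verify(current)                  # stands in for re-simulation + verification
        log.append((name, ok))
        if ok:
            return "compliant_after_repair", current, log
    # All repairs failed: the caller issues a rollback and requests manual confirmation.
    return "non_compliant", None, log


# Hypothetical command, limits and repair functions (clamping simplifies "scaling").
def verify(cmd: Command) -> bool:
    return cmd["speed"] <= 1.5 and cmd["accel"] <= 2.0

repairs = {
    "time_offset": lambda c: {**c, "start": c["start"] + 0.1},
    "velocity_scaling": lambda c: {**c, "speed": min(c["speed"], 1.5)},
    "acceleration_scaling": lambda c: {**c, "accel": min(c["accel"], 2.0)},
    "trajectory_retraction": lambda c: dict(c),
}
status, fixed, log = repair_command({"start": 0.0, "speed": 2.0, "accel": 1.0},
                                    repairs, verify)
```

The returned log records which repairs were attempted and whether each one produced a compliant command, which corresponds to the verification log mentioned in the text.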
[0145] The comprehensive score Sr is a single scalar score obtained by weighting the safety margin score, capacity gain score, quality score, energy consumption cost score and penalty score at the current decision step r according to preset weights. The higher the value, the better the comprehensive performance of the candidate command in terms of safety, capacity contribution, trajectory quality and constraint compliance. Sr is used to sort the candidate command set that has passed the initial screening. The candidate command ranked first, along with its simulation trajectory and sub-scores, enters the formal verification stage.
[0146] The compliance determination result refers to the conclusion data output by the formal validator after performing step-by-step verification of each candidate with evidence through the pass set. It includes three states: compliant, compliant after repair, and non-compliant, as well as their corresponding candidate command entries and verification logs. Among them, the command entries in the compliant and compliant after repair states serve as the source of control commands to be issued, while the command entries in the non-compliant state are accompanied by rollback instructions and manual confirmation requests. The compliance determination result is used to provide deterministic security for execution before the command is issued, and serves as the input basis for hardware sandbox gating processing and policy online evaluation.
[0147] Based on the compliance determination result, the control command to be issued is determined; the control command to be issued is issued to the hardware sandbox for gating processing to obtain the gating processing result;
[0148] The control commands to be issued refer to the command entries in the low-level command sequence that have been determined to be compliant or have been repaired to be compliant by the formal verifier, and which are to be issued to the hardware sandbox for execution via the fieldbus according to the compliance determination results. The control commands to be issued undergo framing and signature processing before issuance. The frame structure contains the command ID, timestamp, and parameter upper limit, which is used to support the hardware sandbox in verifying the source of the command and the parameter range during the gating process. The control commands to be issued are the control instructions that are finally confirmed to be executable after the entire decision-making chain has undergone multi-source data preprocessing, large model inference, command parsing, rapid simulation scoring, and formal verification. They serve as the direct input for the hardware sandbox gating process.
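The framing and signature step can be sketched in Python as follows. The specification does not name a signature scheme or a frame layout, so the HMAC-SHA256 tag, the JSON payload, the 2-byte length prefix, and the shared key below are all assumptions made for illustration.

```python
import hashlib
import hmac
import json
from typing import Optional

KEY = b"demo-shared-key"  # hypothetical key shared by controller and sandbox


def build_frame(command_id: str, timestamp_ms: int,
                param_upper_limits: dict, key: bytes) -> bytes:
    """Frame = 2-byte payload length + JSON payload + HMAC-SHA256 tag (assumed layout)."""
    payload = json.dumps(
        {"id": command_id, "ts": timestamp_ms, "limits": param_upper_limits},
        sort_keys=True, separators=(",", ":"),
    ).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return len(payload).to_bytes(2, "big") + payload + tag


def verify_frame(frame: bytes, key: bytes) -> Optional[dict]:
    """Return the decoded payload if the signature matches, otherwise None."""
    n = int.from_bytes(frame[:2], "big")
    payload, tag = frame[2:2 + n], frame[2 + n:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(payload)


frame = build_frame("CMD-001", 1700000000000, {"speed": 1.5, "torque": 10.0}, KEY)
decoded = verify_frame(frame, KEY)

tampered = bytearray(frame)
tampered[5] ^= 0x01                      # flip one bit inside the payload
rejected = verify_frame(bytes(tampered), KEY)
```

A frame whose payload has been altered after signing fails the check, which is how the receiving side could confirm both the source of the command and that the parameter upper limits were not modified in transit.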
[0149] The aforementioned hardware sandbox refers to an independent hardware verification device deployed between the upper-level control software and the actuator driver. It consists of an independent microcontroller unit connected in series with a safety relay and is used to perform unbypassable real-time gating checks on all issued command frames. The hardware sandbox independently samples encoder position, speed, and torque data and calculates acceleration within a fixed period. It compares these values with preset upper limits and area boundary constraints. If any limit is exceeded, it directly performs amplitude limiting or emergency disconnection. This process does not depend on instructions from the upper-level software and cannot be bypassed by any upper-level software. The hardware sandbox blocks unauthorized bypasses through physical disconnection, loop self-testing, and jumper detection, serving as the final hardware guarantee for security in the production control link.
[0150] The gating process refers to the process by which the hardware sandbox, after receiving a command frame from the upper layer, compares the instruction values contained in the command frame with the real-time sampled data item by item within a fixed period and performs forced limiting or blocking. Specifically, this includes: comparing the encoder position, speed, torque, and differentially calculated acceleration with preset upper limits using inequality checks, and, if a limit is exceeded, truncating the corresponding instruction value directly to the preset upper limit; checking the frequency of command frames with a token bucket, where a frame is sent later if there are insufficient tokens and early frames are rejected by a hardware timer; and checking the bus bandwidth using a sliding window byte count, where non-critical frames are queued or discarded if the threshold is exceeded. All of the above checks are performed independently at the hardware level and are not affected by the upper-layer software state, thereby physically blocking out-of-bounds and overload commands.
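The inequality check and truncation can be sketched in Python as follows. In the specification these checks run on an independent microcontroller, so this sketch only illustrates the logic. It assumes symmetric limits of the form |value| ≤ upper limit and a one-dimensional permitted area; all field names and numeric values are hypothetical.

```python
from typing import Dict, Tuple


def clamp(value: float, upper: float) -> float:
    """Truncate a value to the symmetric range [-upper, +upper]."""
    return max(-upper, min(upper, value))


def gate(command: Dict[str, float], sample: Dict[str, float], prev_speed: float,
         dt: float, limits: Dict[str, float], area: Tuple[float, float]
         ) -> Tuple[Dict[str, float], Dict[str, bool], bool]:
    """Compare sampled data with limits and truncate command values to the limits."""
    accel = (sample["speed"] - prev_speed) / dt          # differential acceleration
    measured = {"speed": sample["speed"], "torque": sample["torque"], "accel": accel}
    violations = {k: abs(v) > limits[k] for k, v in measured.items()}
    gated = {k: clamp(v, limits[k]) for k, v in command.items()}
    low, high = area
    blocked = not (low <= sample["position"] <= high)    # outside the permitted area
    return gated, violations, blocked


LIMITS = {"speed": 1.5, "torque": 10.0, "accel": 2.0}    # hypothetical upper limits
AREA = (0.0, 5.0)                                        # hypothetical area boundary

gated, violations, blocked = gate(
    command={"speed": 2.0, "torque": 8.0, "accel": -3.0},
    sample={"position": 1.0, "speed": 1.2, "torque": 9.0},
    prev_speed=1.0, dt=0.01, limits=LIMITS, area=AREA,
)
```

In this example the commanded speed and acceleration are truncated to their limits, the measured acceleration is flagged as a violation, and the frame is not blocked because the sampled position lies inside the permitted area.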
[0151] The gating processing result refers to the output data set generated after the hardware sandbox completes the gating processing. It includes the execution command entries that have passed the full gating check, the item-by-item gating check conclusions, the violation flag, the command ID, the timestamp, and the execution confirmation information. The execution command entries that have passed the gating are sent back to the upper-level controller to complete the execution confirmation. The violation flag and the gating check conclusions, together with the execution snapshot, are signed and stored in sequence to generate monitoring and audit data for policy deployment evaluation, compliance traceability, and fault analysis. The gating processing result, as the final output of the hardware sandbox execution stage, signifies that the control command has completed an unbypassable security verification at the physical execution level.
[0152] Specifically, the compliant low-level command sequence is framed and signed, with each frame including the command ID, timestamp, and parameter upper limit, and is sent over the fieldbus to the unbypassable hardware sandbox. The hardware sandbox samples encoder position, speed, and torque at fixed intervals and calculates acceleration differentially. These values are compared with the preset upper limits and the area boundary by inequality checks to obtain a gating check result; if any limit is exceeded, the amplitude is limited directly by truncating the command value to the corresponding upper limit. Token bucket rate limiting is applied to downlink command frames: a bucket capacity Vlu and a token generation rate R are set, and tokens are deducted for each frame according to its number of bytes. If there are not enough tokens, the frame is sent later, and the hardware timer rejects early frames. A sliding window byte counter compares the bus bandwidth with a threshold; if the threshold is exceeded, non-critical frames are queued or discarded. This yields deterministic rate limiting of frequency and bandwidth, which can physically block overload and out-of-bounds commands. Commands that pass are executed, and execution confirmation is sent back to the upper-level controller. The execution snapshot, violation flag, and gating results are signed and stored in the order of return, generating monitoring and audit data for traceability and online evaluation.
[0153] The token bucket rate limiting mechanism refers to a rate limiting mechanism that constrains the transmission frequency and bus bandwidth of downlink command frames by using a preset bucket capacity Vlu and a token generation rate R. Each frame deducts the corresponding tokens from the token bucket according to the number of bytes, and the frame is sent after a delay if there are not enough tokens. At the same time, a sliding window byte counter is used to monitor the bus bandwidth in real time. When the threshold is exceeded, non-critical frames are queued or dropped. The token bucket rate limiting is used to block command overload at the physical level and prevent the actuator from exceeding the limit due to a sudden concentration of commands. It serves as a frequency and bandwidth guarantee mechanism for hardware sandbox gating.
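The token bucket and the sliding window byte counter can be sketched in Python as follows. Time is passed in explicitly so that the behaviour is deterministic; a hardware implementation would use a timer instead. The sketch rejects over-threshold non-critical frames rather than queueing them, and the capacity, rate, window, and threshold values are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class TokenBucket:
    """Bucket of capacity Vlu (bytes) refilled at rate R (bytes per second)."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity, self.rate = capacity, rate
        self.tokens, self.last = capacity, 0.0

    def try_send(self, now: float, frame_bytes: int) -> bool:
        """Deduct tokens by frame size; False means the frame must be sent later."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if frame_bytes <= self.tokens:
            self.tokens -= frame_bytes
            return True
        return False


class SlidingWindowBytes:
    """Counts bytes admitted in the last `window` seconds against a threshold."""

    def __init__(self, window: float, threshold: int) -> None:
        self.window, self.threshold = window, threshold
        self.events: Deque[Tuple[float, int]] = deque()
        self.total = 0

    def admit(self, now: float, frame_bytes: int, critical: bool) -> bool:
        """Non-critical frames are refused when the window would exceed the threshold."""
        while self.events and self.events[0][0] <= now - self.window:
            self.total -= self.events.popleft()[1]       # expire old frames
        if self.total + frame_bytes > self.threshold and not critical:
            return False
        self.events.append((now, frame_bytes))
        self.total += frame_bytes
        return True


bucket = TokenBucket(capacity=100, rate=50)
bucket_results = [bucket.try_send(0.0, 80), bucket.try_send(0.5, 60),
                  bucket.try_send(1.5, 60)]

counter = SlidingWindowBytes(window=1.0, threshold=100)
window_results = [counter.admit(0.0, 70, False), counter.admit(0.5, 50, False),
                  counter.admit(0.5, 50, True), counter.admit(1.2, 40, False)]
```

The second frame in each sequence is held back because the budget is exhausted, while the critical frame is still admitted by the bandwidth counter, matching the rule that only non-critical frames are queued or discarded.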
[0154] The aforementioned unbypassable hardware sandbox forces execution commands to pass through an independent microcontroller unit (MCU) and a safety relay connected in series in the control link for gating. It also physically disconnects unauthorized bypasses, performs loop self-tests, and detects jumpers, so that no upper-level software can bypass the hardware sandbox and drive the actuator directly. It can perform hardware limiting and emergency disconnection for speed, acceleration, torque, and area out-of-bounds conditions.
[0155] Example 2: as shown in Figure 2, a production control system based on a large-scale intelligent agent, corresponding to the method of Example 1, is described below:
[0156] Data acquisition and processing module: Acquires raw data and preprocesses it to generate a time-series state stream for subsequent intent generation and simulation calls;
[0157] High-level intent generation module: Based on the temporal state flow and instruction template, it calls the pre-trained large model, outputs a set of candidate high-level intents and confidence scores, and generates sliding window features and constraint parameters as inputs for analysis and downstream simulation;
[0158] Command parsing module: used to generate parameterized low-level command sequences from candidate high-level intent sets and external benchmark resources; to allocate exclusive tokens and manage mutex locks for resource requests; to calculate time offsets and write them into synchronization barrier identifiers; and to form a serialized timeline of executable actions.
[0159] Simulation and scoring module: Iterates step by step on the short-track edge digital twin to generate physical state and control sequences within the future time window; calculates and normalizes the lower limit of safe distance to obtain a safety margin score; calculates the capacity gain score based on the time slot set obtained by gating; synthesizes a quality score from trajectory smoothness, boundary margin, load, and plan overlap; generates a cost score from in-step time and energy consumption; aggregates constraint violations to obtain a penalty score; finally, synthesizes a comprehensive score and outputs a pass set with evidence.
[0160] Formal verification module: Calculates signed violations and aggregates penalty scores for the simulation trajectory and commands of the pass set according to the constraint set, performs threshold judgment on safety margin and comprehensive score; when non-compliant, generates corrected commands according to priority <time offset, velocity scaling, acceleration scaling, trajectory convergence> and performs simulation review again, outputs compliance, corrected compliance and non-compliance conclusions and corresponding commands and verification log;
[0161] Hardware sandbox and execution module: After framing and signing, compliant low-level execution commands are sent to the unbypassable hardware sandbox via the system bus, which generates approved execution commands and transmits them back in real time; during execution, encoder and torque data are collected for amplitude limiting or emergency stop control, and monitoring and audit data (including command ID, timestamp, gating result, and violation flag) are generated for online evaluation and traceability.
[0162] The aforementioned shadow verification phased deployment refers to the process where, before the new strategy is officially switched to the control link, it runs in parallel with the existing strategy in an independent shadow channel. Candidate commands are generated for the same real-time input, and simulation and verification are performed. Within a preset observation window, the output differences and safety indicators of the shadow strategy and the existing strategy are continuously compared. Only when the compliance rate, safety margin, and comprehensive score of the shadow strategy within the observation window all meet the preset switching threshold can the shadow strategy be switched to the officially executed strategy. The shadow verification is used to perform controlled verification of the new strategy without affecting current production safety, thereby shortening the strategy deployment cycle and forming an auditable chain of evidence for deployment.
[0163] The shadow verification is triggered by the hardware sandbox and execution module when a new strategy is deployed. The specific triggering conditions are as follows: when the compliance rate, safety margin, and comprehensive score of the candidate command set generated by the formal verifier for the new strategy all reach the preset observation start threshold, the hardware sandbox and execution module automatically open the shadow channel to run the new strategy and the current strategy in parallel. In the shadow channel, the candidate commands generated by the new strategy are only simulated and verified, and are not sent to the executor. The hardware sandbox and execution module continuously collects the output difference, compliance rate, safety margin, and comprehensive score of the shadow strategy and the current strategy within the preset observation window, and performs strategy switching when the preset switching threshold is met, switching the shadow strategy to the formal execution strategy, and generating a switching record to be included in the monitoring and audit data.
[0164] The preset switching threshold refers to a multi-dimensional quantitative standard for determining whether a shadow strategy meets the formal switching conditions. It includes at least three types of indicators: compliance rate threshold, safety margin threshold, and comprehensive score threshold. The compliance rate threshold is the lower limit of the proportion of compliant and corrected compliance conclusions of candidate commands generated by the shadow strategy within the observation window after formal verification. This ensures that the new strategy has stable constraint compliance capabilities in a statistical sense. The safety margin threshold is the lower limit of the average safety margin score PSafe of each decision step of the shadow strategy within the observation window. This ensures that the new strategy maintains a sufficient safety margin during execution. The comprehensive score threshold is the lower limit of the average comprehensive score Sr of each decision step of the shadow strategy within the observation window. This ensures that the new strategy's overall performance in terms of safety, capacity contribution, and trajectory quality is not lower than that of the current strategy. All three thresholds must be met simultaneously to trigger a strategy switch. If any threshold is not met, the shadow observation state continues. The preset switching threshold is pre-configured by the system administrator before strategy deployment based on the current production line safety requirements and production targets. It is archived in the switching record along with the corresponding measured indicator values as an auditable basis for the compliance of the strategy's online implementation.
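The three-threshold switching decision can be sketched in Python as follows. The sketch assumes that the observation window is available as a list of per-step records and that the three thresholds are lower bounds, as described above; the record fields and threshold values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class StepRecord:
    verdict: str    # "compliant", "compliant_after_repair", or "non_compliant"
    psafe: float    # safety margin score PSafe at this decision step
    score: float    # comprehensive score Sr at this decision step


def should_switch(window: List[StepRecord], min_compliance_rate: float,
                  min_psafe: float, min_score: float) -> bool:
    """Switch only when all three thresholds are met over the observation window."""
    if not window:
        return False
    compliant = sum(r.verdict != "non_compliant" for r in window)
    rate = compliant / len(window)
    return (rate >= min_compliance_rate
            and mean(r.psafe for r in window) >= min_psafe
            and mean(r.score for r in window) >= min_score)


good_window = [StepRecord("compliant", 0.8, 0.6)] * 3 + \
              [StepRecord("compliant_after_repair", 0.8, 0.6)]
bad_window = good_window[:3] + [StepRecord("non_compliant", 0.8, 0.6)]

switch_good = should_switch(good_window, 0.95, 0.7, 0.5)
switch_bad = should_switch(bad_window, 0.95, 0.7, 0.5)
```

In the second window the safety margin and score averages are unchanged, but the compliance rate falls to 0.75, so the shadow strategy stays in the observation state.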
[0165] Figure 3 is a comparison chart of the technical effects of a production control method and system based on an AI large-scale intelligent agent. The black bars represent the technical effects of the present invention, while the gray bars represent the technical effects of the prior art. The chart shows that the technical effects of the present invention are superior to those of the prior art.
[0166] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are merely preferred examples and are not intended to limit the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the present invention as claimed. The scope of protection of the present invention is defined by the appended claims and their equivalents.
Claims
1. A production control method based on a large-scale intelligent agent, characterized in that it comprises: acquiring multi-source data and preprocessing it to obtain a time-series state stream; inputting the time-series state stream into a pre-trained large model agent to obtain a candidate high-level intent set; parsing the candidate high-level intent set to obtain a low-level command sequence; performing fast simulation on the low-level command sequence to obtain a short-time physical state trajectory; comprehensively scoring the short-time physical state trajectory to obtain a scoring result; applying a formal validator to evaluate the scoring result to obtain a compliance determination result; determining the control command to be issued based on the compliance determination result; and sending the control command to be issued to a hardware sandbox for gating processing to obtain a gating processing result.
2. The production control method according to claim 1, characterized in that, Acquire multi-source data and perform preprocessing to obtain a time-series state stream, including: By subscribing to the device controller, sensors, and robot control cabinet using a unified OPC architecture, and writing timestamps at a fixed frequency and caching breakpoint resume at the edge gateway, and using PTP time synchronization to correct for delays, the raw data stream is obtained. The raw data stream is preprocessed, and key fields are retained in the preprocessed raw data stream according to the key resource priority rule. The data stream is then merged according to device ID and timestamp to obtain a time-series state stream.
3. The production control method according to claim 1, characterized in that, The temporal state stream is input into a pre-trained large model agent to obtain a candidate high-level intent set, including: By extracting four types of features from the time-series state flow within a fixed sliding window—equipment occupancy, task queue length, lower limit of safe distance, and alarm count—and constructing instruction templates and injecting current process parameters, candidate high-level intentions are generated by inferring from a pre-trained large model. The confidence level is calculated by normalizing the log-likelihood of each intention and superimposing constraint violation penalties, thus obtaining a set of candidate high-level intentions.
4. The production control method according to claim 1, characterized in that, The candidate high-level intent set is parsed to obtain a low-level command sequence, including: The parser obtains the action primitive sequence corresponding to each intent by key-value matching and rule binding of the candidate high-level intent set and external benchmark resources; it obtains the instruction template parameter set by resource binding and mutex lock allocation of the action primitive sequence; and it obtains the low-level command sequence by filling the instruction template parameter set into the instruction template.
5. The production control method according to claim 4, characterized in that, Resource binding and mutex lock allocation are applied to action primitive sequences, including: For each action primitive, resource requests and time intervals are managed using exclusive token allocation and mutex locks. The time offset is calculated based on the lock table and the expected start time, and the start time is written back to generate an instruction set carrying resource tokens. By using synchronization barriers and sequential relationship annotations for parallel actions, a synchronization barrier identifier is written at each synchronization point and a predecessor-successor flag is generated. The barrier is released after the predecessor command completes and submits its completion status, resulting in an executable action timeline. Based on the executable action timeline, out-of-bounds pre-checks and corrections are performed on pose, velocity, torque, and region boundaries to complete resource scheduling, timing coordination, and constraint correction of the low-level command sequence.
6. The production control method according to claim 1, characterized in that performing fast simulation on the low-level command sequence to obtain the short-time physical state trajectory includes: performing short-track edge digital twin simulation on the low-level command sequence: with the step size ΔWL, the number of steps WS, and the time window Tw set, starting from the initial state and recursing according to the state transition function F, which maps the system state vector and the control input vector at step t to the next state vector, the physical state sequence and control sequence for steps t=0...WS are obtained, yielding the short-time physical state trajectory.
7. The production control method according to claim 1, characterized in that comprehensively scoring the short-time physical state trajectory to obtain the scoring result comprises: calculating key risk indicators for the physical state sequence and control sequence using a pre-constructed risk function to obtain quantifiable risk metrics, and screening out non-compliant candidates accordingly, the risk function being defined as follows:
T1: for the low-level command sequence and the current state, short-track digital twin simulation is adopted, and the physical state sequence and control sequence within the future time window are obtained by iterating with the step size ΔWL; the lower limit of the safe distance is calculated for every device and normalized to generate a safety margin score PSafe, which is used for the executability determination of candidate commands and for synthesizing the comprehensive score.
T2: the set All of newly added executable time slots is collected from the simulation trajectory within the simulation window, after filtering by resource exclusivity constraints, synchronization barriers, and safety boundary gating; for each slot j, the parallel availability Useful is calculated from the resource token non-conflict and synchronization alignment results, and the beat (takt time) improvement information gain Beat is calculated from the difference between the baseline beat and the predicted beat, generating the capacity gain score BE: $BE = \sum_{j \in All} \omega \cdot Useful_j \cdot Beat_j$, where BE is the capacity gain score, All is the set of time slots, j is the index of a slot in the time slot set, Useful is the parallel availability, Beat is the beat improvement information gain, $\omega$ is the weighting factor, and $\sum$ is the summation symbol.
T3: a quality score is generated by weighting and combining the acceleration sequence, the lower limit of the safe distance, the torque sequence, and the planning window set obtained from the simulation: $Q_r = w_D D_r + w_E E_r + w_G G_r + w_H H_r$, where $Q_r$ is the quality score, r is the decision step index, $w_D$ is the weighting coefficient of the trajectory smoothness score and D is the trajectory smoothness score, $w_E$ is the weighting coefficient of the boundary margin score and E is the boundary margin score, $w_G$ is the weighting coefficient of the load score and G is the load score, and $w_H$ is the weighting coefficient of the plan overlap score and H is the plan overlap score. In the same weighted manner, the energy cost score Eng is generated by normalizing the in-step time and energy consumption and then performing a weighted synthesis. A penalty score is obtained by applying hinge aggregation to the signed violations of the speed limit, acceleration limit, torque limit, restricted area, resource exclusivity, and collision distance constraints, and is used to synthesize the comprehensive score: $Ser_r = \sum_{u \in U} Pun_u \cdot \max\bigl(0,\, Y_u(A_r, B_r)\bigr)$, where $Ser_r$ is the penalty score for decision step r, r is the decision step index, u is the index of a constraint in the constraint set U, $Pun_u$ is the weight of constraint u, $Y_u$ is the signed violation function of constraint u, $A_r$ is the system state vector at step r, and $B_r$ is the control input vector to be evaluated at step r.
T4: based on all the scores from T1 to T3, a comprehensive score is generated by weighting each score and is used to determine the candidate command set: $S_r = \lambda_1 PSafe_r + \lambda_2 BE_r + \lambda_3 Q_r + \lambda_4 Eng_r + \lambda_5 Ser_r$, where $S_r$ is the comprehensive score, r is the decision step index, $\lambda_1$ is the weighting coefficient of the safety margin score PSafe, $\lambda_2$ is the weighting coefficient of the capacity gain score BE, $\lambda_3$ is the weighting coefficient of the quality score Q, $\lambda_4$ is the weighting coefficient of the energy cost score Eng, and $\lambda_5$ is the weighting coefficient of the penalty score Ser. The candidate command set is determined from the comprehensive score, and the candidates within the set that pass the determination are sorted from highest to lowest comprehensive score.
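The hinge aggregation and the weighted synthesis described in this claim can be illustrated with a short sketch. It assumes that "hinge" means keeping only the positive part of each signed violation, and that the comprehensive score is a plain weighted sum; the constraint names, score names, and weight values are hypothetical, and the negative weight on the penalty is an assumption made so that violations lower the score.

```python
def hinge_penalty(violations, weights):
    """Weighted sum of the positive parts of signed constraint violations.

    A negative violation means the constraint is satisfied with margin,
    so it contributes nothing to the penalty.
    """
    return sum(weights[name] * max(0.0, value) for name, value in violations.items())


def comprehensive_score(scores, weights):
    """Plain weighted sum of the individual scores."""
    return sum(weights[name] * scores[name] for name in weights)


# Speed limit exceeded by 0.5; torque limit satisfied with margin 0.2.
penalty = hinge_penalty(
    {"speed": 0.5, "torque": -0.2},
    {"speed": 2.0, "torque": 1.0},
)

# Assumption: the penalty enters with a negative weight.
score = comprehensive_score(
    {"safety": 1.0, "gain": 0.5, "penalty": penalty},
    {"safety": 1.0, "gain": 2.0, "penalty": -1.0},
)
```

Candidates would then be ranked by `score` from highest to lowest, as the claim describes.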
8. The production control method according to claim 1, characterized in that applying the formal verifier to judge the scoring results to obtain the compliance judgment result comprises: performing stepwise verification on each candidate of the evidence-bearing pass set and its simulated trajectory, calculating signed violations according to the constraint set, and aggregating their positive parts to obtain the penalty score; applying threshold judgments to the safety margin score and the comprehensive score by setting a safety margin threshold PSafemin and a comprehensive score threshold Srmax, a compliance conclusion being generated when the threshold conditions on the safety margin score PSafe and the comprehensive score Sr and the condition on the penalty score are met; and, for non-compliant candidates, generating a revised command according to the repair priority, re-simulating the revised command, and repeating the above verification to obtain the compliance judgment result.
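The verify, repair, and re-simulate cycle of this claim might be sketched as a bounded loop. The callables `simulate`, `is_compliant`, and `repair`, as well as the `max_rounds` bound, are hypothetical stand-ins: the claim does not state the exact threshold conditions or how many repair attempts are allowed.

```python
def verify_with_repair(candidate, simulate, is_compliant, repair, max_rounds=3):
    """Re-simulate and re-check a candidate until it passes or rounds run out."""
    for _ in range(max_rounds):
        trajectory = simulate(candidate)
        if is_compliant(trajectory):
            return candidate, True
        # Non-compliant: produce a revised command and try again.
        candidate = repair(candidate)
    return candidate, False


# Toy example: the candidate is a commanded speed with a limit of 1.0.
result = verify_with_repair(
    3.0,
    simulate=lambda speed: [speed],                  # trivial "trajectory"
    is_compliant=lambda traj: max(traj) <= 1.0,      # assumed compliance rule
    repair=lambda speed: speed * 0.5,                # assumed repair: halve speed
)
```

In the toy example the speed is halved twice before the third check passes.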
9. The production control method according to claim 1, characterized in that determining the control command to be issued based on the compliance judgment result comprises: determining, based on the compliance judgment result, the control command to be issued from the corresponding low-level command sequence; issuing the control command to the hardware sandbox for gating processing, wherein limiting or blocking is performed when an out-of-bounds condition is detected, to generate a gating processing result; and recording the gating processing result to obtain the control command to be issued.
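The limiting-or-blocking behavior of the gating step can be illustrated in software, although the claim places it in a hardware sandbox. The split between a soft range (values outside it are clamped) and a hard range (values outside it are blocked) is an assumption of this sketch, as are the field and parameter names.

```python
def gate(command, soft_limits, hard_limits):
    """Pass, limit (clamp), or block a command according to its bounds."""
    gated = dict(command)
    action = "pass"
    for key, (soft_lo, soft_hi) in soft_limits.items():
        value = command[key]
        hard_lo, hard_hi = hard_limits[key]
        if not hard_lo <= value <= hard_hi:
            # Far out of bounds: block the whole command.
            return {"action": "block", "command": None}
        if not soft_lo <= value <= soft_hi:
            # Mildly out of bounds: clamp the value into the soft range.
            gated[key] = min(max(value, soft_lo), soft_hi)
            action = "limit"
    return {"action": action, "command": gated}


soft = {"speed": (0.0, 1.0)}
hard = {"speed": (0.0, 2.0)}
limited = gate({"speed": 1.2}, soft, hard)
blocked = gate({"speed": 3.0}, soft, hard)
```

The returned dictionary plays the role of the gating processing result that the claim says is recorded.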
10. A production control system based on a large-scale intelligent agent, characterized in that it comprises the following modules: a data acquisition and processing module, used to acquire multi-source data and preprocess it to obtain a time-series state stream; a high-level intent generation module, used to input the time-series state stream into a pre-trained large model agent to obtain a candidate high-level intent set; a command parsing module, used to parse the candidate high-level intent set to obtain a low-level command sequence; a simulation and scoring module, used to perform fast simulation on the low-level command sequence to obtain a short-time physical state trajectory, and to comprehensively score the short-time physical state trajectory to obtain a scoring result; a formal verification module, used to apply a formal verifier to judge the scoring result and obtain a compliance judgment result; and a hardware sandbox and execution module, used to determine the control command to be issued based on the compliance judgment result and to send the control command to the hardware sandbox for gating processing to obtain a gating processing result, and further used, when a new strategy is deployed, to enable a shadow channel, perform simulation and verification on the shadow strategy, and continuously compare its output with that of the current strategy, the strategy being switched and a switching record being generated when the preset switching threshold is met.
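The shadow-channel comparison at the end of this claim might look like the following sketch. It assumes that the "preset switching threshold" is a minimum agreement rate between the outputs of the shadow strategy and the current strategy; the claim does not define the comparison metric, so the tolerance, the threshold semantics, and the record fields are hypothetical.

```python
def shadow_compare(current_outputs, shadow_outputs, tolerance, switch_threshold):
    """Compare two strategies' outputs and decide whether to switch."""
    agreeing = sum(
        abs(current - shadow) <= tolerance
        for current, shadow in zip(current_outputs, shadow_outputs)
    )
    agreement = agreeing / len(current_outputs)
    # Assumed rule: switch once the agreement rate reaches the threshold.
    return {"agreement": agreement, "switched": agreement >= switch_threshold}


record = shadow_compare(
    current_outputs=[1.0, 2.0, 3.0, 4.0],
    shadow_outputs=[1.0, 2.5, 3.0, 4.0],
    tolerance=0.1,
    switch_threshold=0.75,
)
```

The returned dictionary stands in for the switching record mentioned in the claim.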