A method for collaborative deployment of edge environment industrial software components in an industrial cluster area

By constructing semantically driven workflow blueprints, using hybrid precision quantization with output channel sensitivity awareness, and co-scheduling under a partially observable Markov decision process framework, the independence issues of workflow generation, model quantization, and resource scheduling in traditional methods are resolved. This enables efficient and balanced deployment of industrial software components and improves the collaborative manufacturing efficiency of industrial clusters.

CN122309079A · Pending · Publication Date: 2026-06-30 · WUHAN UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
WUHAN UNIV OF TECH
Filing Date
2026-04-03
Publication Date
2026-06-30


Abstract

This invention provides a method for collaborative deployment of industrial software components in the edge environment of industrial clusters, comprising: a workflow blueprint construction step, which receives industrial control task requests in natural language, calls a language model combined with a retrieval-augmented generation mechanism, and parses and maps each request into a structured deployment blueprint that has undergone dimensionality reduction and compression; a model quantization execution step, which approximates the sensitivity score of each output channel based on the Fisher information matrix, allocates differentiated quantization bit widths, performs grouped matrix truncation and error compensation, and generates lightweight components and computing power requirement constraints; and a resource collaborative scheduling step, which, within a partially observable Markov decision process framework, uses a long short-term memory network to extract hidden state representations, inputs them into a Transformer model for autoregressive calculation based on the computing power constraints, generates routing deployment instructions, and completes container instantiation. This invention integrates intelligent workflow generation, precise component quantization, and collaborative deployment, and is applicable to collaborative manufacturing scenarios in industrial clusters.

Description

Technical Field

[0001] This invention belongs to the field of industrial software deployment, edge computing and collaborative scheduling technology, and specifically relates to a method for collaborative deployment of industrial software components in the edge environment of an industrial cluster.

Background Technology

[0002] In business resource service scenarios within industrial clusters, the dense distribution of enterprises leads to highly coupled business operations, diverse demands for industrial software components, heterogeneous distribution of edge computing nodes, and dynamic resource changes. At the same time, such clusters place extremely high demands on real-time service response, economical resource utilization, and reliable cross-enterprise collaboration. Traditional industrial software component deployment methods have significant shortcomings in business adaptability, resource optimization and adaptation, cross-node collaborative scheduling, and component matching efficiency, making it difficult to meet the needs of efficient collaborative manufacturing and intelligent services in industrial clusters.

[0003] First, traditional workflow generation for industrial software components is inefficient and poorly adaptable. Industrial clusters often have diverse business scenarios (such as automotive parts processing, electronic assembly, and logistics collaboration). Traditional workflow generation relies on manual arrangement or fixed templates and lacks semantic understanding and intelligent matching capabilities. It cannot automatically parse and map industrial control task requests in natural language form into structured component deployment schemes, so component matching is time-consuming and inaccurate and cannot respond quickly to the flexible and ever-changing business needs of enterprises within the region. Existing retrieval-augmented generation methods are mostly single-module designs with high coupling between retrieval and generation. They lack independent knowledge base support and hierarchical retrieval mechanisms across the three stages of task decomposition, component matching, and workflow generation, making it difficult to dynamically adapt to the component matching needs of different businesses. Furthermore, existing methods generate highly redundant workflow information that contains a large number of natural language interpretation characters without effective dimensionality reduction and compression, which increases the context overhead of the language model and reduces the processing efficiency of subsequent deployment steps. In addition, the generated workflow blueprints lack a systematic multi-dimensional verification mechanism; event connections and data connections between components are not rigorously validated, which easily leads to component call failures or data flow interruptions during deployment.

[0004] Secondly, existing deep learning model quantization methods are not adapted to the physical execution characteristics of industrial software components, resulting in high resource overhead. Edge nodes in industrial clusters have limited and heterogeneous resources, while traditional quantization methods often employ a uniform precision strategy, applying the same quantization bit width to all output channels of the deep learning model. This ignores the varying sensitivity of different output channels to the precision of underlying mechanical control commands. Using the same quantization precision for core channels that directly output underlying mechanical control electrical signals (such as the axial displacement command channel of a CNC machine tool or the joint angle control channel of a robotic arm) and non-critical channels responsible for asynchronous edge logging leads to degraded control precision in core channels due to quantization errors, or wastes limited edge physical memory resources due to excessively high precision in non-critical channels. Furthermore, the quantization process lacks precise means to measure the sensitivity of each output channel: directly calculating the Hessian second derivative matrix is extremely costly in terms of computation and storage, especially in large-scale deep learning models for industrial software components, while simply ignoring the first-order gradient factor leads to insufficient precision in sensitivity assessment. In addition, existing post-training quantization algorithms experience frequent memory read/write conflicts between different computation groups when quantizing channels of varying bit widths. This results in excessively high memory access frequency during quantization reconstruction, impacting quantization execution efficiency on edge devices. More importantly, quantized industrial software components lack precise physical resource consumption calculations: the peak values of static parameter memory, dynamic key-value cache, and total floating-point operations for the quantized model are not systematically estimated. This lack of reliable computational constraints for subsequent deployment and scheduling makes it easy to deploy components to resource-constrained edge nodes, leading to memory overflows or inference timeouts.

[0005] Thirdly, traditional deployment methods suffer from weak cross-node collaboration capabilities and insufficient resource constraint matching. In industrial clusters, edge node resources change dynamically (e.g., resource usage fluctuates during peak production periods). Traditional deployment methods fail to construct dynamic edge node resource profiles, lacking real-time collection and structured representation of operational status data such as available processing cycles, task queue status, and network link bandwidth for each node. At the multi-node scheduling level, existing methods do not consider the impact of container cold start latency on scheduling decisions. When a new industrial software component is dispatched to an edge node, the instantaneous resource indicators of that node do not change immediately because of the time required for container image retrieval and startup. If the scheduling system relies solely on the current instantaneous resource readings for decision-making, it may dispatch too many components to the same node in a short period, leading to memory overflow a few seconds later. This is a state jump problem caused by a violation of the Markov property. Existing reinforcement learning methods simply model the deployment problem as a fully observable Markov decision process, failing to effectively handle incomplete and discontinuous observations caused by network latency, sensor sampling intervals, and other factors in edge computing environments. This results in unstable scheduling performance in heterogeneous multi-cluster environments and uneven resource allocation across different cluster types.

[0006] Furthermore, the existing methods operate independently across three stages: workflow generation, model quantization, and resource scheduling, failing to form an integrated collaborative framework. The workflow generation stage does not consider the resource requirements for subsequent quantization and deployment; the computational power constraints generated during quantization are not directly transmitted to the scheduling stage as decision input; and the routing decisions in the scheduling stage are not linked to component dependencies and data interaction requirements in the workflow blueprint. This leads to the dual problems of "performance failure" or "resource waste" after deployment. Simultaneously, the lack of an end-to-end adaptation mechanism for the business coupling characteristics of industrial clusters and the failure to optimize communication links during cross-enterprise component collaborative deployment further exacerbate collaboration latency and limit the efficiency of collaborative manufacturing among enterprises within the region.

[0007] In summary, traditional industrial software component deployment methods have several drawbacks in industrial cluster scenarios, including a lack of semantic understanding and structured dimensionality reduction capabilities in workflow generation, neglect of the sensitivity differences of output channels to physical control precision in model quantization and a lack of accurate resource consumption calculation, failure to consider state jump issues under discontinuous observations in cross-node scheduling, and insufficient connection between the three stages.

Summary of the Invention

[0008] The purpose of this invention is to address the shortcomings of the aforementioned background technology and provide a method for collaborative deployment of industrial software components in the edge environment of industrial clusters, enabling intelligent workflow generation, precise component quantization, and efficient collaborative deployment under resource constraints.

[0009] The technical solution adopted in this invention is: a method for collaborative deployment of industrial software components in the edge environment of an industrial cluster, which utilizes the physical processor and memory of the deployment control system to execute the following steps:

Workflow blueprint construction step: Receive an industrial control task request containing a natural language character sequence, call a pre-trained language model to parse the natural language instructions, and combine a dynamic modular retrieval-augmented generation matching mechanism to parse and map the industrial control task request into a structured deployment blueprint composed of multiple edge physical industrial software components. The data structure of the structured deployment blueprint includes a dimension-reduced and compressed component call sequence and its corresponding network data interaction interface and storage resource consumption characteristics.

Model quantization execution step: Analyze the computationally intensive industrial software components containing deep learning models in the structured deployment blueprint, calculate the sensitivity scores of each output channel in the deep learning model to the accuracy of the underlying mechanical control commands, assign differentiated numerical quantization bit widths to the weight tensors of different output channels, call the general post-training quantization algorithm to perform matrix truncation and error compensation calculations on each computation group, generate lightweight industrial software components, and perform physical resource consumption quantization calculations on the lightweight industrial software components to generate computing power requirement constraints.

Resource collaborative scheduling step: Obtain the operating status data of heterogeneous edge computing nodes within the target industry cluster area, construct the operating status data into an observation state vector, and, within the framework of a partially observable Markov decision process, use a cascaded long short-term memory network to extract the implicit state representation under discontinuous observations from the observation state vector and input it into the Transformer model. Based on the computing power requirement constraints, perform autoregressive sequence calculation to generate routing deployment instructions for multiple cluster physical nodes, and distribute the lightweight industrial software components to the corresponding physical edge nodes through the network to complete container instantiation and execution.

[0010] In the above technical solution, the workflow blueprint construction step includes: A three-level independent vector database system, comprising a task splitting library, a component interface attribute library, and a historical orchestration library, is constructed in the physical storage medium. Upon receiving the industrial control task request, a retrieval module is invoked to extract matching task sub-templates from the task splitting library using semantic vector similarity matching conditions. The extracted matching task sub-templates are input into the language model to generate structured sub-task feature vectors, and vector retrieval is performed in the component interface attribute library using the sub-task feature vectors to locate the corresponding physical software component application programming interface entity. A component call sequence is constructed based on the physical software component application programming interface entity, and redundant natural language interpretation characters generated by the language model are filtered out according to an agreed dimensionality reduction protocol to generate the structured deployment blueprint in a specific compression format. The structured deployment blueprint is then translated into a standard deployment description file.
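The retrieval chain in this paragraph can be illustrated with a minimal sketch. It assumes cosine similarity as the "semantic vector similarity matching condition" and uses tiny hand-written vectors in place of real embeddings. The library contents, the function name `cosine_top1`, and the stand-in for the language model output are all hypothetical. The historical orchestration library is omitted for brevity.

```python
import numpy as np

def cosine_top1(query, library):
    """Return (key, score) of the library entry most similar to `query`."""
    best_key, best_score = None, -1.0
    for key, vec in library.items():
        score = float(np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec)))
        if score > best_score:
            best_key, best_score = key, score
    return best_key, best_score

# Hypothetical toy libraries; real vectors would come from an embedding model.
task_splitting_library = {
    "inspect_then_sort": np.array([0.9, 0.1, 0.0]),
    "mill_then_measure": np.array([0.1, 0.9, 0.2]),
}
component_interface_library = {
    "cnc_axis_controller.move": np.array([0.0, 1.0, 0.1]),
    "vision_inspector.detect": np.array([1.0, 0.0, 0.0]),
}

# Level 1: match the embedded task request against the task splitting library.
request_embedding = np.array([0.2, 0.8, 0.1])
template, _ = cosine_top1(request_embedding, task_splitting_library)

# Stand-in for the language model: one structured feature vector per sub-task.
subtask_vectors = [np.array([0.05, 0.95, 0.1])]

# Level 2: use each sub-task vector as the retrieval key for a component API.
call_sequence = [cosine_top1(v, component_interface_library)[0] for v in subtask_vectors]
```

Here `template` selects the sub-template and `call_sequence` holds the matched component interfaces, which is the raw material for the compressed blueprint.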

[0011] The above technical solution, after generating the structured deployment blueprint in a specific compression format and before translating it into the standard deployment description file, further includes: Trigger the workflow state machine verification mechanism to perform multi-dimensional verification on the structured deployment blueprint; the dimensions of the multi-dimensional verification include the number of components, component names, event connection relationships, data connection relationships, and event-data association relationships; after the multi-dimensional verification is passed, the verified structured deployment blueprint is output.

[0012] In the above technical solution, the step of calculating the sensitivity score of each output channel in the deep learning model to the accuracy of the underlying mechanical control commands includes: Load the network layers of the target deep learning model and extract the output channel dimension array of each layer; calculate the gradient vector of the model loss function using the historical sampling dataset of the physical device's operating status, and calculate the main diagonal elements of the Fisher information matrix based on the gradient vector to approximate the Hessian second derivative matrix; traverse each output channel in the output channel dimension array, and calculate the impact of numerical truncation of a single output channel on the overall network output loss based on the approximate Fisher information matrix formula, which is used as the sensitivity score of the output channel to the accuracy of the underlying mechanical control commands.
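As a rough illustration of this scoring step, the sketch below estimates a per-channel loss change from a Taylor expansion, with the Hessian replaced by the diagonal of the empirical Fisher information (the mean of squared per-sample gradients). The function name, the array layout (output channels as rows), and the use of `np.round` as the truncation operator are assumptions made for the example, not details given in the source.

```python
import numpy as np

def channel_sensitivity(per_sample_grads, weights, quantize):
    """
    per_sample_grads: (N, C_out, C_in) loss gradients, one per calibration sample
    weights:          (C_out, C_in) layer weights, one row per output channel
    quantize:         callable returning the truncated version of `weights`
    Returns one sensitivity score per output channel.
    """
    mean_grad = per_sample_grads.mean(axis=0)            # first-order factor
    fisher_diag = (per_sample_grads ** 2).mean(axis=0)   # diagonal Fisher, Hessian proxy
    delta = quantize(weights) - weights                  # perturbation caused by truncation
    first_order = (mean_grad * delta).sum(axis=1)
    second_order = 0.5 * (fisher_diag * delta ** 2).sum(axis=1)
    return np.abs(first_order + second_order)

# Toy data: channel 1 already sits on the quantization grid, channel 0 does not.
per_sample_grads = np.ones((4, 2, 2))
weights = np.array([[0.26, -0.74], [1.0, 2.0]])
scores = channel_sensitivity(per_sample_grads, weights, np.round)
```

A channel whose weights are unchanged by truncation scores zero, while a channel that is perturbed receives a positive score that grows with the gradient and Fisher terms.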

[0013] In the above technical solution, the step of assigning differentiated numerical quantization bit widths to the weight tensors of different output channels includes: The sensitivity score is comprehensively evaluated in conjunction with the physical execution function attributes of the component, and divided into multiple priorities. The priorities are then mapped to the control bit width of specific physical channels. A first-precision floating-point bit width is allocated to high-sensitivity output channels that directly output underlying mechanical control electrical signals; a second-precision fixed-point bit width is allocated to medium-sensitivity output channels that handle intermediate state data transmission; and a third-precision bit width is allocated to low-sensitivity output channels that handle asynchronous edge logging system recording. The bit width of the first precision is greater than that of the second precision, and the bit width of the second precision is greater than that of the third precision.
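One way to picture the mapping from sensitivity and functional role to bit width is the sketch below. The concrete widths (16, 8, 4), the score thresholds, the role names, and the rule of taking the higher of the two priorities are all assumptions for illustration. The source only requires that the first precision exceed the second and the second exceed the third.

```python
# Hypothetical widths; only the ordering first > second > third comes from the text.
BIT_WIDTHS = {"high": 16, "medium": 8, "low": 4}
ORDER = ["low", "medium", "high"]

# Hypothetical functional roles and the minimum priority each one implies.
ROLE_PRIORITY = {
    "control_signal": "high",        # directly outputs mechanical control signals
    "intermediate_state": "medium",  # carries intermediate state data
    "logging": "low",                # asynchronous edge logging
}

def priority_from_score(score, low_cut=0.1, high_cut=1.0):
    """Bucket a sensitivity score into a priority using assumed thresholds."""
    if score >= high_cut:
        return "high"
    if score >= low_cut:
        return "medium"
    return "low"

def assign_bit_width(score, role):
    """Combine score and role by keeping the higher of the two priorities."""
    by_score = priority_from_score(score)
    by_role = ROLE_PRIORITY[role]
    priority = max(by_score, by_role, key=ORDER.index)
    return BIT_WIDTHS[priority]

control_bits = assign_bit_width(0.01, "control_signal")
logging_bits = assign_bit_width(0.01, "logging")
```

Under this assumed rule, a control channel keeps the widest format even when its numerical sensitivity score is low, which reflects the idea that the functional role is considered alongside the score.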

[0014] In the above technical solution, the step of calling the general post-training quantization algorithm to perform matrix truncation and error compensation calculations for each computation group includes: Based on the differentiated numerical quantization bit widths allocated to different output channels, all output channels are divided into multiple independent computation groups. During channel-by-channel quantization iteration, a group-isolated update strategy is adopted. For channels within the currently processed computation group, after performing weight matrix numerical truncation, real-time error compensation calculation based on the inverse Hessian matrix is performed to update the weights of other related channels within the same group. For channel weight tensors outside the currently processed computation group, their corresponding cumulative error update amounts are temporarily stored in a cache, and the memory write operation of the channel weight tensors is suspended. After all channel quantization iterations within the currently processed computation group are completed, a delayed batch processing mechanism is used to merge and update the error update amounts in the cache in a single batch.
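The group-isolated update with delayed batch writes resembles the lazy batch update used in GPTQ-style solvers, and the sketch below follows that style. It assumes the channels being quantized are laid out as columns of `W` and are already ordered so that each bit-width group is contiguous. The toy quantizer, the function names, and the random test data are hypothetical, and the source does not state that this exact formulation is used.

```python
import numpy as np

def quantize_uniform(x, bits):
    """Toy symmetric uniform quantizer on a fixed [-1, 1] range."""
    levels = 2 ** (bits - 1) - 1
    return np.clip(np.round(x * levels), -levels, levels) / levels

def grouped_quantize(W, H, groups):
    """
    W: (rows, C) weights; H: (C, C) positive-definite Hessian proxy.
    groups: list of (start, stop, bits) covering the columns in order.
    """
    W = W.astype(float).copy()
    Q = np.zeros_like(W)
    # Upper Cholesky factor of the inverse Hessian.
    U = np.linalg.cholesky(np.linalg.inv(H)).T
    for start, stop, bits in groups:
        err_cache = np.zeros((W.shape[0], stop - start))
        for j in range(start, stop):
            Q[:, j] = quantize_uniform(W[:, j], bits)
            err = (W[:, j] - Q[:, j]) / U[j, j]
            # Immediate compensation, restricted to later channels in this group.
            W[:, j + 1:stop] -= np.outer(err, U[j, j + 1:stop])
            # Updates for channels outside the group are only cached here.
            err_cache[:, j - start] = err
        # Single batched write for all channels outside the current group.
        W[:, stop:] -= err_cache @ U[start:stop, stop:]
    return Q

rng = np.random.default_rng(0)
W = rng.uniform(-1, 1, size=(3, 6))
X = rng.normal(size=(6, 32))
H = X @ X.T + 0.1 * np.eye(6)
Q = grouped_quantize(W, H, [(0, 2, 8), (2, 4, 4), (4, 6, 2)])
```

The point of the design is that weights outside the active group are written once per group instead of once per channel, which is where the reduction in memory traffic comes from.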

[0015] In the above technical solution, the process of generating the computing power requirement constraints in the model quantization execution step includes: For the static physical memory requirements of the lightweight industrial software component, the linear layer weight tensor dimensions of the multi-head attention module and feedforward neural network module in the deep learning model included in the lightweight industrial software component are analyzed to estimate the memory usage of the parameter layer and obtain the number of bytes of static memory occupied. For the dynamic operating resource requirements of the lightweight industrial software component, the peak dynamic memory occupied by the key-value caching mechanism in the autoregressive generation task is calculated in combination with the preset or estimated maximum output sequence length. The tensor matrix multiplication path in the deep learning model is traced, and the total number of floating-point operations required for the pre-filling calculation stage and the decoding generation stage is accumulated. The number of bytes of static memory occupied, the peak dynamic memory, and the total number of floating-point operations are packaged and serialized to generate a rigid physical resource constraint input vector, and the rigid physical resource constraint input vector is used as the computing power requirement constraint condition.
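The three quantities in this paragraph can be estimated with simple arithmetic. The sketch below assumes a standard Transformer block (four attention projections plus two feedforward matrices), an average weight bit width after mixed-precision quantization, and the common approximation of 2 floating-point operations per parameter per token. Attention-score operations, biases, and embeddings are left out, and all parameter names are hypothetical.

```python
def resource_constraint_vector(n_layers, d_model, d_ff, avg_weight_bits,
                               kv_bits, prompt_len, max_new_tokens):
    """Return [static_bytes, kv_peak_bytes, total_flops] for a quantized model."""
    # Linear-layer parameters per block: Q, K, V, O projections and two FFN matrices.
    params_per_layer = 4 * d_model * d_model + 2 * d_model * d_ff
    total_params = n_layers * params_per_layer
    static_bytes = total_params * avg_weight_bits // 8

    # Peak key-value cache: keys and values for every layer at the longest sequence.
    max_seq = prompt_len + max_new_tokens
    kv_peak_bytes = 2 * n_layers * max_seq * d_model * kv_bits // 8

    # Matrix-multiply cost for the pre-filling stage and the decoding stage.
    prefill_flops = 2 * total_params * prompt_len
    decode_flops = 2 * total_params * max_new_tokens
    return [static_bytes, kv_peak_bytes, prefill_flops + decode_flops]

constraint = resource_constraint_vector(
    n_layers=2, d_model=64, d_ff=256, avg_weight_bits=8,
    kv_bits=16, prompt_len=10, max_new_tokens=6,
)
# constraint == [98304, 8192, 3145728]
```

The resulting list plays the role of the rigid physical resource constraint input vector that is handed to the scheduling step.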

[0016] In the above technical solution, the process of extracting the implicit state representation under discontinuous observations in the resource collaborative scheduling step includes: Obtain the observation state vector within the current time slot, wherein the observation state vector concatenates the remaining available CPU processing cycles of each edge physical node, the task queue characteristics, and the current bandwidth throughput status of the network link, together with the resource release identifier data of the running component on the target node that was most recently allocated and is closest to its estimated completion time; the observation state vector containing latency compensation information and the implicit state of the previous time slot are fed synchronously into the long short-term memory network, and the current implicit state representation with smooth local state transitions is calculated and output through the recurrent gating units.
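A minimal sketch of this step is given below: per-node features are concatenated with a resource-release flag, and one standard LSTM cell step folds the new observation into the previous hidden state. The field names, the hidden size, and the randomly initialised weights are assumptions for illustration. A trained network would supply the real parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def build_observation(nodes):
    """Concatenate per-node features, including the resource-release flag."""
    feats = []
    for n in nodes:
        feats += [n["cpu_free"], n["queue_len"], n["bandwidth"], n["release_soon"]]
    return np.array(feats, dtype=float)

def lstm_step(obs, h_prev, c_prev, W, U, b):
    """One LSTM cell step. W: (4H, D), U: (4H, H), b: (4H,)."""
    H = h_prev.shape[0]
    z = W @ obs + U @ h_prev + b
    i = sigmoid(z[:H])           # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:])       # candidate cell state
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

# Hypothetical node readings; release_soon marks a component about to free resources.
nodes = [
    {"cpu_free": 0.6, "queue_len": 3, "bandwidth": 0.8, "release_soon": 1.0},
    {"cpu_free": 0.2, "queue_len": 7, "bandwidth": 0.5, "release_soon": 0.0},
]
obs = build_observation(nodes)

hidden = 5
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * hidden, obs.size))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h, c = lstm_step(obs, np.zeros(hidden), np.zeros(hidden), W, U, b)
```

Calling `lstm_step` once per time slot, feeding back `h` and `c`, yields the running hidden state that summarises the observation history.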

[0017] In the above technical solution, the process of generating routing deployment instructions for multiple cluster physical nodes in the resource collaborative scheduling step includes: The current implicit state representation, the computing power requirement constraints, the action vector list of historically deployed clusters, and the target reward adjustment factor representing the minimization of network throughput latency are input in parallel into the Transformer model, wherein the target reward adjustment factor is composed of the component data transmission latency and the physical resource occupancy ratio; the self-attention mechanism of the Transformer model is used to perform multi-dimensional correlation calculation; the globally optimal physical cluster identifier and local routing instructions to be allocated to the current industrial software component are output, and the target cluster controller is triggered to perform container image pull and deployment actions.
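The sketch below covers only two ingredients of this decision: treating the computing power requirement as a hard feasibility mask, and ranking the remaining clusters with a reward built from transmission latency and resource occupancy. A plain weighted score stands in for the Transformer model, which is a deliberate simplification. The cluster fields, the weighting, and the preference for lower occupancy are assumptions.

```python
def choose_cluster(clusters, constraint, latency_weight=0.5):
    """Return the id of the best feasible cluster, or None if none is feasible."""
    static_bytes, kv_peak_bytes, total_flops = constraint
    best_id, best_reward = None, float("-inf")
    for cid, c in clusters.items():
        # Hard constraint: skip clusters that cannot hold or run the component.
        if c["free_memory"] < static_bytes + kv_peak_bytes:
            continue
        if c["free_flops"] < total_flops:
            continue
        # Assumed reward: lower latency and lower occupancy are both preferred.
        reward = -(latency_weight * c["latency"]
                   + (1.0 - latency_weight) * c["occupancy"])
        if reward > best_reward:
            best_id, best_reward = cid, reward
    return best_id

# Hypothetical cluster snapshot and a constraint vector [bytes, bytes, FLOPs].
clusters = {
    "c0": {"free_memory": 50_000, "free_flops": 5e6, "latency": 0.1, "occupancy": 0.3},
    "c1": {"free_memory": 200_000, "free_flops": 5e6, "latency": 0.2, "occupancy": 0.9},
    "c2": {"free_memory": 200_000, "free_flops": 5e6, "latency": 0.4, "occupancy": 0.5},
}
constraint = [98304, 8192, 3145728]
target = choose_cluster(clusters, constraint)
```

In this toy snapshot `c0` is excluded for lack of memory even though it has the lowest latency, which shows how the hard constraint takes priority over the reward.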

[0018] This invention also provides an edge controller computing device for implementing edge-end collaborative management, comprising: a data bus, a cache, a non-volatile physical storage medium, and at least one multi-core central processing unit; the non-volatile physical storage medium contains a machine-readable computer program instruction sequence, and the at least one multi-core central processing unit triggers the edge controller computing device to execute the collaborative deployment method of industrial software components in the edge environment of the industrial cluster area as described above by reading and running the computer program instruction sequence.

[0019] The beneficial effects of this invention are as follows: This invention constructs a three-in-one collaborative mechanism of "workflow generation—model quantization—collaborative deployment" by organically linking three steps: semantically driven workflow blueprint construction, output channel sensitivity-aware hybrid precision quantization, and edge-cloud collaborative scheduling under a partially observable Markov decision process framework. The structured deployment blueprint output from the workflow blueprint construction step directly serves as the input to the model quantization execution step. The lightweight industrial software components and their computing power requirement constraints output from the model quantization execution step directly serve as the input to the resource collaborative scheduling step. The data flow between the three steps is completely interconnected, fundamentally solving the problem of the workflow generation, model quantization, and resource scheduling being independent and lacking logical connection in traditional methods. Experimental results show that this method achieved an average success rate of 93.04% in the automatic workflow generation experiment, and in the multi-cluster collaborative deployment experiment, the resource consumption of any cluster was greater than 0.8 and the resource consumption among the clusters remained balanced, fully verifying the overall effectiveness of the three-step collaborative mechanism.

[0020] In the workflow blueprint construction step, a three-level independent vector database system, including a task decomposition library, a component interface attribute library, and a historical orchestration library, is built in the physical storage medium. This anchors the retrieval processes of task decomposition, component matching, and workflow generation to their respective dedicated knowledge bases, avoiding the retrieval accuracy degradation problem of a single knowledge base when facing complex queries. The design of using the sub-task feature vectors to perform vector retrieval in the component interface attribute library ensures that the structured feature vectors generated by the language model are explicitly consumed as the input keys for retrieval, guaranteeing the complete transfer of task semantic information to the component matching results. Redundant natural language interpretation characters generated by the language model are filtered out according to an agreed-upon dimensionality reduction protocol, and a structured deployment blueprint in a specific compressed format is generated. This reduces the workflow representation from a lengthy natural language-level description to a fixed field format of "step-component ID-API input-data input-data output," significantly reducing the token usage of the language model and the data processing overhead of subsequent steps.

[0021] After generating a structured deployment blueprint in a specific compressed format and before translating it into a standard deployment description file, a workflow state machine verification mechanism is triggered to systematically verify the blueprint across five dimensions: component quantity, component name, event connection relationships, data connection relationships, and event-data association relationships. The technical effects of this five-dimensional verification mechanism are as follows: component quantity verification ensures that the total number of components in the blueprint matches the task breakdown result, preventing component omissions or redundancy; component name verification checks the validity of each component identifier in the knowledge base, excluding non-existent or obsolete component references; event connection relationship verification confirms that the trigger links between components form a valid directed acyclic graph structure, avoiding deadlocks caused by circular dependencies; data connection relationship verification checks the data type compatibility of upstream and downstream components, preventing runtime data format mismatch errors; and event-data association relationship verification ensures that each event trigger node has a corresponding data flow path, eliminating idle nodes with "triggers but no data." The synergistic effect of these five dimensions constitutes a comprehensive verification chain from structural integrity to semantic consistency, significantly reducing the failure rate of the workflow during the container instantiation and execution phase.
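The five checks can be sketched as a single validation function. The blueprint layout used here (a dict with `components`, `event_links`, and `data_links`), the check names, and the type-equality test for data compatibility are hypothetical. Cycle detection relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter, CycleError

def verify_blueprint(blueprint, expected_count, known_components):
    """Return the names of failed checks; an empty list means the blueprint passed."""
    failures = []
    comps = blueprint["components"]    # name -> {"in": type, "out": type}
    events = blueprint["event_links"]  # list of (source, target)
    data = blueprint["data_links"]     # list of (source, target)

    # 1. Component count matches the task decomposition result.
    if len(comps) != expected_count:
        failures.append("component_count")
    # 2. Every component name exists in the knowledge base.
    if any(name not in known_components for name in comps):
        failures.append("component_name")
    # 3. Event links must form a directed acyclic graph.
    predecessors = {name: set() for name in comps}
    for src, dst in events:
        predecessors.setdefault(dst, set()).add(src)
    try:
        tuple(TopologicalSorter(predecessors).static_order())
    except CycleError:
        failures.append("event_links")
    # 4. Data links must connect compatible output and input types.
    def compatible(src, dst):
        return src in comps and dst in comps and comps[src]["out"] == comps[dst]["in"]
    if not all(compatible(s, d) for s, d in data):
        failures.append("data_links")
    # 5. Every event-triggered node must also receive data.
    data_targets = {dst for _, dst in data}
    if any(dst not in data_targets for _, dst in events):
        failures.append("event_data_association")
    return failures

good = {
    "components": {"camera": {"in": None, "out": "image"},
                   "detector": {"in": "image", "out": "label"}},
    "event_links": [("camera", "detector")],
    "data_links": [("camera", "detector")],
}
known = {"camera", "detector"}
ok_result = verify_blueprint(good, 2, known)

cyclic = dict(good, event_links=[("camera", "detector"), ("detector", "camera")])
cyclic_result = verify_blueprint(cyclic, 2, known)
```

Returning the list of failed dimensions, rather than a single boolean, makes it easy to report which part of the blueprint needs to be regenerated.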

[0022] In the model quantization execution step, the network layers of the target deep learning model are loaded and the output channel dimension arrays of each layer are extracted. The gradient vector of the model loss function is calculated using the historical sampling dataset of the physical device's operating status. Based on the gradient vector, the main diagonal elements of the Fisher information matrix are calculated to approximate the Hessian second derivative matrix. While maintaining evaluation accuracy, this reduces the computational complexity from the O(P²) of the complete Hessian matrix to the O(N·P) of the diagonal approximation of the Fisher information matrix (where P is the total number of model parameters and N is the number of calibration samples). This method preserves both first-order and second-order factors in the Taylor expansion. Compared to existing quantization works that ignore the first-order factor, this method achieves more accurate channel sensitivity assessment results in scenarios where the gradient values of pre-trained industrial software components are small. The method iterates through each output channel in the output channel dimension array and calculates the impact of numerical truncation of a single output channel on the overall network output loss based on the approximate Fisher information matrix formula, providing a quantitative basis for subsequent differentiated bit-width allocation.

[0023] By combining sensitivity scores with the physical execution function attributes of components for comprehensive evaluation and dividing them into multiple priorities mapped to specific physical channel control bit widths, the quantization strategy no longer relies solely on mathematical sensitivity ranking but also considers the functional role of the channel in the industrial control chain. High-sensitivity output channels that directly output underlying mechanical control electrical signals are allocated a first-precision floating-point bit width to ensure that the accuracy of critical physical quantities such as CNC machine tool axial displacement commands and robotic arm joint angle control quantities is not degraded by quantization. Medium-sensitivity output channels that handle intermediate state data transmission are allocated a second-precision fixed-point bit width, compressing memory usage within an acceptable range of accuracy loss. Low-sensitivity output channels responsible for asynchronous edge logging are allocated a third-precision bit width, maximizing the release of physical memory resources occupied by non-critical channels. This differentiated allocation of three bit widths, while ensuring the safety of core mechanical control accuracy, compresses the overall memory usage of industrial software components to a level far lower than that of a uniform precision quantization scheme.

[0024] When using the general post-training quantization algorithm to perform matrix truncation and error compensation calculations on each computation group, all output channels are divided into multiple independent computation groups based on the differentiated numerical quantization bit width, and a group-isolated update strategy is adopted. For channels within the currently processed computation group, after performing weight matrix numerical truncation, real-time error compensation calculation based on the inverse Hessian matrix is performed to update the weights of other related channels within the same group, ensuring that error propagation between channels with the same bit width is accurately compensated. For channel weight tensors outside the currently processed computation group, the accumulated error update is temporarily stored in the cache and memory writing is postponed. After the current group's quantization iteration is completed, a delayed batch processing mechanism is used to merge and update them in batches all at once. The technical effects of this group-isolation and delayed batch processing strategy are: on the one hand, it avoids cross-contamination of errors between computation groups with different bit widths, ensuring the accuracy consistency of mixed-precision quantization; on the other hand, it merges the memory write operation of channels outside the group from high-frequency random access per channel into a one-time sequential batch write, significantly reducing the memory access frequency and bus bandwidth occupation of edge nodes during the quantization reconstruction process.

[0025] When generating computing power requirement constraints, the resource needs of the lightweight industrial software components are calculated along three dimensions: static physical memory, dynamic runtime memory, and computing power. The static dimension estimates the memory usage of the parameter layers by analyzing the linear layer weight tensors of the multi-head attention module and the feedforward neural network module, obtaining the number of bytes of static memory occupied. The dynamic dimension calculates the peak dynamic memory occupied by the key-value caching mechanism by combining the maximum output sequence length. The computing power dimension traces the tensor matrix multiplication path and accumulates the total number of floating-point operations required by the pre-filling calculation stage and the decoding generation stage. These three elements are packaged and serialized into a rigid physical resource constraint input vector, which serves as the computing power requirement constraint. This design allows the subsequent resource collaborative scheduling step to incorporate the actual resource requirements of the quantized model as a hard constraint in the routing decision, avoiding the memory overflow or inference timeouts caused by deploying components to resource-insufficient edge nodes, and achieving a precise connection between the model quantization stage and the resource scheduling stage.
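The three elements of the constraint vector can be illustrated with a minimal sketch. The `LayerSpec` structure, the function names, the 2 bytes per cached value, and the rule of thumb of roughly 2 floating-point operations per parameter per processed token are assumptions made for illustration; this embodiment does not specify them.

```python
from dataclasses import dataclass


@dataclass
class LayerSpec:
    """One linear layer (hypothetical structure): parameter count and bit width."""
    num_params: int
    bit_width: int


def static_memory_bytes(layers):
    """Static dimension: sum of params * bits, rounded up to whole bytes."""
    total_bits = sum(layer.num_params * layer.bit_width for layer in layers)
    return (total_bits + 7) // 8


def kv_cache_peak_bytes(num_layers, hidden_size, max_seq_len, bytes_per_value=2):
    """Dynamic dimension: one key and one value vector per layer per token."""
    return 2 * num_layers * hidden_size * max_seq_len * bytes_per_value


def total_flops(num_params, prompt_len, gen_len):
    """Computing power dimension: assumed ~2 FLOPs per parameter per token,
    summed over the pre-filling and decoding stages."""
    return 2 * num_params * (prompt_len + gen_len)


def constraint_vector(layers, num_layers, hidden_size, prompt_len, gen_len):
    """Pack the three elements into one resource constraint vector."""
    num_params = sum(layer.num_params for layer in layers)
    return [
        static_memory_bytes(layers),
        kv_cache_peak_bytes(num_layers, hidden_size, prompt_len + gen_len),
        total_flops(num_params, prompt_len, gen_len),
    ]
```

A scheduler could compare each entry of this vector against the free memory and compute budget of a candidate node before routing a component to it.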

[0026] In the resource collaborative scheduling step, the observation state vector for the current time slot is obtained by concatenating the remaining available CPU processing cycles of each edge physical node, the task queue characteristics, and the current bandwidth throughput status of the network link, and is then joined with the resource release identifier data of the running component on the target node that was most recently allocated and is closest to its estimated completion time. The introduction of resource release identifier data is one of the key innovations of this scheme. It embeds the estimated resource release information of components about to be completed as a "soft future observation" into the current observation vector, effectively eliminating the illusion of momentary idleness of nodes caused by container cold start delays and preventing the scheduling system from dispatching too many components to the same node in a short period. The observation state vector containing delay compensation information and the hidden state of the previous time slot are fed together into the Long Short-Term Memory (LSTM) network, whose recurrent gating units calculate and output a smooth local state transition representation. This allows the LSTM network, within the framework of a partially observable Markov decision process, to integrate information from historical observation sequences, smooth the discontinuous observation transitions caused by network latency and sensor sampling intervals, and better reflect the true physical state of the system. Experimental results show that, with a window size of 90, the resource consumption of this method is greater than 0.8 for every cluster, and the resource consumption among the clusters remains balanced. In contrast, the traditional CQL algorithm causes the average resource consumption of cluster 0 to exceed 0.9 due to state jumps, while it is only 0.4 to 0.7 for clusters 1 and 2. This results in a large number of tasks being piled up on a single cluster, making it difficult to adapt to heterogeneous environments.
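As a minimal sketch of this step, the code below concatenates the observation parts and applies one step of a standard LSTM cell in plain Python. The function names, the field layout of the observation vector, the encoding of the resource release identifier as a list of flags, and the weight containers `W`, `U`, `b` are all hypothetical; only the LSTM gate equations themselves are the standard ones.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def build_observation(cpu_free, queue_feats, bandwidth, release_flags):
    """Concatenate the per-slot observation parts into one flat vector."""
    return list(cpu_free) + list(queue_feats) + list(bandwidth) + list(release_flags)


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b are dicts keyed by gate name 'i', 'f', 'o', 'g'."""
    pre = {
        k: [wx + uh + bias for wx, uh, bias in
            zip(matvec(W[k], x), matvec(U[k], h_prev), b[k])]
        for k in "ifog"
    }
    i = [sigmoid(v) for v in pre["i"]]    # input gate
    f = [sigmoid(v) for v in pre["f"]]    # forget gate
    o = [sigmoid(v) for v in pre["o"]]    # output gate
    g = [math.tanh(v) for v in pre["g"]]  # candidate cell update
    # New cell state mixes the previous cell state with the candidate update.
    c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
    # New hidden state is the gated, squashed cell state.
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c
```

Because the forget gate carries part of the previous cell state forward, an abrupt jump in one observation changes the hidden state only gradually, which is the smoothing effect described above.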

[0027] The current hidden state representation, the computing power requirement constraints, a list of action vectors for historically deployed clusters, and a target reward adjustment factor representing the minimization of network throughput latency are input in parallel into the converter (Transformer) model. A self-attention mechanism is used to calculate multi-dimensional correlations, enabling the Transformer model to simultaneously consider the global relationships between system state, resource constraints, historical decisions, and optimization objectives, rather than relying solely on local features for greedy selection. The target reward adjustment factor, designed as a ratio of component data transmission latency to physical resource occupancy, ensures that scheduling decisions simultaneously optimize latency and load balancing, two competing objectives. The Transformer model outputs the globally optimal physical cluster identifier and local routing instructions in an autoregressive manner. Compared to the 1.2 to 1.6 seconds required for traditional heuristic algorithms to generate deployment schemes, the neural-network-based approach obtains a deployment scheme with an average latency of less than 0.2 seconds, which is at least six times faster. This avoids the problem of excessively long scheme generation times dragging down request execution time.

[0028] The edge controller computing device provided by this invention for implementing edge-to-edge collaborative management, through the coordinated operation of a system data bus, a cache, a non-volatile physical storage medium, and at least one multi-core CPU unit, solidifies the entire computational process of the aforementioned method into a hardware-executable sequence of program instructions. The multi-core CPU unit is responsible for executing the entire computational process of workflow blueprint construction, model quantization, and resource collaborative scheduling; the cache is used to store the cumulative error updates of each computation group during quantization and the hidden state vectors of the Long Short-Term Memory network, enabling efficient implementation of the group isolation and delayed batch processing quantization strategy at the hardware level; the non-volatile physical storage medium is used to persistently store the three-level vector database, pre-trained language model parameters, and historical orchestration records, ensuring that the knowledge base data is not lost after system power failure. This device can adopt various hardware forms such as industrial-grade servers, edge computing gateways, or embedded computing platforms with GPU acceleration capabilities, adapting to deployment scenarios of different scales and computing power requirements within industrial clusters.

[0029] The computer-readable storage medium provided by this invention enables all steps of the above method to be stored, distributed, and deployed in the form of a computer program. This computer program can be downloaded to an edge controller computing device for execution, or distributed from a remote server via a network to various computing devices within an industrial cluster for execution. This provides enterprises within the industrial cluster with a flexible software deployment method, supporting them in carrying out collaborative manufacturing and intelligent services efficiently, at low cost, and with high reliability.

Attached Figure Description

[0030] Figure 1 is a flowchart of the method of the present invention;
Figure 2 is a flowchart of the automatic workflow generation mechanism for industrial software components in this invention;
Figure 3 is the knowledge base design diagram in this invention;
Figure 4 is a flowchart of the dynamic modular RAG in this invention;
Figure 5 is a flowchart illustrating the channel sensitivity assessment example in this invention;
Figure 6 is a flowchart illustrating an example of the channel-by-channel mixed precision quantization method based on GPTQ in this invention;
Figure 7 is a diagram of the hybrid deployment process based on offline value assessment in this invention;
Figure 8 is a diagram illustrating the adaptive multi-cluster deployment process based on LSTM and Transformer in this invention;
Figure 9 is a diagram showing the violation of the Markov property during the execution of this invention;
Figure 10 is a complete module diagram of the collaborative work of LSTM and Transformer in this invention;
Figure 11 is a schematic diagram of the LSTM module in this invention;
Figure 12 is a schematic diagram of the Transformer module in this invention;
Figure 13 is an experimental diagram of automatic generation for the robotic arm application scenarios in this invention;
Figure 14 is an experimental diagram of automatic generation for the data analysis scenarios in this invention;
Figure 15 is an experimental diagram of automatic generation for the collaborative scenarios of the robotic arm and AGV in this invention;
Figure 16 shows the runtime metrics and corresponding rewards observed for each algorithm during the hybrid deployment experiment based on offline value assessment in this invention;
Figure 17 shows the proportion of decision time of each algorithm to the total response time in the hybrid deployment experiment based on offline value assessment in this invention;
Figure 18 is a graph showing the key performance indicators observed during the algorithm's runtime in the adaptive multi-cluster deployment experiment based on LSTM and Transformer in this invention;
Figure 19 is a diagram showing the load distribution of each cluster in the adaptive multi-cluster deployment experiment based on LSTM and Transformer in this invention;
Figure 20 is a graph showing the reward distribution of each algorithm in the adaptive multi-cluster deployment experiment based on LSTM and Transformer in this invention.

Detailed Implementation

[0031] The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments to facilitate a clear understanding of the present invention, but these descriptions do not constitute a limitation on the present invention.

[0032] To make the objectives, technical solutions, and advantages of this invention clearer, the embodiments of this invention will be further described below with reference to the accompanying drawings. The following embodiments are for illustrative purposes only and are not intended to limit the scope of protection of this invention.

[0033] This embodiment provides a method for collaborative deployment of industrial software components in the edge environment of an industrial cluster, which is executed using the physical processor and memory of a deployment control system. The deployment control system can be deployed on a management server or cloud-based management platform within the industrial cluster, and communicates with multiple heterogeneous edge computing nodes within the industrial cluster via a network.

[0034] As shown in Figure 1, the method includes the following steps: S1, workflow blueprint construction step; S2, model quantization execution step; S3, resource collaborative scheduling step. These three steps are sequentially linked. The structured deployment blueprint output by S1 serves as the input to S2, and the lightweight industrial software components and their computing power requirement constraints output by S2 serve as the input to S3. Finally, S3 distributes the lightweight industrial software components to the corresponding physical edge nodes via the network to complete container instantiation and execution. The specific implementation methods for each step are described in detail below.

[0035] S1. As shown in Figure 2, the workflow blueprint construction step receives industrial control task requests containing natural language character sequences, calls a pre-trained language model to parse the natural language instructions, and combines a dynamic modular retrieval enhancement generation matching mechanism to map the industrial control task request into a structured deployment blueprint composed of multiple edge physical industrial software components. The data structure of the structured deployment blueprint includes a dimension-reduced and compressed component call sequence and its corresponding network data interaction interface and storage resource consumption characteristics. The technical effect of this step is that, through semantic-driven intelligent task decomposition and dynamic modular retrieval enhancement generation technology, it breaks through the limitations of traditional manual arrangement and fixed templates, and can accurately match the component requirements of diverse business scenarios in industrial clusters. At the same time, the dimension reduction and compression significantly reduce the context overhead and hallucination rate when the language model processes long texts. This step specifically includes the following sub-steps:
S111. As shown in Figure 3, a three-tiered knowledge base system is constructed. Knowledge bases are built for three scenarios: task decomposition, components, and workflows, forming a three-tiered independent vector database system comprising a task decomposition library, a component interface attribute library, and a historical orchestration library. The task decomposition library stores task decomposition cases for different business scenarios (including both long and short task description formats); the component interface attribute library includes the API interfaces, data interaction specifications, resource consumption characteristics, and adaptation scenario information of all registered industrial software components within the industrial cluster area, with each record vectorized to support semantic retrieval; the historical orchestration library stores verified historical workflow orchestration cases (including event flow and data flow connection relationships), providing a reference for subsequent few-shot prompting. The three-tiered knowledge bases are independent of each other, serving the three stages of task decomposition, component matching, and workflow generation respectively. This separate design avoids the decrease in retrieval accuracy that occurs when a single knowledge base faces complex queries.

[0036] S112. As shown in Figure 4, a dynamic and modular retrieval enhancement generation matching mechanism is configured. At the retrieval foundation level, multiple SBERT (Sentence-BERT) pre-trained models are deployed as vector encoders to map natural language text into dense semantic vectors. At the generation control level, a prompt text template integrating few-shot prompts and the chain-of-thought method is designed, and a semantic similarity threshold (default 0.5), a context length protection factor (default 0.2), and a maximum case limit (default 3) are set. The context length protection factor automatically reduces the number of cases returned by retrieval when the language model context window is close to saturation, avoiding the degradation of generation quality caused by context overflow. At the same time, a workflow information transformation and compression rule is formulated, defining a compression format of "step - component ID - API input - data input - data output", which strips redundant information from the original workflow and reduces the number of language model tokens consumed. This compression format is the dimensionality reduction protocol described in the claims. Its technical effect is to reduce the workflow blueprint from a verbose natural-language description to a structured fixed-field representation, which significantly reduces the data processing overhead of the subsequent model quantization and scheduling steps.
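A minimal sketch of the compression format and of the context length protection factor follows. The dictionary keys, the function names, and the interpretation of the protection factor as a reserved fraction of the context window are assumptions for illustration; the embodiment gives only the field order and the default values 0.2 and 3.

```python
FIELDS = ("step", "component_id", "api_input", "data_input", "data_output")


def compress_step(step):
    """Render one workflow step in the fixed-field compressed format."""
    return " - ".join(str(step[name]) for name in FIELDS)


def truncate_cases(ranked_cases, context_window, used_tokens,
                   protect_factor=0.2, max_cases=3):
    """Keep the top-ranked cases that fit the protected token budget.

    ranked_cases: list of (token_count, text), best match first.
    Assumption: protect_factor is the fraction of the window kept free.
    """
    budget = round(context_window * (1 - protect_factor)) - used_tokens
    kept = []
    for tokens, text in ranked_cases[:max_cases]:
        if tokens > budget:
            break  # window close to saturation: stop adding cases
        kept.append(text)
        budget -= tokens
    return kept
```

For example, with a 1000-token window of which 500 tokens are already used, the budget for retrieved cases under these assumptions is 300 tokens, so a third 100-token case is dropped after two cases of 120 and 150 tokens.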

[0037] S113. Intelligent Task Decomposition. This module receives natural language business tasks from users in industrial clusters (e.g., defect detection and sorting of automotive parts). Through a retrieval module, it extracts matching task sub-templates from the task decomposition library based on semantic vector similarity matching conditions. Specifically, the user's natural language task description is encoded into a semantic vector using the SBERT model. The semantic similarity with all stored cases in the task decomposition library is calculated. When the highest similarity score retrieved exceeds a preset threshold, the corresponding task decomposition sub-template is returned. A dynamic selection strategy filters highly relevant cases, and combined with prompt templates, guides the language model to decompose complex tasks into sub-tasks with matching granularity and components (e.g., image acquisition → defect recognition → robotic arm sorting), outputting a standardized sub-task list.
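The threshold-based retrieval can be sketched as follows. Toy two-dimensional vectors stand in for SBERT embeddings, and the function names and library layout are hypothetical; only the cosine similarity measure and the default threshold of 0.5 come from the description above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_template(query_vec, library, threshold=0.5):
    """Return (template name, similarity) for the best match.

    library: list of (template_name, embedding). The name is None when the
    best similarity does not exceed the threshold.
    """
    best_name, best_sim = None, -1.0
    for name, embedding in library:
        sim = cosine(query_vec, embedding)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim > threshold:
        return best_name, best_sim
    return None, best_sim
```

When no template exceeds the threshold, a caller would fall back to decomposition without a retrieved sub-template; how the embodiment handles that case is not stated here.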

[0038] S114. Precise Component Matching. The extracted matching task sub-templates are input into the language model to generate structured sub-task feature vectors. During generation, the language model combines few-shot prompt examples retrieved from the historical orchestration library with chain-of-thought reasoning templates to deduce, step by step, the functional requirements and constraints of each sub-task. The sub-task feature vectors are then used to perform vector retrieval in the component interface attribute library to locate the corresponding physical software component application programming interface entity. The matching results are deduplicated, and the core component information (API interface, data format, resource requirements) is extracted to generate a component candidate set.

[0039] S115. Workflow Generation and Verification. After component matching is completed, a component call sequence is constructed based on the application programming interface entities of the physical software components. The original task, subtask list, and component candidate set information are integrated, and the language model is guided by a prompt template to generate a compressed workflow (including component execution order, event triggering relationships, and data interaction paths). Redundant natural language interpretation characters generated by the language model are filtered out according to the agreed-upon dimensionality reduction protocol, generating a structured deployment blueprint in the specific compressed format. The workflow state machine verification mechanism is then triggered: after the structured deployment blueprint is generated in the compressed format and before it is translated into a standard deployment description file executable by the industrial platform, it undergoes multi-dimensional verification, consisting of component quantity verification (checking whether the total number of components in the blueprint is consistent with the task breakdown result), component name verification (verifying whether each component identifier has a corresponding valid registration record in the component interface attribute library), event connection relationship verification (confirming whether the trigger event links between components form a valid directed acyclic graph structure), data connection relationship verification (checking whether the data output type of the upstream component is compatible with the data input type of the downstream component), and event-data association relationship verification (ensuring that each event trigger node has a corresponding data flow path to support it). If any abnormal information is found during verification, it is corrected and re-verified.
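The five verification checks can be sketched as below. The blueprint and registry layouts, the function name, and the rule that an event link is "supported" when a data link joins the same pair of components are assumptions for illustration; the acyclicity check uses Kahn's topological sort.

```python
from collections import deque


def verify_blueprint(blueprint, registry, expected_count):
    """Run the five checks; return a list of error messages (empty if valid)."""
    errors = []
    comps = blueprint["components"]
    events = [tuple(e) for e in blueprint["event_links"]]
    data = [tuple(d) for d in blueprint["data_links"]]

    # 1. Component quantity must match the task breakdown result.
    if len(comps) != expected_count:
        errors.append("component count mismatch")
    # 2. Every component must be registered.
    for cid in comps:
        if cid not in registry:
            errors.append(f"unregistered component: {cid}")
    # 3. Event links must form a directed acyclic graph (Kahn's algorithm).
    indegree = {c: 0 for c in comps}
    successors = {c: [] for c in comps}
    for src, dst in events:
        if src in successors and dst in indegree:
            successors[src].append(dst)
            indegree[dst] += 1
    queue = deque(c for c, d in indegree.items() if d == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if visited != len(indegree):
        errors.append("event links contain a cycle")
    # 4. Upstream output type must match downstream input type.
    for src, dst in data:
        out_type = registry.get(src, {}).get("output_type")
        in_type = registry.get(dst, {}).get("input_type")
        if out_type is None or out_type != in_type:
            errors.append(f"incompatible data link: {src} -> {dst}")
    # 5. Each event link needs a data link between the same components.
    for src, dst in events:
        if (src, dst) not in data:
            errors.append(f"event link without data path: {src} -> {dst}")
    return errors
```

Returning all errors at once, rather than stopping at the first, fits the correct-and-re-verify loop described above, since one correction pass can then address every reported problem.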

[0040] S116. Output Structured Deployment Blueprint. After successful verification, output the verified structured deployment blueprint and translate it into a standard deployment description file for use in subsequent container instantiation and execution. At this point, the S1 workflow blueprint construction step is complete, and the output structured deployment blueprint will serve as input for the subsequent S2 model quantization execution step.

[0041] To verify the S1 workflow blueprint construction step, this embodiment conducted automatic generation experiments on 73 cases from the industrial software training platform dataset. Three methods were compared: chain-of-thought combined with zero-shot prompting, the traditional retrieval enhancement generation method, and the dynamic modular retrieval enhancement generation method designed in this embodiment. Each case was generated 50 times with each method, for a total of 10950 automatic generation experiments. The success rate curves for the robotic arm application scenarios are shown in Figure 13: of the 49 scenarios in this experiment, the dynamic modular retrieval enhancement generation method achieved a success rate higher than or equal to that of the traditional retrieval enhancement generation method in 47. The success rate curves for the data analysis scenarios are shown in Figure 14, where the dynamic modular retrieval enhancement generation method has significant advantages. The success rate curves for the collaborative scenarios between robotic arms and AGVs are shown in Figure 15.

[0042] As shown in Table 1, in robotic arm application scenarios, the success rate of the chain-of-thought combined with zero-shot prompting method is 35.84%, that of the traditional retrieval enhancement generation method is 86.16%, and that of the dynamic modular retrieval enhancement generation method is 92.24%. In data analysis scenarios, the success rate of the chain-of-thought combined with zero-shot prompting method is 41.2%, that of the traditional retrieval enhancement generation method is 90.0%, and that of the dynamic modular retrieval enhancement generation method is 95.09%. In robotic arm-AGV collaboration scenarios, the success rate of the dynamic modular retrieval enhancement generation method is 91.78%, higher than the 84.67% of the traditional retrieval enhancement generation method and the 35.33% of the chain-of-thought combined with zero-shot prompting method. The average success rates of the three methods across the three scenario types are: dynamic modular retrieval enhancement generation method 93.04%, traditional retrieval enhancement generation method 86.94%, and chain-of-thought combined with zero-shot prompting method 37.46%.
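The reported averages and the experiment count can be checked with a few lines of arithmetic. The sketch below assumes the combined average is the unweighted mean of the three scenario success rates, which reproduces the stated figures; the dictionary keys are shorthand labels, not names from the source.

```python
# Success rates (%) in the order: robotic arm, data analysis, robotic arm + AGV.
rates = {
    "dynamic modular RAG": [92.24, 95.09, 91.78],
    "traditional RAG": [86.16, 90.0, 84.67],
    "chain-of-thought + zero-shot": [35.84, 41.2, 35.33],
}

# Unweighted mean over the three scenario types, rounded to two decimals.
averages = {m: round(sum(v) / len(v), 2) for m, v in rates.items()}

# 73 cases, 50 generations per case, for each of the three methods.
total_runs = 73 * 50 * len(rates)

print(averages)
print(total_runs)
```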

[0043] Table 1. Success Rate Data of Different Generation Methods for Industrial Software Component Workflows

[0044] S2. After outputting the structured deployment blueprint in S1, the model quantization execution step begins. This step parses the computationally intensive industrial software components containing deep learning models within the structured deployment blueprint. By calculating the sensitivity scores of each output channel in the deep learning model to the accuracy of the underlying mechanical control commands, differentiated numerical quantization bit widths are assigned to the weight tensors of different output channels. A general post-training quantization algorithm is called to perform matrix truncation and error compensation calculations on each computation group, generating lightweight industrial software components. The physical resource consumption of these lightweight industrial software components is then quantified and calculated to generate computing power requirement constraints. Specifically, this includes the following sub-steps:
S211. Data Preparation. Collect data from typical business scenarios in the industrial cluster (such as production control commands, real-time sensor data, and equipment status feedback data), construct a quantitative evaluation sample set, and label key business indicators (such as control accuracy and detection accuracy).

[0045] S212. Model Preprocessing. Extract the network layers (such as feature extraction layer and decision output layer) of industrial software components containing deep learning models, and clarify the functional positioning of each layer's output channel (core business / auxiliary business).

[0046] S213. Output Channel Sensitivity Assessment. Parameter quantization significantly reduces storage requirements and improves inference speed by converting model weights from high-precision floating-point to low-bit-width numerical representations. Consider quantizing the model of an industrial software component from its original floating-point precision to q bits using a uniform quantization method. The quantized weights are given by the following formulas.

[0047] $$w_{\mathrm{int}} = \mathrm{round}\left(\frac{w}{s}\right) + z \qquad (1)$$
$$\hat{w} = s \cdot (w_{\mathrm{int}} - z) \qquad (2)$$
where $w$ represents the original floating-point weights before quantization; $w_{\mathrm{int}}$ represents the quantized integer weights; $z$ represents the zero point, with $z = 0$ indicating symmetric quantization and any other value indicating asymmetric quantization; $s$ represents the quantization step size, used for scaling the data; $q$ indicates the number of bits to be quantized, so that $w_{\mathrm{int}}$ takes one of $2^q$ integer levels; and round indicates the rounding operation. Formula (2) represents the dequantization process, which uses the zero point and the quantization step size to restore the quantized integer weights to a value $\hat{w}$ close to the original floating-point number.
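A minimal sketch of uniform quantization and dequantization, assuming the common form in which the integer code is round(w / s) + z and the restored value is s * (code - z). The way the step size and zero point are derived from the weight range below is one conventional choice and is an assumption, as are the function names.

```python
def quant_params(weights, q, symmetric=False):
    """Return (step size s, zero point z) for q-bit uniform quantization.

    Assumes the weights are not all equal, and q >= 2 in symmetric mode.
    """
    if symmetric:
        # Zero point fixed at 0; signed codes cover [-max|w|, +max|w|].
        s = max(abs(w) for w in weights) / (2 ** (q - 1) - 1)
        return s, 0
    lo, hi = min(weights), max(weights)
    s = (hi - lo) / (2 ** q - 1)  # 2^q levels span the weight range
    z = round(-lo / s)            # shifts the smallest weight to code 0
    return s, z


def quantize(w, s, z):
    """Map a floating-point weight to its integer code."""
    return round(w / s) + z


def dequantize(code, s, z):
    """Restore an approximate floating-point value from an integer code."""
    return s * (code - z)
```

With 2-bit asymmetric quantization of the weights -2.0, 0.4 and 1.0, the step size is 1.0 and the weight 0.4 is restored as 0.0, which shows the rounding error that the sensitivity analysis in the following paragraphs is meant to measure.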

[0048] The hybrid precision quantization method based on output channel sensitivity awareness used in this embodiment determines the output feature precision of all layers globally, rather than making decisions locally. This method evaluates the sensitivity of features based on their impact on the final model loss and allocates larger bit widths to features that contribute more to the loss. The sensitivity calculation for a specific channel c is shown in the following formula.

[0049] $$S_c = \mathcal{L}(\hat{w}_c) - \mathcal{L}(w_c) \qquad (3)$$
where $\hat{w}_c$ represents the quantized channel, $w_c$ represents the original channel, and $\mathcal{L}$ represents the loss function for a single channel. This embodiment uses a Taylor expansion to estimate the loss function, as in existing quantization work, and likewise ignores the influence of the higher-order terms of the expansion.

[0050] $$\mathcal{L}(\hat{w}_c) - \mathcal{L}(w_c) \approx g^\top \Delta w_c + \frac{1}{2} \Delta w_c^\top H \Delta w_c, \quad \Delta w_c = \hat{w}_c - w_c \qquad (4)$$
where $H$ represents the Hessian matrix, which is the second-order gradient of the loss function with respect to each channel, and $g$ represents the gradient of the single-channel loss, calculated as shown below.

[0051] $$g = \frac{\partial \mathcal{L}}{\partial w_c} \qquad (5)$$
In practice, directly calculating the Hessian matrix is usually infeasible because of its extremely high computational and storage costs, especially for the models of industrial software components, where it leads to significant computational overhead and memory consumption. Therefore, this embodiment uses the Fisher information matrix $F$ on a standard dataset $D$ to approximate the Hessian matrix, as shown in the following formula.

[0052] $$H \approx F = \frac{1}{|D|} \sum_{d \in D} g_d \, g_d^\top \qquad (6)$$
where $g_d$ represents the gradient vector of the channel calculated on sample $d$, and $|D|$ represents the size of the dataset. Based on the approximation of formula (6), the second-order loss factor $\Delta w_c^\top H \Delta w_c$ can be further simplified to a single vector product per sample, $(g_d^\top \Delta w_c)^2$. Finally, the sensitivity of a particular channel can be expressed by the following formula.

[0053] $$S_c \approx g^\top \Delta w_c + \frac{1}{2|D|} \sum_{d \in D} \left( g_d^\top \Delta w_c \right)^2 \qquad (7)$$
In the calculation of the above formula, this embodiment differs from Optimal Brain Damage (OBD) and many recent quantization works in that the first-order factor is not dropped to simplify the calculation. Although the gradient $g$ of a well-trained industrial software component is usually small, the weight increment $\Delta w_c$ is also small, so in the estimate of formula (7) the first-order factor is more important than the second-order factor. This sensitivity score comprehensively reflects the degree of influence of the intra-channel parameters on the overall network output loss when numerical truncation occurs, and is used as the sensitivity score of the output channel to the accuracy of the underlying mechanical control commands.
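A minimal sketch of the sensitivity score, assuming it is the first-order term (gradient times weight change) plus the Fisher-approximated second-order term, one half of the mean over samples of the squared product of the per-sample gradient and the weight change. Treating the overall gradient as the mean of the per-sample gradients is an assumption, as are the function names.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def channel_sensitivity(w, w_hat, per_sample_grads):
    """Estimate the loss change caused by quantizing one channel.

    w: original channel weights; w_hat: quantized channel weights;
    per_sample_grads: one gradient vector per calibration sample.
    """
    n = len(per_sample_grads)
    dw = [q - o for q, o in zip(w_hat, w)]
    # Assumed: overall gradient is the mean of the per-sample gradients.
    g = [sum(col) / n for col in zip(*per_sample_grads)]
    first_order = dot(g, dw)
    # Fisher approximation: dw^T F dw = mean over samples of (g_d . dw)^2.
    second_order = sum(dot(gd, dw) ** 2 for gd in per_sample_grads) / (2 * n)
    return first_order + second_order
```

Only gradients and vector products are needed, so no Hessian matrix is ever formed, which is the point of the Fisher approximation.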

[0054] To better illustrate the technical details of the proposed sensitivity assessment method, take the input sample "robotic arm moveLine command: acceleration 0.5 m/s², target position (X=100 mm, Y=200 mm, Z=50 mm), velocity 1 m/s, execution time 5 s" as an example, with 4-bit quantization applied to the overall model. As shown in Figure 5, the workflow of the proposed channel sensitivity evaluation method can be summarized as follows: ① The input sample passes through an input processing layer and a decoder stack consisting of multiple decoders, each of which contains a self-attention sublayer and a feedforward network layer. ② When evaluating the sensitivity of a certain decoder output channel, the sensitivity score exhibited by different output channels under 4-bit quantization can be calculated according to formula (7). ③ A sensitivity distribution from high to low is obtained based on the different sensitivity scores. The results of this example analysis show that each channel performs differently under the same quantization precision. Therefore, the most suitable quantization precision can be selected for each channel to ensure that the loss of that channel is minimized, thereby minimizing the loss that quantization causes to the performance of the industrial software component model.

[0055] S214. Channel Priority and Precision Allocation. The sensitivity score is comprehensively evaluated in conjunction with the physical execution function attributes of the component, and divided into multiple priorities. These priorities are then mapped to specific physical channel control bit widths. Based on the physical execution function attributes of the component (real-time control / data processing / decision support), the sensitivity score is divided into high, medium, and low priorities. A first-precision floating-point bit width (8-bit precision in this embodiment) is allocated to high-sensitivity output channels that directly output underlying mechanical control electrical signals (e.g., the result output channel of a defect identification component). A second-precision fixed-point bit width (4-bit precision in this embodiment) is allocated to medium-sensitivity output channels that handle intermediate state data transmission. A third-precision bit width (2-bit precision in this embodiment) is allocated to low-sensitivity output channels that handle asynchronous edge logging system recording (e.g., log output channels). The first-precision bit width is greater than the second-precision bit width, and the second-precision bit width is greater than the third-precision bit width.
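The allocation can be sketched as a small rule table. The bit widths 8, 4 and 2 come from this embodiment; the normalized score range, the two thresholds, the role labels, and the precedence between role and score are hypothetical, since the embodiment does not state how the two criteria are combined.

```python
BIT_WIDTHS = {"high": 8, "medium": 4, "low": 2}  # values from this embodiment


def channel_priority(score, role, high_thr=0.7, low_thr=0.3):
    """Combine sensitivity score and functional role (assumed rule order)."""
    if role == "control" or score >= high_thr:
        return "high"    # drives mechanical control, or highly sensitive
    if role == "logging" or score < low_thr:
        return "low"     # asynchronous logging, or barely sensitive
    return "medium"      # intermediate state data transmission


def allocate_bit_widths(channels):
    """channels: dict name -> (normalized sensitivity score, role)."""
    return {name: BIT_WIDTHS[channel_priority(score, role)]
            for name, (score, role) in channels.items()}


def group_by_bit_width(allocation):
    """Collect channels that share a bit width into one computation group."""
    groups = {}
    for name, bits in allocation.items():
        groups.setdefault(bits, []).append(name)
    return groups
```

Grouping by bit width at this point prepares the computation groups that the subsequent quantization step processes one at a time.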

[0056] S215. A Channel-by-Channel Mixed-Precision Quantization Method Based on GPTQ. Through the channel sensitivity assessment in S213 and the differentiated bit width allocation in S214, the most suitable quantization precision can be determined for each channel. After the channel quantization precision allocation is completed, a general post-training quantization algorithm is called to perform matrix truncation and error compensation calculations for each computation group. This embodiment uses the GPTQ post-training quantization method to quantize industrial software components channel by channel. GPTQ derives from another quantization method, Optimal Brain Quantization (OBQ). Compared with OBQ, GPTQ abandons the greedy selection of the next weight to quantize within a row and instead quantizes the weights in index order. Because every row is quantized in the same index order, the Hessian matrix corresponding to each row is the same, and therefore its inverse is also the same and need only be computed once, which reduces the overall complexity.

[0057] When the q-th weight in a row is quantized, the inverse of the Hessian matrix at that step determines the weight adjustment, which is given by the following formulas.

[0058] (8) (9) Formulas (8) and (9) relate the weights before quantization to the weights after quantization with the round-to-nearest (RTN) method. The inverse Hessian matrix is then updated as follows.

[0059] (10) Because GPTQ quantizes every weight row in index order, a single iteration can quantize all rows of the same column. The formulas can therefore be recast as column-level operations: (11) (12) Here the unit being quantized is the q-th column of the weight matrix. To address numerical instability in computing the inverse Hessian matrix, GPTQ uses Cholesky decomposition, as shown in the following formula.

[0060] (13) Formula (13) involves a constant and the upper triangular matrix T obtained through Cholesky decomposition. On this basis, the weight update formula becomes: (14) Finally, GPTQ applies formula (14) in batches by dividing the weight columns into multiple groups. While a column is being quantized, only the parameters within the current group are updated, and the later columns of the group receive just the compensation increment. Once all parameters in a group have been quantized, all subsequent parameters are updated in a single step.
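The column-wise procedure described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `rtn_quantize` and `gptq_like_quantize`, the symmetric per-row quantization grid, the damping value, and the use of `2·X·Xᵀ` from calibration inputs as the Hessian are all assumptions taken from common GPTQ practice. The batch (group) update is omitted for brevity, and bit widths are assumed to be at least 2.

```python
import numpy as np

def rtn_quantize(w, bits, scale):
    """Round-to-nearest onto a symmetric signed grid with per-row step `scale`."""
    qmin = -(2 ** (bits - 1))
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), qmin, qmax) * scale

def gptq_like_quantize(W, X, bits_per_row, damp=0.01):
    """Quantize W (out_channels x in_features) column by column in index order.

    X holds calibration inputs (in_features x n_samples); bits_per_row gives
    one bit width (>= 2) per output channel. All names here are hypothetical.
    """
    W = np.array(W, dtype=np.float64)            # work on a copy
    bits = np.asarray(bits_per_row)
    n_cols = W.shape[1]
    H = 2.0 * X @ X.T                            # assumed layer-wise Hessian
    H += damp * np.mean(np.diag(H)) * np.eye(n_cols)  # damping for stability
    # Upper-triangular factor T with inv(H) = T.T @ T (Cholesky of the inverse)
    T = np.linalg.cholesky(np.linalg.inv(H)).T
    scale = np.abs(W).max(axis=1) / (2.0 ** (bits - 1) - 1)  # fixed row steps
    Q = np.zeros_like(W)
    for q in range(n_cols):
        w_col = W[:, q].copy()
        Q[:, q] = rtn_quantize(w_col, bits, scale)
        err = (w_col - Q[:, q]) / T[q, q]
        # Error compensation: adjust column q and all not-yet-quantized columns
        W[:, q:] -= np.outer(err, T[q, q:])
    return Q

rng = np.random.default_rng(0)
W_demo = rng.normal(size=(6, 6))
X_demo = rng.normal(size=(6, 64))
bits_demo = [2, 4, 2, 4, 4, 2]                   # mixed 2-bit / 4-bit channels
Q_demo = gptq_like_quantize(W_demo, X_demo, bits_demo)
```

Each row of `Q_demo` takes at most 2^b distinct values for its assigned bit width b, which is the property the mixed-precision allocation relies on.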

[0061] As shown in Figure 6, a 6×6 weight matrix is used as an example. Under the proposed quantization method, the workflow of the example can be summarized as follows: ① Through sensitivity analysis, a quantization precision is dynamically assigned to each output channel. In the example, the output channels are assigned two different precisions, 2-bit and 4-bit. ② The channels are divided into calculation groups according to quantization precision, and a group-isolated update strategy is adopted during the channel-by-channel quantization iterations. ③ For channels within the calculation group currently being processed, numerical truncation of the weight matrix is followed by real-time error compensation based on the inverse Hessian matrix, which updates the weights of the other related channels in the same group. For channel weight tensors outside the current group, the corresponding cumulative error updates are held in a cache and the memory write to those tensors is deferred. After all channels in the current group have completed their quantization iterations, a delayed batch mechanism merges the cached error updates and applies them in a single batch. This completes the quantization process of S2 and yields a lightweight industrial software component.

[0062] S216. Analysis of Model Memory Usage and Computing Power Requirements in Industrial Software Components. After quantization and generation of lightweight industrial software components, the physical resource consumption of the lightweight industrial software components is quantified and calculated to generate computing power requirement constraints. The memory usage during model inference mainly consists of the memory usage of model parameters and key-value (KV) cache. To simplify the analysis, this embodiment assumes that the batch size during model inference is 1.

[0063] First, the static physical memory requirements are analyzed. The dimensions of the linear-layer weight tensors in the multi-head attention (MHA) module and the feedforward network (FFN) module of the deep learning model contained in the lightweight industrial software component are analyzed to estimate the GPU memory occupied by the parameter layers. The model is assumed to consist of L identical stacked decoder blocks, each containing one MHA, one FFN, and two normalization layers. The MHA part consists of four linear layers, corresponding to the weight matrices of Q, K, V, and the output projection, together with their bias terms. The memory usage of the MHA module parameters can be estimated using the following formula: (15) The formula uses the average number of bytes occupied by each parameter and the hidden-layer dimension. The FFN module consists of two linear layers with their corresponding weight matrices, so the memory usage of the FFN module parameters is given by the following formula: (16) Besides MHA and FFN, the normalization and embedding layers also contain a small number of parameters. To simplify the calculation, this embodiment ignores the first-order terms in the parameter formulas. The total memory usage of the model parameters, i.e., the number of bytes of static GPU memory, can therefore be expressed by the following formula: (17) where L is the number of stacked decoder blocks in the deep learning model.

[0064] Next, the dynamic runtime resource requirements are analyzed. KV caching is a mechanism introduced to accelerate autoregressive generation; it stores the K and V vectors that the decoder needs when generating each token. Combined with the preset or estimated maximum output sequence length, the peak dynamic memory is given by the following formula: (18) The formula uses the length of the input sequence and the length of the output sequence; the latter can be obtained from the response length predictor. The total memory usage for model inference is: (19) Then, the tensor matrix-multiplication path in the deep learning model is traced, and the total number of floating-point operations required for the prefill stage and the decoding stage is accumulated. For models in industrial software components, the floating-point operations (FLOPs) come mainly from the matrix multiplications in the MHA and FFN modules, so the analysis focuses on the computing power requirements of these two modules. As in the memory analysis, the batch size of the inference task is assumed to be 1 when counting floating-point operations.
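The memory analysis above can be illustrated with a short calculation. This is a sketch under assumptions rather than a reproduction of formulas (15) to (19), which are not available in this text: the FFN inner width (`ffn_mult` times the hidden dimension) and the KV cache precision (`kv_bytes`) are hypothetical parameters, bias and normalization terms are dropped as first-order terms, and the batch size is 1.

```python
from dataclasses import dataclass

@dataclass
class DecoderSpec:
    layers: int             # L, number of stacked decoder blocks
    hidden: int             # hidden-layer dimension
    bytes_per_param: float  # average bytes per parameter after quantization
    ffn_mult: int = 4       # ASSUMPTION: FFN inner width = ffn_mult * hidden
    kv_bytes: float = 2.0   # ASSUMPTION: KV cache stored at 16-bit precision

def param_bytes(spec: DecoderSpec) -> float:
    """Static memory of MHA and FFN weights, second-order terms only."""
    mha = 4 * spec.hidden ** 2                   # Q, K, V, output projection
    ffn = 2 * spec.ffn_mult * spec.hidden ** 2   # two linear layers
    return spec.layers * spec.bytes_per_param * (mha + ffn)

def kv_cache_bytes(spec: DecoderSpec, s_in: int, s_out: int) -> float:
    """Peak KV cache: one K and one V vector per token per layer, batch 1."""
    return 2 * spec.layers * spec.hidden * (s_in + s_out) * spec.kv_bytes

def total_inference_bytes(spec: DecoderSpec, s_in: int, s_out: int) -> float:
    """Static parameter memory plus peak dynamic KV cache memory."""
    return param_bytes(spec) + kv_cache_bytes(spec, s_in, s_out)

demo_spec = DecoderSpec(layers=2, hidden=8, bytes_per_param=0.5)
demo_static = param_bytes(demo_spec)                 # 768.0
demo_dynamic = kv_cache_bytes(demo_spec, 3, 5)       # 512.0
demo_total = total_inference_bytes(demo_spec, 3, 5)  # 1280.0
```

With a 4-bit average precision (0.5 bytes per parameter), the static term shrinks in proportion to the bit width, while the KV cache term is unaffected by weight quantization.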

[0065] In the prefill stage of model inference, MHA maps the input token sequence into Q, K, and V vectors through three independent linear projection layers, laying the foundation for the subsequent attention computation. In the decoding stage, MHA computes the attention scores between the current token and the previously generated tokens and uses a masking mechanism to generate the output sequence step by step, capturing contextual information and keeping the generation coherent and accurate. The total number of floating-point operations for MHA is given by the following formula: (20) In the prefill stage, the FFN module applies a nonlinear transformation to the input token sequence through a two-layer fully connected network, which strengthens the model's representational power and provides richer features for the decoding stage. In the decoding stage, FFN applies the same kind of transformation to each generated token to refine its representation. The total number of floating-point operations in the FFN module is given by the following formula: (21) Finally, the output layer maps the model's final representation to the target space and uses the softmax function to produce a probability distribution that determines the next token or the final output. The computational cost of this part depends on the vocabulary size. In summary, the total computational cost of model inference is given by the following formula: (22) The calculated static memory usage (in bytes), the peak dynamic memory usage, and the total number of floating-point operations are packaged and serialized into a rigid physical resource constraint input vector. This vector is then used as the computing power requirement constraint. This completes the S2 model quantization execution step; its outputs, the lightweight industrial software components and the computing power requirement constraints, serve as the inputs to the subsequent S3 resource collaborative scheduling step.
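A rough FLOP count for the prefill and decoding stages can be sketched as follows. Formulas (20) to (22) are not reproduced in this text, so the counting convention is an assumption: two floating-point operations per multiply-add, batch size 1, an FFN inner width of `ffn_mult` times the hidden dimension, no reduction for causal masking, and one output-layer projection per generated token. All function names are hypothetical.

```python
def mha_flops(layers, d, s_in, s_out):
    """FLOPs of the attention modules over prefill and decoding."""
    # Prefill: four d x d projections per token plus QK^T and attention-times-V
    prefill = 8 * s_in * d ** 2 + 4 * s_in ** 2 * d
    # Decoding: each new token attends over the s_in + i tokens seen so far
    decode = sum(8 * d ** 2 + 4 * (s_in + i) * d for i in range(1, s_out + 1))
    return layers * (prefill + decode)

def ffn_flops(layers, d, s_in, s_out, ffn_mult=4):
    """FLOPs of the two FFN linear layers for every processed token."""
    return layers * 4 * ffn_mult * d ** 2 * (s_in + s_out)

def output_flops(d, vocab, s_out):
    """FLOPs of the vocabulary projection, once per generated token."""
    return 2 * d * vocab * s_out

def total_flops(layers, d, s_in, s_out, vocab, ffn_mult=4):
    return (mha_flops(layers, d, s_in, s_out)
            + ffn_flops(layers, d, s_in, s_out, ffn_mult)
            + output_flops(d, vocab, s_out))

def resource_constraint_vector(static_bytes, peak_dynamic_bytes, flops):
    """Pack the three quantities into one vector (layout is hypothetical)."""
    return [float(static_bytes), float(peak_dynamic_bytes), float(flops)]

demo_flops = total_flops(layers=1, d=2, s_in=1, s_out=1, vocab=10)  # 256
demo_vector = resource_constraint_vector(768, 512, demo_flops)
```

The resulting three-element vector stands in for the rigid physical resource constraint passed to the scheduling step; the actual serialization format is not specified in the text.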

[0066] S3. After S2 outputs the lightweight industrial software components and their computing power requirement constraints, the resource collaborative scheduling step begins. This step acquires the operational status data of the heterogeneous edge computing nodes within the target industrial cluster area and constructs an observation state vector from it. Within the framework of a partially observable Markov decision process (POMDP), a cascaded long short-term memory (LSTM) network extracts from the observation state vector a hidden state representation under discontinuous observations, and this representation is input into the Transformer model. Based on the computing power requirement constraints, an autoregressive sequence calculation generates routing deployment instructions across the physical nodes of multiple clusters. The lightweight industrial software components are then distributed over the network to the corresponding physical edge nodes, where container instantiation and execution are completed. This step is implemented at two levels: S313 is the single-cluster level, which employs a hybrid deployment method based on offline value assessment; S314 is the multi-cluster level, which employs an adaptive multi-cluster deployment method based on the LSTM network and the Transformer model.

[0067] S313. As shown in Figure 7, a hybrid deployment method based on offline value assessment is employed. Within a single edge cluster, resource profiling information is combined with an MDP model and offline learning to dynamically select the strategy with the highest priority (such as latency priority or load balancing), so that local resources are used efficiently. Because the components requested by each workflow have an inherent order of processing and deployment, the deployment process can naturally be modeled as a Markov decision process (MDP), defined as follows.

[0068] State: The deployment of an industrial microservice component depends on the current system resource state, the deployment of the preceding components in the current workflow, and the resource requirements. Specifically, for request r, let $I_k$ denote the k-th component of the request, i.e., the component reached so far in processing; the complete system state is defined as follows: (23) The first term is the expected system computing-resource state at the given time, and the second term is the observed system network state at that time. The third term of the state definition carries information about previously deployed components, with the aim of minimizing the system's hidden information.

[0069] Action: The definition of an action can be simply recorded as the deployment node of the current component.

[0070] Rewards: To better evaluate the effectiveness of a deployment strategy, the actual reward for a component needs to be determined after the workflow is fully deployed, taking into account the execution status. However, simply assigning the effectiveness of the current deployment as the overall reward to the component deployed in the last step would result in overly sparse rewards, making it difficult to learn the evaluation of state value.

[0071] Therefore, this embodiment divides the reward calculation into two parts. The first part obtains the complete reward from the execution status of the workflow, and the second part distributes that reward back to the microservice instances according to their importance in the overall workflow. The complete reward of the first part is defined as follows: (24) The quantities in formula (24) are: the number of microservice components (instances) contained in request r;

[0072] the estimated total delay of request r, from the request being issued to the response being completed;

[0073] the frequency of periodic requests, and the maximum duration (latency bound) of the request;

[0074] a binary variable indicating whether the current request r has been deployed successfully (1 indicates successful deployment, 0 indicates unsuccessful deployment);

[0075] the set of periodic requests with a defined time limit, the set of normal periodic requests without a defined time limit, and the set of randomly arriving user requests;

[0076] the load-balancing status indicator of the overall system resources at the end of the deployment cycle;

[0077] the pure CPU computation latency required to execute request r.

[0078] Normal periodic requests and user requests use a similar reward setting. Normalization keeps the score obtained by each component from becoming too small, and the total score is scaled up according to the number of components. Because data transmission latency and resource consumption account for different shares of workflow execution from one component to another, this embodiment uses a weighting coefficient to mark the share of data transfer and resource consumption that the instance contributes to the workflow. For the current instance I, the second part of the deployment reward, $R_I^r$, is then defined as: (25) Converting the deployment records of a prior algorithm into the above state-action-reward tuples turns the data of the historical deployment process into data that conforms to a Markov decision process. For the prior algorithm π under evaluation, the expected reward of state s is predicted. Expanded using the Bellman equation, it becomes: (26) where $\mathbb{E}_{\pi}$ denotes the mathematical expectation under the given prior policy π.

[0079] $\gamma$: the discount factor, $\gamma \in [0, 1]$, which weights the impact of future long-term rewards on the value of the current state.

[0080] $r_t$: the immediate reward obtained after the system executes the deployment action at time t.

[0081] A: the action space of the system, that is, the set of all possible deployment actions.

[0082] $\pi(a \mid s)$: the policy function, which is the probability of choosing action a in the current state s.

[0083] S: the state space of the system, i.e., the set of all possible system states.

[0084] s': the new state that the system transitions to after taking action a in state s.

[0085] $P(s' \mid s, a)$: the state transition probability, that is, the probability of transitioning to state s' after performing action a in state s.

[0086] $R(s, a)$: the expected immediate reward obtained by performing action a in state s.

[0087] Most non-iterative deployment algorithms (such as greedy algorithms, polling, and other manual prior algorithms) adopt the same deployment scheme under the same system state, so the probability with which the policy selects its action can be approximated as 1, which gives: (27) Similarly, because the observed system state and the actual system state do not differ significantly during the current deployment, and the deployment process is assumed to be short, the next observed change of system state can be attributed to the previous deployment. Approximating $P(s' \mid s, a)$ as 1 then gives: (28) On this basis, two neural networks are used to evaluate the state value function of the prior algorithm. Fitting the data with two networks reduces the overestimation of state values and makes training more stable. (29) Here the fitted function approximates $V^{\pi}(s)$, and $\theta_{tar}$ and $\theta$ are the parameters of the two neural networks.
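For reference, the standard Bellman expectation equation and the two approximations described above can be written out as follows. Formulas (26) to (28) are not reproduced in this text, so the block is an assumption about their general shape. It uses the conventional symbols: discount factor $\gamma$, immediate reward $r_t$, policy $\pi(a \mid s)$, transition probability $P(s' \mid s, a)$, and expected immediate reward $R(s, a)$.

```latex
\begin{align}
V^{\pi}(s)
  &= \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_{0}=s\right]
   = \sum_{a \in A} \pi(a \mid s) \sum_{s' \in S} P(s' \mid s, a)
     \bigl[ R(s, a) + \gamma V^{\pi}(s') \bigr] \\
V^{\pi}(s)
  &\approx \sum_{s' \in S} P(s' \mid s, a)
     \bigl[ R(s, a) + \gamma V^{\pi}(s') \bigr]
  && \text{with } \pi(a \mid s) \approx 1 \\
V^{\pi}(s)
  &\approx R(s, a) + \gamma V^{\pi}(s')
  && \text{with } P(s' \mid s, a) \approx 1
\end{align}
```

The last line is the one-step relation that a temporal-difference target of the form $R(s,a) + \gamma V_{\theta_{tar}}(s')$ would be built on when two networks with parameters $\theta$ and $\theta_{tar}$ are used.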

[0088] Training with two neural networks reduces the overestimation of the target value of $V^{\pi}(s)$ and improves training stability. The difference between the target and the predicted value is measured by Euclidean distance: (30) Finally, the state value function $V^{\pi}(s)$ of the prior algorithm π is fitted from recorded historical deployment data. After the state value functions of all prior algorithms have been fitted, the deployer, when making online deployment decisions, selects the most reasonable deployment algorithm among all prior algorithms according to the state value and generates a deployment strategy. To allow for the limited accuracy of function fitting and to ensure that the selected deployment algorithm is indeed the best one, a custom leading ratio (kahead) is used. A prior algorithm is selected outright only if its state value leads by more than the ratio kahead; otherwise the algorithm is selected at random within a certain range.
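The value-fitting target and the leading-ratio selection rule can be sketched as below. The text does not specify formula (30) or the exact meaning of "leads by more than kahead" and "within a certain range", so this sketch makes explicit assumptions: the loss is a squared one-step temporal-difference error against a target network, the lead is measured relative to the runner-up's value, and the random fallback draws from the algorithms whose value lies within the same margin of the best. All function names are hypothetical.

```python
import random

def td_target(reward, next_value_target, gamma=0.99):
    """One-step target computed with the target network's estimate of V(s')."""
    return reward + gamma * next_value_target

def td_loss(value_online, reward, next_value_target, gamma=0.99):
    """Squared distance between the target and the online network's V(s)."""
    return (td_target(reward, next_value_target, gamma) - value_online) ** 2

def select_prior_algorithm(values, k_ahead=0.1, rng=random):
    """Pick a prior algorithm from {name: fitted state value}.

    ASSUMPTION: the best algorithm is chosen outright only if it leads the
    runner-up by more than k_ahead * |runner-up value|; otherwise one of the
    algorithms within that margin of the best is drawn at random.
    """
    ranked = sorted(values, key=values.get, reverse=True)
    best = ranked[0]
    if len(ranked) == 1:
        return best
    margin = k_ahead * abs(values[ranked[1]])
    if values[best] - values[ranked[1]] > margin:
        return best
    close = [name for name in ranked if values[best] - values[name] <= margin]
    return rng.choice(close)

clear_pick = select_prior_algorithm({"dqn_runtime_priority": 10.0,
                                     "dqn_res_priority": 5.0})
tied_pick = select_prior_algorithm({"dqn_runtime_priority": 10.0,
                                    "dqn_res_priority": 9.9,
                                    "greedy": 1.0},
                                   rng=random.Random(0))
```

Randomizing among near-equal candidates is one way to hedge against fitting error in the value estimates; other tie-breaking rules would also be consistent with the description.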

[0089] S314. As shown in Figure 8, an adaptive multi-cluster deployment method based on LSTM and Transformer is employed. In this scenario, the deployment problem can be divided into request sequence generation and deployment decision generation, both of which rely on model learning. Specifically, at the beginning of the current time slot, the deployer inputs all requests that arrived in the previous time slot and the currently observed system state into the model. The model selects a suitable request and simultaneously generates the deployment preference and the selected cluster. The selected cluster then determines, according to the deployment preference, the appropriate combination of prior deployment algorithms and completes the generation and execution of the deployment scheme for that request. The deployer repeats this process until the request processing order and the deployment schemes in the multi-cluster environment have all been generated.

[0090] Although the deployment process can be approximated as an MDP, the lag of deployment actions and the influence of previously deployed components can violate the Markov property. For example, as shown in Figure 9, changes in cluster state can be influenced by previously deployed components, which violates the Markov property. The longer it takes to generate a deployment scheme and the larger the number of clusters, the more uncertain the system state becomes, which makes computationally expensive deployment algorithms harder to use in real environments. This embodiment therefore models the problem as a POMDP. A POMDP is similar to an MDP, except that the state is not fully observable; the agent must learn to judge the quality of actions from partial state observations and reward feedback. The state observations, actions, and rewards involved are explained below.

[0091] Observable state: The observable state includes three parts: the state of the current sequence of pending requests, the current cluster resource state, and the information of the component on each node whose estimated execution time is nearest. The current cluster resource state can be viewed as the superposition of the resource states of the individual clusters. Assume that request r is processed in the current time slot τ, and let a per-cluster term represent the node and network status information of cluster i; the observation of system resources is then defined as: (31) System resources mainly comprise node status and network status. The corresponding information from the different clusters is stacked together to form the current system resource observation.

[0092] The second part is the observation of the current sequence of requests to be processed, which can likewise be viewed as a stack of the information of the individual requests: (32) The third part reflects the fact that the component most recently completed on each node causes random changes in that node's resources, so information about the most recently completed component is also included in the observation to improve awareness of the environment: (33) Here, for cluster i, the entry is the information of the one microservice component instance whose estimated completion time $T_r^{\mathrm{estimate}}$ is closest to the observation time $t_r$. This information makes the model more aware of historical deployments when it later observes the system, which leads to better modeling of the system's actual current state.

[0093] Based on the above, the final observed state is: (34) This observation state vector corresponds to the step of "acquiring the operating state data of heterogeneous edge computing nodes within the target industrial cluster area and constructing the operating state data into an observation state vector" described in claim 1.
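The three-part observation can be illustrated with a small helper that flattens and concatenates the parts. Formulas (31) to (34) are not reproduced in this text, so reading "superposition" as concatenation, the argument names, and the feature layout are all assumptions made for illustration; each part is assumed to be non-empty.

```python
import numpy as np

def build_observation(cluster_states, request_features, recent_components):
    """Concatenate the three observation parts into one flat vector.

    cluster_states:    per-cluster node and network status features
    request_features:  per-request features of the pending request sequence
    recent_components: per-cluster features of the most recently completed
                       component (all three layouts are hypothetical)
    """
    def flatten(parts):
        return np.concatenate([np.asarray(p, dtype=float).ravel() for p in parts])

    return np.concatenate([
        flatten(cluster_states),
        flatten(request_features),
        flatten(recent_components),
    ])

demo_obs = build_observation(
    cluster_states=[[0.1, 0.2], [0.3, 0.4]],   # two clusters
    request_features=[[1.0, 2.0, 3.0]],        # one pending request
    recent_components=[[5.0], [6.0]],          # one entry per cluster
)
```

A fixed ordering of the three parts matters here, because the downstream LSTM sees only positions in the vector, not labels.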

[0094] Actions: An action is no longer simply the machine on which a component is actually deployed; it consists of the request currently being deployed, the corresponding deployment preference, and the cluster selected for the deployment. Assume there are currently m deployment preferences, each taking a value in [0, 1] and subject to a constraint on the preference values. For the request r now being processed, the action recorded by the system consists of three parts: the set of microservice applications selected for the request, $sc_r$; the currently selected deployment mode; and the cluster $Cl$ in which the current request is to be deployed: (35) Reward: Because the deployer now processes a complete request rather than a single microservice component, a single step is enough to judge the quality of the request's deployment under the currently observed state. The reward can therefore target the optimization goal directly and is set as follows: (36) For the implementation of the prior algorithms, two different types of prior algorithm were set up in the experiment. The first is the DQN model with time preference (dqn_runtime_priority). This prior algorithm is trained using the DQN algorithm and considers only minimizing the estimated execution time during deployment. Its reward function is: (37) The quantities in formula (37) are: the reward that the prior model with time preference receives when processing request r;

[0095] the adjustment weight set according to the time preference of request r;

[0096] the number of microservice component instances contained in request r;

[0097] the pure computational latency required to execute request r (i.e., the minimum computation time under ideal conditions);

[0098] the total estimated completion time of the system for request r (including the combined time spent on queuing, transmission, container startup, and computation).

[0099] The second is the DQN model with load preference (dqn_res_priority). This prior algorithm is trained using the DQN algorithm and considers only minimizing the load balancing metric during deployment. Its reward function is: (38) The quantities in formula (38) are: the reward that the prior model with load preference receives when processing request r;

[0100] the actual resource load-balancing status indicator of the system before the deployment decision for request r;

[0101] the expected resource load-balancing status indicator of the system after the deployment action has been performed.

[0102] To address the aforementioned POMDP problem, this embodiment employs a long short-term memory (LSTM) network combined with a Transformer model. The LSTM serves as an extractor of the system's latent state, which avoids the state jumps caused by incomplete observation of the system. Request ordering and decision generation are treated as a sequence generation problem, with the Transformer model as the generator that produces the current optimal decision from the specified expected future reward and the state observations. The interaction process of the complete module is shown in Figure 10.

[0103] As shown in Figure 11, the LSTM module is designed as follows. The current observation state and the hidden state of the previous time step are passed to the LSTM through a fully connected layer. That is, the observation state vector, which includes delay compensation information, is fed into the LSTM network together with the hidden state from the previous time slot, and the recurrent gating units compute and output the current hidden state representation, which transitions smoothly between local states. In this embodiment, the hidden state obtained at each time step represents the accumulated information of the system's historical operation up to that moment. It is concatenated with the system state observation of the next time step and input into the fully connected layer to obtain the current hidden state representation.

[0104] After the current hidden state representation is obtained, as shown in Figure 12, it is used as an estimate of the actual state of the system and is input into the Transformer model together with the historical actions and the target reward adjustment factor. The target reward adjustment factor (RTG) at time t is denoted as: (39) The role of the RTG is similar to that of the state value; this design allows the Transformer model to learn from offline data the relationship between each current action and the expected future reward. The target reward adjustment factor represents the objective of minimizing network throughput latency and is composed of the component data transmission latency and the physical resource occupancy ratio. The Transformer model takes the current hidden state representation, the computing power requirement constraints, the list of action vectors of historically deployed clusters, and the target reward adjustment factor as parallel inputs, uses a self-attention mechanism to perform multi-dimensional correlation calculations, and autoregressively outputs the identifier of the globally optimal physical cluster to which the current industrial software component should be allocated, together with the local routing instructions. This output triggers the target cluster controller to pull the container image and execute the deployment.

[0105] Because the LSTM and Transformer modules have a clear upstream-downstream relationship, they can be trained end to end. Historical observation information is input into the LSTM and combined with the observation of the next time step to obtain the hidden state representation, which is then input into the Transformer model to obtain the predicted action at that time. Finally, the cross-entropy loss between the predicted action and the actual action is computed, so that the model learns the relationship between expected reward, state, and action.
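The target reward adjustment factor behaves like the return-to-go used when decision-making is framed as sequence modeling. Formula (39) is not reproduced in this text, so the sketch below assumes the common undiscounted definition, in which the value at time t is the sum of the rewards from t to the end of the recorded trajectory. The function name `returns_to_go` is hypothetical.

```python
def returns_to_go(rewards):
    """Return-to-go per step: rtg[t] = rewards[t] + rewards[t+1] + ... (assumed)."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards so each entry is a suffix sum
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg

demo_rtg = returns_to_go([1.0, 2.0, 3.0])   # [6.0, 5.0, 3.0]
```

During offline training the values come from recorded trajectories; at deployment time the first value is a chosen target, which is then reduced by each reward actually received.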

[0106] To evaluate the technical effectiveness of the resource collaborative scheduling method proposed in this embodiment, the following two sets of experiments were conducted.

[0107] The first set of experiments validated the hybrid deployment method based on offline value assessment. The CPU resources of the training machine were used as the available computing resources of each node, set to 3.3 GHz × 8. The network topology was generated randomly, with the maximum available bandwidth in the range of 20 mb/s to 70 mb/s, and every link was treated as full-duplex. Because actual industrial control processes may involve processing large amounts of audio and video signals, the experiments considered three types of audio and video workflows: video stream encoding (simulating compression encoding of video signals), image capture and recognition (simulating keyframe capture and defect recognition), and video recognition (performing video recognition and tracking tasks). Each instance ran on the training machine as a Docker container. `perf` was used to obtain the number of CPU cycles consumed during execution, and `docker inspect` was used to obtain the container startup time and the instance exit time. To simulate the hardware constraints of the instances, `fns` were generated randomly. These data were saved as known data, giving Table 2. As shown in Table 2, instance numbers 0/1/2 correspond to MP4 to H.265 video encoding (720p, 1/5/10 seconds), with a computational resource consumption of 2.64 G/s and a connection resource consumption of 0.132 G/s; instance numbers 10/11 correspond to video recognition (720p, 5/10 seconds), with a computational resource consumption of 3.32 G/s; instance numbers 12/13 correspond to keyframe extraction and recognition (720p, 5/10 seconds), with a computational resource consumption of 2.78 G/s; and instance number 20 corresponds to saving a single data entry to the database, with a computational resource consumption of 1.32 G/s.

[0108] Table 2. Data table of some microservice instances involved in the hybrid deployment experiment based on offline value assessment.

[0109] After configuring the microservice components, the next step is to configure the workflow information. Workflow topologies include 0→20, 1→20, and 3→10→20. Based on the characteristics of the workflow, some workflow requests are configured as periodic requests, and these are further divided into periodic requests with a latency cap and normal periodic requests without a latency cap. Ultimately, after this division, we have 6 types of periodic requests with latency caps, 9 types of normal periodic requests, and 9 types of user requests. The initiation period for periodic requests with latency caps ranges from 1.1 seconds to 4.8 seconds, the initiation period for normal periodic requests ranges from 5.3 seconds to 38.8 seconds, and the initiation frequency of user requests is set to 10 per second.

[0110] For the implementation of the prior algorithms, the experiment set up two different types of prior algorithms, each with its own focus. The DQN model with time preference (dqn_runtime_priority) was trained using the DQN algorithm, and only the estimated execution time was minimized during deployment. The DQN model with load preference (dqn_res_priority) was trained using the DQN algorithm, and only the load balancing metric was minimized during deployment. For the comparative analysis with other algorithms, the experiment set up comparisons with various algorithms, including heuristic algorithms (greedy+ga combined with greedy algorithm and greedy+particle combined with particle swarm optimization algorithm), prior algorithms (dqn_runtime_priority, dqn_res_priority), and online reinforcement learning algorithms (PPO). To minimize data fluctuations, each comparison algorithm was executed in the environment for 300 seconds, repeated 10 times, discarding the data from the first 10 seconds and averaging the corresponding metrics.

[0111] like Figure 16 As shown, the system status and corresponding rewards for each algorithm during runtime indicate that the HDD algorithm (i.e., the hybrid deployment method based on offline value assessment proposed in this embodiment) can approach the latency of a priori algorithm that only considers latency, and maintains a load balancing effect close to that of the heuristic algorithm, thus neutralizing the characteristics of the two different priori algorithms. Regarding the total reward value, the HDD algorithm performs better than the test algorithm. Next are the heuristic algorithm combining a greedy algorithm and the dqn_res_priority algorithm that only considers load; both of these show relatively stable performance. The worst performing algorithms are the PPO algorithm and the dqn_runtime_priority algorithm. It can be seen that, regardless of whether it is optimization objective one or optimization objective two, HDD can maintain the gap with the currently better-performing priori algorithm as much as possible, thereby maximizing the advantages of different priori algorithms to achieve optimization for multiple different objectives.

[0112] While the greedy algorithm combined with heuristics performs well in optimizing objectives one and three, its inherently long computation time still drags down the execution time of normal periodic requests and user request types. For example... Figure 17 As shown, both heuristic algorithms consumed nearly 30% of the time for generating deployment schemes, with the generation time ranging from 1.2 to 1.6 seconds. In contrast, the average latency of deployment schemes obtained using the neural network-based approach was less than 0.2 seconds.

[0113] The second set of experiments validated the adaptive multi-cluster deployment method based on a Long Short-Term Memory (LSTM) network and a Transformer model. Three virtual clusters with different numbers of machines, computing power, and network structures were created, with no network paths between clusters. Two types of machines with different computing power were involved: high-power machines with a maximum computing power of 3.3 GHz × 8 cores per node, and low-power machines with a maximum computing power of 2.6 GHz × 6 cores per node. The high-power cluster (id=0) contained 10 nodes, each a high-power machine; the medium-power cluster (id=1) contained 10 nodes, comprising 5 high-power machines and 5 low-power machines; and the low-power cluster (id=2) contained 15 nodes, each a low-power machine. The network connections of each cluster were initialized randomly, with a link connection rate of 0.15 and a bandwidth range of 3-7 MB/s.
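For illustration, the three-cluster test environment can be sketched in Python as follows. This is a sketch under stated assumptions: the text does not say how the random initialization is performed, so the code assumes that each pair of nodes inside a cluster is linked independently with probability 0.15, that links are undirected, and that bandwidth is drawn uniformly from 3 to 7 MB/s. The names build_cluster and total_capacity are hypothetical, and no attempt is made to guarantee that a cluster's network is connected.

```python
import random
from itertools import combinations

# Machine types from the description (maximum computing power per node).
HIGH = {"ghz": 3.3, "cores": 8}
LOW = {"ghz": 2.6, "cores": 6}


def build_cluster(node_specs, link_rate=0.15, bw_range=(3.0, 7.0), seed=None):
    """Create one isolated cluster with randomly initialized internal links.

    Assumption: every node pair is linked independently with probability
    link_rate, and each link gets a uniform random bandwidth in MB/s.
    """
    rng = random.Random(seed)
    links = {}
    for a, b in combinations(range(len(node_specs)), 2):
        if rng.random() < link_rate:
            links[(a, b)] = rng.uniform(*bw_range)  # undirected link a-b
    return {"nodes": list(node_specs), "links": links}


def total_capacity(cluster):
    """Sum of GHz x cores over all nodes, a rough computing power figure."""
    return sum(n["ghz"] * n["cores"] for n in cluster["nodes"])


# No links are created between clusters, matching the experiment setup.
clusters = {
    0: build_cluster([HIGH] * 10, seed=0),             # high-power cluster
    1: build_cluster([HIGH] * 5 + [LOW] * 5, seed=1),  # medium-power cluster
    2: build_cluster([LOW] * 15, seed=2),              # low-power cluster
}
```

Under these figures the aggregate capacities are about 264, 210, and 234 GHz·cores for clusters 0, 1, and 2 respectively, so the low-power cluster has more total capacity than the medium-power cluster even though each of its nodes is weaker.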

[0114] For the request workflow, the corresponding persistent network bandwidth requirement needs to be calculated for each instance. The calculation method is given by formula (40). Considering that the task involves both processing sequence generation and deployment mode selection, this embodiment keeps the deployer within each cluster unchanged and compares the algorithms used to generate processing sequences and deployment modes. Specifically, the comparison algorithms are as follows. Random algorithm (random): the requests to be processed and the deployment modes are selected completely at random. Priority-based processing sequence with a deployment mode that considers the request type (priority+type): a manual prior algorithm prioritizes requests with low average resource usage and selects the cluster with the highest average idle resource ratio, and the deployment mode is generated from the characteristics of the current request. For periodic request types, the per-lb deployment mode is selected if the computation time is less than 5 seconds, and the per-time deployment mode is selected if it is greater; for user request types, the per-balance deployment mode is used. Priority-based processing sequence with a single deployment mode (priority+per-time / balance / lb): similar to priority+type, except that the deployment mode is fixed as per-time, per-balance, or per-lb, respectively.
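The priority+type baseline described above amounts to three small rules, sketched below in Python. The function names, the dictionary keys, and the request type labels "periodic" and "user" are illustrative assumptions. The text does not say which mode applies when the computation time is exactly 5 seconds; this sketch assigns that case to per-time.

```python
def select_mode(request_type, compute_time_s):
    """Deployment mode rule of the priority+type baseline."""
    if request_type == "periodic":
        # Short periodic requests use per-lb, longer ones per-time.
        # Exactly 5 s is not specified in the text; treated as per-time here.
        return "per-lb" if compute_time_s < 5.0 else "per-time"
    if request_type == "user":
        return "per-balance"
    raise ValueError(f"unknown request type: {request_type!r}")


def pick_request(pending):
    """Choose the pending request with the lowest average resource usage."""
    return min(pending, key=lambda r: r["avg_resource_usage"])


def pick_cluster(idle_ratio_by_cluster):
    """Choose the cluster id with the highest average idle resource ratio."""
    return max(idle_ratio_by_cluster, key=idle_ratio_by_cluster.get)


# Synthetic example (values are made up for illustration).
pending = [
    {"id": "r1", "type": "periodic", "compute_time_s": 3.0, "avg_resource_usage": 0.4},
    {"id": "r2", "type": "user", "compute_time_s": 8.0, "avg_resource_usage": 0.2},
]
chosen = pick_request(pending)
decision = (
    chosen["id"],
    pick_cluster({0: 0.10, 1: 0.40, 2: 0.30}),
    select_mode(chosen["type"], chosen["compute_time_s"]),
)
```

The fixed-mode baselines (priority+per-time, priority+per-balance, priority+per-lb) would reuse pick_request and pick_cluster and replace select_mode with a constant.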

[0115] CQL algorithm: the policy is trained directly using the CQL algorithm, with the current system observation treated as the actual state. Because the dimension of the system state observation is close to 500 (including the available resources of each node, the network bandwidth description, and information about components nearing completion on the current node), PCA is used to transform the states in the offline dataset, and the same transformation is applied during online deployment. The network structure is set to 256→512→512→512→512→2, where the 256-dimensional input is the dimension after PCA dimensionality reduction. The learning rate is set to 0.005, the number of updates is set to 2,500,000, the initial alpha value is 0.8, and the target alpha is set to 1.

[0116] LSTM+Transformer (LsTr): The method proposed in this embodiment combines LSTM and Transformer. Similarly, PCA is introduced to reduce the initial dimensionality, also to 256 dimensions. In the LSTM module, a single LSTM layer and three fully connected 256-dimensional layers are used to achieve input dimensionality reduction and output transformation. The Transformer network consists of four stacked attention layers, each containing eight attention heads. The learning rate is set to 0.005, the batch size to 256, and the update frequency to 4000. Four different sequence lengths are selected based on the settings: 10 / 30 / 60 / 90, denoted as LsTr+10 / 30 / 60 / 90 respectively.
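The sequence length setting (10 / 30 / 60 / 90) can be read as the number of most recent steps retained as input to the Transformer. A minimal Python sketch of such a fixed-length history buffer follows. The class name HistoryWindow and the contents of each entry (hidden state, action, reward) are assumptions made for illustration; the embodiment does not specify how the history is stored.

```python
from collections import deque


class HistoryWindow:
    """Keep only the most recent `length` steps of deployment history."""

    def __init__(self, length):
        if length <= 0:
            raise ValueError("length must be positive")
        # A deque with maxlen drops the oldest entry automatically when full.
        self._buf = deque(maxlen=length)

    def push(self, hidden_state, action, reward):
        """Record one step; the entry layout is a hypothetical choice."""
        self._buf.append((hidden_state, action, reward))

    def sequence(self):
        """Return the retained steps, oldest first, as model input."""
        return list(self._buf)


# One buffer per evaluated configuration: LsTr+10 / 30 / 60 / 90.
windows = {n: HistoryWindow(n) for n in (10, 30, 60, 90)}
for step in range(100):
    for w in windows.values():
        w.push(hidden_state=[0.0], action=step % 3, reward=float(step))
```

A longer window exposes more history to the model at the cost of a longer input sequence per decision.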

[0117] As shown in Figure 18, when the sequence length is extended to 90, the LsTr algorithm outperforms the other algorithms in throughput and acceptance rate, apart from a difference of nearly 0.5 seconds in cumulative average execution time. The fact that the different deployment algorithms based on prior knowledge show similar performance indicates that, in this scenario, a relatively fixed deployment scheme may not achieve optimal results. While the initial impact may be minimal, over a long period unreasonable resource consumption can reduce the request acceptance rate. Therefore, the impact of the different deployment schemes on the resources of the three clusters is plotted in Figure 19.

[0118] As shown in Figure 19, most of the manually defined prior algorithms have an average system resource utilization of around 0.7 across the three clusters. The CQL algorithm exhibits significant differences in resource utilization between clusters, with an average utilization exceeding 0.9 for cluster 0, while the average node resource utilization for clusters 1 and 2 is between 0.4 and 0.7. This indicates that the CQL algorithm tends to overload cluster 0 with a large number of tasks and has difficulty adapting to heterogeneous node resources. In contrast, the LsTr algorithm proposed in this embodiment, with a window size of 90, achieves a resource utilization greater than 0.8 for every cluster and maintains a similar level of resource utilization across the different clusters. This further demonstrates the algorithm's resource management capability: it can select appropriate task sequences and corresponding deployment modes for different types of clusters, thus reducing resource waste.

[0119] Considering that the size of the observable historical data window may also affect performance, this experiment evaluates the impact of window size. Four different window sizes were tested, and the corresponding reward results are shown in Figure 20. In Figure 20, (a) and (b) show the reward values recorded during the runtime of each algorithm, and (c) shows the distribution of rewards obtained by each algorithm. It can be seen that the window size does affect the performance of LsTr. When the window size is set to 10, which is significantly smaller than the maximum number of requests initiated, a reasonable deployment scheme cannot be obtained, and its reward performance is the worst among all algorithms. When the window size is increased to 30 and 60, the reward gradually approaches that obtained with a window size of 90, indicating that a larger window improves the model's performance because longer-term system actions and historical information are taken into account. However, this comes at the cost of a longer single-step processing time, so a trade-off between window sizes may be required. Among all algorithms, the CQL algorithm exhibits the largest interquartile range (the distance between the Q1 and Q3 quantiles) and also the largest difference between the maximum and minimum values, indicating significant fluctuations in the obtained rewards. This further illustrates the poor performance of the CQL algorithm in the POMDP problem due to state jumps. In summary, this algorithm enables the collaborative deployment of industrial software components. Based on the current system status observation and the optimization objective, it selects the most suitable pending request together with the corresponding deployment cluster and preference, and deploys the request through the deployment scheduler within the cluster.
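The two spread measures used above to compare reward distributions (the Q1 to Q3 distance and the maximum to minimum distance) can be computed as in the following Python sketch. The function name reward_spread and the sample values are illustrative; the numbers are synthetic and are not the data behind Figure 20. The "inclusive" quantile method is one of several common conventions and is chosen here as an assumption.

```python
from statistics import quantiles


def reward_spread(rewards):
    """Return the interquartile range and the full range of a reward series."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = quantiles(rewards, n=4, method="inclusive")
    return {"iqr": q3 - q1, "range": max(rewards) - min(rewards)}


# Synthetic reward series for two hypothetical algorithms.
stable = [10, 10, 11, 11, 12]
volatile = [0, 5, 10, 15, 20]

stable_spread = reward_spread(stable)
volatile_spread = reward_spread(volatile)
```

A larger interquartile range means the middle half of the observed rewards is more dispersed, while a larger range also reflects the extremes; reporting both, as the paragraph does, separates typical fluctuation from outliers.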

[0120] This embodiment also provides an edge controller computing device for implementing edge-end collaborative management, comprising: a system data bus, a cache, a non-volatile physical storage medium, and at least one multi-core central processing unit. The non-volatile physical storage medium contains a machine-readable sequence of computer program instructions. The at least one multi-core central processing unit reads the computer program instruction sequence from the non-volatile physical storage medium via the system data bus and loads it into the cache for execution, thereby triggering the edge controller computing device to execute the industrial software component collaborative deployment method for edge computing environments in industrial clusters described in the above embodiment. In actual deployment, the edge controller computing device can adopt hardware forms such as industrial-grade servers, edge computing gateways, or embedded computing platforms with GPU acceleration capabilities.

[0121] This embodiment also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements all the steps of the collaborative deployment method for industrial software components in an edge computing environment for industrial clusters described in the above embodiment. The computer-readable storage medium includes, but is not limited to, read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk, flash memory, or any medium capable of storing computer program code.

[0122] The above description is merely a preferred embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

[0123] The contents not described in detail in this specification are existing technologies known to those skilled in the art.

Claims

1. A method for collaborative deployment of industrial software components in the edge environment of an industrial cluster, characterized in that the following steps are performed using the physical processors and memory of the deployment control system: a workflow blueprint construction step: receiving an industrial control task request containing a natural language character sequence, calling a pre-trained language model to parse the natural language instructions, and, in combination with a dynamic modular retrieval-augmented generation matching mechanism, parsing and mapping the industrial control task request into a structured deployment blueprint composed of multiple edge physical industrial software components, wherein the data structure of the structured deployment blueprint includes a dimension-reduced and compressed component call sequence and its corresponding network data interaction interface and storage resource consumption characteristics; a model quantization execution step: analyzing the computationally intensive industrial software components containing deep learning models in the structured deployment blueprint, calculating the sensitivity score of each output channel in the deep learning model with respect to the accuracy of the underlying mechanical control commands, assigning differentiated numerical quantization bit widths to the weight tensors of different output channels, calling the general post-training quantization algorithm to perform matrix truncation and error compensation calculations on each computation group, generating lightweight industrial software components, and performing physical resource consumption quantization calculations on the lightweight industrial software components to generate computing power requirement constraints; and a resource collaborative scheduling step: obtaining the operating status data of heterogeneous edge computing nodes within the target industrial cluster area, constructing the operating status data into an observation state vector, and, within the framework of a partially observable Markov decision process, using a cascaded long short-term memory network to extract the implicit state representation under discontinuous observations from the observation state vector and inputting it into the Transformer model; based on the computing power requirement constraints, performing autoregressive sequence calculation to generate routing deployment instructions for multiple cluster physical nodes, and distributing the lightweight industrial software components to the corresponding physical edge nodes through the network to complete container instantiation and execution.

2. The method according to claim 1, characterized in that the workflow blueprint construction step includes: constructing, in the physical storage medium, a three-level independent vector database system comprising a task splitting library, a component interface attribute library, and a historical orchestration library; upon receiving the industrial control task request, invoking a retrieval module to extract matching task sub-templates from the task splitting library under a semantic vector similarity matching condition; inputting the extracted matching task sub-templates into the language model to generate structured sub-task feature vectors, and performing vector retrieval in the component interface attribute library using the sub-task feature vectors to locate the corresponding physical software component application programming interface entity; constructing a component call sequence based on the physical software component application programming interface entity, and filtering out redundant natural language interpretation characters generated by the language model according to an agreed dimensionality reduction protocol to generate the structured deployment blueprint in a specific compressed format; and translating the structured deployment blueprint into a standard deployment description file.

3. The method according to claim 2, characterized in that, after generating the structured deployment blueprint in a specific compressed format and before translating it into the standard deployment description file, the method further includes: triggering the workflow state machine verification mechanism to perform multi-dimensional verification on the structured deployment blueprint, wherein the dimensions of the multi-dimensional verification include the number of components, component names, event connection relationships, data connection relationships, and event-data association relationships; and, after the multi-dimensional verification is passed, outputting the verified structured deployment blueprint.

4. The method according to claim 1, characterized in that the step of calculating the sensitivity score of each output channel in the deep learning model with respect to the accuracy of the underlying mechanical control commands includes: loading the network layers of the target deep learning model and extracting the output channel dimension array of each layer; calculating the gradient vector of the model loss function using the historical sampling dataset of the physical device's operating status, and calculating the main diagonal elements of the Fisher information matrix from the gradient vector to approximate the Hessian matrix of second derivatives; and traversing each output channel in the output channel dimension array and calculating, based on the approximate Fisher information matrix formula, the impact of numerical truncation of a single output channel on the overall network output loss, which is used as the sensitivity score of that output channel with respect to the accuracy of the underlying mechanical control commands.

5. The method according to claim 4, characterized in that the step of assigning differentiated numerical quantization bit widths to the weight tensors of different output channels includes: comprehensively evaluating the sensitivity score in conjunction with the physical execution function attributes of the component and dividing it into multiple priorities; and mapping the priorities to the control bit widths of specific physical channels, wherein a first-precision floating-point bit width is allocated to high-sensitivity output channels that directly output underlying mechanical control electrical signals, a second-precision fixed-point bit width is allocated to medium-sensitivity output channels that handle intermediate state data transmission, and a third-precision bit width is allocated to low-sensitivity output channels that handle asynchronous edge logging system recording; the bit width of the first precision is greater than that of the second precision, and the bit width of the second precision is greater than that of the third precision.

6. The method according to claim 4, characterized in that the step of calling the general post-training quantization algorithm to perform matrix truncation and error compensation calculations on each computation group includes: dividing all output channels into multiple independent computation groups based on the differentiated numerical quantization bit widths allocated to the different output channels; adopting a group-isolated update strategy during the channel-by-channel quantization iteration, wherein, for channels within the currently processed computation group, after numerical truncation of the weight matrix is performed, real-time error compensation calculation based on the inverse Hessian matrix is performed to update the weights of the other related channels within the same group, and, for channel weight tensors outside the currently processed computation group, their corresponding cumulative error update amounts are temporarily stored in a cache and the memory write operation of those channel weight tensors is suspended; and, after all channel quantization iterations within the currently processed computation group are completed, using a delayed batch processing mechanism to merge and apply the error update amounts in the cache in a single batch.

7. The method according to claim 1, characterized in that, in the model quantization execution step, the process of generating the computing power requirement constraints includes: for the static physical memory requirements of the lightweight industrial software component, analyzing the linear layer weight tensor dimensions of the multi-head attention module and the feedforward neural network module in the deep learning model included in the lightweight industrial software component to estimate the memory usage of the parameter layers and obtain the number of bytes of static memory occupied; for the dynamic operating resource requirements of the lightweight industrial software component, calculating the peak dynamic memory occupied by the key-value caching mechanism in the autoregressive generation task in combination with the preset or estimated maximum output sequence length; tracing the tensor matrix multiplication path in the deep learning model and accumulating the total number of floating-point operations required for the pre-filling calculation stage and the decoding generation stage; and packaging and serializing the number of bytes of static memory occupied, the peak dynamic memory, and the total number of floating-point operations to generate a rigid physical resource constraint input vector, which is used as the computing power requirement constraint.

8. The method according to claim 1, characterized in that the process of extracting the implicit state representation under discontinuous observations in the resource collaborative scheduling step includes: obtaining the observation state vector within the current time slot, wherein the observation state vector is formed by concatenating the remaining available CPU processing cycles of each edge physical node, the task queue characteristics, and the current bandwidth throughput status of the network links, together with the resource release identifier data of the recently allocated running component on the target node that is closest to its estimated completion time; and feeding the observation state vector containing the latency compensation information and the implicit state of the previous time slot synchronously into the long short-term memory network, and computing and outputting, through the recurrent gating units, the current implicit state representation with smooth local state transitions.

9. The method according to claim 8, characterized in that the process of generating routing deployment instructions for multiple cluster physical nodes in the resource collaborative scheduling step includes: inputting in parallel into the Transformer model the current implicit state representation, the computing power requirement constraints, the action vector list of historically deployed clusters, and the target reward adjustment factor representing the minimization of network throughput latency, wherein the target reward adjustment factor is composed of the component data transmission latency and the physical resource occupancy ratio; performing multi-dimensional correlation calculation using the self-attention mechanism of the Transformer model; and outputting the globally optimal physical cluster identifier and the local routing instructions to be allocated to the current industrial software component, and triggering the target cluster controller to perform container image pull and deployment actions.

10. An edge controller computing device for implementing edge-end collaborative management, characterized in that it comprises: a system data bus, a cache, a non-volatile physical storage medium, and at least one multi-core central processing unit; the non-volatile physical storage medium contains a machine-readable sequence of computer program instructions; and the at least one multi-core central processing unit reads and runs the sequence of computer program instructions, triggering the edge controller computing device to execute the method for collaborative deployment of industrial software components in the edge environment of an industrial cluster as described in any one of claims 1 to 9.