Federated incremental learning method in an automatic variable load system of an air separation plant
By optimizing the automatic variable load system of the air separation unit through federated incremental learning and dual attention mechanism, the problems of uneven oxygen consumption and data sharing were solved, the response speed and production stability were improved, and energy waste was reduced.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA NAT AIR SEPARATION ENG CO LTD
- Filing Date
- 2022-06-30
- Publication Date
- 2026-06-30
AI Technical Summary
In air separation units, the uneven oxygen consumption and the coupling of complex process parameters lead to large fluctuations in production load and slow response speed. Furthermore, existing technologies cannot effectively utilize data sharing among multiple companies, resulting in low efficiency of automatic load change systems, inability to adjust in a timely manner, and energy waste.
We employ a federated incremental learning approach, combining an incremental learning module and a dual attention mechanism. We pre-train the model using a self-built dataset, and optimize model aggregation by leveraging convolutional neural networks and feature attention mechanisms. This reduces the need for retraining and improves model adaptability and efficiency.
This enables the system to operate without retraining when faced with new tasks and data, improving the variable load response speed and production stability of the air separation unit, reducing energy waste, and enhancing the learning efficiency and accuracy of the automatic variable load system.
Figure CN115169591B_ABST
Description
Technical Field
[0001] This invention relates to a federated incremental learning method for an automatic load change system of an air separation unit, used to learn data from the existing automatic load change system of the air separation unit.
Background Technology
[0002] In the steel industry, due to the unique nature of the processes (such as converter top blowing, intermittent oxygen use, continuous oxygen enrichment in blast furnaces, and pulverized coal injection), the instantaneous oxygen consumption is large and discontinuous. Furthermore, the varying sizes of converters result in different peak and trough periods for oxygen consumption, leading to highly uneven oxygen demand, characterized by periodicity, stages, and intermittent patterns. This necessitates significant fluctuations in the production load of air separation units (75%–105%) to adapt to these changes. Without automatic load-changing technology, production loads often cannot be adjusted in a timely manner, forcing the release of excess oxygen, resulting in high oxygen emissions and substantial energy consumption and economic losses. Statistics show that the oxygen emission rate in Chinese steel plants is generally between 5% and 12%, while a 1% reduction in emission rate can save 2 million RMB per year, representing a significant energy-saving potential. Therefore, how to comprehensively apply advanced process modeling, optimization, and control technologies to achieve an automatic load-changing system for air separation units has become an urgent need for stable control and energy conservation in the air separation industry.
[0003] Air separation units involve numerous devices and long processes, with significant coupling between various process parameters. Traditional DCS control, based on single-loop PID controllers, is ill-suited to handle the control requirements of such complex processes. Therefore, frequent operator intervention is necessary when the air separation unit load changes. Even then, due to varying individual experience and different operating conditions, load changes often result in large fluctuations and slow response times, leading to waste such as venting.
[0004] Existing MPC (multivariable model predictive control) technology provides advanced controllers designed specifically for complex processes. These controllers use models that reflect the relationships between variables (dynamic responses and steady-state gains). By predicting the process variables of the equipment, corresponding control and adjustment measures can be formulated in a timely manner, thereby ensuring the stability of the production equipment, reducing fluctuations, and improving the variable load response rate of the air separation unit.
[0005] However, the problem is that this multivariate model prediction technology requires precise modeling, is affected by various variables and noise, and cannot be extended to new tasks; the model needs to be redesigned for new tasks. Compared to traditional model-based prediction techniques, new data mining-based predictive control methods can solve these problems when there is sufficient data, but this solution also has requirements regarding the amount of data generated. Waiting for effective data from a single plant is often lengthy. Utilizing similar data from multiple homogeneous plants can save this waiting time. However, homogeneous plants often belong to different companies, and for confidentiality reasons, the owners are often unwilling to share data directly, but they all hope to save this waiting time, improve learning efficiency, and quickly obtain a data mining-based predictive control method to achieve a relatively accurate automatic load change system for the air separation unit. In addition, production tasks are constantly changing with development; new tasks will be continuously generated over time, making this another incremental task.
[0006] In the field of artificial intelligence, public concern for privacy and data security is constantly increasing. Federated learning can provide a good solution to problems such as data silos, severe data fragmentation, data heterogeneity, and uneven data distribution. Currently, machine learning and deep learning have also achieved great success in various fields, laying the foundation for federated learning algorithms to achieve better performance.
[0007] However, conventional federated learning algorithms can only be trained in batch settings, where all object classes are known in advance and training data for all classes can be accessed simultaneously in any order. When faced with new tasks and data, retraining is required.
Summary of the Invention
[0008] The purpose of this invention is to overcome the above-mentioned shortcomings in the prior art and to provide a federated incremental learning method in an automatic variable load system of an air separation device with a reasonable structural design. Incremental learning is introduced into federated learning to help each client train data when there are only a few classes initially. As more classes are added, they can be added gradually for learning, and there is no need to retrain when facing new tasks and data.
[0009] The technical solution adopted by this invention to solve the above problems is: a federated incremental learning method in an automatic variable load system of an air separation device, characterized by the following steps:
[0010] Step 1: Set up the federated incremental learning module, including the following steps:
[0011] Step 1.1: Build a self-built dataset. Randomly divide each class of the self-built dataset into two parts and send them to each local client incrementally according to the class. Each local client performs incremental learning on the self-built dataset according to the two classes.
[0012] Step 1.2: The client first trains the local model parameters using a fixed number of samples of class 1, then aggregates the local model parameters into the global model using the federated averaging algorithm, and sends the aggregated model update back to the client. This process is repeated until the model converges, reaches the maximum number of iterations, or reaches the maximum training time. The global model parameters at this time are saved, and this step is used as pre-training.
[0013] Step 1.3: Distribute the pre-trained global model parameters to the client, and the client trains using the remaining samples of class 1; after training, send the client model parameters at the time of the t-th communication to the server, and the server sends the fused client model parameters to the client.
[0014] Step 1.4: The client then uses class 2 samples and some old data to form a training set for training. It uses a feature extractor to extract feature vectors from the new and old data and calculates their respective average feature vectors. It calculates the predicted values of the new and old data using the nearest mean classification algorithm. The predicted values are then substituted into a loss function that combines distillation and classification loss for optimization. The client model parameters at communication t+1 are obtained and sent to the server. This process is repeated until all classes have completed incremental learning.
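The pre-training loop of Step 1.2 can be sketched as follows. This is a minimal illustration and not the patented implementation: each client is assumed to fit a toy one-parameter model y = w·x, and the names `local_train`, `fed_avg`, and `pretrain`, the sample data, and the tolerance and limits are all hypothetical. Because every client contributes the same number of class 1 samples, the average is taken with equal weights.

```python
import time

def local_train(w, samples, lr=0.2, epochs=5):
    """Toy local update: fit y = w * x by gradient descent on squared error."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
    return w

def fed_avg(client_weights):
    """Federated averaging with equal client weights (equal sample counts)."""
    return sum(client_weights) / len(client_weights)

def pretrain(clients, w0=0.0, tol=1e-6, max_rounds=100, max_seconds=5.0):
    """Repeat local training and averaging until convergence,
    the round limit, or the time limit is reached."""
    w_global, start = w0, time.monotonic()
    for rounds in range(1, max_rounds + 1):
        local = [local_train(w_global, data) for data in clients]
        w_new = fed_avg(local)
        converged = abs(w_new - w_global) < tol
        w_global = w_new  # the aggregated update is sent back to the clients
        if converged or time.monotonic() - start > max_seconds:
            break
    return w_global, rounds

# Hypothetical class 1 samples (x, y) for client a and client b, same count each.
client_a = [(0.5, 1.0), (1.0, 2.0)]
client_b = [(0.2, 0.4), (0.8, 1.6)]
w_pre, n_rounds = pretrain([client_a, client_b])
```

The saved `w_pre` plays the role of the pre-trained global parameters that Step 1.3 distributes back to the clients.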
[0015] Step 2: Build a federated learning dual attention mechanism model, including the following steps:
[0016] Step 2.1: Construct convolutional layers using a convolutional neural network to extract features from the input image and obtain matrix U; add a channel attention mechanism locally.
[0017] Step 2.2: Perform the excitation operation to capture the correlation between channels;
[0018] Step 2.3: Apply the channel attention weights to the previous features one channel at a time using scale multiplication to complete the recalibration of the original feature map in the channel dimension;
[0019] Step 2.4: Introduce a feature attention mechanism during global model aggregation to perform client-side model aggregation. This improves model performance by capturing the importance of neural network layers in multiple local models.
[0020] Step 2.5: Use a hierarchical soft attention method to capture the hierarchical importance of neural networks in multiple local models, and aggregate them as feature attention into the global model;
[0021] Step 2.6: The importance weights of the client model are calculated using a hierarchical soft attention method, with the server-side model as a query value and the client model as a key value, and the attention score of each layer in the neural network is calculated.
[0022] Step 2.7: During incremental learning, the number of neurons output by the last fully connected layer of the model is the number of dataset categories, which is dynamically changing. This causes a weight mismatch problem when loading the local model parameters to the server and calculating the attention score for each layer together with the global model from the previous communication. Therefore, before calculating the attention score for each layer, the weights of the last fully connected layer in all client models need to be averaged and then assigned to the last layer parameters of the global model from the previous communication.
[0023] In step 1.1 of this invention, the settings configured for the self-built dataset include the learning rate, weight decay coefficient, training batch size, and number of local clients.
[0024] In step 1.1 of this invention, stochastic gradient descent is used as the optimizer, the initial learning rate is 0.2, the weight decay coefficient is set to 0.00001, and the training batch size is 128.
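As an illustration of these settings, the sketch below applies one stochastic gradient descent step and splits samples into batches. The text does not say how weight decay enters the update, so the sketch assumes the common form in which it is added to the gradient as an L2 term; `sgd_step` and `minibatches` are hypothetical helper names.

```python
import random

LEARNING_RATE = 0.2      # initial learning rate stated in the text
WEIGHT_DECAY = 0.00001   # weight decay coefficient stated in the text
BATCH_SIZE = 128         # training batch size stated in the text

def sgd_step(weights, grads, lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY):
    """One SGD update; weight decay is ASSUMED to act as an L2 term on the gradient."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

def minibatches(samples, batch_size=BATCH_SIZE, seed=0):
    """Shuffle the samples and yield them in batches of at most batch_size."""
    rng = random.Random(seed)
    order = list(samples)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

batches = list(minibatches(range(300)))   # 300 hypothetical sample indices
updated = sgd_step([1.0, -2.0], [0.5, 0.0])
```

With 300 samples the last batch is smaller than 128; whether such a partial batch is kept or dropped is a design choice the text leaves open.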
[0025] In step 2.1 of this invention, the channel attention mechanism includes compression (squeeze), which transforms each two-dimensional feature channel into a real number $F_{\text{squeeze}}(u_c)$. The specific formula is as follows:
[0026] $$z_c = F_{\text{squeeze}}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$$
[0027] where $u_c$ is the c-th two-dimensional matrix in matrix U and the subscript c is the channel index. Using global pooling, the input features of size H×W×C are compressed into a 1×1×C vector. The current number of channels is 16.
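The compression step can be sketched in a few lines. The sketch assumes that the global pooling is average pooling, as in the usual squeeze-and-excitation design (the text says only "global pooling"), and the nested-list layout and the function name `squeeze` are illustrative choices.

```python
def squeeze(U):
    """Global average pooling: U is a list of C channels, each an H x W list of rows.
    Returns one real number per channel, i.e. a vector of length C."""
    z = []
    for channel in U:
        height, width = len(channel), len(channel[0])
        z.append(sum(sum(row) for row in channel) / (height * width))
    return z

# Hypothetical 2 x 2 feature maps with C = 16 channels; channel c is filled with the value c.
U = [[[float(c)] * 2 for _ in range(2)] for c in range(16)]
z = squeeze(U)
```

Each entry of `z` summarizes a whole feature map, which is why the result is said to carry a global receptive field.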
[0028] In step 2.2 of this invention, the correlation between channels, $F_{\text{excite}}(z, W)$, is given by the following formula:
[0029] $$s = F_{\text{excite}}(z, W) = \sigma(g(z, W)) = \sigma(W_2\,\delta(W_1 z))$$
[0030] where δ is the ReLU function and σ is the sigmoid function. The excitation operation uses two fully connected layers. Using r = 2 as the scaling parameter reduces the number of channels and thus the computational cost.
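A small sketch of the excitation operation is given below. It assumes the usual bottleneck layout in which the first fully connected layer maps C channels to C/r and the second maps back to C; the text does not give the layer shapes, and the random weights are placeholders rather than trained values.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def excitation(z, W1, W2):
    """s = sigmoid(W2 * relu(W1 * z)); W1 is ASSUMED (C/r) x C and W2 is C x (C/r)."""
    return sigmoid(matvec(W2, relu(matvec(W1, z))))

C, r = 16, 2                       # 16 channels and scaling parameter r = 2, as in the text
rng = random.Random(0)
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(C)] for _ in range(C // r)]
W2 = [[rng.uniform(-0.5, 0.5) for _ in range(C // r)] for _ in range(C)]
z = [c / C for c in range(C)]      # hypothetical squeezed channel descriptors
s = excitation(z, W1, W2)
```

The sigmoid keeps every channel weight strictly between 0 and 1, so the weights act as soft gates rather than a hard channel selection.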
[0031] In step 2.4 of this invention, the feature attention mechanism automatically takes into account the weight of the relationship between the server model and each client model. During iterative training, the parameters are continuously updated to reduce the weighted distance between the server and client models, that is, to minimize the expected distance between them by optimizing the expectation function of the objective.
[0032] The expectation function of the optimization objective is as follows:
[0033] $$\arg\min_{w_{t+1}} \sum_{k=1}^{K} \frac{1}{2}\, \mathrm{att}_k\, D\!\left(w_t, w_{t+1}^{k}\right)^{2}$$
[0034] where $w_t$ denotes the model parameters of the server at the t-th communication, $w_{t+1}^{k}$ denotes the model parameters of client k at the (t+1)-th communication, D(·,·) is the distance between the two sets of neural network parameters calculated using the Euclidean distance formula, and $\mathrm{att}_k$ is the importance weight of the client-side model.
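The weighted distance that the aggregation tries to reduce can be evaluated as in the sketch below. The exact published formula is not legible in this copy, so the sketch assumes the objective is the sum over clients of one half of the importance weight times the squared Euclidean distance; parameters are flattened into plain lists, and all values are hypothetical.

```python
import math

def euclidean(w_a, w_b):
    """Euclidean distance D between two flattened parameter vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w_a, w_b)))

def objective(w_server, client_models, att):
    """ASSUMED form: sum over clients of 0.5 * att_k * D(w_server, w_k)^2."""
    return sum(0.5 * a_k * euclidean(w_server, w_k) ** 2
               for w_k, a_k in zip(client_models, att))

w_t = [0.0, 0.0]                       # hypothetical server parameters
clients = [[3.0, 4.0], [0.0, 2.0]]     # hypothetical parameters of client a and client b
att = [0.5, 0.5]                       # hypothetical importance weights
value = objective(w_t, clients, att)
```

A client with a larger importance weight pulls the server parameters more strongly, because its distance term counts for more in the sum.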
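[0035] In step 2.5 of this invention, the gradient is obtained by differentiating the expectation function, and the K clients perform gradient descent to update the parameters of the global model, as shown in the following formula:
[0036] $$w_{t+1} = w_t - \epsilon \sum_{k=1}^{K} \mathrm{att}_k \left(w_t - w_{t+1}^{k}\right)$$ where $\epsilon$ is the step size.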
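A sketch of this update step follows. It assumes that the gradient of the weighted distance objective with respect to the server parameters is the attention-weighted sum of the differences between the server and client parameters; the step size is a hypothetical parameter that this copy of the text does not specify.

```python
def attentive_update(w_server, client_models, att, step_size=1.0):
    """w <- w - step_size * sum_k att_k * (w - w_k), applied element-wise."""
    new_w = []
    for i, w_i in enumerate(w_server):
        grad_i = sum(a_k * (w_i - w_k[i]) for w_k, a_k in zip(client_models, att))
        new_w.append(w_i - step_size * grad_i)
    return new_w

w_t = [0.0, 0.0]                       # hypothetical server parameters
clients = [[3.0, 4.0], [0.0, 2.0]]     # hypothetical parameters of client a and client b
att = [0.5, 0.5]                       # hypothetical importance weights
w_next = attentive_update(w_t, clients, att)
```

With a step size of 1 and weights that sum to 1, the update reduces to the attention-weighted average of the client parameters; a smaller step size moves the server only part of the way toward that average.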
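[0037] In step 2.6 of this invention, the attention weight of the l-th layer, $\mathrm{att}_k^{l}$, is given by the following formula:
[0038] $$\mathrm{att}_k^{l} = \operatorname{softmax}\!\left(\left\lVert w^{l} - w_k^{l} \right\rVert_p\right) = \frac{\exp\left(\left\lVert w^{l} - w_k^{l} \right\rVert_p\right)}{\sum_{j=1}^{K} \exp\left(\left\lVert w^{l} - w_j^{l} \right\rVert_p\right)}$$
[0039] where $w^{l}$ denotes the model parameters of the l-th layer of the server and $w_k^{l}$ denotes the model parameters of the l-th layer of the k-th local client, l∈[1,L]. The p-norm of the difference between the two matrices is used as the similarity value between the query and the key of the l-th layer. The softmax function is applied to obtain the attention value of the l-th layer of the k-th client, and the feature attention of the entire client, $\mathrm{att}_k = \{\mathrm{att}_k^{1}, \ldots, \mathrm{att}_k^{L}\}$, is then calculated.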
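The layer-wise attention can be sketched as below. The server layer acts as the query and each client layer as a key; the sketch assumes that the softmax is taken over the clients for each layer (the text does not state the normalization axis) and that each layer is flattened into a list. The function names are illustrative.

```python
import math

def p_norm_diff(w_a, w_b, p=2):
    """p-norm of the difference between two flattened layers."""
    return sum(abs(a - b) ** p for a, b in zip(w_a, w_b)) ** (1.0 / p)

def softmax(scores):
    m = max(scores)                         # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def layer_attention(server_layers, client_layers_list, p=2):
    """Return att[k][l]: for each layer l, a softmax over clients (ASSUMED axis)
    of the p-norm distance between the server layer and the client layer."""
    att = [[0.0] * len(server_layers) for _ in client_layers_list]
    for l, w_l in enumerate(server_layers):
        scores = [p_norm_diff(w_l, client[l], p) for client in client_layers_list]
        for k, a in enumerate(softmax(scores)):
            att[k][l] = a
    return att

server = [[0.0, 0.0], [1.0, 1.0]]           # two hypothetical layers
client_a = [[3.0, 4.0], [1.0, 1.0]]
client_b = [[0.0, 0.0], [1.0, 1.0]]
att = layer_attention(server, [client_a, client_b])
```

Under this form, a client layer that differs more from the server layer receives a larger weight, while layers that are identical across clients are weighted equally.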
[0040] In step 2.7 of this invention, the last-layer weights are averaged as follows:
[0041] $$w^{L} = \frac{1}{K} \sum_{k=1}^{K} w_k^{L} = \frac{1}{2}\left(w_a^{L} + w_b^{L}\right)$$
[0042] where K is the number of clients and $w^{L}$ is the average of the last-layer weight parameters of the client-side neural networks. The number of local clients is set to 2, namely client a and client b; $w_a^{L}$ denotes the last-layer weight parameters of the neural network of client a, and $w_b^{L}$ denotes the last-layer weight parameters of the neural network of client b.
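The last-layer alignment of step 2.7 can be illustrated as follows. In this sketch a model is a list of layers and each layer is a matrix stored as a list of rows, one row per output neuron; this layout, the helper names, and the numbers are assumptions made for the example.

```python
def average_last_layer(client_last_layers):
    """Element-wise mean of the clients' last fully connected layers."""
    num_clients = len(client_last_layers)
    rows, cols = len(client_last_layers[0]), len(client_last_layers[0][0])
    return [[sum(layer[i][j] for layer in client_last_layers) / num_clients
             for j in range(cols)] for i in range(rows)]

def align_global_model(global_layers, client_models):
    """Replace the global model's last layer with the client average,
    so that every layer has matching shapes before attention scoring."""
    aligned = list(global_layers[:-1])
    aligned.append(average_last_layer([model[-1] for model in client_models]))
    return aligned

# Global model from the previous communication: its last layer has 1 output neuron.
global_prev = [[[0.1, 0.2]], [[0.5, 0.5]]]
# After a new class is added, each client's last layer has 2 output neurons.
client_a = [[[0.1, 0.3]], [[1.0, 0.0], [0.0, 2.0]]]
client_b = [[[0.2, 0.2]], [[3.0, 0.0], [0.0, 4.0]]]
aligned = align_global_model(global_prev, [client_a, client_b])
```

After alignment the global model's last layer has the same number of output neurons as the clients, so a layer-by-layer distance can be computed without a shape mismatch.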
[0043] Compared with existing technologies, this invention has the following advantages and effects. It introduces incremental learning into federated learning, so that each client can begin training when only a small number of classes are available; as more classes arrive, they are learned gradually, and no retraining is needed when facing new tasks and data. A federated averaging model is trained on the same number of samples drawn from each client, which keeps the pre-training samples balanced and yields a global model on the server. The iCaRL strategy is then combined with the federated learning framework to cope with dynamic changes in training tasks while keeping data confidential. A dual attention mechanism is added to the federated incremental learning module: a channel attention neural network model is designed on the client side and used as the local model in federated learning, which helps the model capture the importance of the features of each client's overall samples during training and reduces the impact of noise. Finally, a federated aggregation algorithm based on a feature attention mechanism is designed for the global model to enhance its ability to capture key feature information.
Attached Figure Description
[0044] Figure 1 is the parameter data interface of the relevant controlled objects in an embodiment of the present invention;
[0045] Figure 2 is a schematic diagram of the structure of the federated incremental learning module in an embodiment of the present invention;
[0046] Figure 3 is a schematic diagram of the dual attention mechanism structure according to an embodiment of the present invention;
[0047] Figure 4 is a schematic diagram of the client-side neural network structure according to an embodiment of the present invention;
[0048] Figure 5 is a flowchart of the client-side neural network mechanism according to an embodiment of the present invention;
[0049] Figure 6 is a schematic diagram of the global aggregation structure based on feature attention in an embodiment of the present invention.
Detailed Implementation
[0050] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments. The following embodiments are explanations of the present invention, but the present invention is not limited to the following embodiments.
[0051] This invention mainly comprises two modules: the first module is a federated incremental learning module, which first extracts the same number of samples from each client for pre-training. Considering the cost of retraining, an iCaRL strategy is added to the traditional federated learning framework, which can cope with dynamic changes in training tasks and ensure the confidentiality of user data. The second module is a dual attention mechanism. Considering the impact of large sample clients on the final model training results and the difficulty in compensating for feature imbalance during model aggregation, a channel attention neural network model is designed on the client side, and a federated aggregation algorithm based on the feature attention mechanism is designed on the global model.
[0052] Module 1: Federated Incremental Learning Module, including the following steps:
[0053] Step 1: Build the federated incremental learning module. The structure of the federated incremental learning module is shown in Figure 2, and it includes the following steps:
[0054] Step 1.1: Stochastic gradient descent is used as the optimizer, with an initial learning rate of 0.2, a weight decay coefficient of 0.00001, a training batch size of 128, and two local clients (client a and client b). The self-built dataset is randomly divided into two classes, and the data is incrementally distributed to each local client according to the class distribution. Each local client performs incremental learning on the self-built dataset using the two classes.
[0055] Step 1.2: Each client first trains the model using a fixed number of samples from class 1 to obtain local model parameters. The federated averaging algorithm is then used to aggregate the local model parameters into the global model. The aggregated model update is then sent back to the client. This process is repeated until the model converges, reaches the maximum number of iterations, or reaches the maximum training time. The global model parameters at this point are then saved. Use this step as pre-training.
[0056] Step 1.3: The pre-trained global model parameters are distributed to the client, which then trains using the remaining samples from class 1. After training, the client model parameters at the t-th communication are sent to the server, which fuses the client model parameters and sends the fused parameters back to the client.
[0057] Step 1.4: The client then uses class 2 samples and some old data to form a training set for training. A feature extractor is used to extract feature vectors from both the new and old data, and their respective average feature vectors are calculated. The predicted values for the new and old data are calculated using the nearest mean classification algorithm. These predicted values are then optimized using a loss function combining distillation and classification loss to obtain the client model parameters at communication t+1, which are sent to the server. This process is repeated until all classes have been incrementally learned.
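The nearest mean classification and the combined loss of Step 1.4 can be sketched as follows, in the style of iCaRL. The sketch assumes per-class sigmoid outputs with a binary cross-entropy form for both loss terms, which the text does not spell out; the feature vectors, probabilities, and function names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Average feature vector of one class."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_mean_class(feature, class_means):
    """Pick the class whose mean feature vector is closest to the given feature."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(class_means, key=lambda label: dist(feature, class_means[label]))

def bce(target, prob, eps=1e-12):
    """Binary cross-entropy between a target in [0, 1] and a predicted probability."""
    return -(target * math.log(prob + eps) + (1 - target) * math.log(1 - prob + eps))

def combined_loss(new_probs, old_probs, label, num_old_classes):
    """ASSUMED form: distillation on old-class outputs plus classification on new-class outputs."""
    loss = 0.0
    for c, p in enumerate(new_probs):
        if c < num_old_classes:
            loss += bce(old_probs[c], p)                 # stay close to the previous model
        else:
            loss += bce(1.0 if c == label else 0.0, p)   # learn the new class
    return loss

# Hypothetical extracted features: class 1 is old data, class 2 is new data.
class_means = {1: mean_vector([[0.0, 0.0], [0.0, 2.0]]),
               2: mean_vector([[4.0, 4.0], [6.0, 4.0]])}
pred = nearest_mean_class([4.5, 3.0], class_means)
# Output index 0 is the old class, index 1 is the new class (the true label here).
good = combined_loss([0.2, 0.9], [0.2], label=1, num_old_classes=1)
bad = combined_loss([0.8, 0.1], [0.2], label=1, num_old_classes=1)
```

The distillation term penalizes drift on the old class while the classification term fits the new class, which is how the client can learn class 2 without retraining on all of the class 1 data.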
[0058] Module 2: Dual Attention Mechanism, including the following steps:
[0059] Step 2: Build a federated learning dual attention mechanism model. The structure of the dual attention mechanism is shown in Figure 3, and it includes the following steps:
[0060] Step 2.1: Construct convolutional layers using a Convolutional Neural Network (CNN). The CNN extracts features from the input image to obtain matrix U. A channel attention mechanism is added locally, consisting of compression (squeeze) and excitation. Compression transforms each two-dimensional feature channel into a real number. This real number has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels, as shown in the following formula:
[0061] $$z_c = F_{\text{squeeze}}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$$
[0062] where $u_c$ is the c-th two-dimensional matrix in matrix U and the subscript c is the channel index. Using global pooling, the input features of size H×W×C are compressed into a 1×1×C vector. The current number of channels is 16.
[0063] A schematic diagram of the client-side neural network structure is shown in Figure 4.
[0064] Step 2.2: Perform the excitation operation to capture the correlation between channels, as shown in the following formula:
[0065] $$s = F_{\text{excite}}(z, W) = \sigma(g(z, W)) = \sigma(W_2\,\delta(W_1 z))$$
[0066] where δ is the ReLU function and σ is the sigmoid function. The excitation operation uses two fully connected layers. Using r = 2 as the scaling parameter reduces the number of channels and thus the computational cost.
[0067] Step 2.3: The channel attention weights s are applied channel by channel to the previous features by scale multiplication, completing the recalibration of the original feature map along the channel dimension. The flowchart of the client-side neural network mechanism is shown in Figure 5.
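The recalibration in Step 2.3 amounts to one multiplication per channel, as the short sketch below shows. The nested-list layout of the feature map and the name `recalibrate` are illustrative assumptions.

```python
def recalibrate(U, s):
    """Multiply every element of channel c by its attention weight s[c]."""
    return [[[s_c * value for value in row] for row in channel]
            for channel, s_c in zip(U, s)]

# Two hypothetical 2 x 2 channels and their channel attention weights.
U = [[[1.0, 2.0], [3.0, 4.0]],
     [[1.0, 1.0], [1.0, 1.0]]]
s = [0.5, 0.25]
U_tilde = recalibrate(U, s)
```

The output keeps the H×W×C shape of the input; only the relative strength of the channels changes.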
[0068] Step 2.4: Introduce a feature attention mechanism during global model aggregation to aggregate client-side models. By capturing the importance of neural network layers in multiple local models, model performance is improved. This mechanism automatically takes into account the weight of the relationship between the server model and each client model. During iterative training, it continuously updates the parameters to reduce the weighted distance between the server and client models, that is, to minimize the expected distance between them by optimizing the expectation function of the objective, as shown in the following equation:
[0069] $$\arg\min_{w_{t+1}} \sum_{k=1}^{K} \frac{1}{2}\, \mathrm{att}_k\, D\!\left(w_t, w_{t+1}^{k}\right)^{2}$$
[0070] where $w_t$ denotes the model parameters of the server at the t-th communication, $w_{t+1}^{k}$ denotes the model parameters of client k at the (t+1)-th communication, D(·,·) is the distance between the two sets of neural network parameters calculated using the Euclidean distance formula, and $\mathrm{att}_k$ is the importance weight of the client-side model.
[0071] Step 2.5: Capture the hierarchical importance of neural networks in multiple local models using a hierarchical soft attention method, and aggregate them into the global model as feature attention to minimize the distance between the server and client models. The gradient is obtained by differentiating the expectation function in the previous equation, and the K clients perform gradient descent to update the parameters of the global model, as shown in the following equations:
[0072] $$\nabla = \sum_{k=1}^{K} \mathrm{att}_k \left(w_t - w_{t+1}^{k}\right)$$
[0073] $$w_{t+1} = w_t - \epsilon \sum_{k=1}^{K} \mathrm{att}_k \left(w_t - w_{t+1}^{k}\right)$$ where $\epsilon$ is the step size.
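[0074] Step 2.6: The importance weights of the client-side model, $\mathrm{att}_k$, are calculated using a hierarchical soft attention method, in which the server-side model is used as the query and the client-side model as the key, and an attention score is calculated for each layer of the neural network. The attention weight of the l-th layer is given by the following formula:
[0075] $$\mathrm{att}_k^{l} = \operatorname{softmax}\!\left(\left\lVert w^{l} - w_k^{l} \right\rVert_p\right) = \frac{\exp\left(\left\lVert w^{l} - w_k^{l} \right\rVert_p\right)}{\sum_{j=1}^{K} \exp\left(\left\lVert w^{l} - w_j^{l} \right\rVert_p\right)}$$
[0076] where $w^{l}$ denotes the model parameters of the l-th layer of the server and $w_k^{l}$ denotes the model parameters of the l-th layer of the k-th local client, l∈[1,L]. The p-norm of the difference between the two matrices is used as the similarity value between the query and the key of the l-th layer. The softmax function is applied to obtain the attention value of the l-th layer of the k-th client, and the feature attention of the entire client, $\mathrm{att}_k = \{\mathrm{att}_k^{1}, \ldots, \mathrm{att}_k^{L}\}$, is then calculated.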
[0077] Step 2.7: During incremental learning, the number of output neurons of the last fully connected layer of the model equals the number of dataset categories, which changes dynamically. This causes a weight mismatch when the local model parameters are loaded on the server and the attention score of each layer is calculated together with the global model from the previous communication. Therefore, before calculating the attention score of each layer, the weights of the last fully connected layer in all client models are averaged and then assigned to the last-layer parameters of the global model from the previous communication, as shown in the following formula:
[0078] $$w^{L} = \frac{1}{K} \sum_{k=1}^{K} w_k^{L} = \frac{1}{2}\left(w_a^{L} + w_b^{L}\right)$$
[0079] where K is the number of clients, K = 2.
[0080] Here $w^{L}$ is the average of the last-layer weight parameters of the client-side neural networks. The number of local clients is set to 2, namely client a and client b; $w_a^{L}$ denotes the last-layer weight parameters of the neural network of client a, and $w_b^{L}$ denotes the last-layer weight parameters of the neural network of client b.
[0081] A schematic diagram of the global aggregation structure based on feature attention is shown in Figure 6.
Specific Implementation
[0083] In this embodiment, the relevant data interface is shown in Figure 1. The data source for this embodiment is existing MPC (multivariable model predictive control) technology, such as the "Advanced Control System Technology for the Upgrade and Retrofit Project of Xingcheng Special Steel 6000 Oxygen Plant" by the applicant, China Air Separation Engineering Co., Ltd. The advanced control system software runs on the Windows Server 2016 operating system and communicates with ABB's Freelance 2019 distributed control system through an OPC Server. In this embodiment, the air separation unit (ASU) of the Xingcheng Special Steel oxygen plant mainly includes: 1) a precooling system, 2) a purification system, 3) a pressurization and expansion system, 4) an oxygen and nitrogen distillation system, and 5) an argon distillation system, etc. These systems are the controlled objects of this embodiment and the source of its data inputs; the resulting data are output to these controlled objects. This embodiment continuously records the original advanced control process under different tasks: the parameters of the above five types of equipment shown in Figure 1 are used as the data input, and the oxygen emission rate during safe production at the end of the entire production task is used as the output, yielding a fully automated, unattended recorded dataset for the Xingcheng Special Steel 6000 oxygen plant. The same method is used to build fully automated, unattended recorded datasets for each plant of the same type that the applicant, China Air Separation Engineering Co., Ltd., has supplied to other clients. Thus, in all the datasets, the parameters in Figure 1 are the input, and the oxygen emission rate during safe production is the output, which is the optimization objective.
Because production times differ, different factories hold different amounts of data, and because tasks keep increasing, incremental learning is needed.
[0084] Furthermore, it should be noted that the specific embodiments described in this specification may differ in the shape and name of their components, etc. The above description is merely illustrative of the structure of the present invention. All equivalent or simple variations made based on the structure, features, and principles described in this patent concept are included within the protection scope of this patent. Those skilled in the art can make various modifications or additions to the described specific embodiments or use similar methods to substitute them, as long as they do not deviate from the structure of the present invention or exceed the scope defined by the claims, all of which should fall within the protection scope of this invention.
Claims
1. A federated incremental learning method in an air separation plant automatic load varying system, characterized in that it includes the following steps:
Step 1: Set up the federated incremental learning module, including the following steps:
Step 1.1: Build a self-built dataset, using the control parameters of the air separation unit as the input; randomly divide each class of the self-built dataset into two parts, and send them to each local client incrementally according to the class; each local client performs incremental learning on the self-built dataset according to the two classes;
Step 1.2: The client first trains the local model parameters using a fixed number of samples of class 1, then aggregates the local model parameters into the global model using the federated averaging algorithm, and sends the aggregated model update back to the client. This process is repeated until the model converges, reaches the maximum number of iterations, or reaches the maximum training time. The global model parameters at this time are saved, and this step is used as pre-training.
Step 1.3: Distribute the pre-trained global model parameters to the client, and the client trains using the remaining samples of class 1; after training, send the client model parameters at the time of the t-th communication to the server, and the server sends the fused client model parameters to the client.
Step 1.4: The client then uses class 2 samples and some old data to form a training set for training. It uses a feature extractor to extract feature vectors from the new and old data and calculates their respective average feature vectors. It calculates the predicted values of the new and old data using the nearest mean classification algorithm. The predicted values are then substituted into a loss function that combines distillation and classification loss for optimization. The client model parameters at communication t+1 are obtained and sent to the server. This process is repeated until all classes have completed incremental learning.
Step 2: Build a federated learning dual attention mechanism model, including the following steps:
Step 2.1: Construct convolutional layers using a convolutional neural network to extract features from the input image and obtain matrix U; add a channel attention mechanism locally.
Step 2.2: Perform the excitation operation to capture the correlation between channels;
Step 2.3: Apply the channel attention weights to the previous features one channel at a time using scale multiplication to complete the recalibration of the original feature map in the channel dimension;
Step 2.4: Introduce a feature attention mechanism during global model aggregation to perform client-side model aggregation. This improves model performance by capturing the importance of neural network layers in multiple local models.
Step 2.5: Use a hierarchical soft attention method to capture the hierarchical importance of neural networks in multiple local models, and aggregate them as feature attention into the global model;
Step 2.6: The importance weights of the client model are calculated using a hierarchical soft attention method, with the server-side model as a query value and the client model as a key value, and the attention score of each layer in the neural network is calculated.
Step 2.7: During incremental learning, the number of neurons output by the last fully connected layer of the model is the number of dataset categories, which is dynamically changing. This causes a weight mismatch problem when loading the local model parameters to the server and calculating the attention score for each layer together with the global model from the previous communication. Therefore, before calculating the attention score for each layer, the weights of the last fully connected layer in all client models need to be averaged and then assigned to the last layer parameters of the global model from the previous communication.
2. The federated incremental learning method in an automatic variable load system of an air separation unit according to claim 1, characterized in that: in step 1.1, the training settings for the self-built dataset include the learning rate, weight decay coefficient, training batch size, and number of local clients.
3. The federated incremental learning method in an automatic variable load system of an air separation unit according to claim 2, characterized in that: in step 1.1, stochastic gradient descent is used as the optimizer, with an initial learning rate of 0.2, a weight decay coefficient of 0.00001, and a training batch size of 128.
4. The federated incremental learning method in an automatic variable load system of an air separation unit according to claim 1, characterized in that: in step 2.1, the channel attention mechanism includes compression, which transforms each two-dimensional feature channel into a real number. The specific formula is as follows: $z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$, wherein $z_c$ is the real number obtained for channel c, $u_c$ is the c-th two-dimensional matrix in matrix U, and the subscript c is the channel index; global pooling compresses the input feature of size H×W×C into a 1×1×C vector, and the number of channels C is 16.
5. The federated incremental learning method in an automatic variable load system of an air separation unit according to claim 1, characterized in that: in step 2.2, the correlation between channels $s$ is calculated as follows: $s = F_{ex}(z, W) = \sigma(W_2\,\delta(W_1 z))$, wherein $z$ is the channel vector obtained by compression, $\delta$ is the ReLU function, and $\sigma$ is the sigmoid function. The excitation operation uses two fully connected layers, with weights $W_1$ and $W_2$. Using r=2 as the scaling parameter reduces the number of channels and thus the amount of computation.
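The compression, excitation, and scaling operations of steps 2.1 to 2.3 can be sketched as follows. This is a minimal dependency-free illustration that assumes global average pooling for the compression and omits bias terms in the two fully connected layers; the function names and the tiny example (C=2 channels, r=2, rather than the C=16 of claim 4) are hypothetical.

```python
import math


def squeeze(U):
    """Compression: average each H x W channel of U down to one real number."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in U]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def excite(z, W1, W2):
    """Excitation: FC layer to C/r units with ReLU, then FC layer to C units with sigmoid."""
    hidden = [max(0.0, h) for h in matvec(W1, z)]
    return [1.0 / (1.0 + math.exp(-o)) for o in matvec(W2, hidden)]


def scale(U, s):
    """Recalibration: multiply every value of channel c by its weight s[c]."""
    return [[[s_c * v for v in row] for row in ch] for ch, s_c in zip(U, s)]


# Feature map U with C=2 channels of size 2x2 (illustrative values).
U = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 0.0]]]
W1 = [[0.5, 0.5]]    # shape (C/r) x C = 1 x 2, with r = 2
W2 = [[0.0], [1.0]]  # shape C x (C/r) = 2 x 1

z = squeeze(U)           # one real number per channel
s = excite(z, W1, W2)    # channel attention weights in (0, 1)
recalibrated = scale(U, s)
```

The bottleneck of C/r units between the two fully connected layers is what reduces the computation mentioned in claim 5.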
6. The federated incremental learning method in an automatic variable load system of an air separation unit according to claim 1, characterized in that: in step 2.4, the feature attention mechanism automatically takes into account the weight of the relationship between the server model and each client model. During iterative training, it continuously updates the parameters to reduce the weighted distance between the server model and the client models, thereby minimizing the expected distance between them and optimizing the expectation function of the objective.
7. The federated incremental learning method in an automatic variable load system of an air separation unit according to claim 6, characterized in that: the expectation function of the optimization objective is as follows: $\arg\min_{w_{t+1}} \sum_{k=1}^{K} \left[ \frac{1}{2}\, att_k\, D(w_t, w_{t+1}^k)^2 \right]$, wherein $w_t$ denotes the model parameters of the server at the t-th communication, $w_{t+1}^k$ denotes the model parameters of client k at the (t+1)-th communication, D(·, ·) is the distance between the two sets of neural network parameters, calculated using the Euclidean distance formula, and $att_k$ is the importance weight of the client model.
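8. The federated incremental learning method in an automatic variable load system of an air separation unit according to claim 7, characterized in that: in step 2.5, the gradient is obtained by differentiating the expectation function, and the K clients perform gradient descent to update the parameters of the global model, as shown in the following formulas: $\nabla = \sum_{k=1}^{K} att_k \left( w_t - w_{t+1}^k \right)$, $w_{t+1} \leftarrow w_t - \epsilon \sum_{k=1}^{K} att_k \left( w_t - w_{t+1}^k \right)$, wherein $w_t$ denotes the server model parameters, $w_{t+1}^k$ the model parameters of client k, $att_k$ the importance weight of client k, and $\epsilon$ the step size.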
9. The federated incremental learning method in an automatic variable load system of an air separation unit according to claim 1, characterized in that: in step 2.6, the attention weights of layer l are calculated as follows: $s_k^l = \lVert w^l - w_k^l \rVert_p$, $att_k^l = \mathrm{softmax}(s_k^l) = \frac{e^{s_k^l}}{\sum_{k'=1}^{K} e^{s_{k'}^l}}$, wherein $w^l$ denotes the model parameters of the l-th layer of the server and $w_k^l$ denotes the model parameters of the l-th layer of the k-th local client, l∈[1,L]. The p-norm of the difference between the two matrices is used as the similarity value between the query and the key of the l-th layer, and the softmax function gives the attention value of the l-th layer of the k-th client. The feature attention of the entire client is then $att_k = \{att_k^1, att_k^2, \ldots, att_k^L\}$.
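The layer-wise attention of step 2.6 and the gradient-descent aggregation of step 2.5 can be sketched together as follows. This is a minimal illustration under stated assumptions: each layer is flattened to a plain list, the softmax is taken over the clients, and the function names and the `step` parameter (the step size of the update) are hypothetical.

```python
import math


def p_norm_diff(a, b, p=2):
    """p-norm of the difference between two flattened layers."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def layer_attention(server_layer, client_layers, p=2):
    """Softmax over clients of the p-norm distance to the server layer."""
    scores = [p_norm_diff(server_layer, w_k, p) for w_k in client_layers]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_update(server, clients, step=1.0, p=2):
    """One aggregation round: per layer, move the server weights along the
    attention-weighted differences to the client weights."""
    new_server = []
    for l, w_l in enumerate(server):
        client_layers = [model[l] for model in clients]
        att = layer_attention(w_l, client_layers, p)
        new_server.append([
            w - step * sum(a * (w - w_k[i]) for a, w_k in zip(att, client_layers))
            for i, w in enumerate(w_l)
        ])
    return new_server


# One-layer server model and two clients (illustrative values).
server = [[0.0, 0.0]]
clients = [[[1.0, 0.0]], [[0.0, 1.0]]]

att = layer_attention(server[0], [c[0] for c in clients])
updated = attentive_update(server, clients, step=1.0)
```

Because the attention weights of a layer sum to 1, a step size of 1 makes the updated layer equal to the attention-weighted mean of the client layers; smaller step sizes keep the global model closer to its previous state.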
10. The federated incremental learning method in an automatic variable load system of an air separation unit according to claim 1, characterized in that: in step 2.7, $w^L = \frac{1}{K} \sum_{k=1}^{K} w_k^L$, wherein K is the number of clients and $w^L$ is the average of the last-layer weight parameters of the client neural networks; the number of local clients is set to 2, namely client a and client b, with $w_a^L$ and $w_b^L$ denoting the last-layer weight parameters of the neural networks of client a and client b, respectively.
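The last-layer alignment of step 2.7 can be sketched as follows. This is a minimal illustration that assumes each model is a list of layers and that the last layer is a weight matrix with one row per class; the function names `average_last_layer` and `align_global_head` are hypothetical.

```python
def average_last_layer(client_last_layers):
    """Element-wise mean of the clients' last fully connected layer weights."""
    k = len(client_last_layers)
    rows = len(client_last_layers[0])
    cols = len(client_last_layers[0][0])
    return [
        [sum(w[i][j] for w in client_last_layers) / k for j in range(cols)]
        for i in range(rows)
    ]


def align_global_head(global_layers, client_models):
    """Return a copy of the previous global model whose last layer is replaced
    by the averaged client last layer, so that all layer shapes match."""
    averaged = average_last_layer([model[-1] for model in client_models])
    return list(global_layers[:-1]) + [averaged]


# Previous global model: its last layer still has a single class row.
previous_global = [[[0.1, 0.2]], [[0.0, 0.0]]]

# Clients a and b have already grown their last layer to two classes.
client_a = [[[0.1, 0.2]], [[1.0, 2.0], [3.0, 4.0]]]
client_b = [[[0.3, 0.4]], [[3.0, 4.0], [5.0, 6.0]]]

aligned = align_global_head(previous_global, [client_a, client_b])
```

After this alignment, the global model from the previous communication has the same last-layer shape as the client models, so the per-layer attention scores can be computed without a weight mismatch.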