A non-independent and identically distributed industrial big data joint modeling method

By optimizing the global model update and dynamic weighting algorithm, and selecting local model parameters that are close to the global optimal model, the problems of model bias and poor accuracy caused by non-independent and identically distributed data in industrial big data are solved, improving learning accuracy and reducing communication costs.

CN114676765B · Active · Publication Date: 2026-06-19 · HEBEI UNIV OF TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: HEBEI UNIV OF TECH
Filing Date: 2022-03-15
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

In the industrial sector, due to the different business operations and data standards of various factories, data silos are formed. Existing federated learning methods suffer from severe local model bias when dealing with non-independent and identically distributed data, which affects the optimization efficiency of the global model.

Method used

By optimizing the global model update, a dynamic weighted algorithm calculates the training weights and probabilities of local factories, selects local model parameters that are close to the global optimal model for uploading, and uses a federated distance algorithm to limit the local model offset, and performs multiple rounds of training until the model training ends.

Benefits of technology

It effectively solves the problems of model bias and poor global model accuracy caused by non-independent and identically distributed data, improves the learning accuracy of joint modeling of industrial big data, and reduces the communication cost between local factories and central factories.



Abstract

This invention provides a joint modeling method for non-independent and identically distributed (non-IID) industrial big data, belonging to the technical field of industrial big data classification and joint modeling. It solves the technical problem of model shift and poor global model accuracy during training caused by the large amount of non-independent and identically distributed data in the industrial field. The steps are: each local factory uploads its local non-independent and identically distributed dataset to the central factory; the central factory performs a global model update, calculates the training weight and training probability of each local factory, and selects the local factories that benefit global model optimization to receive the global model parameters; the selected local factories perform local updates, and from the local factory models that have shifted, the local model parameters close to the current optimal global model are selected and uploaded to the central factory; these optimization steps are repeated until model training is complete.


Description

Technical Field

[0001] This invention relates to the field of classification and joint modeling of industrial big data, and in particular to a joint modeling method for non-independent and identically distributed industrial big data.

Background Technology

[0002] The Industrial Internet of Things (IIOT) has significantly increased the speed at which industrial data is transmitted to the supply chain, enabling the widespread application of machine learning methods for handling large amounts of centralized data in industrial manufacturing and production. Efficient machine learning relies on high-quality and abundant training data. However, data privacy issues arising from competition and regulatory policies in the industrial sector prevent companies from sharing data resources, leading to severe fragmentation of training data and hindering the formation of "industry-level" model improvements. Therefore, finding efficient learning methods without exchanging local data becomes particularly important.

[0003] Federated learning is a multi-party collaborative modeling scheme that balances efficiency and privacy protection. It allows multiple parties to jointly train the same model while the data stays distributed locally. Researchers have applied federated learning to various fields to solve data privacy issues between enterprises, with good results. The article [LU Yunlong, HUANG Xiaohong, DAI Yueyue, et al. Differentially Private Asynchronous Federated Learning for Mobile Edge Computing in Urban Informatics[J]. IEEE Transactions on Industrial Informatics, 2020, 16(3): 2134-2143.] developed an asynchronous federated learning framework based on edge computing, which uses distributed peer-to-peer updates instead of centralized updates, providing stronger privacy protection for sensitive users. The article [Yang W, Zhang Y, Ye K, et al. FFD: a federated learning based method for credit card fraud detection[C]//International Conference on Big Data. Springer, Cham, 2019: 18-32.] proposed a privacy-preserving credit card fraud detection method based on federated learning to address the difficulty of large-scale collaborative training caused by data privacy protection. It uses oversampling to balance extremely skewed credit card transaction records, and uses federated learning to build a globally shared fraud detection method.

[0004] The aforementioned literature has addressed the issues of data silos and privacy requirements among enterprises to some extent. However, in practical applications, especially in the industrial sector, differences in business operations and standards among factories, together with data silos caused by competition and regulation, mean that most factories hold data that is not independent and identically distributed. Because the federated averaging algorithm processes even massive amounts of industrial data efficiently, it has become a traditional method for joint modeling in the industrial field. Although this method shows good learning performance when applied to non-independent and identically distributed data, its multiple local updates can cause the local factory models to shift during local training. These shifted local models in turn reduce the optimization efficiency of the global model when the central factory performs global model updates. To address these issues, the paper [Li T, Sahu A K, Zaheer M, et al. Federated Optimization in Heterogeneous Networks[EB/OL]. [2020-4-21]. https://arxiv.org/abs/1812.06127V5.] improves the local objective of the federated averaging algorithm by adding an L2 regularization term to the local objective function to constrain local updates, thereby limiting the degree of local model deviation, and introduces a hyperparameter μ to control the weight of the L2 regularization; the paper [Wang J, Liu Q, Liang H, et al. Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization[J]. 2020.] improves the federated averaging algorithm in the global update stage by normalizing and scaling the local updates according to the number of local steps of each party before updating the global model.
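For reference, the proximal local objective of Li et al. is commonly written as below. The symbols follow the cited paper rather than this patent: $F_k$ is the local loss of party k, $w^{t}$ is the global model received in round t, and μ is the regularization weight (not to be confused with the upload proportion that this document later also calls μ).

```latex
\min_{w} \; h_k(w;\, w^{t}) \;=\; F_k(w) \;+\; \frac{\mu}{2}\,\lVert w - w^{t} \rVert^{2}
```

With μ = 0 the extra term vanishes and the local objective reduces to the plain local loss used by federated averaging.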
The above solutions have all, to some extent, addressed the poor training performance of traditional joint modeling methods on non-independent and identically distributed data. However, each considers the impact of such data at only one stage. In the industrial field, because industrial data comes from different local factories, non-independent and identically distributed data runs through the entire industrial process, so both the local update stage and the global update stage must be considered.

Summary of the Invention

[0005] The purpose of this invention is to provide a joint modeling method for non-independent and identically distributed industrial big data, which solves the technical problem of model bias and poor global model accuracy during training caused by a large amount of non-independent and identically distributed data in the industrial field.

[0006] In a first aspect, the present invention provides a method for joint modeling of non-independent and identically distributed industrial big data, comprising the following steps:

[0007] Upload local datasets: each local factory uploads its local non-independent and identically distributed dataset to the central factory.

[0008] Optimize the global model: the central factory updates the global model, calculates the training weight and training probability of each local factory, and selects the local factories that benefit global model optimization to receive the global model parameters.

[0009] Upload local model parameters: the selected local factories perform local updates; from the local factory models that have shifted, the local model parameters close to the current optimal global model are selected and uploaded to the central factory.

[0010] Repeat the steps of optimizing the global model and uploading local model parameters until the model training is complete.
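The round structure described in these steps can be sketched as follows. This is a toy illustration under stated assumptions, not the patented implementation: the function names (`local_update`, `weighted_choice_without_replacement`, `run_rounds`) are hypothetical, the local loss is a simple mean squared distance to local data points, the training probabilities stay fixed at their data-proportional starting values, closeness is measured with the plain Euclidean distance, and aggregation is a data-size-weighted average.

```python
import math
import random

def local_update(w, points, lr=0.2, steps=5):
    """Gradient steps on the toy local loss mean(||w - x||^2); stands in for local training."""
    w = list(w)
    for _ in range(steps):
        for j in range(len(w)):
            grad = 2.0 * sum(w[j] - x[j] for x in points) / len(points)
            w[j] -= lr * grad
    return w

def weighted_choice_without_replacement(rng, probs, n):
    """Draw n distinct factory indices, each draw proportional to the remaining probabilities."""
    pool = list(range(len(probs)))
    picked = []
    for _ in range(n):
        idx = rng.choices(pool, weights=[probs[i] for i in pool], k=1)[0]
        picked.append(idx)
        pool.remove(idx)
    return picked

def run_rounds(datasets, rounds=30, n_select=4, mu=0.5, seed=0):
    """datasets: one list of points per factory. Returns the final global parameters."""
    rng = random.Random(seed)
    dim = len(datasets[0][0])
    w_global = [0.0] * dim
    total = sum(len(d) for d in datasets)
    probs = [len(d) / total for d in datasets]  # assumed fixed in this sketch
    for _ in range(rounds):
        # Optimize the global model: choose the factories that receive the parameters.
        chosen = weighted_choice_without_replacement(rng, probs, n_select)
        # Upload local model parameters: update locally, keep the models closest to w_global.
        local = {k: local_update(w_global, datasets[k]) for k in chosen}
        chosen.sort(key=lambda k: math.dist(local[k], w_global))
        keep = chosen[: max(1, round(mu * n_select))]
        # Aggregate the uploaded models, weighted by local data size (assumption).
        n_keep = sum(len(datasets[k]) for k in keep)
        w_global = [sum(len(datasets[k]) * local[k][j] for k in keep) / n_keep
                    for j in range(dim)]
    return w_global
```

Calling `run_rounds` with a list of per-factory point lists returns a parameter vector of the same dimension as the points. The dynamic update of the training probabilities is deliberately left out here to keep the loop short.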

[0011] Furthermore, the non-independent and identically distributed dataset includes non-independent and identically distributed datasets with unevenly distributed label types and non-independent and identically distributed datasets with unevenly distributed sample numbers:

[0012] In a non-independent and identically distributed dataset with unevenly distributed label types, the local factories hold datasets whose sample label types are not completely identical but whose sample counts are the same;

[0013] In a non-independent and identically distributed dataset with unevenly distributed sample numbers, the local factories hold different numbers of samples but the same sample label types.

[0014] Furthermore, the step of optimizing the global model specifically includes:

[0015] Perform a global model update;

[0016] The training weights and training probabilities of each local factory are calculated and updated using a dynamic weighting algorithm.

[0017] Based on the training probability, select some local factories to distribute global model parameters.

[0018] Furthermore, the step of uploading local model parameters specifically includes:

[0019] The selected local factories perform local updates;

[0020] Multi-class training is completed by using a loss function to compare labels, causing the local models of each local factory to shift.

[0021] The federated distance algorithm is used to select a subset of local model parameters that deviate less from the current optimal global model of the central factory from the local factory models that have shifted, and then uploads them to the central factory.

[0022] Furthermore, the step of calculating and updating the training weights and training probabilities of each local factory using a dynamic weighting algorithm specifically includes:

[0023] Before the first round of training, initialize the initial training probability of each local factory and the weight increment F;

[0024]

[0025]

[0026] where the formulas above define the initial training probability of local factory k and the weight increment F; $n_k$ denotes the total amount of data of factory k; M denotes the total amount of data of all factories; F denotes the weight increment; N denotes the total number of local factories participating in this round of training; μ denotes the proportion of local factories that upload local model parameters; $\sum_k n_k\ (k \in N \times \mu)$ denotes the total amount of data of all local factories that upload local model parameters to the central factory; and N×μ denotes the total number of local factories that upload local model parameters to the central factory;

[0027] During the t-th round of training, the training effect of the central factory's global model in this round is compared with that of the previous round. If the global model in this round performs better than in the previous round, the training weights of the factories participating in this round are increased; otherwise, their training weights are left unchanged.

[0028] $f(w_t) < f(w_{t-1})$

[0029] $f(w_t) \ge f(w_{t-1})$

[0030]

[0031] $M^{*} = \sum_k n_k\ (k \in N \times \mu)$

[0032]

[0033]

[0034] where $f(w_t)$ denotes the loss value of the global model in round t; M denotes the total amount of data of all factories; $M^{*}$ denotes the total amount of data of the factories participating in the global model update; and the formulas above define, respectively, the probability that a factory whose training weight was increased participates in the next round of training and the probability that a factory whose training weight was not increased participates in the next round of training.
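The weight rule in this step can be illustrated with a minimal sketch. The comparison of $f(w_t)$ with $f(w_{t-1})$ and the increment F come from the text; the exact probability formulas are not reproduced in this text, so the normalization used here (probability proportional to weight) is an assumption, and the function names are hypothetical.

```python
def update_weights(weights, participants, loss_now, loss_prev, increment):
    """Raise the weights of this round's participants only if the global loss decreased.

    weights      -- current training weight per factory
    participants -- indices of factories that uploaded a local model this round
    loss_now     -- f(w_t), loss of the global model after this round
    loss_prev    -- f(w_{t-1}), loss of the global model after the previous round
    increment    -- the weight increment F
    """
    new_weights = list(weights)
    if loss_now < loss_prev:
        for k in participants:
            new_weights[k] += increment
    return new_weights

def training_probabilities(weights):
    """Assumed normalization: each factory's probability is its share of the total weight."""
    total = sum(weights)
    return [w / total for w in weights]

# 100 factories that all start with weight 1 each get probability 0.01.
weights = [1.0] * 100
probs = training_probabilities(weights)

# The global loss dropped, so factories 0 and 1 (this round's uploaders) gain weight.
weights = update_weights(weights, [0, 1], loss_now=0.5, loss_prev=0.9, increment=0.5)
probs = training_probabilities(weights)
```

Under this assumed normalization, raising some factories' weights automatically lowers the selection probability of all the others, which matches the described behavior that selection probabilities keep changing across rounds.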

[0035] Furthermore, the computational steps of the federated distance algorithm include:

[0036] Calculate the distance between each shifted local factory model and the global model of the central factory.

[0037]

[0038] where d denotes the standardized Euclidean distance between two n-dimensional vectors $(x_{11}, x_{12}, \ldots, x_{1n})$ and $(x_{21}, x_{22}, \ldots, x_{2n})$; $w_t^k$ denotes the local model after factory k completes its local update in round t; $w_{t-1}$ denotes the global model in round t-1; and $d(w_t^k, w_{t-1})$ denotes the standardized Euclidean distance between the local model $w_t^k$ and the global model $w_{t-1}$;

[0039] The distance between each local factory model and the global model is calculated and sorted. The hyperparameter μ controls the proportion of local factory model parameters uploaded to the central factory, and local factory models that meet the requirements are selected and uploaded to the central factory based on μ.

[0040]

[0041]

[0042]

[0043] In the specific calculation, $w_{t-1} = (b_{11}, b_{12}, \ldots, b_{1n})$; the formulas above also involve the sample variance, the sorting of the distances between the different local factory models and the global model of the central factory, and the selected subset of local factory model parameters that is uploaded to the central factory.
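The selection step can be sketched as follows, assuming the usual definition of the standardized Euclidean distance, in which each squared component difference is divided by that component's sample variance. Computing the variance across the candidate local models, the small `eps` guard against zero variance, and the names `standardized_distance` and `select_closest` are assumptions made for illustration.

```python
import math
from statistics import variance

def standardized_distance(x, y, var):
    """Euclidean distance with each squared difference scaled by the component variance."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, var)))

def select_closest(local_models, w_global, mu, eps=1e-12):
    """Rank local models by distance to w_global and keep the closest share mu.

    local_models -- dict mapping factory id to its parameter list (at least two factories)
    w_global     -- global model parameters from the previous round
    mu           -- proportion of local models to upload
    Returns (ids to upload, closest first; distance per factory id).
    """
    ids = list(local_models)
    dim = len(w_global)
    # Per-component sample variance across the candidate models (assumption).
    var = [max(variance([local_models[k][j] for k in ids]), eps) for j in range(dim)]
    dist = {k: standardized_distance(local_models[k], w_global, var) for k in ids}
    ranked = sorted(ids, key=dist.get)
    n_upload = max(1, round(mu * len(ids)))
    return ranked[:n_upload], dist

models = {0: [1.0, 1.0], 1: [5.0, 5.0], 2: [1.2, 0.9], 3: [9.0, -3.0], 4: [0.8, 1.1]}
upload_ids, distances = select_closest(models, [1.0, 1.0], mu=0.6)
```

In this example the three models nearest the global model (factories 0, 2, and 4) are kept, while the two strongly shifted models (factories 1 and 3) are not uploaded.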

[0044] In a second aspect, the present invention also provides a joint modeling device for non-independent and identically distributed industrial big data, comprising:

[0045] The local dataset upload module is used by each local factory to upload its local non-independent and identically distributed dataset to the central factory;

[0046] The global model optimization module is used by the central factory to update the global model, calculate the training weight and training probability of each local factory, and select the local factories that benefit global model optimization to receive the global model parameters.

[0047] The local model parameter upload module is used by the selected local factories to perform local updates. From the local factory models that have shifted, it selects the local model parameters close to the current optimal global model and uploads them to the central factory.

[0048] This invention provides a method for joint modeling of non-independent and identically distributed industrial big data, which has the following advantages:

[0049] To address the local factory model bias and poor global model accuracy caused by the abundance of non-independent and identically distributed (non-IID) data in industrial settings, this invention improves traditional joint modeling methods in two respects: local updates and global aggregation. With this approach, local factory model bias and low global model accuracy are addressed together, meeting the need for joint training on large non-independent and identically distributed datasets in industrial settings.

[0050] Compared with traditional joint modeling methods, this invention:

[0051] (1) Traditional joint modeling methods use the federated averaging algorithm, which performs multiple local updates locally. A large number of local model updates can cause varying degrees of shift in the local models of different factories. The average model may also deviate significantly from the globally optimal model due to the influence of factories with large shifts. Therefore, it is unsuitable for joint training in industrial environments with a large amount of non-independent, identically distributed data. This invention improves the target algorithm by using a federated distance algorithm to reselect the local factory models to be uploaded to the central factory before the updated local factory models are uploaded. This ensures that the selected local factory models are not too far from the optimal global model of the central factory, thereby limiting the model shift caused by updates from different factories.

[0052] (2) Traditional joint modeling methods can harm the optimization of the global model when the local data of different factories is extreme (e.g., a factory holds samples with two different labels, A and B, but samples with label A account for only 5%). It is therefore unreasonable, during the update phase of the central factory, to randomly select some local factories for each round of training with a fixed probability. In this invention, the algorithm is improved during the global aggregation phase of the central factory by initializing equal training weights for each factory. During each round of training, the dynamic weighting algorithm always retains the central factory's global model $w_{t-1}$ from the previous round; after this round's update, the central factory's global model is $w_t$. When $f(w_t) < f(w_{t-1})$, the training weights of the factories that uploaded local models in this round are increased by F; when $f(w_t) \ge f(w_{t-1})$, the training weights of the factories trained in this round remain unchanged. Over multiple rounds of joint training, the probability of each factory being selected keeps changing. Local factories that are more suitable for global model optimization see their training weights increase during training, which drives the learned model toward the optimal global model.

[0053] The proposed method for joint modeling of non-independent and identically distributed industrial big data was applied to joint modeling of non-independent and identically distributed industrial big data. Through experimental analysis, the method showed good results when dealing with non-independent and identically distributed industrial data, effectively solving the problem of joint modeling of data with non-independent and identically distributed characteristics due to the fact that industrial data comes from different factories.

[0054] Correspondingly, the non-independent and identically distributed industrial big data joint modeling device provided in this embodiment of the invention also has the above-mentioned technical effects.

Attached Figure Description

[0055] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0056] Figure 1 is a flowchart of the non-independent and identically distributed industrial big data joint modeling method provided in Embodiment 1 of the present invention;

[0057] Figure 2 is a schematic diagram of the non-independent and identically distributed industrial big data joint modeling device provided in Embodiment 2 of the present invention;

[0058] Figure 3 is a framework diagram of the non-independent and identically distributed industrial big data joint modeling method provided in Embodiment 3 of the present invention;

[0059] Figure 4 is a schematic diagram of the dynamic weighting algorithm of the method provided in Embodiment 3 of the present invention;

[0060] Figure 5 is a graph of the changes in factory training weights in the method provided in Embodiment 3 of the present invention;

[0061] Figure 6 is a graph of the changes in training probability in the method provided in Embodiment 3 of the present invention;

[0062] Figures 7a to 7f are the accuracy curves of the method provided in Embodiment 3 of the present invention under different MNIST partitioning methods;

[0063] Figures 8a to 8f compare the accuracy on different datasets under the same partitioning method for the method provided in Embodiment 3 of the present invention.

Detailed Implementation

[0064] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0065] The terms "comprising" and "having," and any variations thereof, used in the embodiments of this invention are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the steps or units listed, but may optionally include other steps or units not listed, or may optionally include other steps or units inherent to such processes, methods, products, or devices.

[0066] Example 1:

[0067] As shown in Figure 1, an embodiment of the present invention provides a joint modeling method for non-independent and identically distributed industrial big data, which includes the following steps:

[0068] S101: Upload local dataset

[0069] Each local factory uploads its local non-independent, identically distributed dataset to the central factory;

[0070] S102: Optimize the global model

[0071] The central factory performs global model updates, calculates the training weights and training probabilities of each local factory, and selects the local factories that are beneficial to global model optimization to distribute global model parameters.

[0072] S103: Upload local model parameters. Selected local factories perform local updates. Among the local factory models that have shifted, select the local model parameters that are closest to the current optimal global model and upload them to the central factory.

[0073] Repeat steps S102 and S103 until the model training is complete.

[0074] Traditional joint modeling methods can negatively impact the optimization of the global model when faced with extreme cases of local data from different factories (e.g., a factory has samples with two different labels, A and B, but samples with label A only account for 5%). Therefore, randomly selecting some local factories for each round of training with a fixed probability during the central factory's update phase is unreasonable. This invention improves the algorithm during the global aggregation phase of the central factory by initializing equal training weights for each factory. Through multiple joint training sessions, the probability weights of different factories being selected continuously change. Local factories more suitable for global model optimization receive increasing training weights during the training process, thereby promoting the learning model to continuously approach the optimal global model.

[0075] In one possible implementation, the non-independent and identically distributed dataset includes both a non-independent and identically distributed dataset with unevenly distributed label types and a non-independent and identically distributed dataset with unevenly distributed sample numbers.

[0076] In the non-independent and identically distributed dataset with unevenly distributed label types, each local factory holds samples with different label types but the same number of samples. For example, factory A has k data samples with λ types of labels, factory B has k data samples with λ-1 types of labels, and factory C has k data samples with λ+1 types of labels.

[0077] The non-independent identically distributed dataset with uneven sample size distribution is a dataset in which each local factory has a different number of samples but the same number of label types. That is, let factory A have k data samples with λ types of labels; factory B have 2k data samples with λ types of labels; and factory C have 3k data samples with λ types of labels.

[0078] In one possible implementation, the step of optimizing the global model specifically includes: updating the global model; calculating and updating the training weights and training probabilities of each local factory using a dynamic weighting algorithm; and selecting some local factories to receive the global model parameters based on the training probabilities. During the global aggregation phase of the central factory, the algorithm is improved by initializing equal training weights for each factory. During each round of training, the dynamic weighting algorithm always retains the central factory's global model $w_{t-1}$ from the previous round; after this round's update, the central factory's global model is $w_t$. When $f(w_t) < f(w_{t-1})$, the training weights of the factories that uploaded local models in this round are increased by F; when $f(w_t) \ge f(w_{t-1})$, the training weights of the factories trained in this round remain unchanged.

[0079] In one possible implementation, the step of uploading local model parameters specifically includes: the selected local factory performing local updates; using a loss function to perform label comparison to complete multi-class training, causing the local models of each local factory to shift; and using a federated distance algorithm to select a subset of local model parameters that deviate less from the current optimal global model of the central factory from the shifted local factory models, and uploading them to the central factory.

[0080] In one possible implementation, the step of calculating and updating the training weights and training probabilities of each local factory using a dynamic weighting algorithm specifically includes:

[0081] Before the first round of training, initialize the initial training probability of each local factory and the weight increment F;

[0082]

[0083]

[0084] where the formulas above define the initial training probability of local factory k and the weight increment F; $n_k$ denotes the total amount of data of factory k; M denotes the total amount of data of all factories; F denotes the weight increment; N denotes the total number of local factories participating in this round of training; μ denotes the proportion of local factories that upload local model parameters; $\sum_k n_k\ (k \in N \times \mu)$ denotes the total amount of data of all local factories that upload local model parameters to the central factory; and N×μ denotes the total number of local factories that upload local model parameters to the central factory;

[0085] During the t-th round of training, the training effect of the central factory's global model in this round is compared with that of the previous round. If the global model in this round performs better than in the previous round, the training weights of the factories participating in this round are increased; otherwise, their training weights are left unchanged.

[0086] $f(w_t) < f(w_{t-1})$

[0087] $f(w_t) \ge f(w_{t-1})$

[0088]

[0089] $M^{*} = \sum_k n_k\ (k \in N \times \mu)$

[0090]

[0091]

[0092] where $f(w_t)$ denotes the loss value of the global model in round t; M denotes the total amount of data of all factories; $M^{*}$ denotes the total amount of data of the factories participating in the global model update; and the formulas above define, respectively, the probability that a factory whose training weight was increased participates in the next round of training and the probability that a factory whose training weight was not increased participates in the next round of training.

The computational steps of the federated distance algorithm include:

[0093] Calculate the distance between each shifted local factory model and the global model of the central factory.

[0094]

[0095]

[0096] where d denotes the standardized Euclidean distance between two n-dimensional vectors $(x_{11}, x_{12}, \ldots, x_{1n})$ and $(x_{21}, x_{22}, \ldots, x_{2n})$; $w_t^k$ denotes the local model after factory k completes its local update in round t; $w_{t-1}$ denotes the global model in round t-1; and $d(w_t^k, w_{t-1})$ denotes the standardized Euclidean distance between the local model $w_t^k$ and the global model $w_{t-1}$;

[0097] The distance between each local factory model and the global model is calculated and sorted. The hyperparameter μ controls the proportion of local factory model parameters uploaded to the central factory, and local factory models that meet the requirements are selected and uploaded to the central factory based on μ.

[0098]

[0099]

[0100]

[0101] In the specific calculation, $w_{t-1} = (b_{11}, b_{12}, \ldots, b_{1n})$; the formulas above also involve the sample variance, the sorting of the distances between the different local factory models and the global model of the central factory, and the selected subset of local factory model parameters that is uploaded to the central factory.

[0102] Example 2:

[0103] This invention also provides a non-independent, identically distributed industrial big data joint modeling device, comprising:

[0104] The local dataset upload module is used by each local factory to upload its local non-independent and identically distributed dataset to the central factory;

[0105] The global model optimization module is used by the central factory to update the global model, calculate the training weight and training probability of each local factory, and select the local factories that benefit global model optimization to receive the global model parameters.

[0106] The module for uploading local model parameters is used for local updates of selected local factories. It selects local model parameters that are close to the current optimal global model from the local factory models that have shifted and uploads them to the central factory.

[0107] Example 3:

[0108] This experiment uses three different image datasets as validation datasets. MNIST is a dataset of handwritten digit images, containing 28×28 grayscale images from 250 different individuals, including 60,000 training samples, 10,000 test samples, and 10 different labels. FMNIST is a product image dataset, including 70,000 front-view images of products from 10 different labels, with 60,000 training samples, 10,000 test samples, 784 features, and 10 different labels. The CIFAR-10 dataset is an image dataset for object recognition, containing 32×32 color images, including 50,000 training samples and 10,000 test samples across 10 categories. Specific data features are shown in Table 1.

[0109] Table 1 Experimental Dataset Information

[0110]

[0111] To obtain non-independent and identically distributed datasets that meet the experimental requirements, the MNIST, FMNIST, and CIFAR-10 datasets were partitioned. For the label-skewed case, the amount of data in each local factory's dataset was fixed to be the same, and each local client was assigned λ different sample labels, so that each factory holds exactly λ sample labels; this yields a non-independent and identically distributed dataset with uneven label distribution. For the quantity-skewed case, the Dirichlet distribution was used to assign different numbers of samples to each client: q ~ D(γ) denotes the partitioning, where the parameter γ controls the degree of unevenness. Each client then holds a different amount of data partitioned in the same way, which yields a non-independent and identically distributed dataset with uneven sample number distribution.
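The two partitioning schemes can be sketched with the standard library alone. This is an illustration under assumptions rather than the experiment's actual code: the function names are hypothetical, the label-skewed split allows the same sample to appear in more than one factory, and the Dirichlet draw uses a symmetric parameter γ obtained by normalizing gamma variates.

```python
import random

def label_skew_partition(labels, n_factories, lam, per_factory, seed=0):
    """Give each factory `per_factory` sample indices drawn from `lam` randomly chosen labels.

    Assumes the chosen labels together hold at least `per_factory` samples.
    """
    rng = random.Random(seed)
    by_label = {}
    for idx, lab in enumerate(labels):
        by_label.setdefault(lab, []).append(idx)
    parts = []
    for _ in range(n_factories):
        chosen_labels = rng.sample(sorted(by_label), lam)
        pool = [i for lab in chosen_labels for i in by_label[lab]]
        parts.append(rng.sample(pool, per_factory))
    return parts

def dirichlet_quantity_partition(n_samples, n_factories, gamma, seed=0):
    """Split sample indices into unequal shares q ~ Dirichlet(gamma, ..., gamma)."""
    rng = random.Random(seed)
    draws = [rng.gammavariate(gamma, 1.0) for _ in range(n_factories)]
    total = sum(draws)
    q = [d / total for d in draws]  # normalized gamma variates follow a Dirichlet law
    indices = list(range(n_samples))
    rng.shuffle(indices)            # shuffling keeps the label mix similar across factories
    parts, start = [], 0
    for k, share in enumerate(q):
        # The last factory takes the remainder so every sample is assigned exactly once.
        end = n_samples if k == n_factories - 1 else start + int(share * n_samples)
        parts.append(indices[start:end])
        start = end
    return parts

labels = [i % 10 for i in range(1000)]  # toy labels standing in for a 10-class dataset
label_parts = label_skew_partition(labels, n_factories=5, lam=2, per_factory=50)
size_parts = dirichlet_quantity_partition(len(labels), n_factories=5, gamma=0.5)
```

Smaller values of γ concentrate the shares on fewer factories, so the sample counts become more uneven; larger values push the shares toward equal sizes.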

[0112] Experiment 1: Determination of hyperparameter μ

[0113] As shown in Figures 3 and 4, the hyperparameter μ in the federated distance algorithm controls which local factory models, chosen by their distance to the global model, are uploaded to the central factory for the update, thereby limiting the effect of the shift caused by local factory updates. The value of μ was determined by comparing experimental results. The experiment used λ = 2 and a total of 50 rounds on the MNIST dataset, with 10 factories participating in training in each round (c = 0.1) and μ ∈ {0.1, 0.2, ..., 1}. With 10 participating factories, μ = 0.k means that the k local models closest to the global model are uploaded to the central factory; for example, μ = 0.5 means that the 5 local models closest to the global model of the central factory are uploaded. The experiment was repeated 10 times for each value of μ. The mean, variance, and range of the accuracy for each setting were computed over these 10 runs, and the results are shown in Table 2.
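The selection rule controlled by μ can be illustrated with a short sketch. It assumes that models are flat lists of floats, that the distance is the standardized Euclidean distance with a per-coordinate sample variance estimated across the participating local models, and that round(μ × N) models are kept; the function names and these choices are hypothetical and are not taken from the patent text.

```python
import math
import statistics


def standardized_distance(local, global_model, variances):
    """Standardized Euclidean distance: sqrt(sum((a - b)^2 / s^2))."""
    return math.sqrt(
        sum((a - b) ** 2 / v for a, b, v in zip(local, global_model, variances))
    )


def select_closest(local_models, global_model, mu):
    """Return the ids of the round(mu * N) local models closest to the global model.

    `local_models` maps a factory id to its parameter vector (assumed layout).
    """
    ids = sorted(local_models)
    variances = []
    for j in range(len(global_model)):
        column = [local_models[k][j] for k in ids]
        var = statistics.variance(column)  # sample variance, needs >= 2 models
        variances.append(var if var > 0 else 1.0)  # guard against zero variance
    distance = {
        k: standardized_distance(local_models[k], global_model, variances)
        for k in ids
    }
    keep = round(mu * len(ids))
    return sorted(ids, key=distance.get)[:keep]
```

With 10 participating factories and μ = 0.8, this keeps the 8 closest models, so only those 8 would be uploaded in that round.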

[0114] Table 2. The impact of different μ values on model accuracy

[0115]

[0116] Table 2 shows the impact of different μ values on model accuracy. The comparison reveals that, with 10 repeated experiments, μ=0.8 yields the best average accuracy and variance. Although the average accuracy and variance are not significantly different from those with μ=0.9 and μ=1, μ=0.8 means that only 8 different local factory models need to be uploaded to the central factory per training round, reducing the communication cost between the local and central factories during each training round. Therefore, μ=0.8 will be used in the following experiments.

[0117] Experiment 2: Changes in the Training Weights and Training Probabilities of Local Factories

[0118] In this experiment, λ = 2, T = 50, c = 0.1, and m = 100; the initial training weight of each factory was 1 and the initial training probability was 0.01. The changes in the training weights of the factories are shown in Figure 5 (the horizontal axis represents the number of training iterations and the vertical axis represents the training weight), and the changes in the training probabilities are shown in Figure 6 (the horizontal axis represents the number of training iterations and the vertical axis represents the training probability). Five factories were randomly selected from the 100 factories for demonstration: factory 86, factory 92, factory 64, factory 31, and factory 13, labeled c_86, c_92, c_64, c_31, and c_13 respectively. The training weights and probabilities of the factories change as training proceeds, and those of the factories that are more beneficial to the global model increase continuously.
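The behavior described above can be sketched in a few lines. The exact weight increment and probability formulas are not reproduced in this text, so the sketch assumes a constant, hypothetical `increment` and takes each training probability as the factory's weight divided by the sum of all weights; that normalization does reproduce the stated starting point (weight 1 and probability 0.01 for m = 100), but it remains an assumption.

```python
import random


def update_weights(weights, participants, loss_now, loss_prev, increment):
    """Raise the weights of this round's participants only if the loss improved."""
    updated = dict(weights)
    if loss_now < loss_prev:  # f(w_t) < f(w_{t-1}): the global model got better
        for k in participants:
            updated[k] += increment
    return updated


def training_probabilities(weights):
    """Assumed normalization: probability = weight / sum of all weights."""
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def sample_factories(probabilities, count, rng):
    """Draw `count` distinct factories, favoring those with higher probability."""
    pool = dict(probabilities)
    chosen = []
    for _ in range(count):  # assumes count <= number of factories
        ids = list(pool)
        k = rng.choices(ids, weights=[pool[i] for i in ids])[0]
        chosen.append(k)
        del pool[k]  # sample without replacement
    return chosen
```

Under these assumptions, a factory that takes part in rounds where the global loss drops sees its weight and probability rise, while the probabilities of the other factories fall slightly because the total weight grows.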

[0119] Experiment 3: Comparison of the Invention Method with Traditional Joint Modeling Methods

[0120] In this experiment, the accuracy of the method of this invention was compared with that of the traditional joint modeling method on different datasets under different non-independent and identically distributed settings. The partition settings were λ = 1, λ = 2, λ = 3, λ = 4, λ = 5, and q ~ D(γ = 0.5). The results are shown in Table 3.

[0121] Table 3. Accuracy Comparison under Different Non-Independent and Identically Distributed Settings

[0122]

[0123]

[0124] A portion of the data in Table 3 is plotted and analyzed below. Figures 7a to 7f (the horizontal axis represents the number of training iterations and the vertical axis represents accuracy) show the experimental results for the same dataset under different partitioning methods: Figure 7a for λ = 1, Figure 7b for λ = 2, Figure 7c for λ = 3, Figure 7d for λ = 4, Figure 7e for λ = 5, and Figure 7f for q ~ D(γ = 0.5). In the figures, Fedavg refers to the traditional joint modeling method and Fdw-Hd refers to the method of this invention. Taking the MNIST dataset as an example and comparing the data in Table 3, the accuracies of the traditional joint modeling method and the method of this invention are 0.5967 and 0.6736 when λ = 1, an improvement of 0.0769; 0.9253 and 0.9608 when λ = 2, an improvement of 0.0355; 0.9586 and 0.9778 when λ = 3, an improvement of 0.0192; 0.9767 and 0.9789 when λ = 4, an improvement of 0.0022; and 0.9808 and 0.9811 when λ = 5, an improvement of 0.0003. The fewer the types of sample labels assigned to each client, the more significant the improvement achieved by the method of this invention. On the FMNIST dataset, the method of this invention performs best when λ = 2. On the CIFAR-10 dataset, however, the improvement is not significant.
Figures 8a to 8f (the horizontal axis represents the different datasets and the vertical axis represents accuracy) show the experimental results for different datasets under the same partitioning method: Figure 8a for λ = 1, Figure 8b for λ = 2, Figure 8c for λ = 3, Figure 8d for λ = 4, Figure 8e for λ = 5, and Figure 8f for q ~ D(γ = 0.5). Comparison with the data in Table 3 shows that accuracy differs significantly across datasets under the same partitioning method. For example, with λ = 2, the accuracies on MNIST, FMNIST, and CIFAR-10 are 0.9253, 0.6772, and 0.4256 with the traditional joint modeling method and 0.9608, 0.7343, and 0.4291 with the method of this invention. The method of this invention therefore improves accuracy markedly on the non-independent and identically distributed MNIST and FMNIST datasets (by 0.0355 and 0.0571 at λ = 2) but only marginally on the CIFAR-10 dataset (by 0.0035).
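The MNIST improvements quoted above can be recomputed from the stated accuracies. The following small check uses only the figures given in the text; the variable names are illustrative.

```python
# MNIST accuracies from the text, keyed by the number of labels per client (lambda).
fedavg = {1: 0.5967, 2: 0.9253, 3: 0.9586, 4: 0.9767, 5: 0.9808}
fdw_hd = {1: 0.6736, 2: 0.9608, 3: 0.9778, 4: 0.9789, 5: 0.9811}

# Absolute accuracy gain of Fdw-Hd over Fedavg for each setting.
gains = {lam: round(fdw_hd[lam] - fedavg[lam], 4) for lam in fedavg}

# The gain shrinks as each client sees more label types.
ordered = [gains[lam] for lam in sorted(gains)]
shrinking = all(a > b for a, b in zip(ordered, ordered[1:]))
```

The gains decrease monotonically from λ = 1 to λ = 5, which is the trend the paragraph describes.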

[0125] In summary, the proposed joint modeling method for non-independent and identically distributed industrial big data outperforms traditional joint modeling methods on such data in the comparisons above. This invention better addresses the learning accuracy problems arising from non-independent and identically distributed data by directly mitigating the model shift caused by local model updates and by dynamically weighting the selection of local factories that are better suited to the global model. Furthermore, because not all local factory models are uploaded, the communication cost between the local factories and the central factory is reduced.

[0126] It should be noted that similar labels and letters in the following figures indicate similar items. Therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures.

[0127] In the description of this invention, it should be noted that the terms "center," "upper," "lower," "left," "right," "vertical," "horizontal," "inner," and "outer," etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings, or the orientation or positional relationship commonly used when the product of this invention is in use. They are only for the convenience of describing this invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on this invention. In addition, the terms "first," "second," and "third," etc., are only used to distinguish descriptions and should not be construed as indicating or implying relative importance.

[0128] In the description of this invention, it should also be noted that, unless otherwise explicitly specified and limited, the terms "set," "install," "connect," and "link" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal communication between two components. Those skilled in the art can understand the specific meaning of the above terms in this invention based on the specific circumstances.

[0129] The apparatus provided in this embodiment of the invention can be specific hardware on a device or software or firmware installed on the device. The implementation principle and technical effects of the apparatus provided in this embodiment of the invention are the same as those in the foregoing method embodiments. For the sake of brevity, any parts not mentioned in the apparatus embodiments can be referred to the corresponding content in the foregoing method embodiments. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, apparatuses, and units described above can all be referred to the corresponding processes in the above method embodiments, and will not be repeated here.

[0130] In the several embodiments provided by this invention, it should be understood that the disclosed apparatus and methods can also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions marked in the blocks may occur in a different order than those marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram and / or flowchart, and combinations of blocks in block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions.

[0131] For example, the division of units is merely a logical functional division; in actual implementation, there may be other division methods. Furthermore, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Additionally, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some communication interfaces, devices, or units, and may be electrical, mechanical, or other forms.

[0132] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0133] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0134] Finally, it should be noted that the above-described embodiments are merely specific implementations of the present invention, used to illustrate its technical solutions and not to limit them, and the scope of protection of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed by the present invention, they can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and all of them should be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A method for joint modeling of non-independent and identically distributed industrial big data, characterized in that it includes the following steps:
uploading local datasets: each local factory uploads its local non-independent and identically distributed dataset to the central factory;
optimizing the global model: the central factory updates the global model, calculates the training weight and training probability of each local factory, and selects the local factories that are beneficial to the optimization of the global model as recipients of the global model parameters;
uploading local model parameters: each selected local factory performs a local update, and the local model parameters that are close to the current optimal global model are selected from the local factory models that have shifted and are uploaded to the central factory;
repeating the steps of optimizing the global model and uploading local model parameters until model training is complete;
the step of optimizing the global model specifically includes: performing a global model update; calculating and updating the training weight and training probability of each local factory using a dynamic weighting algorithm; and selecting, based on the training probabilities, some of the local factories to receive the global model parameters;
the step of calculating and updating the training weight and training probability of each local factory using a dynamic weighting algorithm specifically includes:
before the first round of training, initializing the initial training probability of each local factory k and the weight increment F, where $n_k$ represents the total amount of data of factory k, M represents the total amount of data of all factories, N represents the total number of local factories participating in the current training round, μ represents the proportion of local factories that upload model parameters, $\sum_k n_k$ ($k \in N \times \mu$) represents the total amount of data of all local factories that upload local model parameters to the central factory, and N×μ represents the total number of local factories that upload local model parameters to the central factory;
during the t-th round of training, comparing the training effect of the global model of the central factory in the current round with that of the previous round: if the training effect in the current round is better than in the previous round, that is, $f(w_t) < f(w_{t-1})$, the training weights of the factories participating in the current round are increased; if the training effect in the current round is worse than in the previous round, that is, $f(w_t) \ge f(w_{t-1})$, the training weights of the factories are not changed; with $M^* = \sum_k n_k$ ($k \in N \times \mu$), where $f(w_t)$ represents the loss value of the global model in round t, M represents the total amount of data of all factories, $M^*$ represents the total amount of data of the factories participating in the global model update, and the two resulting training probabilities represent, respectively, the probability that a factory whose training weight has been increased will participate in the next round of training and the probability that a factory whose training weight has not been increased will participate in the next round of training.

2. The method according to claim 1, characterized in that the non-independent and identically distributed datasets include non-independent and identically distributed datasets with unevenly distributed label types and non-independent and identically distributed datasets with unevenly distributed sample numbers: in a non-independent and identically distributed dataset with unevenly distributed label types, the datasets of the local factories have sample label types that are not completely identical but the same number of samples; in a non-independent and identically distributed dataset with unevenly distributed sample numbers, the local factories have different numbers of samples but the same sample label types.

3. The method according to claim 1, characterized in that the step of uploading local model parameters specifically includes: the selected local factories perform local updates; multi-class training is completed by using a loss function to compare labels, which causes the local model of each local factory to shift; and the federated distance algorithm is used to select, from the local factory models that have shifted, the subset of local model parameters that deviate less from the current optimal global model of the central factory, which are then uploaded to the central factory.

4. The method according to claim 3, characterized in that the computation steps of the federated distance algorithm include:
calculating the distance between each local factory model that has shifted and the global model of the central factory, where d represents the standardized Euclidean distance between two n-dimensional vectors $(x_{11}, x_{12}, \ldots, x_{1n})$ and $(x_{21}, x_{22}, \ldots, x_{2n})$, the local model is the model w of factory k after it completes its local update in round t, $w_{t-1}$ represents the global model in round t-1, and the computed quantity is the standardized Euclidean distance between the local model and the global model $w_{t-1}$;
calculating and sorting the distance between each local factory model and the global model, where the hyperparameter μ controls the proportion of local factory model parameters uploaded to the central factory, and the local factory models that meet the requirement are selected according to μ and uploaded to the central factory; in the specific calculation, $w_{t-1} = (b_{11}, b_{12}, \ldots, b_{1n})$, the sample variance is used to standardize the distance, the distances between the different local factory models and the global model of the central factory are sorted, and the result is the selected subset of local factory model parameters that is uploaded to the central factory.

5. A device for joint modeling of non-independent and identically distributed industrial big data, characterized in that it includes:
a local dataset upload module, used by each local factory to upload its local non-independent and identically distributed dataset to the central factory;
a global model optimization module, used by the central factory to update the global model, calculate the training weight and training probability of each local factory, and select the local factories that are beneficial to global model optimization as recipients of the global model parameters;
a local model parameter upload module, used by the selected local factories to perform local updates, select from the local factory models that have shifted the local model parameters that are close to the current optimal global model, and upload them to the central factory;
the global model optimization module is specifically used for: performing a global model update; calculating and updating the training weight and training probability of each local factory using a dynamic weighting algorithm; and selecting, based on the training probabilities, some of the local factories to receive the global model parameters;
the calculating and updating of the training weight and training probability of each local factory using a dynamic weighting algorithm specifically includes:
before the first round of training, initializing the initial training probability of each local factory k and the weight increment F, where $n_k$ represents the total amount of data of factory k, M represents the total amount of data of all factories, N represents the total number of local factories participating in the current training round, μ represents the proportion of local factories that upload model parameters, $\sum_k n_k$ ($k \in N \times \mu$) represents the total amount of data of all local factories that upload local model parameters to the central factory, and N×μ represents the total number of local factories that upload local model parameters to the central factory;
during the t-th round of training, comparing the training effect of the global model of the central factory in the current round with that of the previous round: if the training effect in the current round is better than in the previous round, that is, $f(w_t) < f(w_{t-1})$, the training weights of the factories participating in the current round are increased; if the training effect in the current round is worse than in the previous round, that is, $f(w_t) \ge f(w_{t-1})$, the training weights of the factories are not changed; with $M^* = \sum_k n_k$ ($k \in N \times \mu$), where $f(w_t)$ represents the loss value of the global model in round t, M represents the total amount of data of all factories, $M^*$ represents the total amount of data of the factories participating in the global model update, and the two resulting training probabilities represent, respectively, the probability that a factory whose training weight has been increased will participate in the next round of training and the probability that a factory whose training weight has not been increased will participate in the next round of training.