In the federated learning (FL) setting, multiple entities collaborate to solve a machine learning problem under the coordination of a central server or service provider. During the process, each client's original data remains stored locally and is not exchanged or transmitted, so the FL architecture shown in Figure 1 can carry out machine learning tasks without centralizing data. This model therefore meets clients' demand for personal data privacy and is especially suitable for edge computing applications. Applying FL on the edge side makes effective use of edge device data, but because the edge environment is relatively complex, the handling of edge device data must be optimized for that environment.
[0004] In a centralized environment, the data satisfies the independent and identically distributed (IID) property, that is, the samples follow the same probability distribution and are independent of each other. A model trained on IID data shows better results on the test set. In FL, by contrast, the data on the clients is non-independent and identically distributed (Non-IID), that is, heterogeneous, and is usually not evenly distributed among the clients. As a result, a model trained directly on client data may differ greatly from the overall model of the centralized environment: the local data taken directly from an edge device client does not meet the requirement of being sampled from the overall distribution, which has a large impact on model training. At the same time, the clients that typically participate in FL training are mobile devices powered by independent batteries. Given the energy that model training requires relative to their energy storage, energy efficiency is also a key challenge that cannot be ignored.
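The difference between IID and Non-IID client data can be illustrated with a small sketch. The code below is only an illustration under stated assumptions: the synthetic labels, the function names `iid_partition` and `label_skew_partition`, and the shard-based label skew are hypothetical choices made here, not something specified in this document.

```python
import random
from collections import Counter


def iid_partition(labels, num_clients, seed=0):
    """Shuffle sample indices and deal them out evenly.

    Each client receives a random slice, so its label mix
    approximates the overall distribution (IID-like).
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    return [indices[c::num_clients] for c in range(num_clients)]


def label_skew_partition(labels, num_clients, shards_per_client=2):
    """Sort indices by label, cut them into shards, assign a few shards per client.

    Each client then sees only a few classes (an assumed, simple form of Non-IID).
    Samples left over when the data does not divide evenly are dropped.
    """
    indices = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_clients * shards_per_client
    shard_size = len(indices) // num_shards
    shards = [indices[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)]
    parts = []
    for c in range(num_clients):
        part = []
        for shard in shards[c * shards_per_client:(c + 1) * shards_per_client]:
            part.extend(shard)
        parts.append(part)
    return parts


def label_histogram(labels, part):
    """Count how many samples of each class a client holds."""
    return Counter(labels[i] for i in part)


if __name__ == "__main__":
    # Synthetic data: 1000 samples, 10 classes, 100 samples per class.
    labels = [i % 10 for i in range(1000)]
    for name, parts in (("IID", iid_partition(labels, 5)),
                        ("label-skew", label_skew_partition(labels, 5))):
        print(name)
        for c, part in enumerate(parts):
            print("  client", c, dict(sorted(label_histogram(labels, part).items())))
```

In the label-skewed split, each client's histogram covers only a small subset of the classes, so a model trained on one client alone sees a sample that does not represent the overall distribution.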
[0005] One effective means of dealing with this problem is to adopt a basic FL algorithm called federated averaging (FedAvg). In each round of learning, FedAvg randomly selects a subset of clients, and each selected client runs stochastic gradient descent on its local data, starting from a local copy of the global model. The resulting local weights are sent back to the FL server, which then updates the global model weights to their weighted sum. This algorithm can train high-quality models in relatively few communication rounds, and it has also shown a strong ability to overcome the unbalanced data distribution between devices that is common in FL. McMahan et al. confirmed experimentally that the FedAvg algorithm works in a communication-constrained heterogeneous environment with Non-IID data distribution; however, the FedAvg algorithm lacks a theoretical convergence guarantee.
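The round structure described above (client sampling, local stochastic gradient descent, weighted aggregation) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the one-dimensional linear model, the synthetic data, the hyperparameters, and the names `local_sgd`, `fedavg_round`, `run_fedavg`, and `make_clients` are all assumptions introduced here. The aggregation weights each client by its share of the selected clients' samples.

```python
import random


def local_sgd(weights, data, lr=0.1, epochs=5, rng=None):
    """Train a local copy of the global model (y ~ w*x + b) with per-sample SGD."""
    w, b = weights  # local copy; the caller's global weights are not modified
    rng = rng or random.Random(0)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y  # gradient of 0.5 * err**2 w.r.t. the prediction
            w -= lr * err * x
            b -= lr * err
    return (w, b)


def fedavg_round(global_weights, clients, fraction, rng):
    """One round: sample clients, train locally, average weighted by sample count."""
    num_selected = max(1, int(fraction * len(clients)))
    selected = rng.sample(range(len(clients)), num_selected)
    total = sum(len(clients[k]) for k in selected)
    new_w, new_b = 0.0, 0.0
    for k in selected:
        w_k, b_k = local_sgd(global_weights, clients[k], rng=rng)
        share = len(clients[k]) / total  # weight of client k in the aggregate
        new_w += share * w_k
        new_b += share * b_k
    return (new_w, new_b)


def run_fedavg(clients, rounds=50, fraction=0.5, seed=0):
    """Repeat FedAvg rounds starting from zero-initialised global weights."""
    rng = random.Random(seed)
    weights = (0.0, 0.0)
    for _ in range(rounds):
        weights = fedavg_round(weights, clients, fraction, rng)
    return weights


def make_clients(sizes, seed=0):
    """Synthetic clients: unbalanced sizes, each covering a different x-range.

    All clients share the same underlying relation y = 2x + 1 (assumed here).
    """
    rng = random.Random(seed)
    clients = []
    for k, n in enumerate(sizes):
        lo = -1.0 + 0.3 * k
        xs = [rng.uniform(lo, lo + 1.0) for _ in range(n)]
        clients.append([(x, 2.0 * x + 1.0) for x in xs])
    return clients


if __name__ == "__main__":
    clients = make_clients([10, 20, 40, 80])
    w, b = run_fedavg(clients)
    print("learned w, b:", w, b)
```

Because every synthetic client here follows the same underlying relation, this toy case is far milder than the severely skewed Non-IID settings in which FedAvg is reported to degrade; it is meant only to show the mechanics of a round.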
[0006] The reason the FedAvg algorithm lacks a theoretical convergence guarantee is that the experimental analyses and theoretical proofs of its convergence assume that data is shared between devices or distributed in IID form, and that all devices participate in every round of learning communication. These assumptions simplify the analysis, but they are violated by FL on the real edge side, so the scenario they construct is not realistic. To address this lack of realism, Smith et al. carried out experiments that simulate real scenarios. The FedAvg algorithm performed well when the local client data sets were relatively large and the Non-IID skew of the data was mild; however, its performance dropped significantly when the Non-IID data was severely skewed and the local client data sets were relatively small.
[0007] The main reason for this marked decline in the performance of the FedAvg algorithm is that the traditional method of randomly selecting clients to participate in training is not well suited to it. Owing to the heterogeneity of data and devices, the number of clients participating in training usually exceeds the number of clients that training actually requires. This not only greatly increases energy consumption; the traditional random selection of clients or data also leads to poor model performance and biased test results.