A secure and robust federated learning training method

By employing adaptive gradient compression, residual gradient accumulation, and differential privacy noise addition, combined with malicious attack detection and reputation assessment, the federated learning process is optimized. This addresses the security and communication efficiency issues of federated learning under Non-IID data and malicious attacks, thereby improving the accuracy and efficiency of model training.

CN122248420A · Pending · Publication Date: 2026-06-19 · SHANDONG NORMAL UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SHANDONG NORMAL UNIV
Filing Date
2026-04-02
Publication Date
2026-06-19


Abstract

This invention discloses a secure and robust federated learning training method, belonging to the field of machine learning technology. In the method, a control center initializes a global model and client reputation values and distributes them to each client. After training on their local datasets, clients generate secure updates through adaptive gradient compression, residual gradient accumulation, and differential privacy noise addition, which reduces communication overhead and protects data privacy. These updates are then transmitted to the control center using a sparse transmission mechanism. Upon receiving the updates, the control center combines gradient norm anomaly detection and cosine similarity analysis to detect attacks, dynamically adjusts client reputation scores, and updates the global model through weighted aggregation based on reputation weights, suppressing the negative impact of malicious clients on the model. This invention is applicable to collaborative model training in scenarios such as communication networks, the Internet of Things, and edge computing, where it maintains communication efficiency, detects and resists malicious attacks, and adapts to Non-IID data distributions.

Description

Technical Field

[0001] This invention relates to the field of machine learning technology, and in particular to a secure and robust federated learning training method.

Background Technology

[0002] Traditional centralized machine learning methods require uploading data from various edge devices (such as base stations, sensors, and mobile terminals) to a central server for unified training. This approach not only consumes a large amount of bandwidth but also poses a risk of data privacy breaches. Especially under regulations such as the General Data Protection Regulation (GDPR), local data processing has become an inevitable trend.

[0003] Federated Learning (FL), as a distributed machine learning paradigm, allows edge devices to train models locally and only upload model updates to the server for aggregation, thereby avoiding the transmission of raw data and significantly improving data privacy.

[0004] However, in traditional federated learning training, as the number of participating clients grows, a large volume of model parameter updates must be transmitted frequently, consuming significant network resources. In terms of security, federated learning systems are vulnerable to Byzantine attacks launched by malicious clients (such as gradient manipulation and label poisoning), which can steer the model in the wrong direction. In terms of privacy, attackers may infer which clients participated and the content of their training data from the model updates, posing new challenges to user data privacy. Finally, because the training data held by participating clients differs, the model parameters from different clients vary in their update direction, so simple average aggregation adapts poorly to such non-independent and identically distributed (Non-IID) data.

Summary of the Invention

[0005] The purpose of this invention is to provide a secure and robust federated learning training method that can ensure communication efficiency, detect and resist malicious attacks, and adapt to Non-IID data distribution.

[0006] To achieve the above objectives, this invention provides a secure and robust federated learning training method, comprising the following steps:
S1. The control center initializes the global model and sends it to each client i, yielding the initial set of client models; the control center also initializes the reputation values of all clients.
S2. After the clients receive the model, a subset of clients is selected to participate in each training round; each selected client trains on its local dataset to obtain an updated local model.
S3. Client i calculates its model update and then applies adaptive gradient compression, residual gradient accumulation, and differential privacy noise addition to obtain a secure update.
S4. The client transmits the compressed update to the control center using a sparse transmission mechanism, i.e., value + index.
S5. After receiving the updates from all clients, the control center performs malicious attack detection and reputation assessment, and generates a new global model by weighted aggregation according to the client reputation scores.
S6. The control center distributes the updated global model and reputation scores to each client.
S7. Repeat S2 to S6 until the model converges or the maximum number of training rounds is reached.

[0007] Preferably, in S1, the control center performs the following operations before pushing the model to be trained to the clients:
S11. Select the specific form of the model: choose between linear and nonlinear models based on task complexity. Linear models include logistic regression and support vector machines, while nonlinear models include convolutional neural networks, recurrent neural networks, and Transformers. For image classification tasks, choose convolutional neural networks (CNNs). For time series prediction, choose long short-term memory networks (LSTMs) or temporal convolutional networks (TCNs).
S12. Dataset processing: client data is divided into a training dataset and a test dataset according to a fixed ratio; for time series data, samples are constructed with a sliding window mechanism, and the window size p is set according to the periodic characteristics of the sequence to generate the training and test sample datasets.
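The sliding-window construction in S12 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes single-step targets, a chronological split, and a split ratio of 0.8, and the function names `make_windows` and `split_train_test` are hypothetical.

```python
import numpy as np

def make_windows(series, p):
    """Build (input, target) pairs for single-step prediction with window size p."""
    series = np.asarray(series, dtype=float)
    if len(series) <= p:
        raise ValueError("series must be longer than the window size p")
    # Each input is p consecutive values; the target is the value that follows.
    X = np.stack([series[i:i + p] for i in range(len(series) - p)])
    y = series[p:]
    return X, y

def split_train_test(X, y, train_ratio=0.8):
    """Chronological split (assumed): earliest samples train, latest samples test."""
    n_train = int(len(X) * train_ratio)
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

# Toy series standing in for one client's traffic record.
series = np.arange(10.0)
X, y = make_windows(series, p=3)
(train_X, train_y), (test_X, test_y) = split_train_test(X, y)
```

A chronological split keeps future values out of the training set, which is the usual choice for forecasting tasks; the source only states that a ratio is used.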

[0008] Preferably, the adaptive gradient compression mechanism in S3 dynamically adjusts the gradient compression ratio based on the client's reputation and the training round. The compression ratio is the proportion of model parameters retained after compression, chosen to keep the key gradient information while reducing communication volume. The adaptive gradient compression mechanism specifically includes the following steps:
S301. Based on the reputation value of client i and the current training round, dynamically calculate the compression ratio: [formula missing in source], where β and a second parameter (symbol missing in source) are hyperparameters, and the client's current reputation value lies in [0,1].
S302. According to the compression ratio, determine the number of gradients to retain, k, taking the compression ratio multiplied by the total number of gradients as the value of k.
S303. Perform Top-K filtering on the gradient tensor, retaining the k gradient values with the largest absolute values and setting the rest to zero; then transmit only the parameters whose change exceeds a preset threshold, to further reduce communication overhead. The result is the compressed update.
S304. Gradient values that are not retained are stored as residuals and added to the gradient in the next training round.
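Steps S302 and S303 can be sketched as below, taking the compression ratio as a given input because the formula of S301 is not available in this text. The rounding of k, the reading of "change exceeds a threshold" as an absolute-value test on the retained entries, and the name `top_k_sparsify` are assumptions made for illustration.

```python
import numpy as np

def top_k_sparsify(grad, ratio, threshold=0.0):
    """Keep the k = round(ratio * n) largest-magnitude entries and zero the rest,
    then drop retained entries whose magnitude does not exceed the threshold."""
    grad = np.asarray(grad, dtype=float)
    flat = grad.ravel()
    k = max(1, int(round(ratio * flat.size)))        # S302 (rounding is assumed)
    # argpartition places the k largest magnitudes in the last k positions.
    keep = np.argpartition(np.abs(flat), flat.size - k)[flat.size - k:]
    compressed = np.zeros_like(flat)
    compressed[keep] = flat[keep]                    # S303: Top-K filtering
    compressed[np.abs(compressed) <= threshold] = 0.0  # S303: threshold filter
    return compressed.reshape(grad.shape)

grad = np.array([0.5, -0.02, 0.003, -1.2, 0.04, 0.0, 0.3, -0.008, 0.09, 0.6])
compressed = top_k_sparsify(grad, ratio=0.3, threshold=0.01)
```

With a ratio of 0.3 on ten values, only the three entries of largest magnitude survive; the threshold filter matters mostly when k is large enough to admit near-zero entries.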

[0009] Preferably, in S3, residual gradient accumulation means that the gradient residuals not transmitted in each round are saved and added to the gradients in the next round of training, thereby avoiding information loss and accelerating model convergence. It is specifically expressed as: [formula missing in source], where the terms are the compressed gradient update and the residual saved from the previous round.
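A small sketch of residual accumulation across rounds follows. It assumes the common error-feedback form (add the stored residual to the new update, compress, keep what was not sent); the patent's exact expression is not available in this text, and `compress_with_residual` is a hypothetical name.

```python
import numpy as np

def compress_with_residual(update, residual, k):
    """Add the carried-over residual, keep the k largest-magnitude entries,
    and return (compressed update, new residual)."""
    corrected = update + residual
    keep = np.argsort(np.abs(corrected))[-k:]
    compressed = np.zeros_like(corrected)
    compressed[keep] = corrected[keep]
    return compressed, corrected - compressed

residual = np.zeros(4)
sent_total = np.zeros(4)
updates = [np.array([0.5, 0.2, 0.1, 0.0]) for _ in range(3)]
for update in updates:
    sent, residual = compress_with_residual(update, residual, k=1)
    sent_total += sent
# In round 3 the accumulated second coordinate (about 0.6) overtakes 0.5 and is sent.
```

Nothing is lost under this scheme: at every round, the sum of everything sent plus the current residual equals the sum of all raw updates.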

[0010] Preferably, the differential privacy noise-adding mechanism in S3 adaptively adds Gaussian noise based on the client's reputation value, thereby perturbing the updates of low-reputation clients and reducing the adverse effect of their abnormal updates on the global model while protecting privacy: [formula missing in source], where the standard deviation of the noise is: [formula missing in source]. In this expression, ϵ is the privacy budget, the sensitivity determines the amount of noise that must be added to protect data privacy, and δ is a very small positive value that prevents division by zero. The higher the reputation, the weaker the noise and the smaller the impact on model performance.
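The behaviour described above can be illustrated with the sketch below. The noise formula is an assumption: the source's expression is not available in this text, so the form sigma = sensitivity / (ϵ · (reputation + δ)) was chosen only because it matches the stated properties (noise shrinks as reputation grows, δ prevents division by zero). The function names are hypothetical, and the sketch omits norm clipping, which a formal Gaussian-mechanism guarantee would normally require to bound sensitivity.

```python
import numpy as np

def noise_std(reputation, epsilon=10.0, sensitivity=1.0, delta=1e-6):
    # ASSUMED form, not taken from the source:
    # sigma = sensitivity / (epsilon * (reputation + delta))
    return sensitivity / (epsilon * (reputation + delta))

def add_reputation_noise(update, reputation, rng, **kwargs):
    """Perturb an update with zero-mean Gaussian noise scaled by reputation."""
    sigma = noise_std(reputation, **kwargs)
    return update + rng.normal(0.0, sigma, size=np.shape(update))

rng = np.random.default_rng(0)
update = np.zeros(1000)
noisy_high = add_reputation_noise(update, reputation=1.0, rng=rng)
noisy_low = add_reputation_noise(update, reputation=0.25, rng=rng)
```

Under this assumed form, a client with reputation 0.25 receives noise with four times the standard deviation of a client with reputation 1.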

[0011] Preferably, the sparse transmission mechanism in S4 transmits only the values of the non-zero gradients and their index positions, which reduces communication overhead; after receiving the data, the control center reconstructs the complete gradient from the values and indexes.
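The value + index scheme can be sketched as a pair of encode and decode helpers. The 32-bit index and value types, the flattened index layout, and the names `encode_sparse` and `decode_sparse` are assumptions for illustration; the source does not specify a wire format.

```python
import numpy as np

def encode_sparse(update):
    """Client side: keep only non-zero values and their flat index positions."""
    flat = np.asarray(update).ravel()
    indices = np.flatnonzero(flat)
    return indices.astype(np.int32), flat[indices].astype(np.float32), np.shape(update)

def decode_sparse(indices, values, shape):
    """Control center side: rebuild the dense gradient from values and indexes."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[indices] = values
    return flat.reshape(shape)

update = np.zeros((4, 5), dtype=np.float32)
update[0, 1] = 0.5
update[3, 4] = -1.25
indices, values, shape = encode_sparse(update)
restored = decode_sparse(indices, values, shape)
dense_bytes = update.nbytes                      # 20 float32 values
sparse_bytes = indices.nbytes + values.nbytes    # 2 indexes + 2 values
```

Because every retained value also carries an index, the saving depends on sparsity: with 32-bit indexes and values, the sparse form is smaller only when fewer than half of the entries are non-zero.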

[0012] Preferably, in S5, malicious attack detection and reputation assessment combine gradient norm anomaly detection, cosine similarity analysis, and historical behavior records to dynamically assess client reputation and to identify and suppress malicious updates; the control center then performs weighted aggregation based on the reputation weights of the participating clients. Specifically:
S51. Calculate the gradient norm of each client's update and its mean cosine similarity with the updates from the other clients.
S52. Detect anomalous updates: if the gradient norm of client i satisfies the norm condition [formula missing in source], or its cosine similarity satisfies the similarity condition [formula missing in source], the update is marked as anomalous. The conditions are defined in terms of the mean and standard deviation of the gradient-update norms over all clients, and the mean and standard deviation of the cosine similarity between a client and the other clients.
S53. Dynamically update the client reputation value: [formula missing in source].
S54. Perform weighted aggregation using the reputation weights: [formula missing in source]. The formula involves the global model before the current round's update; i and j are the IDs of the clients participating in training in the current round, each with its reputation value; N is the number of clients participating in training in the current round; and η is the server learning rate.
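Steps S51 to S53 can be sketched as below. Since the threshold and reputation formulas are not available in this text, every rule here is an assumption: an update is flagged when its norm exceeds mean + λ·std or its mean cosine similarity falls below mean − λ·std, with λ = 1.5, and reputation is updated as an exponential moving average of a pass/fail score. The names `detect_anomalies` and `update_reputation` are hypothetical.

```python
import numpy as np

def detect_anomalies(updates, lam=1.5):
    """Flag updates with an unusually large norm or unusually low mean cosine
    similarity to the other clients (ASSUMED mean +/- lam * std thresholds)."""
    U = np.stack([np.ravel(u) for u in updates]).astype(float)
    norms = np.linalg.norm(U, axis=1)                       # S51: gradient norms
    unit = U / np.maximum(norms[:, None], 1e-12)
    cos = unit @ unit.T
    n = len(U)
    mean_cos = (cos.sum(axis=1) - np.diag(cos)) / (n - 1)   # S51: exclude self
    norm_flag = norms > norms.mean() + lam * norms.std()    # S52 (assumed form)
    cos_flag = mean_cos < mean_cos.mean() - lam * mean_cos.std()
    return norm_flag | cos_flag, norms, mean_cos

def update_reputation(reputations, flags, alpha=0.8):
    """S53 (ASSUMED rule): moving average of a score that is 0 if flagged, else 1."""
    score = (~flags).astype(float)
    return np.clip(alpha * np.asarray(reputations) + (1 - alpha) * score, 0.0, 1.0)

rng = np.random.default_rng(0)
base = np.ones(8) / np.sqrt(8)
updates = [base + 0.01 * rng.normal(size=8) for _ in range(5)]  # honest clients
updates.append(-5.0 * base)                                     # sign-flipped, scaled attack
flags, norms, mean_cos = detect_anomalies(updates)
reputations = update_reputation(np.ones(6), flags)
```

With only a handful of clients per round, a single outlier inflates the standard deviation it is tested against, so a small multiplier such as the assumed 1.5 is needed for the thresholds to trigger at all.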

[0013] Preferably, the training method supports a periodic model retraining mechanism, which dynamically triggers global updates based on changes in data distribution and model performance degradation to maintain the model's predictive ability.

[0014] Therefore, this invention employs a secure and robust federated learning training method, which optimizes the federated learning process to achieve secure and efficient model training in a non-independent and identically distributed (Non-IID) data environment. During federated learning training, malicious gradient detection, dynamic gradient compression, and reputation evaluation mechanisms are introduced, enabling the system to distinguish between reliable and malicious clients and to adaptively reduce the communication overhead generated by both. In the aggregation phase, the control center adjusts the aggregation weight of each client based on its reputation, reducing the impact of malicious clients on the global model and improving the accuracy of the global model.

[0015] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Description of the Drawings

[0016] Figure 1 is a schematic diagram of the system model structure according to an embodiment of the present invention;
Figure 2 is a comparison chart of global model accuracy versus training rounds for Embodiment 1 of the present invention and the traditional methods;
Figure 3 is a comparison chart of loss values versus training rounds for Embodiment 1 of the present invention and the traditional methods;
Figure 4 is a comparison chart of cumulative communication volume versus training rounds for Embodiment 1 of the present invention and the traditional methods;
Figure 5 is a comparison chart of the prediction results of Embodiment 2 of the present invention and the traditional methods in the traffic prediction task;
Figure 6 is a comparison chart of loss values versus training rounds for Embodiment 2 of the present invention and the traditional methods;
Figure 7 is a comparison chart of cumulative communication volume versus training rounds for Embodiment 2 of the present invention and the traditional methods.

Detailed Implementation

[0017] The technical solution of the present invention will be further described below with reference to the accompanying drawings and embodiments.

[0018] Unless otherwise defined, the technical or scientific terms used in this invention shall have the ordinary meaning understood by one of ordinary skill in the art to which this invention pertains. The terms "first," "second," and similar terms used in this invention do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Terms such as "comprising" or "including" mean that the element or object preceding the word encompasses the elements or objects listed following the word and their equivalents, without excluding other elements or objects. Terms such as "connected" or "linked" are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. Terms such as "upper," "lower," "left," and "right" are used only to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may also change accordingly.

[0019] Example 1

This invention provides a secure and robust federated learning training method to address the challenges of security, communication efficiency, and data heterogeneity in practical deployments of federated learning. During training, the control center pushes the model to be trained and the reputation scores to the clients participating in collaborative training. Each client trains the model on its local data and transmits the compressed gradient update to the control center. The control center re-evaluates the reputation scores and updates the global model by weighted aggregation based on them. Finally, the updated model and reputation scores are fed back to the clients. Specifically, each client uses adaptive gradient compression, residual gradient accumulation, and sparsified gradient updates, driven by the reputation score issued by the control center, to reduce communication overhead. Differential privacy obscures the influence of individual data points on the final aggregated update, so that an attacker cannot determine with high confidence whether a specific user's data record is included in the training dataset. After receiving the gradient updates from the clients, the control center performs malicious attack detection: it detects abnormal gradient updates, identifies and marks malicious clients, and dynamically adjusts each client's reputation score. Finally, the reputation score controls the aggregation weight at the control center, reducing the impact of malicious clients on model training.

[0020] This embodiment includes a control center (server) and 10 clients; its system model is shown in Figure 1. Each client holds a local dataset and maintains a local model, while the control center is responsible for aggregating and distributing the global model. The specific steps are as follows: S1. The control center initializes the global model and sends it to each client i, yielding the initial set of client models; the control center also initializes the reputation values of all clients. In this embodiment, the initial reputation of all clients is set to 1.

[0021] Before pushing the model to be trained to the clients, the control center performs the following operations: S11. Select the specific form of the model: choose between linear and nonlinear models based on task complexity. Linear models include logistic regression and support vector machines, while nonlinear models include convolutional neural networks, recurrent neural networks, and Transformers. For image classification tasks, choose convolutional neural networks (CNNs). For time series prediction, choose long short-term memory networks (LSTMs) or temporal convolutional networks (TCNs).

[0022] S12. Dataset processing: client data is divided into a training dataset and a test dataset according to a fixed ratio; for time series data, samples are constructed with a sliding window mechanism, and the window size p is set according to the periodic characteristics of the sequence to generate the training and test sample datasets.

[0023] In this embodiment, the MNIST handwritten digit image recognition task is performed. A convolutional neural network (CNN) is selected as the basic model structure, consisting of a feature extractor and a classifier. The feature extractor contains two convolutional blocks and an adaptive average pooling layer; the classifier consists of fully connected layers that map the extracted high-level features to the final class probability distribution. The dataset used is the MNIST handwritten digit recognition dataset, which contains 70,000 images. Each client is allocated no fewer than 1,000 samples, and a Non-IID partition is generated with a Dirichlet distribution (α=0.5) to simulate the data heterogeneity of real-world edge environments.
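A Dirichlet-based Non-IID partition of this kind is commonly built per class, as in the sketch below. This is a widely used recipe rather than the patent's stated procedure; the retry loop that enforces the minimum of 1,000 samples per client, the synthetic label array standing in for the MNIST training labels, and the name `dirichlet_partition` are assumptions.

```python
import numpy as np

def dirichlet_partition(labels, n_clients=10, alpha=0.5, min_size=1000,
                        seed=0, max_tries=100):
    """Split sample indices across clients, drawing each class's client shares
    from Dirichlet(alpha); resample until every client has >= min_size samples."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    for _ in range(max_tries):
        parts = [[] for _ in range(n_clients)]
        for c in classes:
            idx = rng.permutation(np.flatnonzero(labels == c))
            props = rng.dirichlet(alpha * np.ones(n_clients))
            # Cut points that divide this class's samples by the drawn shares.
            cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
            for client, chunk in enumerate(np.split(idx, cuts)):
                parts[client].extend(chunk.tolist())
        if min(len(p) for p in parts) >= min_size:
            return [np.array(p) for p in parts]
    raise RuntimeError("no partition met the minimum client size")

# Stand-in labels: 10 classes x 6,000 samples (not the real MNIST labels).
labels = np.repeat(np.arange(10), 6000)
parts = dirichlet_partition(labels)
sizes = [len(p) for p in parts]
```

Smaller values of α concentrate each class on fewer clients, so α=0.5 yields clients with strongly skewed label distributions.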

[0024] S2. After the clients receive the model, a subset of clients is selected to participate in each training round; each selected client trains on its local dataset to obtain an updated local model. In this embodiment, the control center randomly selects 6 clients to participate in training in each round.

[0025] S3. Client i calculates its model update and then applies adaptive gradient compression, residual gradient accumulation, and differential privacy noise addition to obtain a secure update.

[0026] In S3, the adaptive gradient compression mechanism dynamically adjusts the gradient compression ratio based on the client's reputation and the training round. The compression ratio is the proportion of model parameters retained after compression, chosen to reduce communication while preserving the key gradient information. The adaptive gradient compression mechanism specifically includes the following steps: S301. Based on the reputation value of client i and the current training round, dynamically calculate the compression ratio: [formula missing in source], where β and a second parameter (symbol missing in source) are hyperparameters, and the client's current reputation value lies in [0,1]. In this embodiment, the second hyperparameter is set to 0.3 and β=0.05; the compression ratio gradually increases from 30% to 60% over the rounds, and it is 10-20% lower for low-reputation clients.

[0027] S302. According to the compression ratio, determine the number of gradients to retain, k, taking the compression ratio multiplied by the total number of gradients as the value of k.

[0028] S303. Perform Top-K filtering on the gradient tensor, retaining the k gradient values with the largest absolute values and setting the rest to zero; then transmit only the parameters whose change exceeds a preset threshold, to further reduce communication overhead. In this embodiment, only parameters whose change exceeds 0.01 are transmitted. The result is the compressed update.

[0029] S304. Gradient values that are not retained are stored as residuals and added to the gradient in the next training round.

[0030] Residual gradient accumulation means that the gradient residuals not transmitted in each round are saved and added to the gradients in the next training round, thus avoiding information loss and accelerating model convergence. It is specifically expressed as: [formula missing in source], where the terms are the compressed gradient update and the residual saved from the previous round.

[0031] The differential privacy noise-adding mechanism adaptively adds Gaussian noise based on the client's reputation value, thereby perturbing the updates of low-reputation clients and reducing the adverse effect of their abnormal updates on the global model while protecting privacy: [formula missing in source], where the standard deviation of the noise is: [formula missing in source]. In this expression, ϵ is the privacy budget, the sensitivity determines the amount of noise needed to protect data privacy, and δ is a very small positive value that prevents division by zero; the higher the reputation, the weaker the noise and the smaller the impact on model performance. In this embodiment, ϵ is set to 10 and the sensitivity to 1. Under this setting, the noise intensity of a malicious client is 3-5 times that of a normal client.

[0032] S4. The client transmits the compressed update to the control center using a sparse transmission mechanism, i.e., value + index. Under this mechanism, only the values of the non-zero gradients and their index positions are transmitted, which reduces communication overhead. After receiving the data, the control center reconstructs the complete gradient from the values and indexes.

[0033] S5. After receiving the updates from all clients, the control center performs malicious attack detection and reputation assessment, and generates a new global model by weighted aggregation according to the client reputation scores. Malicious attack detection and reputation assessment combine gradient norm anomaly detection, cosine similarity analysis, and historical behavior records to dynamically evaluate client reputation and to identify and suppress malicious updates; the control center then performs weighted aggregation based on the reputation weights of the participating clients. Specifically:
S51. Calculate the gradient norm of each client's update and its mean cosine similarity with the updates from the other clients.
S52. Detect anomalous updates: if the gradient norm of client i satisfies the norm condition [formula missing in source], or its cosine similarity satisfies the similarity condition [formula missing in source], the update is marked as anomalous. The conditions are defined in terms of the mean and standard deviation of the gradient-update norms over all clients, and the mean and standard deviation of the cosine similarity between a client and the other clients.
S53. Dynamically update the client reputation value: [formula missing in source].
S54. Perform weighted aggregation using the reputation weights: [formula missing in source]. The formula involves the global model before the current round's update; i and j are the IDs of the clients participating in training in the current round, each with its reputation value; N is the number of clients participating in training in the current round; and η is the server learning rate. In this embodiment, the server learning rate η is initially set to 0.3 and decreases by 5% every 5 rounds.
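The aggregation of S54 and the learning rate schedule of this embodiment can be illustrated as follows. Both forms are assumptions, because the source formula is not available in this text: the sketch normalizes reputations into weights, adds the weighted mean update to the previous global model scaled by η, and reads "decreases by 5% every 5 rounds" as a multiplicative decay with rounds counted from 0. The names `server_lr` and `aggregate` are hypothetical.

```python
import numpy as np

def server_lr(round_idx, eta0=0.3, decay=0.05, every=5):
    """ASSUMED schedule: multiply the rate by (1 - decay) once per `every` rounds."""
    return eta0 * (1.0 - decay) ** (round_idx // every)

def aggregate(global_w, updates, reputations, eta):
    """ASSUMED form: w_new = w + eta * sum_i (r_i / sum_j r_j) * update_i."""
    r = np.asarray(reputations, dtype=float)
    weights = r / r.sum()
    combined = sum(w * u for w, u in zip(weights, updates))
    return global_w + eta * combined

global_w = np.zeros(3)
updates = [np.ones(3), np.ones(3), -10.0 * np.ones(3)]   # third client is malicious
new_w = aggregate(global_w, updates, reputations=[1.0, 1.0, 0.1], eta=server_lr(0))
plain_w = aggregate(global_w, updates, reputations=[1.0, 1.0, 1.0], eta=server_lr(0))
```

In this toy case the low-reputation client's large negative update is down-weighted enough that the global model still moves in the honest direction, whereas equal weights would move it the opposite way.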

[0034] Figures 2 and 3 show how the accuracy and loss values of the proposed method and the traditional methods change with training rounds in scenarios with malicious clients. The experimental results show that the global model trained by the proposed method outperforms the traditional methods in all training rounds. After the same number of training rounds, the accuracy of the proposed method is 13.79% higher than that of traditional method 1 and 1.63% higher than that of traditional method 2. The loss curves over training rounds indicate that the model converges faster when trained with the proposed method. These results show that, in the presence of malicious clients, the accuracy of the model trained by the proposed method improves steadily with each training round and exceeds that of the traditional robust aggregation algorithms.

[0035] The communication efficiency comparison is shown in Figure 4. Compared with the traditional methods, this invention reduces the average communication volume by 82% through gradient compression and sparse transmission, which lowers bandwidth requirements.

[0036] During training, the control center detected malicious attacks in the rounds involving malicious clients. For example, malicious client 2 was detected in round 3 and malicious client 7 in round 5, and their reputation values were ultimately reduced to 0.49 and 0.44 respectively, suppressing the impact of malicious updates on the model. This embodiment verifies the effectiveness of the method of the invention, which maintains model performance under the influence of malicious clients while reducing communication overhead, making it suitable for scenarios with malicious client attacks and limited communication.

[0037] S6. The control center will distribute the updated global model and reputation score to each client. S7. Repeat S2 to S6 until the model converges or the maximum number of training rounds is reached.

[0038] In addition, this training method supports a periodic model retraining mechanism, which dynamically triggers global updates based on changes in data distribution and model performance degradation to maintain the model's predictive ability.

[0039] Example 2

This embodiment performs time series prediction, using a Long Short-Term Memory (LSTM) network as the base model for single-step prediction of wireless traffic sequences in a region. The dataset is a telecommunications wireless traffic dataset from a foreign city, consisting of detailed call records generated by cellular networks from November 1, 2013 to January 1, 2014. Time series traffic records from 10 regions were selected, with a time granularity of 10 minutes, and the data from the 10 regions were distributed to 10 clients. For each client's data, a sliding window method was used to construct the training and test samples. Clients 2 and 7 were designated as malicious to simulate a federated prediction scenario with Byzantine attacks. The system model is the same as in Example 1, including a control center (server) and 10 clients, as shown in Figure 1. Each client holds a local dataset and maintains a local model, while the control center is responsible for aggregating and distributing the global model. The implementation process is generally consistent with Example 1, except for the differences in training task, model structure, and data construction. The parameter settings are adjusted as follows: in S2, the control center randomly selects 7 clients to participate in training in each round; in S303, only parameters whose change exceeds 0.005 are transmitted; in S54, the server learning rate η is initially set to 0.2 and decreases by 5% every 5 rounds.

[0040] Figure 5 compares the prediction results of this method and the traditional methods in the traffic prediction task. Figure 6 shows how the loss of this method and the traditional methods changes with training rounds. Figure 7 shows how the cumulative communication volume of this method and the traditional methods changes with training rounds. As Figure 5 shows, within the test interval the prediction curve obtained by this method is generally closer to the actual traffic curve and follows the traffic trend better. In contrast, the prediction curves of the traditional methods deviate more from the actual values, indicating that the traditional robust aggregation methods do not retain enough effective update information in the setting of this embodiment, which combines time series prediction, malicious attacks, and highly heterogeneous data. This method also outperforms the comparison methods on MSE, RMSE, and R². In particular, its coefficient of determination R² is positive, while the R² values of the two comparison methods are negative, indicating that this method can still fit the actual traffic trend under attack, whereas the fitting ability of the comparison methods is markedly weaker. As Figure 6 shows, the global loss values of all three methods gradually decrease as training proceeds, indicating that all the models converge to some degree. However, the loss of this method is lower than that of the traditional methods in most training rounds and remains lower in the later stages of training, indicating that this method achieves a more stable optimization process and faster effective convergence even with interference from malicious clients.
As Figure 7 shows, the cumulative communication volume of this method is lower than that of the traditional methods. The traditional methods upload the complete model update in each training round, so their cumulative communication volume grows approximately linearly with the rounds. This method, through adaptive gradient compression, threshold filtering, and sparsification, uploads only the updates of key parameters, reducing the average communication volume by 24% and easing the communication burden while maintaining prediction performance.
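For reference, the three metrics named above can be computed as in the sketch below, using their standard definitions. The toy arrays are made up for illustration and are not the patent's data; the name `regression_metrics` is hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R^2) using the standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    rmse = np.sqrt(mse)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2

y_true = [1.0, 2.0, 3.0, 4.0]
mse_good, rmse_good, r2_good = regression_metrics(y_true, [1.1, 1.9, 3.2, 3.8])
mse_bad, rmse_bad, r2_bad = regression_metrics(y_true, [4.0, 1.0, 4.0, 1.0])
```

A negative R² means the predictions have a larger squared error than simply predicting the mean of the true values, which is why a sign difference in R² between methods is a strong signal.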

[0041] During training, the control center detected malicious attacks in rounds involving malicious clients, ultimately reducing the reputation scores of malicious client 2 and malicious client 7 to 0.102 and 0.189 respectively, effectively suppressing the impact of malicious updates on the model. This embodiment verifies the effectiveness of the method, demonstrating its ability to maintain good prediction performance under malicious attack scenarios while maintaining communication efficiency. It is suitable for distributed time series prediction scenarios with malicious attacks and limited communication resources.

[0042] Therefore, this invention employs a secure and robust federated learning training method, which optimizes the federated learning process to achieve secure and efficient model training in a non-independent and identically distributed (Non-IID) data environment. During federated learning training, malicious gradient detection, dynamic gradient compression, and reputation evaluation mechanisms are introduced, enabling the system to distinguish between reliable and malicious clients and to adaptively reduce the communication overhead generated by both. In the aggregation phase, the control center adjusts the aggregation weight of each client based on its reputation, reducing the impact of malicious clients on the global model and improving the accuracy of the global model.

[0043] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the technical solutions of the present invention, and these modifications or equivalent substitutions cannot cause the modified technical solutions to deviate from the spirit and scope of the technical solutions of the present invention.

Claims

1. A secure and robust federated learning training method, characterized in that it includes the following steps:
S1. The control center initializes the global model and sends it to each client i, yielding the initial set of client models; the control center also initializes the reputation values of all clients.
S2. After the clients receive the model, a subset of clients is selected to participate in each training round; each selected client trains on its local dataset to obtain an updated local model.
S3. Client i calculates its model update and then applies adaptive gradient compression, residual gradient accumulation, and differential privacy noise addition to obtain a secure update.
S4. The client transmits the compressed update to the control center using a sparse transmission mechanism, i.e., value + index.
S5. After receiving the updates from all clients, the control center performs malicious attack detection and reputation assessment, and generates a new global model by weighted aggregation according to the client reputation scores.
S6. The control center distributes the updated global model and reputation scores to each client.
S7. Repeat S2 to S6 until the model converges or the maximum number of training rounds is reached.

2. The secure and robust federated learning training method according to claim 1, characterized in that, in S1, the control center performs the following operations before pushing the model to be trained to the clients:
S11. Select the specific form of the model: a linear or nonlinear model is chosen according to task complexity. Linear models include logistic regression and support vector machines; nonlinear models include convolutional neural networks (CNNs), recurrent neural networks, and Transformers. For image classification tasks, CNNs are chosen; for time series prediction, long short-term memory networks (LSTMs) and temporal convolutional networks (TCNs) are chosen.
S12. Dataset processing: each client's data is divided into a training dataset and a test dataset according to a set ratio. For time-series data, samples are constructed using a sliding window mechanism, with the window size p set according to the periodic characteristics of the sequence, to generate the training and test sample datasets.
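The sliding window construction in S12 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the one-step-ahead target, and the 0.8 split ratio are assumptions made for the example, since the claim only states that a window of size p is used and that the data is split by a set ratio.

```python
import numpy as np

def sliding_window_samples(series, p):
    """Build (window, next value) pairs from a 1-D series with window size p.

    Assumption: each sample predicts the value that directly follows the window.
    """
    series = np.asarray(series, dtype=float)
    if p < 1 or p >= len(series):
        raise ValueError("p must satisfy 1 <= p < len(series)")
    # Each row of X holds p consecutive values; y holds the value that follows.
    X = np.stack([series[i:i + p] for i in range(len(series) - p)])
    y = series[p:]
    return X, y

def chronological_split(X, y, train_ratio=0.8):
    """Split samples in time order so the test set lies after the training set."""
    n_train = int(len(X) * train_ratio)
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

# Hypothetical toy series; a real client would use its own local data.
series = np.arange(10)
X, y = sliding_window_samples(series, p=3)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y, train_ratio=0.8)
```

Splitting in time order rather than at random keeps future values out of the training set, which is a common choice for time-series data.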

3. The secure and robust federated learning training method according to claim 2, characterized in that, in S3, the adaptive gradient compression mechanism dynamically adjusts the gradient compression rate based on the client's reputation and the training round. The compression rate is the proportion of model parameters retained after compression; it reduces communication while preserving key gradient information. The adaptive gradient compression mechanism specifically includes the following steps:
S301. Dynamically calculate the compression ratio from the reputation value of client i and the current training round, where β is a hyperparameter and the client's current reputation value lies in [0,1].
S302. Determine the number k of gradients to retain from the compression ratio: k is the compression ratio multiplied by the total number of gradients.
S303. Perform Top-K filtering on the gradient tensor, retaining the k gradient values with the largest absolute values and setting the rest to zero; subsequently, only parameters whose changes exceed a preset threshold are transmitted, further reducing communication overhead. The result is the compressed update.
S304. Treat the unretained gradient values as residuals: store them and add them to the new gradients in the next training round.
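Steps S302 to S304 can be sketched as below. The compression ratio is passed in as a plain number because the formula of S301 is not reproduced in this text; the function name `top_k_compress` and the toy gradient are hypothetical, and the additional preset-threshold filter of S303 is left out for brevity.

```python
import numpy as np

def top_k_compress(grad, ratio):
    """Keep the k largest-magnitude entries of grad and zero the rest.

    ratio is the fraction of entries to retain (S302: k = ratio * total).
    Returns the compressed gradient, the residual (unretained part), and k.
    """
    grad = np.asarray(grad, dtype=float)
    flat = grad.ravel()
    k = min(flat.size, max(1, int(ratio * flat.size)))
    # Indices of the k entries with the largest absolute value (S303).
    keep = np.argpartition(np.abs(flat), flat.size - k)[flat.size - k:]
    compressed = np.zeros_like(flat)
    compressed[keep] = flat[keep]
    # Everything that was not retained becomes the residual (S304).
    residual = flat - compressed
    return compressed.reshape(grad.shape), residual.reshape(grad.shape), k

# Hypothetical gradient with 8 entries; a ratio of 0.25 retains k = 2 of them.
grad = np.array([0.1, -2.0, 0.3, 1.5, -0.05, 0.7, -0.2, 0.9])
compressed, residual, k = top_k_compress(grad, ratio=0.25)
```

By construction the compressed gradient and the residual always sum to the original gradient, so no information is discarded, only deferred.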

4. The secure and robust federated learning training method according to claim 3, characterized in that, in S3, residual gradient accumulation means that the gradient residuals that are not transmitted in each round are saved and added to the gradients in the next round of training, thereby avoiding information loss and accelerating model convergence; specifically, the accumulation is expressed in terms of the compressed gradient update and the residual saved from the previous round.
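The effect of residual accumulation over successive rounds can be illustrated with a small sketch. The class name `ResidualAccumulator`, the fixed k, and the toy gradients are assumptions for the example; the exact expression used in the claim is not reproduced in this text, so the code follows the verbal description only.

```python
import numpy as np

class ResidualAccumulator:
    """Carries the untransmitted part of each round's gradient into the next."""

    def __init__(self, size):
        self.residual = np.zeros(size)

    def step(self, grad, k):
        # Add the residual saved from the previous round to the new gradient.
        corrected = np.asarray(grad, dtype=float) + self.residual
        # Transmit only the k largest-magnitude entries of the corrected gradient.
        keep = np.argsort(np.abs(corrected))[-k:]
        sent = np.zeros_like(corrected)
        sent[keep] = corrected[keep]
        # Save what was not transmitted for the next round.
        self.residual = corrected - sent
        return sent

acc = ResidualAccumulator(size=4)
# Round 1: only index 0 is sent; the 0.6 at index 1 is held back as residual.
round1 = acc.step([1.0, 0.6, 0.1, 0.0], k=1)
# Round 2: index 1 has accumulated 0.6 + 0.6 and is now the largest entry.
round2 = acc.step([0.5, 0.6, 0.1, 0.0], k=1)
```

In the second round the entry at index 1 is transmitted even though its fresh gradient is no larger than before, which is the information-preserving behavior the claim describes.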

5. The secure and robust federated learning training method according to claim 4, characterized in that the differential privacy noise-adding mechanism in S3 adaptively adds Gaussian noise based on the client's reputation value, thereby perturbing the updates of low-reputation clients and reducing the adverse effects of their abnormal updates on the global model while protecting privacy. The standard deviation of the noise is determined by the privacy budget, the sensitivity, and the client's reputation value; the sensitivity determines the amount of noise that must be added to protect data privacy, and δ is a very small positive value that prevents division by zero. The higher the reputation, the weaker the noise and the smaller its impact on model performance.
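A reputation-dependent Gaussian noise step might look like the sketch below. The formula for the standard deviation is not reproduced in this text, so the form used here, sensitivity / (ε · (reputation + δ)), is an assumption chosen only because it matches the stated properties: noise shrinks as reputation grows, and δ guards against division by zero. The function names are hypothetical, and the sketch makes no claim of a formal differential privacy guarantee.

```python
import numpy as np

def noise_std(reputation, sensitivity, epsilon, delta=1e-8):
    """Assumed form: sigma = sensitivity / (epsilon * (reputation + delta)).

    delta here is only the small positive guard against division by zero.
    """
    return sensitivity / (epsilon * (reputation + delta))

def add_reputation_noise(update, reputation, sensitivity, epsilon, rng, delta=1e-8):
    """Add zero-mean Gaussian noise whose scale decreases with reputation."""
    update = np.asarray(update, dtype=float)
    sigma = noise_std(reputation, sensitivity, epsilon, delta)
    noise = rng.normal(0.0, sigma, size=update.shape)
    return update + noise, sigma

rng = np.random.default_rng(0)
update = np.array([0.0, -2.0, 0.0, 1.5])
# Hypothetical settings: same privacy budget, different reputation values.
noisy_high, sigma_high = add_reputation_noise(update, 0.9, 1.0, 2.0, rng)
noisy_low, sigma_low = add_reputation_noise(update, 0.2, 1.0, 2.0, rng)
```

With these settings the low-reputation client receives noise with a larger standard deviation than the high-reputation client, so its update carries less weight in practice even before aggregation.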

6. The secure and robust federated learning training method according to claim 5, characterized in that, in S4, the sparse transmission mechanism means that only the values of the non-zero gradients and their index positions are transmitted, which significantly reduces communication overhead; after receiving the data, the control center reconstructs the complete gradient from the values and indices.
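The value + index transmission and its reconstruction at the control center can be sketched as follows. The function names and the choice of 32-bit values and indices are assumptions for illustration; the claim does not specify a wire format.

```python
import numpy as np

def sparse_encode(update):
    """Client side: keep only the non-zero values and their flat indices."""
    update = np.asarray(update, dtype=np.float32)
    flat = update.ravel()
    indices = np.flatnonzero(flat).astype(np.int32)
    values = flat[indices]
    return values, indices, update.shape

def sparse_decode(values, indices, shape):
    """Control center side: rebuild the dense gradient from values and indices."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[indices] = values
    return flat.reshape(shape)

# Hypothetical compressed update with two non-zero entries out of eight.
update = np.zeros((2, 4), dtype=np.float32)
update[0, 1] = -2.0
update[1, 3] = 1.5

values, indices, shape = sparse_encode(update)
restored = sparse_decode(values, indices, shape)
payload_bytes = values.nbytes + indices.nbytes
```

The saving depends on sparsity: each transmitted entry costs a value plus an index, so the scheme only pays off when well under half of the entries are non-zero.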

7. The secure and robust federated learning training method according to claim 6, characterized in that, in S5, malicious attack detection and reputation assessment combine gradient norm anomaly detection, cosine similarity analysis, and historical behavior records to dynamically evaluate client reputation and to identify and suppress malicious updates, and the control center performs weighted aggregation based on the reputation weights of the participating clients, specifically including:
S51. Calculate the gradient norm of each client's update and its mean cosine similarity with the updates from the other clients.
S52. Detect abnormal updates: an update from client i is marked as abnormal if either its gradient norm or its cosine similarity satisfies the corresponding anomaly condition. The gradient norm condition is defined in terms of the mean and standard deviation of the gradient updates of all clients; the cosine similarity condition is defined in terms of the mean and standard deviation of the cosine similarity between a client and the other clients.
S53. Dynamically update the client reputation values.
S54. Use the reputation weights for weighted aggregation, which is defined in terms of the global model before the current round of updates; the reputation values of clients i and j, where i and j are the IDs of clients participating in training in the current round; the number N of clients participating in training in the current round; and the server learning rate η.
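Steps S51 to S54 can be sketched end to end as below. The claim's formulas are not reproduced in this text, so every concrete rule here is an assumption made for illustration: the anomaly thresholds (mean plus or minus z standard deviations), the exponential-moving-average reputation update with factor `alpha`, and the normalization of weights by the sum of reputations. All function names are hypothetical.

```python
import numpy as np

def detect_anomalies(updates, z=1.5):
    """S51-S52: flag updates with an unusually large norm or low mean cosine."""
    U = np.stack(updates)                       # shape (N, d)
    norms = np.linalg.norm(U, axis=1)
    unit = U / np.maximum(norms[:, None], 1e-12)
    cos = unit @ unit.T
    # Mean cosine similarity with the other clients (self-similarity excluded).
    mean_cos = (cos.sum(axis=1) - np.diag(cos)) / (len(U) - 1)
    # Assumed thresholds: z standard deviations away from the mean.
    norm_flag = norms > norms.mean() + z * norms.std()
    cos_flag = mean_cos < mean_cos.mean() - z * mean_cos.std()
    return norm_flag | cos_flag

def update_reputation(reputation, abnormal, alpha=0.5):
    """S53 (assumed rule): move reputation toward 0 if abnormal, else toward 1."""
    target = np.where(abnormal, 0.0, 1.0)
    return np.clip((1 - alpha) * reputation + alpha * target, 0.0, 1.0)

def aggregate(global_model, updates, reputation, eta=1.0):
    """S54 (assumed form): w_new = w + eta * sum_i (R_i / sum_j R_j) * update_i."""
    weights = reputation / reputation.sum()
    return global_model + eta * (weights[:, None] * np.stack(updates)).sum(axis=0)

# Hypothetical round: four similar updates and one large, opposite-sign update.
updates = [np.array([1.0, 1.0, 1.0]), np.array([1.1, 0.9, 1.0]),
           np.array([0.9, 1.0, 1.1]), np.array([1.0, 1.1, 0.9]),
           np.array([-10.0, -10.0, -10.0])]
reputation = np.full(5, 0.5)

abnormal = detect_anomalies(updates)
reputation = update_reputation(reputation, abnormal)
new_global = aggregate(np.zeros(3), updates, reputation)
plain_average = np.mean(updates, axis=0)
```

In this toy round the fifth client is flagged on both criteria, its reputation drops below that of the others, and the reputation-weighted result is pulled less far toward its update than a plain average would be. With few clients a single outlier can only deviate from the mean by a limited number of standard deviations, which is why the example uses a modest threshold.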

8. The secure and robust federated learning training method according to claim 1, characterized in that the training method supports a periodic model retraining mechanism, which dynamically triggers global updates in response to changes in data distribution and degradation of model performance, in order to maintain the model's predictive ability.