A federated learning communication optimization method and device based on local and global double clipping

CN122293705A · Pending · Publication Date: 2026-06-26 · JIANGSU FUTURE NETWORKS INNOVATION +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
JIANGSU FUTURE NETWORKS INNOVATION
Filing Date
2026-05-11
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Federated learning on mobile terminals and IoT edge devices is constrained by limited communication, and non-independent and identically distributed (Non-IID) data worsens the communication burden, convergence, and stability. Existing optimization methods fail to balance communication efficiency and training stability effectively.

Method used

A federated learning method based on local and global dual clipping is adopted. Clipped stochastic gradient descent suppresses directional drift, and cosine similarity drives adaptive bit-width quantization. On the server side, robust aggregation is performed within bit-width groups and weighted fusion is performed between groups, forming a closed-loop control framework for edge-cloud linkage.

Benefits of technology

In scenarios with Non-IID data and multiple local steps, this approach reduces communication overhead, alleviates aggregation bias and convergence oscillations, and improves convergence stability and accuracy, so that both communication efficiency and model accuracy improve.


Smart Images

  • Figure CN122293705A_ABST

Abstract

This invention discloses a federated learning communication optimization method and apparatus based on local and global dual clipping. The method includes: calculating the cosine similarity between the global parameters from the previous round and the parameters after local iteration; making an adaptive bit-width decision based on the similarity, updating parameters locally using clipped stochastic gradient descent, and reporting the bit-width label and quantized gradient to the server; on the server side, grouping clients according to the bit-width label, applying robust aggregation within each group, and then performing weighted global fusion of the group results to obtain the global gradient; and updating the model parameters using clipped stochastic gradient descent. Compared with existing technologies, this invention, through an end-cloud linkage mechanism of local similarity-adaptive quantization, grouped robust aggregation, and global clipping, effectively reduces communication overhead, suppresses aggregation bias and oscillation, and improves convergence stability and accuracy in scenarios with Non-IID data and multiple local steps.

Description

Technical Field

[0001] This invention belongs to the field of digital information transmission technology and relates to artificial intelligence and distributed machine learning, specifically to a federated learning communication optimization method based on local and global dual clipping.

Background Technology

[0002] Federated Learning (FL) achieves privacy-friendly distributed training by keeping data on each participating device (client) and exchanging only model updates, so that "data never leaves the device." This is particularly beneficial in edge / IoT scenarios, where distributed training can unlock the value of the data on each device while easing privacy and compliance pressures. However, in edge scenarios such as mobile terminals and IoT, FL faces two core bottlenecks: first, limited communication makes uploads a burden; second, non-independent and identically distributed (Non-IID) data and local multi-step updates (…). Together these factors degrade convergence and stability: directional drift is easily amplified by compression noise, and convergence bias coexists with convergence oscillation. It is difficult to balance communication efficiency and training stability under limited bandwidth and energy budgets.

[0003] To address the aforementioned issues, industry and academia have developed several typical technical approaches:

[0004] (1) Quantization: Uploading local updates in a low-bit representation (such as fixed-point quantization, random unbiased quantization, or sign / level quantization) can significantly reduce communication volume. However, a fixed bit width cannot account for differences in "update reliability" among clients, and simply increasing the compression ratio amplifies quantization error, which is especially harmful in Non-IID scenarios.

[0005] (2) Sparsification: Only a subset of coordinates is uploaded, selected by Top-k or threshold truncation, and the discarded information is compensated by residual caching to reduce uplink bits.

[0006] (3) Client selection / weighting and similarity measurement: Some works attempt to weight or sample based on the consistency between the client and the global direction (such as similarity, gradient variance, and historical performance) to alleviate the Non-IID bias. However, common practices mostly remain at the level of aggregated weights and have not yet formed a closed loop collaboration with the client-side compressed bit width and the server-side aggregated structure.

[0007] (4) Grouping and aggregation: To reduce the impact of heterogeneity, some studies group clients and then aggregate them. However, the grouping is usually based on data statistics or on geographical and task attributes; a grouping mechanism bound directly to the communication bit width and confidence level is lacking, and linkage with the global clipping step size is also insufficient.

[0008] Existing methods for federated learning communication optimization and robust convergence operate independently, and each has shortcomings. Quantization and sparsification reduce bit width, but under strong Non-IID conditions and multiple local steps they easily amplify directional drift and compression noise. Error feedback and robust aggregation suppress outlier updates, but they are not linked to edge-side compression strategies, so stable convergence is hard to maintain at low bit widths. Similarity weighting and client selection mostly stay at the level of aggregation weights; they do not form an edge-cloud closed loop of "similarity, bit width, and aggregation weight" and cannot systematically amplify high-confidence information in the "correct direction." Group aggregation is mostly partitioned by data or geographical attributes, without explicitly considering the coupling between quantization error and heterogeneity that the bit width reflects. In addition, engineering details such as communication budgets, bit-width hysteresis control, and joint constraints on local iterations and quantization parameters are missing, which leads to frequent policy jitter, budget overruns, convergence oscillations, and accuracy degradation. As a result, it is difficult to achieve both communication efficiency and steady convergence in complex edge scenarios.

Summary of the Invention

[0009] Purpose of the invention: To address the problems in the prior art, this invention proposes a federated learning communication optimization method and apparatus based on local and global dual clipping. It suppresses the directional drift caused by Non-IID data through clipped stochastic gradient descent on the client side, performs adaptive bit-width quantization driven by the cosine similarity between the global parameters from the previous round and the parameters after multiple local iterations, and performs robust aggregation within bit-width groups and weighted fusion between groups on the server side. Finally, it updates the model using the global clipping step size. This approach reduces communication overhead, alleviates aggregation bias and convergence oscillations, and improves convergence stability and accuracy in scenarios with limited communication and multiple local steps.

[0010] Technical solution: To achieve the above-mentioned objectives, the present invention adopts the following technical solution:

[0011] A federated learning communication optimization method based on local and global dual clipping includes the following steps:

[0012] Step S1: Obtain the global parameters for the current training round and distribute them to each client.

[0013] Step S2: The client iterates over its local data using clipped stochastic gradient descent, updates its local copy of the global parameters according to the local clipping step size, and obtains the cumulative local update vector.

[0014] Step S3: The client calculates the cosine similarity between the global parameters obtained in step S1 and the parameters after the multiple local iterations of step S2, and makes an adaptive bit-width decision based on the cosine similarity, selecting the local upload quantization level or bit width.

[0015] Step S4: The client quantizes the cumulative local update vector obtained in step S2 according to the local upload quantization level or bit width selected in step S3, generates a quantized payload containing the cumulative update statistics, the sign and magnitude indices, and the bit-width label, and uploads it to the server.

[0016] Step S5: The server groups the clients according to the uploaded bit-width labels, reconstructs the update vector of each client, and performs robust aggregation on the reconstructed update vectors within each bit-width group to obtain the intra-group aggregation results.

[0017] Step S6: The server calculates weights based on the confidence index of each bit-width group and performs inter-group weighted fusion on the intra-group aggregation results to obtain the global gradient.

[0018] Step S7: The server calculates the global clipping step size based on the global gradient obtained in step S6 and performs the global model update.

[0019] Step S8: The server sends the global parameters updated in step S7, together with the strategy parameters, to each client and proceeds to the next training round until the stopping condition is met.

[0020] Specifically, step S2 includes the following process:

[0021] Client $i$ receives the global parameters $w^t$ sent in step S1 and uses them as the initial value for local updates, $w_{i,0}^t = w^t$. It then performs mini-batch training for a preset number of local steps $K_i$ and applies a clipping constraint to each update step; each update step includes:

[0022] (1) For the $k$-th mini-batch sample $\xi_{i,k}^t$, the gradient $g_{i,k}^t$ is calculated with the following formula:

[0023] $g_{i,k}^t = \nabla F_i(w_{i,k}^t;\ \xi_{i,k}^t)$,

[0024] where $t$ is the global iteration round, $F_i$ is the local objective function, and $w_{i,k}^t$ are the local parameters;

[0025] (2) Set the clipping step size $h_{i,k}^t$ for the current step based on the gradient norm:

[0026] $h_{i,k}^t = \min\{\eta,\ \gamma\eta / \|g_{i,k}^t\|\}$,

[0027] where $\eta$ is the learning rate and $\gamma$ is the clipping factor;

[0028] (3) Update the local parameters according to the clipping step size, and accumulate the local update vector:

[0029] $w_{i,k+1}^t = w_{i,k}^t - h_{i,k}^t\, g_{i,k}^t$, $\Delta_{i,k+1}^t = \Delta_{i,k}^t + h_{i,k}^t\, g_{i,k}^t$,

[0030] After the iteration, the locally updated parameters $w_{i,K_i}^t$ and the cumulative local update vector $\Delta_i^t = \Delta_{i,K_i}^t$ are obtained.
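The local update loop of step S2 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the step-size rule `min(lr, clip_factor * lr / ||g||)` is an assumed form of the clipping rule, the accumulator is assumed to sum the applied steps `h * g`, and `quad_grad` is a hypothetical toy gradient used only to make the example runnable.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def clipped_step_size(grad: Sequence[float], lr: float, clip_factor: float) -> float:
    """Assumed clipping rule: min(lr, clip_factor * lr / ||grad||)."""
    norm = l2_norm(grad)
    if norm == 0.0:
        return lr  # nothing to clip when the gradient vanishes
    return min(lr, clip_factor * lr / norm)


def local_clipped_sgd(
    w_global: Sequence[float],
    grad_fn: Callable[[Vector, Sequence[float]], Vector],
    batches: Sequence[Sequence[float]],
    lr: float,
    clip_factor: float,
) -> Tuple[Vector, Vector]:
    """Run one clipped SGD step per mini-batch.

    Returns the locally updated parameters and the cumulative update
    (sum of h * g over all steps, an assumed sign convention).
    """
    w = list(w_global)
    delta = [0.0] * len(w)
    for batch in batches:
        g = grad_fn(w, batch)
        h = clipped_step_size(g, lr, clip_factor)
        for j in range(len(w)):
            step = h * g[j]
            w[j] -= step      # parameter update
            delta[j] += step  # accumulate the unquantized local update
    return w, delta


def quad_grad(w: Vector, target: Sequence[float]) -> Vector:
    """Hypothetical toy objective 0.5 * ||w - target||^2; the batch is the target."""
    return [wj - tj for wj, tj in zip(w, target)]


if __name__ == "__main__":
    w_new, delta = local_clipped_sgd([3.0, 4.0], quad_grad, [[0.0, 0.0]] * 3, 0.1, 1.0)
    print(w_new, delta)
```

With this sign convention the accumulated vector equals the starting parameters minus the final parameters, so the server can treat it as a pseudo-gradient.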

[0031] Specifically, in step S3, the cosine similarity $s_i^t$ is calculated using the following formula:

[0032] $s_i^t = \dfrac{\langle w^t,\ w^t + \delta_i^t \rangle}{\|w^t\|\,\|w^t + \delta_i^t\|}$,

[0033] where $w^t$ are the global parameters from step S1 and $\delta_i^t$ is the increment of the parameters after the local iterations of step S2 relative to $w^t$;

[0034] The adaptive bit-width decision based on cosine similarity includes:

[0035] The similarity $s_i^t$ is input to a monotonic mapping function $\phi$ to obtain the local upload quantization level or bit width $b_i^t = \phi(s_i^t)$, where $\phi$ is a piecewise monotonic function and $\mathcal{B}$, the range of $\phi$, is the candidate set of quantization levels or bit widths.

[0036] Furthermore, in step S3, upper and lower similarity thresholds $S_H$ and $S_L$ are set together with a hysteresis band $\rho$: when $s_i^t$ rises above the upper threshold, the client moves up to a higher bit-width level; when it falls below the lower threshold, the client drops to a lower level; when it falls in the middle zone between the thresholds, the level is maintained or changed gradually.

[0037] Furthermore, in step S3, budget projection is applied to the bit-width allocation under the communication budget $B$, so that $\sum_i b_i^t \le B$, where $b_i^t$ is the upload bit width that client $i$ obtains from its similarity $s_i^t$.
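The similarity-driven bit-width decision with hysteresis might look like the sketch below. The threshold values, the band width, and the one-level-per-round switching rule are illustrative assumptions; the source only states that thresholds and a hysteresis band exist and that levels change gradually.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine of the angle between two vectors; eps guards against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def decide_bit_width(
    sim: float,
    prev_bits: int,
    candidates: Sequence[int] = (4, 8, 16),
    s_low: float = 0.2,    # hypothetical lower threshold
    s_high: float = 0.6,   # hypothetical upper threshold
    band: float = 0.05,    # hypothetical hysteresis band
) -> int:
    """Move at most one level per round, and only when the similarity
    clears a threshold by more than the hysteresis band."""
    levels = sorted(candidates)
    idx = levels.index(prev_bits)
    if sim > s_high + band and idx < len(levels) - 1:
        idx += 1  # clearly aligned with the global direction: finer encoding
    elif sim < s_low - band and idx > 0:
        idx -= 1  # clearly misaligned: coarser encoding
    return levels[idx]


if __name__ == "__main__":
    sim = cosine_similarity([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
    print(sim, decide_bit_width(sim, prev_bits=8))
```

Requiring the similarity to clear the threshold by the band, rather than merely touch it, is what keeps a client whose similarity hovers near a boundary from switching levels every round.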

[0038] Specifically, step S4 includes the following processes:

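[0039] Calculate the 2-norm of the cumulative local update vector, $r_i^t = \|\Delta_i^t\|_2$, and normalize the vector to $v_i^t = \Delta_i^t / (r_i^t + \epsilon)$, where $\epsilon$ is a numerical stability constant;

[0040] Let the number of signed uniform quantization levels corresponding to bit width $b_i^t$ be $L$. For each coordinate $j$, write $a_j = |v_{i,j}^t|$ and $\ell_j = \lfloor a_j L \rfloor$; round $a_j L$ up to $\ell_j + 1$ with probability $p_j = a_j L - \ell_j$ or down to $\ell_j$ otherwise, and retain the sign $\operatorname{sign}(v_{i,j}^t)$. Quantization generates the sign and magnitude indices $q_i^t$:

[0041] $q_i^t = \{(\operatorname{sign}(v_{i,j}^t),\ m_j)\}_{j=1}^{d}$, $m_j \in \{\ell_j,\ \ell_j + 1\}$,

[0042] where $d$ is the dimension of the model parameters;

[0043] The quantization operator $Q_b$ satisfies both unbiasedness and bounded variance:

[0044] $\mathbb{E}[Q_b(\Delta)] = \Delta$, $\mathbb{E}\|Q_b(\Delta) - \Delta\|^2 \le \omega_b \|\Delta\|^2$,

[0045] where the quantization error coefficient $\omega_b$ decreases monotonically as the bit width $b$ increases;

[0046] Generate and package the quantized payload $P_i^t$:

[0047] $P_i^t = (r_i^t,\ q_i^t,\ b_i^t)$,

[0048] where $q_i^t$ is the sign and magnitude index and $b_i^t$ is the upload bit width that client $i$ obtained from the similarity decision.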
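The quantization and payload packaging of step S4 can be sketched as below. The level count `2**(bits - 1) - 1` (one sign bit plus `bits - 1` magnitude bits) and the dictionary payload layout are assumptions made for illustration; the source does not give either.

```python
import math
import random
from typing import Dict, List, Sequence


def quantize(delta: Sequence[float], bits: int, rng: random.Random,
             eps: float = 1e-12) -> Dict[str, object]:
    """Stochastically round each normalized magnitude to one of its two
    neighbouring levels so that the expected value is preserved."""
    levels = 2 ** (bits - 1) - 1  # assumed number of magnitude levels
    norm = math.sqrt(sum(x * x for x in delta))
    signs: List[int] = []
    mags: List[int] = []
    for x in delta:
        a = abs(x) / (norm + eps)   # normalized magnitude in [0, 1]
        scaled = a * levels
        low = math.floor(scaled)    # lower neighbouring level
        p_up = scaled - low         # probability of rounding up
        mags.append(low + 1 if rng.random() < p_up else low)
        signs.append(-1 if x < 0 else 1)
    # Payload: norm for amplitude reconstruction, indices, and bit-width label.
    return {"norm": norm, "signs": signs, "mags": mags, "bits": bits}


def dequantize(payload: Dict[str, object]) -> List[float]:
    """Server-side reconstruction from the payload (eps is ignored here,
    which introduces a negligible relative error)."""
    levels = 2 ** (payload["bits"] - 1) - 1
    scale = payload["norm"] / levels
    return [s * m * scale for s, m in zip(payload["signs"], payload["mags"])]


if __name__ == "__main__":
    rng = random.Random(0)
    payload = quantize([3.0, -4.0], bits=8, rng=rng)
    print(payload, dequantize(payload))
```

Each reconstructed coordinate differs from the original by at most one level width, `norm / levels`, which is why the error shrinks as the bit width grows.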

[0049] Furthermore, in step S5, norm truncation is performed on the reconstructed vector before robust aggregation within the group.

[0050] Furthermore, step S5 also includes calculating the statistics for each bit-width group $b$:

[0051] $(n_b,\ \mu_b,\ \sigma_b^2)$,

[0052] where $n_b$ is the number of clients in the bit-width group, $\mu_b$ is the mean of the upload update norm statistics of the clients in the group, and $\sigma_b^2$ is an estimate of the variance within the bit-width group;

[0053] Step S6 specifically includes the following process:

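[0054] Based on the statistics $(n_b, \mu_b, \sigma_b^2)$ and the bit width $b$, construct the group confidence score $c_b$:

[0055] $c_b = c(b,\ \sigma_b^2,\ n_b)$,

[0056] where the score uses an adjustable coefficient $\alpha$ and a numerical stability constant $\epsilon$, and $n_b$ is the group size;

[0057] The confidence scores are converted to weights using temperature-based Softmax normalization:

[0058] $\lambda_b = \dfrac{\exp(c_b / T)}{\sum_{b' \in \mathcal{B}} \exp(c_{b'} / T)}$,

[0059] where $T > 0$ is the temperature parameter and $\sum_{b} \lambda_b = 1$;

[0060] The obtained weights are used to linearly fuse the aggregation results of the groups to obtain the global gradient $g^t$ for this round:

[0061] $g^t = \sum_{b \in \mathcal{B}} \lambda_b\, \bar g_b^t$,

[0062] where $\bar g_b^t$ is the intra-group aggregation result obtained in step S5.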
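The inter-group fusion of step S6 can be illustrated with the sketch below. The confidence score used here, `alpha * bits + log(size) - log(variance + eps)`, is a hypothetical choice that merely respects the stated monotonicity (higher bit width, smaller variance, and larger group give higher confidence); the formula in the original filing is not recoverable from this text.

```python
import math
from typing import Dict, List


def group_confidence(bits: int, variance: float, size: int,
                     alpha: float = 0.1, eps: float = 1e-8) -> float:
    """Hypothetical score: grows with bit width and group size,
    shrinks with intra-group variance."""
    return alpha * bits + math.log(size) - math.log(variance + eps)


def softmax_weights(scores: Dict[int, float], temperature: float = 1.0) -> Dict[int, float]:
    """Temperature softmax over group scores (max-shifted for stability)."""
    top = max(scores.values())
    exps = {b: math.exp((c - top) / temperature) for b, c in scores.items()}
    total = sum(exps.values())
    return {b: e / total for b, e in exps.items()}


def fuse_groups(group_results: Dict[int, List[float]],
                weights: Dict[int, float]) -> List[float]:
    """Weighted linear combination of the per-group aggregation results."""
    dim = len(next(iter(group_results.values())))
    fused = [0.0] * dim
    for bits, vec in group_results.items():
        for j in range(dim):
            fused[j] += weights[bits] * vec[j]
    return fused


if __name__ == "__main__":
    stats = {4: (0.5, 3), 8: (0.2, 5), 16: (0.1, 4)}  # bits -> (variance, size)
    scores = {b: group_confidence(b, var, n) for b, (var, n) in stats.items()}
    weights = softmax_weights(scores, temperature=2.0)
    results = {4: [0.2, -0.1], 8: [0.3, -0.2], 16: [0.35, -0.25]}
    print(weights, fuse_groups(results, weights))
```

A lower temperature concentrates the weight on the most confident group, while a higher temperature moves the fusion toward a plain average of the groups.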

[0063] Furthermore, if the previous training round showed oscillation or [unclear], with fluctuations exceeding the threshold, then in this training round the weights of high-bit-width groups are increased, the bit-width upper limit for highly similar clients is raised, and the weights of some clients are decreased. If the previous training round converged smoothly and the budget is tight, then the bit width is reduced, the proportion of high bit widths is shrunk, or the bit width is kept unchanged while the reporting frequency is reduced. Here $g^t$ denotes the global gradient of this round, $K$ the number of local steps, and $\mathcal{B}$ the candidate bit-width set.

[0064] The present invention also provides a federated learning communication optimization device based on local and global dual clipping, comprising:

[0065] A parameter distribution module, used to obtain the global parameters of the current training round and distribute them to each client.

[0066] A local training and clipping module, used on the client side to iterate over local data using clipped stochastic gradient descent, update the local copy of the global parameters according to the local clipping step size, and obtain the cumulative local update vector.

[0067] A similarity assessment and bit-width decision module, used on the client side to calculate the cosine similarity between the global parameters obtained in step S1 and the parameters after the multiple local iterations of step S2, and to make an adaptive bit-width decision based on the cosine similarity, selecting the local upload quantization level or bit width.

[0068] A quantization and upload module, used on the client side to quantize the cumulative local update vector obtained in step S2 according to the local upload quantization level or bit width selected in step S3, generate a quantized payload containing the cumulative update statistics, the sign and magnitude indices, and the bit-width label, and upload it to the server.

[0069] A bit-width grouping and aggregation module, used on the server side to group clients according to the uploaded bit-width labels, reconstruct the update vector of each client, and perform robust intra-group aggregation on the reconstructed update vectors within each bit-width group to obtain the intra-group aggregation results.

[0070] An inter-group weighted fusion module, used on the server side to calculate weights based on the confidence index of each bit-width group and perform inter-group weighted fusion on the intra-group aggregation results to obtain the global gradient.

[0071] A global clipping and update module, used on the server side to calculate the global clipping step size based on the global gradient obtained in step S6 and to perform the global model update.

[0072] A strategy coordination and feedback module, used on the server side to send the global parameters updated in step S7, together with the strategy parameters, to each client and to proceed to the next training round until the stopping condition is met.

[0073] Beneficial effects:

[0074] On the client side, this invention applies adaptive bit-width quantization driven by the cosine similarity between the global parameters from the previous round and the parameter increments after multiple local iterations. On the server side, robust aggregation is performed within each bit-width group, followed by weighted fusion between groups. Key update-direction information is therefore retained at higher precision. At the same time, robust aggregation within groups suppresses abnormal and noisy updates, while weighted fusion between groups further reduces the interference of low-bit-width groups with the global model update direction. Under the same communication budget, this weakens the compounding of direction deviations caused by Non-IID data and multiple local iterations, improves the stability of the global update direction, and reduces the risk of convergence oscillations.

[0075] By using a similarity-adaptive bit-width mechanism to correlate communication bit resources with the direction of local updates, limited bandwidth is prioritized for expressing information that contributes more to the global update direction. Combined with robust aggregation by bit-width grouping on the server side and weighted fusion strategy between groups, a hierarchical aggregation mode is formed that suppresses anomalies within groups and fuses data between groups based on credibility. This reduces the overall communication burden and prevents low-quality updates from polluting the global model in a high-precision form, thus achieving a comprehensive improvement in communication efficiency, model accuracy, and robustness, even when terminal computing power and bandwidth are heterogeneous and data distribution is heterogeneous.

[0076] Compared with existing technologies, this invention combines local similarity-adaptive quantization with grouped robust aggregation, weighted fusion between groups, and global clipping in an end-cloud linkage mechanism. It effectively reduces communication overhead, suppresses aggregation bias and oscillation, and improves convergence stability and accuracy in scenarios with Non-IID data and multiple local steps.

Description of the Drawings

[0077] Figure 1 is a schematic diagram of the end-cloud linkage process of the present invention.

Detailed Implementation

[0078] The technical solutions provided by the present invention will be described in detail below with reference to specific embodiments. It should be understood that the following specific embodiments are only used to illustrate the present invention and are not intended to limit the scope of the present invention.

[0079] Example 1: The federated learning communication optimization method based on local and global dual clipping proposed in this invention limits the local update magnitude through clipped stochastic gradient descent on the edge side to suppress the directional drift caused by Non-IID data. It performs adaptive bit-width quantization based on the cosine similarity between the global parameters of the previous round and the parameters after multiple local iterations, so that local updates that are more consistent with the global direction, and therefore have higher confidence, are uploaded with finer, higher-bit-width encoding, reducing quantization damage to key information at the source.

[0080] On the server side, this invention performs robust aggregation of uploaded updates in groups based on bit-width labels. Within each group, robust operators such as the weighted mean, trimmed mean, or coordinate-wise median are used to suppress abnormal updates. Weighted fusion between groups then follows the principle of "high bit width (high similarity), high weight; low bit width, low weight" to obtain a global gradient with lower bias. Finally, the global clipping step size is used to update the model parameters, offsetting compression errors and further stabilizing convergence. This edge-cloud linkage mechanism constitutes a closed-loop control framework of "local clipping, similarity-adaptive bit width, two-level aggregation, and global clipping," and it scales well: when new clients or heterogeneous devices join, their bit width only needs to be selected dynamically based on their similarity to the global gradient, after which they plug into the existing group fusion process. The benefits of reduced communication overhead, reduced aggregation bias, and improved convergence stability are thus retained without changing the overall training pipeline.

[0081] Referring to Figure 1, this embodiment proposes a federated learning communication optimization method based on local and global dual clipping. Each training round consists of a client-side processing stage followed by a server aggregation stage. In the client-side processing stage, each client receives the global parameters from the previous round, performs multi-step clipped SGD (Stochastic Gradient Descent) updates on its local data, calculates the local cumulative update vector, calculates the cosine similarity between the global parameters from the previous round and the parameters after the local multi-step updates, and makes the bit-width decision. The local update is then quantized without bias at the selected bit width and reported together with a bit-width label. In the server aggregation stage, the server divides the uploaded updates into several bit-width groups based on the bit-width labels and performs robust aggregation within each group to obtain a result per group. Weighted fusion between groups is then performed according to the principle of "high bit width (high similarity), high weight; low bit width, low weight" to obtain the global gradient, the global clipping step size is calculated from its norm, and the global parameter update is completed. Finally, the updated global parameters and optional policy thresholds are sent back to the edge for the next round. Figure 1 illustrates this end-cloud collaborative process of "local clipping, similarity-adaptive bit width, robust aggregation by bit-width grouping, and global clipping update."

[0082] Specifically, the federated learning communication optimization method based on local and global dual clipping proposed in this invention includes the following steps:

[0083] Step S1: Obtain the global model parameters $w^t$ for the current round and distribute them to the participating client nodes, completing the initialization and resource-constraint settings for this round of federated training.

[0084] The server first determines the participating set $\mathcal{S}_t$ (proportional random sampling or availability-based filtering can be used) and synchronizes the hyperparameters and policy parameters for this round, including the learning rate $\eta$, the clipping factor $\gamma$, the local step limit $K_{\max}$, the communication budget $B$, the candidate bit-width set $\mathcal{B}$ (preferred values are {4, 8, 16}), the similarity thresholds $S_L$ and $S_H$, and the hysteresis band $\rho$. To ensure reproducibility and consistency, a random seed, batch size, data shuffling method, and aggregation round identifier are also distributed. After receiving $w^t$, each client resets its local state: it clears the residuals / cache of the previous round (if any), loads or resets the optimizer state, determines its local sample size $n_i$ and available computing resources, and sets its local multi-step count $K_i$ within the range $[1, K_{\max}]$ based on device capability and data volume. After these preparations, the client enters the local training phase (S2), and the server waits for the quantized payloads of this round to be returned by the clients before starting aggregation.
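The per-round settings distributed in step S1 could be carried in a small immutable configuration object, sketched below. All field names, default threshold values, and the validation rules are hypothetical; only the kinds of parameters (learning rate, clipping factor, local step limit, budget, candidate bit widths with preferred values {4, 8, 16}, thresholds, hysteresis, seed, batch size) come from the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RoundConfig:
    """Hypothetical container for the settings the server synchronizes each round."""
    round_id: int
    learning_rate: float
    clip_factor: float
    max_local_steps: int
    comm_budget_bits: int
    bit_widths: Tuple[int, ...] = (4, 8, 16)  # preferred candidate set from the text
    sim_low: float = 0.2                      # illustrative lower threshold
    sim_high: float = 0.6                     # illustrative upper threshold
    hysteresis: float = 0.05                  # illustrative band width
    seed: int = 0
    batch_size: int = 32

    def __post_init__(self) -> None:
        if self.learning_rate <= 0 or self.clip_factor <= 0:
            raise ValueError("learning_rate and clip_factor must be positive")
        if self.max_local_steps < 1:
            raise ValueError("max_local_steps must be at least 1")
        if not self.bit_widths or tuple(sorted(self.bit_widths)) != self.bit_widths:
            raise ValueError("bit_widths must be a non-empty ascending tuple")
        if not -1.0 <= self.sim_low < self.sim_high <= 1.0:
            raise ValueError("thresholds must satisfy -1 <= sim_low < sim_high <= 1")

    def local_steps(self, requested: int) -> int:
        """Clamp a client's preferred local step count into [1, max_local_steps]."""
        return max(1, min(requested, self.max_local_steps))


if __name__ == "__main__":
    cfg = RoundConfig(round_id=1, learning_rate=0.1, clip_factor=1.0,
                      max_local_steps=5, comm_budget_bits=64)
    print(cfg, cfg.local_steps(10))
```

Freezing the dataclass means a client cannot silently alter the synchronized settings mid-round, which supports the reproducibility goal mentioned in the paragraph.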

[0085] Step S2: Each client performs local multi-step training with local clipping, generating a local cumulative update vector and suppressing the directional drift caused by Non-IID data without uploading raw data.

[0086] Specifically, client $i$ receives the global parameters $w^t$ sent in step S1 and uses them as the local initial value $w_{i,0}^t = w^t$, where $t$ denotes the global iteration round. Mini-batch training is performed for a preset number of local steps $K_i$, and a clipping constraint is applied to each update step, as follows:

[0087] 1. Mini-batch gradient calculation: for the $k$-th mini-batch sample $\xi_{i,k}^t$, calculate the gradient

[0088] $g_{i,k}^t = \nabla F_i(w_{i,k}^t;\ \xi_{i,k}^t)$,

[0089] where $F_i$ is the local objective function (such as cross-entropy or mean squared error), and mini-batch samples are drawn according to the batch size and data shuffling method agreed upon in step S1.

[0090] 2. Clipping step size calculation (local clipped SGD): set the clipping step size for the current step based on the gradient norm,

[0091] $h_{i,k}^t = \min\{\eta,\ \gamma\eta / \|g_{i,k}^t\|\}$,

[0092] where $\eta$ is the learning rate and $\gamma$ is the clipping factor, which limits the magnitude of a single-step update and suppresses the disturbance that abnormally large gradients cause to the update direction.

[0093] 3. Parameter update and accumulation: update the local parameters according to the clipping step size, and accumulate the unquantized local update:

[0094] $w_{i,k+1}^t = w_{i,k}^t - h_{i,k}^t\, g_{i,k}^t$, $\Delta_{i,k+1}^t = \Delta_{i,k}^t + h_{i,k}^t\, g_{i,k}^t$,

[0095] where the accumulator is initialized as $\Delta_{i,0}^t = 0$.

[0096] 4. Output after the multi-step iteration: after the $K_i$ updates are complete, the locally updated parameters $w_{i,K_i}^t$ and the cumulative local update vector $\Delta_i^t$ are obtained. For the similarity evaluation in step S3, the client also calculates or retains the parameter increment $\delta_i^t = w_{i,K_i}^t - w^t$.

[0097] Step S3: The client performs similarity evaluation and bit-width decision-making to limit uplink communication bits while preserving the fidelity of key information. Specifically, this includes:

[0098] Based on the parameters after the local multi-step update obtained in step S2 and the global parameters, calculate the cosine similarity:

[0099] $s_i^t = \dfrac{\langle w^t,\ w^t + \delta_i^t \rangle}{\|w^t\|\,\|w^t + \delta_i^t\|}$,

[0100] where $w^t$ are the global parameters and $\delta_i^t$ is the local parameter increment. The similarity $s_i^t$ reflects the consistency between the local update direction and the global update direction; the larger the value, the closer the local update is to the global update direction.

[0101] The similarity $s_i^t$ is input to the monotonic mapping function to obtain the local upload quantization level / bit width:

[0102] $b_i^t = \phi(s_i^t)$,

[0103] where $\phi$ is a piecewise monotonic function that follows the principle "higher similarity, larger bit width" to improve the numerical fidelity of high-confidence updates, and $\mathcal{B}$, the range of $\phi$, is the candidate set of quantization levels / bit widths.

[0104] Upper and lower thresholds $S_H$, $S_L$ and a hysteresis band $\rho$ are set to avoid frequent level switching near the boundaries: when $s_i^t$ rises above the upper threshold, the client moves up to a higher bit-width level; when it falls below the lower threshold, the client drops to a lower level; when it falls in the middle zone, the level is maintained or changed gradually. If the bit width of the previous round was $b_i^{t-1}$, the client switches to a new level only when $s_i^t$ crosses a hysteresis boundary (for example, with $S_L^-$, $S_L^+$, $S_H^-$, $S_H^+$ set as the boundaries for shifting between levels). Meanwhile, budget projection is applied to the bit-width allocation under the communication budget $B$, so that $\sum_i b_i^t \le B$, where $b_i^t$ is the upload bit width that client $i$ obtains from its similarity $s_i^t$.

[0105] The determined bit width $b_i^t$ and the local update statistics accumulated in step S2 (e.g. $\|\Delta_i^t\|$) are retained and passed to step S4 for unbiased quantization and payload reporting, which generates the payload $P_i^t$. To support the subsequent bit-width grouping and aggregation, $b_i^t$ serves as the bit-width label.
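The budget projection mentioned in step S3 is not specified further in the text. One possible rule, sketched below as an assumption, is a greedy downgrade: while the total of the selected bit widths exceeds the budget, the least similar client that is not yet at the lowest level is moved down one level. The budget is assumed here to be expressed in the same unit as the bit widths (bits per coordinate, summed over clients).

```python
from typing import Dict, Sequence


def project_to_budget(
    bits: Dict[str, int],
    sims: Dict[str, float],
    budget: int,
    candidates: Sequence[int] = (4, 8, 16),
) -> Dict[str, int]:
    """Greedy projection (assumed rule): downgrade the least similar
    clients first until the summed bit width fits the budget."""
    levels = sorted(candidates)
    out = dict(bits)
    order = sorted(out, key=lambda client: sims[client])  # lowest similarity first
    while sum(out.values()) > budget:
        for client in order:
            idx = levels.index(out[client])
            if idx > 0:
                out[client] = levels[idx - 1]
                break
        else:
            break  # every client is at the lowest level; budget cannot be met
    return out


if __name__ == "__main__":
    chosen = {"a": 16, "b": 16, "c": 8}
    similarity = {"a": 0.9, "b": 0.5, "c": 0.1}
    print(project_to_budget(chosen, similarity, budget=30))
```

Downgrading by similarity keeps the finest encoding for the updates judged most consistent with the global direction, which matches the "higher similarity, larger bit width" principle of the mapping.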

[0106] Step S4: The client performs unbiased quantization and payload reporting, uploading the locally accumulated update vector to the server in low-bit form while keeping the error controllable. Specifically, this includes:

[0107] Calculate the L2 norm of the cumulative vector, $r_i^t = \|\Delta_i^t\|_2$, and normalize the vector to $v_i^t = \Delta_i^t / (r_i^t + \epsilon)$ (where $\epsilon$ is a numerical stability constant), so that each coordinate falls within the dynamic range $[-1, 1]$, which facilitates symmetric quantization at bit width $b_i^t$.

[0108] Let the number of signed uniform quantization levels corresponding to this bit width be $L$. For each coordinate $j$, write $a_j = |v_{i,j}^t|$ and $\ell_j = \lfloor a_j L \rfloor$. Here $j$ is the vector coordinate index, i.e. it selects the $j$-th component of the cumulative local update vector $\Delta_i^t$ or of the normalized vector $v_i^t$, and $\ell_j$ is the number of the quantization interval into which the coordinate magnitude $a_j$ falls, which is also the lower-bound index among the quantization levels. The value $a_j L$ is rounded up to $\ell_j + 1$ with probability $p_j = a_j L - \ell_j$ or down to $\ell_j$ otherwise, and the sign $\operatorname{sign}(v_{i,j}^t)$ is retained. Quantization yields two kinds of indices:

[0109] $q_i^t = \{(\operatorname{sign}(v_{i,j}^t),\ m_j)\}_{j=1}^{d}$, $m_j \in \{\ell_j,\ \ell_j + 1\}$,

[0110] where $d$ is the dimension of the model parameters.

[0111] The quantization operator $Q_b$ satisfies both unbiasedness and bounded variance:

[0112] $\mathbb{E}[Q_b(\Delta)] = \Delta$, $\mathbb{E}\|Q_b(\Delta) - \Delta\|^2 \le \omega_b \|\Delta\|^2$,

[0113] where $\omega_b$ is the quantization error coefficient, which decreases monotonically as the bit width $b$ increases; it characterizes the upper bound on the mean squared error, relative to the energy of the original vector, that is introduced when client $i$ quantizes the local update vector $\Delta_i^t$ with bit width $b$. The quantized payload is then generated and packaged:

[0114] $P_i^t = (r_i^t,\ q_i^t,\ b_i^t)$,

[0115] where $r_i^t$ is used for server-side amplitude reconstruction, $q_i^t$ is the sign and magnitude index, and $b_i^t$ is the upload bit width that client $i$ obtained from the similarity decision.

[0116] The client sends $P_i^t$ to the server over the uplink; the server reconstructs the update at the receiving end according to the agreed protocol:

[0117] $\hat\Delta_i^t = D_{b_i^t}(r_i^t,\ q_i^t)$,

[0118] where $D_b$ is the dequantization mapping corresponding to the quantization above. The reconstructed vector $\hat\Delta_i^t$ and its bit-width label $b_i^t$ are used for the bit-width grouping and intra-group robust aggregation in step S5.
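The unbiasedness and bounded-variance property claimed for the quantizer can be checked directly for stochastic rounding. The derivation below is a sketch under stated assumptions: a normalized magnitude $a \in [0, 1]$ is rounded on a grid of $L$ levels to $m \in \{\ell, \ell + 1\}$ with $\ell = \lfloor aL \rfloor$ and round-up probability $p = aL - \ell$, the server reconstructs each coordinate as $r \cdot \operatorname{sign} \cdot m / L$ with $r = \|\Delta\|_2$, and the stability constant $\epsilon$ is ignored. The symbols $a$, $L$, $\ell$, $p$, $m$, $r$ are notation chosen here for illustration.

```latex
\begin{align*}
\mathbb{E}[m] &= (\ell + 1)\,p + \ell\,(1 - p) = \ell + p = aL
  &&\Rightarrow\quad \mathbb{E}\!\left[\tfrac{m}{L}\right] = a, \\
\operatorname{Var}\!\left[\tfrac{m}{L}\right] &= \frac{p\,(1 - p)}{L^{2}} \le \frac{1}{4L^{2}}, \\
\mathbb{E}\,\bigl\|Q(\Delta) - \Delta\bigr\|^{2}
  &= r^{2} \sum_{j=1}^{d} \frac{p_j\,(1 - p_j)}{L^{2}}
  \le \frac{d}{4L^{2}}\,\|\Delta\|^{2}.
\end{align*}
```

Under these assumptions the error coefficient can be taken as $d / (4L^2)$, which shrinks as the number of levels $L$ grows with the bit width, consistent with the monotonic decrease stated above.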

[0119] In step S5, the server performs bit-width grouping and robust aggregation within the group, using the bit-width labels reported by the end side along with the load to decouple the effects of quantization noise and heterogeneity, and obtain robust aggregation results for each bit-width group.

[0120] The server receives the payload uploaded by each client and partitions the client index set by bit-width label into $\mathcal{G}_b^t = \{ i : b_i^t = b \}$, where $b \in \mathcal{B}$ is a specific value from the candidate bit-width set $\mathcal{B}$, that is, the "bit-width level" the server uses when grouping. The update vector $\hat{\Delta}_i^t$ is reconstructed for each client. Within each bit-width group, robust aggregation over $\{\tilde{\Delta}_i^t\}_{i \in \mathcal{G}_b^t}$ yields the intra-group aggregation result $g_b^t$, where $\tilde{\Delta}_i^t$ denotes the reconstructed vector after norm truncation. The aggregator can be the weighted mean, the trimmed mean, or the median, and norm truncation can be applied to the reconstructed vectors before intra-group aggregation to suppress the impact of anomalous updates:

[0121] $\tilde{\Delta}_i^t = \hat{\Delta}_i^t \cdot \min\left(1,\ \dfrac{\tau_b}{\|\hat{\Delta}_i^t\|_2}\right)$,

[0122] where $\tau_b$ is the truncation threshold for this bit-width group.
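
The grouping, norm truncation, and intra-group robust aggregation of step S5 could be sketched as follows. This is an illustrative sketch only: the names `clip_norm`, `trimmed_mean`, and `aggregate_by_bitwidth` are hypothetical, the coordinate-wise trimmed mean is just one of the aggregators the text permits, and the trim fraction and per-group thresholds are assumed inputs.

```python
import math
from collections import defaultdict


def clip_norm(vec, tau):
    """Scale vec down so that its 2-norm does not exceed tau."""
    norm = math.sqrt(sum(x * x for x in vec))
    factor = min(1.0, tau / norm) if norm > 0 else 1.0
    return [x * factor for x in vec]


def trimmed_mean(vectors, trim=0.2):
    """Coordinate-wise trimmed mean: drop the k smallest and k largest values."""
    n = len(vectors)
    k = int(trim * n)
    if 2 * k >= n:                 # always keep at least one value per coordinate
        k = (n - 1) // 2
    result = []
    for coord in zip(*vectors):
        kept = sorted(coord)[k:n - k]
        result.append(sum(kept) / len(kept))
    return result


def aggregate_by_bitwidth(uploads, tau_by_bits, trim=0.2):
    """uploads: list of (bit_width_label, reconstructed_vector) pairs."""
    groups = defaultdict(list)
    for bits, vec in uploads:
        groups[bits].append(clip_norm(vec, tau_by_bits[bits]))
    return {bits: trimmed_mean(vecs, trim) for bits, vecs in groups.items()}


if __name__ == "__main__":
    uploads = [(4, [0.9, 0.9]), (4, [1.0, 1.0]), (4, [1.1, 1.1]),
               (4, [1.0, 1.0]), (4, [100.0, 100.0]), (8, [2.0, 0.0])]
    print(aggregate_by_bitwidth(uploads, {4: 5.0, 8: 5.0}))
```

In the usage example, the anomalous 4-bit update is first truncated to norm 5 and then discarded by the trimming, so it has no effect on that group's result.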

[0123] For the subsequent weighted fusion between groups, the following statistics are calculated for each bit-width group:

[0124] $S_b = \left( n_b,\ \mu_b,\ \sigma_b^2 \right)$,

[0125] where $n_b$ is the number of clients in the corresponding bit-width group, that is, the group size; $\mu_b$ is the average of the upload update norms of the clients within the group; and $\sigma_b^2$ is an estimate of the variance within the bit-width group.
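
A small sketch of how these per-group statistics might be computed is given below. The function name `group_statistics` is hypothetical, and taking the variance over the update norms (rather than over the vectors themselves) is an assumption, since the text does not fix the estimator.

```python
import math


def group_statistics(vectors):
    """Return (group size, mean update norm, variance of update norms)."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(norms)
    mean = sum(norms) / n
    # Assumption: population variance of the norms as the intra-group variance estimate.
    variance = sum((r - mean) ** 2 for r in norms) / n
    return n, mean, variance


if __name__ == "__main__":
    print(group_statistics([[3.0, 4.0], [0.0, 5.0], [6.0, 8.0]]))
```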

[0126] Step S6: The server performs weighted inter-group fusion, combining the intra-group aggregation results according to the confidence of each bit-width group to obtain the global gradient for this round. Based on the statistics from step S5 and the bit width $b$, a group confidence score $c_b$ is constructed:

[0127] ,

[0128] where the weighting coefficients are adjustable, $\epsilon$ is the numerical stability constant, and $n_b$ is the group size. This form reflects the principle that a higher bit width, a smaller intra-group variance, and a larger group size each correspond to higher confidence. The scores are converted to weights using temperature-based Softmax normalization:

[0129] $w_b = \dfrac{\exp\left(c_b / T\right)}{\sum_{b'} \exp\left(c_{b'} / T\right)}$,

[0130] where $c_b$ is the confidence score of bit-width group $b$ and the temperature parameter $T > 0$ controls the sharpness of the weight distribution. The weight $w_b$ increases monotonically with bit width (and hence similarity) and can also take intra-group variance and group size into account, so that high-bit-width (high-similarity) groups receive high weight and low-bit-width groups receive low weight in the inter-group aggregation. The weights are then used to fuse the intra-group results $g_b^t$ linearly into the global gradient for this round:

[0131] $g^t = \sum_{b} w_b\, g_b^t$.
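
The scoring, Softmax weighting, and linear fusion of step S6 could look like the sketch below. The exact confidence formula is not reproduced in this text, so `confidence_score` is only one assumed form that respects the stated monotonicity (higher bit width, lower variance, and larger group size all raise the score); the coefficient names `w_bits`, `w_var`, and `w_size` are hypothetical.

```python
import math


def confidence_score(bits, variance, size, w_bits=1.0, w_var=1.0, w_size=1.0, eps=1e-8):
    """Assumed score: grows with bit width and group size, shrinks with variance."""
    return w_bits * bits - w_var * math.log(variance + eps) + w_size * math.log(size)


def softmax_weights(scores, temperature=1.0):
    """Temperature-based Softmax over a {bit_width: score} mapping."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {b: math.exp((s - top) / temperature) for b, s in scores.items()}
    total = sum(exps.values())
    return {b: e / total for b, e in exps.items()}


def fuse(group_results, weights):
    """Linear fusion of intra-group results into one global gradient."""
    dim = len(next(iter(group_results.values())))
    return [sum(weights[b] * group_results[b][j] for b in group_results)
            for j in range(dim)]


if __name__ == "__main__":
    scores = {2: confidence_score(2, 0.5, 3), 8: confidence_score(8, 0.1, 10)}
    weights = softmax_weights(scores, temperature=2.0)
    print(weights, fuse({2: [1.0, 0.0], 8: [0.0, 1.0]}, weights))
```

A lower temperature concentrates nearly all weight on the most trusted group, while a higher temperature moves the fusion toward a plain average across groups.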

[0132] Step S7: The server performs global clipping and a parameter update to suppress aggregation oscillations caused by quantization compression and heterogeneity, and generates the global parameters to be distributed in the next round. Based on the global gradient $g^t$ obtained in step S6, the step size for the current round is calculated according to the clipping rule:

[0133] $h^t = \min\left(\eta,\ \dfrac{\gamma \eta}{\|g^t\|_2}\right)$,

[0134] where $\eta$ is the learning rate and $\gamma$ is the clipping factor. The global parameters are then updated once using the clipped step size:

[0135] $\theta^{t+1} = \theta^t - h^t g^t$,

[0136] which suppresses the impact of non-independent and identically distributed (Non-IID) data and quantization errors on convergence. If necessary, this update can be decoupled from the momentum or adaptive optimizer state at the implementation level.
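
The global clipping step of S7 could be sketched as below. The step-size rule used here, the smaller of the learning rate and the learning rate times the clipping factor divided by the gradient norm, is the common clipped-gradient-descent form and is an assumption about the rule the text refers to; the names `clipped_step` and `global_update` are hypothetical.

```python
import math


def clipped_step(grad, lr, clip):
    """Step size min(lr, clip * lr / ||grad||); falls back to lr for a zero gradient."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0:
        return lr
    return min(lr, clip * lr / norm)


def global_update(params, grad, lr, clip):
    """One clipped update of the global parameters."""
    h = clipped_step(grad, lr, clip)
    return [p - h * g for p, g in zip(params, grad)]


if __name__ == "__main__":
    # Gradient norm is 5, so the step shrinks from 0.1 to about 0.02.
    print(global_update([1.0, 1.0], [3.0, 4.0], lr=0.1, clip=1.0))
```

With this rule the displacement of the parameters in one round never exceeds the learning rate times the clipping factor, whatever the magnitude of the fused gradient.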

[0137] Step S8: The server performs strategy coordination and enters the next round of training. This includes distributing the results of the previous round and adaptively tuning the control parameters for the current round, thereby starting a new round of end-cloud collaboration. The server distributes to each client participating in the aggregation: the updated global parameters, the communication budget for this round, the candidate bit-width set, the similarity thresholds and hysteresis interval, the inter-group fusion temperature and coefficients, and the suggested number of local steps for each client.

[0138] After receiving these, the client resets its optimizer state and cache, applies the parameters issued by the server, configures the bit-width hysteresis rule of S3, executes the bit-width budget projection strategy according to the communication budget, and selects its actual number of local steps within the permitted interval.

[0139] If the previous round showed oscillation, or the fluctuation of the global gradient exceeded the threshold, the weight of high-bit-width groups is increased, the upper bit-width limit for highly similar clients is moderately raised, and the number of local steps is reduced for some clients in this round. If the previous round converged smoothly and the budget is tight, the bit width can be slightly reduced, the proportion of high bit widths in the candidate set can be reduced, or the bit width can be kept unchanged while the reporting frequency is lowered.

[0140] The process terminates when a preset convergence or performance threshold is met or the maximum number of training rounds is reached. Otherwise, the updated policy parameters serve as the initial conditions for a new round, and execution returns to S1.

[0141] Through the combined effect of steps S1 to S8, client-side adaptive bit-width quantization based on cosine similarity gives the uploaded information a precision that is tiered according to contribution. The server first performs robust aggregation within each bit-width group to eliminate abnormal updates, and then uses weighted inter-group fusion to suppress the excessive influence of low-bit-width groups on the global direction. The resulting global update combines low quantization distortion with strong resistance to anomalies and heterogeneity. Compared with schemes that use only uniform bit-width quantization or only a single aggregation rule, this invention can achieve a more stable global update direction and better cross-terminal robustness at the same or lower communication overhead.

[0142] Example 2: Based on the same technical concept as the method example, this example provides a federated learning communication optimization device based on local and global dual clipping, including:

[0143] The parameter distribution module is used to obtain the global model parameters for the current round and distribute them to each client node, together with the learning rate, the clipping factor, the local step limit, the communication budget, the candidate bit-width set, and the similarity thresholds and hysteresis interval.

[0144] The local training and clipping module is used on the client side to train on local data: at each local iteration it calculates the mini-batch gradient, performs clipped stochastic gradient descent according to the clipping step size to limit the magnitude of each single-step update, and accumulates the local update vector.

[0145] The similarity assessment and bit-width decision module is used on the client side to calculate the cosine similarity from the previous round's global parameters and the parameter increment after multiple local iterations, and to select the quantization level or bit width for the local upload through a monotonic mapping. This module supports configurable upper and lower thresholds and a hysteresis interval to avoid frequent level switching, and performs budget projection of the bit-width allocation under the communication budget.

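[0146] The quantization and upload module is used to perform unbiased random quantization on the local update vector $\Delta_i^t$ and generate the quantized payload. The quantization operator $Q_b(\cdot)$ satisfies:

[0147] $\mathbb{E}\left[Q_b(\Delta_i^t)\right] = \Delta_i^t, \qquad \mathbb{E}\left\|Q_b(\Delta_i^t) - \Delta_i^t\right\|_2^2 \le \omega_b \left\|\Delta_i^t\right\|_2^2$,

[0148] and the coefficient $\omega_b$ decreases monotonically as the bit width $b$ increases. This module transmits the payload to the server over the uplink.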

[0149] The bit-width grouping and aggregation module is used on the server side to divide the clients into several bit-width groups based on their bit-width labels, and to perform robust aggregation on the reconstructed update vectors of each group to obtain the intra-group results. The aggregator can be one of the weighted mean, the trimmed mean, or the median; optionally, norm truncation is first applied to the vectors within the group to suppress anomalous updates.

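[0150] The inter-group weighted fusion module is used to calculate weights $w_b$ from confidence indicators such as bit width, intra-group variance, and group size, and to fuse the results of each group with these weights to obtain the global gradient $g^t$. The preferred method is temperature-based Softmax normalization:

[0151] $w_b = \dfrac{\exp\left(c_b / T\right)}{\sum_{b'} \exp\left(c_{b'} / T\right)}$,

[0152] where $c_b$ is the confidence score of bit-width group $b$ and $T > 0$ is the temperature parameter.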

[0153] The global clipping and update module is used to calculate the global clipping step size from the global gradient and to update the global model parameters according to that step size.

[0154] The strategy coordination and feedback module is used to send the updated global parameters, the strategy parameters, and the suggested local step counts back to the end devices to begin the next round of training, and to tune the parameters adaptively based on the stability indicators from the previous round.
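
To make the similarity assessment and bit-width decision module listed above more concrete, the following sketch computes a cosine similarity and moves the bit width one level at a time, using a hysteresis band around the thresholds. It is an assumed rule for illustration only: the names `cosine_similarity` and `next_bit_width`, the one-level-per-round stepping, and the way the band `delta` widens both thresholds are not taken from the source.

```python
import math


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def next_bit_width(similarity, current, candidates, low, high, delta):
    """Move one level up or down only when the similarity clears the hysteresis band."""
    levels = sorted(candidates)
    i = levels.index(current)
    if similarity >= high + delta and i + 1 < len(levels):
        return levels[i + 1]      # clearly similar: upgrade one level
    if similarity <= low - delta and i > 0:
        return levels[i - 1]      # clearly dissimilar: downgrade one level
    return current                # inside the band: keep the current level


if __name__ == "__main__":
    sim = cosine_similarity([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
    print(sim, next_bit_width(sim, 4, [2, 4, 8], low=0.3, high=0.8, delta=0.05))
```

Keeping the current level whenever the similarity sits inside the band is what prevents the bit width from flipping back and forth when the similarity hovers near a threshold.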

[0155] It should be understood that the federated learning communication optimization device in the embodiments of the present invention is implemented by software and can realize all the technical solutions in the method embodiments (S1 to S8). For the parts not described in detail, please refer to the relevant descriptions of the method embodiments, which will not be repeated here.

[0156] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0157] This invention is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and / or block of the flowchart illustrations and / or block diagrams, and combinations of flows and / or blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create a device for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0158] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0159] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and / or one or more blocks of the block diagram.

[0160] It should be noted that the above content merely illustrates the technical concept of the present invention and should not be construed as limiting the scope of protection of the present invention. For those skilled in the art, various improvements and modifications can be made without departing from the principle of the present invention, and all such improvements and modifications fall within the scope of protection of the claims of the present invention.

Claims

1. A federated learning communication optimization method based on local and global dual clipping, characterized in that it includes the following steps:
Step S1: Obtain the global parameters for the current training round and distribute them to each client;
Step S2: The client iteratively updates on local data using clipped stochastic gradient descent, calculates the local clipping step size and updates the global parameters accordingly, and obtains the cumulative local update vector;
Step S3: The client calculates the cosine similarity between the global parameters obtained in step S1 and the global parameters after the multiple iterations of step S2, and makes an adaptive bit-width decision based on the cosine similarity, selecting the quantization level or bit width for the local upload;
Step S4: The client quantizes the cumulative local update vector obtained in step S2 according to the quantization level or bit width selected in step S3, generates a quantized payload containing the cumulative update statistic, the sign and amplitude index, and the bit-width label, and uploads it to the server;
Step S5: The server groups the clients according to the bit-width labels they uploaded, reconstructs the update vector for each client, and performs intra-group robust aggregation on the reconstructed update vectors within each bit-width group to obtain the intra-group aggregation results;
Step S6: The server calculates weights based on the confidence indicators of each bit-width group and performs inter-group weighted fusion on the intra-group aggregation results to obtain the global gradient;
Step S7: The server calculates the global clipping step size based on the global gradient obtained in step S6 and performs the global model update;
Step S8: The server sends the global parameters updated in step S7, together with the strategy parameters, to each client and proceeds to the next training round until the stopping condition is met.

2. The federated learning communication optimization method based on local and global dual clipping as described in claim 1, characterized in that step S2 specifically includes the following process:
Client $i$ receives the global parameters $\theta^t$ sent in step S1, uses them as the initial value for local updates, performs mini-batch training for a preset number of local steps $K$, and applies the clipping constraint at each update step. The training process includes:
(1) For the $k$-th mini-batch sample $\xi_{i,k}^t$, calculate the gradient $g_{i,k}^t$: $g_{i,k}^t = \nabla F_i\left(\theta_{i,k}^t;\ \xi_{i,k}^t\right)$, where $t$ is the global iteration round, $F_i$ is the local objective function, and $\theta_{i,k}^t$ are the local parameters;
(2) Set the clipping step size $h_{i,k}^t$ for the current step based on the gradient norm: $h_{i,k}^t = \min\left(\eta,\ \gamma\eta / \|g_{i,k}^t\|_2\right)$, where $\eta$ is the learning rate and $\gamma$ is the clipping factor;
(3) Update the local copy of the global parameters according to the clipping step size, $\theta_{i,k+1}^t = \theta_{i,k}^t - h_{i,k}^t g_{i,k}^t$, and accumulate the local update vector; after the iterations, the updated parameters and the cumulative local update vector $\Delta_i^t$ are obtained.

3. The federated learning communication optimization method based on local and global dual clipping as described in claim 1, characterized in that in step S3 the cosine similarity is calculated from the global parameters of step S1 and the global parameter increment; the adaptive bit-width decision based on cosine similarity includes: inputting the similarity into a monotonic mapping function to obtain the quantization level or bit width for the local upload, where the mapping is a piecewise monotonic function whose values lie in the candidate set of quantization levels or bit widths.

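4. The federated learning communication optimization method based on local and global dual clipping according to claim 1 or 3, characterized in that in step S3 upper and lower thresholds for the cosine similarity are set together with a hysteresis band: when the upgrade condition defined by the upper threshold and the hysteresis band is met, the client switches to a higher bit-width level; when the downgrade condition defined by the lower threshold and the hysteresis band is met, the client switches to a lower bit-width level; and when the similarity falls in the middle zone, the level is maintained or changed gradually.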

5. The federated learning communication optimization method based on local and global dual clipping according to claim 1 or 3, characterized in that step S3 performs budget projection on the bit-width allocation under the communication budget constraint so that the allocation satisfies the budget, where $b_i^t$ is the upload bit width that client $i$ obtains from the similarity-based decision.

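6. The federated learning communication optimization method based on local and global dual clipping according to claim 1, characterized in that step S4 specifically includes the following process:
Calculate the 2-norm $s_i^t = \|\Delta_i^t\|_2$ of the cumulative local update vector and normalize the vector to $v_i^t = \Delta_i^t / (s_i^t + \epsilon)$, where $\epsilon$ is the numerical stability constant;
For bit width $b$, determine the corresponding number of signed uniform quantization levels; for each coordinate $j$, stochastically round the scaled magnitude of $v_{i,j}^t$ to one of its two adjacent quantization levels with the corresponding probabilities and retain the sign, which generates the sign and amplitude index $q_i^t$ over all $d$ coordinates, where $d$ is the dimension of the model parameters;
The quantization operator $Q_b(\cdot)$ satisfies both unbiasedness and bounded variance: $\mathbb{E}\left[Q_b(\Delta_i^t)\right] = \Delta_i^t$ and $\mathbb{E}\left\|Q_b(\Delta_i^t) - \Delta_i^t\right\|_2^2 \le \omega_b \left\|\Delta_i^t\right\|_2^2$, where the quantization error coefficient $\omega_b$ decreases monotonically as the bit width $b$ increases;
Generate and package the quantized payload $P_i^t$: $P_i^t = \left\{ s_i^t,\ q_i^t,\ b_i^t \right\}$, where $q_i^t$ is the sign and magnitude index and $b_i^t$ is the upload bit width that client $i$ selected through the similarity-based decision.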

7. The federated learning communication optimization method based on local and global dual clipping according to claim 1, characterized in that in step S5 norm truncation is performed on the reconstructed vectors before the intra-group robust aggregation.

8. The federated learning communication optimization method based on local and global dual clipping according to claim 1, characterized in that step S5 further includes calculating the statistics $S_b$ for each bit-width group: $S_b = \left( n_b,\ \mu_b,\ \sigma_b^2 \right)$, where $n_b$ is the number of clients in the corresponding bit-width group, $\mu_b$ is the average of the upload update norms of the clients within the group, and $\sigma_b^2$ is an estimate of the variance within the bit-width group;
Step S6 specifically includes the following process:
Based on the statistics $S_b$ and the bit width $b$, construct a group confidence score $c_b$ using adjustable coefficients, the numerical stability constant $\epsilon$, and the group size $n_b$;
Convert the confidence scores to weights $w_b$ using temperature-based Softmax normalization: $w_b = \exp\left(c_b / T\right) / \sum_{b'} \exp\left(c_{b'} / T\right)$, where $T > 0$ is the temperature parameter;
Use the obtained weights to fuse the intra-group aggregation results of each group linearly into the global gradient $g^t$ for this round: $g^t = \sum_{b} w_b\, g_b^t$, where $g_b^t$ is the intra-group aggregation result obtained in step S5.

9. The federated learning communication optimization method based on local and global dual clipping according to claim 1, characterized in that if the previous round showed oscillation, or the fluctuation of $g^t$ exceeded the threshold, the weights of high-bit-width groups are increased, the bit-width upper limit for highly similar clients is raised, and $K$ is reduced for some clients in this training round; if the previous training round converged smoothly and the budget is tight, the bit width is reduced, or the proportion of high bit widths in $\mathcal{B}$ is reduced, or the bit width is kept unchanged while the reporting frequency is lowered; where $g^t$ is the global gradient in this round, $K$ is the number of local steps, and $\mathcal{B}$ is the candidate bit-width set.

10. A federated learning communication optimization device based on local and global dual clipping, characterized in that it includes:
The parameter distribution module, used to obtain the global parameters of the current training round and distribute them to each client;
The local training and clipping module, used on the client side to iteratively update on local data using clipped stochastic gradient descent, calculate the local clipping step size and update the global parameters accordingly, and obtain the cumulative local update vector;
The similarity assessment and bit-width decision module, used on the client side to calculate the cosine similarity between the global parameters obtained in step S1 and the global parameters after the multiple iterations of step S2, and to make an adaptive bit-width decision based on the cosine similarity, selecting the quantization level or bit width for the local upload;
The quantization and upload module, used on the client side to quantize the cumulative local update vector obtained in step S2 according to the quantization level or bit width selected in step S3, generate a quantized payload containing the cumulative update statistic, the sign and amplitude index, and the bit-width label, and upload it to the server side;
The bit-width grouping and aggregation module, used on the server side to group the clients according to the bit-width labels they uploaded, reconstruct the update vector for each client, and perform intra-group robust aggregation on the reconstructed update vectors within each bit-width group to obtain the intra-group aggregation results;
The inter-group weighted fusion module, used on the server side to calculate weights based on the confidence indicators of each bit-width group and perform inter-group weighted fusion on the intra-group aggregation results to obtain the global gradient;
The global clipping and update module, used on the server side to calculate the global clipping step size based on the global gradient obtained in step S6 and to perform the global model update;
The strategy coordination and feedback module, used on the server side to send the global parameters updated in step S7, together with the strategy parameters, to each client and proceed to the next training round until the stopping condition is met.