Federated learning model compression method, user end, server and system
By selecting appropriate compression strategies based on application performance metrics, federated learning models are compressed and aggregated in a personalized manner, solving the problem of unsatisfactory model compression results in existing technologies. This enables the deployment of federated learning models with high accuracy and low cost, thereby improving the user experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING UNIV OF POSTS & TELECOMM
- Filing Date
- 2023-05-10
- Publication Date
- 2026-06-26
AI Technical Summary
In existing technologies, all application models running on the client use the same compression method, resulting in unsatisfactory model compression effects, low accuracy, and negatively impacting user experience.
Based on the performance metrics of the target application, a suitable compression strategy is selected to compress the local model, and the model is uploaded to the server for model aggregation processing via compression identifier to obtain global model parameters and update the local model.
By employing personalized compression strategies, communication costs are reduced while maximizing the accuracy of federated learning model parameters, significantly improving the user experience.
Figure CN116610950B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of communication technology, and in particular to a federated learning model compression method, a client, a server, and a system.
Background Technology
[0002] With the rapid development of digital technology, users use a wide variety of applications such as video players, audio players, readers, and social software. During the use of these applications, various types of data urgently need to be uploaded or downloaded.
[0003] Federated learning is a learning approach where data is distributed across different entities. Clients train local models using local data and then transmit their model parameters to a central server. The central server aggregates the local model parameters from all clients and distributes the updated parameters back to the clients, enabling each client to train its local model using these updated parameters. However, due to limited storage resources on both the client and central server sides, and the unstable network connection speeds caused by a large number of clients participating in federated learning, local models may not be fully transmitted to the central server for global aggregation. Therefore, compressing the transmitted local models improves communication efficiency between the client and central server.
[0004] However, the data types, data sizes, and data processing requirements of various applications differ. In existing technologies, all application models running on the client are usually compressed using the same method, which results in less than ideal model compression and relatively low accuracy, thus affecting the user experience.
Summary of the Invention
[0005] This application provides a federated learning model compression method, client, server, and system to at least solve the problem in the prior art that the model compression effect is not ideal and the accuracy is relatively low when all application models running on the client are compressed using the same compression method.
[0006] To solve the above-mentioned technical problems, this application is implemented as follows:
[0007] In a first aspect, embodiments of this application provide a federated learning model compression method, applied to a user terminal, including:
[0008] Obtain at least one local model corresponding to the target application;
[0009] Based on the performance metrics of the target application, determine the compression strategy for the local model;
[0010] The local model is compressed based on the compression strategy to obtain a local compressed model and a compression identifier corresponding to the local compressed model; the compression identifier is used to indicate the compression strategy corresponding to the local compressed model.
[0011] The local compressed model and compression identifier are uploaded to the server for model aggregation processing to obtain the global model parameters issued by the server.
[0012] The local model is updated based on the global model parameters.
[0013] In one possible implementation, obtaining the local model corresponding to at least one target application includes:
[0014] Obtain the local dataset of at least one target application;
[0015] A preset initialization model is trained based on the local dataset to obtain the local model corresponding to the target application.
[0016] In one possible implementation, the performance metrics include at least one of the following: performance metrics for characterizing real-time performance, performance metrics for characterizing accuracy, and performance metrics for characterizing fault tolerance.
[0017] In one possible implementation, determining the compression strategy of the local model based on the performance metrics of the target application includes:
[0018] Obtain multiple preset compression strategies;
[0019] Based on the performance metrics of the target application, one of the multiple compression strategies is selected as the compression strategy for the local model.
[0020] In a second aspect, embodiments of this application provide a federated learning model compression method, applied to a server, comprising:
[0021] Obtain the local compression model sent by the user terminal and the compression identifier corresponding to the local compression model; the compression identifier is used to indicate the compression strategy corresponding to the local compression model;
[0022] The local compression model is aggregated based on the compression identifier to obtain global model parameters;
[0023] The global model parameters are sent to the user terminal.
[0024] In one possible implementation, the aggregation process of the local compression model based on the compression identifier to obtain global model parameters includes:
[0025] The local compression model that matches the compression identifier with the preset identifier is retained and determined as the target compression model;
[0026] The target compression model is aggregated to obtain global model parameters.
[0027] In one possible implementation, sending the global model parameters to the user terminal includes:
[0028] The global model parameters are sent to the target user terminal corresponding to the target compression model.
[0029] In a third aspect, embodiments of this application provide a user terminal, including:
[0030] The first acquisition module is used to acquire at least one local model corresponding to the target application;
[0031] The determination module is used to determine the compression strategy of the local model based on the performance metrics of the target application;
[0032] A compression module is used to compress the local model based on the compression strategy to obtain a local compressed model and a compression identifier corresponding to the local compressed model; the compression identifier is used to indicate the compression strategy corresponding to the local compressed model.
[0033] The first transmission module is used to upload the local compressed model and compression identifier to the server for model aggregation processing and to obtain the global model parameters issued by the server.
[0034] An update module is used to update the local model based on the global model parameters.
[0035] In a fourth aspect, embodiments of this application provide a server, including:
[0036] The second acquisition module is used to acquire the local compression model sent by the user terminal and the compression identifier corresponding to the local compression model; the compression identifier is used to indicate the compression strategy corresponding to the local compression model.
[0037] The aggregation module is used to aggregate the local compression model based on the compression identifier to obtain global model parameters;
[0038] The second transmission module is used to send the global model parameters to the user terminal.
[0039] In a fifth aspect, embodiments of this application provide a federated learning model compression system that includes at least one user terminal as described in the third aspect and at least one server as described in the fourth aspect; the user terminal and the server are interconnected.
[0040] The federated learning model compression method provided in this application involves the following steps: the user terminal obtains at least one local model corresponding to a target application; based on the performance metrics of the target application, a compression strategy for the local model is determined; the local model is compressed according to the compression strategy to obtain a local compressed model and its corresponding compression identifier; the local compressed model and compression identifier are uploaded to a server for model aggregation processing to obtain global model parameters issued by the server; and the local model is updated based on the global model parameters. In this way, because an appropriate compression strategy is selected based on the application's performance metrics, communication costs can be reduced while the accuracy of the federated learning model parameters is maximized, even when the federated learning model is deployed across numerous devices, which significantly improves the user experience.
[0041] Furthermore, the server obtains the local compression model and its corresponding compression identifier sent by the user client; it then performs aggregation processing on the local compression model based on the compression identifier to obtain global model parameters; finally, it sends the global model parameters to the user client. In this way, aggregating the local compression model based on the compression identifier to obtain global model parameters can meet the performance requirements of users for different types of applications and improve the user experience.
[0042] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit this application.
Attached Figure Description
[0043] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.
[0044] Figure 1 shows a schematic diagram of the structure of a federated learning model compression system provided in an embodiment of this application;
[0045] Figure 2 shows a flowchart of a federated learning model compression method provided in an embodiment of this application;
[0046] Figure 3 shows a flowchart of another federated learning model compression method provided in an embodiment of this application;
[0047] Figure 4 shows a schematic diagram of the structure of a user terminal provided in an embodiment of this application;
[0048] Figure 5 shows a schematic diagram of the structure of a server provided in an embodiment of this application;
[0049] Figure 6 shows a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0050] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
[0051] Figure 1 shows a schematic diagram of a federated learning model compression system provided in an embodiment of this application. As shown in the figure, the system includes user terminals 101, 102, 103, 104, and 105, and servers 106 and 107. The user terminals and servers are interconnected. The user terminals send local compressed models to the servers and receive the global models sent by the servers.
[0052] User terminals 101, 102, 103, 104, and 105 can be various electronic devices with displays that support web browsing, including but not limited to smartphones, tablets, laptops, and desktop computers, or edge devices; the user terminals have various communication applications installed, such as knowledge reading applications, web browser applications, search applications, instant messaging tools, email clients, and / or social platform software (for example only).
[0053] Servers 106 and 107 can be various types of servers that provide a variety of services, such as cloud servers, distributed system servers, or servers that incorporate blockchain.
[0054] In existing technologies, all application models running on the client (such as web browsers, video players, etc.) are usually compressed using the same method. However, since the data types, data sizes, and data processing requirements of various applications differ, the compression effect of local models using existing methods is not ideal and the accuracy is relatively low.
[0055] Based on this, this application provides a federated learning model compression method. According to the performance indicators of the application, an appropriate compression strategy is selected for model compression, which reduces communication costs while maximizing the accuracy of federated learning model parameters and improving the user experience.
[0056] Figure 2 shows a flowchart of a federated learning model compression method provided in an embodiment of this application. This method 200 can be executed by user terminals 101, 102, 103, 104, and 105, and specifically includes the following steps:
[0057] S201: Obtain at least one local model corresponding to the target application.
[0058] In practice, the application currently running on the user's device can be selected as the target application, with each target application corresponding to a local model. Optionally, a model identifier with specific bits can be assigned to the local model, which indicates the application corresponding to the local model.
[0059] The target applications include, but are not limited to, knowledge reading applications, web browser applications, search applications, instant messaging tools, email clients and / or social media platforms.
[0060] In one possible implementation, in S201, obtaining the local model corresponding to at least one target application includes:
[0061] Obtain the local dataset of at least one target application;
[0062] A preset initialization model is trained based on the local dataset to obtain the local model corresponding to the target application.
[0063] In practice, one or more target applications running on the user's end provide corresponding local data, forming a local dataset. A pre-defined initialization model is trained based on this local dataset, resulting in a local model corresponding to each target application. Here, there is a one-to-one correspondence between the local model and the target application. Since the training on the user's end based on the local dataset is personalized and relies on the characteristics of the local data, each local model can contain the features of the corresponding target application. Compressing the local models and uploading them to the server for aggregation processing to obtain the corresponding global model parameters enhances the adaptability of the local models.
[0064] S202: Based on the performance metrics of the target application, determine the compression strategy of the local model.
[0065] The performance metrics include at least one of the following: performance metrics for characterizing real-time performance, performance metrics for characterizing accuracy, and performance metrics for characterizing fault tolerance.
[0066] In practical implementation, this application determines the compression strategy of the local model based on the performance indicators of the target application. For example, audio and video data have relatively large data volumes, so a compression strategy with higher transmission quality can be selected; instant messaging tools require high real-time performance, so a compression strategy with faster speed can be selected.
[0067] In one possible implementation, determining the compression strategy of the local model based on the performance metrics of the target application includes:
[0068] Obtain multiple preset compression strategies;
[0069] Based on the performance metrics of the target application, one of the multiple compression strategies is selected as the compression strategy for the local model.
[0070] In practical implementation, the compression strategies can include FetchSGD, QSGD, and dynamic pruning. FetchSGD achieves lossy compression at high compression ratios, but the randomness inherent in its compression and decompression means that fast convergence cannot be guaranteed, and considerable time is needed to tune the dimensions and final size of the hash table to the actual model before a high compression ratio is reached. QSGD performs a single quantization based on boundaries and quantization levels, followed by compression and decompression, and converges quickly. However, because it relies on a single quantization algorithm, the error between the quantized and the original federated learning model parameters becomes too large at high compression ratios, often resulting in low accuracy. Dynamic pruning combines pruning and splicing and carries out training and compression together to compress the federated learning model. Splicing avoids the performance loss caused by incorrect pruning, so the result approaches the theoretical limit. However, the values of some key parameters are uncertain across models and layers, and the method is easily limited by sparse matrix algorithms and bandwidth.
[0071] Based on the performance metrics of the target application, a compression strategy that matches the target application is selected from the preset compression strategies as the compression strategy for the local model, so as to meet the user's performance requirements for application data and improve the user experience.
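The selection step in S202 can be illustrated with a minimal sketch. The patent does not give a concrete selection rule, so everything below is an assumption made for illustration: the `Strategy` identifier values, the `Metrics` weights in [0, 1], and the rule that maps the dominant metric to a strategy (QSGD for real-time needs because of its fast convergence, dynamic pruning for accuracy, FetchSGD otherwise).

```python
from dataclasses import dataclass
from enum import IntEnum


class Strategy(IntEnum):
    """Preset strategies named in the text; the integer values are assumed."""
    FETCHSGD = 0
    QSGD = 1
    DYNAMIC_PRUNING = 2


@dataclass
class Metrics:
    """Hypothetical per-application weights in [0, 1] for each metric."""
    real_time: float
    accuracy: float
    fault_tolerance: float


def select_strategy(m: Metrics) -> Strategy:
    """Pick one preset strategy from the dominant metric (illustrative rule)."""
    if m.real_time >= max(m.accuracy, m.fault_tolerance):
        return Strategy.QSGD             # described as converging quickly
    if m.accuracy >= m.fault_tolerance:
        return Strategy.DYNAMIC_PRUNING  # splicing limits accuracy loss
    return Strategy.FETCHSGD             # assumed fallback: high compression ratio


# Example: an instant messaging tool that mostly needs low latency.
chat_app = Metrics(real_time=0.9, accuracy=0.5, fault_tolerance=0.2)
chosen = select_strategy(chat_app)
```

In a real deployment the rule would more likely be a lookup table or a scoring function tuned per application; the point here is only that the choice is a function of the application's metrics.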
[0072] S203: Based on the compression strategy, the local model is compressed to obtain a local compressed model and a compression identifier corresponding to the local compressed model; the compression identifier is used to indicate the compression strategy corresponding to the local compressed model.
[0073] In practice, the local model is compressed using the compression strategy selected in S202 above, resulting in a local compressed model and its corresponding compression identifier. Here, each local model corresponds to a local compressed model, and the compression strategy adopted by each local model is adapted to the performance metrics of its target application.
[0074] S204: Upload the local compressed model and compression identifier to the server for model aggregation processing, and obtain the global model parameters issued by the server.
[0075] In practice, each local compression model contains the structure of its original complete local model. Each local compression model has a fixed-bit compression identifier to indicate the corresponding compression strategy, facilitating server decompression. The server receives the local compression model uploaded by the user, determines the compression strategy used by the local compression model based on the compression identifier, and aggregates the local compression models based on model data from other user clients to obtain global model parameters. These global model parameters are then sent to the user clients for a new round of iterative training until preset conditions are met, completing the federated learning aggregation.
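As a rough illustration of a fixed-bit compression identifier, the sketch below prepends a small header to the compressed payload so that the server can read the strategy before decompressing. The header layout (a 1-byte identifier followed by a 4-byte payload length in network byte order) and the function names `pack_upload` and `unpack_upload` are assumptions; the patent only states that the identifier has a fixed number of bits.

```python
import struct
from typing import Tuple

# Assumed layout: 1-byte compression identifier + 4-byte payload length.
HEADER = struct.Struct("!BI")


def pack_upload(compression_id: int, payload: bytes) -> bytes:
    """Prefix the compressed model bytes with the fixed-size header."""
    return HEADER.pack(compression_id, len(payload)) + payload


def unpack_upload(message: bytes) -> Tuple[int, bytes]:
    """Read the identifier first, then return the compressed payload."""
    compression_id, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return compression_id, payload


message = pack_upload(1, b"compressed-model-bytes")
compression_id, payload = unpack_upload(message)
```

Because the header has a fixed size, the server can route or reject an upload after reading only the first few bytes.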
[0076] S205: Update the local model based on the global model parameters.
[0077] In practice, the user client updates the local model based on the obtained global model parameters, performs a new round of iterative training, compresses the updated local model, and uploads it to the server.
[0078] Through the above steps, the user end selects an appropriate compression strategy to compress the model based on the application's performance metrics. With the federated learning model deployed on numerous devices, communication costs can be reduced while maximizing the accuracy of the federated learning model parameters, significantly improving the user experience.
[0079] Figure 3 shows a flowchart of another federated learning model compression method provided in an embodiment of this application. This method 300 can be executed by servers 106 and 107, and specifically includes the following steps:
[0080] S301: Obtain the local compression model sent by the user terminal and the compression identifier corresponding to the local compression model; the compression identifier is used to indicate the compression strategy corresponding to the local compression model.
[0081] In practice, the server obtains the local compression model and the corresponding compression identifier sent by the user. Based on the compression identifier, the compression strategy corresponding to the local compression model can be determined.
[0082] S302: Aggregate the local compression model based on the compression identifier to obtain global model parameters.
[0083] In practice, one can set up a server to aggregate local compression models with the same compression identifier to obtain global model parameters; or multiple servers can be set up, with each server processing only one type of compression identifier, so that the number of servers is equal to the number of compression identifier categories.
[0084] In one possible implementation, in S302, the aggregation process of the local compression model based on the compression identifier to obtain global model parameters includes:
[0085] The local compression model that matches the compression identifier with the preset identifier is retained and determined as the target compression model; the target compression model is aggregated to obtain the global model parameters.
[0086] In practice, the server can identify local compression models that match preset identifiers based on the compression identifiers corresponding to the local compression models from each user terminal, and clear local compression models corresponding to other types of compression strategies; or it can determine whether a local compression model has a compression identifier that matches the preset identifier. If it does, the local compression model is loaded into the memory; if it does not, the local compression model corresponding to other types of compression strategies is cleared.
[0087] S303: Send the global model parameters to the user terminal.
[0088] In practice, the aggregated global model is distributed to the user terminal.
[0089] In one possible implementation, in S303, sending the global model parameters to the user terminal includes: sending the global model parameters to the target user terminal corresponding to the target compression model.
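Steps S302 and S303 can be sketched as follows, under stated assumptions: uploads are already decompressed into NumPy arrays, aggregation is a plain unweighted average (the patent does not fix the aggregation rule), and the function name `aggregate_by_identifier` and the tuple layout are hypothetical.

```python
import numpy as np


def aggregate_by_identifier(uploads, preset_id):
    """Keep uploads whose identifier matches preset_id and average them.

    uploads: list of (terminal_id, compression_id, params) tuples, where
    params is a NumPy array that has already been decompressed.
    Returns (global_params, target_terminals).
    """
    kept = [(tid, params) for tid, cid, params in uploads if cid == preset_id]
    if not kept:
        return None, []                      # nothing matched this server
    global_params = np.mean([p for _, p in kept], axis=0)
    targets = [tid for tid, _ in kept]       # only these terminals get the result
    return global_params, targets


uploads = [
    ("t1", 1, np.array([1.0, 2.0])),
    ("t2", 2, np.array([100.0, 100.0])),     # other strategy: discarded
    ("t3", 1, np.array([3.0, 4.0])),
]
global_params, targets = aggregate_by_identifier(uploads, preset_id=1)
```

Returning the list of target terminals alongside the parameters mirrors the implementation in which global model parameters are sent only to the terminals whose models were retained.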
[0090] In this way, by aggregating the local compression model based on the compression identifier, global model parameters can be obtained, which can meet the performance requirements of users for different types of applications and improve the user experience.
[0091] The following sections will further illustrate the above methods using FetchSGD, QSGD, and dynamic pruning as examples:
[0092] Commonly used letter symbols and their meanings include: W represents the number of user terminals, T represents the total number of iterations, C represents the number of layers in the complete model to be compressed, L(·) represents the network's loss function, ⊙ represents the Hadamard product operator, and β represents the learning rate.
[0093] a. FetchSGD uses hash tables to compress and restore parameters in federated learning models. Specifically, the user terminal calculates gradients based on its local data; the gradient is compressed using a Count Sketch data structure, and the compressed local model is uploaded; the server maintains Count Sketches for momentum and error accumulation, from which the weight update for each training round can be extracted. A specific application example is as follows:
[0094] Input parameters include: a compression function, a decompression function, the local batch size l, and the momentum parameter ρ.
[0095] (1) Initialize the relevant parameters of the compression function to 0.
[0096] (2) Each user terminal downloads the weights $w^t - w^0$ of the new global model; the global model may be sparse.
[0097] (3) Using the batch $B_i$ of the user terminal, calculate the stochastic gradient and update the local model according to the following formula:
[0098]
[0099] Here, t is the current iteration number; the computed quantity is the local model; L(·) is the function used to represent the loss, and its partial derivative is taken with respect to w; $z_j$ refers to the local data of the j-th user.
[0100] The compression function is applied to the local model to obtain the local compression model, which is then uploaded to all servers.
[0101] (4) Each server aggregates the local compression models received from all users, introduces a bias using Top-k, and calculates the weights for the next round. Simultaneously, it calculates the cumulative error for the next round to facilitate subsequent error feedback. The execution expression is as follows:
[0102] Aggregating local compression models from various user terminals
[0103] Calculate momentum
[0104] Calculate the error of this round
[0105] Based on the error, reconstruct the compression model and calculate the bias.
[0106] Accumulate the calculation error
[0107] Update the weights based on the error: $w^{t+1} = w^t - \Delta^t$;
[0108] Repeat steps (2)-(4) above T times, where T is the preset maximum number of iterations, until the federated learning model training is completed.
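The server-side steps in (4) can be sketched with a small Count Sketch implementation. This is a minimal sketch under assumptions, not the patent's implementation: the class `CountSketch`, its hash construction (random bucket and sign tables), the median-based decompression, and the parameter names are all illustrative, and β is used as the learning rate as in the symbol list above.

```python
import numpy as np


class CountSketch:
    """Linear sketch: rows x cols table with random buckets and signs."""

    def __init__(self, dim, rows=7, cols=512, seed=0):
        rng = np.random.default_rng(seed)
        self.rows, self.cols = rows, cols
        self.buckets = rng.integers(0, cols, size=(rows, dim))
        self.signs = rng.choice([-1.0, 1.0], size=(rows, dim))

    def sketch(self, vec):
        """Compression function: fold the vector into the table."""
        table = np.zeros((self.rows, self.cols))
        for r in range(self.rows):
            np.add.at(table[r], self.buckets[r], self.signs[r] * vec)
        return table

    def unsketch(self, table):
        """Decompression function: median of the per-row estimates."""
        est = np.stack([self.signs[r] * table[r, self.buckets[r]]
                        for r in range(self.rows)])
        return np.median(est, axis=0)


def top_k(vec, k):
    """Keep the k entries with the largest magnitude, zero the rest."""
    out = np.zeros_like(vec)
    idx = np.argpartition(np.abs(vec), -k)[-k:]
    out[idx] = vec[idx]
    return out


def server_round(cs, client_sketches, momentum, error, weights,
                 rho=0.9, beta=0.1, k=2):
    agg = np.mean(client_sketches, axis=0)   # aggregate the local sketches
    momentum = rho * momentum + agg          # calculate momentum
    error = error + beta * momentum          # error of this round
    delta = top_k(cs.unsketch(error), k)     # reconstruct, keep Top-k
    error = error - cs.sketch(delta)         # accumulate remaining error
    return weights - delta, momentum, error  # w^{t+1} = w^t - delta^t


dim = 10_000
cs = CountSketch(dim)
grad = np.zeros(dim)
grad[3], grad[4242] = 5.0, -2.0              # a sparse toy gradient
zeros = np.zeros((cs.rows, cs.cols))
new_w, momentum, error = server_round(
    cs, [cs.sketch(grad)], zeros, zeros, np.ones(dim), beta=1.0)
```

Because the sketch is linear, the server can average, scale, and subtract sketches directly without decompressing each client's upload, which is what makes keeping momentum and error in sketch form workable.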
[0109] b. The stochastic quantization applied in the QSGD algorithm is a general, parameterizable, lossy compression method for stochastic gradient vectors. Specifically, it includes: First, given the gradient vector on the user terminal, quantizing each component by randomly selecting a set of discrete values, preserving the statistical properties of the original values; second, performing an efficient lossless encoding of the quantized gradient, utilizing the statistical properties of the gradient to generate an effective code. The client can weigh the number of bits transmitted in each iteration against the increased variance during the process. A specific application example is as follows:
[0110] For any non-zero vector v, the quantization function $Q_s(v)$ is expressed as follows:
[0111] $Q_s(v_i) = \|v\|_2 \cdot \operatorname{sgn}(v_i) \cdot \xi_i(v, s)$
[0112] Here, s is an adjustable parameter representing the number of quantization levels, with a value greater than or equal to 1; the quantization levels are uniformly spaced between 0 and 1; $\|\cdot\|_2$ denotes the Euclidean norm (magnitude) of the vector; and sgn(·) denotes the sign function.
[0113] The quantized value preserves the original value in expectation while keeping the variance minimal.
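[0114] Here, $\xi_i(v, s)$ are independent random variables, defined as follows:
[0115] $\xi_i(v, s) = \begin{cases} l/s & \text{with probability } 1 - p\left(\frac{|v_i|}{\|v\|_2}, s\right) \\ (l+1)/s & \text{otherwise} \end{cases}$
[0116] Here, l is an integer with $0 \le l < s$, and $[l/s, (l+1)/s]$ is the quantization interval that contains $|v_i| / \|v\|_2$.
[0117] For any $a \in [0, 1]$, $p(a, s) = as - l$.
[0118] Among all distributions supported on $\{0, 1/s, \ldots, 1\}$, $\xi_i(v, s)$ has the minimum variance, and its expected value satisfies $E[\xi_i(v, s)] = |v_i| / \|v\|_2$.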
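The quantization function $Q_s$ can be sketched in a few lines of NumPy. The function name `qsgd_quantize` and the explicit random generator argument are assumptions for illustration; the arithmetic follows the definition above, with the rounding-up probability $p(a, s) = as - l$.

```python
import numpy as np


def qsgd_quantize(v, s, rng):
    """Stochastically quantize v onto s uniform levels, keeping sign and norm."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return np.zeros_like(v)            # Q_s is defined for non-zero v
    a = np.abs(v) / norm                   # normalized magnitudes in [0, 1]
    lower = np.floor(a * s)                # integer l with a in [l/s, (l+1)/s]
    p_up = a * s - lower                   # p(a, s) = a*s - l
    xi = (lower + (rng.random(v.shape) < p_up)) / s
    return norm * np.sign(v) * xi


rng = np.random.default_rng(0)
v = np.array([3.0, -4.0])
q = qsgd_quantize(v, s=4, rng=rng)

# Averaging many independent quantizations approaches v (unbiasedness).
mean_q = np.mean([qsgd_quantize(v, 4, rng) for _ in range(5000)], axis=0)
```

Raising s lowers the variance of each quantized component but increases the number of bits needed per component, which is the trade-off the client weighs in each iteration.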
[0119] The algorithm uses an efficient gradient encoding method, and the specific implementation process is as follows:
[0120] Given as input a quantized gradient vector represented by the triple $(\|v\|_2, \sigma, \zeta)$ with s quantization levels, the encoded output string is defined as follows:
[0121] First, $\|v\|_2$ is encoded using 32 bits.
[0122] Second, Elias recursive encoding is used to encode the position of the first non-zero entry of ζ.
[0123] Third, 1 bit is used to represent $\sigma_i$, and Elias recursive encoding is then used to encode the subsequent $s \cdot \xi_i(v, s)$.
[0124] Iteratively, the distance from the current entry of ζ to the next non-zero entry is encoded using Elias recursive encoding, and the corresponding $\sigma_i$ and $s \cdot \xi_i(v, s)$ are encoded in the same way.
[0125] The decoding method corresponding to the encoding is as follows:
[0126] First, read 32 bits to construct $\|v\|_2$.
[0127] Then, the decoding scheme of Elias recursive encoding is used iteratively to read the non-zero positions and values of ζ and σ.
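The integer code used for positions and values can be sketched as follows, assuming that "Elias recursive encoding" refers to Elias omega coding, which encodes a positive integer by recursively prefixing the binary length of the previous group. The function names and the use of a bit string (rather than packed bytes) are illustrative choices.

```python
def elias_omega_encode(n: int) -> str:
    """Encode a positive integer as an Elias omega bit string."""
    if n < 1:
        raise ValueError("Elias omega coding is defined for positive integers")
    code = "0"                       # terminating bit
    while n > 1:
        binary = bin(n)[2:]          # binary form, always starts with '1'
        code = binary + code         # prepend the current group
        n = len(binary) - 1          # next group encodes this length minus one
    return code


def elias_omega_decode(bits: str, pos: int = 0):
    """Decode one integer starting at pos; return (value, next position)."""
    n = 1
    while bits[pos] == "1":          # a leading '1' starts another group
        width = n + 1
        n = int(bits[pos:pos + width], 2)
        pos += width
    return n, pos + 1                # skip the terminating '0'


# Gaps between non-zero entries of a quantized gradient, encoded back to back.
gaps = [1, 4, 16, 2]
stream = "".join(elias_omega_encode(g) for g in gaps)

decoded, pos = [], 0
while pos < len(stream):
    value, pos = elias_omega_decode(stream, pos)
    decoded.append(value)
```

Small integers get short codes (1 encodes as a single bit), which suits gradients where non-zero entries are close together and quantized values are small.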
[0128] The specific process of the algorithm is as follows:
[0129] Step 1: Initialize the local model for the first iteration of each epoch to an arbitrary value, $y^{(1)} = x_0$;
[0130] The user terminal computes the gradient of the unquantized local model and aggregates it based on the received global model.
[0131] The client updates the local compression model for the t-th iteration in the p-th epoch. The formula is as follows:
[0132]
[0133] Each user terminal uploads the result to all servers.
[0134] Step 2: Each server calculates the updated local model.
[0135] Each server updates the global model
[0136] After T iterations, each server calculates the result of the (p+1)-th epoch, which is made available for download by all user terminals.
[0137] Step 3: Repeat the above steps for each epoch until the federated learning model training is complete.
[0138] c. Dynamic pruning is based on two operations: pruning and splicing. Pruning removes weights initially deemed unimportant in order to compress the federated learning model; however, the importance of a weight cannot be determined definitively, so pruning often removes too much. Splicing therefore reintegrates important but incorrectly pruned weights into the model, restoring effective connections between layers. Parameter importance is re-evaluated whenever necessary and the two operations are carried out together, which makes the approach dynamic. A specific application example is as follows:
[0139] Due to the mutual influence and activation between interconnected units, the importance of parameters in a network is extremely difficult to measure. A network connection may be redundant due to the presence of other connections. However, once those other connections are removed, the network connection becomes crucial. Therefore, a learning process should be appropriately implemented and the network architecture continuously maintained.
[0140] Taking the k-th layer as an example, the optimization problem is expressed as follows:
[0141]
[0142] Here, $T_k$ is a binary matrix representing the state of the network connections, i.e., whether they are currently pruned. These matrices can therefore be viewed as mask matrices.
[0143] The index set consists of the indices of all entries in the matrix $W_k$.
[0144] Here, $h_k(\cdot)$ is the discriminant function, and its formula is as follows:
[0145]
[0146] Since the measure of parameter importance determines the state of the network connections, the function $h_k(\cdot)$ is crucial to dynamic network pruning. In each iteration, parameters with smaller magnitudes are temporarily pruned, while parameters with larger magnitudes are preserved or spliced back.
[0147] The threshold has a significant impact on the final compression ratio. To improve the robustness of the scheme, a small margin t is introduced and two thresholds $a_k$ and $b_k$ are set, related by $a_k = b_k + t$. Parameters whose absolute values exceed this range are set to the corresponding connection state entries, which indicates that the parameter will be neither pruned nor spliced in the current iteration.
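One plausible reading of the two-threshold rule is sketched below. It assumes that magnitudes below $b_k$ are pruned, magnitudes at or above $a_k = b_k + t$ are kept or spliced back, and magnitudes in between leave the current mask entry unchanged; the original formula for $h_k$ is not reproduced in the text, so this interpretation, the function name `update_mask`, and the numeric thresholds are all assumptions.

```python
import numpy as np


def update_mask(weights, mask, b_k, t):
    """Return the new connection-state matrix T_k for one layer."""
    a_k = b_k + t                       # relation stated in the text
    magnitude = np.abs(weights)
    new_mask = mask.copy()
    new_mask[magnitude < b_k] = 0.0     # small magnitude: prune
    new_mask[magnitude >= a_k] = 1.0    # large magnitude: keep or splice back
    return new_mask                     # in between: state is unchanged


weights = np.array([0.05, 0.5, 1.5, 0.5])
mask = np.array([1.0, 1.0, 0.0, 0.0])
new_mask = update_mask(weights, mask, b_k=0.4, t=0.4)
```

The margin between the two thresholds acts as hysteresis: a weight hovering near a single threshold would otherwise flip between pruned and spliced on every iteration.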
[0148] Based on the Lagrange multiplier method and gradient descent, the weight update formula is as follows:
[0149] $$W_k^{(i,j)} \leftarrow W_k^{(i,j)} - \beta \frac{\partial}{\partial\left(W_k^{(i,j)} T_k^{(i,j)}\right)} L\left(W_k \odot T_k\right)$$
[0150] Here, β is the learning rate, and the update is applied to every entry (i, j) of $W_k$. This formula updates not only the important parameters but also the parameters corresponding to the zero entries of $T_k$, which are currently regarded as unimportant and ineffective in reducing the network loss. Updating them allows improperly pruned connections to be spliced back, which improves the flexibility of the scheme.
[0151] The partial derivatives are computed by the chain rule using randomly selected mini-batches of samples. Once $W_k$ and $T_k$ have been updated, they are applied to recompute the activations and the loss-function gradients of the complete local model. Iterating these steps enables the sparse model to reach the desired accuracy.
[0152] The specific process of the algorithm is as follows:
[0153] Step 1: Input the reference weights of the local model, the baseline learning rate α, and the learning policy ψ(·); initialize the weights $W_k$ of each layer of the local model to the corresponding reference weights and the connection state matrix $T_k$ to an all-ones matrix; initialize the learning rate β to 1 and the current iteration number iter to 0;
[0154] Step 2: Select a mini-batch of samples from the training data x, perform forward propagation through $(W_0 \odot T_0), \ldots, (W_C \odot T_C)$, and calculate the loss function L;
[0155] Step 3: Perform backpropagation through $\{W_k \odot T_k : 0 \le k \le C\}$ and calculate the gradient of the loss function;
[0156] Step 4: With probability σ(iter), update the network connection state matrix $T_k$ using the function $h_k(\cdot)$ and the current $W_k$;
[0157] Step 5: Update $W_k$ using the gradient of the current loss function;
[0158] Step 6: Repeat steps 4 and 5 layer by layer until all layers of the model have been processed;
[0159] Step 7: Increment the iteration number iter by 1, then update the learning rate using the learning policy, β = ψ(α, iter);
[0160] Repeat steps 2 through 7 above until the iteration number iter reaches the preset maximum value.
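Steps 1 to 7 above can be sketched in plain Python for a single linear layer trained with a squared-error loss. This is a simplified illustration under stated assumptions, not the implementation of this application: the forms of `sigma` and `psi`, the threshold values, the batch size, and the toy data are all hypothetical, and because there is only one layer, the layer-by-layer loop of step 6 reduces to a single pass.

```python
import random


def h_k(w, state, b_k, t):
    """Discriminant function with thresholds b_k and a_k = b_k + t."""
    magnitude = abs(w)
    if magnitude < b_k:
        return 0        # prune
    if magnitude >= b_k + t:
        return 1        # keep, or splice back
    return state        # inside the margin: leave unchanged


def sigma(it, gamma=0.01):
    """Assumed probability of refreshing the mask; decays with iterations."""
    return 1.0 / (1.0 + gamma * it)


def psi(alpha, it, decay=0.01):
    """Assumed learning policy: inverse-time decay of the baseline rate."""
    return alpha / (1.0 + decay * it)


def dynamic_prune(ref_weights, data, alpha=0.1, b_k=0.05, t=0.02,
                  max_iter=300, batch_size=4, seed=0):
    rng = random.Random(seed)
    # Step 1: initialise weights, all-ones mask, learning rate and counter.
    W = list(ref_weights)
    T = [1] * len(W)
    beta, it = 1.0, 0
    while it < max_iter:
        # Step 2: forward pass of a mini-batch through the masked weights.
        batch = rng.sample(data, batch_size)
        errors = [sum(w * m * xi for w, m, xi in zip(W, T, x)) - y
                  for x, y in batch]
        # Step 3: gradient of the half mean squared error w.r.t. W * T.
        grad = [sum(e * x[j] for e, (x, _) in zip(errors, batch)) / batch_size
                for j in range(len(W))]
        # Step 4: with probability sigma(it), refresh the mask using h_k.
        if rng.random() < sigma(it):
            T = [h_k(w, m, b_k, t) for w, m in zip(W, T)]
        # Step 5: update every weight, including currently pruned ones.
        W = [w - beta * g for w, g in zip(W, grad)]
        # Step 7: advance the counter and refresh the learning rate.
        it += 1
        beta = psi(alpha, it)
    return W, T


# Toy data generated from hypothetical true weights [1.0, 0.0, -0.5].
data_rng = random.Random(1)
true_w = [1.0, 0.0, -0.5]
data = []
for _ in range(50):
    x = [data_rng.uniform(-1, 1) for _ in true_w]
    data.append((x, sum(a * b for a, b in zip(true_w, x))))

W, T = dynamic_prune([0.8, 0.03, -0.4], data)
print(T)
```

Pruned weights continue to receive gradient updates in step 5, which is what allows a connection whose magnitude later rises to at least a_k to be spliced back at a subsequent mask refresh.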
[0161] Figure 4 shows a schematic diagram of a user terminal provided in an embodiment of this application. As shown in the figure, the user terminal 400 includes:
[0162] The first acquisition module 410 is used to acquire at least one local model corresponding to a target application;
[0163] The determination module 420 is used to determine the compression strategy of the local model based on the performance indicators of the target application;
[0164] The compression module 430 is used to compress the local model based on the compression strategy to obtain a local compressed model and a compression identifier corresponding to the local compressed model; the compression identifier is used to indicate the compression strategy corresponding to the local compressed model;
[0165] The first transmission module 440 is used to upload the local compressed model and compression identifier to the server for model aggregation processing and to obtain the global model parameters issued by the server.
[0166] The update module 450 is used to update the local model based on the global model parameters.
[0167] In one possible implementation, the first acquisition module 410 includes:
[0168] A data acquisition unit is used to acquire a local dataset for at least one target application;
[0169] The model training unit is used to train a preset initialization model based on the local dataset to obtain the local model corresponding to the target application.
[0170] In one possible implementation, the performance metrics include at least one of the following: performance metrics for characterizing real-time performance, performance metrics for characterizing accuracy, and performance metrics for characterizing fault tolerance.
[0171] In one possible implementation, the determination module 420 includes:
[0172] The strategy acquisition unit is used to acquire multiple preset compression strategies;
[0173] The strategy selection unit is used to select one of the multiple compression strategies as the compression strategy for the local model based on the performance indicators of the target application.
[0174] This application provides a user terminal including a first acquisition module, a determination module, a compression module, a first transmission module, and an update module. The first acquisition module acquires at least one local model corresponding to a target application. The determination module determines a compression strategy for the local model based on the performance metrics of the target application. The compression module compresses the local model based on the compression strategy to obtain a local compressed model and a compression identifier corresponding to the local compressed model. Then, the first transmission module uploads the local compressed model and the compression identifier to a server for model aggregation processing to obtain global model parameters issued by the server. The update module updates the local model based on the global model parameters. In this way, the user terminal selects an appropriate compression strategy for model compression according to the application's performance metrics. With federated learning models deployed across multiple devices, communication costs can be reduced while maximizing the accuracy of federated learning model parameters, significantly improving the user experience.
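The user-terminal flow summarized above (determine a strategy from the performance metrics, compress, and attach a compression identifier) might be sketched as follows. The mapping from metric to strategy, the names `Strategy`, `Upload`, `determine_strategy`, and `compress`, and the stand-in compressors are all assumptions made for illustration; in particular, the stand-ins are simple placeholders and not the FetchSGD, QSGD, or dynamic pruning algorithms themselves.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Strategy(Enum):
    FETCHSGD = "fetchsgd"
    QSGD = "qsgd"
    DYNAMIC_PRUNING = "dynamic_pruning"


# Assumed mapping from the dominant performance metric of the target
# application to a preset compression strategy (illustrative only).
METRIC_TO_STRATEGY: Dict[str, Strategy] = {
    "real_time": Strategy.QSGD,
    "accuracy": Strategy.DYNAMIC_PRUNING,
    "fault_tolerance": Strategy.FETCHSGD,
}


@dataclass
class Upload:
    compressed_model: List[float]
    compression_id: str   # tells the server which strategy was used


def determine_strategy(dominant_metric: str) -> Strategy:
    """Select one of the preset strategies for the target application."""
    return METRIC_TO_STRATEGY[dominant_metric]


def compress(local_model: List[float], strategy: Strategy,
             compressors: Dict[Strategy, Callable[[List[float]], List[float]]]
             ) -> Upload:
    """Compress the local model and tag it with its compression identifier."""
    return Upload(compressors[strategy](local_model), strategy.value)


# Placeholder compressors standing in for the real algorithms.
compressors = {
    Strategy.FETCHSGD: lambda w: list(w),
    Strategy.QSGD: lambda w: [round(x, 1) for x in w],
    Strategy.DYNAMIC_PRUNING: lambda w: [x if abs(x) >= 0.1 else 0.0 for x in w],
}

upload = compress([0.5, 0.01, -0.3], determine_strategy("accuracy"), compressors)
print(upload)
```

Keeping the identifier alongside the compressed model is what lets the server choose a matching aggregation procedure without inspecting the payload.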
[0175] Figure 5 shows a schematic structural diagram of a server provided in an embodiment of this application. As shown in the figure, the server 500 includes:
[0176] The second acquisition module 510 is used to acquire the local compressed model sent by the user terminal and the compression identifier corresponding to the local compressed model; the compression identifier is used to indicate the compression strategy corresponding to the local compressed model;
[0177] The aggregation module 520 is used to aggregate the local compressed model based on the compression identifier to obtain global model parameters;
[0178] The second transmission module 530 is used to send the global model parameters to the user terminal.
[0179] In one possible implementation, the aggregation module 520 includes:
[0180] The model determination unit is used to retain each local compressed model whose compression identifier matches a preset identifier and determine it as a target compressed model;
[0181] The model aggregation unit is used to aggregate the target compressed model to obtain global model parameters.
[0182] In one possible implementation, the second transmission module 530 is further configured to send the global model parameters to the target user terminal corresponding to the target compression model.
[0183] This application provides a server including a second acquisition module, an aggregation module, and a second transmission module. The second acquisition module acquires a local compression model sent by a user terminal and a compression identifier corresponding to the local compression model. The aggregation module performs aggregation processing on the local compression model based on the compression identifier to obtain global model parameters. Then, the second transmission module sends the global model parameters to the user terminal. In this way, the server aggregates the local compression model according to the compression identifier to obtain global model parameters, which can meet the performance requirements of users for different types of applications and improve the user experience.
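The identifier-based filtering and aggregation performed by the server can be sketched as below. The function name `aggregate`, the tuple layout of an upload, and the use of plain element-wise averaging are assumptions for illustration; averaging is a stand-in for the strategy-specific aggregation procedures of this application.

```python
from typing import List, Tuple

# Each upload is (user_id, compression_id, model_parameters); layout assumed.
UploadTuple = Tuple[str, str, List[float]]


def aggregate(uploads: List[UploadTuple], preset_id: str):
    """Retain uploads whose compression identifier matches preset_id,
    average them element-wise, and report which users should receive
    the resulting global model parameters."""
    targets = [(uid, model) for uid, cid, model in uploads if cid == preset_id]
    if not targets:
        raise ValueError(f"no upload matches identifier {preset_id!r}")
    count = len(targets)
    # zip(*models) walks the parameters position by position.
    global_params = [sum(column) / count
                     for column in zip(*(model for _, model in targets))]
    target_users = [uid for uid, _ in targets]
    return global_params, target_users


uploads = [
    ("u1", "qsgd", [1.0, 2.0]),
    ("u2", "qsgd", [3.0, 4.0]),
    ("u3", "fetchsgd", [10.0, 10.0]),
]
print(aggregate(uploads, "qsgd"))  # ([2.0, 3.0], ['u1', 'u2'])
```

Returning the list of target users mirrors the implementation in which the global model parameters are sent only to the user terminals whose models were retained.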
[0184] Figure 6 illustrates the hardware structure of an electronic device implementing the embodiments of this application. Referring to the figure, at the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include internal memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk storage device. Of course, the electronic device may also include other hardware required for its functions.
[0185] The processor, network interface, and memory can be interconnected via an internal bus, which can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. This bus can be categorized as an address bus, data bus, control bus, etc. For ease of illustration, only a single bidirectional arrow is used in this diagram, but this does not imply that there is only one bus or one type of bus.
[0186] Memory stores programs. Specifically, programs may include program code, which includes computer operation instructions. Memory may include main memory and non-volatile memory, and provides instructions and data to the processor.
[0187] The processor reads the corresponding computer program from non-volatile memory into main memory and then runs it, logically forming a device that locates the target user. The processor executes the program stored in memory and specifically performs the methods disclosed in the embodiments shown in Figure 2 or Figure 3, achieving the functions and beneficial effects described in the preceding method embodiments, which will not be repeated here.
[0188] The methods disclosed in the embodiments shown in Figure 2 or Figure 3 of this application can be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above methods can be completed by integrated logic circuits in the processor's hardware or by instructions in software form. The processor can be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software module can reside in a storage medium that is well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. This storage medium is located in the memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above methods.
[0189] The computer device can also execute the methods described in the preceding method embodiments and achieve the functions and beneficial effects of the methods described in the preceding method embodiments, which will not be repeated here.
[0190] Of course, in addition to software implementation, the electronic device of this application does not exclude other implementation methods, such as logic devices or a combination of hardware and software, etc. In other words, the execution subject of the following processing flow is not limited to each logic unit, but can also be hardware or logic devices.
[0191] This application also proposes a computer-readable storage medium that stores one or more programs which, when executed by an electronic device including multiple applications, cause the electronic device to perform the methods disclosed in the embodiments shown in Figure 2 or Figure 3, achieving the functions and beneficial effects described in the preceding method embodiments, which will not be repeated here.
[0192] The computer-readable storage medium includes read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks, etc.
[0193] Furthermore, embodiments of this application also provide a computer program product including a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions which, when executed by a computer, implement the methods disclosed in the embodiments shown in Figure 2 or Figure 3, achieving the functions and beneficial effects described in the preceding method embodiments, which will not be repeated here.
[0194] In summary, the above description is merely a preferred embodiment of this application and does not limit the scope of protection of this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.
[0195] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. Specifically, a computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.
[0196] Computer-readable media include both permanent and non-permanent, removable and non-removable media that can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
[0197] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0198] The various embodiments in this specification are described in a progressive manner. Similar or identical parts between embodiments can be referred to interchangeably. Each embodiment focuses on describing the differences from other embodiments. In particular, the system embodiments are basically similar to the method embodiments, so the description is relatively simple; relevant parts can be referred to the descriptions in the method embodiments.
Claims
1. A federated learning model compression method, applied to a user terminal, characterized in that it comprises: obtaining at least one local model corresponding to a target application; determining a compression strategy for the local model based on performance metrics of the target application; compressing the local model based on the compression strategy to obtain a local compressed model and a compression identifier corresponding to the local compressed model, the compression identifier being used to indicate the compression strategy corresponding to the local compressed model; uploading the local compressed model and the compression identifier to a server for model aggregation processing to obtain global model parameters issued by the server; and updating the local model based on the global model parameters; wherein the step of determining the compression strategy for the local model based on the performance metrics of the target application comprises: obtaining multiple preset compression strategies; and, based on the performance metrics of the target application, selecting from the preset compression strategies a compression strategy that matches the target application as the compression strategy for the local model, the compression strategies including FetchSGD, QSGD, and dynamic pruning; wherein the server is used to aggregate the local compressed model based on the compression identifier to obtain the global model parameters, specifically including at least one of the following: if the compression strategy corresponding to the local compressed model is determined to be FetchSGD based on the compression identifier, calculating the momentum and the error of the current round according to the local compressed models of all users, reconstructing the compressed model based on the error and calculating the bias, and updating the weights based on the bias until a preset maximum number of iterations is reached, to obtain the global model parameters; if the compression strategy corresponding to the local compressed model is determined to be QSGD based on the compression identifier, updating the global model according to the local compressed models uploaded by all users, repeating the process T times, and determining the average value of the global model over the T iterations as the global model parameters; and if the compression strategy corresponding to the local compressed model is determined to be dynamic pruning based on the compression identifier, updating the weights of each layer in the local compressed model according to the connection state matrix and the gradient of the loss function until the maximum number of iterations is reached, to obtain the global model parameters.
2. The method according to claim 1, characterized in that the step of obtaining at least one local model corresponding to a target application comprises: obtaining a local dataset of at least one target application; and training a preset initialization model based on the local dataset to obtain the local model corresponding to the target application.
3. The method according to claim 1, characterized in that the performance metrics include at least one of the following: performance metrics for characterizing real-time performance, performance metrics for characterizing accuracy, and performance metrics for characterizing fault tolerance.
4. A machine learning model compression method, applied to a server, characterized in that it comprises: obtaining a local compressed model sent by a user terminal and a compression identifier corresponding to the local compressed model, the compression identifier being used to indicate the compression strategy corresponding to the local compressed model, and the compression strategy including FetchSGD, QSGD, and dynamic pruning, wherein the user terminal obtains at least one local model corresponding to a target application and, based on the performance metrics of the target application, selects from preset compression strategies a compression strategy that matches the target application as the compression strategy for the local model; aggregating the local compressed model based on the compression identifier to obtain global model parameters; and sending the global model parameters to the user terminal; wherein the step of aggregating the local compressed model based on the compression identifier to obtain global model parameters specifically includes at least one of the following: if the compression strategy corresponding to the local compressed model is determined to be FetchSGD based on the compression identifier, calculating the momentum and the error of the current round according to the local compressed models of all users, reconstructing the compressed model based on the error and calculating the bias, and updating the weights based on the bias until a preset maximum number of iterations is reached, to obtain the global model parameters; if the compression strategy corresponding to the local compressed model is determined to be QSGD based on the compression identifier, updating the global model according to the local compressed models uploaded by all users, repeating the process T times, and determining the average value of the global model over the T iterations as the global model parameters; and if the compression strategy corresponding to the local compressed model is determined to be dynamic pruning based on the compression identifier, updating the weights of each layer in the local compressed model according to the connection state matrix and the gradient of the loss function until the maximum number of iterations is reached, to obtain the global model parameters.
5. The method according to claim 4, characterized in that the step of aggregating the local compressed model based on the compression identifier to obtain global model parameters comprises: retaining each local compressed model whose compression identifier matches a preset identifier and determining it as a target compressed model; and aggregating the target compressed model to obtain the global model parameters.
6. The method according to claim 5, characterized in that the step of sending the global model parameters to the user terminal comprises: sending the global model parameters to the target user terminal corresponding to the target compressed model.
7. A user terminal, characterized in that it comprises: a first acquisition module, used to acquire at least one local model corresponding to a target application; a determination module, used to determine a compression strategy for the local model based on performance metrics of the target application; a compression module, used to compress the local model based on the compression strategy to obtain a local compressed model and a compression identifier corresponding to the local compressed model, the compression identifier being used to indicate the compression strategy corresponding to the local compressed model; a first transmission module, used to upload the local compressed model and the compression identifier to a server for model aggregation processing and to obtain global model parameters issued by the server; and an update module, used to update the local model based on the global model parameters; wherein the determination module, when determining the compression strategy for the local model based on the performance metrics of the target application, is specifically used to: obtain multiple preset compression strategies; and, based on the performance metrics of the target application, select from the preset compression strategies a compression strategy that matches the target application as the compression strategy for the local model, the compression strategies including FetchSGD, QSGD, and dynamic pruning; wherein the server is used to aggregate the local compressed model based on the compression identifier to obtain the global model parameters, specifically including at least one of the following: if the compression strategy corresponding to the local compressed model is determined to be FetchSGD based on the compression identifier, calculating the momentum and the error of the current round according to the local compressed models of all users, reconstructing the compressed model based on the error and calculating the bias, and updating the weights based on the bias until a preset maximum number of iterations is reached, to obtain the global model parameters; if the compression strategy corresponding to the local compressed model is determined to be QSGD based on the compression identifier, updating the global model according to the local compressed models uploaded by all users, repeating the process T times, and determining the average value of the global model over the T iterations as the global model parameters; and if the compression strategy corresponding to the local compressed model is determined to be dynamic pruning based on the compression identifier, updating the weights of each layer in the local compressed model according to the connection state matrix and the gradient of the loss function until the maximum number of iterations is reached, to obtain the global model parameters.
8. A server, characterized in that it comprises: a second acquisition module, used to acquire a local compressed model sent by a user terminal and a compression identifier corresponding to the local compressed model, the compression identifier being used to indicate the compression strategy corresponding to the local compressed model, and the compression strategy including FetchSGD, QSGD, and dynamic pruning, wherein the user terminal obtains at least one local model corresponding to a target application and, based on the performance metrics of the target application, selects from preset compression strategies a compression strategy that matches the target application as the compression strategy for the local model; an aggregation module, used to aggregate the local compressed model based on the compression identifier to obtain global model parameters; and a second transmission module, used to send the global model parameters to the user terminal; wherein the aggregation module, when aggregating the local compressed model based on the compression identifier to obtain global model parameters, is specifically used to perform at least one of the following: if the compression strategy corresponding to the local compressed model is determined to be FetchSGD based on the compression identifier, calculating the momentum and the error of the current round according to the local compressed models of all users, reconstructing the compressed model based on the error and calculating the bias, and updating the weights based on the bias until a preset maximum number of iterations is reached, to obtain the global model parameters; if the compression strategy corresponding to the local compressed model is determined to be QSGD based on the compression identifier, updating the global model according to the local compressed models uploaded by all users, repeating the process T times, and determining the average value of the global model over the T iterations as the global model parameters; and if the compression strategy corresponding to the local compressed model is determined to be dynamic pruning based on the compression identifier, updating the weights of each layer in the local compressed model according to the connection state matrix and the gradient of the loss function until the maximum number of iterations is reached, to obtain the global model parameters.
9. A federated learning model compression system, characterized in that it includes at least one user terminal as described in claim 7 and at least one server as described in claim 8, the user terminal and the server being interconnected.