Sparse neural network based federated meta-learning image classification method

By introducing sparse neural networks and meta-learning algorithms into federated learning, the privacy protection and communication efficiency issues of training models on edge devices are solved, and efficient image classification tasks are achieved on edge devices.

CN115359298B · Active · Publication Date: 2026-06-12 · NANJING UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: NANJING UNIV OF SCI & TECH
Filing Date: 2022-08-24
Publication Date: 2026-06-12

AI Technical Summary

Technical Problem

Traditional centralized machine learning carries the risk of private information leakage, while federated learning faces reduced accuracy and communication resource limitations when training models on edge devices. Existing methods are insufficient in terms of computational and communication efficiency.

Method used

The federated meta-learning method using sparse neural networks generates sparse neural networks by introducing sparsification into fully connected neural networks, and performs gradient updates and weighted aggregation on edge devices to reduce the number of communication parameters. Combined with meta-learning algorithms, it can quickly adapt to new tasks.

Benefits of technology

It improves the training accuracy and communication efficiency of models on edge devices while protecting user privacy, and reduces system overhead, making it suitable for edge device data environments that are non-IID and highly personalized.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a federated meta-learning image classification method based on sparse neural networks, offering high communication efficiency and good algorithm performance. The method mainly comprises the following steps: (10) set the nodes and correlation coefficients of a fully connected neural network, and establish a source node set and a target node set; (20) sparsify the fully connected neural network and initialize the weight parameters of the resulting sparse neural network; (30) perform an internal update on the training set data of each source node, then an external update on the test set data of each source node; (40) after the external update, remove the smallest positive and largest negative values from each layer of the weight matrix at each source node, then proceed according to whether the iteration number t is an integer multiple of the set local iteration number; (50) starting from the parameters trained on the source node set, perform gradient update training on the training set data of each target node, and use the fine-tuned target node model for the image classification task.

Description

Technical Field

[0001] This invention belongs to the field of edge intelligent computing, and specifically relates to a federated meta-learning image classification method based on sparse neural networks.

Background Technology

[0002] Over the past few decades, smartphone usage has increased dramatically. Compared to classic PCs, smartphones are more portable and easier for users to accept. Using smartphones has become an integral part of modern daily life, and the billions of data points transferred between smartphones have provided tremendous support for training machine learning models. However, traditional centralized machine learning requires local clients (such as smartphone users) to upload data directly to a central server for model training, which can lead to serious leaks of private information.

[0003] A recently emerging technique called federated learning allows a central server to train a robust global model while the training data remains distributed across client devices. Instead of sending data directly to the central server, each local client downloads the current global model from the server, updates it with its local data, and then uploads the updated model back to the server. By avoiding the sharing of local private data, federated learning can effectively protect user privacy, but statistical and systemic challenges become significant issues in algorithm design. On the statistical side, the highly personalized and heterogeneous nature of decentralized data across different devices can significantly reduce the accuracy of the trained model. On the systemic side, the number of devices is typically orders of magnitude larger than in traditional distributed setups; furthermore, each edge device may have significant limitations in storage, computation, and communication capabilities.

[0004] Initialization-based meta-learning algorithms, such as MAML, are known for their rapid adaptation to new tasks and good generalization, making them particularly suitable for decentralized federated settings where edge device data is non-IID and highly personalized. The fundamental principle behind meta-learning is to train the model's initial parameters on multiple tasks so that, using only a small amount of data from a new task, the pre-trained model quickly adapts and achieves maximum performance on that task. Inspired by this, a federated meta-learning method is proposed in which all source edge nodes collaboratively learn a global model initialization that achieves maximum performance when a target edge node updates its model parameters using only a small number of data samples, thus enabling real-time edge intelligence.

[0005] Federated learning requires significant communication resources. To address the limited communication capabilities of edge devices, the FedAvg algorithm proposed by McMahan et al. reduces the number of communication rounds by decreasing the local training batch size or increasing the number of local training iterations, thereby improving communication efficiency. Another way to reduce communication costs is to reduce the complexity of the neural network model, minimizing the number of uploaded parameters. Early work on evolutionary artificial neural networks proposed systematic neural network encoding methods; however, most of these are direct encoding methods, which do not extend easily to deep neural networks with a large number of layers and connections. To address this issue, NeuroEvolution of Augmenting Topologies (NEAT) and undirected graph encoding were proposed to make neural network encoding more flexible. Although they can significantly improve encoding efficiency, both NEAT and undirected graph methods consume too many computational resources. Therefore, we propose pursuing topological sparsity from the design stage of the artificial neural network, which significantly reduces the number of connections and thereby improves memory and computational efficiency. We further find that sparsely connected layers with an Erdős–Rényi topology can replace fully connected layers without reducing accuracy, which shrinks the search space when optimizing deep neural networks with a large number of connections.

Summary of the Invention

[0006] The purpose of this invention is to provide a federated meta-learning image classification method based on sparse neural networks, which has good algorithm performance, high efficiency, and can achieve fast real-time edge intelligence.

[0007] The technical solution to achieve the purpose of this invention is: a federated meta-learning image classification method based on sparse neural networks, comprising the following steps:

[0008] (10) Set the nodes and correlation coefficients of the fully connected neural network to establish the source node set and target node set for the image classification task;

[0009] (20) Sparsify the fully connected neural network to generate a sparse neural network;

[0010] (30) Initialize the sparse neural network weight parameters and send them to all source nodes as the initial parameters for each source node;

[0011] (40) Based on the initial parameters, train on the training set data of each source node and perform internal updates with one step gradient descent;

[0012] (50) Based on the parameters obtained after the internal update of each source node, train on the test set data of each source node and perform external update with one step gradient descent;

[0013] (60) Remove the smallest positive and largest negative numbers from each layer of the weight matrix of the parameters after external update, and take appropriate action by checking whether the iteration number t is an integer multiple of the set local iteration number:

[0014] If the number of iterations t is not an integer multiple of the number of local iterations, then the parameters obtained after external update of each source node i are used as the initial parameters for internal update of each source node in the next round of iteration. Determine whether the number of iterations has reached the set total number of iterations. If it has, proceed to step (70); otherwise, return to step (40).

[0015] If the number of iterations t is an integer multiple of the number of local iterations, then add to the sparse connections of each source node i as many random weight parameters as connections were removed. Then, perform weighted aggregation on the sparse neural network parameters obtained after the external update of each source node i, and use the aggregated global parameters as the initial parameters for the internal update of each source node i in the next iteration. Determine whether the number of iterations has reached the set total number of iterations. If it has, proceed to step (70); otherwise, return to step (40).

[0016] (70) Use the parameters obtained after T iterations of the source node as the initial parameters of each target node, and perform gradient update training with the training set data in each target node to obtain the fine-tuned model parameters of the target node.

[0017] (80) Use the model after fine-tuning the target node for image classification.
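The branching in step (60) can be illustrated with a minimal sketch. It only shows which iterations stay local and which trigger regrowth plus aggregation; the function name `round_schedule` and the action labels are hypothetical and do not appear in the patent.

```python
from typing import List, Tuple


def round_schedule(total_iters: int, local_iters: int) -> List[Tuple[int, str]]:
    """For each iteration t = 1..T, report which branch of step (60) applies.

    total_iters plays the role of T and local_iters the role of T0.
    """
    schedule = []
    for t in range(1, total_iters + 1):
        if t % local_iters == 0:
            # t is an integer multiple of T0: regrow connections, then aggregate.
            schedule.append((t, "regrow_and_aggregate"))
        else:
            # Otherwise the node keeps its own externally updated parameters.
            schedule.append((t, "continue_locally"))
    return schedule


for t, action in round_schedule(total_iters=6, local_iters=3):
    print(t, action)
```

With T = 500 and T0 = 10, as in the example given later in the description, this schedule would contain 50 aggregation rounds.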

[0018] Preferably, the set of fully connected neural network nodes and correlation coefficients specifically includes: the source node set S, the target node set G, the total number of iterations T, the number of local iterations T0, the internal update learning rate α, the external update learning rate β, the data proportion p of the training set for each node, and the sparse neural network parameters ε.

[0019] Preferably, in step (20), the probability of connection between two adjacent neurons in the sparse neural network is:

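[0020] $p(W_{ij}^{k}) = \dfrac{\varepsilon \,(n^{k} + n^{k-1})}{n^{k}\, n^{k-1}}$

[0021] In the formula, ε is the sparsity parameter, with $\varepsilon \ll n^{k}$ and $\varepsilon \ll n^{k-1}$; $n^{k}$ and $n^{k-1}$ are the numbers of neurons in the k-th and (k-1)-th layers.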

[0022] Preferably, the internal update formula is as follows:

[0023] $\phi_i^{t} = \theta_i^{t} - \alpha \nabla L(\theta_i^{t}, D_i^{train})$

[0024] In the formula, α is the internal update learning rate, $\theta_i^{t}$ is the initial model parameters of each source node i at its t-th internal update, $\nabla L(\theta_i^{t}, D_i^{train})$ is the gradient of the expected loss function of source node i on its training set data $D_i^{train}$, $\phi_i^{t}$ is the parameters of source node i after the t-th internal update, and t = 1, 2, ..., T is the iteration number.

[0025] Preferably, the expected loss function of a node is as follows:

[0026] $L(\theta, D) = \dfrac{1}{|D|} \sum_{(x_j, y_j) \in D} l(\theta, (x_j, y_j))$

[0027] Where D represents the local image dataset $\{(x_1, y_1), \ldots, (x_j, y_j), \ldots, (x_{|D|}, y_{|D|})\}$ of the node, |D| is the dataset size, $l(\theta, (x_j, y_j))$ is the loss function, $(x_j, y_j) \in D$ is the j-th image data sample in the local image dataset D, $x_j$ is the matrix obtained after image grayscale processing, $y_j$ is the image category, and θ is the model parameters.

[0028] Preferably, the external update formula is as follows:

[0029] $\theta_i^{t+1} = \theta_i^{t} - \beta \nabla L(\phi_i^{t}, D_i^{test})$

[0030] In the formula, $\theta_i^{t}$ is the parameters of source node i before the t-th external update, β is the external update learning rate, $\nabla L(\phi_i^{t}, D_i^{test})$ is the gradient of the expected loss function on the test set data $D_i^{test}$ of source node i, and $\theta_i^{t+1}$ is the parameters obtained after the t-th external update.

[0031] Preferably, the specific method for weighted aggregation of the sparse neural network parameters obtained after external updates for each node is as follows:

[0032] $\theta^{t+1} = \sum_{i \in S} \dfrac{|D_i|}{\sum_{i \in S} |D_i|}\, \theta_i^{t+1}$

[0033] Where S represents the set of all source nodes i, $|D_i|$ is the amount of data in the local dataset of source node i, and $\theta_i^{t+1}$ is the parameters obtained after the t-th external update.

[0034] Preferably, the model parameters $\phi_t$ of target node t after fine-tuning are given by:

[0035] $\phi_t = \theta - \alpha \nabla L(\theta, D_t^{train})$

[0036] In the formula, α is the internal update learning rate, $\nabla L(\theta, D_t^{train})$ is the gradient of the expected loss function on the training set data $D_t^{train}$ of target node t, and θ is the externally updated parameters obtained after T iterations on the source node set.

[0037] Compared with the prior art, the significant advantages of this invention are:

[0038] 1. In this invention, each local client only transmits model parameters to the server, instead of sending data directly to the central server. By avoiding the sharing of local private data, user privacy can be effectively protected in federated learning.

[0039] 2. The meta-learning method used in this invention is particularly suitable for decentralized federated settings where edge device data is non-IID and highly personalized. In the image recognition task of a target node, only a small amount of data is needed: after fine-tuning, the pre-trained model can achieve good performance on the target node.

[0040] 3. The topology of the sparse neural network in this invention reduces the search space and lowers communication costs and system overhead when optimizing deep neural networks with a large number of connections.

[0041] The present invention will now be described in further detail with reference to the accompanying drawings.

Description of the Drawings

[0042] Figure 1 is the main flowchart of the federated meta-learning image classification method based on sparse neural networks of the present invention.

[0043] Figure 2 is a flowchart of the federated meta-learning training process of the source nodes in Figure 1.

[0044] Figure 3 is a flowchart, corresponding to Figure 1, of how the target node updates the trained parameters received from the source nodes to obtain the final model parameters.

[0045] Figure 4 is a comparison chart of the target node's test loss on an image classification task for a federated learning model (FedAvg) and a federated meta-learning model (FedMeta) based on sparse neural networks, measured after the parameters trained on the source nodes are passed to the target node and updated for a small number of iterations.

[0046] Figure 5 is a comparison chart of the system overhead of image classification using federated learning and image classification using sparse neural networks.

Detailed Implementation

[0047] This invention presents a federated meta-learning image classification method based on sparse neural networks, implemented in the following scenarios:

[0048] An edge computing scenario model is established. An image classification task dataset is selected, and data is distributed to different nodes to simulate edge devices carrying data. The edge nodes are divided into two disjoint sets: a source node set S and a target node set G. The number of source nodes is greater than the number of target nodes, and the data of each node is divided into a training set and a test set.

[0049] As shown in Figure 1, a federated meta-learning image classification method based on sparse neural networks includes the following steps:

[0050] (10) Set the nodes and correlation coefficients of the fully connected neural network, set the source node set S and the target node set G for the image classification task, set the total number of iterations T, the number of local iterations T0, the internal update learning rate α, the external update learning rate β, the data proportion of the training set for each node p, and the sparse neural network parameters ε.

[0051] (20) Sparsify the fully connected neural network to generate a neural network with an Erdős–Rényi topology and sparsity parameter ε, in which the probability of a connection between two adjacent neurons is:

[0052] $p(W_{ij}^{k}) = \dfrac{\varepsilon \,(n^{k} + n^{k-1})}{n^{k}\, n^{k-1}}$

[0053] The number of connections $n^{W}$ between neurons in a sparse layer is:

[0054] $n^{W} = \varepsilon \,(n^{k} + n^{k-1})$

[0055] Here, $W_{ij}^{k}$ denotes, in the random graph $W^{k}$, the connection between any two neurons i and j in adjacent layers k and k-1; ε is a real number controlling the sparsity of the connections, with $\varepsilon \ll n^{k}$ and $\varepsilon \ll n^{k-1}$; $n^{k}$ and $n^{k-1}$ are the numbers of neurons in the k-th and (k-1)-th layers; and $n^{W}$ is the total number of connections between the two layers after sparsification. Compared with the $n^{k} n^{k-1}$ connections of a fully connected layer, the number of connections in the neural network is significantly reduced after sparsification.
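As an illustration of step (20), the following minimal sketch samples an Erdős–Rényi connectivity mask for one layer. It assumes the connection probability $p = \varepsilon (n^{k} + n^{k-1}) / (n^{k} n^{k-1})$, the form used for Erdős–Rényi sparse layers in sparse evolutionary training. The function names, the clamp to 1.0, and the layer sizes (784 inputs, 300 hidden units) are illustrative assumptions, not values from the patent; only ε = 20 is taken from the example in the description.

```python
import random
from typing import List


def connection_probability(eps: float, n_k: int, n_prev: int) -> float:
    """Assumed Erdős–Rényi probability eps * (n_k + n_prev) / (n_k * n_prev).

    The clamp to 1.0 is an added safeguard for tiny layers where eps is not
    much smaller than the layer widths.
    """
    return min(1.0, eps * (n_k + n_prev) / (n_k * n_prev))


def erdos_renyi_mask(eps: float, n_k: int, n_prev: int, seed: int = 0) -> List[List[int]]:
    """Sample a 0/1 mask of shape (n_prev, n_k); 1 marks a kept connection."""
    rng = random.Random(seed)
    p = connection_probability(eps, n_k, n_prev)
    return [[1 if rng.random() < p else 0 for _ in range(n_k)] for _ in range(n_prev)]


# Hypothetical layer: 784 inputs (28 x 28 pixels) to 300 hidden units, eps = 20.
mask = erdos_renyi_mask(eps=20, n_k=300, n_prev=784)
kept = sum(sum(row) for row in mask)
print("kept connections:", kept, "of", 784 * 300)
```

For these assumed sizes the expected number of kept connections is 20 × (300 + 784) = 21,680, roughly 9% of the 235,200 connections of the dense layer.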

[0056] (30) Initialize the sparse neural network weight parameters and send them to all source nodes as the initial parameters for each source node i.

[0057] (40) Each source node i receives the initial parameters $\theta_i^{t}$ and, based on them, trains on its training set data $D_i^{train}$, performing an internal update with one step of gradient descent. The specific update formula is as follows:

[0058] $\phi_i^{t} = \theta_i^{t} - \alpha \nabla L(\theta_i^{t}, D_i^{train})$

[0059] In the formula, α is the internal update learning rate, $\theta_i^{t}$ is the initial model parameters of each source node i at its t-th internal update, $\nabla L(\theta_i^{t}, D_i^{train})$ is the gradient of the expected loss function of each source node i on its training set data $D_i^{train}$, $\phi_i^{t}$ is the parameters of each source node i after the t-th internal update, and t = 1, 2, ..., T is the iteration number.

[0060] The expected loss function of a node is specifically as follows:

[0061] $L(\theta, D) = \dfrac{1}{|D|} \sum_{(x_j, y_j) \in D} l(\theta, (x_j, y_j))$

[0062] D represents the local image dataset $\{(x_1, y_1), \ldots, (x_j, y_j), \ldots, (x_{|D|}, y_{|D|})\}$ of the node, |D| is the dataset size, $l(\theta, (x_j, y_j))$ is the loss function, $(x_j, y_j) \in D$ is the j-th image data sample in the local image dataset D of the node, $x_j$ is the matrix obtained after image grayscale processing, $y_j$ is the image category, and θ is the model parameters.
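The expected loss and the one-step internal update can be sketched on a toy problem. The patent does not fix the per-sample loss l or the model, so this sketch assumes a hypothetical one-parameter linear model with squared error, chosen only to keep the arithmetic visible; the function names are likewise illustrative.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x_j, y_j); scalars stand in for image and label


def expected_loss(theta: float, data: List[Sample]) -> float:
    """L(theta, D): mean of the per-sample loss over D (squared error assumed)."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)


def loss_gradient(theta: float, data: List[Sample]) -> float:
    """Gradient of expected_loss with respect to theta."""
    return sum(2.0 * (theta * x - y) * x for x, y in data) / len(data)


def inner_update(theta: float, train: List[Sample], alpha: float) -> float:
    """One step of gradient descent on the training set: phi = theta - alpha * grad."""
    return theta - alpha * loss_gradient(theta, train)


train = [(1.0, 2.0), (2.0, 4.0)]
theta = 0.0
phi = inner_update(theta, train, alpha=0.01)
print("loss before:", expected_loss(theta, train))
print("loss after: ", expected_loss(phi, train))
```

Here the gradient at θ = 0 is −10, so a single step with α = 0.01 moves the parameter to 0.1 and lowers the training loss.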

[0063] (50) Based on the parameters $\phi_i^{t}$ obtained from the t-th internal update, each source node i trains on its test set data $D_i^{test}$ and performs an external update with one step of gradient descent. The specific update formula is as follows:

[0064] $\theta_i^{t+1} = \theta_i^{t} - \beta \nabla L(\phi_i^{t}, D_i^{test})$

[0065] In the formula, $\theta_i^{t}$ is the parameters of each source node i before the t-th external update, β is the external update learning rate, $\nabla L(\phi_i^{t}, D_i^{test})$ is the gradient of the expected loss function on the test set data $D_i^{test}$ of each source node i, and $\theta_i^{t+1}$ is the parameters obtained after the t-th external update.
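Steps (40) and (50) together form one meta-learning step per source node. The sketch below chains them on the same kind of toy problem: a hypothetical one-parameter linear model with squared error. It assumes a first-order treatment in which the test-set gradient is simply evaluated at the internally updated parameters; the text does not state whether the gradient is also propagated through the internal update, so this is an illustrative simplification rather than the patented procedure.

```python
from typing import List, Tuple

Sample = Tuple[float, float]


def loss_gradient(theta: float, data: List[Sample]) -> float:
    """Gradient of the mean squared error of the toy model y = theta * x."""
    return sum(2.0 * (theta * x - y) * x for x, y in data) / len(data)


def meta_step(theta: float, train: List[Sample], test: List[Sample],
              alpha: float, beta: float) -> Tuple[float, float]:
    """Return (phi, theta_next) for one internal plus one external update."""
    # Internal update on the training set.
    phi = theta - alpha * loss_gradient(theta, train)
    # External update on the test set; first-order assumption: the gradient
    # is evaluated at phi and applied directly to theta.
    theta_next = theta - beta * loss_gradient(phi, test)
    return phi, theta_next


train = [(1.0, 2.0), (2.0, 4.0)]
test = [(3.0, 6.0)]
phi, theta_next = meta_step(0.0, train, test, alpha=0.01, beta=0.01)
print(phi, theta_next)
```

Both learning rates are set to 0.01 here to match the values used in the example of the description.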

[0066] (60) Remove the smallest positive and largest negative values from each layer of the weight matrix of $\theta_i^{t+1}$, then proceed according to whether the iteration number t is an integer multiple of T0.

[0067] If the iteration number t is not an integer multiple of T0, then the externally updated parameters $\theta_i^{t+1}$ of each source node i are used as the initial parameters for the internal update in the next iteration. Determine whether the number of iterations has reached the set total number of iterations. If it has, proceed to step (70); otherwise, return to step (40).

[0068]

[0069] If the iteration number t is an integer multiple of T0, then randomly add to the sparse connections of each source node i as many random weight parameters as connections were removed. Then transmit the sparse neural network parameters $\theta_i^{t+1}$ obtained after the external update of each source node i to a central server for weighted aggregation:

[0070] $\theta^{t+1} = \sum_{i \in S} \dfrac{|D_i|}{\sum_{i \in S} |D_i|}\, \theta_i^{t+1}$

[0071] Where S represents the set of all source nodes i, and $|D_i|$ is the amount of data in the local dataset of each source node i.
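The weighted aggregation on the server can be sketched as a data-size-weighted average. This minimal example assumes each node's parameters are flattened into a list of equal length with a shared layout; how nodes with differing sparse connectivity are aligned is not specified in the text, so that alignment is left out. The function name is hypothetical.

```python
from typing import List


def weighted_aggregate(node_params: List[List[float]], data_sizes: List[int]) -> List[float]:
    """Average the node parameter vectors, weighting node i by |D_i| / sum |D_i|."""
    total = sum(data_sizes)
    dim = len(node_params[0])
    return [
        sum(size * params[d] for params, size in zip(node_params, data_sizes)) / total
        for d in range(dim)
    ]


# Two hypothetical source nodes holding 100 and 300 local samples.
global_params = weighted_aggregate([[1.0, 0.0], [3.0, 4.0]], [100, 300])
print(global_params)
```

The node with three times as much data contributes three times the weight, so the result lies closer to its parameters.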

[0072] Then, the weighted aggregated global parameters are used as the initial parameters for the internal update of each source node i in the next iteration. Determine whether the number of iterations has reached the set total number of iterations. If it has, proceed to step (70); otherwise, return to step (40).

[0073]
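The removal and random regrowth in step (60) can be sketched for a single sparse layer stored as a dictionary of connections. The text does not say how many of the smallest positive and largest negative weights are removed, so the fraction `zeta` is a hypothetical parameter, as are the function names and the Gaussian initialization of regrown weights.

```python
import random
from typing import Dict, Tuple

Layer = Dict[Tuple[int, int], float]  # (input index, output index) -> weight


def prune_near_zero(layer: Layer, zeta: float) -> int:
    """Drop a fraction zeta of the smallest positive and of the largest
    negative weights (those closest to zero). Returns the number removed."""
    positive = sorted((k for k, w in layer.items() if w > 0), key=lambda k: layer[k])
    negative = sorted((k for k, w in layer.items() if w < 0),
                      key=lambda k: layer[k], reverse=True)
    drop = positive[: int(zeta * len(positive))] + negative[: int(zeta * len(negative))]
    for key in drop:
        del layer[key]
    return len(drop)


def regrow_random(layer: Layer, count: int, n_prev: int, n_k: int,
                  rng: random.Random) -> None:
    """Add `count` new connections at random free positions with random weights."""
    free = [(i, j) for i in range(n_prev) for j in range(n_k) if (i, j) not in layer]
    for key in rng.sample(free, min(count, len(free))):
        layer[key] = rng.gauss(0.0, 0.1)  # assumed initialization


layer = {(0, 0): 0.05, (0, 1): 0.9, (1, 0): -0.02, (1, 1): -0.7}
removed = prune_near_zero(layer, zeta=0.5)
regrow_random(layer, removed, n_prev=2, n_k=3, rng=random.Random(0))
print(removed, sorted(layer))
```

Regrowing exactly as many connections as were removed keeps the number of parameters per layer constant across aggregation rounds.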

[0074] (70) Use the parameters θ obtained after T iterations on the source nodes as the initial parameters of each target node t, and perform gradient update training with the training set data $D_t^{train}$ of each target node t to obtain the fine-tuned model parameters $\phi_t$ of target node t:

[0075] $\phi_t = \theta - \alpha \nabla L(\theta, D_t^{train})$

[0076] In the formula, α is the internal update learning rate, and $\nabla L(\theta, D_t^{train})$ is the gradient of the expected loss function on the training set data $D_t^{train}$ of target node t.

[0077] (80) $\phi_t$ is the set of model parameters for the image classification task of each target node t and is used to perform image classification.

[0078] Example:

[0079] The MNIST image dataset was chosen as the simulation dataset. Two methods were used to split the MNIST dataset: one was IID (independent and identically distributed), where data was randomly distributed among 100 clients, with each client receiving 600 samples; the other was non-IID, where the entire MNIST dataset was sorted by class label, then uniformly divided into 200 segments, and two segments were randomly assigned to each client. In this simulation setup, the non-IID method was used to maximize the performance of the meta-learning method.
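The non-IID split described above can be sketched as a sort-and-shard partition. To stay self-contained, the example uses synthetic, perfectly balanced labels in place of the real MNIST labels, and it assumes 100 clients as in the IID split; the function name `shard_partition` is hypothetical.

```python
import random
from typing import Dict, List


def shard_partition(labels: List[int], num_clients: int, shards_per_client: int,
                    seed: int = 0) -> Dict[int, List[int]]:
    """Sort sample indices by label, cut them into equal shards, and give each
    client `shards_per_client` randomly chosen shards."""
    rng = random.Random(seed)
    order = sorted(range(len(labels)), key=lambda idx: labels[idx])
    num_shards = num_clients * shards_per_client
    shard_size = len(order) // num_shards
    shards = [order[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)]
    rng.shuffle(shards)
    clients: Dict[int, List[int]] = {}
    for c in range(num_clients):
        own = shards[c * shards_per_client:(c + 1) * shards_per_client]
        clients[c] = [idx for shard in own for idx in shard]
    return clients


# Synthetic stand-in for the 60,000 MNIST training labels (10 balanced classes).
labels = [i % 10 for i in range(60000)]
clients = shard_partition(labels, num_clients=100, shards_per_client=2)
classes_on_client_0 = sorted({labels[idx] for idx in clients[0]})
print(len(clients[0]), classes_on_client_0)
```

With 200 shards of 300 samples each, every client holds 600 samples drawn from at most two classes in this balanced stand-in, which is what makes the local data strongly non-IID.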

[0080] In the FedMeta experiment of the method of this invention, the local dataset of each node is divided into a training set and a test set, with sparsity parameter ε = 20, T = 500 total training iterations, and T0 = 10 local training iterations. 80% of the nodes are selected as source nodes, and the fast-adaptation performance is evaluated on the remaining target nodes. The internal update learning rate α and the external update (meta) learning rate β are both set to 0.01. The proportion p of training set data for each node is set to 80%, 50%, and 5% in turn, and image classification experiments are simulated with all of the above parameters.

[0081] Using the above data settings, a comparison was conducted through FedAvg experiments. After FedAvg performs federated learning training using all data from the source node, the final parameters are updated on the training set of the target node and then tested and evaluated for loss on its test set data.

[0082] The system overhead is described by the number of floating-point operations (FLOPs) across all nodes, together with the total number of bytes uploaded to and downloaded from the server, to quantify the computation and communication overhead of FedAvg and FedMeta for image classification.
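To give a feel for the communication side of this overhead, the sketch below counts the weights one node would upload per aggregation for a dense network versus an Erdős–Rényi sparse one. Everything except ε = 20 is an assumption made for illustration: the layer widths, 4 bytes per weight, the expected connection count ε(n^k + n^{k-1}) per sparse layer, and the omission of biases and of any index data needed to describe the sparse pattern.

```python
from typing import List

BYTES_PER_WEIGHT = 4  # assumed 32-bit floats


def dense_weight_count(widths: List[int]) -> int:
    """Number of weights in a fully connected network with these layer widths."""
    return sum(a * b for a, b in zip(widths, widths[1:]))


def sparse_weight_count(widths: List[int], eps: int) -> int:
    """Expected number of weights when each layer keeps eps * (n_k + n_prev)
    connections, capped at the dense count for very small layers."""
    return sum(min(a * b, eps * (a + b)) for a, b in zip(widths, widths[1:]))


widths = [784, 300, 100, 10]  # hypothetical MNIST-sized network
dense = dense_weight_count(widths)
sparse = sparse_weight_count(widths, eps=20)
print("dense upload bytes: ", dense * BYTES_PER_WEIGHT)
print("sparse upload bytes:", sparse * BYTES_PER_WEIGHT)
```

Under these assumptions the sparse network carries 30,680 weights against 266,200 for the dense one; a real measurement would also have to account for the cost of transmitting connection indices.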

[0083] The final experimental comparison results are shown in Figure 4 and Figure 5.

[0084] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any changes or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention.

[0085] It should be understood that, in order to simplify the present invention and help those skilled in the art understand its various aspects, in the above description of exemplary embodiments of the present invention, various features of the present invention are sometimes described in a single embodiment or with reference to a single figure. However, the present invention should not be construed as including all features in the exemplary embodiments as essential technical features of the claims of this patent.

[0086] It should be understood that the modules, units, components, etc., included in the device of one embodiment of the present invention can be adaptively changed to be placed in a device different from that embodiment. Different modules, units, or components included in the device of the embodiment can be combined into a single module, unit, or component, or they can be divided into multiple sub-modules, sub-units, or sub-components.

Claims

1. A federated meta-learning image classification method based on sparse neural networks, characterized in that it includes the following steps:
(10) Set the nodes and correlation coefficients of the fully connected neural network to establish the source node set and target node set for the image classification task;
(20) Sparsify the fully connected neural network to generate a sparse neural network;
(30) Initialize the sparse neural network weight parameters and send them to all source nodes as the initial parameters for each source node;
(40) Based on the initial parameters, train on the training set data of each source node and perform an internal update with one step of gradient descent;
(50) Based on the parameters obtained after the internal update of each source node, train on its test set data and perform an external update with one step of gradient descent;
(60) Remove the smallest positive number and the largest negative number in each layer of the weight matrix of the externally updated parameters of each source node i, and take appropriate action by judging whether the iteration number t is an integer multiple of the set local iteration number:
If the number of iterations t is not an integer multiple of the number of local iterations, then the parameters obtained after the external update of each source node i are used as the initial parameters for the internal update of each source node i in the next round of iteration. Determine whether the number of iterations has reached the set total number of iterations. If it has, proceed to step (70); otherwise, return to step (40).
If the number of iterations t is an integer multiple of the number of local iterations, then add to the sparse connections of each source node i as many random weight parameters as connections were removed, and then perform weighted aggregation on the sparse neural network parameters obtained after the external update of each source node i. The weighted aggregated global parameters are used as the initial parameters for the internal update of each source node i in the next iteration. Determine whether the number of iterations has reached the set total number of iterations. If it has, proceed to step (70); otherwise, return to step (40).
(70) Use the parameters obtained after T iterations of the source nodes as the initial parameters of each target node, and perform gradient update training with the training set data in each target node to obtain the fine-tuned model parameters of the target node.
(80) Use the fine-tuned model of the target node for image classification.

2. The federated meta-learning image classification method based on sparse neural networks according to claim 1, characterized in that, The fully connected neural network nodes and correlation coefficients set specifically include: the source node set S, the target node set G, the total number of iterations T, the number of local iterations T0, the internal update learning rate α, the external update learning rate β, the proportion of training set data p for each node, and the sparse neural network parameters ε.

3. The federated meta-learning image classification method based on sparse neural networks according to claim 1, characterized in that the probability of a connection between two adjacent neurons in the sparse neural network in step (20) is: $p(W_{ij}^{k}) = \dfrac{\varepsilon \,(n^{k} + n^{k-1})}{n^{k}\, n^{k-1}}$. In the formula, ε is the sparsity parameter, with $\varepsilon \ll n^{k}$ and $\varepsilon \ll n^{k-1}$; $n^{k}$ and $n^{k-1}$ are the numbers of neurons in the k-th and (k-1)-th layers; and $W_{ij}^{k}$ represents, in an Erdős–Rényi random graph, the connection between any two neurons i and j in adjacent layers k and k-1.

4. The federated meta-learning image classification method based on sparse neural networks according to claim 1, characterized in that the internal update formula is as follows: $\phi_i^{t} = \theta_i^{t} - \alpha \nabla L(\theta_i^{t}, D_i^{train})$. In the formula, α is the internal update learning rate, $\theta_i^{t}$ is the initial model parameters of each source node i at its t-th internal update, $\nabla L(\theta_i^{t}, D_i^{train})$ is the gradient of the expected loss function of source node i on its training set data $D_i^{train}$, $\phi_i^{t}$ is the parameters of node i after the t-th internal update, and t = 1, 2, ..., T is the iteration number.

5. The federated meta-learning image classification method based on sparse neural networks according to claim 4, characterized in that the expected loss function of a node is as follows: $L(\theta, D) = \dfrac{1}{|D|} \sum_{(x_j, y_j) \in D} l(\theta, (x_j, y_j))$, where D represents the local image dataset $\{(x_1, y_1), \ldots, (x_j, y_j), \ldots, (x_{|D|}, y_{|D|})\}$ of the node, |D| is the size of the dataset, $l(\theta, (x_j, y_j))$ is the loss function, $(x_j, y_j) \in D$ is the j-th image data sample in the local image dataset D of the node, $x_j$ is the matrix obtained after image grayscale processing, $y_j$ is the image category, and θ is the model parameters.

6. The federated meta-learning image classification method based on sparse neural networks according to claim 1, characterized in that the external update formula is as follows: $\theta_i^{t+1} = \theta_i^{t} - \beta \nabla L(\phi_i^{t}, D_i^{test})$. In the formula, $\theta_i^{t}$ is the parameters before the t-th external update, β is the external update learning rate, $\nabla L(\phi_i^{t}, D_i^{test})$ is the gradient of the expected loss function on the test set data $D_i^{test}$ of each source node i, and $\theta_i^{t+1}$ is the parameters obtained after the t-th external update.

7. The federated meta-learning image classification method based on sparse neural networks according to claim 1, characterized in that the specific method for weighted aggregation of the sparse neural network parameters obtained after the external update of each source node i is as follows: $\theta^{t+1} = \sum_{i \in S} \dfrac{|D_i|}{\sum_{i \in S} |D_i|}\, \theta_i^{t+1}$, where S represents the set of all source nodes i, $|D_i|$ is the amount of data in the local dataset of each source node i, and $\theta_i^{t+1}$ is the parameters obtained after the t-th external update.

8. The federated meta-learning image classification method based on sparse neural networks according to claim 1, characterized in that the model parameters $\phi_t$ of target node t after fine-tuning are specifically: $\phi_t = \theta - \alpha \nabla L(\theta, D_t^{train})$. In the formula, α is the internal update learning rate, $\nabla L(\theta, D_t^{train})$ is the gradient of the expected loss function on the training set data $D_t^{train}$ of target node t, and θ is the externally updated parameters obtained after performing T iterations on the set of source nodes.