Federated learning method and system capable of reducing domain skew between clients
The federated learning method addresses domain skew by generating generalized prototypes through weight readjustment and by training local models with augmented prototypes and contrastive learning, improving model generalization and convergence in heterogeneous data environments.
Patent Information
- Authority / Receiving Office: JP
- Patent Type: Application
- Current Assignee / Owner: UNIVERSITY INDUSTRY COOPERATION GROUP OF KYUNG HEE UNIVERSITY
- Filing Date: 2025-12-12
- Publication Date: 2026-07-02
Description
[Technical Field]
[0001] An embodiment of the present invention relates to federated learning that can reduce domain skew between clients.
[Background Art]
[0002] Federated Learning (FL) is a widely studied distributed machine learning framework that allows multiple clients to collaboratively train a model while maintaining data privacy. However, federated learning faces a significant problem: data heterogeneity. That is, because the data across clients are not independent and identically distributed (non-IID), training becomes unstable, which negatively impacts the convergence of the global model.
[0003] Recent research on federated learning attempts to improve the efficiency of local learning by utilizing regularization techniques and new aggregation methods. However, conventional research has focused solely on the label shift problem, under the assumption that client data is collected from the same domain. In real-world environments, data is often collected from different domains. For example, a photograph and a sketch of a cat share the same label, "cat," but they belong to different domains, resulting in heterogeneity in the feature distributions between clients. Such domain skew makes each client's local model domain-specific, ultimately reducing the generalization performance of the global model.
[Prior Art Documents]
[Patent Documents]
[0004] (Patent Document 01) Korean Published Patent No. 2023-0114530 (2023.08.01)
[Summary of the Invention]
[Problems to Be Solved by the Invention]
[0005] The disclosed embodiments are intended to provide a federated learning method and system that can reduce domain skew between clients.
[Means for Solving the Problem]
[0006] A federated learning method according to one disclosed embodiment is a method performed on a computing device that is a client performing federated learning with a server and that comprises one or more processors and a memory storing one or more programs executed by the one or more processors, the method comprising the steps of: inputting local data into a feature extractor to extract feature vectors; inputting the feature vectors into a classifier to classify the local data into classes; receiving a generalized prototype of each class from the server; and training a local model including the feature extractor and the classifier using a first loss function based on the difference between the output value of the classifier and a preset ground-truth value, and a second loss function for contrastive learning between the feature vectors and the generalized prototype of each class.
[0007] The generalized prototype is generated by readjusting the weight of the initial global prototype based on the distance between the local prototype and the initial global prototype for each class, and the initial global prototype may be generated by aggregating the class-specific local prototypes for each client.
[0008] The generalized prototype may be generated by assigning a larger weight to the initial global prototype as the distance between the local prototype and the initial global prototype increases, and a smaller weight as that distance decreases.
[0009] The second loss function may be configured such that the feature vector approaches generalized prototypes that have the same class as the feature vector, and moves away from generalized prototypes that have a different class from the feature vector.
[0010] The federated learning method may further include the step of generating an augmented prototype for each class based on the feature vectors.
[0011] The step of generating the augmented prototype may include the steps of generating augmented features by augmenting the feature vectors, and calculating the average of the augmented features belonging to each class to generate the augmented prototype for that class.
[0012] The federated learning method may further include the step of training the local model with a third loss function based on the difference between the feature vector and the augmented prototype for each class.
[0013] A federated learning method according to another disclosed embodiment is a method performed on a computing device that is a server performing federated learning with a plurality of clients and that comprises one or more processors and a memory storing one or more programs executed by the one or more processors, the method comprising the steps of: receiving local prototypes for each class from each client; generating an initial global prototype for each class based on the class-specific local prototypes received from each client; readjusting the weights for the initial global prototype to generate a generalized prototype for each class; and transmitting the generalized prototype of each class to each client.
[0014] The step of generating the generalized prototype may involve readjusting the weight of the initial global prototype based on the distance between the local prototype and the initial global prototype of each class to generate the generalized prototype.
[0015] The step of generating the generalized prototype may assign a larger weight to the initial global prototype as the distance between the local prototype and the initial global prototype increases, and a smaller weight as that distance decreases.
[0016] A federated learning system according to an embodiment includes a plurality of clients and a server, wherein each client inputs local data into a feature extractor to extract feature vectors, inputs the feature vectors into a classifier to classify the local data into classes, and trains a local model including the feature extractor and the classifier using a first loss function based on the difference between the output value of the classifier and a preset ground-truth value, and a second loss function for contrastive learning between the feature vectors and the generalized prototype of each class received from the server.
[0017] Each client may generate a local prototype for each class based on the feature vectors and transmit it to the server. The server may generate an initial global prototype for each class based on the class-specific local prototypes received from each client, readjust the weights for the initial global prototype to generate a generalized prototype for each class, and transmit the generalized prototype of each class to each client.
[0018] To generate the generalized prototype of each class, the server may assign a larger weight to the initial global prototype as the distance between the local prototype and the initial global prototype increases, and a smaller weight as that distance decreases.
[0019] Each of the clients may augment the feature vectors to generate augmented features, and calculate the average of the augmented features belonging to each class to generate an augmented prototype for the corresponding class.
[0020] Each of the clients may further train the local model with a third loss function based on the difference between the feature vector and the augmented prototype for each class.
Advantages of the Invention
[0021] According to the disclosed embodiments, generating an augmented prototype, which is an intra-domain prototype, on the client side makes it possible to extract richer semantic information from the feature vectors of local private data and to enhance generalization ability in the subsequent prototype aggregation process.
[0022] Also, by providing each client with a generalized prototype for each class, it becomes possible to perform local learning by leveraging inter-domain knowledge, thereby alleviating the domain skew problem.
Brief Description of the Drawings
[0023]
[Figure 1] A diagram showing the configuration of a federated learning system according to an embodiment of the present invention.
[Figure 2] A diagram for explaining the operation of a federated learning system according to an embodiment of the present invention.
[Figure 3] A diagram showing a state in which prototype reweighting is applied to an initial global prototype to generate a generalized prototype in an embodiment of the present invention.
[Figure 4] A flowchart showing a federated learning method according to an embodiment of the present invention.
[Figure 5] A flowchart showing a federated learning method according to another embodiment of the present invention.
[Figure 6] A block diagram illustrating a computing environment, including a computing device, suitable for use in an exemplary embodiment.
[Modes for Carrying Out the Invention]
[0024] Specific embodiments of the present invention will be described below with reference to the drawings. The following detailed description is provided to aid in a comprehensive understanding of the methods, apparatus, and/or systems described herein. However, this is illustrative and the invention is not limited thereto.
[0025] In describing embodiments of the present invention, detailed descriptions of prior art related to the present invention will be omitted if it is deemed that such detailed descriptions would obscure the gist of the invention. Furthermore, the terms used below are defined in consideration of their function in the present invention, and they may vary depending on the intent or convention of the user or operator. Therefore, their definitions should be based on the content throughout this specification. Terms used in the detailed description are solely for the purpose of describing embodiments of the present invention and should not be construed as restrictive. Unless clearly used otherwise, singular expressions include the plural. In this specification, expressions such as "include" or "comprise" refer to certain characteristics, numbers, steps, actions, elements, or a part or combination thereof, and should not be construed as excluding the existence or possibility of one or more other characteristics, numbers, steps, actions, elements, or a part or combination thereof other than those described.
[0026] Figure 1 shows the configuration of a federated learning system according to one embodiment of the present invention, and Figure 2 is a diagram illustrating the operation of the federated learning system according to one embodiment of the present invention.
[0027] Referring to Figures 1 and 2, the federated learning system 100 may include multiple clients 102 and a server 104. The federated learning system 100 can perform federated learning in the presence of domain shift.
[0028] The clients 102 are connected to the server 104 via a communication network 105, where the communication network 105 may include the internet, one or more local area networks, wide area networks, cellular networks, mobile networks, other types of networks, or a combination of these networks.
[0029] Each client 102 has its own private data (i.e., local data)
[Formula]
[0030] Each client 102 may have a local model with the same structure. Each local model may include a feature extractor 111 and a classifier 113. The feature extractor 111 is a sample
[Formula]
[0031] Each client 102 can receive class-specific generalized prototypes from the server 104. The purpose of these generalized prototypes is to provide unbiased inter-domain knowledge during local learning. The details of how the server 104 generates generalized prototypes will be described later. Each client 102 can perform contrastive learning between feature vectors (i.e., local features) and generalized prototypes.
[0032] In other words, each client 102 can train its local model so that each feature vector approaches the generalized prototype of the same semantic class and moves away from the generalized prototypes of different semantic classes. Here, the loss function for contrastive learning, $L_{GPCL}$ (GPCL: Generalized Prototype Contrastive Learning), can be expressed by Equation 1.
[0033]
[Equation 1]
[0034] The contrastive loss according to Equation 1 can induce clients 102 with personal data in different domains to effectively acquire inter-domain knowledge from the generalized prototype. Thereby, it can improve the generalization performance of the local model and mitigate the uncertain influence of domain skew on global learning.
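The pull-toward-same-class, push-from-other-classes behaviour described above can be illustrated with a short sketch. Equation 1 is not reproduced in this text, so the code below uses a common InfoNCE-style form as a stand-in; the function name `gpcl_loss`, the cosine similarity, and the `temperature` parameter are assumptions made for illustration, not details taken from the specification.

```python
import numpy as np

def gpcl_loss(features, labels, prototypes, temperature=0.5):
    """InfoNCE-style contrastive loss between features and class prototypes.

    features:   (N, D) feature vectors from the feature extractor
    labels:     (N,)   integer class index of each feature
    prototypes: (C, D) one generalized prototype per class
    NOTE: the exact form is an assumption; Equation 1 is not available here.
    """
    # Cosine similarity: L2-normalise both sides, then take dot products.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = (f @ p.T) / temperature                     # shape (N, C)
    # Numerically stable log-softmax over the class axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-probability assigned to the same-class prototype.
    return float(-log_prob[np.arange(len(labels)), labels].mean())


protos = np.array([[1.0, 0.0], [0.0, 1.0]])
feats = np.array([[0.9, 0.1], [0.1, 0.9]])
aligned = gpcl_loss(feats, np.array([0, 1]), protos)
swapped = gpcl_loss(feats, np.array([1, 0]), protos)
```

In this toy case `aligned` is smaller than `swapped`, which is the property the paragraph relies on: minimising the loss moves a feature toward its own class prototype and away from the others.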
[0035] In addition, each client 102 can apply an augmentation technique to the feature vectors extracted by the feature extractor 111 to generate augmented features. In one embodiment, each client 102 can apply MixUp-based augmentation to the feature vectors to obtain augmented features, but the augmentation technique is not limited thereto.
[0036] Here, each client 102 performs augmentation not at the input stage of the private data but at the feature vector stage, that is, in the embedded feature space, so as to generate augmented features with richer semantic expression and less domain-specific bias. In one embodiment, each client 102 can generate an augmented feature by combining a feature vector with the feature vector of a sample belonging to a different class through linear interpolation. Augmented feature
[Formula]
[0037]
[Equation 2]
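As a rough illustration of feature-space MixUp by linear interpolation, the sketch below mixes each feature vector with a randomly chosen feature vector from a different class. The function name `mixup_features`, the fixed mixing coefficient `lam`, and the choice to leave a row unchanged when no different-class partner exists are all assumptions; the source does not state how the coefficient is chosen or which label the augmented feature carries (here it is assumed to keep the label of the original row).

```python
import numpy as np

def mixup_features(features, labels, lam=0.7, rng=None):
    """Mix each feature with a feature of a different class.

    features: (N, D) feature vectors, labels: (N,) class indices.
    Returns an (N, D) array of augmented features; row i is assumed to
    keep the class label of the original row i.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng() if rng is None else rng
    augmented = features.copy()
    for i, y in enumerate(labels):
        # Indices of samples whose class differs from sample i.
        candidates = np.flatnonzero(labels != y)
        if candidates.size == 0:
            continue  # no partner available: leave the row unchanged
        j = rng.choice(candidates)
        # Linear interpolation in the embedded feature space.
        augmented[i] = lam * features[i] + (1.0 - lam) * features[j]
    return augmented


feats = np.array([[0.0, 0.0], [2.0, 2.0]])
labs = np.array([0, 1])
mixed = mixup_features(feats, labs, lam=0.75, rng=np.random.default_rng(0))
```

With two samples of different classes each row has exactly one possible partner, so `mixed` is `[[0.5, 0.5], [1.5, 1.5]]` regardless of the random seed.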
[0038] Each client 102 can generate an augmented prototype for each class based on its augmented features. In one embodiment, each client 102 can generate the augmented prototype for a class by calculating the average of the augmented features belonging to that class. In this case, the augmented prototype for each class can be calculated by the following Equation 3.
[0039]
[Equation 3]
[0040] Each client 102 can train the local model using a loss function based on the difference between the feature vector and the augmented prototype. In this case, the loss function $L_{APA}$ (APA: Augmented Prototype Alignment) can be expressed by the following Equation 4.
[0041]
[Equation 4]
[0042] In other words, the loss function $L_{APA}$ of Equation 4 induces alignment between each feature vector and the augmented prototype of its class, thereby allowing the local model to be trained in a domain-independent form and improving its generalization ability.
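The two steps just described, averaging augmented features per class and then aligning features with the resulting prototype, can be sketched as follows. Equations 3 and 4 are not reproduced in this text, so the helper names `class_means` and `apa_loss` are hypothetical, and the squared Euclidean distance is an assumed choice for "the difference between the feature vector and the augmented prototype".

```python
import numpy as np

def class_means(vectors, labels, num_classes):
    """Per-class mean of the given vectors; rows of absent classes are NaN."""
    protos = np.full((num_classes, vectors.shape[1]), np.nan)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = vectors[mask].mean(axis=0)
    return protos

def apa_loss(features, labels, aug_prototypes):
    """Mean squared distance from each feature to its class's augmented
    prototype (the squared L2 form is an assumption)."""
    diff = features - aug_prototypes[labels]
    return float((diff ** 2).sum(axis=1).mean())


aug_feats = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [5.0, 5.0]])
labs = np.array([0, 0, 1, 1])
aug_protos = class_means(aug_feats, labs, num_classes=2)
loss = apa_loss(aug_feats, labs, aug_protos)
```

Here `aug_protos` is `[[0.5, 0.5], [4.5, 4.5]]` and `loss` is `0.5`; the loss reaches zero only when every feature coincides with the augmented prototype of its class.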
[0043] In this way, by generating augmented prototypes, which are intra-domain prototypes, on the client side, the diversity of local private data can be increased, and the overfitting problem that arises when each client holds private data limited to a specific domain can be mitigated.
[0044] Here, each client 102 can be trained using an overall loss function as shown in equation 5 below.
[0045] [Equation 5] $L = L_{CE} + L_{APA} + L_{GPCL}$
$L_{CE}$: cross-entropy loss function that minimizes the difference between the classifier's output value and the ground-truth value.
$L_{APA}$: loss function based on the difference between the feature vector and the augmented prototype.
$L_{GPCL}$: loss function for contrastive learning between feature vectors and generalized prototypes.
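A minimal sketch of how the three terms of Equation 5 combine is given below. The unweighted sum follows the equation as stated; the function names and the softmax form of the cross-entropy are illustrative assumptions, and the APA and GPCL terms are passed in as already-computed scalars.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for (N, C) logits and (N,) class indices."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())

def total_loss(logits, labels, l_apa, l_gpcl):
    """Overall client loss: L = L_CE + L_APA + L_GPCL (unweighted sum)."""
    return cross_entropy(logits, labels) + l_apa + l_gpcl


logits = np.zeros((3, 2))            # uniform predictions over 2 classes
labels = np.array([0, 1, 0])
loss = total_loss(logits, labels, l_apa=0.1, l_gpcl=0.2)
```

With uniform logits over two classes the cross-entropy term equals ln 2, so `loss` is ln 2 + 0.3.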
[0046] Furthermore, each client 102 can generate a local prototype for each class based on its feature vectors. In one embodiment, each client 102 can generate the local prototype for a class by calculating the average of the feature vectors belonging to that class. In this case, the local prototype for each class can be calculated using the following Equation 6.
[0047]
[Equation 6]
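The class-wise averaging that produces local prototypes can be sketched as below. The function name `local_prototypes` and the dictionary return type are assumptions chosen so that a client only reports classes it actually holds; the source does not specify a payload format.

```python
import numpy as np

def local_prototypes(features, labels):
    """Return {class_id: mean feature vector} for classes present locally."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    return {int(c): features[labels == c].mean(axis=0) for c in np.unique(labels)}


protos = local_prototypes(
    [[1.0, 1.0], [3.0, 3.0], [10.0, 0.0]],
    [0, 0, 2],
)
```

In this example the client holds no samples of class 1, so `protos` contains entries only for classes 0 and 2, with `protos[0]` equal to `[2.0, 2.0]`.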
[0048] Each client 102 can transmit a local prototype for each class to the server 104. The local prototype for each class can then be used by the server 104 to generate a generalized prototype. The process of generating a generalized prototype in the server 104 is described in detail below.
[0049] The server 104 can receive and aggregate local prototypes for each class from each client 102. The server 104 can generate an initial global prototype for each class based on the local prototypes aggregated from each client 102. In one embodiment, the server 104 can generate the initial global prototype for each class by averaging the local prototypes aggregated from each client 102. In this case, the initial global prototype for each class can be represented by the following Equation 7.
[0050]
[Equation 7]
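Server-side averaging of local prototypes might look like the sketch below. It assumes each client sends a dictionary mapping class index to prototype vector and that a class is averaged only over the clients that reported it; both the payload format and the handling of missing classes are assumptions, as is the function name.

```python
import numpy as np

def initial_global_prototypes(client_protos):
    """Average local prototypes per class across clients.

    client_protos: list with one {class_id: (D,) array} dict per client.
    Returns {class_id: mean prototype over the clients reporting that class}.
    """
    by_class = {}
    for protos in client_protos:
        for c, vec in protos.items():
            by_class.setdefault(c, []).append(np.asarray(vec, dtype=float))
    return {c: np.mean(vecs, axis=0) for c, vecs in by_class.items()}


clients = [
    {0: np.array([0.0, 0.0]), 1: np.array([2.0, 2.0])},
    {0: np.array([4.0, 2.0])},               # this client has no class 1
]
global_init = initial_global_prototypes(clients)
```

Class 0 is averaged over both clients, giving `[2.0, 1.0]`, while class 1 is taken from the single client that reported it.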
[0051] On the other hand, in a domain-shifted environment, the initial global prototype may be distorted due to a sample distribution biased towards a particular domain. Therefore, in the disclosed examples, a generalized prototype can be generated by readjusting the weights based on the distance between the local prototype and the initial global prototype for each class.
[0052] In other words, server 104 can generate a generalized prototype through prototype reweighting. Figure 3 shows a state in one embodiment of the present invention where a generalized prototype is generated by applying prototype reweighting to the initial global prototype.
[0053] The server 104 can generate generalized prototypes by assigning a larger weight as the distance between the local prototype and the initial global prototype increases, and a smaller weight as that distance decreases. The server 104 can generate the generalized prototype for each class using the following Equation 8.
[0054]
[Equation 8]
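One way to realise distance-based reweighting is sketched below for a single class. Equation 8 is not reproduced in this text, so the specific form here (a softmax over Euclidean distances to the initial global prototype, followed by a weighted sum of the local prototypes) is purely an assumption that satisfies the stated property of larger weights for larger distances; `reweight_prototypes` and `tau` are hypothetical names.

```python
import numpy as np

def reweight_prototypes(local_protos, tau=1.0):
    """Distance-based reweighting for ONE class (assumed form).

    local_protos: (K, D) local prototypes of the class from K clients.
    Returns (generalized_prototype, weights).
    """
    local_protos = np.asarray(local_protos, dtype=float)
    initial = local_protos.mean(axis=0)                    # initial global prototype
    dist = np.linalg.norm(local_protos - initial, axis=1)  # (K,) distances
    scores = dist / tau
    scores = scores - scores.max()                         # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()        # larger distance, larger weight
    return weights @ local_protos, weights


# Two clients share a dominant domain; the third comes from a minority domain.
protos = np.array([[0.0, 0.0], [0.0, 0.0], [3.0, 0.0]])
generalized, w = reweight_prototypes(protos)
```

The plain average of these prototypes sits at `[1.0, 0.0]`, pulled toward the dominant domain. Under the assumed weighting the minority-domain client receives the largest weight, so `generalized` moves further toward it, which is the kind of bias correction the paragraph describes.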
[0055] The server 104 can transmit the generalized prototype G of each class to each client 102. Here,
[Formula]
[0056] Furthermore, the server 104 may update the generalized prototype for each class in each round. In one embodiment, the server 104 may apply the update using an exponential moving average (EMA). After round t, the generalized prototype may be updated by the following Equation 9.
[0057]
[Equation 9]
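A standard exponential moving average update is sketched below. Equation 9 is not reproduced in this text, so the momentum value `beta` and the convention that `beta` multiplies the previous round's prototype are assumptions; the function name is hypothetical.

```python
import numpy as np

def ema_update(previous, current, beta=0.9):
    """Blend the previous generalized prototype with the one from this round.

    previous: prototype carried over from earlier rounds
    current:  prototype computed in the current round
    beta:     weight on the previous value (assumed convention)
    """
    previous = np.asarray(previous, dtype=float)
    current = np.asarray(current, dtype=float)
    return beta * previous + (1.0 - beta) * current


updated = ema_update([0.0, 0.0], [1.0, 2.0], beta=0.9)
```

With `beta=0.9` the result is `[0.1, 0.2]`: the prototype moves only a tenth of the way toward the new estimate, which smooths round-to-round fluctuations in the prototypes sent to clients.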
[0058] According to the disclosed embodiments, generating augmented prototypes, which are intra-domain prototypes, on the client side allows richer semantic information to be extracted from the feature vectors of local private data, thereby enhancing generalization ability in the subsequent prototype aggregation process.
[0059] Furthermore, by providing each client with a generalized prototype for each class, it becomes possible to leverage interdomain knowledge for local learning, thereby mitigating the domain skew problem.
[0060] Figure 4 is a flowchart of a federated learning method according to one embodiment of the present invention. In the illustrated flowchart, the method is described in several steps, but at least some of the steps may be performed in a different order, combined with other steps, omitted, divided into detailed steps, or supplemented with one or more steps not shown.
[0061] Referring to Figure 4, the client 102 can input local data into the feature extractor 111 to extract feature vectors (S101).
[0062] Next, the client 102 inputs the feature vectors to the classifier 113 and trains the local model using a first loss function that minimizes the difference between the output value and the ground-truth value (S103).
[0063] Next, the client 102 receives generalized prototypes for each class from the server 104 and can train the local model using a second loss function for contrastive learning between the feature vectors and the generalized prototypes (S105).
[0064] Next, the client 102 can apply an augmentation technique to the feature vectors to generate augmented features, and then generate an augmented prototype for each class based on the generated augmented features (S107).
[0065] Next, the client 102 can train the local model using a third loss function based on the difference between the feature vector and the augmented prototype (S109).
[0066] Next, the client 102 generates local prototypes for each class based on the feature vectors (S111), and can transmit the local prototypes for each class to the server 104 (S113).
[0067] Figure 5 is a flowchart of a federated learning method according to another embodiment of the present invention. In the illustrated flowchart, the method is described in several steps, but at least some of the steps may be performed in a different order, combined with other steps, omitted, divided into detailed steps, or supplemented with one or more steps not shown.
[0068] Referring to Figure 5, the server 104 can receive local prototypes for each class from each client 102 (S201).
[0069] Next, the server 104 may generate an initial global prototype for each class based on the local prototypes for each class aggregated from each client 102 (S203).
[0070] Next, the server 104 can readjust the weights based on the distance between the local prototype and the initial global prototype for each class to generate a generalized prototype for each class (S205).
[0071] Next, the server 104 can transmit the generalized prototype of each class to each client 102 (S207).
[0072] Figure 6 is a block diagram illustrating a computing environment 10, including computing equipment adapted for use in an exemplary embodiment. In the illustrated embodiment, each component may have different functions and capabilities other than those described below, and additional components may be included in addition to those described below.
[0073] The illustrated computing environment 10 includes a computing device 12. The computing device 12 may be a device for performing federated learning in an environment where domain shift exists. In one embodiment, the computing device 12 may be a client 102. Alternatively, the computing device 12 may be a server 104.
[0074] The computing device 12 includes at least one processor 14, a computer-readable storage medium 16, and a communication bus 18. The processor 14 can be used to enable the computing device 12 to operate according to the exemplary embodiment described above. For example, the processor 14 can execute one or more programs stored on the computer-readable storage medium 16. The one or more programs may include one or more computer-executable instruction sets, and the computing device 12 can be configured to operate according to the exemplary embodiment when the computer-executable instruction sets are executed by the processor 14.
[0075] The computer-readable storage medium 16 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. The program 20 stored on the computer-readable storage medium 16 includes a set of instructions that can be executed by the processor 14. In one embodiment, the computer-readable storage medium 16 may be memory (volatile memory such as random-access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, other forms of storage media that can be accessed by the computing device 12 and can store desired information, or a suitable combination thereof.
[0076] The communication bus 18 interconnects the various components of the computing device 12, including the processor 14 and the computer-readable storage medium 16.
[0077] The computing device 12 may also include one or more input/output interfaces 22 providing interfaces for one or more input/output devices 24, and one or more network communication interfaces 26. The input/output interfaces 22 and network communication interfaces 26 are connected to the communication bus 18. The input/output devices 24 may be connected to other components of the computing device 12 via the input/output interfaces 22. Exemplary input/output devices 24 may include input devices such as pointing devices (such as a mouse or trackpad), keyboards, touch input devices (such as a touchpad or touchscreen), voice or sound input devices, various types of sensor devices and/or imaging devices, and/or output devices such as display devices, printers, speakers, and/or network cards. Exemplary input/output devices 24 may be included inside the computing device 12 as one component of the computing device 12, or they may be connected to the computing device 12 as separate devices distinct from the computing device 12.
[0078] While representative embodiments of the present invention have been described in detail, a person with ordinary skill in the art to which the present invention pertains should understand that various modifications are possible to the above embodiments without departing from the scope of the present invention. Therefore, the scope of the rights of the present invention should not be limited to the embodiments described, but should be determined by the claims set out below and by equivalents of those claims.
[Explanation of Symbols]
[0079]
100: Federated learning system
102: Client
104: Server
111: Feature extractor
113: Classifier
Claims
1. A federated learning method performed on a computing device that is a client performing federated learning with a server and that comprises one or more processors and a memory storing one or more programs executed by the one or more processors, the method comprising the steps of: inputting local data into a feature extractor to extract feature vectors; inputting the feature vectors into a classifier to classify the local data into classes; receiving a generalized prototype of each class from the server; and training a local model including the feature extractor and the classifier using a first loss function based on the difference between the output value of the classifier and a preset ground-truth value, and a second loss function for contrastive learning between the feature vectors and the generalized prototype of each class.
2. The federated learning method according to claim 1, wherein the generalized prototype is generated by readjusting the weight of an initial global prototype based on the distance between a local prototype and the initial global prototype for each class, and the initial global prototype is generated by aggregating the class-specific local prototypes of each client.
3. The federated learning method according to claim 2, wherein the generalized prototype is generated by assigning a larger weight to the initial global prototype as the distance between the local prototype and the initial global prototype increases, and a smaller weight to the initial global prototype as the distance between the local prototype and the initial global prototype decreases.
4. The federated learning method according to claim 1, wherein the second loss function causes the feature vector to approach generalized prototypes that have the same class as the feature vector, and to move away from generalized prototypes that have a different class from the feature vector.
5. The federated learning method according to claim 1, further comprising the step of generating an augmented prototype for each class based on the feature vectors.
6. The federated learning method according to claim 5, wherein the step of generating the augmented prototype comprises the steps of: generating augmented features by augmenting the feature vectors; and calculating the average of the augmented features belonging to each class to generate the augmented prototype for that class.
7. The federated learning method according to claim 5, further comprising the step of training the local model with a third loss function which is the difference between the feature vector and the augmented prototype for each class.
8. A federated learning method performed on a computing device that is a server performing federated learning with a plurality of clients and that comprises one or more processors and a memory storing one or more programs executed by the one or more processors, the method comprising the steps of: receiving local prototypes for each class from each client; generating an initial global prototype for each class based on the class-specific local prototypes received from each client; readjusting the weights for the initial global prototype to generate a generalized prototype for each class; and transmitting the generalized prototype of each class to each client.
9. The federated learning method according to claim 8, wherein the step of generating the generalized prototype readjusts the weights of the initial global prototype based on the distance between the local prototype and the initial global prototype of each class to generate the generalized prototype.
10. The federated learning method according to claim 9, wherein the step of generating the generalized prototype assigns a larger weight to the initial global prototype as the distance between the local prototype and the initial global prototype increases, and a smaller weight to the initial global prototype as the distance between the local prototype and the initial global prototype decreases.
11. A federated learning system including a plurality of clients and a server, wherein each of the clients inputs local data into a feature extractor to extract feature vectors, inputs the feature vectors into a classifier to classify the local data into classes, and trains a local model including the feature extractor and the classifier using a first loss function based on the difference between the output value of the classifier and a preset ground-truth value, and a second loss function for contrastive learning between the feature vectors and the generalized prototype of each class received from the server.
12. The federated learning system according to claim 11, wherein each of the clients generates a local prototype for each class based on the feature vectors and transmits it to the server, and the server generates an initial global prototype for each class based on the class-specific local prototypes received from each client, readjusts the weights for the initial global prototype to generate a generalized prototype for each class, and transmits the generalized prototype of each class to each client.
13. The federated learning system according to claim 12, wherein the server generates the generalized prototype of each class by assigning a larger weight to the initial global prototype as the distance between the local prototype and the initial global prototype increases, and a smaller weight to the initial global prototype as the distance between the local prototype and the initial global prototype decreases.
14. The federated learning system according to claim 11, wherein each of the clients generates augmented features by augmenting the feature vectors, and calculates the average of the augmented features belonging to each class to generate an augmented prototype for the corresponding class.
15. The federated learning system according to claim 14, wherein each of the clients further trains the local model with a third loss function which is the difference between the feature vector and the augmented prototype for each class.