Horizontal federated learning method and system using non-IID data

By updating each client's local model parameter set with a similarity-weighted aggregation of the other clients' local model parameter sets, this technology addresses the poor fit of existing federated training methods to non-IID datasets while protecting data privacy, and supports collaborative learning with a customized model for each client.

CN115552429B · Active · Publication Date: 2026-06-19 · HUAWEI CLOUD COMPUTING TECHNOLOGIES CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: HUAWEI CLOUD COMPUTING TECHNOLOGIES CO LTD
Filing Date: 2021-03-01
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing horizontal federated learning methods struggle to effectively utilize non-IID local datasets for model training and cannot customize learning models for each client while protecting data privacy.

Method used

Collaborative learning is achieved by calculating the similarity of local model parameter sets among clients, using weighted aggregation to update the local model parameter sets, and managing collaboration at the central node to learn task-related local models.

Benefits of technology

A customized model is learned for each client while data privacy is protected, improving the adaptability and accuracy of the model. The approach is suitable for training both shallow and deep models.

✦ Generated by Eureka AI based on patent content.


Abstract

A method and system for horizontal federated learning are described. Multiple local model parameter sets are obtained. Each local model parameter set has been learned at its corresponding client. For each given local model parameter set, a collaboration coefficient representing the similarity between the given local model parameter set and every other local model parameter set is calculated. The multiple local model parameter sets are updated to obtain multiple updated local model parameter sets. Each given local model parameter set is updated using a weighted aggregation of the other local model parameter sets, wherein the weighted aggregation is calculated using the collaboration coefficients. The multiple updated local model parameter sets are provided to each corresponding client.

Description

[0001] Cross-references to related applications

[0002] This application claims priority to U.S. Patent Application Serial No. 16/890,682, filed June 2, 2020, entitled “METHODS AND SYSTEMS FOR HORIZONTAL FEDERATED LEARNING USING NON-IID DATA”, the contents of which are incorporated herein by reference.

Technical Field

[0003] This invention relates to methods and systems for training and deploying machine learning-based models, and more specifically to methods and systems for performing horizontal federated learning to learn task-related models using machine learning and non-IID (non-independent and identically distributed) data.

Background

[0004] The usefulness of artificial intelligence (AI) or machine learning systems depends on the large amount of data used in the training process of these systems. However, many practical applications only have limited, low-quality data, which makes the application of AI technology difficult in some applications.

[0005] There has been ongoing interest in leveraging data from diverse sources to learn task-specific models using machine learning and data analytics. However, there are challenges in utilizing this data. For example, concerns and restrictions on data privacy (such as Europe's General Data Protection Regulation (GDPR) and China's Cybersecurity Law) can make it difficult, if not impossible, to centralize data from different sources, which is often necessary for the traditional use of machine learning and data analytics to learn task-specific models.

[0006] Federated learning is a machine learning technique in which multiple local data owners (also known as clients or nodes) collaboratively learn a task-related model without sharing their local training datasets. Horizontal federated learning can use non-IID local datasets that include horizontally partitioned data. Non-IID (where IID stands for "independent and identically distributed") means that local datasets may have different data distributions. Horizontally partitioned data means that different local datasets include different sets of data samples that cover the same set of features. Some existing horizontal federated learning methods, such as federated averaging (FedAvg) or federated proximal (FedProx), are based on learning a single, centralized model for all clients. These methods cannot customize the learned model for each client, and the single, centralized model may not be suitable for different non-IID local datasets. In another existing approach (called federated multi-task learning, or MOCHA), different models can be learned for different clients; however, MOCHA requires strict assumptions and simple convex models, which limits its applicability.

[0007] It would be useful to provide a horizontal federated learning approach that can learn custom models for clients that store non-IID local datasets while ensuring data privacy.

Summary of the Invention

[0008] In various exemplary embodiments, the present invention provides examples for implementing horizontal federated learning, wherein different clients privately store different non-IID local datasets that include horizontally partitioned data.

[0009] In various exemplary embodiments, the present invention describes methods and systems that enable collaboration between clients to learn task-related models using machine learning and non-IID local datasets. Specifically, collaborative learning is performed without compromising the privacy of any client's data.

[0010] This invention describes exemplary embodiments that enable clients to collaborate when learning task-related models using machine learning without compromising the privacy of their local datasets. Specifically, collaboration can be based on similarity or clustering techniques. The disclosed exemplary embodiments can support the customization of local models related to the same task for different clients with different non-IID local datasets. The disclosed exemplary embodiments are generally applicable to different types of models learned using machine learning, including shallow and deep models.

[0011] This invention describes exemplary embodiments in a federated learning context; however, it should be understood that the disclosed exemplary embodiments can also be applied to implementation in the context of any distributed optimization or distributed learning system and multi-task learning system, specifically for non-IID local datasets.

[0012] In a first exemplary aspect, the present invention provides a computing system including a memory and a processing device in communication with the memory. The processing device is configured to execute instructions to cause the computing system to: obtain a plurality of local model parameter sets, each local model parameter set having been learned at a corresponding client; for each given local model parameter set, compute one or more collaboration coefficients representing the similarity between the given local model parameter set and each other local model parameter set in the plurality of local model parameter sets; for each given local model parameter set, update the given local model parameter set by using a weighted aggregation of the other local model parameter sets to obtain a plurality of updated local model parameter sets, the weighted aggregation being calculated using the one or more collaboration coefficients; and provide the plurality of updated local model parameter sets to be sent to each corresponding client.

[0013] The processing device can be used to execute instructions to cause the computing system to calculate the one or more collaboration coefficients for each given local model parameter set by the following steps: calculating the cosine similarity between the given local model parameter set and each of the other local model parameter sets in the plurality of local model parameter sets; and normalizing the cosine similarity values to obtain the corresponding collaboration coefficients representing the similarity between the given local model parameter set and each of the other local model parameter sets in the plurality of local model parameter sets.

[0014] The processing device can be used to execute instructions to cause the computing system to perform an update for each given local model parameter set by the following steps: calculating a weighted average of the other local model parameter sets, the weighted average being a weighted aggregation; and adding the weighted average to the given local model parameter set.
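The steps in paragraphs [0013] and [0014] can be sketched in a few lines of Python. This is a minimal illustration and not the patented implementation: the function names, the choice to normalize by the sum of absolute cosine similarities, and the `step` scaling factor are all assumptions made here, since this section does not specify the normalization or any scaling.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flat parameter vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def collaboration_coefficients(params, j):
    """Coefficients between parameter set j and every other set.

    Assumption: normalize by the sum of absolute similarities.
    """
    sims = {i: cosine_similarity(params[j], params[i])
            for i in range(len(params)) if i != j}
    total = sum(abs(s) for s in sims.values())
    if total == 0.0:
        return {i: 0.0 for i in sims}
    return {i: s / total for i, s in sims.items()}


def collaborative_update(params, step=1.0):
    """Add the weighted average of the other sets to each given set.

    `step` is a hypothetical scaling factor; step=1.0 is the literal reading.
    """
    updated = []
    for j, theta_j in enumerate(params):
        coeffs = collaboration_coefficients(params, j)
        weighted_avg = [sum(coeffs[i] * params[i][d] for i in coeffs)
                        for d in range(len(theta_j))]
        updated.append([t + step * w for t, w in zip(theta_j, weighted_avg)])
    return updated
```

With three toy parameter sets `[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]`, the first two clients reinforce each other, while the third, being orthogonal to both, is left unchanged.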

[0015] The processing device can be used to execute instructions to enable the computing system to: generate an initial model parameter set; and provide the initial model parameter set to each client so that each client initializes its corresponding local model parameters to the initial model parameter set.

[0016] The processing device can be used to execute instructions to further enable the computing system to obtain the plurality of local model parameter sets through the following steps: sending a request for the corresponding local model parameter set to the agent at each client, the corresponding local model parameter set having been learned at the corresponding client using private data.

[0017] An iteration can be defined by the following steps: obtaining the plurality of local model parameter sets; calculating the one or more collaboration coefficients; performing the updates; and providing the plurality of updated local model parameter sets. The processing device can be used to execute instructions to further enable the computing system to: repeat the iteration until a predefined convergence condition is met.
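The iteration in paragraph [0017] can be pictured as a loop at the central node. The sketch below is illustrative only: `run_training`, `make_toy_client`, the callable-based client interface, and the "largest parameter change below `tol`" stopping rule are assumptions introduced here, and the collaboration step is passed in as a function rather than fixed.

```python
def run_training(local_train_fns, collaborate, init_params,
                 max_rounds=100, tol=1e-6):
    """Repeat the four-step iteration until parameters settle or rounds run out."""
    # Every client starts from the same initial parameter set.
    params = [list(init_params) for _ in local_train_fns]
    rounds_run = 0
    for rounds_run in range(1, max_rounds + 1):
        # Step 1: obtain the parameter sets learned locally at each client.
        learned = [train(p) for train, p in zip(local_train_fns, params)]
        # Steps 2 and 3: compute coefficients and update (delegated to `collaborate`).
        updated = collaborate(learned)
        # Step 4: the updated sets are what each client receives for the next round.
        change = max(abs(u - p)
                     for new, old in zip(updated, params)
                     for u, p in zip(new, old))
        params = updated
        if change < tol:  # assumed convergence condition
            break
    return params, rounds_run


def make_toy_client(target):
    """Hypothetical client whose local training moves halfway toward `target`."""
    def train(theta):
        return [x + 0.5 * (t - x) for x, t in zip(theta, target)]
    return train
```

For example, `run_training([make_toy_client([1.0]), make_toy_client([-1.0])], lambda sets: sets, [0.0])` uses an identity collaboration step, so each toy client simply converges to its own target before the round budget is exhausted.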

[0018] In another exemplary aspect, the present invention provides a method for horizontal federated learning. The method includes: obtaining a plurality of local model parameter sets, each local model parameter set having been learned at a corresponding client; for each given local model parameter set, calculating one or more collaboration coefficients representing the similarity between the given local model parameter set and each other local model parameter set in the plurality of local model parameter sets; for each given local model parameter set, updating the given local model parameter set by using a weighted aggregation of the other local model parameter sets to obtain a plurality of updated local model parameter sets, the weighted aggregation being calculated using the one or more collaboration coefficients; and providing the plurality of updated local model parameter sets to be sent to each corresponding client.

[0019] This method may include any steps performed by the computing system described above.

[0020] In another exemplary aspect, the present invention provides a computer-readable medium having instructions stored therein. When executed by a processing device of a computing system, the instructions cause the computing system to: obtain a plurality of model parameter sets, each model parameter set representing a corresponding local model learned at a corresponding client; for each given model parameter set, compute one or more collaboration coefficients representing the similarity between the given model parameter set and each of the other model parameter sets in the plurality of model parameter sets; for each given model parameter set, update the given model parameter set using a weighted aggregation of the other model parameter sets to obtain a plurality of updated model parameter sets, the weighted aggregation being calculated using the one or more collaboration coefficients; and provide the plurality of updated model parameter sets to be sent to each corresponding client.

[0021] The computer-readable medium may include instructions that cause the computing system to perform any of the steps described above.

Brief Description of the Drawings

[0022] The accompanying drawings, which illustrate exemplary embodiments of this application, are shown by way of example.

[0023] Figure 1 is a block diagram of an exemplary system that can be used to implement federated learning.

[0024] Figure 2 is a block diagram of an exemplary computing system that can be used to implement the exemplary embodiments described herein.

[0025] Figure 3 is a block diagram illustrating an exemplary implementation of a horizontal federated learning system in a system such as that of Figure 1.

[0026] Figure 4 is a flowchart illustrating an exemplary method for performing horizontal federated learning in a system such as that of Figure 3.

[0027] Figure 5 is a flowchart illustrating an exemplary method for performing collaborative updates in a horizontal federated learning system such as that of Figure 3.

[0028] Figures 6A and 6B show an example of grouping or clustering of the learned models over multiple rounds of training.

[0029] Figure 7 is a block diagram of an exemplary configuration of the system of Figure 3 in the prediction/inference phase.

[0030] The same reference numerals may be used to denote the same parts in different figures.

Detailed Description

[0031] In the exemplary embodiments disclosed herein, methods and systems for implementing practical applications of horizontal federated learning using non-IID local datasets are described. Some exemplary embodiments of the disclosed methods may be referred to as non-IID horizontal federated learning (NHFL); more generally, the exemplary embodiments described herein may simply be referred to as horizontal federated learning. Data security can be maintained while enabling customization of learned models for non-IID datasets, collaboration between non-IID datasets, and relatively high versatility. To aid in understanding the invention, Figure 1 is discussed first.

[0032] Figure 1 shows an exemplary system 100 that can be used to implement horizontal federated learning using non-IID local datasets, as discussed herein. For ease of understanding, system 100 is simplified in this example; in general, system 100 can include more entities and components than are shown in Figure 1.

[0033] System 100 includes multiple clients 102, each client 102 collecting and storing a corresponding set of local data (also referred to as a local dataset). Each client 102 can run a supervised machine learning algorithm to update the parameters of a local model using the local dataset. For example, each client 102 can run a supervised machine learning algorithm to learn the weights of a neural network that approximates the model. For the purposes of this invention, running a machine learning algorithm at client 102 means executing computer-readable instructions of the machine learning algorithm to update the parameters of the local model. For generality, there can be k clients 102 (k is any integer greater than 1), and therefore there can be k local datasets. Local datasets are typically non-IID datasets (IID stands for "independent and identically distributed"), meaning that local datasets are unique and distinct from one another, and it may be impossible to infer the characteristics or distribution of any one local dataset from any other local dataset. Each client 102 can independently be an end-user device, a network device, a private network, or other singular or plural entity storing private data. When client 102 is an end-user device, client 102 may be or may include client equipment/terminals, user equipment/devices (UE), wireless transmit/receive units (WTRU), mobile stations, fixed or mobile subscriber units, cellular phones, stations (STA), personal digital assistants (PDAs), smartphones, laptops, computers, tablets, wireless sensors, wearable devices, smart devices, machine-type communication devices, intelligent (or connected) vehicles, or consumer electronics devices. When client 102 is a network device, client 102 may be or may include base stations (BS) (e.g., eNodeB or gNodeB), routers, access points (APs), personal basic service set (PBSS) control points (PCPs), etc. When client 102 is a private network, client 102 may be or may include private networks of institutions (e.g., hospitals or financial institutions), retailers or retail platforms, corporate intranets, etc.

[0034] When client 102 is an end-user device, the local data at client 102 can be data collected or generated during actual use by the user of client 102 (e.g., captured images/videos, captured sensor data, captured tracking data, etc.). When client 102 is a network device, the local data at client 102 can be data collected from end-user devices associated with or served by the network device. For example, a client 102 acting as a BS can collect data from multiple user devices (e.g., tracking data, network usage data, traffic data, etc.), and this data can be stored locally at the BS.

[0035] Regardless of the form of client 102, the data collected and stored by each client 102 as a local dataset is considered private (e.g., if client 102 is a private network, use of the data is restricted to within that private network), and it is generally desirable to ensure the privacy and security of the local dataset at each client 102.

[0036] For horizontal federated learning, the local datasets stored by the corresponding clients 102 are horizontally partitioned. That is, each of the k local datasets includes different data samples representing the same set of features. The data samples included in different local datasets may or may not overlap, and the distribution of the k local datasets is non-IID. It should be noted that clients 102 can also store local datasets that are not horizontally partitioned; however, in horizontal federated learning, local datasets that are not horizontally partitioned across multiple clients 102 can be disregarded.
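As a concrete picture of horizontal partitioning, the sketch below shows two toy local datasets that hold different samples but share the same feature set. The feature names, sample values, and the helper `is_horizontally_partitioned` are invented here purely for illustration.

```python
# Hypothetical shared feature set; the names are made up for this example.
FEATURES = ("age", "blood_pressure", "heart_rate")

# Two toy local datasets held by different clients: different samples, same features.
local_dataset_a = [
    {"sample_id": "a-001", "age": 34, "blood_pressure": 118, "heart_rate": 62},
    {"sample_id": "a-002", "age": 51, "blood_pressure": 135, "heart_rate": 74},
]
local_dataset_b = [
    {"sample_id": "b-001", "age": 27, "blood_pressure": 110, "heart_rate": 58},
]


def is_horizontally_partitioned(datasets, features):
    """True when every sample in every dataset carries exactly the shared features."""
    expected = set(features)
    return all(set(row) - {"sample_id"} == expected
               for dataset in datasets
               for row in dataset)
```

A dataset whose rows lack one of the shared features would fail this check, which corresponds to data that is not horizontally partitioned across the clients and can be disregarded for horizontal federated learning.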

[0037] To learn a task-relevant and effective model using horizontally partitioned data during training, traditional (non-federated learning) methods collect local datasets from all clients 102 and use an aggregated dataset (created by collecting data samples from all local datasets) to learn a single central model. However, collecting local datasets from all clients 102 in this way compromises the data privacy of all clients 102.

[0038] In contrast, in horizontal federated learning, the clients 102 do not expose their respective local datasets or the parameters of their local models (hereinafter referred to as local model parameters) to each other. Instead, the clients 102 collaborate to learn a single task-relevant global model, achieving performance comparable to that of a traditionally learned task-relevant model.

[0039] In the example of Figure 1, each client 102 communicates with central node 110. Communication between each client 102 and central node 110 can be via any suitable network (e.g., the Internet, a P2P network, a WAN, and/or a LAN), which can be a public network.

[0040] Central node 110 can be implemented using one or more servers, but the following discussion uses a single server as an example of central node 110. It should be understood that central node 110 can include servers, distributed computing systems, virtual machines, or containers (also referred to as Docker containers or Docker) running on infrastructure in a data center, infrastructure provided as a service by a cloud service provider (e.g., virtual machines), etc. Typically, central node 110 (including the horizontal federated learning system 200 discussed further below) can be implemented using any suitable combination of hardware and software and can be implemented as a single physical device (e.g., a server) or multiple physical devices (e.g., multiple machines sharing pooled resources in the case of a cloud service provider, etc.). Therefore, central node 110 can also generally be referred to as a computing system or processing system. Central node 110 can be used to implement collaborative federated learning, as discussed further below. Central node 110 can implement the techniques and methods described herein.

[0041] Figure 2 shows a simplified exemplary implementation of the central node 110 in the form of a server (e.g., a cloud server). Other examples suitable for implementing the embodiments described in this invention may be used, and these examples may include components different from those described below. Although Figure 2 shows a single instance of each component, multiple instances of each component may exist in the central node 110.

[0042] Central node 110 (e.g., implemented as a server) may include one or more processing devices 114, such as processors, microprocessors, digital signal processors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), dedicated logic circuits, dedicated artificial intelligence processing units, tensor processing units, neural processing units, hardware accelerators, or combinations thereof. Central node 110 may also include one or more optional input/output (I/O) interfaces 116, which may support connection to one or more optional input devices 118 and/or optional output devices 120.

[0043] In the illustrated example, input device 118 (e.g., keyboard, mouse, microphone, touchscreen, and/or keypad) and output device 120 (e.g., display, speaker, and/or printer) are shown as optional and external to the server. In other exemplary embodiments, input device 118 and output device 120 may be absent, in which case I/O interface 116 may not be required.

[0044] Central node 110 (e.g., implemented as a server) may include one or more network interfaces 122 for wired or wireless communication with network 104, clients 102, or other entities in system 100. Network interface 122 may include wired links (e.g., Ethernet cables) and/or wireless links (e.g., one or more antennas) for intra-network and/or inter-network communication.

[0045] The central node 110 (e.g., implemented as a server) may also include one or more storage units 124, which may include high-capacity storage units such as solid-state drives, hard disk drives, disk drives, and/or optical disk drives.

[0046] Central node 110 (e.g., implemented as a server) may include one or more memories 128, which may include volatile or non-volatile memories (e.g., flash memory, random access memory (RAM), and/or read-only memory (ROM)). The non-transitory memories 128 may store instructions for execution by processing device 114, such as instructions to carry out the exemplary embodiments described herein. The one or more memories 128 may include other software instructions, such as software instructions for implementing an operating system and other applications/functions. In some exemplary embodiments, memories 128 may include software instructions executed by processing device 114 to implement the horizontal federated learning system 200 (for performing horizontal federated learning), as discussed further below. In some exemplary embodiments, additionally or alternatively, the server may execute instructions from external memory (e.g., an external drive connected to the server via wired or wireless communication), or executable instructions may be provided by transitory or non-transitory computer-readable media. Examples of non-transitory computer-readable media include RAM, ROM, erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, CD-ROM, or other portable storage devices.

[0047] Some background discussion of horizontal federated learning is now provided. Typically, horizontal federated learning involves multiple rounds of training to learn a task-relevant global model (i.e., the learned parameters of the global model), with each round involving communication between the central node 110 and the clients 102. An example of a single round of training is now described. Each client 102 sends an update to the central node 110, where the update includes the local model parameters or the gradients of its local model (i.e., the difference between the learned local model parameters and the previously received global model parameters). The local model at each client 102 relates to the same task as the global model. At the central node 110, the received updates are aggregated in some way, and the parameters of the global model are updated based on the aggregated local model parameters. The updated parameters of the global model are then sent back to each client 102. Each client 102 then initializes its local model parameters with the updated global model parameters and uses its local dataset and a machine learning algorithm to learn updates to its corresponding local model parameters. Training rounds can continue until a convergence condition is met. For example, a convergence condition can be satisfied when the number of training rounds meets or exceeds a predefined threshold number (e.g., defined by the administrator of central node 110). In another example, a convergence condition can be satisfied when the global model from the previous round and the global model updated in the current round are sufficiently close to each other (e.g., the change between them is less than a predefined percentage or difference). In yet another example, if the update is a gradient, a convergence condition can be satisfied when the 2-norm of the gradient is within a predefined threshold. Generally, the 2-norm of the gradient cannot be used as a convergence condition when deep neural networks are used to approximate the task-related local and global models. Instead, a predefined number of iterations can be used as the convergence condition (i.e., training stops after a certain number of iterations).
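The three example convergence conditions above can be expressed as a single check. The following is a rough sketch under assumptions: the function name `has_converged`, its parameters, the default thresholds, and the use of a relative 2-norm change for "sufficiently close" are all choices made here for illustration.

```python
import math


def l2_norm(vec):
    """Euclidean (2-) norm of a flat vector."""
    return math.sqrt(sum(v * v for v in vec))


def has_converged(round_idx, prev_global, new_global,
                  max_rounds=100, rel_tol=1e-3,
                  gradient=None, grad_tol=None):
    """Return True if any of the three example conditions holds."""
    # Condition 1: the round budget has been reached.
    if round_idx >= max_rounds:
        return True
    # Condition 2: the global model barely changed since the previous round.
    delta = l2_norm([n - p for n, p in zip(new_global, prev_global)])
    scale = l2_norm(prev_global)
    if scale > 0.0 and delta / scale < rel_tol:
        return True
    # Condition 3 (only when the update is a gradient): small gradient 2-norm.
    if gradient is not None and grad_tol is not None:
        return l2_norm(gradient) <= grad_tol
    return False
```

For deep models, where the gradient-norm test is generally unsuitable, a caller would leave `gradient` unset and rely on the round budget.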

[0048] In some exemplary embodiments, for each round of training, only a subset (e.g., 10%) of all available clients 102 is selected. The selection for each round of training may be randomized.
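A per-round random selection of clients could look like the sketch below. The function name `select_clients`, the `fraction` parameter, and the rule of always keeping at least one client are assumptions for illustration.

```python
import random


def select_clients(client_ids, fraction=0.1, rng=None):
    """Randomly pick a subset of the available clients for one training round."""
    rng = rng or random.Random()
    pool = list(client_ids)
    # Assumption: always keep at least one client, even for tiny pools.
    count = max(1, round(fraction * len(pool)))
    return rng.sample(pool, count)
```

Passing a seeded `random.Random` instance makes the selection reproducible, which can help when debugging a training run.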

[0049] At the central node 110, different methods can be used to aggregate updates received from clients 102. One such method is called federated averaging (or simply "FedAvg"), as described, for example, by McMahan et al. ("Communication-efficient learning of deep networks from decentralized data," AISTATS, 2017). In the FedAvg method, the central node 110 receives an update from each client 102, including the local model parameters learned at that client, and aggregates the received local model parameters by averaging them to generate an updated global model whose global parameters are the averaged local model parameters. The updated global parameters are then sent to each client 102. Each client 102 initializes its local model parameters with the received updated global parameters, learns updates to the local model parameters using its corresponding local dataset and machine learning algorithm, and transmits an update including the learned local model parameters back to the central node 110. When the convergence condition is met, the final averaged model is sent to all clients 102 and used by all clients 102. In the FedAvg method, all clients 102 collaboratively learn a single task-related global model during training, and a copy of this single global model is deployed to all clients 102 (after convergence) and used by all clients 102 for prediction. A drawback of the FedAvg method of horizontal federated learning is that when the local datasets are non-IID, the global parameters of the global model (updated using the averaged local model parameters) may not fit all clients 102. The FedAvg method results in all clients 102 using a single global model, meaning there is no ability to customize the global model based on each client 102's local dataset. This can lead to further performance degradation.
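The averaging step of FedAvg can be sketched as follows. The text above describes a plain average of the local model parameters; the optional `num_samples` weighting by local dataset size follows the formulation in the cited McMahan et al. paper and is included here as an assumption about how a caller might want to use it. The function name `fedavg` is illustrative.

```python
def fedavg(local_params, num_samples=None):
    """Average the clients' flat parameter vectors into one global vector."""
    k = len(local_params)
    if k == 0:
        raise ValueError("at least one local parameter set is required")
    if num_samples is None:
        # Plain average, as described in the text above.
        weights = [1.0 / k] * k
    else:
        # Optional: weight each client by its number of local samples.
        total = float(sum(num_samples))
        weights = [n / total for n in num_samples]
    dim = len(local_params[0])
    return [sum(w * theta[d] for w, theta in zip(weights, local_params))
            for d in range(dim)]
```

Every client receives the same returned vector, which is exactly why FedAvg cannot tailor the model to an individual client's non-IID dataset.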

[0050] Another approach that can be used is called federated proximal (or simply "FedProx"), as described by Li et al. ("Federated optimization in heterogeneous networks", SysML 2020). Similar to FedAvg, in the FedProx method, an updated global model (including global parameters updated using the averaged local model parameters) is generated at the central node 110 by averaging all local model parameters received from the clients 102. Each client 102 initializes its local model parameters to the received averaged parameters of the global model and learns updates to the local model parameters using its corresponding local dataset and machine learning algorithm, but with an additional proximity constraint that the updated local model parameters should be similar to the global model parameters. When the convergence condition is met, the final model parameters of the updated global model are sent to all clients 102 and used by all clients 102 for prediction. Similar to the FedAvg method, the FedProx method produces a single global model whose parameters are used by all clients 102 for prediction. Therefore, the drawbacks of the FedProx method may be similar to those discussed above for the FedAvg method.
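The proximity constraint in FedProx is commonly realized by adding a penalty of the form (mu / 2) * ||theta - theta_global||^2 to each client's local loss, so that the gradient gains an extra mu * (theta - theta_global) term. The sketch below shows one local gradient step under that formulation; the function name, the plain gradient-descent update, and the default `lr` and `mu` values are assumptions for illustration.

```python
def fedprox_local_step(theta, global_theta, loss_grad, lr=0.1, mu=0.01):
    """One gradient step on: local loss + (mu / 2) * ||theta - global_theta||^2."""
    return [
        # The proximal term pulls each parameter back toward the global value.
        t - lr * (g + mu * (t - tg))
        for t, tg, g in zip(theta, global_theta, loss_grad)
    ]
```

Setting `mu=0` removes the pull toward the global parameters and reduces the step to ordinary local gradient descent.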

[0051] Another approach to horizontal federated learning is called federated multi-task learning (or "MOCHA"), described for example by Smith et al. ("Federated Multi-Task Learning" in Advances in Neural Information Processing Systems (NeurIPS), pp. 4424–4434, 2017). MOCHA is based on updating the model using dual variables. Instead of each client 102 sending updates of its local model parameters to the central node 110, each client 102 sends the corresponding locally updated dual variables of its model to the central node 110. The central node 110 uses the received dual variables to estimate the relationships between the models of the different clients 102 and then updates the dual variables corresponding to each client 102. The updated dual variables are then sent to each corresponding client 102. Each client 102 performs a local update of its corresponding dual variables using its corresponding local dataset. When the convergence condition is met, the central node 110 collects the dual variables from all clients 102. These dual variables are used at the central node 110 to recover the corresponding local model for each client 102. Each client 102 can then use its corresponding local model recovered by the central node 110. While the MOCHA method allows the local model at each client 102 to be well-tailored to each local dataset, its use may be limited. For example, the MOCHA method requires certain strict assumptions and is designed to work with simple convex models. It may not be applicable to other applications, such as training deep neural networks.

[0052] In the exemplary embodiments described herein, methods for horizontal federated learning are described that can help address at least some of the aforementioned challenges and/or drawbacks. In some exemplary embodiments, a non-IID horizontal federated learning (NHFL) system is described that can help maintain the data privacy of each client 102 while still supporting collaboration (at the central node 110) when learning task-related models using machine learning. The described exemplary embodiments enable each client 102 to learn its own local model related to the same task (rather than a single task-related global model used by all clients 102), which can be better fitted to its corresponding non-IID local dataset, thus achieving better performance (e.g., higher accuracy) of the learned local model. During training, collaboration managed at the central node 110 enables local models related to the same task learned at different clients 102 to learn from each other based on the similarity and/or clustering of the local models. The disclosed exemplary embodiments are generally applicable to training various types of models (including shallow and deep models) using machine learning.

[0053] To aid in understanding the present invention, some notation is introduced. k is the number of clients 102 participating in a given round of training. Although the number of participating clients 102 can vary from round to round, for simplicity and without loss of generality it is assumed that k clients 102 participate in the current round. The training dataset used to update the local model parameter set of the local model at the j-th client 102 is denoted X_j (where j is an integer between 1 and k, inclusive). It should be noted that the training dataset X_j need not contain all of the data samples in the local dataset that are suitable for learning the local model parameters at the j-th client 102. When full-batch training is performed, X_j consists of all suitable data samples available at the j-th client 102. When stochastic training is performed, X_j may instead be a set of data samples randomly drawn from the suitable data samples available at the j-th client 102. The training dataset X_j is used to learn the local model parameters of the local model (also referred to as model(j) or the j-th model) at the j-th client 102. The set of local model parameters (e.g., the weights of a neural network) of the local model(j) stored at the j-th client 102 is denoted θ_j. The local model parameter set θ_j includes all parameters of the local model(j). Mathematically, the local model(j) at the j-th client 102 (also referred to as the private model(j)) can be written as g_j(X_j | θ_j), to indicate that the local model(j) depends on the training dataset X_j and the model parameter set θ_j.

[0054] Figure 3 is a block diagram showing system 100 in more detail, including details that can be used to implement the horizontal federated learning system 200 at the central node 110. For simplicity, the central node 110 is shown as a single server (e.g., as shown in Figure 2). However, it should be understood that the central node 110 can in practice be a virtual server or virtual machine implemented by pooling resources across multiple physical servers, or can be implemented using a virtual machine or container (e.g., a Docker container) within a single physical server, and so on.

[0055] For simplicity, each client 102 is shown with a similar configuration. However, it should be understood that different clients 102 can have different architectures. For example, a client 102 may store multiple different local models and / or may have access to multiple different local datasets. As shown, each client 102 hosts a corresponding local model 104, a corresponding local dataset 106 (also referred to as local data 106), and a corresponding agent 108. In some exemplary embodiments, the local model 104, local dataset 106, and agent 108 may be hosted by the corresponding client 102 within a virtual machine or container (e.g., where the client 102 is a private network). As mentioned above, the clients 102 can have different configurations independent of one another; some clients 102 may use containers while others may not.

[0056] The local datasets 106 of the different clients 102 are horizontally partitioned. The local model 104 of each client 102 can have full access to its corresponding local dataset 106. The local models 104 of all clients 102 relate to the same task. The agent 108 at each client 102 manages the sending and receiving of information to and from the central node 110, enabling each client 102 to participate in collaborative learning of its task-related local model 104 during training. The agent 108 can manage all communication between the corresponding client 102 and the coordinator 202 at the central node 110. In particular, the agent 108 can help ensure that the corresponding local dataset 106 is never sent out from the corresponding client 102 or accessed by any entity outside of that client 102.

[0057] In this example, there is no communication or interaction between the different clients 102. The agent 108 at each client 102 is only used during the training phase (i.e., during training) as each client 102 learns its corresponding local model 104 (i.e., the set of local model parameters for its corresponding local model 104). After the training phase is complete (e.g., after the convergence condition is met), the agent 108 may no longer be needed, and there may be no further communication between each client 102 and the central node 110. Furthermore, after the training phase is complete, the local model 104 can be used for predictions during the inference phase. Further discussion regarding the inference phase is provided below.

[0058] Collaborative learning is performed using a lateral federated learning system 200 at the central node 110. The central node 110 cannot access or receive any local dataset 106 from the clients 102. In this example, the lateral federated learning system 200 includes a coordinator 202 and a collaborative update block 204. The coordinator 202 coordinates communication with the clients 102 during training, including receiving local model parameter sets from each client 102 and sending updated model parameters to each client 102. The collaborative update block 204 performs grouped collaborative updates, as discussed further below.

[0059] Federated learning system 200 can be implemented using software (e.g., instructions for execution by processing device 114 of central node 110), hardware (e.g., programmable electronic circuits designed to perform specific functions), or a combination of software and hardware. Although federated learning system 200 is shown and described with respect to blocks 202 and 204, it should be understood that this is for illustrative purposes only and is not intended to be limiting. For example, federated learning system 200 may not be functionally divided into blocks 202 and 204, but may be implemented as a single block or a single overall function. Furthermore, a function described as being executed by one of blocks 202 and 204 may alternatively be executed by the other of blocks 202 and 204.

[0060] Figure 4 is a flowchart illustrating an exemplary method 400 for the training phase that can be executed by system 100. The steps of method 400 are executed variously by the clients 102 and the central node 110. Figure 4 provides a general overview of the training phase (i.e., training the local models 104 to learn their local model parameter sets); further details are shown in the additional figures discussed below.

[0061] At 402, initialization is performed. Initialization can be defined as the start of training and the start of all training rounds. Initialization can be performed by the lateral federated learning system 200 at the central node 110. For example, the lateral federated learning system 200 can (through communication between the coordinator 202 and the agent 108) collect information about the model structure of the corresponding local models 104 from all clients 102. For example, the lateral federated learning system 200 generates an initial model parameter set θ by initially setting the value of each parameter in the initial model parameter set θ to a corresponding random value. The initial model parameter set θ is sent to all clients 102. In some exemplary embodiments, the lateral federated learning system 200 may also send a notification to each client 102 to initialize the corresponding local model 104 using the initial model parameter set θ. After receiving the initial model parameter set θ, each client 102 initializes its corresponding local model 104 using the received initial model parameter set θ. That is, the result of initialization is that the k local models 104 stored by the k clients 102 all have the same model parameters:

[0062] θ_1 = θ_2 = … = θ_k = θ
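The initialization at 402 can be illustrated with a minimal sketch. The function names `init_parameter_set` and `broadcast_initial`, the Gaussian draw, and the `scale` value are illustrative assumptions and are not taken from the description, which only requires that each parameter start from a random value and that all k clients receive the same set θ.

```python
import random

def init_parameter_set(num_params, seed=None, scale=0.1):
    """Central node side: draw one initial parameter set theta at random.

    The Gaussian distribution and its scale are assumptions of this sketch.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, scale) for _ in range(num_params)]

def broadcast_initial(theta, k):
    """Give each of the k clients its own copy, so theta_1 = ... = theta_k = theta."""
    return [list(theta) for _ in range(k)]

theta = init_parameter_set(num_params=4, seed=0)
client_thetas = broadcast_initial(theta, k=3)
```

Each client receives an independent copy, so later local training at one client does not alter the parameter set held by another.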

[0063] At 404, each client 102 trains its corresponding local model 104, using the corresponding local dataset 106 and a machine learning algorithm, to learn an update to its local model parameter set. That is, the j-th client 102 uses the training dataset X_j (which can be all or a subset of the local dataset 106) to train the local model(j) and learn an update to its local model parameter set θ_j. Training can be performed by each client 102 for a defined number of epochs, which may or may not be the same across all clients 102.

[0064] At 406, the horizontal federated learning system 200 at the central node 110 collects the local model parameter sets θ_1, …, θ_k from all k clients 102. For example, the coordinator 202 can send a request to each agent 108 to obtain the local model parameter sets θ_1, …, θ_k from the k clients 102 (e.g., as shown by the solid black arrows in Figure 3). The horizontal federated learning system 200 (e.g., using the collaborative update block 204) performs a collaborative update to generate multiple updated local model parameter sets θ′_1, …, θ′_k. The details of the collaborative update are discussed further below. The central node 110 sends each updated local model parameter set back to the client 102 from which the corresponding local model parameter set was received (e.g., through the corresponding agent 108, as shown by the dashed arrows in Figure 3).

[0065] At 408, it is determined whether a predefined convergence condition (e.g., any suitable convergence condition discussed above) is satisfied. Typically, this determination is performed by the central node 110, although in some cases it can be performed by the individual clients 102. If the convergence condition is not satisfied, method 400 can return to step 404 to perform another round of training. If the convergence condition is satisfied, method 400 can proceed to step 410. When the central node 110 determines that the convergence condition is satisfied, it can notify each client 102 that the training phase has ended.

[0066] If convergence is determined by the individual clients 102, some clients 102 may determine that the convergence condition is met while others determine that it is not (e.g., where different clients 102 have different predefined convergence conditions). In such a case, the clients 102 that determine the convergence condition is met can simply stop participating in further training rounds, and method 400 returns to step 404 with a reduced number of clients 102.

[0067] At 410, each client 102 stores a corresponding trained local model 104, which includes a corresponding collaboratively updated set of local model parameters received from the lateral federated learning system 200 at the central node 110. The trained local model 104 can then be deployed for prediction during the inference phase.
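The control flow of steps 402 to 410 can be sketched as a simple loop. This is a toy illustration under stated assumptions: `local_train` is a stand-in that fits a parameter vector to the mean of the local samples by gradient descent, the convergence condition (largest parameter change below `tol`, or `max_rounds` reached) is one possible choice, and `collaborate` is a placeholder for the collaborative update performed at the central node 110. None of these names or choices are prescribed by the description.

```python
def local_train(theta, data, epochs=5, lr=0.1):
    """Toy stand-in for step 404: gradient descent on mean squared error."""
    theta = list(theta)
    for _ in range(epochs):
        for d in range(len(theta)):
            grad = sum(theta[d] - x[d] for x in data) / len(data)
            theta[d] -= lr * grad
    return theta

def run_training(datasets, theta0, collaborate, max_rounds=100, tol=1e-9):
    """Steps 402-410: initialize, then alternate local training and collaboration."""
    thetas = [list(theta0) for _ in datasets]                  # 402: same start
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        trained = [local_train(t, x)                           # 404: local updates
                   for t, x in zip(thetas, datasets)]
        updated = collaborate(trained)                         # 406: central node
        change = max(abs(a - b)
                     for old, new in zip(thetas, updated)
                     for a, b in zip(old, new))
        thetas = updated
        if change < tol:                                       # 408: assumed test
            break
    return thetas, rounds                                      # 410: keep models

# Two clients with different local data; identity "collaboration" as a placeholder.
datasets = [[[1.0], [1.0]], [[3.0], [3.0]]]
final_thetas, rounds_used = run_training(datasets, [0.0],
                                         collaborate=lambda ts: ts)
```

With the identity placeholder, each client simply converges to a model fitted to its own data; substituting a similarity-weighted aggregation for `collaborate` yields the collaborative behaviour described for step 406.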

[0068] Further details regarding collaborative updates will now be described. Although the local datasets 106 at different clients 102 are non-IID, in practice some similarity can be expected among the local datasets 106 (e.g., similarity based on geographic location, demographics, etc., associated with the clients 102). Therefore, some grouping or clustering of the local datasets 106 can reasonably be expected. Local models 104 learned using similar local datasets 106 should be similar to each other, and it would be beneficial to leverage this similarity to enable similar local models 104 to learn collaboratively. However, because the local datasets 106 are non-IID, it is important that the technique used to achieve collaborative learning promotes collaboration between more similar models and discourages collaboration between less similar models. Specifically, simply averaging all local models will not achieve beneficial collaboration.

[0069] Figure 5 is a flowchart detailing an exemplary method 500 for performing collaborative updates. Method 500 can be executed by the horizontal federated learning system 200, for example using the collaborative update block 204.

[0070] At 502, the local model parameter sets θ_1, …, θ_k are received from all clients 102 (e.g., through communication between the coordinator 202 and the corresponding agents 108).

[0071] At 504, collaboration coefficients are calculated. In the subsequent step 506, the collaboration coefficients are used to perform a weighted update of each local model parameter set. A collaboration coefficient can be considered a numerical or mathematical representation of the similarity between different local model parameter sets. Specifically, a collaboration coefficient can represent the pairwise similarity between a pair of local model parameter sets. For a first local model parameter set among the multiple local model parameter sets, collaboration coefficients can be calculated to represent the similarity between the first local model parameter set and each of the other local model parameter sets (i.e., the rest of the k local model parameter sets received at 502). The collaboration coefficient between a first and a second local model parameter set can be used to control how much influence the second local model parameter set has on the update of the first local model parameter set, and vice versa.

[0072] For example, for a given pair of local model parameter sets θ_i and θ_j, a pairwise collaboration coefficient α_ij can be calculated. A higher value of the collaboration coefficient α_ij can indicate that the similarity between the local model parameter sets θ_i and θ_j is high, while a lower value can indicate that the similarity between them is low. It should be noted that the range and meaning of a given collaboration coefficient value can depend on the technique used to calculate it. For example, depending on the calculation technique, a higher collaboration coefficient value may instead indicate lower similarity, and a lower value may indicate higher similarity.

[0073] An exemplary technique for calculating the collaboration coefficients is based on the pairwise cosine similarity among all local model parameter sets. Each local model parameter set is paired with every local model parameter set (a total of k×k pairs), and the pairwise cosine similarity is calculated as follows:

[0074] s_ij = cos(θ_i, θ_j)

[0075] where s_ij is the cosine similarity between the local model parameter set θ_i and the local model parameter set θ_j, with 1 ≤ i, j ≤ k. A cosine similarity of s_ij = 1 indicates the highest degree of similarity between the local model parameter set θ_i and the local model parameter set θ_j (e.g., the two local models have the same parameter set). A cosine similarity of s_ij = 0 indicates the minimum similarity between the local model parameter set θ_i and the local model parameter set θ_j (e.g., the vectors formed by the local model parameter sets are perpendicular to each other).
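The pairwise cosine similarity can be sketched as follows, treating each local model parameter set as a flat vector of numbers. The function names and the convention of returning 0 for an all-zero vector are assumptions of this sketch. For general real-valued vectors the cosine lies between −1 and 1, so negative values are possible when two parameter vectors point in opposing directions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two flattened parameter vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for an all-zero parameter vector
    return dot / (norm_a * norm_b)

def pairwise_cosine(thetas):
    """k x k matrix S with S[i][j] = s_ij for all pairs of parameter sets."""
    return [[cosine_similarity(ti, tj) for tj in thetas] for ti in thetas]

# Three toy parameter sets: the first two are parallel, the third is orthogonal.
S = pairwise_cosine([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
```

In this toy case the entry for the first two sets equals 1 (same direction, different magnitude), and the entries pairing either of them with the third set equal 0.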

[0076] Then, the collaboration coefficient α_ij between the local model parameter set θ_i and the local model parameter set θ_j can be calculated as follows:

[0077]

[0078] This calculation can be performed to convert the cosine similarity values into normalized collaboration coefficients, which may be more suitable for calculating the weighted aggregation. Other techniques can be used to calculate the collaboration coefficients between pairs of local model parameter sets. For example, a suitable clustering technique can be used to group the local model parameter sets θ_1, …, θ_k into clusters, and the collaboration coefficient α_ij between the local model parameter set θ_i and the local model parameter set θ_j can be calculated as the distance between the centroids of the clusters to which θ_i and θ_j belong. In another possible approach, the collaboration coefficient α_ij between the local model parameter set θ_i and the local model parameter set θ_j can be calculated as the simple Euclidean distance between the two local model parameter sets. Any other technique for representing the similarity between a pair of local model parameter sets θ_i and θ_j can be used to calculate the collaboration coefficient α_ij.
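One way to turn a similarity matrix into normalized collaboration coefficients is a row-wise softmax, sketched below. This choice is an assumption made for illustration only: the normalization used in the exemplary embodiment is not reproduced in this text, and the function name `collaboration_coefficients` is hypothetical. The sketch only shows the general idea of mapping each row of similarities s_ij to non-negative weights α_ij that sum to 1.

```python
import math

def collaboration_coefficients(S):
    """Row-wise softmax (an assumed normalization, not the patent's formula).

    For each i, alpha[i][j] is larger when s_ij is larger, and the
    coefficients in each row sum to 1.
    """
    alphas = []
    for row in S:
        m = max(row)                       # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        alphas.append([e / total for e in exps])
    return alphas

S_example = [[1.0, 0.9, 0.1],
             [0.9, 1.0, 0.2],
             [0.1, 0.2, 1.0]]
alpha = collaboration_coefficients(S_example)
```

Under this assumed normalization, the first parameter set gives more weight to the second (similarity 0.9) than to the third (similarity 0.1).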

[0079] At 506, each local model parameter set is updated using an aggregation of the other local model parameter sets, weighted by the corresponding collaboration coefficients. That is, for a first local model parameter set, the pairwise collaboration coefficient between the first and a second local model parameter set is used to weight the influence of the second local model parameter set on the update of the first. Mathematically, this can be expressed as:

[0080] θ′_i ← ∑_j α_ij θ_j

[0081] where the symbol ← denotes the update of the i-th local model parameter set θ_i, θ′_i denotes the i-th updated local model parameter set (to distinguish it from the local model parameter set θ_i originally received from the i-th client), and ∑_j α_ij θ_j can be thought of conceptually as a weighted aggregation (or weighted average) of all the other local model parameter sets. The update can simply replace the i-th local model parameter set θ_i with the weighted average.

[0082] This collaborative update is performed for all local model parameter sets θ_1, …, θ_k to obtain k updated local model parameter sets θ′_1, …, θ′_k.
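The weighted aggregation at 506 can be sketched as a matrix-style product between the collaboration coefficients and the parameter sets. The function name `collaborative_update` is hypothetical, and the sketch assumes that every parameter set is a flat list of equal length and that the coefficients in each row are already normalized.

```python
def collaborative_update(thetas, alphas):
    """Return the updated sets, with updated[i] = sum_j alphas[i][j] * thetas[j]."""
    k = len(thetas)
    dim = len(thetas[0])
    return [
        [sum(alphas[i][j] * thetas[j][d] for j in range(k)) for d in range(dim)]
        for i in range(k)
    ]

thetas_in = [[1.0, 0.0], [0.0, 1.0]]

# Identity coefficients: no collaboration, every set is left unchanged.
unchanged = collaborative_update(thetas_in, [[1.0, 0.0], [0.0, 1.0]])

# Uniform coefficients: every set becomes the plain average of all sets.
averaged = collaborative_update(thetas_in, [[0.5, 0.5], [0.5, 0.5]])
```

The two extreme cases bracket the intended behaviour: similarity-based coefficients fall between "no collaboration" and "plain averaging", pulling each parameter set more strongly toward the sets it most resembles.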

[0083] At 508, each updated local model parameter set is sent to the corresponding client 102. For example, the coordinator 202 can identify the client 102 (or the agent 108 corresponding to that client 102) for a given updated local model parameter set using some identifying metadata associated with the given updated local model parameter set (e.g., a tag or identifier initially associated with each local model parameter set received from the client 102).

[0084] After receiving the updated local model parameter set, each client 102 can update its own corresponding local model 104 using the updated local model parameter set. Each client 102 updates its local model 104 by setting the local model parameter set θ_i of its local model 104 to the updated local model parameter set θ′_i. If the convergence condition is met, the local model 104 (e.g., including the learned local model parameter set) is considered trained and can then be deployed for prediction during the inference phase.

[0085] In the collaborative updates described above, grouping collaboration can serve as the basis for updating each local model parameter set. It should be noted that it may not be necessary to explicitly form groups or clusters of local model parameter sets, nor may it be necessary to specify the number of groups to form. For example, the calculation of cosine similarity as described above can be viewed as an implicit grouping of local model parameter sets based on similarity. Local model parameter sets that are highly similar to each other (or grouped together) indicate that the corresponding local datasets (on which local models are learned) also have similar data distributions. Such similar local models are expected to benefit from strong collaboration with each other. For example, attention-based averaging can be used between local models, resulting in stronger weight-sharing effects among more similar local models.

[0086] Over multiple training rounds, the attention-based, grouped collaboration discussed above can lead to adaptive grouping of similar local models. Figures 6A and 6B illustrate the concept of adaptive grouping. In these figures, 12 local model parameter sets θ_1, …, θ_12 are each represented by a circle, and the similarity between each of them and the first local model parameter set θ_1 is represented by a solid or dashed line. The shorter the line, the higher the similarity to the first local model parameter set θ_1.

[0087] Figure 6A shows the local model parameter sets θ_1, …, θ_12 before any collaborative update is performed. Some of the 12 local model parameter sets (θ_2, …, θ_6 in this example) are more similar to the first local model parameter set θ_1 than others (θ_7, …, θ_12 in this example). In this example, the 12 local model parameter sets θ_1, …, θ_12 form two clusters, represented by black and white circles, respectively. With collaborative updates over multiple training rounds (e.g., 10 rounds), the local model parameter sets belonging to a given cluster become more similar to each other, as shown in Figure 6B. As the local model parameter sets within a given cluster become more similar to each other, more collaboration within the cluster is encouraged.

[0088] When training (e.g., federated learning) is complete (e.g., when convergence is met), multiple updated local model parameter sets are available to update the corresponding local model parameter sets of local model 104 at the respective client 102. Each client 102 can then independently use its corresponding trained local model 104 to make predictions during inference (e.g., to make predictions on new local data at client 102).

[0089] Figure 7 is a block diagram of system 100 in which the trained local models are used to make predictions during inference. It should be noted that although the central node 110 is shown in Figure 7, the central node 110 may play no role in the inference phase. Each client 102 can independently use its corresponding trained local model to make predictions, without communicating with the central node 110 or any other client 102. The trained local model 104 is the model having the learned local model parameter set θ_i (e.g., the updated local model parameter set θ′_i at the time the training phase is complete).

[0090] The agent 108 at each client 102 may play no role, and can be omitted or inactive, when the trained local model is used for prediction. As in Figure 3, for simplicity each client 102 is shown with a similar configuration. However, it should be understood that different clients 102 may have different architectures, each client 102 may be a single physical computing unit (e.g., a single device or a single server) or a network of devices / servers (e.g., a private network), and each client 102 may or may not use containers or virtual machines to host the corresponding trained local model and local data 106.

[0091] As shown in Figure 7, after the local models 104 related to the same task have been trained, each client 102 stores its corresponding trained local model 104. For the i-th client 102, the trained local model 104 is denoted g_i(X_i | θ_i), to indicate that the trained local model 104 has been trained using the local data X_i (e.g., from the local dataset 106) and includes the learned parameter set θ_i. As discussed above, the learned parameter set θ_i was learned during training through collaborative learning at the central node 110. During inference, each client 102 can make predictions using its own corresponding trained local model 104. The trained local model 104 does not require any interaction between clients 102 to make predictions. Some possible predictions made by the trained model include class labels, class scores, probabilities, or bounding boxes.

[0092] Some exemplary simulation results are discussed. It has been found that by achieving collaboration between similar local models, the disclosed lateral federated learning method outperforms the individual local models trained on each client 102.

[0093] In the first simulation example, data from the Fashion-MNIST dataset (a dataset of 28×28 grayscale images associated with labels “0” through “9” forming 10 classes) was used. In this simulation, 60,000 data samples were used for training and 10,000 for testing. For the simulation, the number of clients was set to K=10, and the data samples were uniformly distributed across the clients (so that each client held 6,000 training data samples and 1,000 test data samples). To simulate non-IID local data, data samples representing labels “0”–“4” were assigned to 5 clients, and data samples representing labels “5”–“9” were assigned to the other 5 clients. Furthermore, for each client, half of the training data samples represented a single assigned label, and the other half represented the other four assigned labels. Therefore, the majority of labels for each client were unique and different from those of other clients. The training and test data samples for the same client followed the same data distribution.
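The non-IID split of the first simulation can be sketched by computing per-client label counts. The sketch assumes that client i takes label i as its single dominant label and that the remaining half of its samples is divided evenly among the other four labels of its group; the description states the half-and-half split but not these details, and the function name `label_counts` is hypothetical.

```python
def label_counts(client_idx, samples_per_client=6000, group_size=5):
    """Number of training samples per label held by one client.

    Assumptions of this sketch: client i's dominant label is i, and the
    non-dominant half is spread evenly over the rest of its label group.
    """
    group_start = (client_idx // group_size) * group_size
    group = range(group_start, group_start + group_size)
    half = samples_per_client // 2
    per_other = half // (group_size - 1)
    return {label: (half if label == client_idx else per_other) for label in group}

# Client 7 belongs to the group holding labels "5" to "9".
counts_client_7 = label_counts(7)

# Total samples per label across all 10 clients.
totals = {}
for c in range(10):
    for label, n in label_counts(c).items():
        totals[label] = totals.get(label, 0) + n
```

Under these assumptions each label accounts for 6,000 training samples in total across the clients, and each client holds exactly 6,000 training samples.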

[0094] In the exemplary simulation, the local models learned during training relate to an image classification task. The data samples held by all clients are used to collaboratively learn local models for performing image classification, without revealing the private data samples or local model parameters of any client. For each client, a deep neural network with two convolutional layers and two fully connected layers is used as the model, with the final layer producing a 10-dimensional vector as output. The clients execute supervised machine learning algorithms to update the parameter set (e.g., weights) of the deep neural network. The local models are trained using the exemplary training method discussed herein, as well as other known training methods used for comparison. The trained local models are then used to make predictions on each test dataset, and the accuracy of the predictions is determined.

[0095] During the training phase, the training hyperparameters used were: 100 inner training epochs; 20 outer training iterations; a batch size of 1000; and Adam as the optimizer. Inner training epochs refer to the number of training epochs executed locally by each client 102, outer training iterations refer to the number of training rounds executed with collaborative updates (at the central node 110), and batch size refers to the number of data samples randomly sampled from the local dataset 106 in each training epoch. The training methods compared were: the exemplary training method discussed herein; centralized training using all local datasets (i.e., traditional non-federated learning, without data privacy); training each local model individually using its local dataset (i.e., each client performs separate, independent, non-collaborative training); the FedAvg method; and the FedProx method. Simulation results show that the exemplary training method has an accuracy of 96.1%; the centralized training method has an accuracy of 92.2%; the individual training method has an accuracy of 95.0%; the FedAvg method has an accuracy of 82.5%; and the FedProx method has an accuracy of 81.8%. Therefore, the first simulation shows that the exemplary method discussed herein has the highest accuracy among the training methods compared.

[0096] A second simulation example was executed, with settings similar to the first simulation example but with a larger number of clients. In the second simulation example, 50 clients were used, with each client storing 1200 training data samples and 200 test data samples. Similar training hyperparameters were used. The results of the second simulation show that the accuracy of the exemplary training method disclosed herein is 95.0%; the accuracy of the centralized training method is 92.2%; the accuracy of the individual training method is 92.0%; the accuracy of the FedAvg method is 80.2%; and the accuracy of the FedProx method is 80.8%. Therefore, the second simulation also demonstrates that the exemplary method disclosed herein has the highest accuracy among the compared training methods.

[0097] A more challenging third simulation was conducted. To increase the challenge, data from the extended MNIST dataset was used. The extended MNIST dataset consists of a set of handwritten alphanumeric characters in 28×28 pixel image format. Possible class labels are the digits “0” to “9”, uppercase letters “A” to “Z”, and lowercase letters “a” to “z” (there are therefore 62 different labels in total). In this simulation, 697,932 data samples were used for training, and 116,323 data samples were used for testing. For the simulation, the number of clients was set to K=62, with each client having a different number of training and testing data samples. 10 clients were assigned data samples representing labels “0” to “9”; 26 other clients were assigned data samples representing labels “A” to “Z”; and the remaining 26 clients were assigned data samples representing labels “a” to “z”. Furthermore, for each client, half of the training data samples represent a single assigned label, and the other half represent the remaining assigned labels. Therefore, most labels for each client are unique and different from those of other clients. Training and testing data samples from the same client follow the same data distribution.

[0098] During the training phase, the training hyperparameters used were: 30 inner training epochs; 5 outer training iterations; a batch size of 1000; and Adam as the optimizer. Simulation results show that the accuracy of the exemplary training method is 93.3%; the accuracy of the centralized training method is 77.4%; the accuracy of the individual training method is 88.8%; the accuracy of the FedAvg method is 37.7%; and the accuracy of the FedProx method is 18.1%. Therefore, the more challenging third simulation demonstrates that the exemplary method discussed herein significantly improves accuracy compared to some known training methods.

[0099] In various exemplary embodiments, the present invention describes methods and systems for performing lateral federated learning. The disclosed exemplary embodiments support collaboration between clients while maintaining the data privacy of each client. Local models learned using the lateral federated learning techniques discussed herein can achieve relatively high accuracy performance for all clients with non-IID data distributions. Collaboration between non-IID clients is achieved through grouped collaboration (e.g., implicitly by calculating collaboration coefficients between pairs of model parameter sets). This collaboration can improve the accuracy of the trained model's performance compared to training a model individually for each client.

[0100] Compared to other known federated learning techniques, the exemplary embodiments discussed herein are more generally applicable to using machine learning to learn various types of local models, including shallow and deep models. For example, the exemplary embodiments described herein can be used to learn various models such as logistic regression, support vector machines (SVMs), decision trees, and various neural network architectures.

[0101] As implemented in the exemplary embodiments discussed herein, non-IID collaboration may be more efficient than some other known techniques due to the smaller number of cloud averages (or cloud aggregations) required. Since cloud averages require the use of communication resources (e.g., bandwidth) to communicate between the cloud server and the client, it is generally desirable to reduce the number of cloud averages.

[0102] The exemplary embodiments disclosed herein can be implemented relatively simply, without the need for complex secure arithmetic operators, and / or without requiring significant changes to the operations at the client side.

[0103] The exemplary embodiments described herein can be applied to different applications. For example, although the present invention describes exemplary embodiments in a horizontal federated learning context, the exemplary embodiments discussed herein can also be applied to distributed learning or multi-task learning, particularly when non-IID clients are involved.

[0104] Because federated learning enables machine learning without infringing on client privacy, exemplary embodiments of the present invention can be used to learn models through collaboration between clients without compromising data privacy. Therefore, the exemplary embodiments disclosed herein can support the practical application of machine learning in privacy-critical settings (e.g., in health settings, or in other contexts where there may be legal obligations to ensure privacy).

[0105] Other applications of the invention include those in the context of autonomous driving (e.g., autonomous vehicles can provide data to learn up-to-date models related to traffic, building, or pedestrian behavior to promote safe driving) or in the context of sensor networks (e.g., an individual sensor can perform local learning of the model to avoid sending large amounts of data back to a central server). Other possible applications include those in the context of mobile communications, where horizontal federated learning can be used to learn user behavior to improve service and/or increase efficiency (e.g., better management of power usage and/or CPU control). Exemplary embodiments of the invention can also have applications in the context of the Internet of Things (IoT), where the client can be any IoT-enabled device (e.g., IoT-enabled lights, refrigerators, ovens, tables, doors, windows, air conditioners, etc.).

[0106] Although the present invention describes methods and processes by steps performed in a certain order, one or more steps in the methods and processes may be omitted or modified as appropriate. Where appropriate, one or more steps may be performed in an order other than that described.

[0107] Although the invention has been described at least partially in relation to the method, those skilled in the art will understand that the invention also relates to various components, whether hardware components, software, or any combination of both, for performing at least some aspects and features of the described method. Accordingly, the technical solutions of the invention can be embodied in the form of a software product. Suitable software products can be stored in pre-recorded storage devices or other similar non-volatile or non-transitory computer-readable media, including DVDs, CD-ROMs, USB flash drives, removable hard drives, or other storage media. The software product includes instructions tangibly stored therein that enable a processing device (e.g., a personal computer, server, or network device) to perform exemplary embodiments of the methods disclosed herein. Machine-executable instructions can be in the form of code sequences, configuration information, or other data that, when executed, cause a machine (e.g., a processor or other processing device) to perform the steps in the method according to exemplary embodiments of the invention.

[0108] The invention may be embodied in other specific forms without departing from the spirit of the claims. The exemplary embodiments described are to be regarded in all respects as illustrative rather than restrictive. Features selected from one or more of the foregoing embodiments may be combined to create alternative embodiments not explicitly described, and features suitable for such combinations will be understood within the scope of the invention.

[0109] Furthermore, all values and sub-ranges within the scope of the disclosure are disclosed. Additionally, although the systems, devices, and processes disclosed and illustrated herein may include a specific number of elements/components, these systems, devices, and processes may be modified to include more or fewer such elements/components. For example, although any disclosed element/component may be referred to in the singular, the embodiments disclosed herein may be modified to include multiple such elements/components. The subject matter described herein is intended to cover and include all suitable technical variations.

Claims

1. A computing system, comprising: a memory; and a processing device in communication with the memory, the processing device being configured to execute instructions to cause the computing system to: obtain a plurality of local model parameter sets, each local model parameter set having been learned at a corresponding client; for each given local model parameter set, calculate one or more cooperation coefficients representing the similarity between the given local model parameter set and each other local model parameter set in the plurality of local model parameter sets; for each given local model parameter set, update the given local model parameter set using a weighted aggregation of the other local model parameter sets, to obtain a plurality of updated local model parameter sets, wherein the weighted aggregation is calculated using the one or more cooperation coefficients; and provide the plurality of updated local model parameter sets to be sent to each corresponding client.

2. The computing system of claim 1, wherein the processing device is configured to execute instructions to cause the computing system to calculate the one or more cooperation coefficients for each given local model parameter set by: calculating the cosine similarity between the given local model parameter set and each other local model parameter set in the plurality of local model parameter sets; and normalizing the cosine similarity value to obtain the corresponding cooperation coefficient representing the similarity between the given local model parameter set and the respective other local model parameter set.

3. The computing system of claim 1 or 2, wherein the processing device is configured to execute instructions to cause the computing system to perform the update for each given local model parameter set by: calculating a weighted average of the other local model parameter sets, the weighted average being the weighted aggregation; and adding the weighted average to the given local model parameter set.

4. The computing system of claim 1 or 2, wherein the processing device is configured to execute instructions to further cause the computing system to: generate an initial model parameter set; and provide the initial model parameter set to each client, so that each client initializes its corresponding local model parameters to the initial model parameter set.

5. The computing system of claim 1 or 2, wherein the processing device is configured to execute instructions to further cause the computing system to obtain the plurality of local model parameter sets by: sending, to an agent at each client, a request for the corresponding local model parameter set, the corresponding local model parameter set having been learned at the corresponding client using private data.

6. The computing system of claim 1 or 2, wherein an iteration is defined by: obtaining the plurality of local model parameter sets; calculating the one or more cooperation coefficients; performing the update; and providing the plurality of updated local model parameter sets; and wherein the processing device is configured to execute instructions to further cause the computing system to repeat the iteration until a predefined convergence condition is met.

7. A method for horizontal federated learning, the method comprising: obtaining a plurality of local model parameter sets, each local model parameter set having been learned at a corresponding client; for each given local model parameter set, calculating one or more cooperation coefficients representing the similarity between the given local model parameter set and each other local model parameter set in the plurality of local model parameter sets; for each given local model parameter set, updating the given local model parameter set using a weighted aggregation of the other local model parameter sets, to obtain a plurality of updated local model parameter sets, wherein the weighted aggregation is calculated using the one or more cooperation coefficients; and providing the plurality of updated local model parameter sets to be sent to each corresponding client.

8. The method of claim 7, wherein calculating the one or more cooperation coefficients comprises, for each given local model parameter set: calculating the cosine similarity between the given local model parameter set and each other local model parameter set in the plurality of local model parameter sets; and normalizing the cosine similarity value to obtain the corresponding cooperation coefficient representing the similarity between the given local model parameter set and the respective other local model parameter set.

9. The method of claim 7 or 8, wherein performing the update comprises, for each given local model parameter set: calculating a weighted average of the other local model parameter sets, the weighted average being the weighted aggregation; and adding the weighted average to the given local model parameter set.

10. The method of claim 7 or 8, further comprising: generating an initial model parameter set; and providing the initial model parameter set to each client, so that each client initializes its corresponding local model parameters to the initial model parameter set.

11. The method of claim 7 or 8, wherein obtaining the plurality of local model parameter sets comprises: sending, to an agent at each client, a request for the corresponding local model parameter set, the corresponding local model parameter set having been learned at the corresponding client using private data.

12. The method of claim 7 or 8, wherein an iteration is defined by: obtaining the plurality of local model parameter sets; calculating the one or more cooperation coefficients; performing the update; and providing the plurality of updated local model parameter sets; the method further comprising: repeating the iteration until a predefined convergence condition is met.

13. A computer-readable medium storing instructions that, when executed by a processing device of a computing system, cause the computing system to perform the method of any one of claims 7 to 12.

14. A computer program comprising instructions that, when executed by a processing device of a computing system, cause the computing system to perform the method of any one of claims 7 to 12.
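For illustration only, the iteration recited in claims 6 and 12 (obtain, calculate and update, provide, repeat until convergence) can be sketched as a server-side loop. Everything named here is hypothetical: the callback interface, the convergence test (largest parameter change below a tolerance), the iteration cap, and the toy clients. The claims only require a predefined convergence condition and do not say what it is. The `update` stand-in uses uniform weights purely to keep the example short; it is not the coefficient-based aggregation.

```python
from typing import Callable, List, Optional

ParamSets = List[List[float]]  # one flattened parameter set per client (assumed)


def max_abs_change(a: ParamSets, b: ParamSets) -> float:
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def run_iterations(
    fetch: Callable[[], ParamSets],
    update: Callable[[ParamSets], ParamSets],
    send: Callable[[ParamSets], None],
    tol: float = 1e-4,          # assumed convergence tolerance
    max_iterations: int = 100,  # assumed safety cap
) -> int:
    """Repeat obtain -> update -> provide until the updated sets stop changing."""
    previous: Optional[ParamSets] = None
    for iteration in range(1, max_iterations + 1):
        local_sets = fetch()          # obtain the local model parameter sets
        updated = update(local_sets)  # coefficients and weighted aggregation
        send(updated)                 # provide the updated sets to the clients
        if previous is not None and max_abs_change(updated, previous) < tol:
            return iteration
        previous = updated
    return max_iterations


# Toy clients (hypothetical): each holds a parameter set and a private target.
targets = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]]
client_params = [[0.0, 0.0] for _ in targets]


def fetch() -> ParamSets:
    # Stand-in for local training: each client moves halfway toward its target.
    for params, target in zip(client_params, targets):
        for d in range(len(params)):
            params[d] += 0.5 * (target[d] - params[d])
    return [list(p) for p in client_params]


def update(sets: ParamSets) -> ParamSets:
    # Stand-in for the weighted aggregation: uniform weights over the other sets.
    k, dim = len(sets), len(sets[0])
    totals = [sum(s[d] for s in sets) for d in range(dim)]
    return [[0.9 * s[d] + 0.1 * (totals[d] - s[d]) / (k - 1) for d in range(dim)] for s in sets]


def send(updated: ParamSets) -> None:
    for params, new in zip(client_params, updated):
        params[:] = new


rounds_used = run_iterations(fetch, update, send)
```

Passing the update step as a callback keeps the loop independent of how the coefficients are computed, so a coefficient-based aggregation can replace the uniform stand-in without changing the loop.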