A Federated Learning Multi-Task Scheduling Method and Apparatus

By subdividing federated machine learning tasks into multiple events, determining priority ratios based on event type and attribute information, and using an adaptive resource allocation algorithm to allocate computing nodes to events, the problems of low resource utilization and slow task execution in federated machine learning systems are solved, achieving efficient and fair task execution.

CN116010051B · Active · Publication Date: 2026-06-30 · BEIJING UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING UNIV OF POSTS & TELECOMM
Filing Date
2022-12-22
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In existing federated machine learning systems, edge devices have limited computing and communication resources when multiple tasks are executed concurrently, resulting in slow task execution and low resource utilization. Furthermore, existing scheduling methods fail to effectively consider the heterogeneity and dynamic priority of tasks.

Method used

By subdividing federated machine learning tasks into multiple events, determining the event priority ratio based on event type and attribute information, and combining an adaptive resource allocation algorithm to allocate computing nodes to events, high-priority tasks are executed in a timely manner and resource utilization is optimized.

Benefits of technology

It improves the resource utilization and task execution efficiency of the federated machine learning system, reduces the idle resources of edge devices, and achieves fairness and efficiency in task execution.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention provides a method and apparatus for multi-task scheduling in federated learning. The method involves acquiring a target federated learning task and its current events, wherein the target federated learning task includes multiple events, which are divided into stages based on their execution time and required resources. An event priority ratio is determined based on the attribute information of the target federated learning task and the type of the current event. A cumulative priority is determined based on the event priority ratio and the time the current event entered the scheduling queue. A predefined adaptive resource allocation algorithm is used to allocate computing nodes to the current event, and the current event is executed based on the computing nodes and according to the cumulative priority. This invention ensures fairness in task execution, reduces the overall task execution time of the system, and improves resource utilization.

Description

Technical Field

[0001] This invention relates to the field of federated learning technology, and in particular to a federated learning multi-task scheduling method and apparatus.

Background Technology

[0002] With the rapid development of artificial intelligence and big data technologies, data plays a vital role as a production factor. Artificial intelligence requires centralized data processing, but due to growing data privacy concerns, traditional AI is unsuitable for privacy-preserving scenarios. Federated machine learning is a novel distributed machine learning approach that allows machine learning training on privacy-sensitive edge devices without external data transmission, avoiding the privacy risks associated with data centralization to servers. Protecting data privacy through federated machine learning technology enables the further application of artificial intelligence in real-world scenarios. Edge devices store various types of business data and can concurrently participate in multiple federated machine learning training tasks, maximizing the returns on data assets.

[0003] In one existing technology, a list of candidate scheduling schemes is initialized, where each candidate scheme allocates terminal devices for training to each of multiple machine learning tasks. Each candidate scheme in the list is perturbed to generate a new scheduling scheme. Based on the fitness values of the candidate scheme and the new scheme, it is determined whether to replace the candidate with the new scheme, producing a new list of scheduling schemes. A target scheduling scheme is then determined from the fitness values of the schemes in the new list. This existing multi-task scheduling method improves the efficiency of service scheduling and the training efficiency of multi-task federated machine learning. It generates scheduling schemes based on the resource status of terminal devices, but it does not consider the heterogeneity of the resources required by the subdivided events of a federated machine learning task, and it lacks resource adaptation based on the attributes of those subdivided events.

[0004] Multi-task federated machine learning can also be approached through the following steps: S1. constructing a system model for multi-task federated machine learning; S2. establishing an optimization problem that minimizes the time of the multi-task federated machine learning process; S3. scheduling devices to participate in the training of federated machine learning tasks; S4. transforming the device scheduling process into a multi-armed bandit and matching process; S5. designing a device scheduling algorithm. This existing technology schedules the most suitable device for each task, thereby minimizing the latency of the multi-task federated machine learning process. Although it uses a multi-armed bandit and matching method for device scheduling, it only builds a preference list of available devices for each task and does not allocate resources according to the dynamic priority of tasks, so high-priority tasks cannot be executed quickly.

[0005] Federated machine learning can also be achieved by deploying federated machine learning application modules on the federated machine learning management platform and on the computing devices of the participants. Participants join the federation through registration and approval, even when they use heterogeneous computing devices. Under the scheduling of the management platform, they use general federated machine learning application modules adapted to heterogeneous computing devices to perform modeling and prediction tasks. Participants can therefore select computing devices flexibly and on demand while meeting basic requirements, and deployment and maintenance complexity is reduced. This existing technology is a scheduling mechanism for federated machine learning systems on heterogeneous computing devices: it lets participants select computing devices based on device attributes, but it does not consider scheduling optimization for concurrent federated machine learning tasks.

[0006] In summary, edge devices offer limited computing and communication resources. Furthermore, current federated machine learning systems suffer from uneven workloads on edge devices during concurrent multi-task execution, leading to slow overall task execution and low system resource utilization. Therefore, designing a multi-task scheduling mechanism to achieve efficient concurrent execution of multiple federated machine learning tasks under resource constraints is a pressing issue for current federated machine learning systems.

Summary of the Invention

[0007] This invention provides a federated learning multi-task scheduling method and apparatus to solve the above-mentioned problems.

[0008] This invention provides a federated learning multi-task scheduling method, comprising:

[0009] Obtain the target federated learning task and the current events of the target federated learning task, wherein the target federated learning task includes multiple events, and the events are divided into stages based on the event execution time and required resources;

[0010] The event priority ratio is determined based on the attribute information of the target federated learning task and the type of the current event;

[0011] The cumulative priority of the current event is determined based on the event priority ratio and the time when the current event enters the scheduling queue.

[0012] A predefined adaptive resource allocation algorithm is used to allocate computing nodes to the current event, and the current event is executed based on the computing nodes and according to the cumulative priority.

[0013] According to a federated learning multi-task scheduling method provided by the present invention, the step of determining the event priority ratio based on the attribute information of the target federated learning task and the type of the current event includes:

[0014] Obtain the corresponding event weight based on the type of the current event;

[0015] The event priority ratio is calculated based on the event weights and the priority of the target federated learning task in the attribute information.

[0016] According to the federated learning multi-task scheduling method provided by the present invention, the step of calculating the event priority ratio based on the event weights and the priority information of the target federated learning task in the attribute information includes:

[0017] The event priority ratio $\beta_i$ is calculated from the event weight $\alpha_{E(i)}$ and the priority $PR$ of the target federated learning task in the attribute information:

[0018]

[0019] where $C_w$ is the priority weight constant for the target federated learning task.

[0020] According to a federated learning multi-task scheduling method provided by the present invention, after determining the event priority ratio based on the attribute information of the target federated learning task and the type of the current event, the method further includes:

[0021] The queue number of the current event in the scheduling queue is determined based on the event priority ratio.

[0022] Accordingly, determining the cumulative priority of the current event based on the event priority ratio and the time the current event entered the scheduling queue includes:

[0023] Get the time when the current event entered the scheduling queue;

[0024] The cumulative priority is calculated based on the queue number and the time.

[0025] According to a federated learning multi-task scheduling method provided by the present invention, the step of determining the queue number of the current event in the scheduling queue based on the event priority ratio includes:

[0026] The queue number $Q_i$ of the current event in the scheduling queue is determined from the event priority ratio $\beta_i$:

[0027]

[0028] where $\beta_{threshold}$ is the event priority threshold and $N_{queue}$ is the threshold for the queue number of the scheduling queue.

[0029] Accordingly, the step of calculating the cumulative priority based on the queue number and the time includes:

[0030] The cumulative priority $P_i(t)$ is calculated from the queue number $Q_i$ and the time $t_i$:

[0031] $P_i(t) = Q_i \, (t - t_i)$

[0032] where $t$ is the current time and $t_i$ is the time at which the i-th event arrives in the scheduling queue.

[0033] According to a federated learning multi-task scheduling method provided by the present invention, the step of executing the current event based on the computing node and according to the cumulative priority includes:

[0034] If at least two events have the same cumulative priority, the order in which the current event is executed is determined based on the time when the current event enters the scheduling queue.

[0035] The current event is executed based on the computing node and the order.

[0036] According to a federated learning multi-task scheduling method provided by the present invention, the step of allocating computing nodes to the current event using a predefined adaptive resource allocation algorithm includes:

[0037] The training time baseline is determined based on historical computing resource information and historical dataset information;

[0038] The training time range is determined based on the training time baseline and the time threshold, and the available computing nodes are determined based on the training time range. All available computing nodes constitute a computing resource set.

[0039] Each available computing node in the computing resource set is scored, and the available computing nodes are sorted according to the scoring results to obtain a sorted set of computing resources.

[0040] Assign computing nodes to the current event based on the sorted set of computing resources.

[0041] According to a federated learning multi-task scheduling method provided by the present invention, the sorted set of computing resources is obtained by sorting the available computing nodes from largest to smallest according to the scoring results;

[0042] Accordingly, the step of allocating computing nodes to the current event based on the sorted set of computing resources includes:

[0043] The available computing nodes in the sorted computing resource set are sequentially assigned to the current event until the number of available computing nodes assigned to the current event is equal to a preset threshold or there are no available computing nodes to be assigned in the sorted computing resource set.

[0044] According to the federated learning multi-task scheduling method provided by the present invention, the current event is one of the following: a local training event, a model evaluation event, or a global model update event of a machine learning task.

[0045] The present invention also provides a multi-task scheduling device, comprising:

[0046] The task and event acquisition module is used to acquire the target federated learning task and the current events of the target federated learning task. The target federated learning task includes multiple events, which are obtained by dividing the target federated learning task into stages based on the event execution time and required resources.

[0047] The event priority ratio determination module is used to determine the event priority ratio based on the attribute information of the target federated learning task and the type of the current event.

[0048] The cumulative priority determination module is used to determine the cumulative priority of the current event based on the event priority ratio and the time when the current event enters the scheduling queue.

[0049] The node allocation and execution module is used to allocate computing nodes to the current event using a predefined adaptive resource allocation algorithm, and to execute the current event based on the computing nodes and according to the cumulative priority.

[0050] The federated learning multi-task scheduling method and apparatus provided by this invention subdivide federated machine learning tasks into multiple federated machine learning events, each with a corresponding event weight. By combining task attribute information and event waiting time, the current events are dynamically prioritized and a queue of tasks to be executed is generated, ensuring fairness in the task execution process and solving the problem of long concurrent task execution time in federated machine learning systems. Furthermore, an adaptive resource allocation algorithm is used to rationally allocate computing resources to events, reducing idle resources on edge devices and improving system resource utilization.

Attached Figure Description

[0051] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0052] Figure 1 is a schematic diagram of the architecture of the multi-task scheduling mechanism provided in an embodiment of the present invention;

[0053] Figure 2 is a flowchart of the federated learning multi-task scheduling method provided in an embodiment of the present invention;

[0054] Figure 3 is a comparison chart of event execution times provided in an embodiment of the present invention;

[0055] Figure 4 is a comparison chart of event throughput provided in an embodiment of the present invention;

[0056] Figure 5 is a schematic diagram of the structure of a multi-task scheduling device provided in an embodiment of the present invention;

[0057] Figure 6 is a schematic diagram of the physical structure of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0058] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0059] In federated machine learning training, each communication round requires multiple communication negotiations between the computing nodes and the aggregation nodes. This leads to idle computing resources while nodes wait for these negotiations, and also idle communication resources during the training process. Therefore, this invention divides federated machine learning tasks into events, subdividing each round into various federated machine learning events based on different computing and communication resource utilization rates. A fine-grained federated machine learning multi-task scheduling mechanism is designed for different events, improving the resource utilization of edge devices. Due to the heterogeneity of computing resources, differences in data assets, and dispersed locations of edge devices, this invention designs the resource allocation of the federated machine learning multi-task scheduling mechanism by combining the data volume, computing power, and communication capabilities of the computing nodes, thereby reducing the overall execution time of concurrent tasks in the federated machine learning system. Through the federated machine learning multi-task scheduling mechanism, the problems of long execution time and low resource utilization in concurrent tasks within the federated machine learning system are solved, providing an efficient training environment for federated machine learning tasks.

[0060] Before introducing the federated learning multi-task scheduling method of the present invention, the hardware devices involved in the federated learning multi-task scheduling method will be described.

[0061] Figure 1 is a schematic diagram of the architecture of the multi-task scheduling mechanism provided in an embodiment of the present invention. As shown in Figure 1, the architecture mainly includes distributed computing nodes, aggregation servers, and scheduling servers.

[0062] Distributed computing nodes refer to various edge privacy-sensitive devices in federated machine learning scenarios. They are data holders and actual computing nodes that perform local model training tasks in the federated machine learning process. They have differences in data assets and heterogeneity in computing and communication resources. They are mainly used to perform computationally intensive federated machine learning events, such as training events and model testing events.

[0063] An aggregation server performs aggregation for federated machine learning tasks and is typically a node participating in the task. It mainly performs communication-intensive federated machine learning events, such as model distribution events and model aggregation events.

[0064] The scheduling server provides resource registration and task scheduling services.

[0065] Distributed computing nodes register computing resources, communication resources, data resources, and task resources with the management terminal through a resource registration interface. The management terminal receives, updates, and maintains the registration information. The management terminal manages the lifecycle of tasks through a scheduling service, prioritizing tasks and allocating available resources.

[0066] Under the aforementioned multi-task scheduling mechanism architecture, the federated learning multi-task scheduling method is implemented. Simply put, the execution order of ready tasks is first determined based on the cumulative priority queue of tasks; then, appropriate federated machine learning tasks are selected, resources are allocated, and task execution begins. The federated learning multi-task scheduling method improves resource utilization and event throughput.

[0067] Figure 2 is a flowchart of the federated learning multi-task scheduling method provided in an embodiment of the present invention. As shown in Figure 2, the method includes:

[0068] S101, Obtain the target federated learning task and the current events of the target federated learning task.

[0069] The target federated learning task comprises multiple events. These events are divided into stages based on their execution time and required resources (or differences in computational and communication resource utilization). For example, a federated machine learning task might include local model training, model evaluation, and global model update events. This event-based approach meticulously divides the entire federated learning task's execution process, allowing for fine-grained scheduling and resource allocation based on these subdivided events. The event division can be further refined, but this invention does not limit this approach.

[0070] In this step, we first obtain the target federated learning task, that is, the federated learning task to be executed, and then determine the current event of the target federated learning task, that is, which stage the target federated learning task has been executed to. The current event can be any one of the local training event, model evaluation event or global model update event mentioned above.
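The event subdivision in S101 can be pictured as a task that cycles through a fixed list of event types. The following is only a minimal sketch: the names `EventType`, `FLTask`, `current_event`, and `advance` are hypothetical, and the stage order within a round is an assumption. Only the three event types themselves come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    """The three event types named in the text."""
    LOCAL_TRAINING = "local_training"
    MODEL_EVALUATION = "model_evaluation"
    GLOBAL_MODEL_UPDATE = "global_model_update"


# ASSUMPTION: the order of stages within one communication round.
ROUND_STAGES = [
    EventType.LOCAL_TRAINING,
    EventType.MODEL_EVALUATION,
    EventType.GLOBAL_MODEL_UPDATE,
]


@dataclass
class FLTask:
    """A federated learning task with a priority attribute and a stage cursor."""
    task_id: str
    priority: int          # task priority PR from the attribute information
    stage_index: int = 0   # number of events completed so far

    def current_event(self) -> EventType:
        """Return the event the task has currently reached."""
        return ROUND_STAGES[self.stage_index % len(ROUND_STAGES)]

    def advance(self) -> None:
        """Mark the current event as finished and move to the next stage."""
        self.stage_index += 1
```

A scheduler built this way would query `current_event()` to decide which event weight applies before queuing the event.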

[0071] S102, determine the event priority ratio based on the attribute information of the target federated learning task and the type of the current event.

[0072] In this step, the attribute information of the target federated learning task refers to the task description, such as the task priority and task data information. Different event types have different event weights, meaning that within the same task, different execution stages correspond to different execution priorities. Based on this attribute information and the type of the current event, the event priority ratio of the current event is obtained, ensuring that high-priority tasks can run in a timely manner. Furthermore, to treat each task fairly, the influence of the event weight is incorporated into the task's priority to achieve fair sharing among tasks.

[0073] S103, determine the cumulative priority of the current event based on the event priority ratio and the time when the current event enters the scheduling queue.

[0074] S104, use a predefined adaptive resource allocation algorithm to allocate computing nodes to the current event, and execute the current event based on the computing nodes and according to the cumulative priority.

[0075] In this step, the call order of the current event is characterized by the determined cumulative priority, and then the adaptive resource allocation algorithm is used to allocate computing nodes to the current event. The current event is then executed according to the allocated computing nodes and the cumulative priority.

[0076] The federated learning multi-task scheduling method provided in this invention subdivides federated machine learning tasks into multiple federated machine learning events, each with a corresponding event weight. It dynamically accumulates the priority of current events by combining task attribute information and event waiting time, generating a queue of tasks to be executed. This ensures fairness in task execution and solves the problem of long concurrent task execution times in federated machine learning systems. Furthermore, it utilizes an adaptive resource allocation algorithm to rationally allocate computing resources to events, reducing idle resources on edge devices and improving system resource utilization.

[0077] In some embodiments of the present invention, determining the event priority ratio based on the attribute information of the target federated learning task and the type of the current event includes:

[0078] Obtain the corresponding event weight $\alpha_{E(i)}$ based on the type $E(i)$ of the current event $i$, where the event weight $\alpha_{E(i)}$ is preset according to the event type.

[0079] The event priority ratio $\beta_i$ is calculated from the event weight $\alpha_{E(i)}$ and the priority $PR$ of the target federated learning task in the attribute information:

[0080]

[0081] where $C_w$ is the priority weight constant for the target federated learning task.

[0082] The federated learning multi-task scheduling method provided in this invention determines the event priority ratio of the current event from the priority of the task and the event weight, thereby ensuring that high-priority tasks can run in a timely manner. To treat each task fairly, the influence of the event weight is incorporated into the task's priority to achieve fair sharing among tasks.

[0083] In some embodiments of the present invention, after determining the event priority ratio based on the attribute information of the target federated learning task and the type of the current event, the method further includes:

[0084] The queue number $Q_i$ of the current event in the scheduling queue is determined from the event priority ratio $\beta_i$:

[0085]

[0086] where $\beta_{threshold}$ is the event priority threshold and $N_{queue}$ is the threshold for the queue number of the scheduling queue.

[0087] Accordingly, determining the cumulative priority of the current event based on the event priority ratio and the time the current event entered the scheduling queue includes:

[0088] The cumulative priority $P_i(t)$ is calculated from the queue number $Q_i$ and the time $t_i$:

[0089] $P_i(t) = Q_i \, (t - t_i)$

[0090] where $t$ is the current time and $t_i$ is the time at which the i-th event arrives in the scheduling queue.

[0091] Specifically, let $t_i$ denote the time at which each event arrives at the scheduling queue, so that the first event arrives at $t_1$ and the i-th event arrives at $t_i$. The cumulative priority of the i-th event is then $P_i(t) = Q_i \, (t - t_i)$.
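Reading the cumulative priority as $P_i(t) = Q_i \, (t - t_i)$, the calculation can be sketched as below. The function name `cumulative_priority` and the numeric values are illustrative assumptions, not taken from the patent.

```python
def cumulative_priority(queue_number: float, enqueue_time: float, now: float) -> float:
    """P_i(t) = Q_i * (t - t_i): priority grows linearly with waiting time."""
    if now < enqueue_time:
        raise ValueError("current time precedes the enqueue time")
    return queue_number * (now - enqueue_time)


# Illustrative values only: an event with a larger queue number that arrived
# late, versus an event with a smaller queue number that has waited longer.
p_late = cumulative_priority(queue_number=3, enqueue_time=8.0, now=10.0)   # 3 * 2
p_early = cumulative_priority(queue_number=1, enqueue_time=0.0, now=10.0)  # 1 * 10
```

Under this reading, an event with a larger $Q_i$ accumulates priority faster, while any event that waits long enough eventually overtakes newer arrivals, which is how the waiting time contributes to fairness.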

[0092] In some embodiments of the present invention, executing the current event based on the computing node and according to the cumulative priority includes:

[0093] If at least two events have the same cumulative priority, the order in which the current event is executed is determined based on the time when the current event enters the scheduling queue.

[0094] The current event is executed based on the computing node and the order.

[0095] Specifically, the event with the highest cumulative priority in the current event queue, as determined by the cumulative priority, is selected as the currently executable event in the system and execution begins. Additionally, during the actual determination of cumulative priorities, situations may arise where events have the same cumulative priority. For events with the same cumulative priority, their scheduling order is determined based on the time they entered the scheduling queue.
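The selection rule above can be sketched as a single ordering over the waiting events, using the cumulative priority $P_i(t) = Q_i \, (t - t_i)$. The names `QueuedEvent` and `pick_next` are hypothetical. The text states only that ties are resolved by enqueue time; choosing the earliest arrival first is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueuedEvent:
    name: str
    queue_number: float   # Q_i
    enqueue_time: float   # t_i


def pick_next(events: List[QueuedEvent], now: float) -> QueuedEvent:
    """Return the event with the highest P_i(t) = Q_i * (t - t_i).

    Ties are broken by enqueue time; earliest-first is an ASSUMPTION.
    """
    if not events:
        raise ValueError("no events are waiting")
    return min(
        events,
        key=lambda e: (-e.queue_number * (now - e.enqueue_time), e.enqueue_time),
    )
```

Negating the priority inside the key lets one `min` call express both "highest priority first" and "earliest arrival first" without a second pass.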

[0096] After obtaining the cumulative priority of events using the above methods, it is necessary to select appropriate resources for the execution of events through resource allocation. By allocating appropriate computing resources to events, the overall system resources can be managed.

[0097] Assume that there are N data holders registered as computing resources in the federated machine learning multi-task scheduling system. Each communication round of a federated machine learning task requires at least M data holders to participate in training, where N≥M. The current resource allocation mechanism obtains the available resource information of each node through the computing resource records in the federated scheduler, and selects the M nodes with the richest available resources for federated machine learning tasks to process federated task events.

[0098] To fully utilize the system's available resources and improve system performance, node waiting time must be reduced. Ideally, if the computational performance of every node assigned to a federated task matches its data volume, so that all nodes have similar training times, the waiting time $T_{wait}$ of each node is eliminated, that is, $T_{wait} = 0$. The model aggregation node then does not need to wait long, which improves system resource utilization and reduces the total training time of the task.
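To make the waiting-time argument concrete, the sketch below assumes synchronous aggregation, so a node that finishes early idles until the slowest node in the round finishes. That definition of per-node waiting time, the function name `node_wait_times`, and the sample training times are assumptions for illustration.

```python
from typing import Dict


def node_wait_times(train_times: Dict[str, float]) -> Dict[str, float]:
    """Idle time of each node until the slowest node in the round finishes."""
    slowest = max(train_times.values())
    return {node: slowest - t for node, t in train_times.items()}


# Mismatched nodes leave fast devices idle; matched nodes do not.
mismatched = node_wait_times({"a": 10.0, "b": 40.0, "c": 25.0})
matched = node_wait_times({"a": 25.0, "b": 25.0, "c": 25.0})
```

With the mismatched times, nodes `a` and `c` idle for 30 and 15 time units respectively; with matched times every waiting time is zero, which is the ideal case described above.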

[0099] Based on the above ideas, this invention designs a resource allocation process for a multi-task scheduling mechanism. In simple terms, firstly, this process allocates computing resources to federated machine learning tasks, striving to ensure that each training node has similar training time, which helps reduce the total federated training time. Secondly, to fully utilize system resources, training nodes immediately release computing resources and update the corresponding resource records in the federated scheduler after completing their current task. The system can then reallocate computing resources, reducing device waiting time and improving computing resource utilization. The detailed adaptive resource allocation algorithm is as follows:

[0100] The training time baseline $T_{base}$ is determined based on historical computing resource information and historical dataset information:

[0101]

[0102] In the formula, $d_i$ is the average training time per sample on the i-th device, $C_{data}$ is the size of the dataset used for training by the i-th device, and $N$ is the total number of nodes used for training.

[0103] Based on the training time baseline $T_{base}$ and the time threshold $T_{length}$, the training time range $(T_{base} - T_{length},\ T_{base} + T_{length})$ is determined. Based on the training time range, the available computing nodes are determined, and all available computing nodes constitute a computing resource set.

[0104] Each available computing node in the computing resource set is scored, and the available computing nodes are sorted according to the scoring results to obtain a sorted set of computing resources.

[0105] Assign computing nodes to the current event based on the sorted set of computing resources.

[0106] The federated learning multi-task scheduling method provided in this embodiment of the invention introduces an adaptive resource allocation method to match computing nodes with similar current event execution times for federated machine learning tasks, thereby reducing device waiting time during event execution and improving system resource utilization.
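The filtering and sorting steps can be sketched as follows. Two things here are assumptions rather than the patent's definitions: the formula for the training time baseline is not reproduced in this text, so the sketch takes it to be the mean of per-node estimated training time (time per sample multiplied by dataset size); and the scoring rule is not specified, so each node carries a precomputed hypothetical `score`. The names `Node`, `training_time_baseline`, and `candidate_set` are also hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    node_id: str
    time_per_sample: float   # d_i: average training time per sample
    dataset_size: int        # C_data for this node
    score: float             # hypothetical precomputed resource score

    @property
    def est_train_time(self) -> float:
        """Estimated training time of this node for one round."""
        return self.time_per_sample * self.dataset_size


def training_time_baseline(nodes: List[Node]) -> float:
    """ASSUMED form of T_base: mean estimated training time over the N nodes."""
    return sum(n.est_train_time for n in nodes) / len(nodes)


def candidate_set(nodes: List[Node], t_length: float) -> List[Node]:
    """Nodes with estimated time in (T_base - T_length, T_base + T_length),
    sorted by score from largest to smallest."""
    t_base = training_time_baseline(nodes)
    in_range = [
        n for n in nodes
        if t_base - t_length < n.est_train_time < t_base + t_length
    ]
    return sorted(in_range, key=lambda n: n.score, reverse=True)
```

Keeping the window symmetric around the baseline is what groups nodes with similar training times, which is the stated goal of the allocation step.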

[0107] In some embodiments of the present invention, the sorted computing resource set Set is obtained by sorting the available computing nodes from largest to smallest according to the scoring results.

[0108] Accordingly, the step of allocating computing nodes to the current event based on the sorted set of computing resources includes:

[0109] Available computing nodes from the sorted set of computing resources are sequentially allocated to the current event until the number of available computing nodes allocated to the current event equals a preset threshold M or there are no available computing nodes to allocate in the sorted set of computing resources. The preset threshold M is determined based on the computing resources required by the current event.

[0110] Specifically, computing nodes i are selected in descending order of computing resource score and added to the training node identifier vector v of the current event, i.e. v := v + i, until |v| = M or no selectable nodes remain in the computing resource set, where |v| is the number of computing nodes currently in v, that is, the number of available computing nodes allocated to the current event. When the number of selected nodes reaches the required M, the resource allocation process is complete.

[0111] The above covers the case where the number of available computing nodes in the computing resource set is sufficient for the resources required by the current event. When the sorted computing resource set has no more available computing nodes to allocate while |v| < M, that is, the number of computing nodes already added to v is less than the required M, computing nodes still need to be allocated to the event. To reduce the waiting time of the aggregation node, nodes i are then selected within the effective training time range $(0,\ T_{base} - T_{length})$ in descending order of computing resource score, with v := v + i, until |v| = M.
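The two-pass selection described in the preceding paragraphs can be sketched as follows. This is an illustrative Python sketch, not the patented implementation: the function name `allocate_nodes`, the tuple layout of `candidates`, and the sample values are assumptions. Both ranges are treated as open intervals, as written in the text, so a node whose estimate equals the lower bound exactly is selected in neither pass.

```python
def allocate_nodes(candidates, t_base, t_length, m):
    """Return up to m node ids.

    candidates: list of (node_id, estimated_time, score) tuples (hypothetical layout).
    """
    by_score = sorted(candidates, key=lambda c: c[2], reverse=True)
    low, high = t_base - t_length, t_base + t_length
    v = []

    # First pass: nodes inside (T_base - T_length, T_base + T_length), best score first.
    for node_id, estimate, _ in by_score:
        if len(v) == m:
            break
        if low < estimate < high:
            v.append(node_id)

    # Fallback pass: if |v| < M, take faster nodes from (0, T_base - T_length).
    for node_id, estimate, _ in by_score:
        if len(v) == m:
            break
        if 0 < estimate < low:
            v.append(node_id)

    return v


candidates = [
    ("a", 10, 0.9),
    ("b", 12, 0.5),
    ("c", 3, 0.7),
    ("d", 30, 0.8),
    ("e", 2, 0.6),
]
print(allocate_nodes(candidates, t_base=11, t_length=4, m=2))  # ['a', 'b']
print(allocate_nodes(candidates, t_base=11, t_length=4, m=3))  # ['a', 'b', 'c']
```

Only nodes faster than the range are used as a fallback, so a slow node such as `d` is never selected and cannot lengthen the wait of the aggregation node.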

[0112] The present invention also provides an embodiment. The federated learning multi-task scheduling method includes the following steps:

[0113] Step 1, obtain the federated learning event;

[0114] Step 2, update the federated learning task identifier after the federated learning event enters the scheduler;

[0115] Step 3, calculate the cumulative priority ratio of the event, and convert it to obtain the event priority queue number $Q_i$;

[0116] Step 4, obtain the queue $Q_i$ according to the event priority queue number and append the event to it: $Q_i$.push_back($event_i$);

[0117] Step 5, initialize the computing node identifier vector v;

[0118] Step 6, when the event priority queue is not empty, read the event dataset description and generate a computing node set;

[0119] Step 7, calculate the training time baseline $T_{base}$;

[0120] Step 8, select nodes whose computing load estimates fall within the range $(T_{base} - T_{length},\ T_{base} + T_{length})$ to generate the available computing node set Set;

[0121] Step 9, sort Set according to resource richness;

[0122] Step 10, while |v| < M and Set contains an available computing node k that has not been selected, add node k of Set to the training node identifier vector v, that is, v := v + Set(k);

[0123] Step 11, execute the event based on the computing node identifier vector v and the cumulative priority of the event.
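The queueing part of the steps above (steps 3, 4, and 11) can be sketched in Python as a set of priority queues with aging. The conversion from priority ratio to queue number and the cumulative priority formula are not reproduced in this text, so both rules below are placeholder assumptions: the queue number is obtained by bucketing the ratio by a threshold, and the cumulative priority is the queue number plus the waiting time. Ties are broken in favor of the event that entered the queue first, which is also an assumed reading. All class and function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    priority_ratio: float  # event priority ratio, computed elsewhere
    enqueue_time: float    # time the event entered the scheduling queue
    queue_number: int = 0  # filled in by Scheduler.push


def queue_number(ratio, ratio_threshold, n_queue):
    # ASSUMED conversion: bucket the ratio by the threshold, capped at the last queue.
    return min(n_queue - 1, int(ratio // ratio_threshold))


def cumulative_priority(event, now):
    # ASSUMED aging rule: queue number plus time spent waiting.
    return event.queue_number + (now - event.enqueue_time)


class Scheduler:
    def __init__(self, ratio_threshold, n_queue):
        self.ratio_threshold = ratio_threshold
        self.queues = [deque() for _ in range(n_queue)]

    def push(self, event):
        event.queue_number = queue_number(
            event.priority_ratio, self.ratio_threshold, len(self.queues)
        )
        self.queues[event.queue_number].append(event)  # Q_i.push_back(event_i)

    def pop(self, now):
        pending = [e for q in self.queues for e in q]
        if not pending:
            return None
        # Highest cumulative priority first; on a tie, the earlier arrival wins.
        best = max(pending, key=lambda e: (cumulative_priority(e, now), -e.enqueue_time))
        self.queues[best.queue_number].remove(best)
        return best


scheduler = Scheduler(ratio_threshold=1.0, n_queue=3)
scheduler.push(Event("train", 2.5, enqueue_time=0.0))
scheduler.push(Event("eval", 0.4, enqueue_time=0.0))
scheduler.push(Event("update", 1.2, enqueue_time=1.0))
order = [scheduler.pop(now=2.0).name for _ in range(3)]
print(order)  # ['train', 'eval', 'update']
```

At time 2.0 the events `eval` and `update` have the same cumulative priority under the assumed rule, and `eval` runs first because it entered the queue earlier.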

[0124] In this embodiment, the experimental environment for the federated multitasking mechanism consists of one 8-core, 8GB server and ten Raspberry Pi 4Bs. The server's CPU has a clock speed of 2.50GHz and serves as the scheduler for deploying the federated machine learning multitasking mechanism. The Raspberry Pis have CPUs with a clock speed of 1.5GHz and 4GB of memory and serve as clients for deploying the federated machine learning multitasking mechanism, storing local datasets CIFAR-10 and MNIST.

[0125] The server hosting the scheduler runs Ubuntu, and the Raspberry Pis run Raspberry Pi OS (32-bit). The scheduling algorithm is implemented in Go, and the federated machine learning algorithm is implemented in Python.

[0126] Furthermore, federated machine learning tasks are divided into local training events, model evaluation events, and global model update events for scheduling. Experiments are conducted to obtain the execution time of federated machine learning events and the system event throughput under multi-task concurrent execution conditions, thus evaluating the performance of the multi-task scheduling mechanism.

[0127] Figure 3 is a comparison chart of event execution times provided in an embodiment of the present invention. Figure 3 compares the execution time of federated machine learning events with and without the scheduling mechanism when different numbers of events, ranging from 3 to 7, run simultaneously. Figure 3 (a)-(c) show the comparison of execution times for local training events, model evaluation events, and global model update events, respectively.

[0128] As can be seen from Figure 3, during the execution of local training events the proposed scheduling mechanism (Scheduled) effectively reduces the average execution time of local training events in the system compared with the case without scheduling. As the number of training events executed in parallel on the devices increases, the event execution time keeps increasing, but the increase is smaller with the scheduling mechanism than without it. For model evaluation events and global model update events, the task scheduling mechanism has a smaller effect because these events require fewer resources and execute faster. During federated machine learning training, to reduce task execution time, only a certain proportion of devices is selected to train the model at a time, and models from slow-running devices are no longer received, which wastes resources and slows device operation. The scheduling mechanism selects devices with abundant available resources for event allocation, effectively reducing the average running time of federated machine learning events.

[0129] Figure 4 is a comparison chart of event throughput provided in an embodiment of the present invention. Figure 4 (a)-(c) show the comparison of event throughput for local training events, model evaluation events, and global model update events, respectively. As can be seen from Figure 4, the system's training event throughput improves after the task scheduling mechanism is applied. As the number of events executed in parallel increases, the throughput of parallel events decreases, but the task scheduling mechanism still achieves a high throughput.

[0130] The multi-task scheduling device provided by the present invention is described below. The multi-task scheduling device described below can be referred to in correspondence with the federated learning multi-task scheduling method described above.

[0131] Figure 5 is a schematic diagram of the structure of the multi-task scheduling device provided in an embodiment of the present invention. As shown in Figure 5, the multi-task scheduling device includes a task and event acquisition module 501, an event priority ratio determination module 502, a cumulative priority determination module 503, and a node allocation and execution module 504.

[0132] The task and event acquisition module 501 is used to acquire the target federated learning task and the current events of the target federated learning task.

[0133] The target federated learning task comprises multiple events. These events are divided into stages based on their execution time and required resources (or differences in computational and communication resource utilization). For example, a federated machine learning task might include local model training, model evaluation, and global model update events. This event-based approach meticulously divides the entire federated learning task's execution process, allowing for fine-grained scheduling and resource allocation based on these subdivided events. The event division can be further refined, but this invention does not limit this approach.

[0134] In this module, the target federated learning task is first obtained, that is, the federated learning task to be executed. Then, the current event of the target federated learning task is determined, that is, which stage the target federated learning task has been executed to. The current event can be any one of the local training event, model evaluation event or global model update event mentioned above.

[0135] The event priority ratio determination module 502 is used to determine the event priority ratio based on the attribute information of the target federated learning task and the type of the current event.

[0136] In this module, the attribute information of the target federated learning task refers to the task description, such as task priority and task data. Different current event types have different event weights; that is, under the same task, different execution stages correspond to different execution priorities. Based on the aforementioned attribute information and the determined current event type, the event priority ratio corresponding to the current event is obtained, thereby ensuring that high-priority tasks can run in a timely manner. Furthermore, to treat each task fairly, event weight shadows are added to the tasks to achieve fair sharing among tasks.

[0137] The cumulative priority determination module 503 is used to determine the cumulative priority of the current event based on the event priority ratio and the time when the current event enters the scheduling queue.

[0138] The node allocation and execution module 504 is used to allocate computing nodes to the current event using a predefined adaptive resource allocation algorithm, and execute the current event based on the computing nodes and according to the cumulative priority.

[0139] In this module, the cumulative priority determined above characterizes the order in which the current event is called. The adaptive resource allocation algorithm is then used to allocate computing nodes to the current event, and the current event is executed on the allocated computing nodes in the order given by the cumulative priority.
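The way the four modules hand data to one another can be illustrated with a minimal Python sketch. It only shows the data flow from module 501 to module 504. The class `SchedulingDevice`, its method `run_once`, the dictionary fields, and the weight values are hypothetical, and the lambdas are placeholders rather than the formulas of the invention.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SchedulingDevice:
    acquire: Callable[[], tuple[Any, Any]]                     # module 501
    priority_ratio: Callable[[Any, Any], float]                # module 502
    cumulative_priority: Callable[[float, Any, float], float]  # module 503
    allocate_and_execute: Callable[[Any, float], Any]          # module 504

    def run_once(self, now: float) -> Any:
        task, event = self.acquire()                           # task and current event
        ratio = self.priority_ratio(task, event)               # event priority ratio
        priority = self.cumulative_priority(ratio, event, now)
        return self.allocate_and_execute(event, priority)


# Made-up event weights, for illustration only.
EVENT_WEIGHTS = {"local_training": 2.0, "model_evaluation": 1.0}

device = SchedulingDevice(
    acquire=lambda: ({"priority": 2}, {"type": "local_training", "enqueued_at": 0.0}),
    priority_ratio=lambda task, event: task["priority"] * EVENT_WEIGHTS[event["type"]],
    cumulative_priority=lambda ratio, event, now: ratio + (now - event["enqueued_at"]),
    allocate_and_execute=lambda event, priority: (event["type"], priority, ["node-1"]),
)
result = device.run_once(now=3.0)
print(result)  # ('local_training', 7.0, ['node-1'])
```

Passing the modules in as callables keeps each stage replaceable, which matches the statement that the modules may be physically separate or combined.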

[0140] The multi-task scheduling device provided in this invention subdivides federated machine learning tasks into multiple federated machine learning events, each with a corresponding event weight. It dynamically accumulates the priority of current events by combining task attribute information and event waiting time, generating a queue of tasks to be executed. This ensures fairness in the task execution process and solves the problem of long concurrent task execution times in federated machine learning systems. Furthermore, it utilizes an adaptive resource allocation algorithm to rationally allocate computing resources to events, reducing idle resources on edge devices and improving system resource utilization.

[0141] Figure 6 is a schematic diagram of the physical structure of an electronic device provided in an embodiment of the present invention. As shown in Figure 6, the electronic device may include: a processor 610, a communication interface 620, a memory 630, and a communication bus 640, wherein the processor 610, the communication interface 620, and the memory 630 communicate with each other through the communication bus 640. The processor 610 can call logical instructions in the memory 630 to execute a federated learning multi-task scheduling method. This federated learning multi-task scheduling method includes: obtaining a target federated learning task and the current events of the target federated learning task, wherein the target federated learning task includes multiple events, and the events are divided into stages based on the event execution time and required resources; determining an event priority ratio based on the attribute information of the target federated learning task and the type of the current event; determining the cumulative priority corresponding to the current event based on the event priority ratio and the time when the current event enters the scheduling queue; allocating computing nodes to the current event using a predefined adaptive resource allocation algorithm; and executing the current event based on the computing nodes and according to the cumulative priority.

[0142] Furthermore, the logical instructions in the aforementioned memory 630 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0143] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon. When executed by a processor, the computer program implements a federated learning multi-task scheduling method. This federated learning multi-task scheduling method includes: acquiring a target federated learning task and its current events, wherein the target federated learning task includes multiple events, and the events are divided into stages based on event execution time and required resources; determining an event priority ratio based on the attribute information of the target federated learning task and the type of the current event; determining the cumulative priority corresponding to the current event based on the event priority ratio and the time the current event entered the scheduling queue; allocating computing nodes to the current event using a predefined adaptive resource allocation algorithm; and executing the current event based on the computing nodes and according to the cumulative priority.

[0144] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0145] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods of various embodiments or some parts of embodiments.

[0146] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A federated learning multi-task scheduling method, characterized in that it comprises: obtaining a target federated learning task and the current event of the target federated learning task, wherein the target federated learning task includes multiple events, and the events are obtained by dividing the target federated learning task into stages based on event execution time and required resources; determining an event priority ratio based on the attribute information of the target federated learning task and the type of the current event; determining the cumulative priority of the current event based on the event priority ratio and the time when the current event enters the scheduling queue; and allocating computing nodes to the current event using a predefined adaptive resource allocation algorithm, and executing the current event based on the computing nodes and according to the cumulative priority.

2. The federated learning multi-task scheduling method according to claim 1, characterized in that, The step of determining the event priority ratio based on the attribute information of the target federated learning task and the type of the current event includes: Obtain the corresponding event weight based on the type of the current event; The event priority ratio is calculated based on the event weights and the priority of the target federated learning task in the attribute information.

3. The federated learning multi-task scheduling method according to claim 2, characterized in that, the step of calculating the event priority ratio based on the event weights and the priority of the target federated learning task in the attribute information includes: calculating the event priority ratio $\beta_i$ according to the event weight $\alpha_{E(i)}$ and the priority PR of the target federated learning task in the attribute information: wherein $C_w$ is the priority weight constant of the target federated learning task.

4. The federated learning multi-task scheduling method according to claim 1, characterized in that, After determining the event priority ratio based on the attribute information of the target federated learning task and the type of the current event, the method further includes: The queue number of the current event in the scheduling queue is determined based on the event priority ratio. Accordingly, determining the cumulative priority of the current event based on the event priority ratio and the time the current event entered the scheduling queue includes: Get the time when the current event entered the scheduling queue; The cumulative priority is calculated based on the queue number and the time.

5. The federated learning multi-task scheduling method according to claim 4, characterized in that, determining the queue number of the current event in the scheduling queue based on the event priority ratio includes: determining the queue number $Q_i$ of the current event in the scheduling queue according to the event priority ratio $\beta_i$: wherein $\beta_{threshold}$ is the event priority threshold and $N_{queue}$ is the queue number threshold of the scheduling queue; accordingly, the step of calculating the cumulative priority based on the queue number and the time includes: calculating the cumulative priority $P_i(t)$ according to the queue number $Q_i$ and the time $t_i$: wherein $t$ is the current time and $t_i$ is the time when each event arrives in the scheduling queue.

6. The federated learning multi-task scheduling method according to claim 1, characterized in that, The execution of the current event based on the computing node and according to the cumulative priority includes: If at least two events have the same cumulative priority, the order in which the current event is executed is determined based on the time when the current event enters the scheduling queue. The current event is executed based on the computing node and the order.

7. The federated learning multi-task scheduling method according to claim 1, characterized in that, the step of allocating computing nodes to the current event using a predefined adaptive resource allocation algorithm includes: determining a training time baseline based on historical computing resource information and historical dataset information; determining a training time range based on the training time baseline and a time threshold, and determining available computing nodes based on the training time range, wherein all available computing nodes constitute a computing resource set; scoring each available computing node in the computing resource set, and sorting the available computing nodes according to the scoring results to obtain a sorted computing resource set; and allocating computing nodes to the current event based on the sorted computing resource set.

8. The federated learning multi-task scheduling method according to claim 7, characterized in that, The sorted set of computing resources is obtained by sorting the available computing nodes from largest to smallest according to the scoring results; Accordingly, the step of allocating computing nodes to the current event based on the sorted set of computing resources includes: The available computing nodes in the sorted computing resource set are sequentially assigned to the current event until the number of available computing nodes assigned to the current event is equal to a preset threshold or there are no available computing nodes to be assigned in the sorted computing resource set.

9. The federated learning multi-task scheduling method according to any one of claims 1-8, characterized in that, The current event is one of the following: a local training event, a model evaluation event, or a global model update event for a machine learning task.

10. A multi-task scheduling device, characterized in that it comprises: a task and event acquisition module, used to acquire a target federated learning task and the current event of the target federated learning task, wherein the target federated learning task includes multiple events, which are obtained by dividing the target federated learning task into stages based on event execution time and required resources; an event priority ratio determination module, used to determine an event priority ratio based on the attribute information of the target federated learning task and the type of the current event; a cumulative priority determination module, used to determine the cumulative priority of the current event based on the event priority ratio and the time when the current event enters the scheduling queue; and a node allocation and execution module, used to allocate computing nodes to the current event using a predefined adaptive resource allocation algorithm, and to execute the current event based on the computing nodes and according to the cumulative priority.