Multi-mode federated learning method and apparatus

This multi-mode federated learning approach distinguishes single-participant datasets from multi-participant datasets, establishes a corresponding algorithm database for each, identifies the task type, and invokes the appropriate algorithms. It addresses resource waste and the inability to share intermediate modeling results, thereby improving the efficiency and flexibility of federated learning.

CN115130682B · Active · Publication Date: 2026-06-19 · LANXIANG ZHILIAN (HANGZHOU) TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: LANXIANG ZHILIAN (HANGZHOU) TECH CO LTD
Filing Date: 2022-06-29
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Enterprises face significant resource waste and are unable to share intermediate modeling results when deploying local and federated modeling.

Method used

The patent describes a multi-mode federated learning method. It establishes separate algorithm databases for single-participant and multi-participant modeling, identifies the task type of each task flow, and calls the corresponding algorithms from the matching database, thereby supporting both unilateral and federated modeling.

Benefits of technology

It reduces resource waste, improves the efficiency of federated learning, enables the sharing of intermediate modeling results within the same system, and reduces system coupling and debugging difficulty.

Generated by Eureka AI based on patent content.


Abstract

This disclosure presents a multi-mode federated learning method and apparatus, comprising: establishing an algorithm database for a single participant and an algorithm database for multiple participants for the federated learning modeling phase; obtaining a currently established task flow for federated learning from an interactive interface; identifying the task flow and determining the task type for the modeling phase, wherein the task type includes task types for a single participant and task types for multiple participants; and calling corresponding algorithms from different algorithm databases in the task flow based on the determined task type. By executing different modes of modeling algorithms for different task flows, the method reduces resource waste and improves the efficiency of federated learning. It overcomes the technical problem in related technologies whereby enterprises typically deploy two modeling systems simultaneously, one for local modeling and one for federated modeling, which both wastes resources and prevents the sharing of intermediate modeling results.

Description

Technical Field

[0001] This disclosure relates to the field of data processing technology, specifically to a multi-mode federated learning method and apparatus.

Background Technology

[0002] With the rapid advancement of AI technology, data assets have gradually become an indispensable factor of production. However, the ensuing data security and data silo issues have become persistent and unavoidable problems in the AI field. Regulation has become a double-edged sword: relaxed regulation leads to data security problems, while stricter regulation results in data silos, slowing down the development of AI. To address this issue, the security technology field has drawn on modern cryptography to propose new concepts such as secure multi-party computation, federated learning, and privacy computing. Practice has shown that these new technologies can quickly and effectively make data "usable but not visible," thus effectively solving data security problems. However, local modeling is equally important.

[0003] In related technologies, enterprises typically deploy two modeling systems simultaneously: local modeling and federated modeling. This not only wastes resources but also prevents the sharing of intermediate modeling results.

Summary of the Invention

[0004] The main purpose of this disclosure is to provide a multi-mode federated learning method and apparatus.

[0005] To achieve the above objectives, according to a first aspect of this disclosure, a multi-mode federated learning method is provided, comprising: establishing an algorithm database for a single participant and an algorithm database for multiple participants for the federated learning modeling phase; obtaining a currently established task flow for federated learning from an interactive interface; identifying the task flow and determining the task type of the modeling phase therein, wherein the task type includes the task type of a single participant and the task type of multiple participants; and calling the corresponding algorithm from different algorithm databases in the task flow based on the determined task type.

[0006] Optionally, identifying the task flow and determining the task type of the modeling phase includes: if the data in the task flow indicates local data, the task type is a single-participant task type; if the data in the task flow indicates multi-party data, the task type is a multi-participant task type.

[0007] Optionally, the method further includes: abstracting the datasets of each participant to obtain two dataset types: a single-participant dataset with only one participant and a federated dataset with multiple participants, wherein task flows can be established under the domains of different types of datasets.

[0008] Optionally, identifying the task flow and determining the task type of the modeling phase includes: identifying the task flow and determining the type of the dataset corresponding to it.

[0009] Optionally, a task flow can be established in the domain of different types of datasets, including: when a task component in the canvas of the interactive interface is triggered, the component is invoked, wherein each component corresponds to its own modeling algorithm; based on the business objectives of the participants, the connection relationship between each component is established to obtain the task flow.

[0010] Optionally, the algorithm database for a single participant and the algorithm database for multiple participants both contain multiple types of algorithms, including classification algorithms, statistical analysis algorithms, data preprocessing, and feature processing. Each type contains multiple algorithms, and each algorithm corresponds to a task component in the canvas.

[0011] According to a second aspect of this disclosure, a multi-mode federated learning apparatus is provided, comprising: a mode determination unit configured to establish an algorithm database for a single participant and an algorithm database for multiple participants for a federated learning modeling phase; an interaction unit configured to obtain a currently established task flow for federated learning from an interactive interface; a routing unit configured to identify the task flow and determine the task type of the modeling phase therein, wherein the task type includes a task type for a single participant and a task type for multiple participants; and a modeling unit configured to call corresponding algorithms from different algorithm databases in the task flow based on the determined task type.

[0012] Optionally, the apparatus further includes a data abstraction unit that abstracts the datasets of each participant to obtain two dataset types: a single-party dataset with only one participant and a federated dataset with multiple participants. Task flows can be established under the domains of different types of datasets.

[0013] According to a third aspect of this disclosure, a computer-readable storage medium is provided storing computer instructions for causing the computer to perform the multi-mode federated learning method described in any implementation of the first aspect.

[0014] According to a fourth aspect of this disclosure, an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to said at least one processor; wherein the memory stores a computer program executable by said at least one processor, said computer program being executed by said at least one processor to cause said at least one processor to perform the multi-mode federated learning method described in any implementation of the first aspect.

[0015] The multi-mode federated learning method and apparatus disclosed herein include: establishing an algorithm database for a single participant and an algorithm database for multiple participants for the federated learning modeling phase; obtaining a currently established task flow for federated learning from an interactive interface; identifying the task flow and determining the task type for the modeling phase, wherein the task type includes the task type for a single participant and the task type for multiple participants; and calling the corresponding algorithm from different algorithm databases in the task flow based on the determined task type. By executing different modeling algorithms for different task flows, resource waste is reduced and the efficiency of federated learning is improved. This overcomes the technical problem in related technologies whereby enterprises typically deploy two modeling systems simultaneously, one for local modeling and one for federated modeling, which both wastes resources and makes it impossible to share intermediate modeling results.

Brief Description of the Drawings

[0016] To more clearly illustrate the technical solutions in the specific embodiments of this disclosure or the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0017] Figure 1 is a flowchart of a multi-mode federated learning method according to an embodiment of the present disclosure;

[0018] Figure 2 is a schematic diagram of the system architecture corresponding to the multi-mode federated learning method according to an embodiment of this disclosure;

[0019] Figure 3 is a schematic diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Implementation

[0020] To enable those skilled in the art to better understand the present disclosure, the technical solutions of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some embodiments of the present disclosure, and not all embodiments. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present disclosure.

[0021] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate for the embodiments of this disclosure described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0022] It should be noted that, unless otherwise specified, the embodiments and features described in this disclosure can be combined with each other. This disclosure will now be described in detail with reference to the accompanying drawings and embodiments.

[0023] According to embodiments of this disclosure, a multi-mode federated learning method is provided. As shown in Figure 1, the method includes the following steps 101 to 104:

[0024] Step 101: For the federated learning modeling phase, establish an algorithm database for a single participant and an algorithm database for multiple participants.

[0025] In this embodiment, the algorithm database of a single participant can be used for local modeling with local data participating in computation; the algorithm databases of multiple participants can be used for joint modeling with both local and remote data participating in computation. It is understood that, depending on business needs, the algorithm database of a single participant can also be used for remote modeling with remote data from other participants participating in computation.
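The two algorithm databases of step 101 can be pictured as two registries that expose the same component names but hold different implementations. The following is only a minimal sketch under that assumption; the class `AlgorithmDatabase`, the `register` decorator, and the `normalize` component are hypothetical names that do not appear in the patent, and the federated function is a plain placeholder with no secure computation.

```python
from typing import Callable, Dict, List

# Hypothetical names throughout; the patent does not specify an API.


class AlgorithmDatabase:
    """A named collection of algorithms, keyed by component name."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self._algorithms: Dict[str, Callable] = {}

    def register(self, name: str) -> Callable[[Callable], Callable]:
        """Return a decorator that stores the function under `name`."""
        def decorator(fn: Callable) -> Callable:
            self._algorithms[name] = fn
            return fn
        return decorator

    def get(self, name: str) -> Callable:
        return self._algorithms[name]

    def names(self) -> List[str]:
        return sorted(self._algorithms)


single_db = AlgorithmDatabase("single")
federated_db = AlgorithmDatabase("federated")


def _scale(value: float, lo: float, hi: float) -> float:
    # Min-max scaling; a constant column maps to 0.0.
    return (value - lo) / (hi - lo) if hi != lo else 0.0


@single_db.register("normalize")
def normalize_local(values: List[float]) -> List[float]:
    # One party's values, scaled with that party's own min and max.
    lo, hi = min(values), max(values)
    return [_scale(v, lo, hi) for v in values]


@federated_db.register("normalize")
def normalize_federated(parties: List[List[float]]) -> List[List[float]]:
    # Placeholder: scale every party's values with a shared min and max.
    # A real federated version would exchange these extremes securely.
    flat = [v for party in parties for v in party]
    lo, hi = min(flat), max(flat)
    return [[_scale(v, lo, hi) for v in party] for party in parties]
```

Keeping the component name identical in both registries is the design choice that lets one canvas component map to either implementation, depending on the mode chosen later.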

[0026] Step 102: Obtain the currently established task flow for federated learning from the interactive interface.

[0027] In this embodiment, the user can obtain the established task flow through the user interface. The task flow includes multiple task nodes, each of which may correspond to a modeling algorithm. The task nodes can be connected in series and / or in parallel to form the task flow. For example, the user can call up task nodes by dragging and dropping in the interface and check the relationships between task nodes.

[0028] Step 103: Identify the task flow and determine the task type in the modeling stage, wherein the task type includes the task type of a single participant and the task type of multiple participants.

[0029] In this embodiment, the task type of a single participant can be a modeling task on local data or a modeling task on the data of one other participant; the task type of multiple participants can be a modeling task on local data together with the data of other participants, or a modeling task on the data of multiple other participants.

[0030] As an optional implementation of this embodiment, identifying the task flow and determining the task type in the modeling stage includes: if the data in the task flow indicates local data, then the task type is a single-participant task type; if the data in the task flow indicates multi-party data, then the task type is a multi-participant task type.

[0031] In this optional implementation, the data can be pre-identified using participant IDs. If the identifier of the data to be processed indicated in the task flow is a local ID, then the task type can be determined to be a single-participant task type. If the identifiers of the data to be processed indicated in the task flow include the local ID together with the IDs of other participants, or the IDs of multiple other participants, then the task type can be determined to be a multi-participant task type. It is understood that the ID can be information that uniquely identifies the participant, such as ID card information or business license information, and this information can be determined according to business requirements.
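The ID-based rule can be sketched as a small function. This is one possible reading, not the patent's implementation: it assumes each data reference in the task flow carries the ID of the participant that owns it, and it treats any flow touching exactly one owner as single-participant (consistent with the note in [0029] that a single-participant task may also use the data of one other participant). `TaskType` and `determine_task_type` are hypothetical names.

```python
from enum import Enum
from typing import Iterable


class TaskType(Enum):
    SINGLE_PARTICIPANT = "single"
    MULTI_PARTICIPANT = "multi"


def determine_task_type(owner_ids: Iterable[str]) -> TaskType:
    """Classify a task flow by the distinct owners of the data it references.

    Assumption: exactly one distinct owner means a single-participant task;
    two or more distinct owners mean a multi-participant task.
    """
    owners = set(owner_ids)
    if not owners:
        raise ValueError("the task flow does not reference any data")
    if len(owners) == 1:
        return TaskType.SINGLE_PARTICIPANT
    return TaskType.MULTI_PARTICIPANT
```

For example, `determine_task_type(["org-local", "org-b"])` yields the multi-participant type, while repeated references to the same owner still count as one participant.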

[0032] As an optional implementation of this embodiment, the method further includes: abstracting the data of each participant to obtain two dataset types: a single-participant dataset with only one participant and a federated dataset with multiple participants. Task flows can be established under the domains of different dataset types. In this embodiment, abstraction transforms concrete data into business objects that can be processed by a computer. For example, in this embodiment, data is abstracted into task flows, and the modeling process is performed based on these task flows.

[0033] In this optional implementation, data from different participants can be categorized, grouping data belonging to the same participant into a single dataset. Furthermore, participant data can be classified according to different dimensions. After categorization, datasets can be grouped based on business needs: datasets requiring only single-party modeling can be categorized as single-party datasets, while datasets requiring multiple participants for federated learning can be categorized as multi-party datasets. Each participant can have a unique ID identifier, allowing identification of which participant the data belongs to.
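The grouping and categorization described here might look like the following sketch. It assumes every record carries a `participant_id` field and that the business need is expressed simply as the list of participants a dataset should cover; the `Dataset` class and both function names are illustrative and not taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Dataset:
    kind: str                      # "single" or "federated"
    participants: Tuple[str, ...]  # IDs of the parties whose data is included
    rows: List[dict]


def group_by_participant(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Put all records that belong to the same participant into one group."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        grouped[record["participant_id"]].append(record)
    return dict(grouped)


def abstract_dataset(grouped: Dict[str, List[dict]],
                     participant_ids: Iterable[str]) -> Dataset:
    """Build a single-party or federated dataset from the chosen participants."""
    ids = tuple(sorted(set(participant_ids)))
    if not ids:
        raise ValueError("at least one participant is required")
    rows = [row for pid in ids for row in grouped[pid]]
    kind = "single" if len(ids) == 1 else "federated"
    return Dataset(kind=kind, participants=ids, rows=rows)
```

The resulting `kind` field is what a later routing step could inspect instead of re-deriving the type from individual records.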

[0034] Furthermore, the abstracted datasets can be used to establish task flows within their respective domains. Specifically, the modeling process can include data preprocessing, feature engineering, statistical analysis, and modeling stages. Different stages can employ various algorithms. For example, the data preprocessing stage can include intersection algorithms, union algorithms, splitting algorithms, imputation algorithms, etc.; the feature engineering stage can include binning algorithms, one-hot encoding algorithms, normalization algorithms, etc.; the statistical analysis stage can include PSI algorithms, feature importance statistics algorithms, etc.; and the modeling and machine learning stages can include tree models, logistic regression algorithms, DNNs, etc. If only one set of data is available, modeling can be performed using that data. Multiple implementation methods can be used in the data preprocessing stage, and multiple implementation methods can be used in the feature engineering stage. This task flow can consist of multiple tasks, each of which can call a corresponding algorithm.

[0035] For unilateral datasets, task flows can be created based on business needs, consisting of various task nodes, each with its own algorithm. For federated datasets, task members with relationships between participants can be established based on business needs to form task flows.

[0036] As an optional implementation of this embodiment, identifying the task flow and determining the task type of the modeling stage includes: identifying the task flow and determining the type of the dataset corresponding to it.

[0037] In this optional implementation, the type of dataset can be determined before modeling, and the task flow can be distributed to a specific modeling algorithm based on the type of dataset. A unilateral dataset can be routed to a unilateral modeling algorithm, and a federated dataset can be routed to a federated modeling algorithm.

[0038] It is understood that a unilateral dataset can be a local dataset or a unilateral dataset from another participant.

[0039] As an optional implementation of this embodiment, a task flow can be established under different types of datasets, including: when a task component in the canvas of the interactive interface is triggered, the component is called, wherein each component corresponds to its own modeling algorithm; based on the business goals of the participants, the connection relationship between each component is established to obtain the task flow.

[0040] In this optional implementation, a canvas can be pre-established on the interactive interface. Multiple types and quantities of components can be invoked on the canvas, each component corresponding to a task, and each task indicating a modeling algorithm. Under a specific dataset, a task flow is established based on business requirements. When establishing the task flow, the connection relationships between the various task components can be customized. For a unilateral dataset, a task flow is established based on business requirements, consisting of task members from that unilateral party connected through serial and / or parallel relationships. For a federated dataset, a task flow is established based on business requirements, consisting of task members from various participating parties connected through serial and / or parallel relationships.

[0041] For example, in a federated dataset, there can be multiple participants. These participants can establish relationships through task components (task members). Data can be merged between multiple participants using a JOIN component. The JOIN component can then be connected to a missing value handling component, which in turn can be connected to a normalization component. The missing value handling component can also be connected to feature binning and histogram components. The normalization component can be sequentially connected to components related to model training. It is understandable that the connections between these components can be modified on the canvas, thereby altering the task execution logic.
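The component wiring in this example forms a directed acyclic graph, so an execution order can be derived with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the component names are shorthand for the components named above, and `execution_order` is a hypothetical helper, not part of the patent.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each key is a canvas component; its value is the set of components
# whose output it consumes (its upstream connections).
flow: Dict[str, Set[str]] = {
    "join": set(),
    "missing_values": {"join"},
    "normalize": {"missing_values"},
    "binning": {"missing_values"},
    "histogram": {"missing_values"},
    "train": {"normalize"},
}


def execution_order(task_flow: Dict[str, Set[str]]) -> List[str]:
    """Return one valid order in which the components can be executed."""
    return list(TopologicalSorter(task_flow).static_order())


order = execution_order(flow)
```

Rewiring the canvas corresponds to editing the dictionary; for instance, making `train` depend on `binning` as well changes which orders are valid without touching any algorithm code.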

[0042] This optional implementation distinguishes datasets and establishes task flows under the single-party dataset domain and the federated dataset domain respectively, so that modeling is carried out based on the task flow. This improves the flexibility of federated learning, reduces the coupling of the federated learning system, and overcomes the problems of low learning efficiency, complex code structure, and difficult debugging caused by mixing the scheduling flow, computing flow, and communication flow.

[0043] As an optional implementation of this embodiment, the algorithm database for a single participant and the algorithm database for multiple participants both contain a variety of algorithm types. The algorithm types include classification algorithms, statistical analysis algorithms, data preprocessing, and feature processing. Each type contains a variety of algorithms, and each algorithm corresponds to a task component in the canvas.

[0044] In this optional implementation, unilateral modeling algorithms and federated modeling algorithms can include algorithms of the same type. Once the task components in the canvas are assembled into a task flow, the modeling engine calls the corresponding algorithms for each task node from the corresponding algorithm library based on the task type.

[0045] The unilateral or federated modeling algorithms implement all the necessary methods for data processing and modeling, including but not limited to logistic regression, linear regression, XGBoost, OneHot, binning, PSI, Pearson correlation coefficient, feature importance analysis, outlier handling, etc. The modeling interaction layer only needs to call these methods.

[0046] In this embodiment, by identifying the type of task flow, it can be determined which algorithm database executes the modeling algorithm for that type of task flow.

[0047] Step 104: Based on the determined task type, call the corresponding algorithm from different algorithm databases in the task flow.

[0048] In this embodiment, task flows of different task types are distributed to different modeling algorithms through algorithmic routing.
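A minimal sketch of such routing follows, assuming a linear task flow and two in-memory dictionaries standing in for the algorithm databases. All function and variable names are hypothetical, and the "multi" implementations are plain placeholders with no communication or cryptography.

```python
from typing import Any, Callable, Dict, List


def drop_missing(rows: List[Any]) -> List[Any]:
    # Single-party version: operate on one list of values.
    return [r for r in rows if r is not None]


def drop_missing_federated(parties: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    # Placeholder multi-party version: operate on each party's values.
    return {pid: [r for r in rows if r is not None] for pid, rows in parties.items()}


def count_federated(parties: Dict[str, List[Any]]) -> Dict[str, int]:
    return {pid: len(rows) for pid, rows in parties.items()}


# Same component names in both databases, different implementations.
ALGORITHM_DATABASES: Dict[str, Dict[str, Callable[[Any], Any]]] = {
    "single": {"missing_values": drop_missing, "count": len},
    "multi": {"missing_values": drop_missing_federated, "count": count_federated},
}


def run_task_flow(nodes: List[str], task_type: str, data: Any) -> Dict[str, Any]:
    """Run each node with the algorithm from the database for `task_type`.

    Returns every node's output so that intermediate results stay available.
    """
    if task_type not in ALGORITHM_DATABASES:
        raise ValueError(f"unknown task type: {task_type!r}")
    database = ALGORITHM_DATABASES[task_type]
    results: Dict[str, Any] = {}
    current = data
    for node in nodes:
        if node not in database:
            raise KeyError(f"no {task_type} algorithm for component {node!r}")
        current = database[node](current)
        results[node] = current
    return results
```

The same node list can be run in either mode: `run_task_flow(["missing_values", "count"], "single", [1, None, 3])` uses the single-party functions, while passing `"multi"` with a per-party dictionary uses the placeholders for the multi-participant case.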

[0049] Referring to Figure 2, which shows the system architecture for the method in this embodiment, the system may include a UI interaction layer and a modeling engine layer. The interaction layer and the modeling engine layer can be connected through a web API. Behind the web API, an "algorithm routing" module is established in the modeling engine layer. This module identifies the modeling task type and distributes the modeling task to one of two execution modes: a unilateral modeling algorithm or a federated modeling algorithm. At the interaction layer, users can create task flows through drag-and-drop, and the concepts of the interaction layer are unified with and bound to those of the execution layer (e.g., task nodes correspond to algorithms), which reduces programming difficulty and debugging complexity.

[0050] This embodiment enables the execution of different modeling methods within the same modeling system, allowing for the sharing of intermediate modeling results and saving resources. It overcomes the problems in related technologies where the federated modeling process involves a high degree of coupling between communication and secure computation, resulting in wasted resources and the inability to share intermediate modeling results.

[0051] It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order than that shown here.

[0052] According to an embodiment of this disclosure, an apparatus for implementing the above-described multi-mode federated learning method is also provided. The apparatus includes: a mode determination unit configured to establish an algorithm database for a single participant and an algorithm database for multiple participants during the federated learning modeling phase; an interaction unit configured to obtain a currently established task flow for federated learning from an interactive interface; a routing unit configured to identify the task flow and determine the task type of the modeling phase, wherein the task type includes a task type for a single participant and a task type for multiple participants; and a modeling unit configured to call corresponding algorithms from different algorithm databases in the task flow based on the determined task type.

[0053] As an optional implementation of this embodiment, the device further includes: a data abstraction unit, which abstracts the datasets of each participant to obtain two dataset types: a single-party dataset with only one participant and a federated dataset with multiple participants. Task flows can be established under the domains of different types of datasets.

[0054] As an optional implementation of this embodiment, the device further includes: an abstraction unit configured to abstract the data of each participant to obtain two dataset types: a single-party dataset with only one participant and a federated dataset with multiple participants, wherein task flows can be established under the domains of different types of datasets.

[0055] As an optional implementation of this embodiment, identifying the task flow and determining the task type of the modeling stage includes: identifying the task flow and determining the type of the dataset corresponding to it.

[0056] As an optional implementation of this embodiment, a task flow can be established under different types of datasets, including: when a task component in the canvas of the interactive interface is triggered, the component is called, wherein each component corresponds to its own modeling algorithm; based on the business goals of the participants, the connection relationship between each component is established to obtain the task flow.

[0057] As an optional implementation of this embodiment, the algorithm database for a single participant and the algorithm database for multiple participants both contain a variety of algorithm types. The algorithm types include classification algorithms, statistical analysis algorithms, data preprocessing, and feature processing. Each type contains a variety of algorithms, and each algorithm corresponds to a task component in the canvas.

[0058] This embodiment enables the execution of different modeling methods within the same modeling system, allowing for the sharing of intermediate modeling results and saving resources. It overcomes the problems in related technologies where the federated modeling process involves a high degree of coupling between communication and secure computation, resulting in wasted resources and the inability to share intermediate modeling results.

[0059] This disclosure provides an electronic device. As shown in Figure 3, the electronic device includes one or more processors 31 and a memory 32; Figure 3 takes one processor 31 as an example.

[0060] The electronic device may also include an input device 33 and an output device 34.

[0061] The processor 31, memory 32, input device 33, and output device 34 can be connected via a bus or other means; Figure 3 takes connection via a bus as an example.

[0062] Processor 31 can be a Central Processing Unit (CPU). Processor 31 can also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations thereof. The general-purpose processor can be a microprocessor or any conventional processor.

[0063] The memory 32, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions / modules corresponding to the multi-mode federated learning method in the embodiments of this disclosure. The processor 31 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 32, thereby implementing the multi-mode federated learning method of the above method embodiments.

[0064] The memory 32 may include a program storage area and a data storage area. The program storage area may store the operating system and applications required for at least one function; the data storage area may store data created by the use of the processing device operated by the server. Furthermore, the memory 32 may include high-speed random access memory and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 32 may optionally include memory remotely located relative to the processor 31, and these remote memories can be connected to a network connection device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

[0065] Input device 33 can receive input numerical or character information, and generate key signal inputs related to user settings and function control of the server's processing device. Output device 34 may include display devices such as a display screen.

[0066] One or more modules are stored in memory 32, and when executed by one or more processors 31, they perform the method shown in Figure 1.

[0067] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium. When executed, the program can include the processes of the method embodiments described above. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), random access memory (RAM), flash memory, hard disk drive (HDD), or solid-state drive (SSD), etc.; the storage medium can also include combinations of the above types of memory.

[0068] Although embodiments of the present disclosure have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the present disclosure, and such modifications and variations all fall within the scope defined by the appended claims.

Claims

1. A multi-mode federated learning method, characterized by comprising: for the federated learning modeling phase, establishing an algorithm database for a single participant and an algorithm database for multiple participants; obtaining the currently established task flow for federated learning from the interactive interface; identifying the task flow to determine the task type in the modeling phase, wherein the task type includes the task type of a single participant and the task type of multiple participants; and, based on the determined task type, calling the corresponding algorithm in the task flow from different algorithm databases; the method further comprising: abstracting the data of each participant to obtain two dataset types, namely a single-party dataset with only one participant and a federated dataset with multiple participants, wherein task flows can be established under the domains of different dataset types; wherein establishing a task flow under the domains of different dataset types includes: when a task component in the canvas of the interactive interface is triggered, invoking the component, wherein each component corresponds to its own modeling algorithm; and, based on the business objectives of the participants, establishing the connection relationship between the components to obtain the task flow.

2. The multi-mode federated learning method of claim 1, wherein identifying the task flow and determining the task type in the modeling phase includes: if the data in the task flow indicates local data, then the task type is a single-participant task type; if the data in the task flow indicates multi-party data, then the task type is a multi-participant task type.

3. The multi-mode federated learning method of claim 2, wherein identifying the task flow and determining the task type of the modeling phase includes: identifying the task flow and determining the type of the dataset corresponding to it.

4. The multi-mode federated learning method of claim 3, wherein the algorithm database for a single participant and the algorithm database for multiple participants both contain various types of algorithms, including classification algorithms, statistical analysis algorithms, data preprocessing, and feature processing; each type contains multiple algorithms, and each algorithm corresponds to a task component in the canvas.

5. A multi-mode federated learning device, characterized by comprising: a mode determination unit configured to establish an algorithm database for a single participant and an algorithm database for multiple participants for the federated learning modeling phase; an interaction unit configured to obtain the currently established task flow for federated learning from the interactive interface; a routing unit configured to identify the task flow and determine the task type of the modeling phase therein, wherein the task type includes the task type of a single participant and the task type of multiple participants; and a modeling unit configured to call the corresponding algorithm from different algorithm databases in the task flow based on the determined task type; the device further comprising a data abstraction unit, which abstracts the datasets of each participant to obtain two dataset types, namely a single-party dataset with only one participant and a federated dataset with multiple participants, wherein task flows can be established under the domains of different types of datasets; wherein establishing a task flow under the domains of different types of datasets includes: when a task component in the canvas of the interactive interface is triggered, invoking the component, wherein each component corresponds to its own modeling algorithm; and, based on the business objectives of the participants, establishing the connection relationship between the components to obtain the task flow.

6. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a computer to perform the multi-mode federated learning method according to any one of claims 1-4.

7. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to cause the at least one processor to perform the multi-mode federated learning method according to any one of claims 1-4.

Citation Information

Patent Citations

  • Modeling task and data analysis method, device, electronic equipment and system

    CN113407327A