Business data processing method and device, electronic equipment and storage medium
Sample sets and key sets matching the business party devices in a cloud network are acquired, and the intersection of the sample sets is processed to train the parameters of a federated model. This addresses private-data leakage and high computational complexity, making business data processing efficient and suitable for mobile devices.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date
- 2021-01-21
- Publication Date
- 2026-06-30
AI Technical Summary
In cloud networks, existing technologies suffer from leakage of users' private data and from high computational complexity. In particular, when large volumes of business data are processed on mobile devices, the high computational complexity of traditional encryption functions leads to high hardware overhead and long user waiting times.
Sample sets matching the first and second business party devices in the business data processing system are acquired, and virtual samples and key sets are determined. These are then used to process the sample-set intersection and determine training samples, on which the parameters of the federated model are trained. This reduces computational cost without exchanging data.
While ensuring that private data is not leaked, the method improves the efficiency of business data processing, reduces user waiting time, and is suitable for business data processing on mobile devices.
Figure CN113591097B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to data processing technology in cloud networks, and more particularly to business data processing methods, apparatus, electronic devices, and storage media.
Background Technology
[0002] When different business parties share business data, secure multi-party computation is required: multiple parties jointly compute the result of a function without disclosing their respective input data, and the result is revealed to one or more of the parties. In related technologies, defects in encrypted transmission frequently cause users' private data to be leaked. Furthermore, when a large volume of business data must be processed, the modular operations in traditional commutative encryption function structures have high computational complexity, and the encryption process incurs significant hardware overhead. This lengthens user waiting times, increases hardware costs, and hinders the implementation of business data processing on mobile devices.
Summary of the Invention
[0003] In view of this, embodiments of the present invention provide a business data processing method and apparatus, an electronic device, and a storage medium. Corresponding virtual samples are configured to determine the intersection of the sample sets; the intersection is then processed to determine training samples that match the business data processing system, and finally the parameters of the federated model are determined. In this way, the task of determining the federated model parameters is completed without exchanging data and at reduced computational cost, which improves the efficiency of business data processing, makes it feasible on mobile devices, saves users' waiting time, and ensures that private data is not leaked.
[0004] The technical solution of this invention is implemented as follows:
[0005] This invention provides a business data processing method, including:
[0006] Obtain a first sample set that matches a first business party device in the business data processing system, and a second sample set that matches a second business party device in the business data processing system, wherein the business data processing system includes at least a first business party device and a second business party device;
[0007] Based on the first sample set, determine a virtual sample that matches the first business party device;
[0008] Determine the intersection of the sample sets based on the virtual sample matching the first business party device and the second sample set matching the second business party device;
[0009] Determine a first key set that matches the first business party device and a second key set that matches the second business party device;
[0010] Process the intersection of the sample sets using the first key set and the second key set to determine training samples that match the business data processing system;
[0011] Based on the training samples that match the business data processing system, train the federated model corresponding to the business data processing system to determine the parameters of the federated model.
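Paragraphs [0009] and [0010] introduce a first key set and a second key set but do not name the cryptosystem. Vertical federated learning commonly relies on an additively homomorphic scheme such as Paillier, so the Python sketch below assumes Paillier purely for illustration; it is not stated to be the scheme used in this invention. The function names `paillier_keygen`, `encrypt`, `decrypt` and `add_encrypted` are hypothetical, and the toy primes in the usage note are far too small for real use.

```python
import math
import secrets

# Assumption: Paillier with g = n + 1 stands in for the unspecified key sets.
# Requires Python 3.9+ (math.lcm) and 3.8+ (pow with exponent -1).

def paillier_keygen(p: int, q: int):
    """Return (public key n, private key (lam, mu)) for distinct primes p, q."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # inverse of L(g^lam mod n^2) = lam when g = n + 1
    return n, (lam, mu)

def encrypt(n: int, m: int) -> int:
    """Encrypt integer m (0 <= m < n) with a fresh random r coprime with n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 simplifies to 1 + m * n.
    return (1 + m * n) * pow(r, n, n2) % n2

def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n  # L(u) = (u - 1) / n, then multiply by mu

def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: Dec(c1 * c2 mod n^2) = m1 + m2 mod n."""
    return c1 * c2 % (n * n)
```

For example, with `n, priv = paillier_keygen(61, 53)`, decrypting `add_encrypted(n, encrypt(n, 20), encrypt(n, 22))` yields 42. Under this assumption, each business party device would generate its own key pair and send only the public modulus `n` to the other party, which could then add encrypted values (for example, encrypted gradient terms) without seeing them.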
[0012] This invention also provides a business data processing apparatus, comprising:
[0013] An information transmission module is used to acquire a first sample set that matches a first business party device in the business data processing system, and a second sample set that matches a second business party device in the business data processing system, wherein the business data processing system includes at least a first business party device and a second business party device.
[0014] The information processing module is used to determine a virtual sample that matches the first business party device based on the first sample set.
[0015] The information processing module is used to determine the intersection of the sample sets based on the virtual sample matching the first business party device and the second sample set matching the second business party device.
[0016] The information processing module is used to determine a first key set that matches the first business party device and a second key set that matches the second business party device.
[0017] The information processing module is used to process the intersection of the sample sets using the first key set and the second key set to determine training samples that match the business data processing system.
[0018] The information processing module is used to train the federated model corresponding to the business data processing system based on training samples that match the business data processing system, and to determine the parameters of the federated model.
[0019] In the above scheme,
[0020] The information processing module is used to determine a sample set that matches the first business party device based on the business type of the first business party device in the business data processing system.
[0021] The information processing module is used to determine a sample set that matches the second business party device based on the business type of the second business party device in the business data processing system.
[0022] The information processing module is used to perform sample alignment processing on the sample set matching the first business party device and the sample set matching the second business party device, so as to obtain a first sample set matching the first business party device and a second sample set matching the second business party device.
[0023] In the above scheme,
[0024] The information processing module is used to determine, by means of the first business party device, the value parameters and distribution parameters of the sample IDs in the first sample set;
[0025] The information processing module is used to generate a virtual sample that matches the first business party device based on the value parameters and distribution parameters of the sample IDs in the first sample set.
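Paragraphs [0024] and [0025] leave the "value parameters" and "distribution parameters" open. The sketch below assumes integer sample IDs, takes the observed minimum and maximum as the value parameters, and uses a uniform distribution over that range as the distribution parameter; `id_value_params` and `generate_virtual_ids` are hypothetical names.

```python
import random

def id_value_params(ids):
    """Hypothetical value parameters: the observed range [lo, hi] of integer IDs."""
    return min(ids), max(ids)

def generate_virtual_ids(ids, count, seed=None):
    """Draw `count` distinct virtual IDs uniformly from [lo, hi], avoiding real IDs."""
    lo, hi = id_value_params(ids)
    real = set(ids)
    # Number of IDs in the range that are not already real sample IDs.
    if count > (hi - lo + 1) - len(real):
        raise ValueError("ID range too small for the requested number of virtual IDs")
    rng = random.Random(seed)
    virtual = set()
    while len(virtual) < count:
        candidate = rng.randint(lo, hi)  # assumed uniform distribution parameter
        if candidate not in real:
            virtual.add(candidate)
    return sorted(virtual)
```

Drawing virtual IDs from the same range as the real ones is one way to make them hard to distinguish from real IDs; a distribution fitted to the real IDs could be substituted for the uniform draw.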
[0026] In the above scheme,
[0027] The information processing module is used to merge the virtual sample with the first sample set to form a first sample set containing the virtual sample.
[0028] The information processing module is used to traverse the first sample set containing virtual samples and determine the ID set of the virtual samples.
[0029] The information processing module is used to traverse the second sample set to determine the intersection of the first sample set containing the virtual samples and the second sample set.
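The set operations in [0027] to [0029] can be sketched as follows, assuming each sample set is reduced to its set of IDs; `intersect_with_virtual` is a hypothetical helper. One plausible reading, not stated explicitly in the source, is that a virtual ID coinciding with an ID held by the second business party device ends up in the intersection, so the second party cannot tell real matches from padding, while the first party can separate them using the recorded virtual ID set.

```python
def intersect_with_virtual(first_ids, virtual_ids, second_ids):
    """Return (intersection, virtual ID set, real part of the intersection)."""
    # Merge the virtual samples into the first sample set.
    merged = set(first_ids) | set(virtual_ids)
    # Record which IDs in the merged set are virtual.
    virtual_id_set = set(virtual_ids) - set(first_ids)
    # Traverse the second sample set and keep IDs present in the merged set.
    intersection = {x for x in second_ids if x in merged}
    # Only the first party, which knows virtual_id_set, can strip the padding.
    real_part = intersection - virtual_id_set
    return intersection, virtual_id_set, real_part
```

For example, `intersect_with_virtual([1, 2, 3, 4], [7, 8], [2, 4, 7, 9])` returns the intersection `{2, 4, 7}`, of which only `{2, 4}` are real matches.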
[0030] In the above scheme,
[0031] The information processing module is used to trigger a target application process in response to the device types of the first business party device and the second business party device.
[0032] The information processing module is used to determine the data intersection of the first sample set and the second sample set based on the target application process;
[0033] The information processing module is used to obtain, through the target application process, a first virtual sample set corresponding to the first business party device and a second virtual sample set corresponding to the second business party device;
[0034] The information processing module is used to determine, through the target application process, a virtual sample that matches the first business party device based on the data intersection of the first sample set and the second sample set, the first virtual sample set, and the second virtual sample set.
[0035] In the above scheme,
[0036] The information processing module is used to merge the virtual sample with the first sample set to form a first sample set containing the virtual sample.
[0037] The information processing module is used to traverse the first sample set containing virtual samples and determine the ID set of the virtual samples.
[0038] The information processing module is used to traverse the second sample set to determine the intersection of the first sample set containing the virtual samples and the second sample set.
[0039] In the above scheme,
[0040] The information processing module is used to send different public keys to the corresponding business party devices based on the first key set and the second key set, so as to obtain the initial parameters of the federated model.
[0041] The information processing module is used to determine the number of samples corresponding to the mini-batch gradient descent algorithm that matches the business data processing system.
[0042] The information processing module is used to process the intersection of the sample sets according to the number of samples corresponding to the mini-batch gradient descent algorithm, and determine the training samples that match the business data processing system.
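Paragraphs [0041] and [0042] split the sample-set intersection according to the mini-batch size. A minimal sketch, assuming both business party devices hold the same intersection IDs and agree on a shared random seed so that their batches line up without exchanging feature data; `make_batches` and its `seed` parameter are hypothetical.

```python
import random

def make_batches(intersection_ids, batch_size, seed=0):
    """Split the aligned ID intersection into mini-batches of `batch_size`.

    Both parties call this with the same IDs and seed, so they obtain the
    same batches. The last batch may be smaller than batch_size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ids = sorted(intersection_ids)      # canonical order on both sides
    random.Random(seed).shuffle(ids)    # identical shuffle given the shared seed
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]
```

Sorting before shuffling makes the result independent of the order in which each party happens to store its IDs.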
[0043] In the above scheme,
[0044] The information processing module is used to substitute the training samples that match the business data processing system into the loss function corresponding to the federated model of the business data processing system.
[0045] The information processing module is used to determine the model update parameters of the federated model of the business data processing system when the loss function satisfies the corresponding convergence condition.
[0046] The information processing module is used to determine the model parameters of the federated model based on the model update parameters corresponding to the federated model.
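The model, loss function, and convergence condition in [0044] to [0046] are not specified. The sketch below assumes logistic regression with log loss, trained by mini-batch gradient descent, and treats an epoch-to-epoch loss change below `tol` as the convergence condition. For brevity, both parties' features are concatenated in one process; in an actual vertical federated setting each party would compute its own share of the prediction and the exchanged terms would be protected. `train`, `lr` and `tol` are hypothetical names.

```python
import math
import random

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_loss(w, X, y):
    """Mean logistic loss of weights w on features X and 0/1 labels y."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)

def train(X, y, lr=0.5, batch_size=4, tol=1e-6, max_epochs=1000, seed=0):
    """Mini-batch gradient descent; stops once the loss changes by less than tol."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    loss = prev = log_loss(w, X, y)
    order = list(range(len(X)))
    for _ in range(max_epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad = [0.0] * len(w)
            for i in batch:
                residual = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i]))) - y[i]
                for j, xij in enumerate(X[i]):
                    grad[j] += residual * xij
            # Model update parameters for this mini-batch.
            w = [wj - lr * gj / len(batch) for wj, gj in zip(w, grad)]
        loss = log_loss(w, X, y)
        if abs(prev - loss) < tol:  # assumed convergence condition
            break
        prev = loss
    return w, loss
```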
[0047] In the above scheme,
[0048] The information processing module is used, when the federated model corresponding to the business data processing system is trained based on the training samples that match the business data processing system, to adjust through the first business party device the residuals of the virtual samples associated with the model update parameters of the business data processing system, so as to adjust the influence of the virtual samples on the model parameters of the federated model.
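Paragraph [0048] adjusts the residuals of the virtual samples so that they do not distort the federated model. A minimal sketch, assuming a logistic-regression-style model in which the first business party device computes the residuals (prediction minus label) and knows which samples are virtual: setting those residuals to zero removes the virtual samples from the gradient, so the result equals the gradient over the real samples only. Zeroing is one possible adjustment; the source does not specify the exact rule, and the helper names are hypothetical.

```python
import math

def residuals(w, X, y):
    """Residuals sigmoid(w . x) - y of a logistic model (toy, no overflow guard)."""
    out = []
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        out.append(1.0 / (1.0 + math.exp(-z)) - yi)
    return out

def adjust_virtual_residuals(res, is_virtual):
    """The first business party zeroes the residuals of its virtual samples."""
    return [0.0 if virtual else r for r, virtual in zip(res, is_virtual)]

def gradient(X, res, n_real):
    """Gradient X^T res, normalized by the number of real samples."""
    grad = [0.0] * len(X[0])
    for xi, r in zip(X, res):
        for j, xij in enumerate(xi):
            grad[j] += r * xij
    return [g / n_real for g in grad]
```

For instance, with `is_virtual = [False, True, False]`, `gradient(X, adjust_virtual_residuals(residuals(w, X, y), is_virtual), 2)` matches the gradient computed from the two real rows alone.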
[0049] In the above scheme,
[0050] The information processing module is used to trigger the target application process when training the federated model corresponding to the business data processing system based on training samples that match the business data processing system.
[0051] The information processing module is used to adjust, based on the target application process, the residuals of the virtual samples associated with the model update parameters, so as to adjust the influence of the virtual samples on the model parameters of the federated model.
[0052] This invention also provides an electronic device, the electronic device comprising:
[0053] a memory, configured to store executable instructions; and
[0054] a processor, configured to implement the aforementioned business data processing method when executing the executable instructions stored in the memory.
[0055] This application also provides a computer program product or computer program including computer instructions stored in a computer-readable storage medium. The processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the electronic device to perform various alternative implementations of the above-described business data processing method, including different embodiments and combinations thereof.
[0056] This invention also provides a computer-readable storage medium storing executable instructions, which, when executed by a processor, implement the aforementioned business data processing method.
[0057] The embodiments of the present invention have the following beneficial effects:
[0058] In the embodiments of the present invention, a first sample set matching a first business party device in a business data processing system and a second sample set matching a second business party device in the same system are obtained, where the business data processing system includes at least the first business party device and the second business party device. Virtual samples matching the first business party device are determined based on the first sample set, and the intersection of the sample sets is determined based on these virtual samples and the second sample set matching the second business party device. A first key set matching the first business party device and a second key set matching the second business party device are determined, and the intersection of the sample sets is processed with the two key sets to determine training samples matching the business data processing system. A federated model corresponding to the business data processing system is then trained on these training samples to determine the federated model parameters. Therefore, without any data exchange, the task of determining the federated model parameters is completed at reduced computational cost, the efficiency of business data processing is improved, business data processing can be implemented on mobile devices, users' waiting time is saved, and private data is protected from leakage.
Brief Description of the Drawings
[0059] Figure 1 is a schematic diagram of the usage environment of the business data processing method provided in an embodiment of the present invention;
[0060] Figure 2 is a schematic diagram of the composition structure of the business data processing device provided in an embodiment of the present invention;
[0061] Figure 3 is an optional flowchart of the business data processing method provided in an embodiment of the present invention;
[0062] Figure 4 is a schematic diagram of the business data processing process of the business data processing method provided in an embodiment of the present invention;
[0063] Figure 5 is a schematic diagram of the business data processing process of the business data processing method provided in an embodiment of the present invention;
[0064] Figure 6 is a schematic diagram of the business data processing process of the business data processing method provided in an embodiment of the present invention;
[0065] Figure 7 is an optional flowchart of the business data processing method in an embodiment of the present invention;
[0066] Figure 8 is a schematic diagram of the business data processing process of the business data processing method provided in an embodiment of the present invention;
[0067] Figure 9 is an optional flowchart of the business data processing method in an embodiment of the present invention.
Detailed Description of the Embodiments
[0068] To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limitations on the present invention. All other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0069] In the following description, references are made to “some embodiments,” which describe a subset of all possible embodiments. However, it is understood that “some embodiments” may be the same subset or different subsets of all possible embodiments and may be combined with each other without conflict.
[0070] Before the embodiments of the present invention are described in further detail, the nouns and terms involved in them are explained; these nouns and terms shall be interpreted as follows.
[0071] 1) Business party device: includes but is not limited to ordinary business party devices and dedicated business party devices. An ordinary business party device maintains a long connection and/or a short connection with the transmission channel, while a dedicated business party device maintains a long connection with the transmission channel and may be a server.
[0072] 2) Client: the carrier that implements specific functions on a business party device. For example, a mobile client (APP) is the carrier of specific functions on the business party device.
[0073] 3) In response to: indicates the condition or state on which an executed operation depends. When that condition or state is met, the one or more dependent operations may be performed in real time or after a set delay. Unless otherwise specified, there is no restriction on the order in which multiple operations are performed.
[0074] 4) Federated learning: a machine learning framework that helps multiple organizations use data and build machine learning models while meeting the requirements of user privacy protection, data security, and government regulations. Federated learning can effectively solve the data silo problem: participants jointly build models without sharing their own data, which technically breaks down data silos and enables collaboration.
[0075] 5) Blockchain is an encrypted, chain-like storage structure for transactions formed by blocks.
[0076] For example, the header of each block can include the hash values of all transactions in the block as well as the hash values of all transactions in the previous block, so that tampering with and forgery of the transactions in a block can be prevented on the basis of these hash values. Newly generated transactions are filled into a block and, after consensus among the nodes in the blockchain network, are appended to the tail of the blockchain, so that the chain grows.
[0077] 6) A blockchain network is a collection of nodes that incorporate new blocks into a blockchain through consensus. Each business device can act as a different blockchain node in the blockchain network.
[0078] 7) Model parameters: a set of general variables used to establish the relationship between a function and its variables. In artificial neural networks, model parameters are typically real-valued matrices.
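The hash linkage described in term 5) and [0076] can be sketched as follows. This is a simplified illustration, not a production blockchain: a flat SHA-256 over each block's transactions stands in for a Merkle root, the header records that digest together with the previous block's digest, and consensus among nodes is not modeled. `tx_digest`, `append_block` and `verify` are hypothetical names.

```python
import hashlib
import json

def tx_digest(transactions):
    """Hash over all transactions in a block (stand-in for a Merkle root)."""
    payload = json.dumps(transactions, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, transactions):
    """Append a block whose header links to the previous block's transaction digest."""
    prev_digest = chain[-1]["header"]["tx_digest"] if chain else "0" * 64
    header = {"tx_digest": tx_digest(transactions), "prev_tx_digest": prev_digest}
    chain.append({"header": header, "transactions": list(transactions)})
    return chain

def verify(chain):
    """Return True only if no block's transactions or links were altered."""
    prev_digest = "0" * 64
    for block in chain:
        header = block["header"]
        if header["prev_tx_digest"] != prev_digest:
            return False  # link to the previous block is broken
        if header["tx_digest"] != tx_digest(block["transactions"]):
            return False  # transactions no longer match the header
        prev_digest = header["tx_digest"]
    return True
```

Altering a transaction invalidates its own block's digest, and recomputing that digest to hide the change breaks the link stored in the next block's header.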
[0079] Figure 1 is a schematic diagram of a usage scenario of the business data processing method provided in an embodiment of the present invention. Referring to Figure 1, the business party devices (including business party devices 10-1 and 10-2) are provided with software clients capable of displaying the corresponding resource transaction data, for example clients or plug-ins for conducting financial activities with virtual or physical resources or for paying with virtual resources. Users can obtain and display resource transaction data through these clients and trigger a corresponding fraud detection process when virtual resources change (for example, WeChat Pay or a financial lending process within WeChat). In this process, the data processing apparatus deployed on the server assesses the user's risk. The aim is to use the business data processing results of other institutions for auxiliary analysis without acquiring the private data of those institutional nodes, and to determine the risk level of the target user (for example, whether to proceed with lending) from the corresponding prediction results. Different business party devices can connect directly to business party device 200.
[0080] Of course, the business data processing device provided by this invention can be applied to environments where financial activities are conducted using virtual or physical resources, or where payments are made using physical financial resources (including but not limited to environments where various types of physical financial resources change, electronic payment shopping environments, and anti-fraud environments during e-commerce shopping) or where information is exchanged using social software. In various types of financial activities conducted using physical financial resources or where payments are made using virtual resources, financial information from different data sources is usually processed, and finally, the target business data of the business data processing system, determined by the sorting results of the test samples, is presented on the user interface (UI) of the business device.
[0081] In some embodiments of the present invention, the business data processing can be performed by a computing platform. The computing platform can be located on a trusted third-party device, on one of the multiple data parties, or distributed across multiple data parties, and it can interact with each data party. The multiple business parties in Figure 1 (which may be data party servers holding different business data) can be data parties of the same category, for example all banking data parties or all shopping-platform data parties. They can also be data parties of different categories: for example, business party device 10-1 is a shopping platform data party and business party device 10-2 is a lending platform data party, or, in the above example, business party device 10-1 is the data owner of contact information and business party device 10-2 is a service provider, and so on. In business data processing scenarios, the business data provided by these data parties is usually of the same type. For example, if business party device 10-1 is a shopping platform data party linked to a payment bank card number and business party device 10-2 is a lending platform data party linked to a withdrawal and repayment bank card number, the business data provided by both parties could be bank card numbers together with transfer or lending information. If both the shopping platform and the lending platform hold registered users' phone numbers, the business data provided by both parties could also be phone numbers. In other business scenarios, the business data can contain other types of data, which are not listed here.
[0082] As an example, either business device 200 or business device 10-1 can be used to deploy a business data processing device to implement the business data processing method provided by this invention. This method involves: acquiring a first sample set matching a first business device in the business data processing system, and a second sample set matching a second business device in the same system, wherein the business data processing system includes at least a first business device and a second business device; determining virtual samples matching the first business device based on the first sample set; determining the intersection of the sample sets based on the virtual samples matching the first business device and the second sample set matching the second business device; determining a first key set matching the first business device and a second key set matching the second business device; processing the intersection of the sample sets using the first key set and the second key set to determine training samples matching the business data processing system; and training the federated model corresponding to the business data processing system based on the training samples matching the business data processing system to determine the parameters of the federated model.
[0083] The structure of the business data processing device according to an embodiment of the present invention is described in detail below. The business data processing device can be implemented in various forms, such as a dedicated business party device with business data processing capabilities, or a server or server group equipped with business data processing capabilities, for example a business information processing process (such as a preprocessor) deployed in business party device 10-1, or business party device 200 shown in Figure 1. Figure 2 is a schematic diagram of the composition structure of the business data processing device provided in an embodiment of the present invention. It can be understood that Figure 2 shows only an exemplary structure of the business data processing device rather than its entire structure; part or all of the structure shown in Figure 2 may be implemented as needed.
[0084] The business data processing apparatus provided in this embodiment of the invention includes at least one processor 201, a memory 202, a user interface 203, and at least one network interface 204. The components of the business data processing apparatus are coupled together via a bus system 205, which implements communication between these components. In addition to a data bus, the bus system 205 also includes a power bus, a control bus, and a status signal bus. However, for clarity, the various buses are all labeled as bus system 205 in Figure 2.
[0085] The user interface 203 may include a monitor, keyboard, mouse, trackball, click wheel, buttons, touchpad, or touch screen.
[0086] It can be understood that memory 202 can be volatile memory, non-volatile memory, or both. In this embodiment of the invention, memory 202 can store data to support the operation of the business party device (e.g., 10-1). Examples of such data include any computer programs for operating on the business party device (e.g., 10-1), such as an operating system and application programs. The operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, used to implement basic services and handle hardware-based tasks. The application programs can include various applications.
[0087] In some embodiments, the business data processing apparatus provided in this invention can be implemented using a combination of hardware and software. For example, the business data processing apparatus provided in this invention can be a processor in the form of a hardware decoding processor, which is programmed to execute the business data processing method provided in this invention. For instance, the processor in the form of a hardware decoding processor can employ one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.
[0088] As an example of the business data processing device provided in this embodiment of the invention, which is implemented by combining software and hardware, the business data processing device provided in this embodiment of the invention can be directly embodied as a combination of software modules executed by processor 201. The software modules can be located in a storage medium, which is located in memory 202. Processor 201 reads the executable instructions included in the software modules in memory 202 and combines them with necessary hardware (e.g., including processor 201 and other components connected to bus 205) to complete the business data processing method provided in this embodiment of the invention.
[0089] As an example, processor 201 can be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., wherein the general-purpose processor can be a microprocessor or any conventional processor, etc.
[0090] As an example of the hardware implementation of the business data processing device provided in the embodiments of the present invention, the device provided in the embodiments of the present invention can be directly executed by a processor 201 in the form of a hardware decoding processor. For example, it can be executed by one or more application specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components to implement the business data processing method provided in the embodiments of the present invention.
[0091] In this embodiment of the invention, the memory 202 is used to store various types of data to support the operation of the business data processing device. Examples of such data include any executable instructions for operation on the business data processing device; a program implementing the business data processing method of this embodiment of the invention may be included in these executable instructions.
[0092] In other embodiments, the business data processing apparatus provided in this invention can be implemented in software. Figure 2 shows a business data processing apparatus stored in memory 202, which may be software in the form of programs, plug-ins, and the like, and includes a series of modules. As an example of a program stored in memory 202, the business data processing apparatus may include the following software modules:
[0093] The information transmission module 2081 is used to acquire a first sample set that matches a first business party device in the business data processing system, and a second sample set that matches a second business party device in the business data processing system, wherein the business data processing system includes at least a first business party device and a second business party device.
[0094] Information processing module 2082 is used to determine a virtual sample that matches the first business device based on the first sample set.
[0095] The information processing module 2082 is used to determine the intersection of sample sets based on the virtual sample matching the first business device and the second sample set matching the second business device.
[0096] The information processing module 2082 is used to determine a first key set that matches the first business device and a second key set that matches the second business device.
[0097] The information processing module 2082 is used to process the intersection of the sample set using the first key set and the second key set to determine training samples that match the business data processing system.
[0098] The information processing module 2082 is used to train the federated model corresponding to the business data processing system based on training samples that match the business data processing system, and to determine the parameters of the federated model.
[0099] Based on the electronic device shown in Figure 2, one aspect of this application also provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the various alternative implementations of the business data processing method described above, including their different embodiments and combinations.
[0100] The business data processing method provided in this embodiment of the invention is described with reference to the business data processing apparatus shown in Figure 2. Before introducing the method, business data processing in the financial risk control scenario in the related art is described first. Because there are many business types, each user may hold different network data, and some users may hold labels for some nodes in the network. To protect privacy, however, users generally neither share data with each other nor exchange user data in order to process business data across different business devices. For example, in a bank risk control scenario, Bank A wants to obtain a risk ranking of its current personal loan applicants. Bank A has historically identified low-risk customers, while another bank, Bank B, holds the fund transfer relationships of the same customers. In this case, Bank A can calculate the risk level of a target customer using Bank B's fund transfer relationships and its own low-risk customer labels without accessing Bank B's fund transfer data. Directly exchanging user data would also yield the target customer's risk level, but it would leak users' private data.
[0101] To address the aforementioned deficiencies, refer to Figure 3, which is an optional flowchart illustrating the business data processing method provided in an embodiment of the present invention. It can be understood that the steps shown in Figure 3 can be performed by various electronic devices running the business data processing apparatus, such as servers or server clusters that handle business data, or business-side devices that handle business processes. Specifically, the steps include:
[0102] Step 301: The business data processing device acquires a first sample set that matches the first business party device in the business data processing system, and a second sample set that matches the second business party device in the business data processing system.
[0103] The business data processing system includes at least a first business party device and a second business party device. Specifically, the business party devices in the system can be used in scenarios where multiple data providers jointly answer a multi-party joint query statement (including joint queries over private data), or in vertical federated learning. Vertical federated learning applies when two datasets overlap substantially in users but only slightly in user features: the datasets are split vertically (that is, along the feature dimension), and the portion of data in which the users are the same but the user features are not entirely identical is used for training. For example, consider two institutions in the same location: a bank and an e-commerce platform. Their user groups are likely to include most residents of that location, so the user overlap is large. However, the bank records users' income and expenditure behavior and credit ratings, while the e-commerce platform retains users' browsing and purchase history, so their user features overlap little. Vertical federated learning aggregates these different features in an encrypted state to enhance the model's capabilities.
[0104] Specifically, each data provider stores its data in its own data storage system or cloud server, and the original data information that each provider needs to disclose may be different. The business data processing method provided in this application allows for the exchange of processing results of various privacy-related data processed by different business devices, while ensuring that the original data of each business device is not leaked during this process. The calculation results are disclosed to each provider to guarantee that each business device can obtain the corresponding target business data in a timely and accurate manner.
[0105] In some embodiments of the present invention, obtaining a first sample set matching a first business device in the business data processing system and a second sample set matching a second business device in the business data processing system can be achieved in the following ways:
[0106] Based on the service type of the first service provider's device in the business data processing system, a sample set matching the first service provider's device is determined; based on the service type of the second service provider's device, a sample set matching the second service provider's device is determined; and sample alignment processing is performed on these two sample sets to obtain the first sample set matching the first service provider's device and the second sample set matching the second service provider's device. Refer to Figure 4, which is a schematic diagram of the business data processing process of the business data processing method provided in this embodiment of the invention. Participant A and participant B of the business data processing system each possess a training feature dataset, and each holds only part of the data features. Through vertical federated learning, participants A and B can expand the dimensions of these data features or obtain data label information in order to train better models. For example, in a two-party vertical federated learning model, participant A (e.g., an advertising company) and participant B (e.g., a social network platform) collaborate to jointly train one or more deep learning-based personalized recommendation models. Participant A possesses some data features, such as (X1, X2, …, X40), totaling 40 dimensions, while participant B possesses another set of data features, such as (X41, X42, …, X100), totaling 60 dimensions. Combined, participants A and B possess more data features; for example, their combined features reach 100 dimensions, significantly expanding the feature dimensions of the training data.
For supervised deep learning, participant A and / or participant B also possess the label information Y of the training data.
[0107] In some embodiments of the present invention, one of the two participants has no feature data. For example, participant A has no feature data, only label information.
[0108] Before training the vertical federated learning model, participants A and B need to align their training data and label information and filter out the intersection of their training data IDs, that is, obtain the sample IDs common to both of their datasets. For example, if participants A and B hold feature information XA and XB, respectively, for the same bank customer, these must be aligned, i.e., concatenated during model training to form one training sample (XA, XB). Concatenating feature information belonging to different bank customers is meaningless and cannot produce a valid training sample.
[0109] Refer to Figure 5, which is a schematic diagram of the business data processing process of the business data processing method provided in this embodiment of the invention. The training sample IDs shared by participant A and participant B must be found (this process is also called sample alignment, data alignment, or secure set intersection); that is, the common customers of participants A and B, namely customers U1, U2, and U7, must be identified. For example, the IDs of customers shared by a bank and an e-commerce company can generally be identified using the hash values of mobile phone numbers or ID card numbers.
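The alignment in [0106]-[0108] can be pictured with a small plaintext sketch. It assumes, purely for illustration, that each party's features sit in a dictionary keyed by a shared sample ID and that the common IDs are already known; `concat_aligned` is a hypothetical helper, and the plain set intersection stands in for the secure set intersection, since in the protocol neither party sees the other's ID list. It shows why rows must be joined on the same ID: participant A's 40 features and participant B's 60 features combine into one 100-dimensional training sample per common customer.

```python
from typing import Dict, List, Tuple


def concat_aligned(
    features_a: Dict[str, List[float]],
    features_b: Dict[str, List[float]],
    common_ids: List[str],
) -> List[Tuple[str, List[float]]]:
    """Build (X_A, X_B) rows for IDs both parties hold.

    Rows follow the order of common_ids, so both parties must agree on that
    order; joining rows of different customers gives meaningless samples.
    """
    rows = []
    for sample_id in common_ids:
        # A contributes X1..X40 and B contributes X41..X100 for the same customer.
        rows.append((sample_id, features_a[sample_id] + features_b[sample_id]))
    return rows


# Toy data: A holds 40 features, B holds 60 features per customer.
features_a = {u: [1.0] * 40 for u in ["U1", "U2", "U3", "U7"]}
features_b = {u: [2.0] * 60 for u in ["U1", "U2", "U5", "U7"]}
# Plaintext stand-in for secure set intersection.
common = sorted(set(features_a) & set(features_b))
samples = concat_aligned(features_a, features_b, common)
```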
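Paragraph [0109] mentions matching customers through hashes of phone or ID numbers. Plain hashes of such low-entropy identifiers can be reversed by enumeration, so secure set intersection is commonly built on a commutative ("exchange") encryption such as Diffie-Hellman-style exponentiation, the conventional structure the background section refers to. The sketch below simulates that conventional construction for two parties in one process. It is not the patent's own method; the modulus 2**127 - 1 and all function names are illustrative choices, and the parameters are not secure.

```python
import hashlib
import secrets
from typing import List

# Toy prime modulus (Mersenne prime 2**127 - 1). Illustration only: real
# deployments use a standardized prime-order group or an elliptic curve.
P = 2**127 - 1


def hash_to_int(identifier: str) -> int:
    """Hash an ID (e.g. a phone number) to an integer in [2, P-1]."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 2) + 2


def blind(values: List[int], secret: int) -> List[int]:
    """Exponentiation mod P commutes: (h^a)^b == (h^b)^a (mod P)."""
    return [pow(v, secret, P) for v in values]


def commutative_psi(ids_a: List[str], ids_b: List[str]) -> List[str]:
    """Simulate both parties; return participant A's view of the intersection."""
    key_a = secrets.randbelow(P - 3) + 2  # A's private exponent
    key_b = secrets.randbelow(P - 3) + 2  # B's private exponent
    # Round 1: each party blinds its own hashed IDs and sends them over.
    a_once = blind([hash_to_int(x) for x in ids_a], key_a)
    b_once = blind([hash_to_int(x) for x in ids_b], key_b)
    # Round 2: B re-blinds A's values (order preserved, returned to A) and
    # A re-blinds B's values, so equal IDs now map to the same h^(ab).
    a_twice = blind(a_once, key_b)
    b_twice = set(blind(b_once, key_a))
    return [x for x, v in zip(ids_a, a_twice) if v in b_twice]
```

The modular exponentiations in this construction are exactly the cost the background section describes as expensive for large data volumes on mobile devices.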
[0110] Step 302: The business data processing device determines a virtual sample that matches the first business device based on the first sample set.
[0111] Step 303: The business data processing device determines the intersection of the sample sets based on the virtual sample matching the first business device and the second sample set matching the second business device.
[0112] In some embodiments of the present invention, determining a virtual sample matching the first service provider's device based on the first sample set can be achieved in the following ways:
[0113] The first service provider device determines the value parameters and distribution parameters of the sample IDs in the first sample set and, based on them, generates virtual samples that match the first service provider device. Specifically, the virtual samples can be merged with the first sample set to form a first sample set containing virtual samples; this set is traversed to determine the set of virtual sample IDs; and the second sample set is traversed to determine the intersection of the first sample set containing virtual samples and the second sample set. In particular, as shown in Figure 4 and Figure 5, participant A randomly generates some virtual sample IDs (and corresponding sample features) based on the values and distribution of the sample IDs it possesses. Participant A takes the union of its real sample ID set and the generated virtual sample ID set and performs a secure set intersection with participant B's sample ID set to obtain the intersection I. The intersection I contains both virtual IDs and real IDs of participant A. Although participants A and B both know the sample IDs in the intersection I, the virtual sample IDs obfuscate the real ones, which prevents participant B from knowing participant A's real sample IDs precisely.
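Paragraph [0113] leaves open how virtual IDs are drawn from "the values and distribution" of the real IDs. One simple reading, assumed here, is to draw uniformly from the range spanned by participant A's real numeric IDs (for example, phone numbers), never reusing a real ID. Plain set operations stand in for the secure set intersection, and `make_virtual_ids` and `obfuscated_intersection` are hypothetical helpers.

```python
import random
from typing import Set


def make_virtual_ids(real_ids: Set[int], count: int, rng: random.Random) -> Set[int]:
    """Draw `count` fake IDs uniformly from [min(real_ids), max(real_ids)],
    skipping real IDs (uniform sampling is an illustrative assumption)."""
    lo, hi = min(real_ids), max(real_ids)
    available = (hi - lo + 1) - len(real_ids)
    if count > available:
        raise ValueError("ID range too small for the requested number of virtual IDs")
    virtual: Set[int] = set()
    while len(virtual) < count:
        candidate = rng.randint(lo, hi)  # inclusive on both ends
        if candidate not in real_ids:
            virtual.add(candidate)
    return virtual


def obfuscated_intersection(real_a: Set[int], virtual_a: Set[int], ids_b: Set[int]) -> Set[int]:
    """Intersection I of A's real-plus-virtual IDs with B's IDs (plaintext stand-in for PSI)."""
    return (real_a | virtual_a) & ids_b
```

Because participant B cannot tell which IDs in I are virtual, it cannot pin down A's real IDs, while participant A, which generated the virtual IDs, can.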
[0114] In some embodiments of the present invention, determining the virtual sample matching the first service provider's device based on the first sample set can also be achieved in the following ways:
[0115] In response to the device types of the first and second service provider devices, a target application process is triggered. Based on the target application process, the data intersection of the first sample set and the second sample set is determined. A first virtual sample set corresponding to the first service provider device and a second virtual sample set corresponding to the second service provider device are obtained through the target application process, and the virtual sample matching the first service provider device is determined through the target application process based on the data intersection of the first and second sample sets, the first virtual sample set, and the second virtual sample set. Specifically, the virtual samples are merged with the first sample set to form a first sample set containing virtual samples; this set is traversed to determine the ID set of the virtual samples; and the second sample set is traversed to determine the intersection of the first sample set containing virtual samples and the second sample set. In detail, building on Figure 4 and Figure 5, refer to Figure 6, which is a schematic diagram of the business data processing process of the business data processing method provided in this embodiment of the invention. Participant A and participant B, with the help of a third party or a trusted execution environment acting as the target process, perform private set intersection (PSI) on their sample IDs to generate the sample ID intersection I1. The intersection I1 contains only real common sample IDs and no virtual sample IDs.
[0116] Here, the third party is referred to as participant C, as in the example shown in Figure 6. In this step, participants A and B can choose to encrypt (or hash) their sample IDs before sending them to participant C. If encrypted transmission is chosen, participants A and B first negotiate and use the same key, for example, the same RSA public key. Note that if encryption is chosen, participant C receives only the encrypted sample IDs and cannot decrypt them.
[0117] Participant C calculates the intersection of the sample ID sets sent by participants A and B, which can be done by simple comparison. After obtaining the sample ID intersection I1, participant C does not send the contents of I1 to participants A and B; it only tells them the number of elements in I1. Therefore, neither participant A nor participant B knows the specific sample IDs in their common intersection I1. When I1 contains too few elements, vertical federated learning cannot be performed.
[0118] Participants A and B each generate virtual sample IDs (and corresponding virtual sample features). Using their real sample ID sets together with the generated virtual ID sets, they perform a two-party secure set intersection to obtain the intersection I2, which contains virtual sample IDs. Both participants know the IDs in I2, but because I2 contains virtual sample IDs, neither participant knows the other's exact sample IDs.
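The role of participant C in [0116] and [0117] can be sketched as follows. It assumes A and B hash their IDs with a key they have negotiated between themselves; the patent allows either encryption with a shared key or hashing, and an HMAC is used here for brevity. `ThirdPartyC`, `keyed_hash_ids`, and the `min_size` threshold are hypothetical names. C keeps I1 to itself and reports only its size and whether it is large enough for training.

```python
import hashlib
import hmac
from typing import Iterable, Set, Tuple


def keyed_hash_ids(ids: Iterable[str], shared_key: bytes) -> Set[str]:
    """A and B hash IDs with a key that C does not know, so C cannot recover raw IDs."""
    return {hmac.new(shared_key, i.encode("utf-8"), hashlib.sha256).hexdigest() for i in ids}


class ThirdPartyC:
    def __init__(self, min_size: int) -> None:
        self.min_size = min_size      # smallest intersection worth training on
        self._i1: Set[str] = set()    # I1 never leaves participant C

    def intersect(self, hashed_a: Set[str], hashed_b: Set[str]) -> Tuple[int, bool]:
        """Compute I1 by simple comparison; reveal only |I1| and whether it suffices."""
        self._i1 = hashed_a & hashed_b
        size = len(self._i1)
        return size, size >= self.min_size
```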
[0119] In some embodiments of the present invention, in order to ensure that the intersection I2 of the sample IDs includes virtual sample IDs, the virtual sample IDs generated by participant A and participant B need to have an intersection with each other's real sample IDs. To ensure this, participant A and participant B can be required to randomly generate virtual sample IDs within the same ID value space. For example, participant A and participant B can randomly generate mobile phone numbers within the same mobile phone number range.
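The relationship between I1 and I2 in [0117] to [0119] can be checked with plain sets. The integers below are toy stand-ins for phone numbers drawn from one shared value space, and `intersections` and `virtual_in_i2` are hypothetical helpers. Because I1 holds only real common IDs and each party's virtual IDs avoid its own real IDs, the set difference I2 - I1 is exactly the set of entries involving a virtual ID, which is how participant C can later recognize virtual samples (paragraph [0152]).

```python
from typing import Set, Tuple


def intersections(real_a: Set[int], virt_a: Set[int],
                  real_b: Set[int], virt_b: Set[int]) -> Tuple[Set[int], Set[int]]:
    i1 = real_a & real_b                        # computed by C on real IDs only
    i2 = (real_a | virt_a) & (real_b | virt_b)  # two-party intersection with padding
    return i1, i2


def virtual_in_i2(i1: Set[int], i2: Set[int]) -> Set[int]:
    """Entries of I2 that are virtual for at least one party."""
    return i2 - i1


# A's virtual ID 5 hits B's real ID 5, and B's virtual ID 1 hits A's real ID 1;
# such hits only happen because both sides draw from the same value space.
real_a, virt_a = {1, 2, 3, 4}, {5, 9}
real_b, virt_b = {3, 4, 5, 6}, {1, 10}
i1, i2 = intersections(real_a, virt_a, real_b, virt_b)
```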
[0120] Step 304: The business data processing device determines a first key set that matches the first business party device and a second key set that matches the second business party device.
[0121] Step 305: The business data processing device processes the intersection of the sample set using the first key set and the second key set to determine the training samples that match the business data processing system.
[0122] Processing the intersection of the sample sets using the first key set and the second key set to determine the training samples includes: exchanging public keys between the corresponding business terminals based on the first and second key sets to obtain the initial parameters of the federated model; determining the number of samples matching the business data processing system; and processing the intersection of the sample sets according to that number of samples to determine the training samples. Specifically, processing the intersection according to the number of samples used by the mini-batch gradient descent algorithm includes choosing between batch processing and mini-batch processing. In particular, participants A and B each generate a public-private key pair and send the public key to the other party. No participant discloses its private key to any other participant. The public keys are used to perform additively homomorphic encryption on intermediate computation results, for example, using the Paillier homomorphic encryption algorithm.
[0123] Participants A and B each generate a random mask (R2 and R1, respectively; see step 703 below). No participant discloses any random mask in plaintext to other participants. Participants A and B each randomly initialize their local model parameters, W1 and W2, respectively. In the SGD algorithm, to reduce computation, accelerate training, and achieve better training results, each iteration processes only a small batch (mini-batch) of training data; for example, each mini-batch contains 64 training samples. Participants A and B therefore need to coordinate the batching and selection of training samples so that the samples they select in each iteration are aligned.
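Paragraph [0122] names the Paillier algorithm for additively homomorphic encryption of intermediate results. The toy sketch below (textbook Paillier with g = n + 1 and deliberately tiny, insecure primes; all function names are illustrative) shows the two properties such protocols rely on: multiplying ciphertexts adds the plaintexts, and raising a ciphertext to a plaintext power multiplies the plaintext. A production system would use a vetted library and a modulus of at least 2048 bits.

```python
import math
import secrets
from typing import Tuple

# Textbook Paillier with g = n + 1. Tiny primes: demonstration only, not secure.


def keygen(p: int = 293, q: int = 433) -> Tuple[int, Tuple[int, int, int]]:
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return n, (n, lam, mu)  # public key n, private key (n, lambda, mu)


def encrypt(n: int, m: int) -> int:
    n2 = n * n
    while True:  # random r coprime to n
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n mod n^2
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(sk: Tuple[int, int, int], c: int) -> int:
    n, lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n  # L(u) * mu mod n, with L(u) = (u - 1) / n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Enc(m1) * Enc(m2) mod n^2 decrypts to m1 + m2 mod n."""
    return c1 * c2 % (n * n)


def mul_plain(n: int, c: int, k: int) -> int:
    """Enc(m)^k mod n^2 decrypts to k * m mod n."""
    return pow(c, k, n * n)
```

Real-valued intermediate results such as residuals are usually encoded as fixed-point integers modulo n before encryption, with negative values represented as n minus their magnitude; the sketch omits that encoding.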
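Paragraph [0123] requires both participants to pick the same samples in every mini-batch. One simple way to coordinate this, assumed here rather than taken from the patent (which elsewhere has participant A choose the batches), is for both sides to sort the intersection IDs into a canonical order and shuffle them with a seed they have agreed on. `aligned_minibatches` and `shared_seed` are hypothetical names.

```python
import random
from typing import Iterable, List


def aligned_minibatches(intersection_ids: Iterable[str], batch_size: int,
                        shared_seed: str, epoch: int) -> List[List[str]]:
    """Both parties call this with identical arguments and get identical batches."""
    ids = sorted(intersection_ids)                 # canonical order, independent of storage
    rng = random.Random(f"{shared_seed}:{epoch}")  # str seeds are deterministic across runs
    rng.shuffle(ids)                               # a fresh order for each epoch
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]
```

Sorting first matters: without it, two parties storing the same IDs in different orders would shuffle different sequences and end up with misaligned batches.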
[0124] Step 306: The business data processing device trains the federated model corresponding to the business data processing system based on training samples that match the business data processing system, and determines the parameters of the federated model.
[0125] In some embodiments of the present invention, the federated model corresponding to the business data processing system is trained based on training samples that match the business data processing system to determine the parameters of the federated model. This can be achieved in the following ways:
[0126] The training samples matching the business data processing system are substituted into the loss function of the corresponding federated model; model update parameters are determined when the loss function satisfies the corresponding convergence condition; and the model parameters of the federated model are determined based on these model update parameters. To control the influence of the virtual samples on the model parameters of the federated model, when the federated model is trained on these training samples, the residuals corresponding to the virtual samples can be adjusted by the first business device; alternatively, the target application process can be triggered and the residuals corresponding to the virtual samples adjusted based on that process. SGD-based model training requires multiple gradient descent iterations, each of which has two stages: (i) forward computation of the model output and the residuals (also called gradient multipliers); and (ii) backpropagation, i.e., computing the gradient of the model's loss function with respect to the model parameters and updating the parameters with that gradient. These iterations are repeated until a stopping condition is met (e.g., the model parameters converge, the loss function converges, the maximum allowed number of training iterations is reached, or the maximum allowed training time is reached).
[0127] When the residuals corresponding to the virtual samples are adjusted by the first service provider's device, participants A and B perform federated model training on the training samples in the sample intersection I, with participant A responsible for selecting the batches and mini-batches of training samples. To protect its sample IDs, participant A can select some real sample IDs and some virtual sample IDs from the intersection I to form a mini-batch; for example, 32 virtual samples and 32 real samples form the m-th mini-batch of 64 samples. Figure 7 is an optional flowchart illustrating the business data processing method in an embodiment of the present invention. Referring to Figure 7, when participants A and B train the federated model based on the sample intersection, the business data processing may include the following steps:
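The two-stage SGD iteration and the stopping conditions in [0126] can be illustrated without any encryption on one party's data. The sketch assumes a logistic regression model with cross-entropy loss (the model used in the later steps), a fixed learning rate, and two of the listed stopping rules: a maximum iteration count and convergence of the loss. `train_logreg_sgd` and `log_loss` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(w: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    eps = 1e-12  # keep log() away from 0
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(sigmoid(sum(a * b for a, b in zip(w, xi))), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


def train_logreg_sgd(X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5,
                     batch_size: int = 2, max_iters: int = 2000,
                     tol: float = 1e-9) -> Tuple[List[float], int]:
    w = [0.0] * len(X[0])
    prev_loss = float("inf")
    iters = 0
    for it in range(max_iters):               # stopping rule: iteration budget
        iters = it + 1
        start = (it * batch_size) % len(X)
        batch = [(start + k) % len(X) for k in range(batch_size)]
        grad = [0.0] * len(w)
        for i in batch:
            # Stage (i): forward pass -> model output and residual (gradient multiplier).
            residual = sigmoid(sum(a * b for a, b in zip(w, X[i]))) - y[i]
            # Stage (ii): backpropagate the residual into the gradient dL/dw.
            for j, xij in enumerate(X[i]):
                grad[j] += residual * xij / batch_size
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
        loss = log_loss(w, X, y)
        if abs(prev_loss - loss) < tol:       # stopping rule: loss has converged
            break
        prev_loss = loss
    return w, iters
```

For example, with a bias column and one feature, `train_logreg_sgd([[1, 0], [1, 3], [1, 1], [1, 2]], [0, 1, 0, 1])` learns a decision boundary between 1 and 2.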
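The batch composition in [0127] can be sketched as follows. It assumes participant A privately records which IDs in the intersection I are real; `mixed_minibatch` is a hypothetical helper, and the 32 + 32 split mirrors the example in the text.

```python
import random
from typing import List, Set, Tuple


def mixed_minibatch(real_ids: Set[str], virtual_ids: Set[str], n_real: int,
                    n_virtual: int, rng: random.Random) -> Tuple[List[str], List[bool]]:
    """Return a shuffled batch of real and virtual IDs plus A's private real/virtual mask."""
    batch = rng.sample(sorted(real_ids), n_real) + rng.sample(sorted(virtual_ids), n_virtual)
    rng.shuffle(batch)  # hide the real/virtual pattern from participant B
    is_real = [sample_id in real_ids for sample_id in batch]  # kept by A, never sent to B
    return batch, is_real
```

Only the batch IDs are shared with participant B; the `is_real` mask is what later lets participant A zero the residuals of the virtual samples.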
[0128] Step 701: Generate a set of keys that match the devices of different business parties.
[0129] Step 702: Transmit public key information.
[0130] Step 703: Participants A and B randomly initialize model parameters W1 and W2 respectively, and generate random masks R2 and R1.
[0131] Step 704: Participants A and B respectively perform homomorphic encryption on random masks R2 and R1, and send them to each other.
[0132] Step 705: Participant A calculates … .
[0133] This calculation uses the m-th batch of training samples owned by participant A. In step 705, participant A also generates a random number r1 and sends … to participant B.
[0134] Step 706: Participant A decrypts and obtains …; participant B decrypts and obtains … .
[0135] Step 707: Participants A and B each perform their respective calculations.
[0136] As a result, … and … are obtained.
[0137] Step 708: Participant A calculates S, the loss function, and the gradient multiplier (also known as the residual).
[0138] Here, S and the gradient multiplier δ are both row vectors, with each element corresponding to one sample in the mini-batch. For example, participant A calculates z = S1 + S2 and the output ŷ of the logistic regression (LogR) model:
[0139] $$\hat{y} = \frac{1}{1 + e^{-z}}$$
[0140] and the gradient multiplier (also known as the residual) $\delta = \hat{y} - y$, where y is the label vector of the mini-batch.
[0141] In this process, participant A uses only the gradient multipliers corresponding to the real samples in a mini-batch to calculate the gradient and update the model parameters: it sets to zero all elements of the gradient multiplier δ that correspond to virtual samples. For example, participant A generates a row vector such as [0, δ1, 0, δ3, … ], where the first and third samples are assumed to be virtual samples.
[0142] In some embodiments of the present invention, participant A divides this masked gradient multiplier by N, where N is the number of real samples in the mini-batch, in order to obtain the mini-batch average gradient. Participant A then encrypts the result with pk1, obtaining pk1(…).
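The effect of zeroing the virtual-sample residuals and dividing by N ([0141] and [0142]) can be checked in plaintext. The sketch assumes a logistic regression model whose partial scores are linear, S1 = X_A W1 and S2 = X_B W2, with residual ŷ - y, and it omits all encryption and masking; the function names are hypothetical. Whatever features and labels the virtual rows carry, the resulting gradients match those computed from the real samples alone.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def masked_residuals(s1: Sequence[float], s2: Sequence[float], y: Sequence[int],
                     is_real: Sequence[bool]) -> List[float]:
    """delta_i = (sigmoid(S1_i + S2_i) - y_i) / N for real samples, 0 for virtual ones."""
    n_real = sum(is_real)  # N: number of real samples in the mini-batch
    out = []
    for a, b, label, real in zip(s1, s2, y, is_real):
        y_hat = 1.0 / (1.0 + math.exp(-(a + b)))
        out.append((y_hat - label) / n_real if real else 0.0)
    return out


def party_gradient(delta: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """One party's gradient: the row vector delta times its mini-batch matrix X."""
    return [sum(d * row[j] for d, row in zip(delta, X)) for j in range(len(X[0]))]


def sgd_step(w: Sequence[float], grad: Sequence[float], lr: float) -> List[float]:
    """Local update W <- W - lr * grad; each party may use its own learning rate."""
    return [wj - lr * gj for wj, gj in zip(w, grad)]
```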
[0143] In step 707, participant A sends … to participant B.
[0144] Step 708: Participant B calculates …, where participant B's mini-batch data matrix has one sample per row and r_B is a random vector generated by participant B.
[0145] Step 709: Participant B sends … to participant A.
[0146] Step 710: The participants transmit S2.
[0147] In some embodiments of the present invention, continue to refer to Figure 8. After the steps shown in Figure 7 are executed, the target application process is triggered. When the residuals corresponding to the virtual samples are adjusted based on the target application process, participants A and B perform federated model training on the sample intersection I2, and participant A is responsible for selecting the batches and mini-batches of training samples. To protect its sample IDs, participant A can select some real sample IDs and some virtual sample IDs from the intersection I2 to form a mini-batch; for example, 32 virtual samples and 32 real samples form a mini-batch of 64 samples.
[0148] Steps 701-710 of this federated model training process are exactly the same as the steps described for Figure 7 and can be executed iteratively. As shown in Figure 7, in step 711, participant A calculates S, the loss function, and the gradient multiplier δ (also called the residual). Here, z and δ are both row vectors, with each element corresponding to one sample in the mini-batch. For example, participant A calculates z = S1 + S2 and the output ŷ of the logistic regression (LogR) model:
[0149] $$\hat{y} = \frac{1}{1 + e^{-z}}$$
[0150] and the gradient multiplier (also known as the residual) $\delta = \hat{y} - y$, where y is the label vector of the mini-batch. The subsequent steps require the assistance of participant C and include:
[0151] Step 712: Participant A sends the gradient multiplier δ to participant C.
[0152] In this scenario, participant C sets to zero all elements of the received gradient multiplier δ that correspond to virtual samples, yielding a row vector such as [0, δ1, 0, δ3, … ], where the first and third samples are assumed to be virtual. Participant C knows the sample IDs in the mini-batch (either encrypted or hashed sample IDs) and can therefore identify the virtual samples by checking them against the intersection I1.
[0153] In some embodiments of the present invention, participant C divides the masked gradient multiplier by N, where N is the number of real samples in the mini-batch. Using the number of real samples yields the mini-batch average gradient and improves data processing speed. Participant C then encrypts the result with its public key pk3, obtaining pk3(…).
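Participant C's masking in [0152] and [0153] differs from participant A's masking in [0141] and [0142] only in how virtual samples are recognized: C checks each (hashed) sample ID of the mini-batch against I1. A plaintext sketch with a hypothetical helper name, omitting the encryption under pk3:

```python
from typing import List, Sequence, Set


def mask_with_i1(batch_ids: Sequence[str], delta: Sequence[float], i1: Set[str]) -> List[float]:
    """Zero residuals of IDs outside I1 (virtual samples) and average over the N real ones."""
    n_real = sum(1 for sid in batch_ids if sid in i1)
    if n_real == 0:
        raise ValueError("mini-batch contains no real samples")
    return [d / n_real if sid in i1 else 0.0 for sid, d in zip(batch_ids, delta)]
```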
[0155] Step 713: Participant C sends … to participant A and participant B.
[0156] Step 714: Participant A calculates … and sends it to participant C.
[0157] Here, r_A is a random vector generated by participant A. Correspondingly, participant B calculates … and sends it to participant C, where r_B is a random vector generated by participant B.
[0158] Step 715: Participant A decrypts … .
[0159] Step 716: Participant A sends … to participant B.
[0160] In step 715, participant C decrypts the value received from participant A and sends the result to participant A. Correspondingly, participant C decrypts the value received from participant B and sends the result to participant B.
[0161] In some embodiments of the present invention, participant A calculates the gradient of the model loss function with respect to its model parameters W1; for a logistic regression (LogR) model, this gradient is given by … .
[0162] Participant A updates its model parameters locally: …, where η is the learning rate (for example, …).
[0163] Participant B calculates the gradient of the model loss function with respect to its model parameters W2; for the logistic regression (LogR) model, this gradient is given by … . Participant B updates its model parameters locally: …, where η is the learning rate (for example, …).
[0164] In some embodiments of the present invention, participant A and participant B may use different learning rates to update their local model parameters respectively.
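For reference, the gradients and local updates of [0161] to [0164] take the standard form below under these assumptions: the model is logistic regression with cross-entropy loss; each participant's partial score is linear in its own features; and the residual has already been zeroed at virtual samples and divided by N, as in [0142] and [0153]. The symbols (X_A and X_B for the mini-batch matrices with one sample per row, W1 and W2 as column vectors, η_A and η_B for the possibly different learning rates) are notation introduced here, not recovered from the original formulas.

```latex
\begin{aligned}
L &= -\frac{1}{N}\sum_{i \in \text{real}} \bigl[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \,\bigr],
\qquad z = (X_A W_1)^{\top} + (X_B W_2)^{\top}, \qquad \hat{y}_i = \frac{1}{1 + e^{-z_i}}, \\
\tilde{\delta}_i &= \frac{\partial L}{\partial z_i} =
\begin{cases} (\hat{y}_i - y_i)/N, & \text{sample } i \text{ real},\\ 0, & \text{sample } i \text{ virtual}, \end{cases} \\
\nabla_{W_1} L &= X_A^{\top} \tilde{\delta}^{\top}, \qquad \nabla_{W_2} L = X_B^{\top} \tilde{\delta}^{\top}, \\
W_1 &\leftarrow W_1 - \eta_A \, \nabla_{W_1} L, \qquad W_2 \leftarrow W_2 - \eta_B \, \nabla_{W_2} L .
\end{aligned}
```

Each participant needs only the shared residual row vector and its own feature matrix, which is why the protocol exchanges (masked, encrypted) residuals rather than raw features.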
[0165] In some embodiments of the present invention, when the business party devices (business data holders) of the business data processing system migrate or reconfigure the system, they can purchase blockchain network services to obtain the information stored in the blockchain network and quickly deploy the business data processing apparatus. For example, business participants A and B in the embodiments can both purchase blockchain network services and become nodes of the blockchain network through their deployed business party devices. The virtual samples, sample set intersection, first key set, second key set, federated model parameters, and target business data can be sent to the blockchain network, whose nodes fill them into a new block; when consensus on the new block is reached, the new block is appended to the end of the blockchain. In some embodiments of the present invention, when a data synchronization request is received from another node in the blockchain network, the permissions of that node are verified in response to the request; once they are verified, the current node is controlled to synchronize data with that node, so that it can obtain the virtual samples, sample set intersection, first key set, second key set, federated model parameters, and target business data.
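The storage flow in [0165] (fill the records into a new block, then append it to the chain once consensus is reached) can be sketched with a minimal hash-chained ledger. Consensus, networking, and node permissions are omitted, the `Ledger` class is hypothetical, and the payloads are placeholders for the record types listed in the paragraph.

```python
import hashlib
import json
from typing import Any, Dict, List


class Ledger:
    """Append-only chain in which each block stores the hash of its predecessor."""

    def __init__(self) -> None:
        self.blocks: List[Dict[str, Any]] = []

    @staticmethod
    def _digest(body: Dict[str, Any]) -> str:
        # Canonical JSON so every node computes the same hash.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

    def append(self, payload: Dict[str, Any]) -> str:
        """Fill a new block with payload and link it to the tail (consensus assumed reached)."""
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = {"index": len(self.blocks), "prev_hash": prev_hash, "payload": payload}
        block = dict(body, hash=self._digest(body))
        self.blocks.append(block)
        return block["hash"]

    def verify(self) -> bool:
        """Recompute every hash and link; any edited block breaks the chain."""
        prev_hash = "0" * 64
        for block in self.blocks:
            body = {k: block[k] for k in ("index", "prev_hash", "payload")}
            if block["prev_hash"] != prev_hash or block["hash"] != self._digest(body):
                return False
            prev_hash = block["hash"]
        return True
```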
[0166] In some embodiments of the present invention, in response to a query request, the query request can be parsed to obtain a corresponding object identifier; based on the object identifier, permission information within the target block in the blockchain network can be obtained; the matching between the permission information and the object identifier can be verified; when the permission information matches the object identifier, the corresponding virtual sample, sample set intersection, first key set, second key set, federated model parameters, and target business data can be obtained in the blockchain network; in response to the query instruction, the obtained corresponding virtual sample, sample set intersection, first key set, second key set, federated model parameters, and target business data can be pushed to the corresponding client.
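The query path in [0166] (parse the request, look up the permission information, check it against the object identifier, then return the records) is sketched below. The request format, the `permissions` mapping, and `handle_query` are hypothetical stand-ins for data that would live in the target block.

```python
from typing import Any, Dict, Set


def handle_query(request: Dict[str, str], permissions: Dict[str, Set[str]],
                 records: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return the stored records for request['object_id'] only if the requester is authorized."""
    object_id = request["object_id"]             # parse the query request
    requester = request["requester"]
    allowed = permissions.get(object_id, set())  # permission info from the target block
    if requester not in allowed:
        raise PermissionError(f"{requester} may not read {object_id}")
    return records[object_id]                    # pushed back to the client
```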
[0167] The embodiments of this invention can be implemented using cloud technology. Cloud technology refers to a hosting technology that unifies hardware, software, network, and other resources within a wide area network (WAN) or local area network (LAN) to achieve data computation, storage, processing, and sharing. It can also be understood as a general term for the network, information, integration, management platform, and application technologies based on the cloud computing business model. The backend services of network systems, such as video websites, image websites, and many portal websites, require substantial computing and storage resources; cloud technology therefore needs cloud computing as its support.
[0168] Cloud computing is a computing model that distributes computing tasks across a resource pool made up of numerous computers, enabling application systems to obtain computing power, storage space, and information services as needed. The network that provides these resources is called the "cloud". From the user's perspective, resources in the "cloud" are infinitely scalable: they are available at any time, used on demand, expanded at any time, and paid for according to usage. Providers of basic cloud computing capabilities establish cloud resource pool platforms, often referred to as cloud platforms or Infrastructure as a Service (IaaS), and deploy various types of virtual resources in the resource pool for external customers to choose from. The cloud resource pool mainly includes computing devices (which can be virtual machines with operating systems), storage devices, and network devices.
[0169] As shown in Figure 1 described above, the data processing method provided in this embodiment of the invention can be implemented through corresponding cloud devices. For example, different business devices (including business device 10-1 and business device 10-2) can be connected directly to business device 200 located in the cloud. Business device 200 can be a physical device or a virtualized device.
[0170] The business data processing method provided in this application is further explained below in different practical scenarios. In a financial risk control scenario, a cross-industry cooperation involves business devices corresponding to credit company A and bank B, respectively. Credit company A receives loan credit verification requests from users, as shown in Table 1:
[0171] Table 1
[0172]
[0173] To further control risk, credit company A wants to screen out users with low or unknown deposits before officially issuing loans, but users' deposit information lies outside the scope of credit company A's business.
[0174] Meanwhile, Bank B holds the set of IDs of users whose deposits exceed 10,000 yuan, among which …; see Table 2.
[0175] Table 2
[0176]
[0177] Bank B can use the data from credit company A for further risk control, namely by performing a computation, to obtain a final recommendation. Referring to Figure 9, which is an optional flowchart of the business data processing method provided in this embodiment of the invention, the method may include the following steps:
[0178] Step 901: The business data processing device acquires a first sample set that matches the first business party device in the business data processing system, and a second sample set that matches the second business party device in the business data processing system.
[0179] Step 902: Determine the virtual samples that match the first business party device A.
[0180] Step 903: Determine the intersection of sample sets A and B.
[0181] Step 904: Exchange the public keys in the key sets to determine the training samples.
[0182] Step 905: Train the federated model corresponding to the business data processing system and determine the parameters of the federated model.
[0183] Step 906: Deploy the trained federated model to enable business data processing.
[0184] In the embodiments of the present invention, a first sample set matching a first business party device in a business data processing system and a second sample set matching a second business party device in the same system are obtained, where the business data processing system includes at least the first business party device and the second business party device. Virtual samples matching the first business party device are determined based on the first sample set. The intersection of the sample sets is determined based on these virtual samples and the second sample set. A first key set matching the first business party device and a second key set matching the second business party device are determined, and the intersection of the sample sets is processed using the two key sets to determine training samples matching the business data processing system. A federated model corresponding to the business data processing system is then trained on these training samples to determine the federated model parameters. As a result, the federated model parameters can be determined without exchanging raw data and at a reduced computational cost, which improves the efficiency of business data processing, allows it to be carried out on mobile devices, shortens user waiting time, and ensures that private data is not leaked.
[0185] The above description is merely an embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A business data processing method, characterized in that the method comprises: obtaining a first sample set matching a first business party device in a business data processing system and a second sample set matching a second business party device in the business data processing system, wherein the business data processing system comprises at least the first business party device and the second business party device; determining, based on the first sample set, virtual samples matching the first business party device; determining an intersection of the sample sets based on the virtual samples matching the first business party device and the second sample set matching the second business party device; determining a first key set matching the first business party device and a second key set matching the second business party device; processing the intersection of the sample sets using the first key set and the second key set to determine training samples matching the business data processing system; substituting the training samples matching the business data processing system into a loss function corresponding to a federated model of the business data processing system; determining model update parameters of the federated model when the loss function satisfies a corresponding convergence condition; adjusting, through the first business party device, residuals corresponding to the virtual samples matched by the model update parameters, or triggering a target application process and adjusting, based on the target application process, the residuals corresponding to the virtual samples matched by the model update parameters; and determining model parameters of the federated model based on the model update parameters corresponding to the federated model.
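Claim 1 does not fix the model type, the loss function, or the convergence test. As a minimal plaintext sketch, assuming logistic regression with log loss and a "loss change below a threshold" convergence condition, the residual adjustment can be read as forcing the residual of every virtual sample to zero, so that virtual rows enlarge the intersection without changing any gradient. Both parties' features sit in one process here; the split across devices and any encryption are not modelled, and `train_with_virtual_mask` and its parameters are hypothetical names.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_with_virtual_mask(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    is_virtual: Sequence[bool],
    lr: float = 0.5,
    tol: float = 1e-6,
    max_iter: int = 5000,
) -> List[float]:
    """Hypothetical plaintext logistic regression in which virtual samples
    have their residuals set to zero (assumed reading of claim 1)."""
    n_feat = len(features[0])
    w = [0.0] * (n_feat + 1)  # last entry is the bias term
    m = max(sum(1 for v in is_virtual if not v), 1)  # count of real samples
    prev_loss = float("inf")
    for _ in range(max_iter):
        grad = [0.0] * (n_feat + 1)
        loss = 0.0
        for x, y, virt in zip(features, labels, is_virtual):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
            # Residual adjustment: a virtual sample contributes nothing.
            residual = 0.0 if virt else (p - y)
            for j, xj in enumerate(x):
                grad[j] += residual * xj
            grad[-1] += residual
            if not virt:
                loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
        loss /= m
        # Model update parameters: one gradient-descent step.
        w = [wi - lr * g / m for wi, g in zip(w, grad)]
        # Assumed convergence condition: the loss has stopped changing.
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return w
```

Because the masked residuals are exactly zero, training on the padded intersection yields the same weights as training on the real samples alone, which is the property the residual adjustment appears intended to guarantee.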
2. The method according to claim 1, characterized in that obtaining the first sample set matching the first business party device in the business data processing system and the second sample set matching the second business party device in the business data processing system comprises: determining, based on the business type of the first business party device in the business data processing system, a sample set matching the first business party device; determining, based on the business type of the second business party device in the business data processing system, a sample set matching the second business party device; and performing sample alignment processing on the sample set matching the first business party device and the sample set matching the second business party device to obtain the first sample set matching the first business party device and the second sample set matching the second business party device.
3. The method according to claim 1, characterized in that determining, based on the first sample set, the virtual samples matching the first business party device comprises: determining, by the first business party device, value parameters and distribution parameters of the sample IDs in the first sample set; and generating, based on the value parameters and distribution parameters of the sample IDs in the first sample set, the virtual samples matching the first business party device.
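Claim 3 names "value parameters" and "distribution parameters" of the sample IDs without defining them. A minimal sketch, assuming integer IDs, taking the value parameters to be the minimum and maximum ID and the distribution parameters to be the mean and population standard deviation, draws virtual IDs from a normal distribution clipped to the observed range and rejects any ID already present. The function names and the choice of a normal distribution are assumptions for illustration only.

```python
import random
import statistics
from typing import Iterable, List, Set, Tuple


def id_parameters(ids: Iterable[int]) -> Tuple[Tuple[int, int], Tuple[float, float]]:
    """Hypothetical value parameters (min, max) and distribution parameters (mean, std)."""
    ids = list(ids)
    value_params = (min(ids), max(ids))
    dist_params = (statistics.mean(ids), statistics.pstdev(ids))
    return value_params, dist_params


def generate_virtual_ids(ids: Iterable[int], count: int, seed: int = 0) -> List[int]:
    """Draw `count` new IDs that look statistically similar to the real ones."""
    real: Set[int] = set(ids)
    (lo, hi), (mu, sigma) = id_parameters(real)
    if hi - lo + 1 - len(real) < count:
        raise ValueError("ID range too small for the requested number of virtual IDs")
    rng = random.Random(seed)  # seeded for reproducibility
    virtual: Set[int] = set()
    while len(virtual) < count:
        candidate = round(rng.gauss(mu, sigma or 1.0))
        candidate = min(max(candidate, lo), hi)  # stay within the value range
        if candidate not in real:  # never reuse a real sample ID
            virtual.add(candidate)
    return sorted(virtual)
```

Matching the range and spread of the real IDs is meant to keep the virtual IDs from being trivially distinguishable; real deployments would likely need a stronger indistinguishability argument than this.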
4. The method according to claim 3, characterized in that determining the intersection of the sample sets based on the virtual samples matching the first business party device and the second sample set matching the second business party device comprises: merging the virtual samples with the first sample set to form a first sample set containing virtual samples; traversing the first sample set containing virtual samples to determine the ID set of the virtual samples; and traversing the second sample set to determine the intersection of the first sample set containing virtual samples and the second sample set.
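One reading of claim 4 is that the virtual samples pad the first party's ID set, so the resulting intersection may also contain virtual IDs that happen to appear in the second set, while only the first party knows which entries are virtual. The sketch below, using the hypothetical function `intersect_with_virtual` on plain integer IDs, performs the merge and the two traversals; it shows the set logic only and omits the encryption that would keep the IDs private.

```python
from typing import Dict, Iterable, List, Set, Tuple


def intersect_with_virtual(
    first_ids: Iterable[int],
    virtual_ids: Iterable[int],
    second_ids: Iterable[int],
) -> Tuple[List[int], Set[int]]:
    """Return (intersection, IDs in the intersection that are virtual)."""
    # Merge the virtual samples into the first sample set; a real ID wins
    # if a virtual ID happens to collide with it.
    merged: Dict[int, bool] = {i: False for i in first_ids}
    for v in virtual_ids:
        merged.setdefault(v, True)
    # Traverse the merged set to record the ID set of the virtual samples.
    virtual_set = {i for i, is_virtual in merged.items() if is_virtual}
    # Traverse the second sample set to find the common IDs.
    intersection = sorted(i for i in set(second_ids) if i in merged)
    return intersection, virtual_set & set(intersection)
```

For example, `intersect_with_virtual([1, 2, 3, 4], [7, 8], [2, 4, 7, 9])` yields the intersection `[2, 4, 7]`, of which only `7` is virtual; the first party can later use that set to zero the corresponding residuals.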
5. The method according to claim 1, characterized in that determining, based on the first sample set, the virtual samples matching the first business party device comprises: triggering a target application process in response to the device types of the first business party device and the second business party device; determining, based on the target application process, a data intersection of the first sample set and the second sample set; obtaining, through the target application process, a first virtual sample set corresponding to the first business party device and a second virtual sample set corresponding to the second business party device; and determining, through the target application process, the virtual samples matching the first business party device based on the data intersection of the first sample set and the second sample set, the first virtual sample set, and the second virtual sample set.
6. The method according to claim 5, characterized in that determining the intersection of the sample sets based on the virtual samples matching the first business party device and the second sample set matching the second business party device comprises: merging the virtual samples with the first sample set to form a first sample set containing virtual samples; traversing the first sample set containing virtual samples to determine the ID set of the virtual samples; and traversing the second sample set to determine the intersection of the first sample set containing virtual samples and the second sample set.
7. The method according to claim 1, characterized in that processing the intersection of the sample sets using the first key set and the second key set to determine the training samples matching the business data processing system comprises: exchanging, based on the first key set and the second key set, different public keys with the corresponding business party devices to obtain initial parameters of the federated model; determining the number of samples matching the business data processing system; and processing the intersection of the sample sets based on the number of samples to determine the training samples matching the business data processing system.
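The claims do not disclose how the key sets are constructed. For orientation only, the sketch below shows the conventional commutative ("exchange") encryption approach to private set intersection that the background section refers to, whose modular exponentiation is the cost the patent aims to reduce; it is not the patent's own key scheme. It assumes Diffie-Hellman-style blinding, x → x^k mod P, which commutes because (h^a)^b = (h^b)^a mod P. The modulus 2^127 − 1 and all function names are toy, hypothetical choices and are not secure parameters.

```python
import hashlib
import secrets
from typing import Iterable, List

# Toy prime modulus (Mersenne prime 2**127 - 1); NOT a secure group choice.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    # Map an ID string to an integer modulo P.
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P


def new_key() -> int:
    # Secret exponent in [2, P - 2].
    return secrets.randbelow(P - 3) + 2


def psi_members(ids_a: Iterable[str], ids_b: Iterable[str]) -> List[str]:
    """Return the IDs of party A that also belong to party B."""
    ka, kb = new_key(), new_key()
    ids_a = list(ids_a)
    a_once = [pow(hash_to_group(i), ka, P) for i in ids_a]  # A blinds, sends to B
    a_twice = [pow(c, kb, P) for c in a_once]  # B re-blinds, returns in order
    b_once = [pow(hash_to_group(i), kb, P) for i in ids_b]  # B blinds, sends to A
    b_twice = {pow(c, ka, P) for c in b_once}  # A re-blinds B's values
    # Double-blinded values match exactly when the underlying IDs match.
    return [i for i, c in zip(ids_a, a_twice) if c in b_twice]
```

Each ID costs two full modular exponentiations per party, which illustrates why the background describes this structure as computationally heavy for mobile devices.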
8. The method according to claim 1, characterized in that the method further comprises: when the first business party device and/or the second business party device uses the trained federated model to process business data, setting the virtual samples to zero to adapt to the corresponding business data processing environment.
9. The method according to any one of claims 1 to 7, characterized in that the method further comprises: sending the virtual samples, the intersection of the sample sets, the first key set, the second key set, and the model parameters of the federated model to a cloud network, so that the corresponding business party devices can obtain the virtual samples, the intersection of the sample sets, the first key set, the second key set, and the model parameters of the federated model from the cloud network.
10. A business data processing apparatus, characterized by comprising: an information transmission module, configured to acquire a first sample set matching a first business party device in a business data processing system and a second sample set matching a second business party device in the business data processing system, wherein the business data processing system comprises at least the first business party device and the second business party device; and an information processing module, configured to determine, based on the first sample set, virtual samples matching the first business party device; the information processing module is further configured to determine an intersection of the sample sets based on the virtual samples matching the first business party device and the second sample set matching the second business party device; the information processing module is further configured to determine a first key set matching the first business party device and a second key set matching the second business party device; the information processing module is further configured to process the intersection of the sample sets using the first key set and the second key set to determine training samples matching the business data processing system; and the information processing module is further configured to: substitute the training samples matching the business data processing system into a loss function corresponding to a federated model of the business data processing system; determine model update parameters of the federated model when the loss function satisfies a corresponding convergence condition; adjust, through the first business party device, residuals corresponding to the virtual samples matched by the model update parameters, or trigger a target application process and adjust, based on the target application process, the residuals corresponding to the virtual samples matched by the model update parameters; and determine model parameters of the federated model based on the model update parameters corresponding to the federated model.
11. The apparatus according to claim 10, characterized in that the information processing module is further configured to: determine, based on the business type of the first business party device in the business data processing system, a sample set matching the first business party device; determine, based on the business type of the second business party device in the business data processing system, a sample set matching the second business party device; and perform sample alignment processing on the sample set matching the first business party device and the sample set matching the second business party device to obtain the first sample set matching the first business party device and the second sample set matching the second business party device.
12. The apparatus according to claim 10, characterized in that the information processing module is further configured to: determine value parameters and distribution parameters of the sample IDs in the first sample set; and generate, based on the value parameters and distribution parameters of the sample IDs in the first sample set, the virtual samples matching the first business party device.
13. The apparatus according to claim 12, characterized in that the information processing module is further configured to: merge the virtual samples with the first sample set to form a first sample set containing virtual samples; traverse the first sample set containing virtual samples to determine the ID set of the virtual samples; and traverse the second sample set to determine the intersection of the first sample set containing virtual samples and the second sample set.
14. The apparatus according to claim 10, characterized in that the information processing module is further configured to: trigger a target application process in response to the device types of the first business party device and the second business party device; determine, based on the target application process, a data intersection of the first sample set and the second sample set; obtain, through the target application process, a first virtual sample set corresponding to the first business party device and a second virtual sample set corresponding to the second business party device; and determine, through the target application process, the virtual samples matching the first business party device based on the data intersection of the first sample set and the second sample set, the first virtual sample set, and the second virtual sample set.
15. The apparatus according to claim 14, characterized in that the information processing module is further configured to: merge the virtual samples with the first sample set to form a first sample set containing virtual samples; traverse the first sample set containing virtual samples to determine the ID set of the virtual samples; and traverse the second sample set to determine the intersection of the first sample set containing virtual samples and the second sample set.
16. The apparatus according to claim 10, characterized in that the information processing module is further configured to: exchange, based on the first key set and the second key set, different public keys with the corresponding business party devices to obtain initial parameters of the federated model; determine the number of samples matching the business data processing system; and process the intersection of the sample sets based on the number of samples to determine the training samples matching the business data processing system.
17. An electronic device, characterized by comprising: a memory, configured to store executable instructions; and a processor, configured to implement the business data processing method according to any one of claims 1 to 9 when executing the executable instructions stored in the memory.
18. A computer program product comprising a computer program or instructions, characterized in that, when the computer program or instructions are executed by a processor, the business data processing method according to any one of claims 1 to 9 is implemented.
19. A computer-readable storage medium storing executable instructions, characterized in that, when the executable instructions are executed by a processor, the business data processing method according to any one of claims 1 to 9 is implemented.