Data screening method, device, equipment and readable storage medium

A data screening and storage medium technology, applied in electronic digital data processing, digital data information retrieval, special data processing applications, and related fields. It addresses the problem that good training data or participants cannot be selected for horizontal federated learning.

Active Publication Date: 2021-07-09
WEBANK (CHINA)

AI Technical Summary

Problems solved by technology

[0005] The main purpose of the present invention is to provide a data screening method, device, equipment and readable storage medium, aiming to solve the technical problem that, in existing horizontal federated learning, the coordinator cannot select good training data or participants for horizontal federated learning.



Examples


Example 1

[0071] Based on the first embodiment, a second embodiment of the data screening method of the present invention is proposed. In this embodiment, step S20 also includes:

[0072] Step S23, determining the total reconstruction error corresponding to the data set based on each reconstruction error in the reconstruction error set;

[0073] In this embodiment, the data screening result corresponding to the participant may also be determined according to the total reconstruction error corresponding to the data set owned by the participant. A candidate participant has many pieces of data (that is, it has a training data set), and each piece of training data corresponds to a reconstruction error. The total reconstruction error corresponding to the data set of a candidate participant can be the minimum reconstruction error, the maximum reconstruction error, the average reconstruction error, or the median of the reconstruction errors among the reconstruction errors corre...
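
As an illustration of step S23, the following minimal Python sketch collapses a participant's reconstruction error set into a single total reconstruction error using one of the four statistics named above. The function names, the `mode` parameter, and the rule that a participant qualifies when its total error falls inside the coordinator's error range are assumptions made for this example; the text does not specify them.

```python
from statistics import mean, median

# Hypothetical names: the source does not define an API for step S23.
AGGREGATORS = {
    "min": min,        # minimum reconstruction error
    "max": max,        # maximum reconstruction error
    "mean": mean,      # average reconstruction error
    "median": median,  # median of the reconstruction errors
}


def total_reconstruction_error(errors, mode="mean"):
    """Collapse per-sample reconstruction errors into one value per data set."""
    if not errors:
        raise ValueError("reconstruction error set is empty")
    if mode not in AGGREGATORS:
        raise ValueError(f"unknown mode: {mode!r}")
    return AGGREGATORS[mode](errors)


def participant_in_range(errors, error_range, mode="mean"):
    """Assumed rule: the participant qualifies when its total error
    lies inside the error range configured by the coordinator."""
    low, high = error_range
    return low <= total_reconstruction_error(errors, mode) <= high


# Toy reconstruction error set for one candidate participant.
errors = [0.5, 1.0, 1.5, 3.0]
print(total_reconstruction_error(errors, "median"))
print(participant_in_range(errors, (1.0, 2.0), "mean"))
```

The choice of statistic changes the outcome: with the toy errors above, the same error range accepts the participant under the mean but rejects it under the maximum.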


Abstract

The invention discloses a data screening method comprising the following steps: a first participant receives the data detection model sent by the coordinator and detects the data set it owns based on that model, obtaining a reconstruction error set corresponding to the data set; then, based on the reconstruction error set and the error range configured by the coordinator, a data screening result corresponding to the first participant is obtained. The invention also discloses a device, equipment and a readable storage medium. The data detection model is used to detect the data sets owned by the participants, so as to screen out, for federated training, the participants and training data that have the same statistical distribution as the training set of the data detection model. The training data of these participants are similar but not identical, which makes full use of the diversity of the training data owned by the participants, maximizes the advantages of federated learning, and trains better models.
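
The participant-side flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the data detection model is represented by any callable that maps a sample to its reconstruction, the error is taken to be mean squared error, and a sample is assumed to be kept when its error lies inside the coordinator's error range. All names (`screen_dataset`, `toy_model`, and so on) are hypothetical.

```python
def reconstruction_error(sample, reconstruction):
    """Mean squared error between a sample and its reconstruction (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(sample, reconstruction)) / len(sample)


def screen_dataset(model, dataset, error_range):
    """Return (errors, kept_indices) for one participant's data set.

    `model` is any callable mapping a sample to its reconstruction; it stands
    in for the data detection model sent by the coordinator.
    Assumed rule: a sample is kept when its error lies within `error_range`.
    """
    low, high = error_range
    errors = [reconstruction_error(s, model(s)) for s in dataset]
    kept = [i for i, e in enumerate(errors) if low <= e <= high]
    return errors, kept


def toy_model(sample):
    """Toy stand-in model: reconstructs every sample as a fixed prototype."""
    return [1.0, 1.0]


dataset = [[1.0, 1.0], [1.0, 2.0], [4.0, 5.0]]
errors, kept = screen_dataset(toy_model, dataset, (0.1, 1.0))
print(errors, kept)
```

In this toy run the non-zero lower bound drops the sample that the model reconstructs exactly, and the upper bound drops the outlier, which is one possible reading of "similar but not identical" training data.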

Description

Technical field

[0001] The present invention relates to the technical field of machine learning, in particular to a data screening method, device, equipment and readable storage medium.

Background

[0002] In horizontal federated learning, if the training data owned by the other participants does not help a given participant, that participant gains nothing from taking part in horizontal federated learning. Therefore, before training a horizontal federated learning model, the participants in horizontal federated learning need to be selected.

[0003] In the prior art, there is a common scheme: the coordinator randomly selects the participants of the horizontal federated learning from the candidate participants (or potential participants), expecting to select training data with a relatively balanced statistical distribution. This scheme is very simple, but there is no guarantee that training data with a relatively balanced statistical distribution can be selected. Becau...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62, G06F16/215
CPC: G06F16/215, G06F18/214
Inventor: 程勇, 刘洋, 陈天健
Owner: WEBANK (CHINA)