Method and device for sampling unbalanced data under federated learning

A balanced-data and federated learning technology, applied in the field of data processing, that addresses the difficulty of keeping each participant's data balanced and up to date, and achieves automatic balancing and updating.

Active Publication Date: 2021-02-05
BEIJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0008] In real federated learning collaborative modeling scenarios, the data distribution of each participant is different, and the unbalanced-data processing methods the participants adopt also differ, so it is difficult to ensure that each participant's data is balanced.
Moreover, when new data is added, it is difficult to ensure that all participants update in time.


Examples


Detailed Description of the Embodiments

[0030] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0031] Federated learning is a distributed machine learning method that aims to build machine learning models from decentralized, independent data. It avoids the conflicts of interest and the risk of private-data leakage caused by centralizing data, and combines encryption technology to further protect data security and promote the adoption and deployment of artificial intelligence technology.

[0032] Under federated learning, each participant trains a model based on local data, uploads the encrypted model parameters to ...
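The local-training and parameter-upload loop described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the paragraph is truncated before it says how uploaded parameters are combined, so the sample-count-weighted averaging in `aggregate` is an assumption, encryption is omitted entirely, and the toy linear model and all function names (`local_update`, `aggregate`, `federated_round`) are hypothetical.

```python
def local_update(weights, data, lr=0.1):
    """One pass of per-sample gradient descent for y = w * x + b on local data."""
    w, b = weights
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return [w, b]


def aggregate(updates, sizes):
    """Assumed aggregation rule: average parameters weighted by local sample count."""
    total = sum(sizes)
    n_params = len(updates[0])
    return [
        sum(u[i] * n for u, n in zip(updates, sizes)) / total
        for i in range(n_params)
    ]


def federated_round(global_weights, participants):
    """Each participant trains locally; only parameters (never raw data) are combined."""
    updates = [local_update(list(global_weights), data) for data in participants]
    sizes = [len(data) for data in participants]
    return aggregate(updates, sizes)


if __name__ == "__main__":
    # Two participants whose local data both follow y = 2x + 1.
    participants = [
        [(0.0, 1.0), (1.0, 3.0)],
        [(2.0, 5.0), (3.0, 7.0)],
    ]
    weights = [0.0, 0.0]
    for _ in range(500):
        weights = federated_round(weights, participants)
    print(weights)
```

In a real deployment the parameters would be encrypted before upload, as the paragraph states; that step is left out here to keep the sketch short.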


Abstract

The invention discloses a method and device for sampling unbalanced data under federated learning. The device comprises a data monitoring module, a data balancing module, a local training module, and a central server. The unbalanced-data balancing scheme is based on mixed sampling: according to the imbalance ratios of different data sets, a balanced data set is obtained by a mixed sampling method that combines minority-class sample synthesis with a clustering-based down-sampling ensemble method. Combined with real-time monitoring of the data sets, unbalanced data sets are processed automatically and updated in a timely manner in the federated learning scenario. By combining a data-level method with an ensemble method, the approach makes full use of the data set's capacity, and by detecting data changes it realizes automatic balancing and updating of unbalanced data sets.
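As a rough illustration of the mixed sampling idea in the abstract, the sketch below oversamples the minority class by interpolating between neighbouring minority samples and undersamples the majority class cluster by cluster. Everything specific here is an assumption rather than the patent's method: the SMOTE-style interpolation, the plain k-means clustering, the midpoint target size in `hybrid_balance`, and all function names are hypothetical, and the ensemble step and data monitoring are left out.

```python
import math
import random


def smote_like(minority, n_new, k=3, rng=random):
    """Synthesize n_new points between a minority sample and one of its k nearest
    minority neighbours (assumes at least two minority samples)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


def kmeans_labels(points, k, iters=10, rng=random):
    """Plain k-means; returns the cluster label of every point."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_undersample(majority, n_keep, k=3, rng=random):
    """Keep n_keep majority points, drawn from each cluster in proportion to its size."""
    labels = kmeans_labels(majority, k, rng=rng)
    by_cluster = {}
    for i, lab in enumerate(labels):
        by_cluster.setdefault(lab, []).append(i)
    chosen = []
    for idxs in by_cluster.values():
        quota = round(n_keep * len(idxs) / len(majority))
        chosen.extend(rng.sample(idxs, min(quota, len(idxs))))
    chosen = chosen[:n_keep]  # trim if rounding overshot
    taken = set(chosen)
    rest = [i for i in range(len(majority)) if i not in taken]
    chosen.extend(rng.sample(rest, n_keep - len(chosen)))  # pad if rounding undershot
    return [majority[i] for i in chosen]


def hybrid_balance(majority, minority, rng=random):
    """Meet in the middle: grow the minority and shrink the majority to one target size."""
    target = (len(majority) + len(minority)) // 2
    new_minority = list(minority) + smote_like(minority, target - len(minority), rng=rng)
    new_majority = cluster_undersample(majority, target, rng=rng)
    return new_majority, new_minority


if __name__ == "__main__":
    rng = random.Random(0)
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
    minority = [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(10)]
    balanced_major, balanced_minor = hybrid_balance(majority, minority, rng=rng)
    print(len(balanced_major), len(balanced_minor))
```

Sampling the majority class per cluster, rather than uniformly at random, is one way to keep every region of the majority distribution represented after down-sampling. In this sketch the imbalance ratio only sets the target size, which is a simplification of the ratio-dependent scheme the abstract describes.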

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a sampling method and device for unbalanced data under federated learning.

Background art

[0002] At present, the problem of unbalanced data under federated learning is handled mainly by relying on the participants to voluntarily process their local data before joint modeling, so as to ensure the quality of the local data. There are three main methods for dealing with sample imbalance: (1) obtaining a balanced data set by modifying the data set; (2) reducing the bias toward the majority class by modifying the machine learning algorithm; (3) combining one of the previous two methods with ensemble learning to obtain a powerful ensemble classifier:

[0003] (1) Data-level approach. This method obtains a sample-balanced data set by undersampling, that is, deleting majority-class samples until their number matches that of the minority class, or oversampling, that is, adding the min...
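The data-level approach described above can be illustrated with its simplest variants. The sketch below is only an example under stated assumptions: it uses random deletion for undersampling and random duplication for oversampling, which the text does not prescribe, and the function names are hypothetical.

```python
import random


def random_undersample(majority, minority, rng=random):
    """Drop majority samples at random until both classes have the same size."""
    return rng.sample(majority, len(minority)), list(minority)


def random_oversample(majority, minority, rng=random):
    """Duplicate minority samples at random until both classes have the same size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


if __name__ == "__main__":
    rng = random.Random(0)
    majority = list(range(100))       # 100 majority-class sample ids
    minority = list(range(100, 110))  # 10 minority-class sample ids
    under_major, under_minor = random_undersample(majority, minority, rng=rng)
    over_major, over_minor = random_oversample(majority, minority, rng=rng)
    print(len(under_major), len(under_minor))
    print(len(over_major), len(over_minor))
```

Undersampling discards information from the majority class, while duplication-based oversampling adds no new information and can encourage overfitting.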


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N20/00
CPC: G06N20/00, G06F18/23213, G06F18/214, G06F18/24, G06F18/25
Inventor: 李剑, 欧中洪, 宋美娜
Owner: BEIJING UNIV OF POSTS & TELECOMM