Data selection method in federated learning scene

A data selection technology applied in machine learning, digital data protection, electronic digital data processing, etc. It addresses the problems that data cannot be directly accessed by third parties and that erroneous samples are not considered and therefore degrade model performance, so as to achieve an efficient and accurate data selection strategy.

Pending Publication Date: 2021-03-09
德清阿尔法创新研究院 (Deqing Alpha Innovation Institute)
Cites: 0; Cited by: 5

AI Technical Summary

Problems solved by technology

Existing data selection methods cannot be used directly in federated learning, for three reasons:
1) Existing methods require direct access to all training samples, whereas in a federated system the data cannot be directly accessed by third parties.
2) Computing the importance of every sample directly imposes an unacceptable overhead on participants with limited resources.
3) Existing methods do not consider the impact of non-IID or wrongly labeled samples on the selection strategy and may assign higher importance to wrong samples, thereby reducing model performance.
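Points 2 and 3 explain why a cheap, bounded per-sample importance score is attractive. The abstract of this patent says data is selected based on a gradient upper bound, but the bound itself is not given on this page. A minimal sketch, assuming a softmax classifier trained with cross-entropy: the norm of the loss gradient with respect to the logits, ||softmax(z) - e_y||_2, needs only a forward pass and, under the usual Lipschitz assumptions on the network, bounds the full per-sample gradient norm up to a constant factor. The function names and the `drop_frac` trimming of the highest scores are hypothetical; the trimming is a stand-in for handling mislabeled samples, not the patent's rule.

```python
import numpy as np


def grad_upper_bound(logits: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-sample ||softmax(z) - onehot(y)||_2, the norm of the cross-entropy
    gradient w.r.t. the logits z (assumed proxy; one forward pass, no backprop)."""
    z = logits - logits.max(axis=1, keepdims=True)  # subtract row max for stability
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)               # softmax probabilities
    p[np.arange(len(labels)), labels] -= 1.0        # p - onehot(y)
    return np.linalg.norm(p, axis=1)


def select_samples(
    logits: np.ndarray, labels: np.ndarray, m: int, drop_frac: float = 0.0
) -> np.ndarray:
    """Return indices of the m highest-scoring samples after discarding the
    top drop_frac fraction as suspected mislabeled (hypothetical heuristic)."""
    scores = grad_upper_bound(logits, labels)
    order = np.argsort(-scores)                     # indices, descending score
    n_drop = int(drop_frac * len(order))
    return order[n_drop:n_drop + m]
```

A confidently wrong prediction produces the largest score, which is exactly the kind of sample point 3 warns about; that is why the sketch exposes an option to trim the top of the ranking rather than always taking it.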




Embodiment Construction

[0018] The present invention is described in detail below with reference to the accompanying drawings. As shown in Figure 1, the data selection method in the federated learning scenario proposed by the present invention is divided into four modules: filtering out task-related users and data, user selection before training, user and data selection during training, and model training.

[0019] (1) Task-related user and data filtering: when an FL task arrives, the server first computes, for each user $C_k$, $k \in [K]$, the intersection of its label set $Y_k = \{y_k \mid (x_k, y_k) \in D_k\}$ with the target label set $Y$, which yields the samples $\{(x_k, y_k) \mid y_k \in Y_k \cap Y\}$, in order to filter out users holding data of the target categories. If the number of samples in this intersection exceeds the minimum number $v$ required by the target model, i.e. $|\{(x_k, y_k) \mid y_k \in Y_k \cap Y\}| > v$, the user is considered relevant. To meet privacy-protection requirements, we compute this intersection with private set intersection (PSI).
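The relevance rule in step (1) can be sketched in Python as a minimal plaintext illustration. It assumes each user's local labels are available as a list; the function name `relevant_users` and the dictionary layout are hypothetical. In the described method the intersection $Y_k \cap Y$ is computed via PSI, so the server would never see the raw labels as it does here.

```python
from typing import Hashable, Iterable, Sequence


def relevant_users(
    user_labels: dict[str, Sequence[Hashable]],
    target_labels: Iterable[Hashable],
    v: int,
) -> list[str]:
    """Return the users holding more than v samples whose label lies in Y_k ∩ Y.

    user_labels maps a (hypothetical) user id C_k to the labels y_k of its
    local samples (x_k, y_k) in D_k. The intersection is computed in plaintext
    for illustration only; the described method uses PSI instead.
    """
    target = set(target_labels)                  # target label set Y
    selected = []
    for user, labels in user_labels.items():
        common = set(labels) & target            # Y_k ∩ Y
        n_relevant = sum(1 for y in labels if y in common)
        if n_relevant > v:                       # |{(x_k, y_k) | y_k ∈ Y_k ∩ Y}| > v
            selected.append(user)
    return selected
```

For example, with `{"C1": [0, 0, 1, 2], "C2": [3, 3, 3], "C3": [1, 1, 1, 1, 4]}`, target labels `{0, 1}` and `v = 2`, users C1 (3 matching samples) and C3 (4 matching samples) are relevant, while C2 has none.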

[0020] (2) User ...



Abstract

A data selection method in a federated learning scenario comprises four steps: filtering out task-related users and data, user selection before training, user and data selection during training, and model training. The invention adopts a vector sketch and a randomized response mechanism, so that the user selection strategy is efficient and privacy-preserving; meanwhile, server-side log information is used to select users dynamically. Data is selected based on an upper bound of the gradient, taking into account the influence of erroneous data on the gradient, so that the data selection strategy is efficient and accurate.
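The abstract names a vector sketch and a randomized response mechanism for private user selection but does not specify them on this page. The following is a minimal sketch of standard randomized response over a per-user bit vector, assuming (hypothetically) one bit per label a user holds, plus the server's unbiased count estimate. If each bit is kept with probability $p = e^\varepsilon/(1+e^\varepsilon)$ and flipped otherwise, the observed column sum $s$ over $n$ users satisfies $E[s] = n(1-p) + c(2p-1)$, where $c$ is the true count, so $\hat c = (s - n(1-p))/(2p-1)$. The function names and the bit encoding are assumptions, not the patent's construction.

```python
import math
import random


def randomize_bits(bits: list[int], epsilon: float, rng: random.Random) -> list[int]:
    """Client side: keep each bit with probability e^eps / (1 + e^eps), else flip it."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return [b if rng.random() < p_keep else 1 - b for b in bits]


def estimate_counts(reports: list[list[int]], epsilon: float) -> list[float]:
    """Server side: unbiased estimate of how many users truly set each bit."""
    n = len(reports)
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    sums = [sum(col) for col in zip(*reports)]   # observed column sums s
    # Invert E[s] = n(1 - p) + c(2p - 1) for the true count c.
    return [(s - n * (1 - p)) / (2 * p - 1) for s in sums]
```

Each bit on its own satisfies ε-local differential privacy; the budget for the whole vector grows with the number of bits a user can change, so a real deployment would have to account for that composition when choosing ε.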

Description

Technical field

[0001] The data selection method in the federated learning scenario involved in the present invention belongs to the field of data analysis and data quality evaluation.

Background art

[0002] How to obtain large amounts of high-quality data has become a common bottleneck for many machine learning models and AI applications. This is not only because collecting and labeling large numbers of samples is expensive, but also because privacy concerns hinder data sharing in many fields, such as medicine and economics. The emergence of federated learning makes it possible for end users to jointly train network models using their local data. In federated learning, the quality of each user's local data affects the performance of the global model, and low-quality data (e.g., wrongly labeled data or non-uniformly distributed data) seriously hinders the global model from achieving good results.

[0003] The present invention aims to select a group of high...


Application Information

Patent type & authority: Applications (China)
IPC (8th ed.): G06F21/60, G06F21/62, G06N20/00
CPC: G06F21/602, G06F21/6245, G06N20/00
Inventors: 张兰 (Zhang Lan), 李向阳 (Li Xiangyang), 李安然 (Li Anran)
Owner: 德清阿尔法创新研究院 (Deqing Alpha Innovation Institute)