Data selection method in federated learning scene

A data selection technology for federated learning scenarios, applied in machine learning, digital data protection, electronic digital data processing, etc. It addresses three problems: training data cannot be directly accessed by third parties, computing per-sample importance is too costly for resource-limited participants, and existing methods do not consider non-IID or erroneous samples, which can reduce model performance. The aim is a data selection strategy that is both efficient and accurate.

Pending Publication Date: 2021-03-09

德清阿尔法创新研究院

Cites: 0; Cited by: 5


AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

Existing data selection methods cannot be used directly in federated learning:

1) Existing methods require direct access to all training samples, whereas in a federated system the data cannot be directly accessed by third parties.

2) Computing the importance of each sample directly creates an unacceptable overhead for participants with limited resources.

3) Existing methods do not consider the impact of non-IID or erroneous samples on the sample selection strategy and may assign higher importance to erroneous samples, thereby reducing model performance.


Detailed Description of Embodiments

[0018] The present invention is described in detail below with reference to the accompanying drawings. As shown in Figure 1, the data selection method in the federated learning scenario proposed by the present invention is divided into the following modules: task-related user and data filtering, user selection before training, user and data selection during training, and model training.

[0019] (1) Task-related user and data filtering: when an FL task arrives, the server first computes, for each user $C_k$, $k \in [K]$, the intersection of the user's label set $Y_k = \{y_k \mid (x_k, y_k) \in D_k\}$ with the target label set $Y$, and uses the matching samples $\{(x_k, y_k) \mid y_k \in Y_k \cap Y\}$ to identify users that hold data of the target categories. If the number of matching samples exceeds the minimum number $v$ required for the target model, that is, $|\{(x_k, y_k) \mid y_k \in Y_k \cap Y\}| > v$, the user is considered relevant. To meet privacy protection requirements, private set intersection (PSI) is used to compute the intersection.

[0020] (2) User ...
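The filtering rule in step (1) can be illustrated with a minimal Python sketch. It computes the label intersection in plaintext, which stands in for the PSI protocol that the patent uses and therefore offers no privacy protection. The function names `relevant_samples` and `filter_users`, the user identifiers, and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Sample = Tuple[object, Hashable]  # one (x_k, y_k) pair


def relevant_samples(local_data: Sequence[Sample],
                     target_labels: Set[Hashable]) -> List[Sample]:
    """Return {(x_k, y_k) | y_k in Y_k ∩ Y} for one user's dataset D_k."""
    local_labels = {y for _, y in local_data}   # Y_k
    shared = local_labels & target_labels       # Y_k ∩ Y (plaintext, not PSI)
    return [(x, y) for x, y in local_data if y in shared]


def filter_users(datasets: Dict[str, Sequence[Sample]],
                 target_labels: Set[Hashable],
                 v: int) -> Dict[str, List[Sample]]:
    """Keep users whose number of task-relevant samples strictly exceeds v."""
    selected: Dict[str, List[Sample]] = {}
    for user_id, data in datasets.items():
        matched = relevant_samples(data, target_labels)
        if len(matched) > v:
            selected[user_id] = matched
    return selected


# Toy data (assumed): three users, target label set Y = {"cat", "dog"}.
datasets = {
    "C1": [("a", "cat"), ("b", "dog"), ("c", "cat")],
    "C2": [("d", "car"), ("e", "cat")],
    "C3": [("f", "tree")],
}
selected = filter_users(datasets, {"cat", "dog"}, v=1)
print(sorted(selected))
```

With the threshold set to `v=1`, only users holding more than one sample of a target category pass the filter. In a real deployment the set intersection would be computed through a PSI protocol so that neither side learns labels outside the intersection.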


Abstract

A data selection method in a federated learning scenario comprises the steps of filtering for task-related users and data, user selection before training, user and data selection during training, and model training. The invention adopts a vector sketch and a randomized response mechanism, so the user selection strategy is efficient and privacy-preserving. The server additionally uses its own log information to select users dynamically. Data is selected based on an upper bound of the gradient, and the influence of erroneous data on the gradient is taken into account, so the data selection strategy is both efficient and accurate.
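As background for the randomized response mechanism named in the abstract, the following Python sketch shows the generic textbook technique on a binary vector: each bit is reported truthfully with probability p = e^ε / (1 + e^ε) and flipped otherwise, and the server debiases the aggregated count. This excerpt does not describe how the patent combines the mechanism with its vector sketch, so the function names, the privacy parameter `epsilon`, and the example bits are all assumptions for illustration.

```python
import math
import random
from typing import List


def keep_probability(epsilon: float) -> float:
    """Probability of reporting a bit truthfully for privacy parameter epsilon."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bits(bits: List[int], epsilon: float,
                   rng: random.Random) -> List[int]:
    """Keep each bit with probability p, flip it with probability 1 - p."""
    p = keep_probability(epsilon)
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_true_count(reported_ones: int, n_reports: int,
                        epsilon: float) -> float:
    """Unbiased estimate of the number of true 1-bits among n_reports.

    E[reported_ones] = t * p + (n - t) * (1 - p), solved for t.
    Requires epsilon > 0 so that 2p - 1 is nonzero.
    """
    p = keep_probability(epsilon)
    return (reported_ones - n_reports * (1.0 - p)) / (2.0 * p - 1.0)


# Example (assumed data): 1000 users each hold one private bit.
rng = random.Random(0)
true_bits = [1] * 400 + [0] * 600
reports = randomize_bits(true_bits, epsilon=1.0, rng=rng)
print(estimate_true_count(sum(reports), len(reports), epsilon=1.0))
```

A smaller `epsilon` flips more bits, which strengthens the privacy of each individual report while increasing the variance of the server-side estimate.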

Description

Technical field

[0001] The data selection method in the federated learning scenario of the present invention belongs to the field of data analysis and data quality evaluation.

Background

[0002] Obtaining large, high-quality datasets has become a common bottleneck for many machine learning models and AI applications. This is not only because collecting and labeling large numbers of samples is expensive, but also because privacy concerns hinder data sharing in many fields, such as medicine and economics. The emergence of federated learning makes it possible for end users to jointly train network models using their local data. During federated learning, the quality of each user's local data affects the performance of the global model, and low-quality data (e.g., wrongly labeled data or non-uniformly distributed data) can seriously hinder the global model from achieving good results.

[0003] The present invention aims to select a group of high...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F21/60, G06F21/62, G06N20/00

CPC: G06F21/602, G06F21/6245, G06N20/00

Inventors: 张兰, 李向阳, 李安然

Owner: 德清阿尔法创新研究院
