Data selection method and device

A technology for data selection and module selection, applied in the field of data processing, that addresses sample data skew, which degrades model training and optimization, and achieves the effect of avoiding skew and improving sample diversity.

Pending Publication Date: 2021-12-03
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

However, existing active learning mainly considers the uncertainty of a sample itself (that is, the degree to which the model cannot effectively identify and distinguish the sample) and the correlation between the sample and other samples (that is, how similar the samples are to one another). When a large number of samples from different categories must be selected, if one category contains many samples and many of them are uncertain, the samples selected through active learning tend to resemble samples of that category. The resulting skew in the sample data degrades the optimization effect of model training.
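The skew described above can be illustrated with a minimal sketch. It assumes Shannon entropy as the per-sample uncertainty score and a plain top-k selection rule. Neither is stated in the source, and the toy pool, the `category` and `probs` fields, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy of a predicted class distribution (assumed uncertainty score)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def top_k_by_uncertainty(pool, k):
    """Plain uncertainty sampling: take the k samples with the highest entropy."""
    ranked = sorted(pool, key=lambda s: entropy(s["probs"]), reverse=True)
    return ranked[:k]

# Hypothetical toy pool: category "A" holds many highly uncertain samples,
# while "B" and "C" hold a few moderately uncertain ones.
pool = (
    [{"category": "A", "probs": [0.5, 0.5]} for _ in range(8)]
    + [{"category": "B", "probs": [0.7, 0.3]} for _ in range(2)]
    + [{"category": "C", "probs": [0.8, 0.2]} for _ in range(2)]
)

selected = top_k_by_uncertainty(pool, k=6)
category_counts = Counter(s["category"] for s in selected)
print(category_counts)
```

In this toy pool every selected sample comes from category "A", so categories "B" and "C" contribute nothing to the labeled set.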

Examples

Embodiment Construction

[0053] Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present invention and to fully convey the scope of the present invention to those skilled in the art.

[0054] The embodiment of the present invention provides a data selection method that optimizes and improves active learning in the model training process, so that the samples selected through active learning are more diverse and, after manual labeling, can train the model more effectively, improving the training effect of the model. The specific steps of this method are shown in Figure 1. The method includes...

Abstract

The invention discloses a data selection method and device, relates to the technical field of data processing, and mainly aims to optimize sample selection in an active learning process and avoid skewed selection of sample data. The main technical scheme of the invention is as follows: obtaining a candidate sample set, wherein the candidate sample set comprises a plurality of candidate samples belonging to different categories; calculating the uncertainty of each category according to the candidate samples belonging to that category; calculating the category uncertainty distribution of the candidate sample set according to the uncertainty of the different categories; and selecting a first candidate sample in a first category to enter a sample pool according to the category uncertainty distribution of the candidate sample set.
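The four steps in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the abstract does not say how any quantity is computed. The sketch assumes Shannon entropy as sample uncertainty, the mean entropy of a category's samples as category uncertainty, normalization to obtain the distribution, and weighted random choice of the category. All function and field names are hypothetical.

```python
import math
import random
from collections import defaultdict

def entropy(probs):
    """Shannon entropy of a predicted class distribution (assumed uncertainty score)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def category_uncertainty(candidates):
    """Assumption: a category's uncertainty is the mean entropy of its samples."""
    by_category = defaultdict(list)
    for sample in candidates:
        by_category[sample["category"]].append(entropy(sample["probs"]))
    return {c: sum(v) / len(v) for c, v in by_category.items()}

def category_distribution(cat_uncertainty):
    """Normalize category uncertainties so they sum to 1 (uniform if all are zero)."""
    total = sum(cat_uncertainty.values())
    if total == 0:
        return {c: 1.0 / len(cat_uncertainty) for c in cat_uncertainty}
    return {c: u / total for c, u in cat_uncertainty.items()}

def select_samples(candidates, budget, seed=0):
    """Draw a category from the distribution, then move its most uncertain
    remaining sample into the sample pool; repeat until the budget is met."""
    rng = random.Random(seed)
    remaining = list(candidates)
    sample_pool = []
    while remaining and len(sample_pool) < budget:
        dist = category_distribution(category_uncertainty(remaining))
        categories = list(dist)
        weights = [dist[c] for c in categories]
        chosen = rng.choices(categories, weights=weights, k=1)[0]
        in_category = [s for s in remaining if s["category"] == chosen]
        pick = max(in_category, key=lambda s: entropy(s["probs"]))
        sample_pool.append(pick)
        remaining = [s for s in remaining if s is not pick]
    return sample_pool

# Hypothetical candidate set with three categories of differing uncertainty.
candidates = (
    [{"category": "A", "probs": [0.5, 0.5]} for _ in range(8)]
    + [{"category": "B", "probs": [0.7, 0.3]} for _ in range(2)]
    + [{"category": "C", "probs": [0.8, 0.2]} for _ in range(2)]
)
distribution = category_distribution(category_uncertainty(candidates))
sample_pool = select_samples(candidates, budget=6)
```

Because the category is drawn from a distribution rather than taken from a global ranking of samples, a category with many uncertain samples is favored but does not necessarily crowd out the others.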

Description

Technical Field

[0001] The invention relates to the technical field of data processing, in particular to a data selection method and device.

Background Art

[0002] With the popularity of artificial intelligence, deep learning has made great breakthroughs in various practical applications. Many problems are no longer obstacles, but obtaining a large amount of accurately labeled data still requires high costs, and model training still requires a lot of time and energy; these have become the limitations of current deep learning. By screening unlabeled data, active learning can achieve higher model learning accuracy with fewer labeled samples.

[0003] Active learning is a subfield of artificial intelligence, also known as query learning, and as optimal experimental design in the field of statistics. The algorithm consists of two basic modules: a learning module and a selection strategy. Active learning uses the "selection strategy" to actively select some samples fro...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/241, G06F18/214
Inventor: 付彬, 孙健, 唐呈光, 李杨, 赵学敏
Owner: ALIBABA GRP HLDG LTD