Training data set generation method and device, electronic equipment and storage medium

This technology concerns training data sets and source data and is applied in the field of data processing. It addresses low labeling accuracy, huge data volumes, and the resulting impact on iterative model optimization, and achieves the effects of reducing workload, improving labeling accuracy, and improving understanding and awareness.

Active Publication Date: 2021-03-09
创新奇智(合肥)科技有限公司

AI Technical Summary

Problems solved by technology

[0003] At present, image classification data is prepared mainly by manually classifying and labeling the full set of collected sample images. However, the amount of data to be processed at one time may be huge, and fully manual labeling often results in low labeling accuracy and high labeling costs, which in turn affects the iterative optimization of subsequent models.

Embodiment Construction

[0044] The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application.

[0045] Like numbers and letters denote similar items in the following figures, so that once an item is defined in one figure, it does not require further definition and explanation in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", etc. are only used to distinguish descriptions, and cannot be understood as indicating or implying relative importance.

[0046] Figure 1 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device 100 may be used to execute the method for generating a training data set provided in the embodiment of the present application. As shown in Figure 1, the electronic device 100 includes: one or more processors 102, and one or more memori...

Abstract

The invention provides a training data set generation method and device, electronic equipment and a storage medium. The method comprises the steps of: obtaining a classified source data set and an unclassified target data set; extracting a first feature vector set of the source data set and a second feature vector set of the target data set through a feature extractor; determining the class center feature vector of each class in the source data set according to the first feature vector set, and determining the clustering labels of the target data set and the average feature vector of each cluster according to the second feature vector set; iteratively optimizing the feature extractor so as to minimize both the overall difference between the feature vectors of the samples in the source data set and their class center feature vectors and the overall difference between the feature vectors of the elements in each cluster and that cluster's average feature vector; and obtaining a training data set according to the clustering labels of the target data set and the elements in each cluster. The method reduces the workload and cost of manual labeling and improves labeling precision.
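The quantities named in the abstract can be illustrated with a minimal NumPy sketch. This is not the patented implementation. It assumes that features have already been extracted, that clustering is done with plain k-means, and that "overall difference" means the mean squared Euclidean distance. The function names (`class_centers`, `kmeans`, `compactness_loss`, `build_training_set`) are hypothetical, and the step that updates the feature extractor to reduce the objective is omitted.

```python
import numpy as np


def class_centers(features, labels):
    """Mean feature vector of each class in the labeled source set."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)  # sorted unique class ids
    centers = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centers


def kmeans(features, k, n_iter=20, seed=0):
    """Plain k-means (an assumed clustering choice).

    Returns the cluster label of every sample and the mean feature
    vector of every cluster.
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the cluster means from k distinct samples.
    means = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - means[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:  # keep the old mean if a cluster is empty
                means[j] = members.mean(axis=0)
    return assign, means


def compactness_loss(src_feats, src_labels, tgt_feats, tgt_assign, tgt_means):
    """Assumed objective: mean squared distance of every sample to its
    class center (source set) or cluster mean (target set)."""
    src_feats = np.asarray(src_feats, dtype=float)
    tgt_feats = np.asarray(tgt_feats, dtype=float)
    classes, centers = class_centers(src_feats, src_labels)
    idx = np.searchsorted(classes, np.asarray(src_labels))
    src_term = np.sum((src_feats - centers[idx]) ** 2)
    tgt_term = np.sum((tgt_feats - tgt_means[tgt_assign]) ** 2)
    return (src_term + tgt_term) / (len(src_feats) + len(tgt_feats))


def build_training_set(tgt_samples, tgt_assign):
    """Group target samples by cluster label to form the training set."""
    return {
        int(c): [s for s, a in zip(tgt_samples, tgt_assign) if a == c]
        for c in np.unique(tgt_assign)
    }


# Toy usage with hand-made 2-D "features" (illustrative data only).
src_feats = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]])
src_labels = np.array([0, 0, 1])
tgt_feats = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                      [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
tgt_assign, tgt_means = kmeans(tgt_feats, k=2)
loss = compactness_loss(src_feats, src_labels, tgt_feats, tgt_assign, tgt_means)
training_set = build_training_set(tgt_feats.tolist(), tgt_assign)
print(loss, {c: len(v) for c, v in training_set.items()})
```

In a full pipeline, a value like `loss` would be recomputed after each update of the feature extractor, and the clustered target samples would be handed to a reviewer or used directly as pre-labeled training data.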

Description

Technical field

[0001] The present application relates to the technical field of data processing, in particular to a method and device for generating a training data set, electronic equipment, and a computer-readable storage medium.

Background

[0002] In the classification and identification of commodities in retail scenarios, it is often necessary to deal with packaging differences between product lines, rapid iterative updates of product packaging, variation in image features introduced during image collection, a huge number of product categories, and redundancy in the data of some categories. Therefore, when starting a new project, it is difficult to prepare classification model training data with a small amount of data and a fast, concise algorithm, and a large amount of data must be manually classified and labeled to form an initial training set. The question is then how to pre-divide massive unlabeled data, improve the quality of loc...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/23213, G06F18/214
Inventor: 张发恩, 纪双西
Owner: 创新奇智(合肥)科技有限公司