Generation method and device for training samples

A technology for generating training samples, applied in the field of information processing. It addresses the difficulty of obtaining manually labeled training samples, the long labeling cycle, and the lack of guaranteed accuracy, so as to reduce the manual labeling workload and improve labeling efficiency.

Active Publication Date: 2018-01-23
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
6 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

However, it is difficult to obtain manually labeled training samples, the

Examples

Example 1

[0029] Figure 1 is a flowchart of a method for generating training samples provided in the first embodiment of the present invention. The method of this embodiment can be executed by a device for generating training samples, which can be implemented in hardware and/or software and can generally be integrated in a server. The method of this embodiment specifically includes:

[0030] 110. Use labeled samples to train the benchmark scoring model to generate an adjusted training model, wherein the labeled samples are pre-labeled with sample scores.

[0031] This embodiment proposes a semi-supervised sample labeling method: first, the benchmark scoring model is trained on labeled samples whose sample scores were assigned manually, which generates an adjusted training model; then, based on the adjusted training model, sample scores are generated for the unlabeled samples.
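Step 110 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not specify the model type, so a linear scoring model fine-tuned by stochastic gradient descent is assumed here, and the names `predict`, `adjust_model`, `lr`, and `epochs` are hypothetical.

```python
from typing import List, Sequence, Tuple

LabeledSample = Tuple[Sequence[float], float]  # (features, manually assigned sample score)


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Model score of one sample under an (assumed) linear scoring model."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def adjust_model(
    benchmark_weights: Sequence[float],
    benchmark_bias: float,
    labeled: Sequence[LabeledSample],
    lr: float = 0.05,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Start from the benchmark model and fit it to the labeled samples.

    The returned parameters play the role of the "adjusted training model".
    """
    weights = list(benchmark_weights)
    bias = benchmark_bias
    for _ in range(epochs):
        for features, sample_score in labeled:
            # Squared-error gradient step on a single labeled sample.
            error = predict(weights, bias, features) - sample_score
            for i, x in enumerate(features):
                weights[i] -= lr * error * x
            bias -= lr * error
    return weights, bias


# Toy labeled set in which the sample score is exactly 2 * feature.
labeled = [([0.0], 0.0), ([1.0], 2.0), ([2.0], 4.0)]
adjusted_weights, adjusted_bias = adjust_model([0.0], 0.0, labeled)

# Model score for a sample that has not been labeled yet.
model_score = predict(adjusted_weights, adjusted_bias, [3.0])
```

The adjusted model is then used only to produce model scores for unlabeled samples; turning a model score into a labeling sample score is a separate step.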

[0032] As described in the background technology, the method of ...

Example 2

[0052] Figure 2a is a flowchart of a method for generating training samples according to the second embodiment of the present invention. This embodiment is optimized on the basis of the above embodiment. In this embodiment, the step of determining the labeling sample score corresponding to the sample to be labeled, according to the association between the sample scores and the model scores of the labeled samples and according to the model score corresponding to the sample to be labeled, is specifically optimized as follows: according to the sample score and model score corresponding to each labeled sample, obtain the target model scores of the labeled samples that correspond to the same target sample score; according to the proportion of each target model score among all target model scores, determine the model score frequency distribution curve corresponding to the target sample score; according to the model score frequency distribution curve, obtain the high-frequency model...
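The grouping and frequency steps described above can be sketched as follows. This is an illustration under stated assumptions: model scores are rounded to one decimal place to form discrete points on the curve (the source does not say how scores are discretized), and the function name `model_score_frequencies` and the `decimals` parameter are hypothetical. How the high-frequency part of the curve is used is truncated in the source and is not modeled here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def model_score_frequencies(
    pairs: Iterable[Tuple[int, float]], decimals: int = 1
) -> Dict[int, Dict[float, float]]:
    """Map each target sample score to {rounded model score: proportion}.

    `pairs` holds one (sample_score, model_score) tuple per labeled sample.
    """
    grouped = defaultdict(list)
    for sample_score, model_score in pairs:
        # Collect the model scores of all labeled samples sharing a sample score.
        grouped[sample_score].append(model_score)

    distributions = {}
    for sample_score, scores in grouped.items():
        # Rounding is the assumed discretization of the model scores.
        counts = Counter(round(s, decimals) for s in scores)
        total = len(scores)
        distributions[sample_score] = {
            value: count / total for value, count in sorted(counts.items())
        }
    return distributions


pairs = [(2, 0.81), (2, 0.79), (2, 0.52), (1, 0.31)]
dist = model_score_frequencies(pairs)
```

In this toy input, two of the three samples with sample score 2 have model scores that round to 0.8, so that point carries two thirds of the frequency mass for sample score 2.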

Example 3

[0080] Figure 3a is a flowchart of a method for generating training samples according to the third embodiment of the present invention. This embodiment is optimized on the basis of the above embodiments. In this embodiment, the method preferably further includes: merging the newly labeled samples generated by labeling with the existing labeled samples, and using the adjusted training model as the new benchmark scoring model; then returning to the operation of training the benchmark scoring model with the labeled samples to generate an adjusted training model, until an end-of-labeling condition is met.

[0081] Correspondingly, the method in this embodiment specifically includes:

[0082] 310. Use labeled samples to train the benchmark scoring model to generate an adjusted training model, wherein the labeled samples are pre-labeled with sample scores.

[0083] 320. Input the sample to be labeled into the adjusted training model, and generate a model score corresponding to the sample to be labe...
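The iteration described in this embodiment (train, label, merge, promote the adjusted model to the new benchmark, repeat) can be sketched as a loop. This is a hedged sketch: the training and labeling steps are passed in as hypothetical callables `train` and `label_batch`, and the end-of-labeling condition is assumed to be "all unlabeled batches consumed", since the source text shown here does not state the actual condition.

```python
from typing import Any, Callable, Iterable, List, Sequence, Tuple

Labeled = Tuple[Any, float]  # (sample, sample score)


def iterative_labeling(
    benchmark_model: Any,
    labeled: Sequence[Labeled],
    unlabeled_batches: Iterable[Sequence[Any]],
    train: Callable[[Any, List[Labeled]], Any],
    label_batch: Callable[[Any, List[Labeled], Sequence[Any]], List[Labeled]],
) -> Tuple[Any, List[Labeled]]:
    """Repeat train -> label -> merge until the batches run out (assumed stop rule)."""
    pool = list(labeled)
    model = benchmark_model
    for batch in unlabeled_batches:
        adjusted = train(model, pool)                    # train the benchmark on labeled samples
        new_samples = label_batch(adjusted, pool, batch)  # score and label the unlabeled batch
        pool.extend(new_samples)                         # merge new and existing labeled samples
        model = adjusted                                 # adjusted model becomes the new benchmark
    return model, pool


# Toy stand-ins: the "model" is the mean sample score, and every unlabeled
# sample simply receives that mean as its labeling score.
def toy_train(model: Any, pool: List[Labeled]) -> float:
    return sum(score for _, score in pool) / len(pool)


def toy_label(model: float, pool: List[Labeled], batch: Sequence[Any]) -> List[Labeled]:
    return [(sample, model) for sample in batch]


final_model, all_labeled = iterative_labeling(
    None, [("a", 1.0), ("b", 3.0)], [["c"], ["d"]], toy_train, toy_label
)
```

Passing the two steps as callables keeps the loop independent of the model type and of the rule that converts model scores into labeling scores.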


Abstract

The embodiment of the invention discloses a method and device for generating training samples. The method comprises: training a benchmark scoring model with labeled samples to generate an adjusted training model; inputting a sample to be labeled into the adjusted training model to generate a model score corresponding to the sample to be labeled; determining a labeling sample score corresponding to the sample to be labeled according to the association between the sample scores and the model scores of the labeled samples, and according to the model score corresponding to the sample to be labeled; and labeling the sample to be labeled with the labeling sample score to generate a newly labeled sample. The technical scheme provided by the invention solves the technical problem that manually labeled training samples are difficult to obtain, take a long time to label, and offer no guarantee of accuracy; it reduces the workload of manually labeling training samples and improves the efficiency of training sample labeling.
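The third step of the abstract, mapping a model score to a labeling sample score through the association observed on labeled samples, can be illustrated with a deliberately simple stand-in. The rule below is an assumption and not the patent's method: it assigns the sample score whose labeled samples have the closest mean model score. The function name `nearest_mean_label` is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def nearest_mean_label(
    labeled_pairs: Iterable[Tuple[int, float]], model_score: float
) -> int:
    """Pick the sample score whose mean model score is closest to `model_score`.

    `labeled_pairs` holds (sample_score, model_score) for each labeled sample.
    This nearest-mean rule is a simplification chosen for illustration only.
    """
    grouped = defaultdict(list)
    for sample_score, labeled_model_score in labeled_pairs:
        grouped[sample_score].append(labeled_model_score)
    means = {score: sum(values) / len(values) for score, values in grouped.items()}
    return min(means, key=lambda score: abs(means[score] - model_score))


# Labeled samples: sample scores 0, 1, 2 with their model scores.
pairs = [(0, 0.1), (0, 0.2), (1, 0.5), (2, 0.8), (2, 0.9)]

# An unlabeled sample with model score 0.75 is closest to the mean for score 2.
label = nearest_mean_label(pairs, 0.75)
```

Labeling the sample then amounts to attaching `label` to it and adding it to the labeled pool as a newly labeled sample.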

Description

Technical field

[0001] Embodiments of the present invention relate to information processing technologies, and in particular, to a method and device for generating training samples.

Background

[0002] Today's society is an information society. With the development of modern science and technology, information has grown explosively. How to quickly and accurately find the information you need from the massive amount of information is the core problem that information retrieval technology needs to solve. Whether information retrieval can better meet the needs of users is directly related to whether the massive information is fully utilized, which is of great significance to economic and social development.

[0003] As a core technical problem in the field of information retrieval, ranking has been widely used in information retrieval problems such as web search, recommendation, and online advertisement. The task of the ranking system is to build a ranking model and...

Application Information

IPC(8): G06F17/30
Inventor: 刘志慧, 彭卫华, 李双龙, 康泽宇, 刘海浪, 王媛琼, 李辰
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD