Search-ranking-oriented sample selection method based on noise-adding active learning

A noise-injection active learning technique from the field of machine learning. It addresses the inaccurate estimation of expected loss that results from a low-accuracy ranking model, and the degraded final effect this causes, so as to reduce labeling cost and improve model performance.

Inactive Publication Date: 2012-05-09
SHANGHAI JIAO TONG UNIV

AI Technical Summary

Problems solved by technology

One disadvantage of the existing expected-loss-based selection method is that it requires a set of ranking models to estimate the loss. When the initial training samples are insufficient, the accuracy of these ranking models is very low, so the expected loss is estimated inaccurately and the final effect suffers.



Examples


Embodiment

[0022] This embodiment targets search ranking and uses commercial search-ranking data provided by Baidu for active-learning sample selection. Two of the most widely used evaluation measures in information retrieval, DCG@10 and MAP (Mean Average Precision), are used to evaluate the effect, and the method is compared experimentally with existing representative sample selection techniques, so that the sample selection effect of the present invention can be fully tested. This embodiment includes the following steps:

[0023] In the first step, noise injection is performed for unlabeled samples.

[0024] Let e ∈ [0, 1]^d be a d-dimensional unlabeled sample after 0-1 normalization. Noise injection is expressed as follows:

[0025] e_m = e + η

[0026] where e_m denotes a noise sample obtained by injecting noise into a single sample (m noise samples are generated per sample), and η is a d-dimensional vector that follows the Gaussian distribution p(η) ~ N(μ, Σ), that is:

[0027] ...
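
The noise-injection step can be sketched in Python as follows. This is only an illustrative sketch: the function name `inject_noise`, the zero mean, and the isotropic covariance `sigma**2 * I` are assumptions made here, since the source does not state the values of μ and Σ.

```python
import numpy as np

def inject_noise(e, m, mu=None, cov=None, sigma=0.05, seed=None):
    """Generate m noise samples e_m = e + eta from one normalized sample e.

    Assumptions (not from the source): when mu and cov are not supplied,
    eta is drawn from N(0, sigma^2 * I).
    """
    e = np.asarray(e, dtype=float)
    d = e.shape[0]
    if mu is None:
        mu = np.zeros(d)                 # assumed zero-mean noise
    if cov is None:
        cov = (sigma ** 2) * np.eye(d)   # assumed isotropic covariance
    rng = np.random.default_rng(seed)
    eta = rng.multivariate_normal(mu, cov, size=m)  # shape (m, d)
    return e + eta                       # broadcasts e over the m noise vectors

# Example: five noise samples from one 3-dimensional sample.
noisy = inject_noise([0.2, 0.5, 0.9], m=5, seed=0)
```

Each row of `noisy` is one noise sample; passing explicit `mu` and `cov` arguments replaces the assumed defaults.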



Abstract

The invention discloses a search-ranking-oriented sample selection method based on noise-adding active learning. The method comprises the following steps: injecting noise into unlabeled samples to generate noise samples; predicting the noise samples with a ranking model trained on the training set to obtain the score distribution of each sample under the current ranking model; converting the score distribution into a ranking distribution and measuring that ranking distribution by the variance of DCG (discounted cumulative gain) to characterize uncertainty; and selecting samples according to the uncertainty. With this method, effective sample selection can be performed in search ranking even when samples are insufficient, so that model performance is improved more effectively with fewer samples, thereby reducing the cost of sample labeling.
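
The uncertainty measure described in the abstract might be sketched as below. The source does not give the exact formulas, so this is one possible reading under stated assumptions: the rank of each noise sample is taken relative to the scores of the other documents of the same query, the gain is a hypothetical constant supplied by the caller, and the discount is the standard DCG discount 1 / log2(1 + rank). The names `dcg_variance` and `select_most_uncertain` are illustrative.

```python
import numpy as np

def dcg_variance(noisy_scores, other_scores, gain=1.0):
    """Variance of a document's discounted gain over its noisy rank positions.

    noisy_scores: model scores of the m noise samples of one document.
    other_scores: model scores of the other documents of the same query.
    gain: assumed relevance gain of the document (hypothetical stand-in).
    """
    noisy_scores = np.asarray(noisy_scores, dtype=float)
    other_scores = np.asarray(other_scores, dtype=float)
    # Score distribution -> ranking distribution: 1-based rank of each
    # noise sample among the other documents.
    ranks = 1 + (other_scores[None, :] > noisy_scores[:, None]).sum(axis=1)
    # Discounted gain at each sampled rank position.
    discounted = gain / np.log2(1.0 + ranks)
    return float(discounted.var())

def select_most_uncertain(uncertainties, k):
    """Return the indices of the k samples with the largest uncertainty."""
    order = np.argsort(uncertainties)[::-1]
    return order[:k].tolist()

# A document whose noise samples always land on the same rank has zero variance.
stable = dcg_variance([0.9, 0.8], other_scores=[0.5, 0.3])
unstable = dcg_variance([0.9, 0.1], other_scores=[0.5, 0.3])
chosen = select_most_uncertain([stable, unstable], k=1)
```

Under these assumptions, a document whose rank changes under small perturbations gets a larger variance and is therefore selected for labeling first.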

Description

Technical field

[0001] The invention belongs to the fields of machine learning and information retrieval, and specifically relates to a search-ranking-oriented sample selection method based on noise-injection active learning.

Background technique

[0002] Ranking is a core problem in information retrieval and in applications such as recommendation and online advertising; the task is to build a ranking model. Learning to rank is a supervised learning problem and, as in other supervised learning problems, the quality of the ranking model depends strongly on the number of training samples. Building a high-quality ranking model usually requires labeling a large amount of training data. In many practical applications, however, unlabeled samples are relatively easy to collect while labeled samples are very expensive, so labeling training data has become a bottleneck in building high-quality ranking models. In fact, the amount of information contained in different...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 蔡文彬, 张娅
Owner: SHANGHAI JIAO TONG UNIV