Sample selection method and device and electronic equipment

A sample screening technology, applied in the computer field, that addresses the low accuracy of existing sample screening, avoiding inconsistent description and improving accuracy.

Active Publication Date: 2017-09-22
BEIJING SANKUAI ONLINE TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] The embodiments of the present application provide a sample screening method that addresses the low accuracy of sample screening methods in the prior art.



Examples


Embodiment 1

[0024] As shown in Figure 1, the sample screening method disclosed in the present application includes steps 100 to 120.

[0025] Step 100, clustering all samples based on sample features.

[0026] The samples used in the embodiments of this application are the historical behavior logs of users in the current system or platform, such as users' click or purchase behavior logs on an O2O platform, or users' click or browsing logs in a search system. The specific method for obtaining user behavior logs, that is, the samples used for training models, is existing technology and is not repeated here.

[0027] Before model training, the training samples are first screened manually and given sample labels. The purpose is to screen out samples that obviously do not meet the requirements of the model and to mark the remaining samples as positive or negative. The samples that carry positive or negative labels are taken as the candidate samples.
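The labeling step above can be pictured with a minimal sketch. The `Sample` class, its field names, and the use of `None` to mark a manually rejected sample are all illustrative assumptions; the source does not describe a data layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    features: dict          # raw behavior-log fields (hypothetical layout)
    label: Optional[int]    # 1 = positive, 0 = negative, None = rejected in manual screening


def candidate_samples(samples):
    """Keep only the samples that carry a positive or negative label."""
    return [s for s in samples if s.label in (0, 1)]


logs = [
    Sample({"behavior": "purchase"}, 1),
    Sample({"behavior": "click"}, 0),
    Sample({"behavior": "bot_traffic"}, None),  # screened out manually
]
candidates = candidate_samples(logs)
print(len(candidates))
```

Only the labeled samples survive, so later steps such as clustering and ratio selection operate on candidates alone.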

[0028] W...

Embodiment 2

[0036] As shown in Figure 2, the sample screening method disclosed in this embodiment includes steps 200 to 230.

[0037] Step 200, clustering all samples based on sample features.

[0038] The samples used in the embodiments of this application are the historical behavior logs of users in the current system or platform, such as the user's click or purchase behavior logs on the O2O platform, and the user's click or browsing logs in the search system. The specific method of obtaining user behavior logs as training samples, manually screening the training samples and setting positive and negative sample labels to obtain candidate samples can be found in Embodiment 1, and will not be repeated here.

[0039] When implementing this application, it is assumed that the feature dimensions of the sample include: time, geographic location, user age, user behavior type, and product category. After labeling the samples with positive and negative sample labels, the f...

Embodiment 3

[0055] As shown in Figure 4, the sample screening device disclosed in this embodiment includes:

[0056] A sample clustering module 400, configured to cluster all samples based on sample features;

[0057] A confusion degree metric determination module 410, configured to determine the sample confusion degree metric of the cluster where each candidate sample is located, according to the clustering result of the sample clustering module 400;

[0058] A sample ratio determination module 420, configured to determine the sample selection ratio of the corresponding cluster according to the sample confusion degree metric determined by the confusion degree metric determination module 410.
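As a rough sketch of how modules 410 and 420 might fit together: the excerpt does not say which confusion metric is used or how it maps to a selection ratio, so the code below assumes Shannon entropy over the positive/negative labels in a cluster, and a hypothetical mapping in which purer clusters receive a larger ratio. Both choices, and the `floor` parameter, are assumptions for illustration only.

```python
import math


def label_entropy(labels):
    """Shannon entropy (bits) of binary labels in one cluster: 0 = pure, 1 = evenly mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def selection_ratio(entropy, floor=0.1):
    """Hypothetical mapping (not from the source): lower confusion gives a higher ratio."""
    return max(floor, 1.0 - entropy)


# Cluster id -> labels of the candidate samples assigned to that cluster.
clusters = {0: [1, 1, 1, 1], 1: [1, 0, 1, 0], 2: [1, 1, 1, 0]}
ratios = {cid: selection_ratio(label_entropy(lbls)) for cid, lbls in clusters.items()}
for cid, r in ratios.items():
    print(cid, round(r, 3))
```

Other impurity measures, such as the Gini index, would slot into `label_entropy` without changing the rest of the sketch.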

[0059] In a specific implementation, samples can be clustered using methods such as k-means (a centroid-based method) or hierarchical clustering.
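For orientation, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm), one of the clustering methods named above. It assumes the sample features have already been encoded as numeric vectors; the toy points, the function name `kmeans`, and the fixed iteration count are illustrative and not taken from the source.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i, a in enumerate(assign) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centroids, assign = kmeans(points, k=2)
print(assign)
```

In practice a library implementation would normally be preferred; the sketch only shows the assignment and update steps that produce the clusters consumed by the later modules.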

[0060] Optionally, the sample ratio determination module 420 is specifically configured to: determine the sample selection rati...


Abstract

The invention provides a sample selection method, which belongs to the technical field of computers and aims to solve the problem of relatively low sample selection accuracy in the prior art. The method comprises: clustering all samples on the basis of sample features; determining, according to the clustering result, a sample confusion degree metric for each cluster in which candidate samples are located; and determining the sample selection proportion of the corresponding cluster according to the sample confusion degree metric, for use in training a model. By clustering all candidate samples on the basis of preset dimension features and determining the proportion of candidate samples selected from each cluster according to the confusion degree of the sample distribution within that cluster, the accuracy of sample selection is improved.
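The final step, drawing candidates from each cluster in its determined proportion, might look like the sketch below. The abstract does not say how samples are drawn within a cluster, so uniform random sampling without replacement and rounding up with `math.ceil` are assumptions; `select_samples` and the toy cluster contents are hypothetical.

```python
import math
import random


def select_samples(clusters, ratios, seed=0):
    """Draw ceil(ratio * size) members from each cluster, without replacement."""
    rng = random.Random(seed)
    selected = []
    for cid, members in clusters.items():
        n = min(len(members), math.ceil(ratios[cid] * len(members)))
        selected.extend(rng.sample(members, n))
    return selected


# Cluster id -> candidate sample ids; ratios would come from the confusion metric.
clusters = {0: ["a1", "a2", "a3", "a4"], 1: ["b1", "b2", "b3", "b4"]}
ratios = {0: 1.0, 1: 0.25}
chosen = select_samples(clusters, ratios)
print(len(chosen))
```

The selected samples would then form the training set for the model.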

Description

Technical field

[0001] The present application relates to the field of computer technology, and in particular to a sample screening method and device, and electronic equipment.

Background

[0002] Data preprocessing plays an important role in many machine learning algorithms. Whichever algorithm is selected, preprocessing of the sample data is critical, because the quality of the data input to the model directly determines the performance of the algorithm. Taking search or recommendation technology as an example, before performing search or recommendation, a ranking model must first be trained using user behavior logs as samples; the trained ranking model is then used to sort the candidate search or recommendation results so that accurate, comprehensive results are presented to users. In the prior art, when screening samples, manual labeling of positive and negative samples is usually used, and then the positive sampl...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/23213
Inventor: 张钦
Owner: BEIJING SANKUAI ONLINE TECH CO LTD