Sample data set acquisition method and apparatus, equipment and storage medium

A sample data set acquisition technology, applied to sample data set acquisition methods, devices, equipment and storage media, that can solve problems such as the low accuracy of the ranking model.

Pending Publication Date: 2020-10-16
BEIJING SANKUAI ONLINE TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, because users browse far more data than they click, the positive sample data obtained by the above method is far scarcer than the negative sample data. During training, the features learned by the ranking model are therefore biased towards the characteristics of the negative sample data, which lowers the accuracy of the trained ranking model.

Examples

Embodiment Construction

[0078] To make the purpose, technical solutions and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.

[0079] The sample data set acquisition method provided by the present application obtains at least one piece of target negative sample data and at least one piece of target positive sample data from the multiple pieces of initial positive sample data and initial negative sample data in the sample data set, thereby improving the accuracy of the trained ranking model. The method can be applied in the following scenarios:

[0080] For example, the sample data set acquisition method provided by this application can be applied in a search scenario. When a user wants to view some data, the user enters a search term in the terminal, and the terminal obtains multiple pieces of corresponding data based on the search term. Arrange and display mult...

Abstract

The invention discloses a sample data set acquisition method and apparatus, equipment and a storage medium, belonging to the technical field of the Internet. The method comprises: acquiring a first sample data set corresponding to any search term; selecting at least one piece of target negative sample data according to the position of the initial positive sample data in the search result interface to which it belongs; selecting at least one piece of target positive sample data according to the historical click rate of a user identifier; forming a second sample data set corresponding to the search term from the target negative sample data and the target positive sample data; and applying the second sample data set to the training of a ranking model. Because the number of negative samples is reduced, the negative sample data no longer far outnumbers the positive sample data, so the subsequently trained ranking model is not biased towards the characteristics of the negative sample data. Training the ranking model with the second sample data set therefore improves its accuracy.
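
The selection step summarized in the abstract could look roughly like the Python sketch below. The abstract does not state the exact rules, so every concrete rule here is an assumption made for illustration: negatives are kept only when they were displayed above the last clicked position on the same search result interface, positives are kept only when the user's historical click rate does not exceed a hypothetical `max_click_rate` threshold, and interfaces without any click contribute nothing. The names `Sample`, `select_samples`, `page_id` and `max_click_rate` are likewise hypothetical and do not come from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    user_id: str    # user identifier
    page_id: str    # identifies one search result interface
    position: int   # 1-based display position on that interface
    clicked: bool   # True = initial positive, False = initial negative


def select_samples(first_set, click_rate, max_click_rate=0.8):
    """Build a second sample set from the first one (illustrative rules only)."""
    # Group samples by the search result interface they were shown on.
    by_page = defaultdict(list)
    for sample in first_set:
        by_page[sample.page_id].append(sample)

    second_set = []
    for page in by_page.values():
        clicked_positions = [s.position for s in page if s.clicked]
        if not clicked_positions:
            continue  # assumption: interfaces without a click are skipped
        last_click = max(clicked_positions)
        for s in page:
            if s.clicked:
                # Assumed rule: keep positives from users whose historical
                # click rate is at or below the threshold.
                if click_rate.get(s.user_id, 0.0) <= max_click_rate:
                    second_set.append(s)
            elif s.position < last_click:
                # Assumed rule: keep negatives shown above the last click.
                second_set.append(s)
    return second_set
```

Under these assumed rules, an unclicked item displayed below the last click is dropped, which is one way the negative sample count shrinks relative to the first sample data set.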

Description

Technical field

[0001] The present application relates to the technical field of the Internet, and in particular to a method, device, equipment and storage medium for acquiring a sample data set.

Background

[0002] To ensure the accuracy of search results, a ranking model is usually invoked during a search to sort the multiple pieces of data obtained by the search. How to train an accurate ranking model has become an urgent problem to be solved.

[0003] In related technologies, when a user searches based on a search term and obtains at least one piece of data, each piece of data displayed can be regarded as data seen by the user and can be used as sample data. If the user clicks a displayed piece of data, that data is recorded as positive sample data. If the user does not click a displayed piece of data, that data is recorded as negative sample data. Using the above method, positive sample data and negative sample data can be obtained based on t...
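
The labeling scheme described in the background can be illustrated with a small Python sketch. It is only an illustration under assumed names (`label_impressions`, `displayed_ids` and `clicked_ids` are hypothetical), and the item identifiers are made up; it shows how a single search with one click already yields several negatives for each positive.

```python
from collections import Counter


def label_impressions(displayed_ids, clicked_ids):
    """Label every displayed item: 1 if it was clicked, 0 otherwise."""
    clicked = set(clicked_ids)
    return [(item, 1 if item in clicked else 0) for item in displayed_ids]


# Made-up example: five items displayed, one of them clicked.
labels = label_impressions(["a", "b", "c", "d", "e"], ["c"])
counts = Counter(label for _, label in labels)
print(counts[1], "positive vs", counts[0], "negative")
```

Accumulated over many searches, this kind of labeling produces the imbalance between positive and negative sample data that the application sets out to reduce.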

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/9535, G06F16/9538, G06K9/62
CPC: G06F16/9535, G06F16/9538, G06F18/214
Inventor: 王步霖, 杨一帆, 李悦, 郭圣昱, 屠川川, 陶然
Owner: BEIJING SANKUAI ONLINE TECH CO LTD