Lexical item weight labeling method and device

A term weighting technology, applied in the field of network search, that addresses unsatisfactory term-weighting results and improves weighting accuracy and search ranking quality.

Active Publication Date: 2016-09-28
BEIJING QIYI CENTURY SCI & TECH CO LTD
Cites: 4; Cited by: 17

AI Technical Summary

Problems solved by technology

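Term weights calculated by the TF-IDF method are not ideal: in search tasks dominated by short texts such as search terms, important terms can be marked with small weights.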



Examples


Embodiment 1

[0048] Refer to Figure 1, which shows a flow chart of the steps of an embodiment of the term weight labeling method of the present application. The method may specifically include the following steps:

[0049] Step 110, acquiring each term whose weight is to be determined.

[0050] In this embodiment of the present invention, word segmentation is performed on all user search terms in the search log, and the resulting segments are used as the terms to be weighted. For example, if the search log contains the search term "good-looking movie", word segmentation yields three terms: "good-looking", "of", and "movie" (the term "of" reflects a particle in the original-language search term that is dropped in the English rendering).
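Step 110 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `segment` is a hypothetical stand-in that splits on whitespace (a real system would use a proper word segmenter for the query language), `collect_terms` is a name chosen here, and the sample log is invented.

```python
from collections import Counter


def segment(query):
    """Hypothetical stand-in for a word segmenter: split on whitespace."""
    return query.lower().split()


def collect_terms(search_log):
    """Segment every search term in the log and count each resulting term."""
    counts = Counter()
    for query in search_log:
        counts.update(segment(query))
    return counts


# Invented sample log; each entry is one user search term.
sample_log = ["good-looking movie", "funny movie", "good-looking drama"]
terms_to_weight = collect_terms(sample_log)
```

The keys of `terms_to_weight` are the terms whose weights are to be determined; the counts are kept only because later steps may want to know how often each term was searched.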

[0051] Of course, the terms whose weights are to be determined can be generated in other ways; for example, word segmentation can be performed on the documents to be searched and the terms extracted from the result. The object to be searched may be, for example, a video page on a video website, a product page on an e-commerce platfo...

Embodiment 2

[0067] Refer to Figure 2, which shows a flow chart of the steps of an embodiment of the term weight labeling method of the present application. The method may specifically include the following steps:

[0068] Step 210, acquiring each term whose weight is to be determined.

[0069] This step is the same as step 110 in the first embodiment, and will not be described in detail here.

[0070] Step 220, extracting the term feature of each term; the term feature includes a term search feature, and the term search feature is acquired through the search log.

[0071] In this embodiment of the present invention, the term features of each term may be extracted from several sources in combination. The term search feature among the term features can be extracted from the search log. Of course, features can also be extracted from the term itself.
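As a rough sketch of step 220, the function below derives a few search features for one term from a search log, plus one feature of the term itself. The text here does not list the concrete features, so every feature name (`query_count`, `query_ratio`, `standalone_ratio`, `length`) is a hypothetical choice made for illustration, and whitespace splitting again stands in for real word segmentation.

```python
def term_features(term, search_log):
    """Return a feature dict for one term (all feature names are illustrative)."""
    queries = [q.lower().split() for q in search_log]
    containing = [q for q in queries if term in q]
    total = len(queries)
    standalone = sum(1 for q in containing if len(q) == 1)
    return {
        # Search features, derived from the search log.
        "query_count": len(containing),
        "query_ratio": len(containing) / total if total else 0.0,
        # Share of the containing queries in which the term is the whole query.
        "standalone_ratio": standalone / len(containing) if containing else 0.0,
        # Feature of the term itself.
        "length": len(term),
    }


features = term_features("movie", ["movie", "funny movie", "drama"])
```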

[0072] For search logs, take a video website as an example. The user logs in to the webpage of the video website in the client, and then the user...

Embodiment 3

[0110] Refer to Figure 3, which shows a flow chart of the steps of an embodiment of the term weight labeling method of the present application. The method may specifically include the following steps:

[0111] Step 310, acquiring a term training set; the term training set includes terms and the term search weights corresponding to the terms.

[0112] Word segmentation is performed on the document collection, and the result is a term set. A certain number of terms (which can be greater than 100) are extracted from the term set as a data set. Each term in this data set is then manually labeled with its term search weight, and the labeled data set is used as a training set to train the term search weight labeling model. In practical applications, the terms in the training set can be obtained from the document collection to be searched, search logs, and other materials that can provide the search task, which is not limited in t...
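The construction of the training set in step 310 might look like the sketch below. It assumes the terms are sampled uniformly at random, which the text does not state; `build_training_set` and `label_fn` are hypothetical names, with `label_fn` standing in for the manual labeling step.

```python
import random


def build_training_set(term_set, label_fn, sample_size, seed=0):
    """Sample terms and pair each with its manually assigned search weight."""
    rng = random.Random(seed)          # fixed seed so the sample is repeatable
    candidates = sorted(term_set)      # sort first: set order is not stable
    size = min(sample_size, len(candidates))
    sampled = rng.sample(candidates, size)
    return [(term, label_fn(term)) for term in sampled]


# Invented data: 300 placeholder terms, each "labeled" with weight 0.5.
all_terms = {f"term{i}" for i in range(300)}
training_set = build_training_set(all_terms, lambda term: 0.5, sample_size=150)
```

The resulting list of `(term, weight)` pairs is what a labeling model would be trained on; the choice of model is left open here.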



Abstract

The embodiment of the invention provides a lexical item weight labeling method and device, and relates to the technical field of internet search. The method includes: obtaining all lexical items whose weights are to be determined; calculating the lexical item search weight of each lexical item by combining search logs; calculating the inverse document frequency of each lexical item according to its occurrence frequency in a document set; and calculating the lexical item weights according to the lexical item search weights and the inverse document frequencies of the lexical items. This solves the problem that, when TF-IDF is used to calculate lexical item weights in the internet search field, important lexical items are marked with smaller weights in search tasks that are based on search words and dominated by short texts. Because the lexical item weights are calculated mainly from the search logs, the connection between lexical items and search words is strengthened, the precision of relevance calculation between lexical items and documents in the search environment is improved, and the search ranking quality is improved.
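The last two steps of the abstract can be illustrated with a short sketch. Two points are assumptions rather than statements from the text: the smoothed inverse document frequency formula log((1 + N) / (1 + df)), and the use of a simple product to combine the search weight with the inverse document frequency. The function names and the sample documents are invented.

```python
import math


def inverse_document_frequency(term, documents):
    """Smoothed IDF (assumed form): log((1 + N) / (1 + df))."""
    n_docs = len(documents)
    df = sum(1 for doc in documents if term in doc)  # documents containing term
    return math.log((1 + n_docs) / (1 + df))


def term_weight(search_weight, idf):
    """Combine the two signals; the product is an assumed combination rule."""
    return search_weight * idf


# Invented document set; each document is the set of its segmented terms.
docs = [{"funny", "movie"}, {"movie", "trailer"}, {"drama"}]
idf_movie = inverse_document_frequency("movie", docs)  # log(4 / 3)
idf_drama = inverse_document_frequency("drama", docs)  # log(4 / 2)
```

With this rule, a term that is rare in the document set but carries a high search weight ends up with the largest final weight.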

Description

Technical field

[0001] The present application relates to the technical field of network search, and in particular to a term weight labeling method and a term weight labeling device.

Background

[0002] With the popularization of web search technology, web search is involved in all aspects of daily life. After a user inputs a search term on a search website, the website lists search results related to that search term. The search results are ordered by their correlation with the search term: the higher the correlation between the search term and the search results, the higher the quality of the results the user obtains, and the better they meet the user's search needs. Therefore, measuring the correlation between the user's search term and the document collection (the collection of searched objects) is a very important part of network search technology. The precision of the document co...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30, G06F15/18
CPC: G06F16/93, G06F16/951, G06F40/216, G06N20/00
Inventor: 胡军, 陈英傑, 王天畅, 叶澄灿
Owner: BEIJING QIYI CENTURY SCI & TECH CO LTD