Index building method, searching method and searching result sorting method and corresponding device

A technology for index building and search results, applied in the computer field, which can solve the problems of low search accuracy, inability to identify and satisfy, and poor search effect.

Active Publication Date: 2012-10-17
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

However, the existing sorting method still has the problem of low search accuracy. For example, if the query entered by the user is "Andy Lau's date of birth", when the search engine recalls the page, there may be some pages that contain "Andy Lau" and "birth date". Dat


Examples


Embodiment 1

[0125] Figure 1 is a flow chart of the index building method provided by Embodiment 1 of the present invention. As shown in Figure 1, the following steps are performed on each captured page:

[0126] Step 101: Perform word segmentation and part-of-speech tagging on the page.

[0127] In addition, after word segmentation and part-of-speech tagging are performed on the page, the segmented page can be filtered against a stop word list, which can include adverbs, function words, particles, interrogative words, modal particles, and so on. This filters out words that occur with high frequency in the page but carry little meaning.
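As an illustration of the filtering described in [0127], the following minimal Python sketch assumes the page has already been segmented and tagged into (word, part-of-speech) pairs. The stop word list `STOP_WORDS` and the function name `filter_stop_words` are hypothetical; the source only names the word classes such a list may contain.

```python
# Hypothetical stop word list; the source names word classes, not concrete words.
STOP_WORDS = {"a", "the", "of", "is", "very", "what"}


def filter_stop_words(tagged_tokens, stop_words=STOP_WORDS):
    """Drop (word, pos) pairs whose word appears in the stop word list."""
    return [(word, pos) for word, pos in tagged_tokens
            if word.lower() not in stop_words]


# Example input: output of an upstream segmenter and part-of-speech tagger.
tagged = [
    ("Andy Lau", "noun"),
    ("is", "verb"),
    ("a", "article"),
    ("very", "adverb"),
    ("famous", "adjective"),
    ("actor", "noun"),
]
filtered = filter_stop_words(tagged)
```

Only the words that are absent from the list survive, with their part-of-speech tags intact for the later entity word and attribute word marking.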

[0128] Step 102: Based on the semantic analysis, determine the entity word and the attribute word corresponding to the entity word from each word obtained after word segmentation, and mark them respectively.

[0129] In the present invention, nouns that meet the preset entity word conditions can be determined as entity words, wherein the preset e...
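One possible way to carry the entity word and attribute word marks into the index is sketched below. It assumes Step 102 has already assigned each word one of three tags, and it keys the inverted index on the (word, tag) pair. The tag names, the `build_index` function, and the sample pages are all hypothetical; the source does not specify the index layout.

```python
from collections import defaultdict

# Hypothetical tag names for the marks assigned in Step 102.
ENTITY, ATTRIBUTE, PLAIN = "entity", "attribute", "plain"


def build_index(pages):
    """Build an inverted index keyed by (word, tag).

    pages: {page_id: [(word, tag), ...]}
    returns: {(word, tag): set of page_ids}
    """
    index = defaultdict(set)
    for page_id, tagged_words in pages.items():
        for word, tag in tagged_words:
            index[(word, tag)].add(page_id)
    return index


# Sample pages, already segmented, filtered, and tagged.
pages = {
    1: [("Andy Lau", ENTITY), ("date of birth", ATTRIBUTE)],
    2: [("Andy Lau", ENTITY), ("date of birth", PLAIN)],
    3: [("Andy Lau", PLAIN), ("date of birth", PLAIN)],
}
index = build_index(pages)
```

Keying on the pair keeps a page where "date of birth" is an attribute of the entity separate from a page that merely mentions the phrase.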

Embodiment 2

[0152] Figure 2 is a flow chart of the query analysis method provided by Embodiment 2 of the present invention. As shown in Figure 2, the method includes the following steps:

[0153] Step 201: Segment the received query.

[0154] Step 202: Perform part-of-speech tagging on each word obtained after word segmentation.

[0155] For example, after receiving the query of "Andy Lau's date of birth", the word segmentation process is performed on the query to obtain two words, "Andy Lau" and "date of birth", which are both marked as nouns. The above two steps are mature technologies in the prior art and will not be described in detail.
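The source treats segmentation and tagging as prior art and does not name an algorithm. Purely for illustration, the sketch below uses greedy longest-match lookup against a small lexicon, which is a swapped-in technique and not necessarily what the invention uses. The lexicon contents, the `max_len` parameter, and the handling of the possessive "'s" are assumptions.

```python
# Hypothetical lexicon mapping known phrases to part-of-speech tags.
LEXICON = {
    "andy lau": "noun",
    "date of birth": "noun",
    "of": "preposition",
}


def segment_and_tag(query, lexicon=LEXICON, max_len=3):
    """Greedy longest-match segmentation over whitespace tokens."""
    tokens = query.lower().replace("'s", "").split()
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon or n == 1:
                # Unknown single tokens are kept with an "unknown" tag.
                result.append((phrase, lexicon.get(phrase, "unknown")))
                i += n
                break
    return result


words = segment_and_tag("Andy Lau's date of birth")
```

With this lexicon the query splits into the two nouns named in [0155].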

[0156] In addition, after word segmentation and part-of-speech tagging are performed on the query, the segmented query can be filtered against a preset stop word list, removing the words contained in that list. The stop word list can include: adverbs, function words, auxiliary words, interrogative words...

Embodiment 3

[0180] After the query is analyzed as in Embodiment 2, page recall can be restricted to the pages whose index entries match both the words in the query and the tags (entity word or attribute word tags) of those words.

[0181] That is, when searching, the index is looked up for each word obtained from word segmentation to find the pages whose index entries match both the word and its tag, and the intersection of the pages found for each word is then taken.

[0182] For example, for the query "Andy Lau's date of birth", word segmentation yields the words "Andy Lau" and "date of birth". Since "Andy Lau" has been analyzed as an entity word and "date of birth" as an attribute word, the search finds the pages whose index entry for "Andy Lau" is marked as an entity word and the pages whose index entry for "date of birth" is marked as an attribute word, and the intersection of the obtained page...
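The tag-matched recall and intersection of Embodiment 3 can be sketched as follows, assuming an inverted index keyed by (word, tag) pairs. The index layout, the tag names, the `search` function, and the sample data are hypothetical.

```python
def search(index, tagged_query):
    """Return the pages matching every (word, tag) pair in the query.

    index: {(word, tag): set of page_ids}
    tagged_query: [(word, tag), ...] produced by query analysis
    """
    result = None
    for word, tag in tagged_query:
        pages = index.get((word, tag), set())
        # Intersect the page sets found for each word.
        result = set(pages) if result is None else result & pages
    return result if result is not None else set()


# Sample index: page 3 contains both words, but neither is tagged.
index = {
    ("Andy Lau", "entity"): {1, 2},
    ("Andy Lau", "plain"): {3},
    ("date of birth", "attribute"): {1},
    ("date of birth", "plain"): {2, 3},
}
query = [("Andy Lau", "entity"), ("date of birth", "attribute")]
recalled = search(index, query)
```

In this sample only page 1 is recalled. Pages 2 and 3 contain both words but lack the matching attribute word tag, which is the kind of loosely related page the background section describes.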


Abstract

The invention provides an index building method, a search method, a search result sorting method, and corresponding devices. The index building method comprises: performing word segmentation and part-of-speech tagging on a page; determining, based on at least one of semantic analysis and visual features in the page, the entity words and their corresponding attribute words among the words obtained from word segmentation, and tagging them as entity words or attribute words respectively; and, when building the index of the page, adding the entity word or attribute word tag of each indexed word to the index. When a query is searched, the entity word or attribute word tags of its words are matched as well; alternatively, when the search results of the query are sorted, the sorting weight is raised for pages whose index entries match both each word in the query and that word's entity word or attribute word tag. The methods and devices provided by the invention improve search accuracy and search results.
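The sorting variant in the abstract can be illustrated with the sketch below, which raises the score of pages whose index entries match every query word together with its tag. The multiplicative `boost` factor, the base scores, the index layout, and the function name `rerank` are assumptions; the source does not state how the sorting weight is raised.

```python
def rerank(results, index, tagged_query, boost=1.5):
    """Sort page ids by score, boosting pages that match all tagged words.

    results: {page_id: base_score} from an ordinary relevance ranking
    index: {(word, tag): set of page_ids}
    boost: hypothetical multiplier applied to tag-matched pages
    """
    def matches_all(page_id):
        return all(page_id in index.get((word, tag), set())
                   for word, tag in tagged_query)

    scored = {page: score * boost if matches_all(page) else score
              for page, score in results.items()}
    return sorted(scored, key=lambda page: scored[page], reverse=True)


index = {
    ("Andy Lau", "entity"): {1, 2},
    ("Andy Lau", "plain"): {3},
    ("date of birth", "attribute"): {1},
    ("date of birth", "plain"): {2, 3},
}
query = [("Andy Lau", "entity"), ("date of birth", "attribute")]
base_scores = {1: 0.6, 2: 0.7, 3: 0.8}
ranking = rerank(base_scores, index, query)
```

Unlike tag-matched recall, this variant keeps every recalled page and only changes the order, so page 1 moves ahead of pages with higher base scores.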

Description

【Technical field】

[0001] The invention relates to the field of computer technology, in particular to an index building method, a search method, a search result sorting method, and corresponding devices.

【Background art】

[0002] With the rapid development of search technology, search engines have become the main means by which people obtain information. After the user enters a search request (query), the search engine returns search results matching the query to the user; that is, pages containing each word in the query are included in the search results.

[0003] In existing search technology, the recalled search results may include some pages that have little relevance to the query entered by the user. Therefore, sorting of the search results is mainly based on the degree of correlation between the feature vector of each page in the search results and the query, and the search results with a high correlation degree betwee...


Application Information

IPC(8): G06F17/30
Inventor: 喻宏勇石远
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD