Method for applying phrase index technology into internet search engine

A search engine indexing technology applied in this underlying key technical field. It addresses three problems: existing engines do not restrict the range within which multiple keywords must co-occur, results are poor, and users are flooded with results. It achieves accurate semantic matching, narrows the semantic scope, and is easy to use.

Status: Inactive. Publication date: 2008-06-11
Applicants: 新百丽鞋业(深圳)有限公司 and one other
Cites: 0. Cited by: 44.

AI Technical Summary

Problems solved by technology

[0005] Second, the query method of current search engines is essentially keyword-based, so entering a complete sentence as a query gives very unsatisfactory results: for a sentence query, the engine cannot properly reflect the correlation between the query sentence and the text content (see Figure 1).
[0006] Third, existing search engines match query keywords fuzzily. This yields more results, but it also produces many useless ones and can even interfere with the appearance of bett...
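The flooding problem described in [0005] and [0006] can be illustrated with a minimal sketch. It assumes documents are already segmented into space-separated words; the sample documents and the function names `keyword_match` and `phrase_match` are hypothetical. Requiring every query word to appear anywhere in a document returns both documents, while requiring the words to appear together as one phrase keeps only the relevant one.

```python
def keyword_match(query_words, docs):
    """Return ids of docs that contain every query word anywhere (bag-of-words AND)."""
    return [doc_id for doc_id, text in docs.items()
            if set(query_words) <= set(text.split())]


def phrase_match(query_words, docs):
    """Return ids of docs that contain the query words as one contiguous phrase."""
    n = len(query_words)
    hits = []
    for doc_id, text in docs.items():
        words = text.split()
        # Slide an n-word window over the document and compare it with the query.
        if any(words[i:i + n] == query_words for i in range(len(words) - n + 1)):
            hits.append(doc_id)
    return hits


# Hypothetical documents, already segmented into space-separated words.
docs = {
    "d1": "phrase index technology for search engine",
    "d2": "search the catalogue for a new engine",
}
query = ["search", "engine"]

print(keyword_match(query, docs))  # ['d1', 'd2']
print(phrase_match(query, docs))   # ['d1']
```

Here `d2` is returned by the keyword match only because "search" and "engine" both occur somewhere in it, in unrelated places.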

Examples

Embodiment Construction

[0033] The present invention is mainly implemented through the following steps:

[0034] Step 1: Automatically accumulate web page information (see Figure 4):

[0035] First, web page collection programs such as crawlers or spiders automatically obtain a large number of original web page texts from the Internet through hyperlink analysis. As these texts are obtained, a word segmentation program decomposes them into individual words, a word frequency statistics program counts the frequency of each word, and words whose frequency exceeds a threshold are marked as keywords. Then a phrase generation program takes each keyword as the center and adds other words before and after it, combining them into a series of phrases with different numbers of words and different collocations. During combination, according to the part of speech tagged by the word segmentation program, some meaningless combinations, su...
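Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the word segmentation program has already produced `(word, part_of_speech)` pairs, and the frequency threshold, the window sizes `max_before`/`max_after`, the tag names, and the rule that a phrase may not begin or end with a function word are all hypothetical stand-ins (the source's actual filtering rule is truncated).

```python
from collections import Counter

# Hypothetical tags treated as function words. The source's real rule for
# discarding meaningless combinations is truncated, so this is a stand-in.
FUNCTION_POS = {"prep", "conj", "aux", "det"}


def mark_keywords(tagged_words, threshold):
    """Count word frequencies and return the words whose count exceeds threshold."""
    freq = Counter(word for word, _ in tagged_words)
    return {word for word, count in freq.items() if count > threshold}


def generate_phrases(tagged_words, keywords, max_before=2, max_after=2):
    """Center on each keyword occurrence, add up to max_before words before and
    max_after words after it, and keep spans that pass the part-of-speech filter."""
    phrases = set()
    for i, (word, _) in enumerate(tagged_words):
        if word not in keywords:
            continue
        for before in range(max_before + 1):
            for after in range(max_after + 1):
                start, end = i - before, i + after + 1
                if start < 0 or end > len(tagged_words) or end - start < 2:
                    continue  # out of range, or a lone keyword rather than a phrase
                span = tagged_words[start:end]
                if span[0][1] in FUNCTION_POS or span[-1][1] in FUNCTION_POS:
                    continue  # hypothetical "meaningless combination" filter
                phrases.add(" ".join(w for w, _ in span))
    return phrases


# Hypothetical output of the word segmentation program: (word, part of speech).
tagged = [("phrase", "n"), ("index", "n"), ("improves", "v"), ("search", "n"),
          ("engine", "n"), ("results", "n"), ("and", "conj"), ("search", "n"),
          ("engine", "n"), ("speed", "n")]

keywords = mark_keywords(tagged, threshold=1)
phrases = generate_phrases(tagged, keywords)
print(sorted(keywords))                                     # ['engine', 'search']
print(sorted(p for p in phrases if p.startswith("search")))
# ['search engine', 'search engine results', 'search engine speed']
```

The filter drops spans such as "search engine results and", which ends with a conjunction; in a real system the threshold and window sizes would be tuned on the collected corpus.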

Abstract

The invention applies phrase index technology to Internet search engines. It decomposes the sentences in page documents into words, takes keywords as head words and adds several other words in front of and behind each one to compose an index phrase set, and generates index documents of web content with the phrase as the unit. It extracts the content words from the query submitted by the user through a word segmentation procedure and combines them in reasonable and possible ways to obtain a search phrase set. It then matches the phrases in the search phrase set precisely, one by one, against the phrases in the index documents to obtain the search results. Because a phrase expresses semantics better than single words, the search results can reflect the likely intention of the query more precisely.
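The query side of the abstract (extracting content words, combining them into a search phrase set, and matching those phrases exactly against the phrase index) might look like the sketch below. The stop-word list, the choice of order-preserving word combinations as the "reasonable and possible combinations", and scoring documents by the total length of matched phrases are assumptions for illustration; the source does not specify them.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical stop-word list; only the remaining content words are combined.
STOP_WORDS = {"a", "an", "the", "in", "of", "for", "to", "how", "is", "what"}


def build_phrase_index(doc_phrases):
    """Map each index phrase to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, phrases in doc_phrases.items():
        for phrase in phrases:
            index[phrase].add(doc_id)
    return index


def query_phrases(query):
    """Keep the query's content words and combine them, in query order, into phrases."""
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    return [" ".join(combo)
            for n in range(len(words), 1, -1)  # longest combinations first
            for combo in combinations(words, n)]


def search(query, index):
    """Exactly match each candidate phrase against the index and rank documents."""
    scores = defaultdict(int)
    for phrase in query_phrases(query):
        for doc_id in index.get(phrase, ()):
            # Hypothetical scoring: longer matched phrases count for more.
            scores[doc_id] += len(phrase.split())
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical index phrases produced for three pages by Step 1.
doc_phrases = {
    "d1": {"phrase index", "search engine"},
    "d2": {"search engine", "engine parts"},
    "d3": {"price index", "engine parts"},
}
index = build_phrase_index(doc_phrases)
print(search("phrase index in a search engine", index))  # ['d1', 'd2']
```

Page `d3` contains the words "index" and "engine" but none of the query's phrases, so exact phrase matching excludes it. Generating every order-preserving combination grows exponentially with the number of content words, so a practical system would cap the combination length; a query with a single content word produces no phrases here and would need a separate fallback.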

Description

Technical field
[0001] The present invention mainly relates to an innovation in the "text index" mode, which is the underlying key technology of Internet search engines, together with some innovations in the front-end processing required to perfect this technology. The invention applies the theory of phrase indexing to index construction for Internet search engines. Because a phrase index can greatly improve the semantic correlation between the search content and the retrieved content, it can also be regarded as an intelligent retrieval method for Internet search engines.

Background art
[0002] An Internet search engine (hereinafter "search engine") is a tool for searching web pages and websites. The basic principle of current search engines is to collect web page addresses and texts from the Internet automatically through website or web page collection programs, and then hand the collected web page texts over to the indexing and retrieval system, and ...

Application Information

IPC (8th edition): G06F17/30
Inventors: 邓剑波戴云川詹天荣张潘高潮周波张森胡显如
Owner: 新百丽鞋业(深圳)有限公司