Clustering method of WEB objects in search engine

A clustering method for WEB objects in search engines, applied to text retrieval in vertical search engines. It addresses coarse cluster granularity and the resulting loss of user-friendliness, and aims to improve the granularity of clustering results conveniently and quickly while maintaining good performance.

Inactive Publication Date: 2010-05-12
ZHEJIANG UNIV
0 Cites, 17 Cited by

AI Technical Summary

Problems solved by technology

Although most vertical search systems can cluster search results to some extent, the clustering remains far too coarse-grained. Noise in the data further reduces user-friendliness.


Detailed Description of the Embodiments

[0023] The present invention is further described below with reference to the accompanying drawings and embodiments.

[0024] As shown in Figure 1 and Figure 2, the implementation process and working principle of the present invention are as follows:

[0025] 1) According to the specific application environment of the vertical search engine, and to meet the requirement for fine-grained clustering of the WEB objects contained in the search results, choose a representation for WEB objects and a measure of relevance;

[0026] 2) Following the defined modeling method, establish a new method for marking WEB object features;

[0027] 3) Define a scale that measures the importance of lexical information, and define the similarity of WEB objects based on that scale;

[0028] 4) Establish an adaptive record merging model that combines the information distribution model of the vocabulary with the high-similarity associations among WEB objects;

[0029] 5) According t...
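
Steps 3) and 4) above can be illustrated with a minimal Python sketch. It is not the patented model: the IDF-style weight is an assumed stand-in for the lexical importance scale, weighted Jaccard is an assumed similarity, and a fixed `threshold` replaces the adaptive merging criterion, whose details are not given in this excerpt. The function names `word_weights`, `similarity`, and `merge_records` are hypothetical.

```python
import math
from collections import Counter

def word_weights(records):
    """IDF-style importance (assumed stand-in): rarer words weigh more."""
    n = len(records)
    df = Counter(w for rec in records for w in set(rec))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def similarity(a, b, weights):
    """Weighted Jaccard similarity between two token lists."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    shared = sum(weights.get(w, 0.0) for w in a & b)
    total = sum(weights.get(w, 0.0) for w in union)
    return shared / total if total else 0.0

def merge_records(records, threshold=0.5):
    """Greedy merging: a record joins the first cluster in which every
    member is at least `threshold` similar; otherwise it starts a new one.
    Returns clusters as lists of record indices."""
    weights = word_weights(records)
    clusters = []
    for i, rec in enumerate(records):
        for cluster in clusters:
            if all(similarity(rec, records[j], weights) >= threshold
                   for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

records = [
    ["nokia", "n95", "phone"],
    ["nokia", "n95", "mobile", "phone"],
    ["canon", "eos", "camera"],
]
print(merge_records(records))
```

Requiring similarity to every existing member, rather than to any single member, keeps the similarity within each cluster high and makes it harder for one noisy record to chain unrelated records together.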

Abstract

The invention discloses a clustering method for WEB objects in a search engine. By discovering the tree-like probabilistic hierarchy among words, the method iteratively computes how concentrated each word's information distribution is in the dataset and uses this concentration as a marker for identifying objects. It establishes a novel information-transfer directed acyclic graph model to accurately extract the characteristic words that play a key role in identifying objects and to improve the accuracy of similarity calculation. It also establishes a novel adaptive record merging model that keeps the similarity among records within a record cluster high and reduces the influence of noise on the merging process. The invention offers high accuracy, stability, and generality. It makes full use of existing research and practical retrieval systems in the vertical search engine environment, and can conveniently and rapidly improve the granularity of WEB object clustering results without depending on any one specific text search technique. Users can choose the most suitable cluster-merging technique for their application to obtain the best performance.
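
To make the idea of a probabilistic word hierarchy organized as a directed acyclic graph more concrete, here is a small Python sketch. It uses a co-occurrence subsumption heuristic as an assumed stand-in; the patent's actual information-transfer model is not specified in this excerpt. The function name `subsumption_edges` and the parameter `min_cond_prob` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def subsumption_edges(records, min_cond_prob=0.8):
    """Return sorted directed edges (general, specific) between words.

    An edge is added when P(general | specific) >= min_cond_prob and
    `general` occurs in strictly more records than `specific`.
    Every edge points from higher to lower record frequency, so the
    resulting graph cannot contain a cycle.
    """
    docs = [set(rec) for rec in records]
    df = Counter(w for d in docs for w in d)   # record frequency per word
    co = Counter()                             # co-occurrence per word pair
    for d in docs:
        for u, v in combinations(sorted(d), 2):
            co[(u, v)] += 1
    edges = set()
    for (u, v), both in co.items():
        for general, specific in ((u, v), (v, u)):
            if (df[general] > df[specific]
                    and both / df[specific] >= min_cond_prob):
                edges.add((general, specific))
    return sorted(edges)

records = [
    ["phone", "nokia", "n95"],
    ["phone", "nokia", "e71"],
    ["phone", "apple"],
]
for general, specific in subsumption_edges(records):
    print(general, "->", specific)
```

In this toy data, words that appear only alongside a more frequent word (such as a model name under a brand) end up lower in the hierarchy, which is one plausible way to single out the specific words that help distinguish objects.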

Description

Technical field

[0001] The invention relates to text retrieval technology for vertical search engines, and in particular to a clustering method for WEB objects in a search engine.

Background

[0002] With the exponential growth in the scale and complexity of data on the Internet, traditional search engines, which present users with disorganized search results, are increasingly unable to meet people's needs. Fine-grained clustering methods have emerged to solve this problem, and research institutions and large Internet companies now treat them as a research hotspot.

[0003] WEB object-oriented data refers to text data that has undergone simple pre-processing of web pages but has not been annotated with attributes. This type of data describes a large amount of object information, such as products, addresses, and events. Although most vertical search systems can cluster search results to a certain extent, the granularity of this c...

Application Information

IPC(8): G06F17/30
Inventors: 陈珂, 陈刚, 寿黎但, 胡天磊, 盛振华
Owner: ZHEJIANG UNIV