Search engine technology based on relevance feedback and clustering

A search engine technology based on relevance feedback, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses the problems of earlier methods, namely the added burden and difficulty for users and their unsuitability for Web retrieval results, and it optimizes retrieval results while increasing clustering speed.

Status: Inactive; Publication Date: 2010-10-06
NORTH CHINA ELECTRIC POWER UNIV (BAODING)
5 Cites, 39 Cited by

AI Technical Summary

Problems solved by technology

The prior method mainly considers the effectiveness of text clustering, but it requires the user to enter feedback information multiple times, which increases the burden on the user. In particular, the first clustering requires the user to specify example documents belonging to some clusters to guide the clustering process, which makes the method harder to use. The clustering process also takes a long time, so the method is not suitable for clustering Web information retrieval results.

Embodiment Construction

[0031] Step S101: The user selects relevant documents and irrelevant documents from the retrieval results of the search engine;

[0032] Step S102: Determine the number of initial cluster categories and the initial cluster center;

[0033] Assume that the documents in the search result list are d1, d2, ..., ds (s is the number of documents), and that the keywords indexed in the index library of the retrieval system do not include stop words. From the index library, take the keyword weights in documents d1, d2, ..., ds, that is, the frequency with which each keyword appears in the document. The keywords t1, t2, t3, ..., tn (n is the number of keywords) whose weight is greater than the preset threshold δk form the dimensions of the vector space model. The feature vector of document di is then defined as:

[0034] $d_i = (w_{i1}, w_{i2}, \ldots, w_{in})$ (1)

[0035] where $w_{ij} = tf_{ij}$ ($i = 1, 2, \ldots, s$; $j = 1, 2, \ldots, n$), and $tf_{ij}$ is the frequency ...
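
The construction in [0033] to [0035] can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. It assumes that documents are already tokenized, that the frequency tf is a raw occurrence count, and that a keyword becomes a dimension when its count exceeds the threshold in at least one document; the source does not say whether the threshold is applied per document or over the whole result list. The function name `build_feature_vectors` and its parameters are hypothetical.

```python
from collections import Counter


def build_feature_vectors(docs, stop_words, threshold):
    """Build term-frequency feature vectors for a list of tokenized documents.

    docs       -- list of token lists, one per document d_1 .. d_s
    stop_words -- set of tokens excluded from the index (assumed input)
    threshold  -- the preset threshold (delta_k in the text)

    Returns (keywords, vectors), where vectors[i][j] is w_ij = tf_ij.
    """
    # Count keyword occurrences per document, skipping stop words.
    counts = [Counter(t for t in doc if t not in stop_words) for doc in docs]

    # Assumption: a keyword is kept as a dimension if its frequency
    # exceeds the threshold in at least one document.
    keywords = sorted(
        {term for c in counts for term, freq in c.items() if freq > threshold}
    )

    # d_i = (w_i1, ..., w_in), with w_ij the count of keyword t_j in document d_i.
    vectors = [[c[term] for term in keywords] for c in counts]
    return keywords, vectors


docs = [
    ["search", "engine", "search", "the"],
    ["cluster", "search", "cluster", "cluster"],
]
keywords, vectors = build_feature_vectors(docs, stop_words={"the"}, threshold=1)
print(keywords)
print(vectors)
```

Here "engine" is dropped because its count never exceeds the threshold, so each document ends up as a two-dimensional vector over the remaining keywords.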

Abstract

The invention relates to a search engine technology based on relevance feedback and clustering. By using both the user's relevance feedback and the relevance ranking to guide the clustering of retrieval results, the invention ensures that the final partitioning of the retrieval results meets the user's query requirements. During clustering, a large number of documents irrelevant to the user and duplicate web pages are removed, which improves the clustering speed and optimizes the retrieval results at the same time. In the clustering process, a cluster center is not modified by a cluster that is irrelevant to the user, which ensures that result documents relevant to the user are not lost when clustering irrelevant documents introduces noise.
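
One possible reading of the abstract can be sketched as a single feedback-seeded assignment pass. This is only an illustration under stated assumptions: the abstract does not specify the similarity measure or the update rule, so cosine similarity, the two-center setup, and the names `cosine`, `mean`, and `feedback_pass` are all hypothetical. Documents closer to the irrelevant seed are dropped and do not influence the center of the relevant cluster.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def feedback_pass(vectors, relevant_seed, irrelevant_seed):
    """Assign each document to the relevant or irrelevant center.

    Assumption: the seeds are feature vectors of documents the user marked
    as relevant / irrelevant. Only documents kept as relevant update the
    relevant center; the irrelevant center is left unchanged.
    """
    kept, dropped = [], []
    for i, v in enumerate(vectors):
        if cosine(v, relevant_seed) >= cosine(v, irrelevant_seed):
            kept.append(i)
        else:
            dropped.append(i)  # treated as noise and removed from the results

    # Recompute the relevant center from the seed plus the kept documents only.
    rel_center = mean([relevant_seed] + [vectors[i] for i in kept])
    return kept, dropped, rel_center, list(irrelevant_seed)


vectors = [[2, 0], [0, 3], [1, 1]]
kept, dropped, rel_center, irr_center = feedback_pass(vectors, [1, 0], [0, 1])
print(kept, dropped)
```

In this sketch the dropped documents never enter the mean, which is one way to keep noise from irrelevant documents out of the center that represents what the user wants.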

Description

Technical field

[0001] The invention relates to the technical field of Internet information retrieval, in particular to a method for optimizing Web retrieval results based on relevance feedback and clustering.

Background art

[0002] At present, most search engines perform indexing and retrieval based on keywords. Given the list of keywords entered by the user, the search engine searches the index library, then sorts and displays the matching documents according to their degree of relevance to the user query. Because keywords are polysemous, and users often enter only a few keywords, the result list returned by the search engine usually contains many irrelevant documents mixed in with the relevant ones, and users must browse the results one by one to find the relevant documents. The result list also contains many web pages with duplicate content. Browsing such retrieval results costs users a great deal of time and effort. [...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 李新叶
Owner: NORTH CHINA ELECTRIC POWER UNIV (BAODING)