Query expansion method based on semi-supervised clustering

A query expansion technology based on semi-supervised clustering, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problem of unlabeled samples and aims at high efficiency, high accuracy, and improved feedback-document quality.

Inactive Publication Date: 2013-09-25
HARBIN ENG UNIV
3 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a query expansion method based on semi-supervised clustering. By learning from a small number of labeled samples, the method can accurately estimate the categories of a large number of unknown samples, which addresses the problem of unlabeled samples while greatly reducing the manual labor of sorting samples.

Examples

Embodiment Construction

[0039] The implementation process of the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0040] Referring to Figure 1 and Figure 2, the present invention proposes a query expansion method based on semi-supervised clustering. The method includes the following steps:

[0041] Step 1: A query likelihood language model performs initial retrieval on the user query and returns the top n documents of the retrieval results. This step specifically includes the following sub-steps:

[0042] Step 1.1: Preprocess the entire document set (for example, stop-word removal and stemming), then construct a text database based on the vector space model, together with a total feature lexicon covering the entire document set.
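
Step 1.1 can be illustrated with a minimal sketch. The stop-word list, the crude suffix-stripping stemmer, and the use of raw term counts as vector weights are all assumptions made for illustration; the source does not specify them, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "are", "in", "to", "for"}

def crude_stem(word):
    """Rough suffix stripping, standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def build_text_database(documents):
    """Return (lexicon, vectors): the sorted feature lexicon of the whole
    document set and one term-count vector per document over that lexicon."""
    processed = [preprocess(d) for d in documents]
    lexicon = sorted({t for doc in processed for t in doc})
    index = {t: i for i, t in enumerate(lexicon)}
    vectors = []
    for doc in processed:
        vec = [0] * len(lexicon)
        for term, count in Counter(doc).items():
            vec[index[term]] = count
        vectors.append(vec)
    return lexicon, vectors

lexicon, vectors = build_text_database(
    ["The cats are running", "Clustering of documents"]
)
```

The crude stemmer over-strips in places (for example, "running" becomes "runn"), which is acceptable for a sketch because documents and queries are stemmed the same way.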

[0043] Step 1.2: Preprocess the input query in the same way (stop-word removal, stemming, etc.); the remaining words constitute the vector form Q of the query.
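
As a small illustration of Step 1.2, the sketch below maps already-preprocessed query words onto a feature lexicon to obtain Q. Using term counts as weights and dropping words that are absent from the lexicon are assumptions; the function name `query_vector` and the sample lexicon are hypothetical.

```python
from collections import Counter

def query_vector(query_terms, lexicon):
    """Build the query vector Q over the lexicon from preprocessed query terms.
    Assumption: terms not present in the lexicon are silently dropped."""
    index = {term: i for i, term in enumerate(lexicon)}
    q = [0] * len(lexicon)
    for term, count in Counter(query_terms).items():
        if term in index:
            q[index[term]] = count
    return q

# Hypothetical stemmed lexicon and query terms.
lexicon = ["cluster", "document", "expans", "queri", "retriev"]
Q = query_vector(["queri", "expans", "queri", "unseen"], lexicon)
```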

[0044] Step 1.3: Sort the document...
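
Step 1 names query-likelihood retrieval, but the text of Step 1.3 is cut off in this excerpt, so the exact scoring formula is not recoverable. The sketch below therefore assumes the common Dirichlet-smoothed form, log P(Q|D) = Σ log((tf(t, D) + μ·P(t|C)) / (|D| + μ)), with μ = 2000 as an arbitrary illustrative value; the function name and parameters are hypothetical.

```python
import math
from collections import Counter

def query_likelihood_rank(query_terms, docs, n, mu=2000.0):
    """Rank docs (lists of preprocessed terms) by Dirichlet-smoothed
    log P(Q|D) and return the indices of the top n documents."""
    collection = Counter(t for d in docs for t in d)
    total = sum(collection.values())
    scored = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            p_coll = collection[t] / total
            if p_coll == 0.0:
                continue  # term unseen in the whole collection: skip it
            score += math.log((tf[t] + mu * p_coll) / (len(doc) + mu))
        scored.append((score, i))
    # Highest score first; ties broken by original document order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:n]]

docs = [
    ["queri", "expans", "cluster"],
    ["yarn", "fabric", "knit"],
    ["queri", "retriev", "document"],
]
top = query_likelihood_rank(["queri", "expans"], docs, n=2)
```

Here the first document contains both query terms and the third contains one, so they are returned ahead of the unrelated second document.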

Abstract

The invention provides a query expansion method based on semi-supervised clustering. The method includes the following steps: (1) a query likelihood language model performs initial retrieval on the user query and returns the top n documents of the retrieval results; (2) the top k documents of the initial retrieval results are manually annotated and divided into a relevant document set and an irrelevant document set; (3) a semi-supervised clustering algorithm that integrates constraints and distance analyzes the top n documents and extracts the documents relevant to the query as feedback documents; (4) an expansion word selection module selects expansion words from the feedback documents, and the expansion words are combined with the original query to form a new query. By learning the relevance between a small number of annotated documents and the query, the method can accurately estimate the relevance between a large number of unknown documents and the query. This improves the quality of the feedback documents and, in turn, effectively improves the recall and precision of retrieval.
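
Steps (2) to (4) can be sketched as follows. The patent's clustering algorithm, which integrates constraints and distance, is not detailed in this excerpt, so the sketch substitutes a simpler seeded two-cluster k-means in which annotated documents stay fixed in their cluster. Scoring expansion words by raw frequency in the feedback documents is likewise an assumption, and all function names are hypothetical.

```python
from collections import Counter

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def seeded_two_means(vectors, labels, iters=10):
    """labels[i] is 1 (relevant), 0 (irrelevant) or None (unlabeled).
    Labeled documents keep their cluster; unlabeled ones join the nearest
    centroid. Assumes at least one labeled document in each class."""
    assign = list(labels)
    centroids = {c: mean([v for v, l in zip(vectors, labels) if l == c])
                 for c in (0, 1)}
    for _ in range(iters):
        for i, v in enumerate(vectors):
            if labels[i] is None:
                assign[i] = min((0, 1), key=lambda c: sq_dist(v, centroids[c]))
        centroids = {c: mean([v for v, a in zip(vectors, assign) if a == c])
                     for c in (0, 1)}
    return assign

def select_expansion_terms(feedback_docs, query_terms, m):
    """Pick the m most frequent non-query terms in the feedback documents."""
    counts = Counter(t for d in feedback_docs for t in d if t not in query_terms)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:m]]

# Top n = 4 documents from initial retrieval (already preprocessed);
# the top k = 2 are manually annotated, the rest are unlabeled.
docs = [
    ["queri", "expans", "feedback"],             # annotated relevant
    ["yarn", "fabric"],                          # annotated irrelevant
    ["queri", "expans", "cluster", "feedback"],  # unlabeled
    ["fabric", "knit"],                          # unlabeled
]
labels = [1, 0, None, None]
lexicon = sorted({t for d in docs for t in d})
vectors = [[d.count(t) for t in lexicon] for d in docs]

assign = seeded_two_means(vectors, labels)
feedback = [d for d, a in zip(docs, assign) if a == 1]
query = ["queri"]
new_query = query + select_expansion_terms(feedback, query, m=2)
```

In this toy run the third document joins the relevant cluster and the fourth joins the irrelevant one, so expansion words are drawn only from the first and third documents.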

Description

Technical field

[0001] The invention relates to a query expansion method based on semi-supervised clustering.

Background technique

[0002] With the development of information technology and the growth in the volume of information, information retrieval is becoming more and more important in work and life. Searching lets people quickly find the information they need. However, because people often do not know much about the information they need, the queries users submit are often too short to describe that information fully and accurately. The primary problem to be solved in the retrieval field today is to return as much relevant information as possible to the user while minimizing information that is irrelevant or only weakly relevant to the query. Query expansion is an effective technical means of solving this problem. Query expansion solves the problem of mismatch between query words and document words in the retriev...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 杨静, 刘宁, 张健沛
Owner: HARBIN ENG UNIV