Information searching method for discovering and clustering sub-topics of query statement

Query-statement and subtopic technology, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses three problems in existing methods: they do not mine the inclusion relationships between topics, do not fully consider vocabulary mismatch, and do not fully meet user needs.

Active Publication Date: 2012-04-18
INST OF SOFTWARE - CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0005] No existing method has been found that uses query logs as a source for mining query subtopics, and existing methods do not fully account for vocabulary mismatch and vocabulary over-matching when calculating the similarity between query sentences.
In addition, existing clustering methods are based on lexical similarity: they do not mine the inclusion relationships between topics, which makes it difficult to build a tree-like hierarchy of topics.
As a result, these clustering methods fall short when clustering query subtopics and cannot fully meet user needs.

Embodiment Construction

[0026] The present invention will be described in detail below through embodiments and in conjunction with the accompanying drawings.

[0027] Figure 1 is the flow chart of the information search method for mining and clustering query subtopics according to this embodiment. Each step is described as follows:

[0028] 1) Segment the original query statement and the historical query statements into words:

[0029] a) Let the original query sentence be Q. Segment it to obtain a query word sequence q_1 q_2 ... q_n, where q_i (i∈[1, n]) is a single query word;

[0030] b) Let the set of all historical query statements in the query log be P = {P_1, P_2, ..., P_k}. Segment each historical query statement P_i to obtain a query word sequence p_i1 p_i2 ... p_im, where p_ij (j∈[1, m]) is a single query word; these query word sequences (still denoted P_i) serve as candidate subtopics. The query log is a series of user behaviors recorded by the search service p...
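
Step 1) can be illustrated with a minimal sketch. It is not the segmenter used by the invention: whitespace splitting stands in for word segmentation (a Chinese-language system would need a real word segmenter), and the function names `segment` and `build_candidates` are hypothetical.

```python
from typing import Dict, List


def segment(query: str) -> List[str]:
    """Split a query into a word sequence.

    Assumption: words are whitespace-delimited and compared case-insensitively.
    """
    return query.lower().split()


def build_candidates(history: List[str]) -> Dict[str, List[str]]:
    """Segment each historical query P_i.

    The resulting word sequences serve as candidate subtopics.
    """
    candidates: Dict[str, List[str]] = {}
    for p in history:
        words = segment(p)
        if words:  # skip empty log entries
            candidates[p] = words
    return candidates


# Original query Q and a toy query log (made-up data for illustration only).
original = segment("jaguar price")
candidates = build_candidates(["jaguar car price", "Jaguar animal habitat", ""])
```

Keeping the raw historical query as the dictionary key lets later steps look up click information for the same log entry.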

Abstract

The invention provides an information searching method for discovering the sub-topics of a query statement and clustering them. The original query statement and each historical query statement are segmented into query word sequences, and the similarity between the original query statement and each historical query statement is calculated. The original query can also be expanded through a semantic dictionary: the similarity between the expanded query statement and the historical query statement is calculated and used to correct the similarity between the historical query statement and the original query statement. That similarity is further corrected according to the click information of the historical query statement. Final sub-topics are then selected according to a preset similarity threshold and clustered, and a tree-shaped hierarchical structure is constructed for them. A user obtains retrieval results at different classification granularities by selecting different leaf nodes of the tree-shaped hierarchical structure, which makes it convenient to browse retrieval results by topic category.
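
The pipeline in the abstract can be sketched end to end under explicit assumptions. The text available here does not give the similarity formula, the correction rules, or the clustering algorithm, so everything below is a stand-in: Jaccard overlap as the similarity, a small hypothetical `SYNONYMS` table as the semantic dictionary, hypothetical weights `alpha` and `beta` for the two corrections, and word-set inclusion as the topic inclusion relationship.

```python
from typing import Dict, List, Optional, Set

# Hypothetical stand-in for the semantic dictionary.
SYNONYMS: Dict[str, Set[str]] = {"price": {"cost"}, "car": {"automobile"}}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity (assumed; the source does not name a formula)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand(words: Set[str]) -> Set[str]:
    """Add dictionary synonyms to the query words."""
    out = set(words)
    for w in words:
        out |= SYNONYMS.get(w, set())
    return out


def score(query: Set[str], hist: Set[str], clicks: int, max_clicks: int,
          alpha: float = 0.5, beta: float = 0.2) -> float:
    """Base similarity, corrected by the expanded query, then by click counts."""
    base = jaccard(query, hist)
    expanded = jaccard(expand(query), hist)
    # Assumed correction: credit part of the gain that expansion brings.
    sim = base + alpha * max(0.0, expanded - base)
    click_share = clicks / max_clicks if max_clicks else 0.0
    # Assumed correction: boost frequently clicked queries, capped at 1.
    return min(1.0, sim * (1.0 + beta * click_share))


def select_subtopics(query: str, log: Dict[str, int],
                     threshold: float = 0.3) -> List[str]:
    """Keep historical queries whose corrected similarity reaches the threshold."""
    q = set(query.split())
    max_clicks = max(log.values(), default=0)
    return [h for h, c in log.items()
            if score(q, set(h.split()), c, max_clicks) >= threshold]


def build_hierarchy(subtopics: List[str]) -> Dict[str, Optional[str]]:
    """Map each subtopic to a parent: the subtopic whose words form the
    largest proper subset of its own words, or None for a top-level topic."""
    sets = {s: set(s.split()) for s in subtopics}
    parents: Dict[str, Optional[str]] = {}
    for s, ws in sets.items():
        best: Optional[str] = None
        for t, wt in sets.items():
            if wt < ws and (best is None or len(wt) > len(sets[best])):
                best = t
        parents[s] = best
    return parents


# Toy query log: historical query -> click count (made-up data).
log = {"jaguar cost": 10, "jaguar car price": 5,
       "jaguar car price used": 2, "python tutorial": 8}
selected = select_subtopics("jaguar price", log)
hierarchy = build_hierarchy(selected)
```

In this toy run, "jaguar cost" shares only one word with the query but is lifted above the threshold by the synonym expansion and its click count, which is the vocabulary-mismatch case the method targets. "jaguar car price used" ends up as a child of "jaguar car price" in the hierarchy.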

Description

Technical field

[0001] The invention belongs to the technical field of computer information retrieval, and relates to an information search method for mining the subtopics of user query sentences and clustering those subtopics.

Background technique

[0002] Mining the subtopics of query sentences, clustering them, and building a tree-like hierarchy based on topic inclusion relationships can give users more accurate query expansion and query suggestions, and allows retrieval results to be classified and displayed according to the topics to which the documents belong. At present, research on mining query subtopics is very limited. One method extracts key phrases from the result documents returned by search engines and uses data mining algorithms to find candidate subtopics (Reference: E. Uluhan and B. Badur. Development of a Framework for Sub-topic Discovery from the Web. In Proceedings of PICMET 2008).

[0003] When calculating the similarity between qu...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 孙乐, 江雪
Owner: INST OF SOFTWARE - CHINESE ACAD OF SCI