Semi-supervised text clustering method and device fusing pairwise constraints and keywords

A text clustering and keyword technology, applied in the field of text clustering, that addresses the low accuracy and stability of clustering results and yields more accurate clustering results.

Inactive Publication Date: 2012-02-08
QINGDAO TECHNOLOGICAL UNIVERSITY
0 Cites, 16 Cited by

AI Technical Summary

Problems solved by technology

Although semi-supervised text clustering methods based on only one kind of constraint information can effectively improve clustering quality, they do not comprehensively consider fusing the two different types of information during processing, so the accuracy and stability of the clustering results are low.




Embodiment Construction

[0015] In order to express the object, technical solution and advantages of the present invention more clearly, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0016] The invention provides a semi-supervised text clustering method. The method first fuses instance layer information, in the form of pairwise constraints, to assist in partitioning the text data set, and learns initial feature word weights. It then adds attribute layer information, in the form of keywords, so that the two different types of prior information are effectively fused for text clustering. Finally, it evaluates the clustering quality of these two steps according to user satisfaction and selects the high-quality text partitions as the final clustering result.
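The first two steps can be illustrated with a minimal sketch. It is not the patented algorithm, whose formulas are not given in this excerpt. It assumes a weighted squared Euclidean distance over feature words, a fixed penalty for each violated pairwise constraint, and a simple multiplicative boost for keyword weights. The names `weighted_sq_dist`, `boost_keywords`, `assign`, `penalty` and `factor` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pair = Tuple[int, int]


def weighted_sq_dist(x: Sequence[float], c: Sequence[float], w: Sequence[float]) -> float:
    """Squared Euclidean distance with one weight per feature word."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def boost_keywords(weights: Sequence[float], keyword_idx: Sequence[int],
                   factor: float = 2.0) -> List[float]:
    """Assumed keyword step: scale keyword weights, then renormalise to sum 1."""
    keys = set(keyword_idx)
    boosted = [wi * factor if i in keys else wi for i, wi in enumerate(weights)]
    total = sum(boosted)
    return [b / total for b in boosted]


def partner(pair: Pair, i: int) -> Optional[int]:
    """Return the other document in the pair if document i is part of it."""
    a, b = pair
    if a == i:
        return b
    if b == i:
        return a
    return None


def assign(docs: Sequence[Sequence[float]], centroids: Sequence[Sequence[float]],
           weights: Sequence[float], must_link: Sequence[Pair] = (),
           cannot_link: Sequence[Pair] = (), penalty: float = 1.0) -> List[int]:
    """Greedy assignment: weighted distance plus a penalty per violated constraint."""
    labels: Dict[int, int] = {}
    for i, x in enumerate(docs):
        costs = []
        for k, c in enumerate(centroids):
            cost = weighted_sq_dist(x, c, weights)
            for pair in must_link:
                j = partner(pair, i)
                # Must-link is violated if the partner already sits elsewhere.
                if j in labels and labels[j] != k:
                    cost += penalty
            for pair in cannot_link:
                j = partner(pair, i)
                # Cannot-link is violated if the partner already sits here.
                if j in labels and labels[j] == k:
                    cost += penalty
            costs.append(cost)
        labels[i] = costs.index(min(costs))
    return [labels[i] for i in range(len(docs))]
```

In this sketch, `assign` would be called once with the initial weights and again with the output of `boost_keywords`, so that the keyword information reshapes the distance used when the constraints are applied.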

[0017] Before text clustering, the text data set needs to be preprocessed to convert the text data set into a form that can be processed by the cluste...
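The paragraph above is cut off in the source, so the exact preprocessing is not known. As an illustration only, the sketch below assumes a common choice: tokenise, drop stop words, and build TF-IDF vectors over the feature words. The stop-word list and the function names `tokenize` and `tfidf` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Sequence, Tuple

# Illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}


def tokenize(text: str) -> List[str]:
    """Lowercase, keep alphabetic runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf(corpus: Sequence[str]) -> Tuple[List[str], List[List[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    counts = [Counter(tokenize(text)) for text in corpus]
    vocab = sorted({word for c in counts for word in c})
    n_docs = len(counts)
    # Document frequency: number of documents containing each word.
    df = {word: sum(1 for c in counts if word in c) for word in vocab}
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # avoid division by zero for empty docs
        vectors.append([(c[word] / total) * math.log(n_docs / df[word])
                        for word in vocab])
    return vocab, vectors
```

Each column of the resulting vectors corresponds to one feature word, which is the unit the method assigns weights to.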


Abstract

The invention discloses a semi-supervised text clustering method and device fusing pairwise constraints and keywords. The method comprises the following steps: fusing pairwise constraints to assist text clustering and obtain initial feature word weights; fusing the pairwise constraints and the keywords together in semi-supervised clustering, based on the obtained initial feature word weights; and evaluating and selecting a clustering result according to a user satisfaction degree. The device provided by the invention comprises a pre-processing module, a text clustering module fusing pairwise constraints, a semi-supervised text clustering module fusing pairwise constraints and keywords, and a result evaluation and selection module. Because the method adds keyword information on top of the fused pairwise constraint information, the keyword information adjusts the corresponding feature word weights while the pairwise constraints are used to learn those weights. The two kinds of prior information can therefore influence and reinforce each other, yielding a more accurate clustering result.
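The evaluation and selection step can be sketched as follows. The source does not say how the user satisfaction degree is computed, so this example substitutes a stand-in score: the fraction of pairwise constraints that a clustering satisfies. The names `constraint_satisfaction` and `select_result` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def constraint_satisfaction(labels: Sequence[int], must_link: Sequence[Pair],
                            cannot_link: Sequence[Pair]) -> float:
    """Fraction of constraints satisfied; an assumed proxy for user satisfaction."""
    total = len(must_link) + len(cannot_link)
    if total == 0:
        return 1.0
    satisfied = sum(labels[a] == labels[b] for a, b in must_link)
    satisfied += sum(labels[a] != labels[b] for a, b in cannot_link)
    return satisfied / total


def select_result(candidates: Dict[str, List[int]], must_link: Sequence[Pair],
                  cannot_link: Sequence[Pair]) -> Tuple[str, List[int]]:
    """Return the (name, labels) candidate with the highest score; ties keep the first."""
    return max(
        candidates.items(),
        key=lambda item: constraint_satisfaction(item[1], must_link, cannot_link),
    )
```

Here `candidates` would hold the labelings produced by the constraint-only step and by the constraint-plus-keyword step, keyed by a name of the caller's choosing.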

Description

Technical field

[0001] The present invention relates to a text clustering method, in particular to a semi-supervised text clustering method and device fusing pairwise constraints and keywords.

Background technique

[0002] Traditional text clustering usually adopts an unsupervised learning mechanism, which automatically groups texts with similar topics together and separates texts with different topics. However, the performance of such text clustering methods is often unsatisfactory. There are many reasons for this, such as the inability to interact with the user and partitioning results that are difficult to understand.

[0003] In recent years, many researchers have adopted semi-supervised learning strategies that integrate prior information to assist text clustering, which has effectively improved the quality of clustering and the understandability of the partitioning results. Prior information mainly includes instance layer information and attribute layer information. Inst...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
Inventor: 王金龙, 吴舜尧, 李刚
Owner: QINGDAO TECHNOLOGICAL UNIVERSITY