Text clustering method

A text clustering method, applied in the field of semantic analysis, that addresses the problems of ignoring the semantic relationships between feature words, increased time complexity, and considering only weight values. It reduces the dimensionality of the vector space, improves the clustering effect, and makes the clustering more reasonable.

Inactive Publication Date: 2016-03-30
BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD +1

AI Technical Summary

Problems solved by technology

[0008] Due to the large amount of text and feature words, directly using hierarchical collaborative clustering will increase time complexity and



Embodiment Construction

[0020] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0021] Cluster analysis classifies objects according to the intrinsic relationships between them, dividing them into groups known as clusters. The result of clustering should make objects in the same cluster as similar as possible and objects in different clusters as dissimilar as possible. Commonly used cluster analysis algorithms include hierarchical clustering, collaborative clustering, and semi-supervised clustering, which are described below.

[0022] A hierarchical clustering algorithm builds a tree-like hierarchy by decomposing the data set. Such algorithms can be divided into divisive (top-down) algorithms and agglomerative (bottom-up...
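To make the bottom-up variant concrete, here is a minimal sketch of agglomerative clustering. The single-linkage merge criterion, the Euclidean distance, and the function name `agglomerative` are illustrative assumptions; the text above does not say which linkage or distance the invention uses.

```python
import math

def agglomerative(points, n_clusters):
    """Bottom-up clustering: start with one cluster per point and repeatedly
    merge the two closest clusters until n_clusters remain.
    Single linkage and Euclidean distance are assumptions of this sketch."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest members of a and b.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the indices of the closest pair of clusters.
        _, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(agglomerative(points, 2))
```

A divisive algorithm runs the same idea in reverse, starting from one cluster that holds every point and splitting it step by step.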


Abstract

The present invention discloses a text clustering method. The method comprises: finding pairwise constraint instances among frequent vocabularies; extracting a frequent vocabulary set from the feature word with the largest weight in each document, so as to obtain a positive constraint set and a negative constraint set; expanding the constraint sets according to a K-nearest-neighbor set; and performing clustering according to the division result of the constraint sets. In the method of the present invention, a semi-supervised clustering algorithm is added for clustering the feature words, so that the dimensionality of the vector space is reduced, experimental efficiency is improved, and feature word clustering becomes more reasonable and reliable under the guidance of a small amount of supervision information. In addition, hierarchical collaborative clustering is used to cluster the texts and the feature words, so that the clustering effect is improved.
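The abstract does not state the rule used to expand the constraint sets, so the following is only one plausible reading, not the claimed method. It assumes that the positive and negative constraints are must-link and cannot-link pairs of feature words, that similarity is cosine similarity between word vectors, and that a nearest neighbour of one end of a positive pair may be linked to the other end. The names `cosine`, `knn_set`, and `expand_positive` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def knn_set(word, vectors, k):
    """The k words most similar to `word` by cosine similarity."""
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)
    return set(others[:k])

def expand_positive(positive, negative, vectors, k):
    """Hypothetical expansion rule (an assumption, not from the patent):
    each k-nearest neighbour of one end of a positive pair is also paired
    with the other end, unless a negative pair forbids it."""
    expanded = {frozenset(p) for p in positive}
    forbidden = {frozenset(p) for p in negative}
    for a, b in positive:
        for src, dst in ((a, b), (b, a)):
            for n in knn_set(src, vectors, k):
                pair = frozenset((n, dst))
                if n != dst and pair not in forbidden:
                    expanded.add(pair)
    return expanded

# Toy word vectors, invented for illustration only.
vectors = {
    "phone": (1.0, 0.0),
    "mobile": (0.9, 0.1),
    "handset": (0.8, 0.2),
    "fruit": (0.0, 1.0),
}
pairs = expand_positive([("phone", "mobile")], [("phone", "fruit")], vectors, k=2)
print(sorted(sorted(p) for p in pairs))
```

Storing each pair as a `frozenset` keeps the constraints unordered, so `("phone", "mobile")` and `("mobile", "phone")` count as the same constraint.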

Description

Technical field

[0001] The present invention relates to the technical field of semantic analysis, and more specifically to a text clustering method.

Background

[0002] In today's information age, the volume of online text is massive. To extract useful information or identify current hot topics from the large body of retrieved texts, the texts must be clustered so that texts in the same cluster are as similar as possible and texts in different clusters are as dissimilar as possible.

[0003] In text clustering, feature words are often used to express the characteristics of a text, and the most commonly used model is the vector space model. In the vector space model, each text is represented by a vector, and each value in the vector represents the weight of a feature word in that text. The text vector space model is a matrix model: the rows of the matrix represent the texts,...
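As a rough illustration of the vector space model described in paragraph [0003], the sketch below builds a text-by-feature-word weight matrix. The TF-IDF weighting, the whitespace tokenization, and the function name `build_vsm` are assumptions made for the example; the source only says that each value is the weight of a feature word in a text.

```python
import math
from collections import Counter

def build_vsm(docs):
    """Return (vocab, matrix), where matrix[i][j] is the weight of
    vocab[j] in docs[i]. Rows are texts and columns are feature words.
    TF-IDF is used as the weight here, which is an assumption."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: the number of texts that contain each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    matrix = []
    for toks in tokenized:
        tf = Counter(toks)
        row = [
            (tf[w] / len(toks)) * math.log(n_docs / df[w]) if toks else 0.0
            for w in vocab
        ]
        matrix.append(row)
    return vocab, matrix

vocab, matrix = build_vsm(["apple banana apple", "banana cherry"])
print(vocab)
for row in matrix:
    print([round(x, 3) for x in row])
```

Because the matrix has one column per distinct feature word, its width grows with the vocabulary, which is the dimensionality problem that clustering the feature words is meant to reduce.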


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/35, G06F16/355, G06F40/30
Inventor: 黄菲菲
Owner: BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD