Short text clustering method based on deep semantic feature learning

A semantic feature learning and clustering technology, applied to text database clustering/classification, semantic analysis, unstructured text data retrieval, and similar tasks, that addresses high model complexity while achieving good clustering performance with a simple design.

Active Publication Date: 2015-09-16
INST OF AUTOMATION CHINESE ACAD OF SCI
4 Cites, 84 Cited by

AI Technical Summary

Problems solved by technology

However, the recursive neural network needs to build an additional syntax tree and has high complexity, and the text se



Examples


Embodiment Construction

[0054] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0055] The general idea of the present invention is as follows. The original features are reduced in dimensionality by a feature dimensionality reduction method under the constraint of local information preservation, and the resulting low-dimensional real-valued vectors are binarized. The binarized features serve as the supervision information for the convolutional neural network structure, which is trained by error backpropagation. The trained convolutional neural network structure then maps the short text collection to deep semantic feature representations, and the K-means clustering method clusters the short texts on those representations.
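The binarization and clustering steps of paragraph [0055] can be illustrated with a minimal sketch. It is not the patented implementation: the per-dimension median threshold, the farthest-first center initialization, and the names `binarize`, `sq_dist`, and `kmeans` are assumptions made for illustration, and the convolutional network that sits between the two steps is omitted.

```python
from statistics import median


def binarize(vectors):
    """Turn low-dimensional real-valued vectors into 0/1 codes.

    Assumption: each dimension is thresholded at its median over the
    collection; the excerpt does not state the threshold rule.
    """
    dims = len(vectors[0])
    thresholds = [median(v[d] for v in vectors) for d in range(dims)]
    return [[1 if v[d] > thresholds[d] else 0 for d in range(dims)]
            for v in vectors]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain K-means over feature vectors; returns one label per point.

    Assumption: centers start from the first point and are extended
    farthest-first, which keeps this sketch deterministic.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points,
                       key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members)
                              for col in zip(*members)]
    return labels


# Hypothetical toy data: four texts reduced to two dimensions.
low_dim = [[0.1, 2.0], [0.2, 1.5], [0.9, -1.0], [0.8, -2.0]]
codes = binarize(low_dim)  # [[0, 1], [0, 1], [1, 0], [1, 0]]

# Stand-in for the deep semantic features the trained network would output.
deep_features = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
labels = kmeans(deep_features, k=2)
```

In the method described above, `codes` would be the training targets for the convolutional network, and K-means would run on the network's learned features rather than on the hand-written `deep_features` used here.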

[0056] The short text clustering ...


Abstract

The invention discloses a short text clustering method based on deep semantic feature learning. In the method, a traditional feature dimensionality reduction technique produces a reduced-dimension representation of the original features under the constraint of local information preservation; the resulting low-dimensional real-valued vectors are binarized, and the binarized vectors serve as the supervision information of a convolutional neural network structure, which is trained by error backpropagation. Word vectors are trained without supervision on an external large-scale corpus; every word in a text is represented as a vector, in word order, and these vectors form the initial input features from which the convolutional neural network structure learns the implicit semantic features of the text. Once the deep semantic feature representation is obtained, a traditional K-means algorithm clusters the texts. The method requires no additional natural language processing or other specialized knowledge and is simple to design; the deep semantic features it learns are unbiased, and good clustering performance can be achieved more effectively.
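The input stage described in the abstract, word vectors stacked in word order and fed to a convolutional structure, might look like the following sketch. The toy embedding table, the zero vector for unknown words, the single hand-written filter, and the names `text_to_matrix` and `conv_max_pool` are all hypothetical; a real system would use word vectors pretrained on a large corpus and learn many filters by backpropagation.

```python
def text_to_matrix(tokens, embeddings, dim):
    """Stack word vectors in word order, one row per word.

    Assumption: out-of-vocabulary words map to a zero vector.
    """
    zero = [0.0] * dim
    return [embeddings.get(tok, zero) for tok in tokens]


def conv_max_pool(matrix, filt):
    """Apply one convolution filter over consecutive word windows,
    then keep the maximum response (max-over-time pooling).

    `filt` has one row per word in the window. Texts shorter than
    the window are padded with zero rows.
    """
    width = len(filt)
    dim = len(filt[0])
    padded = matrix + [[0.0] * dim] * max(0, width - len(matrix))
    scores = []
    for i in range(len(padded) - width + 1):
        window = padded[i:i + width]
        scores.append(sum(w * x
                          for f_row, x_row in zip(filt, window)
                          for w, x in zip(f_row, x_row)))
    return max(scores)


# Hypothetical 2-dimensional embeddings standing in for pretrained vectors.
toy_embeddings = {
    "deep": [1.0, 0.0],
    "text": [0.0, 1.0],
    "cluster": [1.0, 1.0],
}
matrix = text_to_matrix(["deep", "text", "cluster"], toy_embeddings, dim=2)
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]  # window of two words
feature = conv_max_pool(matrix, bigram_filter)
```

Each filter yields one pooled value per text, so a bank of filters would give a fixed-length feature vector regardless of text length, which is what lets texts of different lengths be compared and clustered.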

Description

Technical field

[0001] The invention relates to the field of vectorized representation of text features, and in particular to a short text clustering method based on deep semantic feature learning.

Background art

[0002] With the widespread popularity of social media, short text clustering has become an increasingly important task, and its main challenge lies in the sparsity of text representation. To overcome this difficulty, some researchers have tried to enrich and expand short text data using Wikipedia or ontology databases. However, these methods require considerable natural language processing knowledge and still use high-dimensional feature representations, which wastes storage and computing time. Other researchers have tried to develop complex models for clustering short text data. How to design an effective model remains an open problem, however, and most previous methods are latent-layer models built on bag-of-words features.

[0003] With the rise of de...


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/35, G06F40/30
Inventors: 徐博 (Xu Bo), 许家铭 (Xu Jiaming), 郝红卫 (Hao Hongwei), 田冠华 (Tian Guanhua), 王方圆 (Wang Fangyuan)
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI