Short text classification method based on convolutional neural network

A short text classification method based on a convolutional neural network, applied in the field of text mining and deep learning. It addresses the insufficient context information of short texts, their limited word co-occurrence information, data sparsity, and semantic sensitivity. By alleviating the data sparsity and semantic sensitivity problems, it improves classification performance.

Active Publication Date: 2015-08-12
INST OF AUTOMATION CHINESE ACAD OF SCI
5 Cites, 156 Cited by

AI Technical Summary

Problems solved by technology

[0008] 1. Short texts carry insufficient context information, and the same keyword can express different semantics in different contexts. As a result, semantic feature vector representations of short texts suffer from data sparsity and semantic sensitivity.
[0009] 2. Short text expansion based on topic models can alleviate data sparsity to some extent, but training the topic model relies on a large-scale external auxiliary corpus, and that corpus must remain semantically consistent with the short text data to be expanded. Collecting such a corpus is therefore time-consuming and labor-intensive.
[0010] 3. Expanding short text content with search engines has high time complexity and is difficult to adapt to massive or online data.
[0011] 4. Deep-learning-based short text modeling methods use only limited context for semantic composition. Because short texts contain limited word co-occurrence information, these methods cannot effectively resolve semantic sensitivity.
[0012] These problems in the vectorized representation of short text semantic features all hinder, to some degree, obtaining an accurate feature representation of short texts, which in turn degrades classification performance.
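
The data sparsity and semantic sensitivity problems listed above can be illustrated with a minimal sketch. It is not part of the patent: the function name `bow_cosine` and the example sentences are hypothetical, and a plain bag-of-words cosine similarity stands in for a word-overlap representation of short texts.

```python
import math
from collections import Counter


def bow_cosine(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Data sparsity: related meaning, but no shared words, so no overlap signal.
sparse_score = bow_cosine("cheap flights to paris", "low cost airfare france")

# Semantic sensitivity: a shared keyword used in two unrelated senses
# still produces a nonzero similarity.
ambiguous_score = bow_cosine("apple releases new phone", "apple pie recipe")

print(sparse_score, ambiguous_score)
```

With so few words per text, overlap-based similarity either finds nothing to compare or is dominated by a single ambiguous keyword, which is the motivation for expanding short texts with additional semantic information.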


Embodiment Construction

[0025] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0026] The present invention proposes a short text classification method based on a convolutional neural network. Specifically, the short text is semantically expanded using pre-trained word representation vectors, which addresses the insufficient context information of short texts. A convolutional neural network then extracts a fixed-length semantic feature vector from the expanded short text, fully mining the semantic information between its words. The basic features of the present invention comprise the following six aspects: the first is to use the pre-trained semantic vector representations of words to initialize the vocabulary of the convolutional ...
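
The semantic expansion step can be sketched as follows, using the nearest-neighbor selection described in the Abstract (Euclidean distance with a preset threshold). This is a minimal illustration under stated assumptions, not the patented implementation: the function names `candidate_units` and `expand`, the toy two-dimensional vocabulary, the window sizes, and the threshold value are all hypothetical, and composing a multi-word unit by summing its word vectors is an assumption, since the excerpt does not say how units are composed.

```python
import math


def candidate_units(word_vectors, window_sizes=(1, 2)):
    """Build multi-scale candidate semantic units.

    Assumption: a unit is the element-wise sum of the word vectors
    inside a sliding window; one pass is made per window width.
    """
    units = []
    for width in window_sizes:
        for start in range(len(word_vectors) - width + 1):
            window = word_vectors[start:start + width]
            units.append([sum(column) for column in zip(*window)])
    return units


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def expand(units, vocabulary, threshold):
    """Keep each unit's nearest vocabulary word if it is within the threshold."""
    expansion = []
    for unit in units:
        word, vector = min(vocabulary.items(),
                           key=lambda item: euclidean(unit, item[1]))
        if euclidean(unit, vector) <= threshold:
            expansion.append((word, vector))
    return expansion


# Toy pre-trained word vectors (hypothetical values).
vocabulary = {
    "cat": [1.0, 0.0],
    "dog": [0.75, 0.25],
    "car": [0.0, 1.0],
    "pets": [1.75, 0.25],
}
short_text = ["cat", "dog", "car"]

units = candidate_units([vocabulary[word] for word in short_text])
expansion = expand(units, vocabulary, threshold=0.5)
expansion_words = [word for word, _ in expansion]
print(expansion_words)
```

In this toy setup the two-word unit "cat dog" lands next to "pets" and contributes it to the expansion, while the unit "dog car" has no vocabulary word within the threshold and contributes nothing. The vectors collected in `expansion` would form the rows of the semantic expansion matrix.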


Abstract

The invention discloses a short text classification method based on a convolutional neural network. The convolutional neural network comprises five layers. In the first layer, multi-scale candidate semantic units are obtained from a short text. In the second layer, the Euclidean distances between each candidate semantic unit and all word representation vectors in the vector space are calculated, the nearest-neighbor word representations are found, and all nearest-neighbor word representations that satisfy a preset Euclidean distance threshold are selected to construct a semantic expansion matrix. In the third layer, multiple kernel matrices of different widths and different weight values perform two-dimensional convolution on the mapping matrix of the short text and on the semantic expansion matrix, extracting local convolution features and generating a multi-layer local convolution feature matrix. In the fourth layer, the multi-layer local convolution feature matrix is down-sampled to obtain a multi-layer global feature matrix, a nonlinear tangent transformation is applied to the global feature matrix, and the transformed matrix is converted into a fixed-length semantic feature vector. In the fifth layer, the semantic feature vector is fed to a classifier to predict the category of the short text.
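
The third to fifth layers can be sketched roughly as below. This is an illustration under several assumptions, not the method as claimed: the matrices hold one word per column, each kernel spans the full vector height, the down-sampling is a plain maximum over each feature map, the "nonlinear tangent" is taken to be `tanh`, the classifier is a softmax layer, and the text matrix and expansion matrix are convolved separately. All names (`conv2d_valid`, `feature_vector`, `softmax_classify`) and all numeric values are hypothetical.

```python
import math


def conv2d_valid(matrix, kernel):
    """2-D 'valid' convolution (cross-correlation form) of matrix with kernel."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for i in range(len(matrix) - k_rows + 1):
        row = []
        for j in range(len(matrix[0]) - k_cols + 1):
            row.append(sum(matrix[i + a][j + b] * kernel[a][b]
                           for a in range(k_rows) for b in range(k_cols)))
        output.append(row)
    return output


def feature_vector(text_matrix, expansion_matrix, kernels):
    """Convolve, down-sample, and squash into a fixed-length vector.

    Assumes both inputs have at least as many columns as the widest kernel.
    """
    features = []
    for kernel in kernels:
        for source in (text_matrix, expansion_matrix):
            feature_map = conv2d_valid(source, kernel)
            pooled = max(max(row) for row in feature_map)  # down-sampling
            features.append(math.tanh(pooled))             # nonlinearity
    return features


def softmax_classify(features, weights, biases):
    """Return class probabilities from a linear layer followed by softmax."""
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two kernels of different widths (2 and 3 columns), full height of 2 rows.
kernels = [
    [[0.5, -0.5], [0.25, 0.25]],
    [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]],
]
text_matrix = [[1.0, 0.0, 0.5, 0.25], [0.0, 1.0, 0.5, 0.75]]   # 4 words
expansion_matrix = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]          # 3 neighbors

features = feature_vector(text_matrix, expansion_matrix, kernels)
weights = [[0.2, -0.1, 0.4, 0.3], [-0.3, 0.5, 0.1, -0.2]]      # 2 classes
probabilities = softmax_classify(features, weights, biases=[0.0, 0.0])
print(len(features), probabilities)
```

Because each feature map is reduced to a single value, the feature vector length depends only on the number of kernels and input matrices, not on how many words the short text contains. That is what lets texts of different lengths share one classifier.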

Description

Technical field

[0001] The invention relates to the technical field of text mining and deep learning. It is a short text classification method based on a convolutional neural network that can be applied to the semantic vector representation of short texts and to large-scale short text classification, clustering, and sentiment analysis tasks, and ultimately to subfields such as user intent understanding, intelligent information retrieval, recommender systems, and social networks.

Background

[0002] Short text analysis is a basic task in the field of natural language processing that helps users discover useful information in massive short text resources. Especially with the maturity of the mobile Internet, short text information has become abundant and varied, such as user personal information, geographical location, WeChat messages, product reviews, and news headlines, and the useful information that specific users are concerned about ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F3/02
CPC: G06F16/35, G06N3/02
Inventor: 徐博, 王鹏, 王方圆, 郝红卫
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI