Short text classification method based on topic word vectors and convolutional neural network

A short text classification method based on topic word vectors and a convolutional neural network, applied in the field of text classification. It addresses the poor classification performance caused by the short length, insufficient semantic information, and sparse word co-occurrence of short texts.

Active Publication Date: 2019-08-16
NANJING UNIV
5 Cites, 76 Cited by

AI Technical Summary

Problems solved by technology

[0004] Purpose of the invention: The main technical problem addressed by the present invention is the poor classification performance caused by the data characteristics of short texts: short length, insufficient semantic information, and sparse word co-occurrence.


Examples

Embodiment Construction

[0059] The present invention is further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that these embodiments only illustrate the present invention and are not intended to limit its scope. After reading the present invention, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of the present application.

[0060] A short text classification method based on topic word vectors and a convolutional neural network, comprising the following steps:

[0061] As shown in Figure 1,

[0062] Step 1, dataset preprocessing: convert the original text data to a unified format, and denoise the resulting sample data;

[0063] Step 2, text segmentation, customized stop word filtering, and building a corpus D;

[0064] Step 3a) Perform topic-level feature representation on sh...
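
Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the noise patterns (URLs, HTML tags, punctuation), the stop-word list, and the function names are all assumptions, and whitespace splitting stands in for a real word segmenter, which a Chinese corpus would require.

```python
import re

# Hypothetical stop-word list; the patent's customized list is not given in this excerpt.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in", "at"}


def preprocess(raw: str) -> str:
    """Step 1 (assumed form): unify the format and strip noise."""
    text = raw.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = re.sub(r"[^\w\s]", " ", text)       # drop punctuation
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


def segment(text: str) -> list:
    """Step 2 (assumed form): whitespace segmentation plus stop-word filtering."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_corpus(raw_texts) -> list:
    """Build corpus D as a list of token lists, dropping texts left empty."""
    docs = (segment(preprocess(t)) for t in raw_texts)
    return [d for d in docs if d]


corpus_d = build_corpus([
    "Check the <b>NEW</b> phone at https://example.com !!!",
    "The of and",  # only stop words, so it is filtered out entirely
])
print(corpus_d)
```

Dropping texts that become empty after filtering is one way to read the "useless text filtering" mentioned in the abstract; the excerpt does not say how the patent defines it.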

Abstract

The invention discloses a short text classification method based on topic word vectors and a convolutional neural network, which comprises the following steps: 1) a data acquisition stage: acquiring short text data according to requirements, and labeling the short text data as a training set; 2) a data preprocessing stage: performing word segmentation, stop word removal, useless text filtering and the like on the text; 3) representing short text features at both the topic level and the word vector level; 4) jointly training the topic word vectors; 5) iteratively optimizing the parameters of the convolutional neural network classification model; and 6) predicting the category of new samples. The invention takes the data characteristics of short texts into account: in the feature representation stage, topic vectors and word vectors are combined to expand the semantic features of the short text; in the classification model training stage, the ability of the convolutional neural network to extract locally sensitive information is used to further mine the semantic information of the text. Indexes such as the category prediction accuracy of short text classification tasks can thereby be improved.
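
The feature combination and classification stages described in the abstract might look like the sketch below. It rests on several assumptions not confirmed by this excerpt: each token's word vector is simply concatenated with a per-token topic vector, the network has a single convolution layer with max-over-time pooling and a softmax output, and all parameters are random and untrained. The dimensions and names are hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical sizes chosen only to keep the example small.
WORD_DIM, TOPIC_DIM, NUM_FILTERS, WINDOW, NUM_CLASSES = 4, 3, 2, 2, 2


def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


def joint_matrix(tokens, word_vecs, topic_vecs):
    """Concatenate each token's word vector with its topic vector (assumed combination)."""
    return [word_vecs[t] + topic_vecs[t] for t in tokens]


def conv_max_pool(matrix, filters, biases, window):
    """Slide each filter over `window` consecutive rows and keep the largest activation."""
    pooled = []
    for filt, bias in zip(filters, biases):
        best = 0.0  # starting at 0 is equivalent to applying ReLU before the max
        for start in range(len(matrix) - window + 1):
            patch = [x for row in matrix[start:start + window] for x in row]
            best = max(best, sum(w * x for w, x in zip(filt, patch)) + bias)
        pooled.append(best)
    return pooled


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens, params):
    """Forward pass only: joint features -> convolution -> pooling -> softmax."""
    matrix = joint_matrix(tokens, params["word"], params["topic"])
    feats = conv_max_pool(matrix, params["filters"], params["biases"], WINDOW)
    logits = [sum(w * f for w, f in zip(row, feats)) + b
              for row, b in zip(params["out_w"], params["out_b"])]
    return softmax(logits)


vocab = ["check", "new", "phone"]
params = {  # random, untrained parameters for illustration only
    "word": {t: rand_vec(WORD_DIM) for t in vocab},
    "topic": {t: rand_vec(TOPIC_DIM) for t in vocab},
    "filters": [rand_vec(WINDOW * (WORD_DIM + TOPIC_DIM)) for _ in range(NUM_FILTERS)],
    "biases": [0.0] * NUM_FILTERS,
    "out_w": [rand_vec(NUM_FILTERS) for _ in range(NUM_CLASSES)],
    "out_b": [0.0] * NUM_CLASSES,
}
probs = classify(vocab, params)
print(probs)
```

In a real system the word and topic vectors would come from the joint training in step 4 and the filter and output weights from the optimization in step 5; the forward pass here only shows how the pieces fit together.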

Description

Technical field

[0001] The invention relates to the field of text classification, in particular to a short text classification method based on topic word vectors and a convolutional neural network.

Background

[0002] With the emergence of large-scale text information on the Internet, more manpower and material resources are required to effectively mine and utilize massive text information. Text classification has become an important method of processing text data and an important means of managing text corpora. Text classification is one of the main research areas of natural language processing (NLP). The text classification task can be understood as the process of mapping a text to a predefined label set by analyzing its structural features and semantic information.

[0003] With the popularity and explosive growth of real-time new applications such as online communication, news alerts, e-commerce, social media, and online ques...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/35, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/35, G06N3/08, G06N3/044, G06N3/045, G06F18/2411
Inventors: 张雷, 李博, 许磊, 顾溢, 谢俊元
Owner: NANJING UNIV