Text classification method based on feature information of characters and terms

A text classification technology based on feature information, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the problem of ignored semantic information, alleviates the shortage of semantic information in short texts, and achieves more accurate classification.

Inactive Publication Date: 2018-02-02
SUN YAT SEN UNIV
Cites: 9; Cited by: 52

AI Technical Summary

Problems solved by technology

[0005] To sum up, the existing text representation based on term vectors ignores semantic information at the character level, so it needs to be improved.

Embodiment Construction

[0030] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0031] The basic idea of the present invention is to jointly pre-train character vectors and term vectors on a corpus, and to represent a short text as two matrices: one composed of the term vectors of its terms and one composed of the character vectors of its characters. A convolutional neural network performs term-level and character-level feature extraction on these two matrices to obtain the vector representation of the text, and a fully connected neural network then classifies the text to obtain the classification result.
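The two-matrix representation described in [0031] can be illustrated with a minimal sketch. It is not the patent's implementation: the function name `build_matrices`, the toy 2-dimensional vectors, the pre-segmented input, and the zero-vector fallback for unknown entries are all assumptions made for illustration.

```python
def build_matrices(terms, term_vecs, char_vecs, dim):
    """Return (term_matrix, char_matrix) for one segmented short text.

    Each matrix is a list of rows. Unknown terms or characters map to a
    zero vector (an assumption; the source does not say how they are handled).
    """
    term_matrix = [list(term_vecs.get(t, [0.0] * dim)) for t in terms]
    chars = [c for t in terms for c in t]  # characters in reading order
    char_matrix = [list(char_vecs.get(c, [0.0] * dim)) for c in chars]
    return term_matrix, char_matrix


# Toy 2-dimensional vectors standing in for jointly pre-trained embeddings.
term_vecs = {"文本": [0.1, 0.2], "分类": [0.3, 0.4]}
char_vecs = {"文": [1.0, 0.0], "本": [0.0, 1.0], "分": [0.5, 0.5]}

terms = ["文本", "分类"]  # assumed output of a word segmenter
term_matrix, char_matrix = build_matrices(terms, term_vecs, char_vecs, dim=2)
print(len(term_matrix), len(char_matrix))
```

Here the term matrix has one row per term (two rows) and the character matrix has one row per Chinese character (four rows), so the same short text yields two views of different lengths.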

[0032] see figure 1 , the present invention p...

Abstract

The invention discloses a text classification method based on feature information of characters and terms. The method comprises the following steps: a neural network model is used to jointly pre-train character and term vectors, yielding an initial term vector for each term and an initial character vector for each Chinese character; a short text is represented as a matrix composed of the term vectors of all terms in the short text, and a convolutional neural network performs feature extraction on it to obtain term-level features; the short text is represented as a matrix composed of the character vectors of all Chinese characters in the short text, and the convolutional neural network performs feature extraction on it to obtain character-level features; the term-level features and the character-level features are concatenated to obtain the feature vector of the short text; and a fully connected layer classifies the short text, with stochastic gradient descent used for model training, to obtain a classification model. With this method, both character-level and term-level features can be extracted, the problem that a short text carries insufficient semantic information is alleviated, the semantic information of the short text is fully mined, and classification of the short text is more accurate.
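The feature extraction, concatenation, and classification steps in the abstract can be sketched as a forward pass in plain Python. This is a hedged illustration only: the ReLU activation, max-over-time pooling, softmax output, and all filter and weight values are assumptions not stated in the source, and training by stochastic gradient descent is omitted.

```python
import math


def conv_max_pool(matrix, filt):
    """Slide an h x d filter over the rows of an n x d matrix, apply ReLU,
    and keep the maximum response (max-over-time pooling, assumed here)."""
    h = len(filt)
    best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe floor
    for start in range(len(matrix) - h + 1):
        s = sum(
            w * x
            for f_row, m_row in zip(filt, matrix[start:start + h])
            for w, x in zip(f_row, m_row)
        )
        best = max(best, s)
    return best


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(term_matrix, char_matrix, term_filters, char_filters, W, b):
    """Concatenate term-level and character-level features, then apply one
    fully connected layer followed by softmax."""
    feats = [conv_max_pool(term_matrix, f) for f in term_filters]
    feats += [conv_max_pool(char_matrix, f) for f in char_filters]
    logits = [sum(w * x for w, x in zip(row, feats)) + bi
              for row, bi in zip(W, b)]
    return feats, softmax(logits)


# Toy inputs: 2 terms and 4 characters, embedding dimension 2 (all hypothetical).
term_matrix = [[1.0, 0.0], [0.0, 1.0]]
char_matrix = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
term_filters = [[[1.0, 1.0], [1.0, 1.0]]]            # one filter, window 2
char_filters = [[[1.0, -1.0], [0.0, 1.0]],           # two filters, window 2
                [[-1.0, 0.0], [-1.0, 0.0]]]
W = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]              # 2 classes x 3 features
b = [0.0, 0.0]

feats, probs = classify(term_matrix, char_matrix,
                        term_filters, char_filters, W, b)
print(feats, probs)
```

The concatenated feature vector has one entry per filter across both channels, which is what lets the fully connected layer weigh term-level and character-level evidence together.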

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a text classification method based on feature information at two levels: characters and terms.

Background technique

[0002] The performance of machine learning methods usually depends on the representation of features. In traditional machine learning methods, the most critical part is the selection of model features, and effective feature selection requires experts in the specific field. This raises the threshold of machine learning research: it requires not only knowledge of machine learning but also domain experts in task-related fields to help design the features. Designing features is also a process that consumes a lot of time and energy, which reflects a weakness of traditional machine learning, namely that it is difficult to extract and organize highly discriminative information from data. With the proposal and development of...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/355
Inventor: 杜婷婷, 常会友
Owner: SUN YAT SEN UNIV