Text classification method based on feature information of characters and terms

A text classification technology based on feature information, applied in the fields of electrical digital data processing, special data processing applications, and instruments. It addresses the problem of ignored semantic information, mitigates the insufficient semantic information of short texts, and achieves more accurate classification.
CN107656990A · Inactive · Publication Date: 2018-02-02 · SUN YAT SEN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
SUN YAT SEN UNIV
Publication Date
2018-02-02
Estimated Expiration
Not applicable · inactive patent

Abstract

The invention discloses a text classification method based on feature information of characters and terms. The method comprises the following steps: a neural network model is used to jointly pre-train character and term vectors, yielding an initial term vector representation for each term and an initial character vector representation for each Chinese character; a short text is represented as a matrix composed of the term vectors of all terms in the short text, and a convolutional neural network extracts features from it to obtain term-level features; the short text is represented as a matrix composed of the character vectors of all Chinese characters in the short text, and a convolutional neural network extracts features from it to obtain character-level features; the term-level features and the character-level features are concatenated to obtain the feature vector representation of the short text; and a fully connected layer classifies the short text, with the model trained by stochastic gradient descent to obtain a classification model. The method extracts both character-level and term-level features, which alleviates the problem of insufficient semantic information in short texts, mines their semantic information more fully, and makes short text classification more accurate.
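The pipeline in the abstract can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the patented implementation: the embeddings are randomly initialized as a stand-in for the joint character and term pre-training, all class names, function names, and hyperparameters (`CharWordCNN`, `conv_max_pool`, filter width, dimensions) are hypothetical, max-over-time pooling is an assumed pooling choice, and the SGD step updates only the fully connected layer to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_max_pool(x, filters, bias):
    """Convolve over the token axis, apply ReLU, then max-pool over positions.

    x:       (seq_len, emb_dim) matrix, one row per term or character
    filters: (n_filters, width, emb_dim)
    bias:    (n_filters,)
    returns: (n_filters,) feature vector
    """
    width = filters.shape[1]
    if x.shape[0] < width:  # pad very short inputs with zero rows
        x = np.vstack([x, np.zeros((width - x.shape[0], x.shape[1]))])
    # windows has shape (n_positions, width, emb_dim)
    windows = np.stack([x[i:i + width] for i in range(x.shape[0] - width + 1)])
    scores = np.einsum("pwd,fwd->pf", windows, filters) + bias
    return np.maximum(scores, 0.0).max(axis=0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class CharWordCNN:
    """Hypothetical two-branch model: term-level CNN plus character-level CNN."""

    def __init__(self, n_words, n_chars, emb_dim=8, n_filters=4, width=2, n_classes=3):
        # Assumption: random vectors stand in for jointly pre-trained embeddings.
        self.word_emb = rng.normal(0, 0.1, (n_words, emb_dim))
        self.char_emb = rng.normal(0, 0.1, (n_chars, emb_dim))
        self.word_f = rng.normal(0, 0.1, (n_filters, width, emb_dim))
        self.char_f = rng.normal(0, 0.1, (n_filters, width, emb_dim))
        self.word_b = np.zeros(n_filters)
        self.char_b = np.zeros(n_filters)
        # Fully connected layer over the concatenated features
        self.W = rng.normal(0, 0.1, (n_classes, 2 * n_filters))
        self.b = np.zeros(n_classes)

    def features(self, word_ids, char_ids):
        word_feat = conv_max_pool(self.word_emb[word_ids], self.word_f, self.word_b)
        char_feat = conv_max_pool(self.char_emb[char_ids], self.char_f, self.char_b)
        return np.concatenate([word_feat, char_feat])

    def predict_proba(self, word_ids, char_ids):
        return softmax(self.W @ self.features(word_ids, char_ids) + self.b)

    def sgd_step(self, word_ids, char_ids, label, lr=0.1):
        """One SGD update of the fully connected layer only (cross-entropy loss)."""
        h = self.features(word_ids, char_ids)
        p = softmax(self.W @ h + self.b)
        grad = p.copy()
        grad[label] -= 1.0  # gradient of cross-entropy w.r.t. the logits
        self.W -= lr * np.outer(grad, h)
        self.b -= lr * grad
        return -np.log(p[label])  # loss before the update

# Usage: one short text given as term ids and character ids (made-up indices)
model = CharWordCNN(n_words=10, n_chars=20)
probs = model.predict_proba([1, 2, 3], [4, 5, 6, 7, 8])
```

A full implementation would also backpropagate through the convolution filters and the embeddings, which a deep learning framework would handle automatically.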

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a text classification method based on feature information at two levels of characters and words.

Background technique

[0002] The performance of machine learning methods usually depends on how features are represented. In traditional machine learning, the most critical step is selecting model features, and this selection can only be done effectively by experts in the specific field. This raises the threshold for machine learning research: it requires not only knowledge of machine learning but also domain experts in the task-related field to help design features. Designing features also consumes a great deal of time and energy, which reflects a weakness of traditional machine learning, namely that it is difficult to extract and organize highly discriminative information from data. With the proposal and development of...
