Fused attention model-based Chinese text classification method

The method combines an attention model with text classification technology and applies to semantic analysis, special data processing applications, instruments, and similar fields. It addresses three problems: extracted text features being of a single type, character-level feature information of the text being ignored, and the difficulty of covering the text's semantic information.

Inactive Publication Date: 2018-09-28
中国科学院电子学研究所苏州研究院

AI Technical Summary

Problems solved by technology

[0007] The technical problem to be solved by the present invention is to provide a Chinese text classification method based on a fused attention model. It addresses the problem that existing Chinese text classification methods ignore the character feature information of the text, so that the extracted text features are of a single type and can hardly cover all of the text's semantic information.



Embodiment Construction

[0077] The technical scheme of the present invention is described in further detail below with reference to the accompanying drawings:

[0078] Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms defined in commonly used dictionaries should be understood to have a meaning consistent with their meaning in the context of the prior art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

[0079] As shown in Figure 1, the method of the invention is divided into six stages:

[0080] Phase I is word segmentation preprocessing and character segmentation preprocessing. The text is segmented into a corresponding word set and a corresponding character set using the NLPIR tool;

[0081] Phase II is training word ve...
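
The two segmentations of Phase I can be illustrated with a minimal sketch. The source performs segmentation with the NLPIR tool; the forward maximum matching routine below is only a toy stand-in for it, and the names `segment_words`, `segment_characters`, and the three-entry `vocabulary` are hypothetical and chosen for illustration.

```python
# Toy stand-in for Phase I. The source uses the NLPIR tool for segmentation;
# the dictionary-based matcher below is only an illustrative substitute.

def segment_characters(text):
    """Split text into a list of single characters, skipping whitespace."""
    return [ch for ch in text if not ch.isspace()]


def segment_words(text, vocabulary, max_word_len=4):
    """Forward maximum matching against a small hypothetical vocabulary."""
    words = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first; fall back to a single character.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocabulary:
                words.append(candidate)
                i += length
                break
    return words


vocabulary = {"文本", "分类", "方法"}  # hypothetical toy dictionary
sentence = "文本分类方法"
word_set = segment_words(sentence, vocabulary)    # word-level view
char_set = segment_characters(sentence)           # character-level view
```

The same sentence yields two parallel token sequences, one of words and one of characters, which is the input form the later phases rely on.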


Abstract

The invention discloses a Chinese text classification method based on a fused attention model. The method comprises the following steps: segmenting a text into a corresponding word set and a corresponding character set through word segmentation preprocessing and character segmentation preprocessing, respectively; training the word vectors and character vectors corresponding to the text from the obtained word set and character set using a feature embedding method; semantically encoding the word vectors and the character vectors, respectively, with a bidirectional gated recurrent unit (GRU) neural network as the encoder; obtaining a word attention vector and a character attention vector for the text through a word vector attention mechanism and a character vector attention mechanism; obtaining a fused attention vector; and predicting the category of the text with a softmax classifier. The method addresses several problems of existing Chinese text classification methods: they neglect the character feature information of texts, the extracted text features are of a single type, the features can hardly cover all of the text's semantic information, and, because features that contribute markedly to classification are not given focus, many redundant features enter the classification process.
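
The attention, fusion, and classification steps named in the abstract might look roughly like the sketch below. It rests on several assumptions that the excerpt does not confirm: the attention is taken to be the common additive form (a tanh projection scored against a learned context vector), the fused vector is taken to be a plain concatenation of the word and character attention vectors, and random numbers stand in both for the bidirectional GRU encoder outputs and for all trained parameters. The helper names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def attention_pool(hidden_states, weight, bias, context):
    """Assumed additive attention: score each state, return the weighted sum."""
    scores = []
    for h in hidden_states:
        u = [math.tanh(z + b) for z, b in zip(matvec(weight, h), bias)]
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, hidden_states))
              for d in range(dim)]
    return pooled, alphas


def random_matrix(rows, cols, rng):
    """Random placeholder values standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
hidden_dim, num_classes = 4, 3

# Placeholders for the bidirectional GRU outputs (one vector per token).
word_states = random_matrix(3, hidden_dim, rng)  # 3 words
char_states = random_matrix(6, hidden_dim, rng)  # 6 characters

# Separate attention parameters for the word and character branches.
word_vec, word_alphas = attention_pool(
    word_states, random_matrix(hidden_dim, hidden_dim, rng),
    [0.0] * hidden_dim, [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)])
char_vec, char_alphas = attention_pool(
    char_states, random_matrix(hidden_dim, hidden_dim, rng),
    [0.0] * hidden_dim, [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)])

# Assumed fusion: concatenate the two attention vectors.
fused = word_vec + char_vec

# Softmax classifier over the fused vector.
class_weights = random_matrix(num_classes, len(fused), rng)
probabilities = softmax(matvec(class_weights, fused))
predicted_class = max(range(num_classes), key=lambda k: probabilities[k])
```

Keeping the word and character branches separate until the fusion step lets each attention mechanism weight its own tokens independently, which matches the abstract's description of two distinct attention vectors.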

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and in particular relates to a method for classifying Chinese texts.

Background

[0002] In recent years, with the rapid development of electronic information technology, massive amounts of information and data in text form have flooded the Internet. How to classify these texts effectively and then mine valuable information from them has become one of the research hotspots in natural language processing. The purpose of text classification is to assign texts to predefined topic categories. Most traditional text classification algorithms are based on shallow machine learning models. Recently, following the great success of deep learning in computer vision and speech recognition, more and more research has attempted to apply deep learning to Chinese text classification. Unlike traditional text classification methods, the deep learning method learns the word feat...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F40/30
Inventor: 胡岩峰, 乔雪, 岳才杰, 范远来, 段贺, 陈星, 彭晨, 刘振
Owner: 中国科学院电子学研究所苏州研究院