A text classification algorithm that combines statistical features and Attention mechanism

A text classification technique that incorporates statistical features, applied to text datasets. It addresses the inability of existing models to learn text statistical features, and it reduces training time while improving classification accuracy.

Inactive Publication Date: 2019-02-12
WUHAN UNIV OF TECH
Cites: 3 · Cited by: 13

AI Technical Summary

Problems solved by technology

To address the problem that existing deep learning models cannot learn text statistical features, statistical features are added to the Attention weight calculation. Compared with existing models, the semantic information contained in the event structure and the corresponding statistical features improve the quality of the text vector representation and achieve better classification performance.

Examples

Embodiment

[0038] A piece of input text is first segmented into words; stop words are then removed and synonyms are replaced. The word2vec tool is used to train a word vector for each word. A tf-idf value is calculated for each retained word, and a weight is assigned according to the word's part of speech and tf-idf value to obtain the word's statistical feature value. The Attention weight is then calculated at the event level, together with the statistical feature weight of each event. The two weights are fused, so the resulting feature vector contains more semantic information. The specific steps of the algorithm are as follows:

[0039] (1) For a document set, first perform word segmentation, part-of-speech tagging and stop word processing.

[0040] (2) Record the frequency of each word, and replace the synonyms in the documents at the same time.

[0041] (3) Extract the events in each document.

[0042] (4) Calculate the statistical feature va...
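The word-level part of the steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not say how part-of-speech and tf-idf values are combined, so the sketch simply multiplies them. The stop-word set, synonym map, POS weight table and all function names are hypothetical, the input is assumed to be already segmented and POS-tagged, and event extraction (step 3) is not covered.

```python
import math
from collections import Counter

# Hypothetical resources: the source does not list its stop words,
# synonym dictionary, or part-of-speech weights.
STOP_WORDS = {"the", "a", "of"}
SYNONYMS = {"film": "movie"}
POS_WEIGHT = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.5}
DEFAULT_POS_WEIGHT = 0.2


def preprocess(tagged_doc):
    """Drop stop words and map synonyms to one canonical word; keep POS tags."""
    return [(SYNONYMS.get(w, w), pos) for w, pos in tagged_doc if w not in STOP_WORDS]


def statistical_features(tagged_docs):
    """Return one {word: score} dict per document.

    score = tf-idf * POS weight (the product is an assumption of this sketch).
    If a word occurs with several POS tags in a document, the last one wins.
    """
    docs = [preprocess(d) for d in tagged_docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update({w for w, _ in doc})

    features = []
    for doc in docs:
        tf = Counter(w for w, _ in doc)
        total = sum(tf.values())
        scores = {}
        for w, pos in doc:
            tfidf = (tf[w] / total) * math.log(n_docs / doc_freq[w])
            scores[w] = tfidf * POS_WEIGHT.get(pos, DEFAULT_POS_WEIGHT)
        features.append(scores)
    return features
```

With this unsmoothed idf, a word that appears in every document of the set receives a score of zero, so in practice a smoothed variant may be preferable.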

Abstract

The invention relates to a text classification algorithm that combines statistical features and an Attention mechanism. The Attention mechanism has gradually been applied to the field of natural language processing, but prior methods greatly increase the amount of computation when calculating the Attention weights. The invention proposes calculating the Attention weights at the structured event level. On the one hand, events contain richer semantics than words or phrases; on the other hand, the event-based Attention mechanism reduces computational complexity. At the same time, statistical features are added to the calculation of the Attention weights. Compared with existing models, the semantic information contained in the event structure and the corresponding statistical features improve the quality of the text vector representation and achieve better classification performance. Classification accuracy is evaluated, and the experimental results show that the model not only reduces training time but also achieves better results.
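As a rough illustration of the idea in the abstract, the sketch below scores each event vector against a context vector, normalizes the scores with softmax, and blends the result with normalized statistical feature weights. The source states only that the two weights are fused; the convex combination with parameter `lam`, the dot-product scoring, the context vector, and all function names are assumptions of this sketch rather than the patent's method.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def fuse_weights(event_vecs, context, stat_scores, lam=0.5):
    """Blend event-level attention weights with statistical weights.

    event_vecs:  one vector per extracted event
    context:     hypothetical query vector (learned in a real model)
    stat_scores: one non-negative statistical feature score per event
    lam:         assumed mixing coefficient; 1.0 means attention only
    """
    # Attention: dot-product score per event, normalized with softmax.
    attn = softmax([sum(e * c for e, c in zip(vec, context)) for vec in event_vecs])

    # Statistical weights: normalize to sum to 1, uniform if all scores are zero.
    total = sum(stat_scores)
    if total > 0:
        stat = [s / total for s in stat_scores]
    else:
        stat = [1.0 / len(stat_scores)] * len(stat_scores)

    return [lam * a + (1 - lam) * s for a, s in zip(attn, stat)]


def document_vector(event_vecs, weights):
    """Weighted sum of event vectors, used as the text representation."""
    dim = len(event_vecs[0])
    return [sum(w * vec[i] for w, vec in zip(weights, event_vecs)) for i in range(dim)]
```

Because the scoring loop runs once per event rather than once per word, a document with fewer events than words needs fewer attention scores, which is the source of the reduced computation the abstract describes.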

Description

Technical field

[0001] The invention relates to a novel text classification algorithm, intended especially for large text datasets, which reduces the time complexity of computation while improving classification accuracy.

Background art

[0002] The rapid development of network and information technology has led to exponential growth of data. Text is the main form in which information is expressed on the Internet. How to extract key, useful information from complex text data is currently a research hotspot in the field of data mining. Text classification, as a key technology in data mining, can perform preliminary processing and classification of text information.

[0003] The main tasks of text classification are text representation, feature extraction, classification algorithm and effect evaluation. In order to be calculated and processed by the computer, the initial input text must first be represented by the corresponding feature extraction algorithm, and then ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F17/27
CPC: G06F40/216, G06F40/289, G06F40/30
Inventors: 程艳芬, 李超, 陈逸灵
Owner WUHAN UNIV OF TECH