End-to-end classification method of large-scale news text based on Bi-GRU and word vector

A classification method for large-scale news text, applied in semantic analysis, electrical digital data processing, biological neural network models, etc. It addresses the poor performance of RNN models on long text and their vanishing and exploding gradients, and achieves vector dimensionality reduction together with improved efficiency, accuracy, and performance.

Inactive Publication Date: 2018-11-20
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT +1
Cites: 3 · Cited by: 26

AI Technical Summary

Problems solved by technology

Among them, the recurrent neural network (RNN) model is one of the main research methods for text classification because it can represent the sequential semantic and grammatical feat...



Examples


Embodiment

[0062] Transcode the downloaded raw data, label each text with its category, and build the training and test data; then limit the text length, segment the text into words, and remove punctuation marks. For the news successfully labeled with one of the 10 categories, count the category distribution, draw 2000 news items from each category, and split them into training and test sets at a ratio of 4:1. The categories are: Finance, IT, Health, Sports, Travel, Military, Culture, Entertainment, Fashion, Auto. The trained model outputs, for any piece of news text, a probability for each category. For example: "On March 30, Beijing time, according to US media reports, as the number one player in today's NBA, LeBron James always gets cheers from the fans of the away team when he plays in away games." The classification output is "Sports: 0.76", "Health: 0.12", "Culture: 0.06"..., finally take the one with the highest probability as the classification resul...
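The sampling, the 4:1 split, and the final highest-probability decision described in [0062] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names `sample_and_split` and `predict_label`, the fixed random seed, and the `(text, category)` input layout are all assumptions made for the example.

```python
import random
from collections import defaultdict

CATEGORIES = ["Finance", "IT", "Health", "Sports", "Travel",
              "Military", "Culture", "Entertainment", "Fashion", "Auto"]

def sample_and_split(labeled_news, per_category=2000, seed=0):
    """Draw up to per_category items per class, then split each class 4:1.

    labeled_news is an iterable of (text, category) pairs (assumed layout).
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for text, category in labeled_news:
        by_category[category].append(text)

    train, test = [], []
    for category in CATEGORIES:
        texts = by_category[category]
        # Sample without replacement; take everything if the class is small.
        drawn = rng.sample(texts, min(per_category, len(texts)))
        cut = len(drawn) * 4 // 5  # 4 parts training, 1 part testing
        train += [(t, category) for t in drawn[:cut]]
        test += [(t, category) for t in drawn[cut:]]
    return train, test

def predict_label(probabilities):
    """Return the (category, probability) pair with the highest probability."""
    return max(probabilities.items(), key=lambda item: item[1])
```

With the probabilities quoted in the example, `predict_label({"Sports": 0.76, "Health": 0.12, "Culture": 0.06})` selects "Sports". Splitting inside each category, rather than over the pooled data, keeps the 4:1 ratio identical for every class.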


Abstract

The invention provides an end-to-end classification method for large-scale news text based on Bi-GRU and word vectors. The method comprises the following steps: S1. represent word-level semantic features with word embeddings; S2. construct an attention-weighted Bi-GRU word-level sentence feature encoding model; S3. establish an attention-weighted Bi-GRU sentence-level feature encoding model; S4. apply hierarchical Softmax to realize end-to-end classification. The method reduces the dimension of the vectors and effectively prevents the features from becoming too sparse. The final output vector is optimized, which makes the model's feature encoding more effective. The method avoids the training difficulty caused by high dimensionality and also provides additional semantic information. The feature extraction model can be flexibly combined with various common classifiers, which makes it easy to replace and debug classifiers. Compared with Softmax, the computational complexity is reduced from |K| to log|K|.
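To illustrate why hierarchical Softmax needs about log|K| operations per class instead of |K|, here is a small sketch over a balanced binary tree. The tree shape is an assumption (the source does not say how the tree is built, and Huffman trees are a common alternative), and the class name `HierarchicalSoftmax`, the random weights, and the feature vector are purely illustrative.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class HierarchicalSoftmax:
    """Class probabilities as products of binary decisions along a tree path.

    Heap layout: internal nodes are 0 .. K-2, leaves are K-1 .. 2K-2,
    and node i has children 2i+1 (left) and 2i+2 (right).
    """

    def __init__(self, num_classes, dim, seed=0):
        rng = random.Random(seed)
        self.num_classes = num_classes
        # One weight vector per internal node (K - 1 in total).
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                        for _ in range(num_classes - 1)]

    def path(self, class_index):
        """(internal_node, went_left) pairs from the class leaf up to the root."""
        node = class_index + self.num_classes - 1
        steps = []
        while node > 0:
            parent = (node - 1) // 2
            steps.append((parent, node == 2 * parent + 1))
            node = parent
        return steps

    def probability(self, features, class_index):
        """Multiply the left/right probabilities along the path to the leaf."""
        p = 1.0
        for node, went_left in self.path(class_index):
            score = sum(w * x for w, x in zip(self.weights[node], features))
            p_left = sigmoid(score)
            p *= p_left if went_left else 1.0 - p_left
        return p
```

For the 10 news categories, each path visits at most 4 internal nodes, whereas a flat Softmax normalizes over all 10 scores. Because every internal node splits its probability mass between two children, the class probabilities still sum to 1.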

Description

Technical field [0001] The present invention relates to text classification technology for large-scale news corpora, in particular to an end-to-end classification method for large-scale news texts based on Bi-GRU (Bi-directional Gated Recurrent Unit) and word vectors, which combines text vector representation with the Bi-GRU deep learning model. To address the feature selection problem in text classification, a Bi-GRU model is designed to extract features, which mitigates the long-term dependency problem that sequence neural network models face when learning semantic features from long texts. In addition, an attention mechanism produces a semantic encoding that carries the attention weight distribution over the input sequence nodes; this encoding is used as the classifier input, which reduces information loss and information redundancy during feature vector extraction. The invention belongs to the field of natural language processing....
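The attention step described above, which turns a sequence of encoder states into one weighted semantic encoding, might look roughly like the sketch below. The scoring function (a dot product between a context vector and the tanh of each state) is an assumption, since the excerpt does not give the formula, and the name `attention_pool` is hypothetical. The Bi-GRU itself is omitted; `hidden_states` stands in for its per-step outputs.

```python
import math

def attention_pool(hidden_states, context):
    """Weight each time step by a softmax over scores, then sum the states.

    hidden_states: list of equal-length vectors, one per input position.
    context: vector of the same length used to score each position (assumed form).
    """
    # Score each position: context . tanh(state)
    scores = [sum(c * math.tanh(h) for c, h in zip(context, state))
              for state in hidden_states]

    # Numerically stable softmax over the positions.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The attention-weighted sum is the encoding passed to the classifier.
    dim = len(hidden_states[0])
    pooled = [sum(w * state[i] for w, state in zip(weights, hidden_states))
              for i in range(dim)]
    return weights, pooled
```

The returned weights form a distribution over input positions, so positions with higher scores contribute more to the pooled vector than they would under plain averaging.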


Application Information

IPC(8): G06F17/30, G06F17/27, G06N3/04
CPC: G06F40/30, G06N3/045
Inventor: 李雄张传新刘春阳张旭王萌王慧王利军李磊
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT