Automatic text classification method based on BERT and feature fusion

An automatic text classification technique based on BERT and feature fusion, applied in the fields of supervised text classification and deep learning. It addresses the problems that character or word vectors cannot change with their context and that a single feature extractor covers limited information, and it improves prediction accuracy and encoding ability.

Active Publication Date: 2019-11-05
HUAIYIN INSTITUTE OF TECHNOLOGY
Cites: 5; Cited by: 35

AI Technical Summary

Problems solved by technology

[0003] Most traditional text classification methods are based on deep learning and use a CNN or RNN model to solve the text classification problem. The character or word vectors they take as input cannot change according to context, and the information they cover is relatively limited.



Examples


Embodiment Construction

[0031] BERT (Bidirectional Encoder Representations from Transformers) language model: BERT uses a masked language model to make the language model bidirectional, which demonstrates the importance of bidirectionality for language representation pre-training. BERT is a genuinely bidirectional language model: each word can use the context on both its left and its right at the same time. BERT is the first fine-tuning-based model to achieve the best results on both sentence-level and token-level natural language tasks, which shows that pre-trained representations reduce the need to design a special model structure for each task. BERT achieved the best results on 11 natural language processing tasks, and its extensive ablation studies show that bidirectionality is an important innovation. The BERT language model realizes the conversion of text to dynamic word vectors and enhances the semantic inf...
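The masked-model idea in the paragraph above can be illustrated with a small sketch. It only shows how a masked training input might be built; it does not train or call BERT. The function name `mask_tokens`, the example sentence, and the 15% masking rate are assumptions made for illustration (the rate comes from the original BERT paper, not from this document), and BERT's additional rule of sometimes keeping or randomly replacing a selected token is omitted.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of tokens behind [MASK].

    Returns the masked sequence and a dict {position: original token}.
    A masked language model is trained to recover each hidden token
    from the words on BOTH sides of it, which is what makes the
    resulting representation bidirectional.
    """
    rng = random.Random(seed)
    # Assumed rate: 15% of positions, with at least one token masked.
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for p in positions:
        targets[p] = masked[p]
        masked[p] = "[MASK]"
    return masked, targets

tokens = "the bank raised its interest rate again".split()
masked, targets = mask_tokens(tokens)
print(masked)   # input the model would see
print(targets)  # tokens the model would be asked to predict
```

Because the hidden token can sit anywhere in the sentence, the model cannot rely on left-to-right context alone, which is the property the paragraph describes.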



Abstract

The invention discloses an automatic text classification method based on BERT and feature fusion. The method first cleans the text data and uses BERT to convert the text into dynamic word vectors. The word vector sequence output by BERT is then passed separately to a CNN network and a BiLSTM network, which extract features of the text. Next, the output of the CNN network and the output of the BiLSTM network are concatenated for feature fusion, and finally a fully connected layer and a softmax layer output the final prediction probability vector. The method is suitable for general supervised text label prediction problems and can effectively improve label prediction accuracy for text data with prominent sequence information and local features.
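The data flow described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a random array stands in for the word vector sequence that BERT would output, all weights are random and untrained, and every name and size here (`cnn_branch`, `bilstm_branch`, `classify`, kernel width 3, 8 filters, hidden size 6, 4 classes) is hypothetical. Max-over-time pooling for the CNN branch and the use of the final hidden state of each LSTM direction are also assumptions; the abstract does not specify how each branch is pooled.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cnn_branch(x, filters):
    """Local features. x: (seq_len, dim); filters: (n_filters, k, dim)."""
    k = filters.shape[1]
    # Every window of k consecutive token vectors: (n_windows, k, dim).
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])
    acts = np.maximum(0.0, np.einsum("wkd,fkd->wf", windows, filters))  # ReLU
    return acts.max(axis=0)  # max-over-time pooling -> (n_filters,)

def lstm_last_state(x, W, U, b):
    """One LSTM direction. W: (4h, dim), U: (4h, h), b: (4h,)."""
    h = np.zeros(U.shape[1])
    c = np.zeros(U.shape[1])
    for x_t in x:
        i, f, o, g = np.split(W @ x_t + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h  # final hidden state -> (h,)

def bilstm_branch(x, fwd, bwd):
    """Sequence features: forward pass plus a pass over the reversed input."""
    return np.concatenate([lstm_last_state(x, *fwd),
                           lstm_last_state(x[::-1], *bwd)])

def classify(x, p):
    # Feature fusion: concatenate the CNN and BiLSTM outputs.
    fused = np.concatenate([cnn_branch(x, p["filters"]),
                            bilstm_branch(x, p["fwd"], p["bwd"])])
    logits = p["W_out"] @ fused + p["b_out"]   # fully connected layer
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()

def init_params(dim, n_filters, k, h, n_classes):
    """Random, untrained weights for illustration only."""
    def lstm():
        return (rng.normal(0, 0.1, (4 * h, dim)),
                rng.normal(0, 0.1, (4 * h, h)),
                np.zeros(4 * h))
    return {"filters": rng.normal(0, 0.1, (n_filters, k, dim)),
            "fwd": lstm(), "bwd": lstm(),
            "W_out": rng.normal(0, 0.1, (n_classes, n_filters + 2 * h)),
            "b_out": np.zeros(n_classes)}

params = init_params(dim=16, n_filters=8, k=3, h=6, n_classes=4)
bert_output = rng.normal(size=(12, 16))  # stand-in: 12 tokens, 16-dim vectors
probs = classify(bert_output, params)
print(probs, probs.sum())
```

In a real system the stand-in array would be replaced by the token-level output of a pre-trained BERT model, and the weights of both branches and the output layer would be learned jointly from labelled text.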

Description

Technical field

[0001] The invention relates to the field of supervised text classification and deep learning, in particular to an automatic text classification method based on BERT and feature fusion.

Background

[0002] With the rapid increase of online text data on the Internet, text classification plays a vital role in information processing. It is a key technology for processing large-scale text information and pushes information processing toward automation. Text classification automatically classifies and labels text data according to a given classification system or standard; it is a form of automatic classification based on that classification system. Building a reasonable pre-trained language model and a downstream network structure can effectively solve the text classification problem, thereby improving the accuracy of the predicted labels.

[0003] In the traditional text classification methods, most of...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F17/27, G06N3/04, G06N3/06
CPC: G06F16/35, G06N3/061, G06N3/045
Inventors: 高尚兵, 李文婷, 朱全银, 周泓, 陈晓兵, 相林, 陈浩霖, 李翔, 于永涛
Owner: HUAIYIN INSTITUTE OF TECHNOLOGY