
Text classification method, classifier and system based on key information and dynamic routing

A text classification technology based on key information and dynamic routing, applied in text database clustering/classification, neural learning methods, digital data information retrieval, etc. It addresses problems such as the waste of key information in the text, the failure to use key information to assist classification, and the waste of BERT's internal features, with the effects of improving classification accuracy and enhancing expressive ability.

Pending Publication Date: 2022-05-10
UNIV OF ELECTRONIC SCI & TECH OF CHINA +1
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

The emergence of the pre-trained language model BERT has brought research on text classification into a new stage. Most research work builds the text classification model directly downstream of BERT and has achieved quite good classification results.
However, the existing methods still have shortcomings: (1) the text to be classified is input directly into the model for prediction, without using the key information in the original text to assist classification, so the key information in the text is wasted; (2) the downstream network of BERT simply stacks other neural network models to build the classification model, using only the high-level features of the text obtained after BERT's deep encoding and ignoring the low-level features inside BERT, so BERT's internal features are wasted; (3) existing classification models use a large number of pooling operations in convolutional neural networks (CNNs), which lose part of the feature information in the model and prevent the network from learning higher-level features.


Examples


Embodiment 1

[0058] In an exemplary embodiment, a text classification method based on key information and dynamic routing is provided. As shown in Figure 1, the method includes the following steps:

[0059] Preprocess the text to be classified, and extract keywords from the text to be classified;

[0060] Input the preprocessed text to be classified and the keywords into the embedding layer of the pre-trained language model BERT to obtain a pre-classification output;

[0061] Input the pre-classification output into BERT for deep encoding to obtain a re-classification output;

[0062] In the downstream of BERT, build a classifier in which the pooling operation of the CNN is replaced by dynamic routing, and input the pre-classification output and the re-classification output into the classifier to obtain a pre-classification result and a re-classification result, respectively;

[0063] Obtain the final text classification result by weighting the pre-classification result and the re-classification result.
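The two-pass flow described in paragraphs [0058]-[0063] can be sketched roughly as follows. This is a minimal sketch assuming a Hugging Face style BERT whose embedding-layer output and final encoder output are read separately; the classifier argument, the weights alpha and beta, and the maximum length of 128 are illustrative placeholders rather than values given in the patent.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def classify(text, keywords, classifier, alpha=0.5, beta=0.5):
    # Concatenate the preprocessed text and its keywords as a sentence pair,
    # which yields the [CLS] text [SEP] keywords [SEP] layout (L + M + 3 tokens).
    enc = tokenizer(text, " ".join(keywords), return_tensors="pt",
                    truncation=True, padding="max_length", max_length=128)
    with torch.no_grad():
        # Pre-classification output: features taken from BERT's embedding layer only.
        shallow = bert.embeddings(input_ids=enc["input_ids"])
        # Re-classification output: features after BERT's deep encoding.
        deep = bert(**enc).last_hidden_state
    # The same downstream classifier scores both feature sets ...
    pre_result = classifier(shallow)
    re_result = classifier(deep)
    # ... and the final classification result is their weighted sum.
    return alpha * pre_result + beta * re_result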

Embodiment 2

[0068] Based on Embodiment 1, a text classification method based on key information and dynamic routing is provided. As shown in Figure 2, the preprocessing of the text to be classified includes:

[0069] Perform data cleaning, word segmentation, and removal of stop words and features on the text to be classified, and convert each text to be classified into a word sequence;

[0070] Let T = {t_1, t_2, ..., t_L} denote the preprocessed word sequence of the text to be classified, where t_i denotes the word at the i-th position in the word sequence and L denotes the maximum length of the text to be classified allowed by the model.

[0071] Further, the extraction of keywords from the text to be classified includes:

[0072] Use the TextRank algorithm to extract M keywords from the word sequence T of the text to be classified; let K = {k_1, k_2, ..., k_M} denote the M extracted keywords, arranged according to their relative positions in the original word sequence.
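As a rough illustration of this preprocessing and keyword-extraction step, the sketch below assumes Chinese input text and uses the jieba library for word segmentation together with its built-in TextRank implementation; the stop-word list, the maximum length L_MAX, and the keyword count M are placeholder values, not values fixed by the patent.

import jieba
import jieba.analyse

STOP_WORDS = {"的", "了", "是"}   # placeholder stop-word list
L_MAX = 128                       # maximum text length allowed by the model (L)
M = 5                             # number of keywords to extract

def preprocess(raw_text):
    # Data cleaning and word segmentation, followed by stop-word removal,
    # give the word sequence T = {t_1, ..., t_L}.
    cleaned = raw_text.strip().replace("\n", "")
    T = [t for t in jieba.lcut(cleaned) if t not in STOP_WORDS][:L_MAX]
    # TextRank selects M keywords; keep them in the order in which they
    # first appear in T, giving K = {k_1, ..., k_M}.
    candidates = set(jieba.analyse.textrank(cleaned, topK=M))
    seen, K = set(), []
    for t in T:
        if t in candidates and t not in seen:
            seen.add(t)
            K.append(t)
    return T, K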

Embodiment 3

[0087] In this embodiment, a BERT downstream text classifier is provided. The text classifier is built on a bidirectional long short-term memory (BiLSTM) network and a capsule neural network. As shown in Figure 3, the text classifier includes a sequentially connected input layer, bidirectional LSTM operation layer, main capsule layer, dynamic routing layer, classification capsule layer and Softmax layer; the main capsule layer, serving as the parent capsule layer, establishes a non-linear mapping relationship with the classification capsule layer through a voting-based dynamic routing mechanism.

[0088] Further, the bidirectional LSTM operation layer uses a bidirectional long short-term memory network to perform sequence modeling on the input features and capture the bidirectional interaction relationships in the sequence.

[0089] Specifically, the steps to use this classifier are as follows:

[0090] (1) Let X ∈ R^((L+M+3)×V) represent the input features ...
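A minimal PyTorch sketch of a classifier with this layer order (input, BiLSTM, main capsules, dynamic routing, classification capsules, Softmax) is given below; the hidden sizes, capsule dimension, shared transformation matrix, and number of routing iterations are illustrative assumptions, not parameters taken from the patent.

import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Standard capsule squashing non-linearity.
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

class CapsuleTextClassifier(nn.Module):
    def __init__(self, in_dim=768, hidden=128, num_classes=10,
                 caps_dim=16, routing_iters=3):
        super().__init__()
        # Bidirectional LSTM captures forward and backward interactions.
        self.bilstm = nn.LSTM(in_dim, hidden, batch_first=True,
                              bidirectional=True)
        # Each BiLSTM time step is turned into one main (primary) capsule.
        self.primary = nn.Linear(2 * hidden, caps_dim)
        # Per-pair transformation matrices are approximated here by one
        # shared linear map for brevity.
        self.route_weights = nn.Linear(caps_dim, num_classes * caps_dim)
        self.num_classes = num_classes
        self.caps_dim = caps_dim
        self.routing_iters = routing_iters

    def forward(self, x):                        # x: (batch, L+M+3, V)
        h, _ = self.bilstm(x)                    # (batch, seq, 2*hidden)
        u = squash(self.primary(h))              # main capsules
        # Prediction vectors from every main capsule to every class capsule.
        u_hat = self.route_weights(u).view(
            x.size(0), u.size(1), self.num_classes, self.caps_dim)
        b = torch.zeros(x.size(0), u.size(1), self.num_classes,
                        device=x.device)
        # Dynamic routing by agreement takes the place of pooling.
        for _ in range(self.routing_iters):
            c = F.softmax(b, dim=-1).unsqueeze(-1)    # coupling coefficients
            v = squash((c * u_hat).sum(dim=1))        # class capsules
            b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
        # Capsule lengths act as class scores, normalised by the Softmax layer.
        return F.softmax(v.norm(dim=-1), dim=-1)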


Abstract

The invention discloses a text classification method, classifier and system based on key information and dynamic routing, and belongs to the technical field of text classification. The method comprises the following steps: the text to be classified is preprocessed, and keywords in the text to be classified are extracted; the preprocessed text and the keywords are input into the embedding layer of the pre-trained language model BERT to obtain a pre-classification output; the pre-classification output is input into BERT for deep encoding to obtain a re-classification output; in the downstream of BERT, a classifier is built in which the pooling operation of the CNN is replaced by dynamic routing, and the pre-classification output and the re-classification output are respectively input into the classifier. In this method, the output features of BERT's embedding layer and the final output features of BERT's deep encoding are fed into the downstream classifier for two classification predictions, and the final classification result is obtained by the weighted addition of the two predictions, so the shallow feature representations inside BERT are fully utilized, and the classification accuracy is greatly improved without changing the internal structure of BERT.

Description

Technical field

[0001] The invention relates to the technical field of text classification, in particular to a text classification method, classifier and system based on key information and dynamic routing.

Background technique

[0002] With the popularization of smart phones and the development of Internet technology, applications that change people's lifestyles, such as online news media, social media platforms, online live streaming and self-media platforms, have sprung up, and the amount of text data they generate has shown explosive growth. Faced with this ever-increasing mass of text data, how to classify it automatically, so as to help people quickly and accurately obtain the information they need or are interested in, is an important research problem. The text classification task has therefore received extensive attention from researchers and has become an important task in natural language processing. Text classification is the process of automatically determining ...

Claims


Application Information

IPC(8): G06F16/35; G06F16/215; G06F40/289; G06F40/30; G06N3/04; G06N3/08
CPC: G06F16/35; G06F16/215; G06F40/289; G06F40/30; G06N3/084; G06N3/045; G06N3/044
Inventor: 李晓瑜, 彭宇, 胡世杰, 冯旭栋, 张聪, 陆超
Owner: UNIV OF ELECTRONIC SCI & TECH OF CHINA