
Text classification method, classifier and system based on key information and dynamic routing

A text classification technology based on key information and dynamic routing, applied in text database clustering/classification, neural learning methods, digital data information retrieval, etc. It addresses problems such as the waste of key information in the text, the failure to use key information to assist classification, and the waste of BERT's internal features, with the effects of improving classification accuracy and enhancing expressive ability.

Pending Publication Date: 2022-05-10
UNIV OF ELECTRONIC SCI & TECH OF CHINA +1
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

The emergence of the pre-trained language model BERT has brought research on text classification into a new stage. Most research work builds the text classification model directly downstream of BERT and has achieved quite good classification results.
However, the existing methods still have shortcomings: (1) the text to be classified is input directly into the model for prediction, without using the key information in the original text to assist classification, so the key information in the text is wasted; (2) the downstream network of BERT simply stacks other neural network models to build the classification model, using only the high-level features of the text obtained after BERT's deep encoding and ignoring the low-level features inside BERT, so BERT's internal features are wasted; (3) existing classification models use a large number of pooling operations in convolutional neural networks (CNNs), which lose part of the feature information in the model and prevent the network from learning higher-level features.


Examples


Embodiment 1

[0058] In an exemplary embodiment, a text classification method based on key information and dynamic routing is provided. As shown in Figure 1, the method includes the following steps:

[0059] Preprocess the text to be classified, and extract keywords from the text to be classified;

[0060] Input the preprocessed text to be classified and the keywords into the embedding layer of the pre-trained language model BERT to obtain a pre-classification output;

[0061] Input the pre-classification output into BERT for deep encoding to obtain a re-classification output;

[0062] In the downstream of BERT, build a classifier in which the pooling operation of the CNN is replaced by dynamic routing, and input the pre-classification output and the re-classification output into the classifier to obtain a pre-classification result and a re-classification result, respectively;

[0063] Obtain the final text classification result by weighting the pre-classification result and the re-classification result.
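The two-pass flow described in paragraphs [0058]-[0063] can be sketched roughly as follows. This is a minimal sketch assuming a Hugging Face style BERT whose embedding-layer output and final encoder output are read separately; the classifier argument, the weights alpha and beta, and the maximum length of 128 are illustrative placeholders rather than values given in the patent.

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def classify(text, keywords, classifier, alpha=0.5, beta=0.5):
    # Concatenate the preprocessed text and its keywords as a sentence pair,
    # which yields the [CLS] text [SEP] keywords [SEP] layout (L + M + 3 tokens).
    enc = tokenizer(text, " ".join(keywords), return_tensors="pt",
                    truncation=True, padding="max_length", max_length=128)
    with torch.no_grad():
        # Pre-classification output: features taken from BERT's embedding layer only.
        shallow = bert.embeddings(input_ids=enc["input_ids"])
        # Re-classification output: features after BERT's deep encoding.
        deep = bert(**enc).last_hidden_state
    # The same downstream classifier scores both feature sets ...
    pre_result = classifier(shallow)
    re_result = classifier(deep)
    # ... and the final classification result is their weighted sum.
    return alpha * pre_result + beta * re_result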

Embodiment 2

[0068] Based on Embodiment 1, a text classification method based on key information and dynamic routing is provided. As shown in Figure 2, the preprocessing of the text to be classified includes:

[0069] Perform data cleaning, word segmentation, and removal of stop words and features on the text to be classified, and convert each text to be classified into a word sequence;

[0070] Let T = {t_1, t_2, ..., t_L} denote the preprocessed word sequence of the text to be classified, where t_i denotes the word at the i-th position in the word sequence and L denotes the maximum length of the text to be classified allowed by the model.

[0071] Further, the extraction of keywords from the text to be classified includes:

[0072] Use the TextRank algorithm to extract M keywords from the word sequence T of the text to be classified; let K = {k_1, k_2, ..., k_M} denote the M extracted keywords, arranged according to their relative positions in the original word sequence.
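As a rough illustration of this preprocessing and keyword-extraction step, the sketch below assumes Chinese input text and uses the jieba library for word segmentation together with its built-in TextRank implementation; the stop-word list, the maximum length L_MAX, and the keyword count M are placeholder values, not values fixed by the patent.

import jieba
import jieba.analyse

STOP_WORDS = {"的", "了", "是"}   # placeholder stop-word list
L_MAX = 128                       # maximum text length allowed by the model (L)
M = 5                             # number of keywords to extract

def preprocess(raw_text):
    # Data cleaning and word segmentation, followed by stop-word removal,
    # give the word sequence T = {t_1, ..., t_L}.
    cleaned = raw_text.strip().replace("\n", "")
    T = [t for t in jieba.lcut(cleaned) if t not in STOP_WORDS][:L_MAX]
    # TextRank selects M keywords; keep them in the order in which they
    # first appear in T, giving K = {k_1, ..., k_M}.
    candidates = set(jieba.analyse.textrank(cleaned, topK=M))
    seen, K = set(), []
    for t in T:
        if t in candidates and t not in seen:
            seen.add(t)
            K.append(t)
    return T, K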

Embodiment 3

[0087] In this embodiment, a BERT downstream text classifier is provided. The text classifier is built on a bidirectional long short-term memory (BiLSTM) network and a capsule neural network. As shown in Figure 3, the text classifier includes a sequentially connected input layer, bidirectional LSTM operation layer, main capsule layer, dynamic routing layer, classification capsule layer and Softmax layer; the main capsule layer, serving as the parent capsule layer, establishes a non-linear mapping relationship with the classification capsule layer through a voting-based dynamic routing mechanism.

[0088] Further, the bidirectional LSTM operation layer uses a bidirectional long short-term memory network to perform sequence modeling on the input features and capture the bidirectional interaction relationships in the sequence.

[0089] Specifically, the steps to use this classifier are as follows:

[0090] (1) Let X ∈ R^((L+M+3)×V) represent the input features ...
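A minimal PyTorch sketch of a classifier with this layer order (input, BiLSTM, main capsules, dynamic routing, classification capsules, Softmax) is given below; the hidden sizes, capsule dimension, shared transformation matrix, and number of routing iterations are illustrative assumptions, not parameters taken from the patent.

import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Standard capsule squashing non-linearity.
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

class CapsuleTextClassifier(nn.Module):
    def __init__(self, in_dim=768, hidden=128, num_classes=10,
                 caps_dim=16, routing_iters=3):
        super().__init__()
        # Bidirectional LSTM captures forward and backward interactions.
        self.bilstm = nn.LSTM(in_dim, hidden, batch_first=True,
                              bidirectional=True)
        # Each BiLSTM time step is turned into one main (primary) capsule.
        self.primary = nn.Linear(2 * hidden, caps_dim)
        # Per-pair transformation matrices are approximated here by one
        # shared linear map for brevity.
        self.route_weights = nn.Linear(caps_dim, num_classes * caps_dim)
        self.num_classes = num_classes
        self.caps_dim = caps_dim
        self.routing_iters = routing_iters

    def forward(self, x):                        # x: (batch, L+M+3, V)
        h, _ = self.bilstm(x)                    # (batch, seq, 2*hidden)
        u = squash(self.primary(h))              # main capsules
        # Prediction vectors from every main capsule to every class capsule.
        u_hat = self.route_weights(u).view(
            x.size(0), u.size(1), self.num_classes, self.caps_dim)
        b = torch.zeros(x.size(0), u.size(1), self.num_classes,
                        device=x.device)
        # Dynamic routing by agreement takes the place of pooling.
        for _ in range(self.routing_iters):
            c = F.softmax(b, dim=-1).unsqueeze(-1)    # coupling coefficients
            v = squash((c * u_hat).sum(dim=1))        # class capsules
            b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
        # Capsule lengths act as class scores, normalised by the Softmax layer.
        return F.softmax(v.norm(dim=-1), dim=-1)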


Abstract

The invention discloses a text classification method, classifier and system based on key information and dynamic routing, and belongs to the technical field of text classification. The method comprises the following steps: the text to be classified is preprocessed, and keywords in the text to be classified are extracted; the preprocessed text and the keywords are input into the embedding layer of the pre-trained language model BERT to obtain a pre-classification output; the pre-classification output is input into BERT for deep encoding to obtain a re-classification output; in the downstream of BERT, a classifier is built in which the pooling operation of the CNN is replaced by dynamic routing, and the pre-classification output and the re-classification output are respectively input into the classifier. In this method, the output features of BERT's embedding layer and the final output features of BERT's deep encoding are fed into the downstream classifier for two classification predictions, and the final classification result is obtained by the weighted addition of the two predictions, so the shallow feature representations inside BERT are fully utilized, and the classification accuracy is greatly improved without changing the internal structure of BERT.

Description

Technical field

[0001] The invention relates to the technical field of text classification, in particular to a text classification method, classifier and system based on key information and dynamic routing.

Background technique

[0002] With the popularization of smart phones and the development of Internet technology, applications that change people's lifestyles, such as online news media, social media platforms, online live streaming and self-media platforms, have sprung up, and the amount of text data they generate has shown explosive growth. Faced with this ever-increasing mass of text data, how to classify it automatically, so as to help people quickly and accurately obtain the information they need or are interested in, is an important research problem. The text classification task has therefore received extensive attention from researchers and has become an important task in natural language processing. Text classification is the process of automatically determining ...

Claims


Application Information

IPC(8): G06F16/35; G06F16/215; G06F40/289; G06F40/30; G06N3/04; G06N3/08
CPC: G06F16/35; G06F16/215; G06F40/289; G06F40/30; G06N3/084; G06N3/045; G06N3/044
Inventor: 李晓瑜, 彭宇, 胡世杰, 冯旭栋, 张聪, 陆超
Owner: UNIV OF ELECTRONIC SCI & TECH OF CHINA