
Intelligent government affair text multi-classification method and system based on BERT

A BERT-based multi-classification technology for smart government affairs text, applied in text database clustering/classification, unstructured text data retrieval, ticketing equipment, etc. It addresses problems such as a large vocabulary, slow training speed, and accuracy that depends on word segmentation, and achieves high test accuracy and strong reliability.

Pending Publication Date: 2020-11-13
SHANDONG NORMAL UNIV

AI Technical Summary

Problems solved by technology

At present, most e-government systems still rely on manual, experience-based processing, which suffers from heavy workload, low efficiency, and a high error rate.
[0004] The inventors found that most existing text classification methods use word vectors, and most of these word vectors are trained with Word2Vec, GloVe, and similar methods. The vocabulary is large, training is slow, and accuracy is affected by word segmentation (that is, dividing a sequence of Chinese characters into individual words).



Examples


Embodiment 1

[0033] The data used in this embodiment consists of 9210 public messages. Each message includes a number, user, subject, time, message details, and a classification label. The 9210 samples in the data set belong to seven categories: urban and rural construction, environmental protection, transportation, education and sports, labor and social security, business tourism, and health and family planning. The data distribution is shown in Table 1.
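
As an illustration of the record layout described above, a message and the seven-way label set could be represented as follows. This is a minimal sketch: the field types, the integer label ids, and the example record are assumptions made for illustration and are not taken from the data set.

```python
from dataclasses import dataclass

# The seven categories named in the text. The integer ids are an assumption.
CATEGORIES = [
    "urban and rural construction",
    "environmental protection",
    "transportation",
    "education and sports",
    "labor and social security",
    "business tourism",
    "health and family planning",
]
LABEL_TO_ID = {name: i for i, name in enumerate(CATEGORIES)}


@dataclass
class Message:
    """One public message with the six fields listed in the text."""
    number: int
    user: str
    subject: str
    time: str
    details: str
    label: str

    def label_id(self) -> int:
        # Map the category name to the integer class id used for training.
        return LABEL_TO_ID[self.label]


# Hypothetical record, invented for illustration only.
example = Message(
    number=1,
    user="user_001",
    subject="Request for a new bus stop",
    time="2020-01-01 09:00:00",
    details="Please add a bus stop near the school.",
    label="transportation",
)
```

Keeping the label as text in the record and converting it only at training time makes it harder to mix up class ids when the category list changes.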

[0034] Table 1 Text data of mass messages

[0035]

[0036] This embodiment provides a BERT-based smart government text multi-classification method, the steps of which include:

[0037] Step 1: Obtain the government affairs text and convert it into a feature vector. The feature vector is composed of a word vector, a segment vector, and a position vector, and the beginning and end of each sentence in the text are marked.
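
Step 1 can be sketched roughly as below. The sketch assumes character-level tokens, wraps each sentence in [CLS] and [SEP] markers, and combines the three vectors by element-wise addition as in the original BERT design (the text only says the feature vector is "composed of" them). The helper names `encode`, `make_table`, and `embed`, the random embedding tables, and the tiny dimension are all hypothetical; a real system would use a pretrained vocabulary and learned embeddings.

```python
import random

CLS, SEP = "[CLS]", "[SEP]"


def encode(sentences):
    """Return tokens, segment ids, and position ids for a list of sentences."""
    tokens, segment_ids = [], []
    for seg, sentence in enumerate(sentences):
        # Assumption: one token per character, each sentence marked with
        # [CLS] at the beginning and [SEP] at the end.
        piece = [CLS] + list(sentence) + [SEP]
        tokens.extend(piece)
        segment_ids.extend([seg] * len(piece))
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids


def make_table(size, dim, rng):
    """Random stand-in for a learned embedding table (size x dim)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(size)]


def embed(tokens, segment_ids, position_ids, dim=4, seed=0):
    """Sum word, segment, and position vectors for every token."""
    rng = random.Random(seed)
    vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
    word_table = make_table(len(vocab), dim, rng)
    segment_table = make_table(max(segment_ids) + 1, dim, rng)
    position_table = make_table(len(tokens), dim, rng)
    features = []
    for tok, seg, pos in zip(tokens, segment_ids, position_ids):
        w = word_table[vocab[tok]]
        s = segment_table[seg]
        p = position_table[pos]
        features.append([a + b + c for a, b, c in zip(w, s, p)])
    return features


tokens, segs, poss = encode(["修路", "谢谢"])
features = embed(tokens, segs, poss)
```

Each token ends up with one vector of length `dim`, so the whole text becomes a sequence of feature vectors that can be fed to the encoder.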

[0038] Among them, the government affairs text includes message number, user, s...

Embodiment 2

[0070] This embodiment provides a BERT-based smart government text multi-classification system, including:

[0071] (1) A feature conversion module, which obtains government affairs texts and converts them into feature vectors. The feature vectors are composed of word vectors, segment vectors, and position vectors, and the beginning and end of each sentence in the text are marked.

[0072] In a specific implementation, the word vector encodes the current word, the segment vector encodes which sentence the current word belongs to, and the position vector encodes the position of the current word. Each sentence uses [CLS] and [SEP] as its beginning and end marks. The Encoder feature extractor consists of a self-attention mechanism and a feed-forward neural network. The government affairs text includes the message number, user, subject, time, message details, and classification label.
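
The two parts of the Encoder feature extractor named above can be sketched as follows. This is a simplified, single-head illustration in plain Python: it omits layer normalization, biases, and multiple heads, and uses ReLU in the feed-forward part. The weight matrices `wq`, `wk`, `wv`, `w1`, `w2` and the identity-matrix demo values are assumptions for illustration, not parameters from the source.

```python
import math


def matmul(a, b):
    """Multiply an (n x d) matrix by a (d x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Scaled dot-product attention; every token attends to all tokens."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def feed_forward(x, w1, w2):
    """Position-wise feed-forward network with a ReLU in between."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def add(a, b):
    """Element-wise sum of two matrices (used for residual connections)."""
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(a, b)]


def encoder_layer(x, wq, wk, wv, w1, w2):
    attended, weights = self_attention(x, wq, wk, wv)
    x = add(x, attended)                      # residual around attention
    out = add(x, feed_forward(x, w1, w2))     # residual around feed-forward
    return out, weights


# Toy demo: three token vectors of size 2, identity weights (illustrative only).
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = encoder_layer(x, identity, identity, identity, identity, identity)
```

Because no mask is applied, each row of `weights` spreads attention over tokens on both the left and the right, which is what makes the encoding bidirectional.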

[00...

Embodiment 3

[0089] This embodiment provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the steps in the BERT-based smart government text multi-classification method described in Embodiment 1 are implemented.

[0090] In the present embodiment, a bidirectional Transformer encoding layer is used in the feature conversion module, and text features are extracted by this layer. Each extracted feature includes information from both the left and right context, which overcomes the prior art's neglect of the contextual relationships between words. The models obtained at the set numbers of training steps are validated, the model with the highest score is saved, and that model is used for testing, which gives high test accuracy, strong reliability, and stability.
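
The model selection described above (validate at set training steps, keep the highest-scoring model) might look like the following sketch. The function `train_and_keep_best`, its callable parameters `train_step` and `evaluate`, and the validation scores in the demo are hypothetical; the scores are made-up placeholders, not results from the source.

```python
def train_and_keep_best(train_step, evaluate, total_steps, eval_every):
    """Train for total_steps, validate every eval_every steps, keep the best."""
    best_score = float("-inf")
    best_step = None
    best_state = None
    for step in range(1, total_steps + 1):
        state = train_step(step)
        if step % eval_every == 0:
            score = evaluate(state)
            # Save the model only when it beats the best score seen so far.
            if score > best_score:
                best_score, best_step, best_state = score, step, state
    return best_step, best_score, best_state


# Dummy stand-ins: a "model state" is just a dict, and the validation scores
# are invented numbers used only to exercise the selection logic.
fake_scores = {100: 0.80, 200: 0.91, 300: 0.88}
best_step, best_score, best_state = train_and_keep_best(
    train_step=lambda step: {"step": step},
    evaluate=lambda state: fake_scores[state["step"]],
    total_steps=300,
    eval_every=100,
)
```

In the demo the checkpoint from step 200 is kept even though training continues to step 300, since a later checkpoint is not necessarily a better one.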


Abstract

The invention belongs to the field of text classification, and provides an intelligent government affair text multi-classification method and system based on BERT. The method comprises the steps of obtaining government affair texts, converting them into feature vectors, and marking the beginning and the end of each sentence in the texts, wherein the feature vectors are composed of word vectors, segment vectors, and position vectors; and inputting the feature vectors into a trained BERT model and outputting a classification result for the government affair text, wherein, in the process of training the BERT model, an Encoder feature extractor in bidirectional Transformer coding is adopted to extract text features in the feature vectors.

Description

Technical field

[0001] The invention belongs to the field of text classification, and in particular relates to a BERT-based smart government text multi-classification method and system.

Background technique

[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

[0003] When public messages on an online political inquiry platform are processed, the messages are classified so that they can later be assigned to the corresponding functional departments. At present, most e-government systems still rely on manual processing based on experience, which has problems such as heavy workload, low efficiency, and high error rate.

[0004] The inventors found that most of the existing text classifications use word vectors, and most of the word vectors are trained by Word2Vec, GloVe and other methods. The number of words is large and the training speed is slow and the accu...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06K9/62, G06Q50/26, G07B15/06
CPC: G07B15/063, G06F16/35, G06Q50/26, G06F18/2411, G06F18/214
Inventor: 王红韩书庄鲁贺李威张慧刘弘胡斌王吉华于晓梅
Owner: SHANDONG NORMAL UNIV