Intelligent government affairs text multi-classification method and system based on BERT
A BERT-based smart government text multi-classification technology, applicable to text database clustering/classification, unstructured text data retrieval, ticketing equipment, and similar fields. It addresses problems such as large vocabulary size, reduced accuracy, and slow training speed, and achieves high test accuracy and strong reliability.
Examples
Embodiment 1
[0033] The data used in this embodiment comprise 9,210 public-message texts; each message includes a number, user, subject, time, message details, and a classification label. The 9,210 samples in the data set belong to seven different categories: urban and rural construction, environmental protection, transportation, education and sports, labor and social security, business tourism, and health and family planning. The data distribution is shown in Table 1.
[0034] Table 1 Text data of mass messages
[0035]
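Before training, the seven message categories above are typically mapped to integer labels. A minimal illustrative sketch (the variable names and helper function below are assumptions, not from the patent):

```python
# The seven categories listed in Table 1, mapped to integer class labels.
CATEGORIES = [
    "urban and rural construction",
    "environmental protection",
    "transportation",
    "education and sports",
    "labor and social security",
    "business tourism",
    "health and family planning",
]

# Forward and reverse lookup tables between category names and label ids.
LABEL2ID = {name: idx for idx, name in enumerate(CATEGORIES)}
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}

def encode_label(category: str) -> int:
    """Map a category name to its integer label for the classifier."""
    return LABEL2ID[category]
```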
[0036] This embodiment provides a BERT-based smart government text multi-classification method, the steps of which include:
[0037] Step 1: Obtain the government affairs text and convert it into a feature vector. The feature vector is composed of a word vector, a segment vector, and a position vector, and the beginning and end of each sentence in the text are marked.
[0038] Among them, the government affairs text includes message number, user, subject, time, message details and classification labels.
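Step 1 can be sketched as follows. This is a hedged toy illustration of building the three id sequences (word, segment, and position) with [CLS]/[SEP] sentence marks; the tiny vocabulary and function name are assumptions, not the patent's implementation:

```python
# Toy vocabulary; a real BERT tokenizer uses a large WordPiece vocabulary.
VOCAB = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "[UNK]": 3}

def build_features(sentence_tokens, vocab=VOCAB):
    """Wrap one sentence in [CLS]/[SEP] and emit the three id sequences.

    Returns (word_ids, segment_ids, position_ids):
      - word_ids: encoding of each token (unknown tokens map to [UNK]);
      - segment_ids: which sentence each token belongs to (all 0 here,
        since this sketch handles a single-sentence input);
      - position_ids: the position of each token in the sequence.
    """
    tokens = ["[CLS]"] + list(sentence_tokens) + ["[SEP]"]
    word_ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    segment_ids = [0] * len(tokens)
    position_ids = list(range(len(tokens)))
    return word_ids, segment_ids, position_ids
```

In a real BERT pipeline each id sequence is then looked up in a learned embedding table and the three embeddings are summed to form the feature vector.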
Embodiment 2
[0070] This embodiment provides a BERT-based smart government text multi-classification system, including:
[0071] (1) A feature conversion module, used to obtain government affairs texts and convert them into feature vectors; the feature vectors are composed of word vectors, segment vectors, and position vectors, and the beginning and end of each sentence in the text are marked at the same time.
[0072] In a specific implementation, the word vector is the encoding of the current word, the segment vector is the position encoding of the sentence in which the current word is located, and the position vector is the position encoding of the current word. Each sentence uses [CLS] and [SEP] as its beginning and end marks. The Encoder feature extractor consists of a self-attention mechanism and a feed-forward neural network. The government affairs text includes message number, user, subject, time, message details and classification labels.
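The Encoder described above (self-attention followed by a feed-forward network) can be sketched numerically. This is a simplified single-head NumPy illustration under stated assumptions: a real BERT encoder adds multiple heads, residual connections, and layer normalization, and all weight names here are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (n, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise token affinities
    return softmax(scores) @ v               # context-weighted values

def feed_forward(x, w1, b1, w2, b2):
    """Position-wise feed-forward network with a ReLU nonlinearity."""
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2

def encoder_block(x, p):
    """One simplified Encoder block: self-attention then feed-forward."""
    attn = self_attention(x, p["w_q"], p["w_k"], p["w_v"])
    return feed_forward(attn, p["w1"], p["b1"], p["w2"], p["b2"])
```

Because the attention scores compare every token with every other token, each output feature mixes information from both the left and right context of the word, which is the property Embodiment 3 relies on.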
[00...
Embodiment 3
[0089] This embodiment provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the steps in the BERT-based smart government text multi-classification method described in Embodiment 1 are implemented.
[0090] In the present embodiment, a bidirectional Transformer encoding layer is used in the feature conversion module to extract text features. Each extracted feature incorporates information from both the left and right context, overcoming the prior art's neglect of the contextual relationships between words. The models obtained at the set numbers of training steps are validated, the highest-scoring model is saved, and that model is used for testing, which yields high test accuracy and strong reliability and stability.
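The checkpoint-selection strategy described here can be sketched as a simple loop. The helper names `train_one_interval` and `evaluate` are illustrative placeholders, not the patent's API:

```python
def select_best_model(model, n_intervals, train_one_interval, evaluate):
    """Train in fixed-step intervals, validate after each, keep the best.

    - train_one_interval: runs the set number of training steps and
      returns the updated model (placeholder assumption);
    - evaluate: scores a model on the validation set (e.g. accuracy).
    Returns the highest-scoring model and its score; that model is the
    one used for testing.
    """
    best_score, best_state = float("-inf"), None
    for _ in range(n_intervals):
        model = train_one_interval(model)  # fixed number of training steps
        score = evaluate(model)            # validation score for this checkpoint
        if score > best_score:
            best_score, best_state = score, model
    return best_state, best_score
```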