A text classification method

A text classification and text data technology, applied in text database clustering/classification, neural learning methods, unstructured text data retrieval, etc. It addresses the inability of existing models to capture long-distance dependencies in text, and achieves improved resistance to interference, guaranteed effectiveness, and smooth weight updates.

Active Publication Date: 2022-08-02
CHENGDU UNIV OF INFORMATION TECH
Cites: 20. Cited by: 0

AI Technical Summary

Problems solved by technology

[0003] Although deep learning methods have made significant progress on text classification tasks, building a model requires a large amount of labeled data covering text from multiple fields, and the volume of text information keeps growing. Long text is also gradually replacing short text as a way to provide more detailed information, so when processing long text, existing models cannot adequately capture long-distance dependencies in the text.

Examples

Embodiment 1

[0041] Referring to Figure 1, a schematic flowchart of a text classification method, Embodiment 1 of the present invention provides a text classification method that includes:

[0042] building a first text classification model;

[0043] collecting sample text data, and processing the sample text data to obtain a training set;

[0044] using the training set to train the first text classification model to obtain a second text classification model;

[0045] obtaining text data to be classified and inputting it into the second text classification model, which outputs a classification result for the text data to be classified.
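
The four steps above can be sketched as a short pipeline. This is only an illustrative outline under stated assumptions: `KeywordCountClassifier`, `preprocess`, and `classify` are hypothetical names, and the character-count classifier is a toy stand-in chosen so the example runs without dependencies. It is not the Bert-DPCNN model described in this patent.

```python
from collections import Counter, defaultdict


class KeywordCountClassifier:
    """Toy stand-in for the 'first text classification model' (NOT Bert-DPCNN)."""

    def __init__(self):
        # Per-label character frequencies learned from the training set.
        self.char_counts = defaultdict(Counter)

    def fit(self, training_set):
        for text, label in training_set:
            self.char_counts[label].update(text)
        return self  # the trained object plays the role of the 'second model'

    def predict(self, text):
        def score(label):
            counts = self.char_counts[label]
            return sum(counts[ch] for ch in text)

        # sorted() makes tie-breaking deterministic.
        return max(sorted(self.char_counts), key=score)


def preprocess(samples):
    """Hypothetical cleaning step: strip whitespace and drop empty texts."""
    return [(text.strip(), label) for text, label in samples if text.strip()]


def classify(samples, texts_to_classify):
    first_model = KeywordCountClassifier()        # build the first model
    training_set = preprocess(samples)            # sample data -> training set
    second_model = first_model.fit(training_set)  # train -> second model
    return [second_model.predict(t) for t in texts_to_classify]  # classify


if __name__ == "__main__":
    samples = [
        ("football match goal", "sports"),
        ("stock market price", "finance"),
        ("   ", "sports"),  # removed by preprocess()
    ]
    print(classify(samples, ["goal", "price"]))
```

Keeping the untrained and trained models as separate names mirrors the patent's distinction between the first and second text classification models.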

[0046] This method is described in detail below:

[0047] In view of the shortcomings of existing approaches to text classification, this embodiment provides a text classification method based on an improved Bert-DPCNN model, word segmentation data ...

Embodiment 2

[0077] The technical scheme adopted by the present invention is a text classification method based on an improved Bert-DPCNN model, word segmentation data enhancement, and adversarial learning, including the following steps:

[0078] Data preprocessing. Divide the data set into a training set, a test set, and a verification set at a ratio of 8:1:1, and process the special characters, spaces, emoticons, etc. that may appear in the data set. Based on the pre-training model, encode the divided first text set into character vectors and into a list of data structures and labels that the word segmentation model can recognize. Chinese word segmentation is carried out at the character level: for each token character, the character's index in the vocabulary is returned, and the flags [CLS] and [SEP] are added at the beginning and end of the token sequence. Sentence lengths in the data set are unified, and the sequence length is too small...
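
The preprocessing described in this step can be illustrated with a small dependency-free sketch. All names here (`split_8_1_1`, `build_vocab`, `encode`) and the toy vocabulary are hypothetical; a real pipeline would presumably use the vocabulary of the pre-trained model. Because the source text is cut off, padding short sequences with `[PAD]` and truncating long ones to a fixed `max_len` is an assumption about how sentence lengths are unified, not something the patent text above confirms.

```python
import random

CLS, SEP, PAD, UNK = "[CLS]", "[SEP]", "[PAD]", "[UNK]"


def split_8_1_1(samples, seed=0):
    """Shuffle, then split into training, test, and verification sets (8:1:1)."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_test = len(items) // 10
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, test, val


def build_vocab(texts):
    """Toy character-level vocabulary; special tokens get the first indices."""
    vocab = {PAD: 0, UNK: 1, CLS: 2, SEP: 3}
    for text in texts:
        for ch in text:
            if not ch.isspace():  # spaces are dropped during cleaning
                vocab.setdefault(ch, len(vocab))
    return vocab


def encode(text, vocab, max_len):
    """Map each character to its vocabulary index, wrapped in [CLS] ... [SEP].

    Assumption: sequences are truncated or padded to exactly max_len.
    """
    chars = [ch for ch in text if not ch.isspace()]
    tokens = [CLS] + chars[:max_len - 2] + [SEP]
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens]
    ids += [vocab[PAD]] * (max_len - len(ids))
    return ids


if __name__ == "__main__":
    vocab = build_vocab(["今天 天气"])
    print(encode("今天 天气", vocab, max_len=8))  # [2, 4, 5, 5, 6, 3, 0, 0]
```

Characters missing from the vocabulary map to `[UNK]`, which keeps encoding total even for unseen input.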

Abstract

The invention discloses a text classification method, which relates to the field of intelligent text processing. The method includes: constructing a first text classification model; collecting sample text data and processing it to obtain a training set; using the training set to train the first text classification model to obtain a second text classification model; and obtaining text data to be classified and inputting it into the second text classification model, which outputs a classification result for the text data to be classified. The method can improve the extraction of text information from long texts.

Description

Technical field

[0001] The present invention relates to the field of intelligent text processing, and specifically to a text classification method.

Background technique

[0002] With the continuous development of the mobile Internet, the information people encounter in daily life comes not only from television and newspapers; more and more text comes from major websites and online platforms. Text is an important form of information data, and the information age has produced a large amount of text in various fields. Classifying large-scale information resources by text content makes it possible to distinguish data from different fields, improving the efficiency of information retrieval, text mining, and other applications. However, Internet text data is growing explosively. The volume and complexity of texts in various fields have increased significantly, making the classification of long and even ultra-long texts an open problem. There...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/35, G06F40/126, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/355, G06F40/126, G06N3/084, G06N3/045, G06F18/241, G06F18/214
Inventor: 岳希, 周涛, 何磊, 唐聃, 高燕, 刘斌
Owner: CHENGDU UNIV OF INFORMATION TECH