Long text news automatic labeling method based on pre-training

A long-text, pre-training technology, used in natural language data processing, special data processing applications, network data retrieval, etc.

Inactive Publication Date: 2021-08-06

CHANGCHUN UNIV OF TECH


AI Technical Summary

Problems solved by technology

[0004] In view of the deficiencies of traditional manual labeling and traditional algorithms in long-text data labeling, the purpose of this invention is to provide a faster and more accurate long-text news labeling method.


Examples


Example 1

[0031] Example 1: The present invention is tested on long-text news collected by the inventors.

[0032] The data set consists of 90,000 long news texts for Chinese news classification, covering nine categories: finance, real estate, education, technology, military, automobile, sports, games, and entertainment.
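The abstract lists data set division as one step of the method, but the listing does not state how the 90,000 texts are divided. A minimal sketch of one common approach, a per-category (stratified) split, is given below. The 80/10/10 ratios, the seed, the function name `stratified_split`, and the toy placeholder texts are all assumptions for illustration, not values from the patent.

```python
import random
from collections import defaultdict

# The nine news categories named in the data set description.
CATEGORIES = ["finance", "real estate", "education", "technology", "military",
              "automobile", "sports", "games", "entertainment"]

def stratified_split(samples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split (text, label) pairs into train/val/test, label by label.

    The 80/10/10 ratios and the seed are assumptions, not from the source.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)  # shuffle within each category only
        n_train = int(len(items) * ratios[0])
        n_val = int(len(items) * ratios[1])
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
        test.extend(items[n_train + n_val:])  # remainder goes to test
    return train, val, test

# Toy stand-in data: 10 placeholder texts per category (hypothetical).
samples = [(f"{c} article {i}", c) for c in CATEGORIES for i in range(10)]
train, val, test = stratified_split(samples)
print(len(train), len(val), len(test))
```

Splitting within each category keeps every class represented in all three parts, which matters when evaluation metrics are computed per label.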

[0033] The present invention selects the Bert_CNN model as the base text representation model and evaluates it with three indicators: completeness, the Rand index, and mutual information AMI (Mutual Information based scores). It is compared with three existing methods, namely bertRCNN, bertRNN, and bert, each run under its own optimal parameters. The relevant parameters of the method of the present invention are set as follows: the number of epochs is 5, the mini-batch size is 128, the learning rate is 0.00005, and the dr...
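The metric names in this paragraph are partly garbled in the listing, so the following is only a sketch of the standard definitions of two of them, the Rand index and completeness, written in plain Python. It is not the patent's evaluation code, and the six toy labels are hypothetical. AMI is omitted because it additionally needs the expected mutual information under random labelings; in practice scikit-learn provides `rand_score`, `completeness_score`, and `adjusted_mutual_info_score` for all three.

```python
import math
from collections import Counter
from itertools import combinations

def rand_index(true_labels, pred_labels):
    """Fraction of sample pairs on which the two labelings agree.

    A pair agrees if it is grouped together in both labelings or
    separated in both. Requires at least two samples.
    """
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def completeness(true_labels, pred_labels):
    """1 - H(pred | true) / H(pred): 1.0 when every true class maps to one label."""
    n = len(true_labels)
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))
    h_pred = -sum(c / n * math.log(c / n) for c in pred_counts.values())
    if h_pred == 0.0:
        return 1.0  # a single predicted label is trivially complete
    h_pred_given_true = -sum(
        c / n * math.log(c / true_counts[t]) for (t, _), c in joint.items()
    )
    return 1.0 - h_pred_given_true / h_pred

# Hypothetical toy labels: one "games" article is mislabeled as "sports".
true = ["sports", "sports", "games", "games", "finance", "finance"]
pred = ["sports", "sports", "games", "sports", "finance", "finance"]
ri = rand_index(true, pred)
comp = completeness(true, pred)
print(round(ri, 3), round(comp, 3))
```

In the toy example, 12 of the 15 sample pairs are treated consistently by both labelings, so the Rand index is 0.8; completeness falls below 1 because the "games" class is split across two predicted labels.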


Abstract

The invention discloses a pre-training-based method for automatically labeling long-text news with category labels. The model mainly comprises five steps: data preprocessing, data set division, model loading, model training, and text labeling. Traditional manual labeling is costly, time-consuming, and labor-intensive; traditional machine learning performs poorly on long-text classification; and classical long-text model algorithms are not very effective on Chinese because of grammatical differences. On this basis, the invention provides a pre-training-based automatic labeling method that can automatically label the news category of a long text.

Description

Technical field

[0001] The invention belongs to the field of natural language processing and is a pre-training-based method for automatically labeling long-text news.

Background

[0002] With the rapid development of the Internet, machine learning, big data, and related technologies, information data of all kinds continues to grow at an exponential rate. At present, most of the machine learning and deep learning algorithms that artificial intelligence relies on are data-dependent and require large amounts of data to train in a supervised or semi-supervised manner for customized deployment. The huge volume of data in China, especially news texts, which have no fixed format, come in many types, and are updated quickly, poses a huge challenge to the data labeling task.

[0003] The most common approach to news category labeling is to label the full data set manually. This method has high labor costs, and data quality is difficult to guarantee. It is inev...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F40/169, G06F16/951, G06N20/00

CPC: G06F16/951, G06N20/00, G06F40/169

Inventor: 王红梅, 郭放, 张丽杰, 党源源

Owner: CHANGCHUN UNIV OF TECH
