Long text news automatic labeling method based on pre-training

A pre-training-based technique for long text, applied to natural language data processing, special data processing applications, network data retrieval, and related fields.

Inactive | Publication Date: 2021-08-06
CHANGCHUN UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] Traditional manual labeling and traditional algorithms both fall short on long-text data labeling. The purpose of this invention is to provide a faster and more accurate method for labeling long-text news.

Examples

Example 1

[0031] Example 1: The present invention was tested on a self-collected set of long-text news.

[0032] The data set consists of 90,000 long news texts for Chinese news classification, covering nine categories: financial, real estate, education, technology, military, automobile, sports, games, and entertainment.
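The source lists data set division as a step but does not say how the 90,000 texts are divided. As a minimal sketch, assuming a stratified 80/10/10 train/dev/test split (the ratios, the seed, the label ordering, and the names `CATEGORIES`, `LABEL_TO_ID`, and `stratified_split` are all hypothetical), the nine categories could be split like this:

```python
import random
from collections import defaultdict

# The nine categories named in the source; the index order is an assumption.
CATEGORIES = [
    "financial", "real estate", "education", "technology", "military",
    "automobile", "sports", "games", "entertainment",
]
LABEL_TO_ID = {name: i for i, name in enumerate(CATEGORIES)}


def stratified_split(samples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split (text, label) pairs into train/dev/test, keeping class balance.

    The 80/10/10 ratios are hypothetical; the source does not state them.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    train, dev, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_train = int(len(items) * ratios[0])
        n_dev = int(len(items) * ratios[1])
        train.extend(items[:n_train])
        dev.extend(items[n_train:n_train + n_dev])
        test.extend(items[n_train + n_dev:])  # remainder goes to test
    return train, dev, test


# Toy data: 10 placeholder texts per category (the real set has 90,000 texts).
toy = [
    (f"{name} article {i}", LABEL_TO_ID[name])
    for name in CATEGORIES
    for i in range(10)
]
train, dev, test = stratified_split(toy)
print(len(train), len(dev), len(test))  # prints "72 9 9"
```

Splitting per category keeps every class represented in each subset, which matters when the evaluation compares predicted labels against all nine categories.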

[0033] The present invention uses the Bert_CNN model as the base text representation model and evaluates its performance with three indicators: completeness, the Rand index, and mutual information (AMI, mutual-information-based scores). It is compared against three existing methods, namely bertRCNN, bertRNN, and bert, each run under its own optimal parameters. The parameters of the method of the present invention are set as follows: the number of epochs is 5, the mini-batch size is 128, the learning rate is 0.00005, and the dr...
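To make two of the three indicators concrete, here is a minimal pure-Python sketch of the Rand index and completeness, following their standard textbook definitions. The function names and toy labels are illustrative, not taken from the patent, and the mutual-information score is left out because its chance-adjustment term is considerably longer. In practice a library such as scikit-learn (`rand_score`, `completeness_score`, `adjusted_mutual_info_score`) would likely be used instead.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(true_labels, pred_labels):
    """Fraction of sample pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def completeness(true_labels, pred_labels):
    """1 - H(K|C) / H(K), where C are true classes and K predicted groups.

    Equals 1.0 when all members of each true class share one predicted label.
    """
    n = len(true_labels)
    class_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))

    h_k = -sum((c / n) * math.log(c / n) for c in pred_counts.values())
    if h_k == 0.0:
        return 1.0  # a single predicted group is trivially complete
    h_k_given_c = -sum(
        (n_ck / n) * math.log(n_ck / class_counts[c])
        for (c, _k), n_ck in joint.items()
    )
    return 1.0 - h_k_given_c / h_k


# Toy labels: six articles, three true categories, one article mislabeled.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(round(rand_index(y_true, y_pred), 3))  # prints 0.8
print(round(completeness(y_true, y_pred), 3))
```

Both scores reach 1.0 only when the predicted labeling matches the true one up to a renaming of labels.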

Abstract

The invention discloses a pre-training-based method for automatically labeling long-text news with categories. The method comprises five steps: data preprocessing, data set division, model loading, model training, and text labeling. Traditional manual labeling is costly, time-consuming, and labor-intensive; traditional machine learning performs poorly on long-text classification; and classical long-text model algorithms work less well on Chinese because of its grammatical differences. To address these issues, the invention provides a pre-training-based method that automatically labels the news category of a long text.
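The five steps can be sketched end to end as below. This is only an outline of the control flow under loud assumptions: the word-count classifier is a stand-in so the example runs without external dependencies and is NOT the pre-trained model the invention uses, and every name here (`preprocess`, `split_dataset`, `WordCountModel`) and the English toy texts are hypothetical.

```python
import random
from collections import Counter, defaultdict


def preprocess(text):
    """Step 1 (assumed form): lowercase and split on whitespace."""
    return text.lower().split()


def split_dataset(samples, train_ratio=0.8, seed=0):
    """Step 2: shuffle and divide into training and held-out sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


class WordCountModel:
    """Stand-in for the pre-trained model: per-category word counts, NOT BERT."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, samples):
        for text, label in samples:
            self.counts[label].update(preprocess(text))

    def label(self, text):
        tokens = preprocess(text)
        scores = {lab: sum(c[t] for t in tokens) for lab, c in self.counts.items()}
        return max(scores, key=scores.get)


samples = [
    ("stocks rally as bank profits rise", "financial"),
    ("central bank holds interest rates", "financial"),
    ("team wins the league final match", "sports"),
    ("striker scores twice in cup match", "sports"),
]

train_set, held_out = split_dataset(samples)  # step 2: data set division
model = WordCountModel()                      # step 3: "load" the stand-in model
model.train(train_set)                        # step 4: model training
print(model.label("bank raises interest rates"))  # step 5: prints "financial"
```

In the method described, steps 3 and 4 would instead load a pre-trained language model and fine-tune it on the training split; only the surrounding flow is illustrated here.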

Description

Technical field

[0001] The invention belongs to the field of natural language processing and relates to a pre-training-based method for automatically labeling long-text news.

Background technique

[0002] With the rapid development of the Internet, machine learning, big data, and other technologies, all kinds of information data continue to grow at an exponential rate. At present, most of the machine learning and deep learning algorithms that artificial intelligence relies on are data-dependent: they require a large amount of data to train in a supervised or semi-supervised manner for customized deployment. Because the volume of big data in China is huge, and news texts in particular have no fixed format, come in many types, and are updated quickly, the data labeling task faces a major challenge.

[0003] The most common approach to news category labeling is to manually label the full data set. This method has high labor costs, and it is difficult to guarantee data quality. It is inev...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/169, G06F16/951, G06N20/00
CPC: G06F16/951, G06N20/00, G06F40/169
Inventor: 王红梅, 郭放, 张丽杰, 党源源
Owner: CHANGCHUN UNIV OF TECH