Deep active learning text classification method based on pre-training model

An active learning and pre-training technology applied to text database clustering/classification, unstructured text data retrieval, and character and pattern recognition. It achieves high recognition accuracy at a low data labeling cost.

Pending Publication Date: 2021-03-02
成都潜在人工智能科技有限公司

AI Technical Summary

Problems solved by technology

For example, the patent with publication number CN110263173A provides a machine learning method and device for quickly improving text classification performance. When determining which samples need manual labeling, however, that method uses a threshold to divide the samples into two parts: those labeled automatically and those labeled manually. A higher threshold tends to increase the labeling cost, while a lower threshold increases the risk of introducing wrong (automatically generated) labels.
The patent with the publication number CN107169001A discloses a text classification model optimization method based on crowdsourcing feedback and active learning, but this method onl...



Examples


Embodiment Construction

[0025] The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.

[0026] It should be noted that like numerals and letters denote similar items in the following figures, therefore, once an item is defined in one figure, it does not require further definition and explanation in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", etc. are only used to distinguish descriptions, and cannot be understood as indicating or implying relative importance.

[0027] Referring to Figure 1, Figure 1 is a schematic flowchart of a deep active learning text classification method based on a pre-trained model provided by an embodiment of the present invention.

[0028] According to the applicant's research, the existing text classification active learning methods generally only select the most uncertain samples in the model for la...
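As a rough illustration of the uncertainty-only selection that this paragraph attributes to existing methods, the sketch below ranks unlabeled samples by the entropy of the classifier's predicted class probabilities. The function names and the probability values are hypothetical, and entropy is only one common uncertainty measure; the source does not say which measure the existing methods use.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(pool_probs, k):
    """Return the ids of the k samples whose predictions have the highest entropy."""
    ranked = sorted(pool_probs, key=lambda item: entropy(item[1]), reverse=True)
    return [sample_id for sample_id, _ in ranked[:k]]


# Hypothetical predicted probabilities for four unlabeled texts over three classes.
pool_probs = [
    ("a", [0.98, 0.01, 0.01]),
    ("b", [0.40, 0.35, 0.25]),
    ("c", [0.70, 0.20, 0.10]),
    ("d", [0.34, 0.33, 0.33]),
]

print(most_uncertain(pool_probs, 2))  # prints ['d', 'b']
```

A ranking like this looks only at each sample's own prediction, so nothing prevents it from picking several near-identical texts in the same round.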


Abstract

The invention provides a deep active learning text classification method based on a pre-trained model. The method comprises: using a pre-trained model trained on a large amount of general text to obtain semantic encodings of the texts as input features of a classifier; constructing a classifier and training it on an initial training set; then, starting from this initial model and applying the to-be-labeled sample selection strategy and the data supplement strategy provided by the invention, iterating with the participation of manual labeling until the maximum number of iterations is reached or the labeling budget is exhausted; and finally embedding the resulting model into a specific product to obtain the final product and perform inference. In this way, high recognition accuracy can be obtained at a low data labeling cost while the diversity of the training samples is guaranteed.
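The iteration described in the abstract can be sketched roughly as follows. This is a toy outline under stated assumptions, not the patented method: `encode` is a keyword-count stand-in for a real pre-trained encoder, the classifier is a simple nearest-centroid model, and least-confidence selection is a placeholder, because this page does not disclose the invention's sample selection or data supplement strategies. All names are hypothetical.

```python
import math
from collections import defaultdict

VOCAB = ["goal", "match", "team", "stock", "market", "bank"]


def encode(text):
    """Stand-in for a pre-trained encoder (assumption): keyword counts as features."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def train(labeled):
    """Nearest-centroid classifier: one mean feature vector per class."""
    sums, counts = {}, defaultdict(int)
    for vec, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] += 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_proba(model, vec):
    """Softmax over negative squared distances to each class centroid."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(vec, cen))
              for c, cen in model.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def active_learning(seed, pool, oracle, max_iters, budget, batch_size=1):
    """Iterate until max_iters is reached or the labeling budget is spent."""
    labeled = [(encode(t), y) for t, y in seed]
    pool = list(pool)
    model = train(labeled)  # initial model trained on the initial training set
    spent = 0
    for _ in range(max_iters):
        if spent >= budget or not pool:
            break
        # Placeholder strategy (assumption): least-confident samples first.
        pool.sort(key=lambda t: max(predict_proba(model, encode(t)).values()))
        take = min(batch_size, budget - spent)
        batch, pool = pool[:take], pool[take:]
        labeled += [(encode(t), oracle(t)) for t in batch]  # manual labeling step
        spent += len(batch)
        model = train(labeled)
    return model, spent


# Hypothetical data; the oracle plays the role of the human annotator.
seed = [("goal match", "sports"), ("stock market", "finance")]
pool = ["team goal team", "bank stock", "match team", "market bank"]


def oracle(text):
    return "sports" if set(text.split()) & {"goal", "match", "team"} else "finance"


model, spent = active_learning(seed, pool, oracle, max_iters=10, budget=3)
probs = predict_proba(model, encode("team match"))
print(spent, max(probs, key=probs.get))
```

Both stopping conditions from the abstract appear in the loop: the `for` bounds the number of iterations, and the `spent >= budget` check ends it once the labeling budget is used up.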

Description

Technical field

[0001] The invention relates to the technical field of automatic text classification, in particular to a deep active learning text classification method based on a pre-trained model.

Background

[0002] At present, in order to classify texts automatically and more accurately, various text classification methods have been proposed one after another. For example, the patent with publication number CN110263173A provides a machine learning method and device for quickly improving text classification performance. When determining which samples need manual labeling, however, that method uses a threshold to divide the samples into two parts: those labeled automatically and those labeled manually. A higher threshold tends to increase the labeling cost, while a lower threshold increases the risk of introducing wrong (automatically generated) labels. The patent with the publication number CN107169001A discloses a text classifi...
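The threshold trade-off attributed above to CN110263173A can be illustrated with a small sketch. It assumes, hypothetically, that a sample is auto-labeled when the classifier's confidence reaches the threshold and is sent to a human otherwise; the data and the function names are invented for illustration and are not taken from either patent.

```python
def split_by_threshold(predictions, threshold):
    """Split samples into auto-labeled and manually labeled parts by confidence."""
    auto, manual = [], []
    for sample_id, predicted, confidence, _true in predictions:
        if confidence >= threshold:
            auto.append((sample_id, predicted))
        else:
            manual.append(sample_id)
    return auto, manual


def wrong_auto_labels(predictions, threshold):
    """Count auto-generated labels that disagree with the true label."""
    truth = {sample_id: true for sample_id, _pred, _conf, true in predictions}
    auto, _ = split_by_threshold(predictions, threshold)
    return sum(1 for sample_id, predicted in auto if predicted != truth[sample_id])


# Hypothetical (id, predicted label, confidence, true label) tuples.
predictions = [
    ("t1", "sports", 0.97, "sports"),
    ("t2", "finance", 0.92, "finance"),
    ("t3", "sports", 0.81, "sports"),
    ("t4", "finance", 0.72, "sports"),
    ("t5", "sports", 0.65, "finance"),
    ("t6", "finance", 0.55, "finance"),
]

for threshold in (0.9, 0.6):
    _, manual = split_by_threshold(predictions, threshold)
    print(threshold, len(manual), wrong_auto_labels(predictions, threshold))
# prints:
# 0.9 4 0
# 0.6 1 2
```

On this toy data the higher threshold sends four of six samples to manual labeling and admits no wrong automatic labels, while the lower threshold needs only one manual label but lets two wrong automatic labels through.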


Application Information

IPC(8): G06K9/62, G06F16/35
CPC: G06F16/353, G06F18/214
Inventor: 尹学渊, 祁松茂, 江天宇, 陈洪宇
Owner: 成都潜在人工智能科技有限公司