Deep active learning text classification method based on a pre-trained model

Active learning and pre-training technology applied to text database clustering/classification, unstructured text data retrieval, character and pattern recognition, and related fields. Effect: high recognition accuracy at low data labeling cost.
CN112434736A · Pending · Publication Date: 2021-03-02 · 成都潜在人工智能科技有限公司

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
成都潜在人工智能科技有限公司
Publication Date
2021-03-02

Figures

  • Figure 1
  • Figure 2
  • Figure 3

Abstract

The invention provides a deep active learning text classification method based on a pre-trained model. The method comprises the following steps: a pre-trained model, trained on a large corpus of general text, is used to obtain semantic encodings of the texts, which serve as the input features of a classifier; a classifier is then constructed and trained on an initial training set; starting from this initial model, and using the to-be-labeled sample selection strategy and the data supplement strategy provided by the invention, training is iterated with manual labeling in the loop until the maximum number of iterations is reached or the labeling budget is exhausted; finally, the resulting model is embedded into a specific product, where it performs inference. In this way, high recognition accuracy can be obtained at low data labeling cost while the diversity of the training samples is maintained.
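The iterative procedure in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented method: the excerpt does not describe the invention's sample selection or data supplement strategies, so the sketch substitutes margin-based uncertainty sampling with a simple distance-based diversity filter and omits data supplementation entirely; a nearest-centroid classifier stands in for the deep classifier; and toy 2-D vectors stand in for the pre-trained model's semantic encodings. All names (`train`, `select_batch`, `active_learning`, `min_dist`, `budget`) are hypothetical. Only the stopping rule (maximum number of iterations or exhausted labeling budget) follows the abstract.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Model = Dict[int, Vector]  # class label -> centroid (hypothetical stand-in classifier)


def train(X: Sequence[Vector], y: Sequence[int]) -> Model:
    """Fit a nearest-centroid classifier: the mean encoding of each class."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_proba(model: Model, x: Vector) -> Dict[int, float]:
    """Softmax over negative squared distances to each class centroid."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(x, mu)) for c, mu in model.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def margin(probs: Dict[int, float]) -> float:
    """Gap between the two most probable classes; a small gap means uncertain."""
    p = sorted(probs.values(), reverse=True)
    return p[0] - (p[1] if len(p) > 1 else 0.0)


def select_batch(model: Model, pool: List[int], X: Sequence[Vector],
                 size: int, min_dist: float) -> List[int]:
    """Take the most uncertain samples first, skipping any candidate closer than
    min_dist to one already chosen (an assumed, simple diversity constraint)."""
    ranked = sorted(pool, key=lambda i: margin(predict_proba(model, X[i])))
    chosen: List[int] = []
    for i in ranked:
        if all(math.dist(X[i], X[j]) >= min_dist for j in chosen):
            chosen.append(i)
        if len(chosen) == size:
            break
    return chosen


def active_learning(X: Sequence[Vector], oracle: Callable[[int], int],
                    initial: List[int], batch_size: int = 2, max_iters: int = 10,
                    budget: int = 4, min_dist: float = 0.5) -> Tuple[Model, Dict[int, int]]:
    """Iterate train -> select -> manually label until max_iters or budget is hit."""
    labeled = {i: oracle(i) for i in initial}  # initial training set
    spent = 0
    for _ in range(max_iters):
        if spent >= budget:
            break
        model = train([X[i] for i in labeled], [labeled[i] for i in labeled])
        pool = [i for i in range(len(X)) if i not in labeled]
        if not pool:
            break
        batch = select_batch(model, pool, X, min(batch_size, budget - spent), min_dist)
        for i in batch:
            labeled[i] = oracle(i)  # stands in for the human annotator
        spent += len(batch)
    final = train([X[i] for i in labeled], [labeled[i] for i in labeled])
    return final, labeled


# Toy 2-D "encodings" standing in for pre-trained sentence vectors.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0],
     [5.1, 4.9], [4.9, 5.2], [2.5, 2.4], [2.6, 2.6]]
truth = [0, 0, 0, 1, 1, 1, 0, 1]
model, labeled = active_learning(X, oracle=lambda i: truth[i], initial=[0, 3])
```

In a real system, `X` would hold the pre-trained model's encodings and `oracle` would be replaced by the manual labeling step. The diversity filter is one simple way to keep a single batch from spending the labeling budget on near-duplicate texts; in the toy run, the two ambiguous points between the clusters are never selected in the same round.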

Description

Technical Field

[0001] The invention relates to the technical field of automatic text classification, and in particular to a deep active learning text classification method based on a pre-trained model.

Background Art

[0002] At present, various text classification methods have been proposed to classify texts automatically. For example, the patent with publication number CN110263173A provides a machine learning method and device for quickly improving text classification performance. However, when determining which samples need manual labeling, this method uses a threshold to divide the samples into two parts: samples that are labeled automatically and samples that are labeled manually. When the threshold is set high, the labeling cost tends to increase; when it is set low, the risk of introducing wrong (automatically generated) labels increases. The patent with publication number CN107169001A discloses a text classifi...
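The threshold trade-off described in [0002] can be illustrated with a minimal sketch. It assumes the split is made on a classifier's confidence score, with samples at or above the threshold auto-labeled and the rest sent to annotators; the function names and the five toy scores are hypothetical, and the sketch is not taken from CN110263173A.

```python
from typing import List, Tuple


def split_by_threshold(confidences: List[float],
                       threshold: float) -> Tuple[List[int], List[int]]:
    """Samples at or above the threshold keep their predicted label
    (automatic labeling); the rest are sent for manual labeling."""
    auto = [i for i, c in enumerate(confidences) if c >= threshold]
    manual = [i for i, c in enumerate(confidences) if c < threshold]
    return auto, manual


def wrong_auto_labels(auto: List[int], predicted: List[int], truth: List[int]) -> int:
    """Count automatically accepted labels that disagree with the ground truth."""
    return sum(1 for i in auto if predicted[i] != truth[i])


# Hypothetical classifier outputs for five samples.
confidences = [0.99, 0.95, 0.80, 0.65, 0.55]
predicted = [1, 0, 1, 1, 0]
truth = [1, 0, 1, 0, 1]

for threshold in (0.9, 0.6):
    auto, manual = split_by_threshold(confidences, threshold)
    # columns: threshold, manual labels needed, wrong automatic labels
    print(threshold, len(manual), wrong_auto_labels(auto, predicted, truth))
# prints: 0.9 3 0
# prints: 0.6 1 1
```

Raising the threshold from 0.6 to 0.9 triples the manual labeling work in this toy case but removes the one wrong automatic label, which is the cost-versus-risk dilemma the paragraph describes.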

Claims
