Semi-supervised learning method and system for text classification

A semi-supervised learning technique for text classification. It addresses the infeasibility and cost of manual labeling and the over-fitting that occurs with small labeled data sets, so as to improve accuracy, improve efficiency, and reduce labor costs.

Status: Inactive
Publication Date: 2021-03-19
BEIJING ZHONGHAIJIYUAN DIGITAL TECH DEV CO LTD
6 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

[0003] In many text classification scenarios, collecting a large labeled data set requires a great deal of manual labeling work, and manual labeling is inefficient, expensive, or infeasible.
When the sample size is small, supervised learning is usually used for training. However, typical supervised learning algorithms are prone to overfitting and cannot represent data features effectively when the labeled data set is small.
When there is a large amount of sample data but no specific expected result or label, unsupervised learning is the usual choice. However, where the differences between samples are not pronounced, unsupervised learning cannot provide reliable category information and cannot meet the requirements of accurate classification.
[0004] In a text classification task where only a small amount of labeled sample data is available and it does not cover all categories, while a large amount of unlabeled sample data covers all categories, neither supervised learning nor unsupervised learning can be used effectively.

Embodiment Construction

[0038] To present the technical solution of the present invention clearly and in detail, the invention is described below with reference to the accompanying drawings; the description is not intended to limit the scope of the invention.

[0039] Figure 1 is a flowchart of the semi-supervised learning method for text classification provided in Embodiment 1 of the present invention. The method includes the following steps:

[0040] Obtain a sample set for the relevant task;

[0041] Preprocess the sample set;

[0042] Predict and assign classification labels to the preprocessed unlabeled sample set, thereby expanding the sample set;

[0043] Train the deep learning model on the expanded sample set.
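The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: a bag-of-words nearest-centroid classifier stands in for the deep learning model, and the function names (`preprocess`, `train`, `predict`, `semi_supervised_fit`), the cosine-similarity score, and the `threshold` used to accept a predicted label are all hypothetical choices that do not come from the source.

```python
from collections import Counter, defaultdict
import math
import re


def preprocess(text):
    """Hypothetical cleaning rule: lowercase and keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(samples):
    """Stand-in for model training: one bag-of-words centroid per label."""
    centroids = defaultdict(Counter)
    for tokens, label in samples:
        centroids[label].update(tokens)
    return dict(centroids)


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict(model, tokens):
    """Return (label, score) for the most similar centroid."""
    vec = Counter(tokens)
    return max(
        ((label, cosine(vec, c)) for label, c in model.items()),
        key=lambda pair: pair[1],
    )


def semi_supervised_fit(labeled, unlabeled, threshold=0.3):
    # [0041] Preprocess both the labeled and the unlabeled samples.
    labeled_clean = [(preprocess(text), label) for text, label in labeled]
    unlabeled_clean = [preprocess(text) for text in unlabeled]

    # [0042] Predict labels for the unlabeled samples with a seed model and
    # keep only predictions above the (assumed) confidence threshold.
    seed_model = train(labeled_clean)
    expanded = list(labeled_clean)
    for tokens in unlabeled_clean:
        label, score = predict(seed_model, tokens)
        if score >= threshold:
            expanded.append((tokens, label))

    # [0043] Train the final model on the expanded sample set.
    return train(expanded), expanded


# [0040] Obtain a (toy) sample set for the task.
labeled = [
    ("The team won the football match", "sports"),
    ("The bank raised interest rates", "finance"),
]
unlabeled = [
    "A late goal decided the football match",
    "Interest rates hit the stock market",
    "zzz qqq",
]
model, expanded = semi_supervised_fit(labeled, unlabeled)
```

In this toy run the first two unlabeled texts are pseudo-labeled and added to the training set, while the third shares no vocabulary with any class and is left out. In practice the threshold and the choice of seed model would need tuning for the task at hand.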

[0044] The sample set includes a labeled sample set and an unlabeled sample set, and the preprocessing includes data cleaning of every labeled and unlabeled sample. For example, suppose you need to train a text classification model for a cer...
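The source does not say which cleaning operations are applied, so the following is only a sketch of what data cleaning of labeled and unlabeled samples might look like. The specific rules (unescaping HTML entities, stripping tags, collapsing whitespace, dropping empty and duplicate texts) and the names `clean_text` and `clean_samples` are assumptions made for illustration.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Assumed cleaning rules: unescape entities, strip tags, collapse spaces."""
    text = html.unescape(text)
    text = TAG_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def clean_samples(samples):
    """Clean (text, label) pairs; label is None for unlabeled samples.

    Empty texts and repeated texts are dropped; the first occurrence wins,
    so listing labeled samples first keeps their labels.
    """
    seen = set()
    cleaned = []
    for text, label in samples:
        text = clean_text(text)
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append((text, label))
    return cleaned


samples = [
    ("<b>Good</b> movie", "pos"),  # labeled sample with markup
    ("Good movie", None),          # unlabeled duplicate after cleaning
    ("   ", None),                 # empty after cleaning
]
cleaned = clean_samples(samples)
```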

Abstract

An embodiment of the invention provides a semi-supervised learning method and system for text classification. The method comprises: obtaining a sample set for a related task, the sample set comprising a labeled sample set and an unlabeled sample set; preprocessing the sample set; predicting and assigning classification labels to the preprocessed unlabeled sample set to expand the sample set; and training a deep learning model on the expanded sample set. The method addresses the lack of labeled data in supervised learning by learning the task from both the unlabeled data and the labeled sample set, and it uses a clustering method from unsupervised learning to handle missing category labels. The method greatly improves text classification efficiency, reduces labor cost, and improves on the accuracy of unsupervised classification when only a small number of labels is available.
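The abstract does not name the clustering method used to handle categories that have no labels. As one possible illustration, the sketch below groups tokenized unlabeled texts with a greedy single-pass ("leader") clustering over bag-of-words cosine similarity; clusters that match no labeled category could then be reviewed as candidate new categories. The algorithm choice, the function name `leader_cluster`, and the `threshold` value are assumptions, not details from the source.

```python
from collections import Counter
import math


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def leader_cluster(token_lists, threshold=0.3):
    """Greedy single-pass clustering.

    Each sample joins the most similar existing cluster if the similarity
    reaches the threshold; otherwise it starts a new cluster.
    Returns one cluster index per input sample.
    """
    centroids = []
    assignments = []
    for tokens in token_lists:
        vec = Counter(tokens)
        best, best_sim = None, threshold
        for index, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = index, sim
        if best is None:
            centroids.append(Counter(vec))
            assignments.append(len(centroids) - 1)
        else:
            centroids[best].update(vec)
            assignments.append(best)
    return assignments


docs = [
    ["football", "match", "goal"],
    ["stock", "market", "rates"],
    ["football", "goal", "team"],
    ["market", "rates", "bank"],
]
clusters = leader_cluster(docs)
```

Leader clustering is order-dependent, which is acceptable for a sketch; a production system would more likely use an embedding model together with a standard clustering algorithm.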

Description

Technical field

[0001] The invention relates to the technical field of machine learning, in particular to a semi-supervised learning method and system for text classification.

Background

[0002] At present, natural language processing tasks usually include sub-tasks such as text classification, entity recognition, and emotion recognition. Text classification assigns text to specific labels. A supervised method trains a model and then classifies text with the trained model.

[0003] In many text classification scenarios, collecting a large labeled data set requires a great deal of manual labeling work, and manual labeling is inefficient, expensive, or infeasible. When the sample size is small, supervised learning is usually used for training. However, typical supervised learning algorithms are prone to overfitting and cannot represent data features effectively when the labeled data set is small. When...

Application Information

IPC(8): G06F16/35, G06K9/62
CPC: G06F16/35, G06F18/24147
Inventor: 李越超
Owner: BEIJING ZHONGHAIJIYUAN DIGITAL TECH DEV CO LTD