Text topic classification model and classification method based on multi-source-domain integrated transfer learning

The invention relates to a text topic classification model and classification method based on multi-source-domain integrated transfer learning. It addresses the problems of negative transfer, high resource consumption, and the inability to judge the accuracy of pseudo-labeled source-domain data, and achieves high accuracy, strong anti-interference ability, and avoidance of the negative transfer phenomenon.

Inactive Publication Date: 2018-08-28
YUNNAN UNIV


Problems solved by technology

[0004] Although researchers have achieved some results in this field, owing to the complexity of transfer learning, existing transfer learning models have the following shortcomings: (1) The target domain contains too little data, so source-domain data must be found for instance transfer; however, the source-domain data available to assist the target domain is often homogeneous, and its distribution easily differs from that of the target domain, causing negative transfer. (2) The requirements on source-domain samples are relatively high: they must be labeled, yet in practice most source-domain data is unlabeled. (3) To further exploit the target domain, labeling its unlabeled data demands extensive manual effort and expert knowledge, consuming substantial resources. (4) The accuracy of pseudo-labeled source-domain data added to the target domain cannot be judged. The target-domain data is too small to train a good classifier, and the pseudo-labeled data is then added through this poorly performing classifier, so the result is poor. If multiple classifiers could be trained instead, one of them would assign pseudo-labels to the unlabeled source-domain data and, following the idea of ensemble learning, the other classifiers would then test those labels to judge their correctness; the pseudo-labeled source-domain data classified correctly by the classifiers would be regarded as data with strong transferability and added to the target domain, yielding better classification results after transfer. (5) Data is used insufficiently: most transfer learning migrates from a single source domain, which is not ideal, since differences in data distribution easily cause negative transfer; that is, the transfer not only fails to help the target domain train a "good" classifier but actually degrades the classifier's performance. Most transfer learning also adjusts instance weights to change each instance's influence on the transfer; in this setting the weights can grow too large, leading to overfitting.
In general, if transfer is performed improperly, existing transfer learning models fall into a state of negative transfer: they do not help the target-domain data train a classifier with good classification performance and even inhibit the classifier's normal operation, so existing transfer learning remains immature.
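Point (4) above describes an ensemble-style verification of pseudo-labels. The following is a minimal sketch of that idea, assuming scikit-learn, synthetic data, and stand-in classifiers (logistic regression for Softmax, an MLP for the NN classifier, a decision tree in place of the CNN classifier); it is an illustration of the technique, not the patent's actual models or corpora.

```python
# Ensemble pseudo-label verification sketch: one classifier assigns
# pseudo-labels to unlabeled source-domain data, the remaining classifiers
# vote on those labels, and only samples on which all classifiers agree are
# kept as "strong transferability" instances for the target domain.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
# Small labeled target domain, larger unlabeled source domain.
X_target, y_target = X[:60], y[:60]
X_source = X[60:]

# Train three different classifiers on the scarce target-domain data.
clfs = [LogisticRegression(max_iter=1000),              # stands in for Softmax
        MLPClassifier(max_iter=1000, random_state=0),   # stands in for the NN
        DecisionTreeClassifier(random_state=0)]         # stands in for the CNN
for clf in clfs:
    clf.fit(X_target, y_target)

# The first classifier proposes pseudo-labels; the others verify them.
pseudo = clfs[0].predict(X_source)
agree = np.logical_and(clfs[1].predict(X_source) == pseudo,
                       clfs[2].predict(X_source) == pseudo)

# Samples with unanimous agreement are added to the target domain.
X_aug = np.vstack([X_target, X_source[agree]])
y_aug = np.concatenate([y_target, pseudo[agree]])
print(f"kept {agree.sum()} of {len(X_source)} source samples")
```

A classifier trained on `X_aug, y_aug` then benefits from the ensemble-verified source instances while the disagreed-upon (likely mislabeled) samples are discarded, which is how the scheme avoids propagating wrong pseudo-labels.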



Examples


Embodiment 1

[0053] Step 1. Perform classification with all three classifiers to obtain pseudo-labeled data for the different text topic types. When the target domain is C, classify the data of source domains S, R, and T with three classifiers: the NN classifier, the CNN classifier, and the Softmax classifier;

[0054] Step 2. Conduct an experiment with the Softmax classifier on 100% of the C target-domain data and record the accuracy; conduct an experiment with the Softmax classifier on 1% of the C target-domain data and record the accuracy; conduct an experiment with the NN classifier on 1% of the C target-domain data and record the accuracy; conduct an experiment with the CNN classifier on 1% of the C target-domain data and record the accuracy; conduct an experiment with the Softmax classifier on 1% of the C data plus the source-domain S data added to it and record the accuracy; conduct an experiment with the Softmax classifier on 1% of the C data plus the source-domain R data added to it and record the accuracy; ...
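The evaluation protocol above can be sketched as follows. This is a hedged illustration, assuming scikit-learn, multinomial logistic regression as the Softmax classifier, and synthetic data in place of the C and S corpora; for simplicity the pseudo-labels here come from a single classifier rather than the three-classifier ensemble.

```python
# Compare a Softmax classifier trained on 1% of the target-domain data
# against one trained on that 1% plus pseudo-labeled source-domain data,
# recording test accuracy for each configuration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30, n_informative=15,
                           n_classes=4, random_state=1)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.3,
                                                  random_state=1)

# 1% of the target domain C: a deliberately tiny labeled set.
n_1pct = max(len(X_rest) // 100, 4)
X_1pct, y_1pct = X_rest[:n_1pct], y_rest[:n_1pct]

def softmax_accuracy(X_train, y_train):
    """Train a multinomial logistic regression (Softmax) classifier and
    return its accuracy on the held-out test set."""
    return LogisticRegression(max_iter=1000).fit(X_train, y_train).score(X_test, y_test)

acc_baseline = softmax_accuracy(X_1pct, y_1pct)

# Pseudo-labeled "source domain S" data: further samples labeled by the
# 1%-trained classifier, standing in for the ensemble-verified source data.
labeler = LogisticRegression(max_iter=1000).fit(X_1pct, y_1pct)
X_S = X_rest[n_1pct:n_1pct + 400]
acc_augmented = softmax_accuracy(np.vstack([X_1pct, X_S]),
                                 np.concatenate([y_1pct, labeler.predict(X_S)]))
print(f"1% only: {acc_baseline:.3f} | 1% + pseudo-labeled S: {acc_augmented:.3f}")
```

Each configuration in Step 2 (100% target, 1% target per classifier, 1% target plus each source domain) would be one call to a routine like `softmax_accuracy`, with the recorded accuracies then compared across configurations.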

Embodiment 2

[0056] Step 1. Perform classification with all three classifiers to obtain pseudo-labeled data for the different text topic types. When the target domain is S, classify the data of source domains C, R, and T with three classifiers: the NN classifier, the CNN classifier, and the Softmax classifier;

[0057] Step 2. Conduct an experiment with the Softmax classifier on 100% of the S target-domain data and record the accuracy; conduct an experiment with the Softmax classifier on 1% of the S target-domain data and record the accuracy; conduct an experiment with the NN classifier on 1% of the S target-domain data and record the accuracy; conduct an experiment with the CNN classifier on 1% of the S target-domain data and record the accuracy; conduct an experiment with the Softmax classifier on 1% of the S data plus the source-domain C data added to it and record the accuracy; conduct an experiment with the Softmax classifier on 1% of the S data plus the source-domain R data added to it and record the accuracy; use 1% ...

Embodiment 3

[0059] Step 1. Perform classification with all three classifiers to obtain pseudo-labeled data for the different text topic types. When the target domain is R, classify the data of source domains C, S, and T with three classifiers: the NN classifier, the CNN classifier, and the Softmax classifier;

[0060] Step 2. Conduct an experiment with the Softmax classifier on 100% of the R target-domain data and record the accuracy; conduct an experiment with the Softmax classifier on 1% of the R target-domain data and record the accuracy; conduct an experiment with the NN classifier on 1% of the R target-domain data and record the accuracy; conduct an experiment with the CNN classifier on 1% of the R target-domain data and record the accuracy; conduct an experiment with the Softmax classifier on 1% of the R data plus the source-domain C data added to it and record the accuracy; conduct an experiment with the Softmax classifier on 1% of the R data plus the source-domain S data added to it and record the accuracy ...



Abstract

The invention discloses a text topic classification model based on multi-source-domain integrated transfer learning. The model is composed of a target-domain data module, a labeling module, an integrated learning module for multi-source-domain label determination, and a correct-data module. In the classification method for this model, data without class labels is first classified by the labeling module; the labeled data is then evaluated, and the data classified correctly by all three classifiers is selected and added to the target-domain data module. Classification by the three classifiers yields pseudo-labeled data of different text topic types; one topic type serves as the target-domain data, the other topic types serve as source-domain data added to the target domain, and a Softmax classifier is used to test the accuracy. In this way, the negative transfer phenomenon caused by single-source-domain transfer is effectively avoided, the data composition covers all aspects of the target domain, and data balance is better satisfied.

Description

Technical field
[0001] The invention belongs to the technical field of multi-source-domain learning and relates to a text topic classification model and classification method based on multi-source-domain integrated transfer learning.
Background technique
[0002] Multi-source-domain transfer learning is a very active application research direction of machine learning. Its purpose is to find highly correlated data between the target domain and multiple source domains and to transfer this highly correlated multi-source-domain data to the target domain, helping the target-domain sample data train a "good" classifier. However, the data samples of different source domains have different degrees of similarity to the target-domain samples; thus, transfer from multiple source domains can give rise to the negative transfer phenomenon. Based on labeled data obtained in other related fields, the correlation between the related field and the research ...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30 G06N3/04 G06N3/08
CPC: G06F16/355 G06N3/08 G06N3/045
Inventors: 杨云, 李燕
Owner YUNNAN UNIV