A Short Text Classification Method Based on Multiple Weakly Supervised Ensemble

A short text classification method, applied in the fields of text database clustering/classification, unstructured text data retrieval, instruments, etc., that can address the problems of label bottleneck, data sparsity, and imbalanced classification.

Active Publication Date: 2021-12-10
湖南董因信息技术有限公司

AI Technical Summary

Problems solved by technology

[0009] In view of this, the present invention provides a short text classification method based on a multiple weakly supervised ensemble, which addresses the label bottleneck, data sparsity, and imbalanced classification problems of short text classification together.
To suit the particular characteristics of short texts, the method introduces three sources of weak supervision into short text annotation: keyword matching, regular expressions, and distantly supervised clustering. It also proposes a multiple weakly supervised integration method, which combines the discrete labels output directly by the weak supervision sources into probability labels in order to address the imbalanced classification problem.


Examples

Embodiment Construction

[0037] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0038] As shown in Figure 1, a short text classification method based on a multiple weakly supervised ensemble includes the following steps:

[0039] Step 1, obtain the original data set and knowledge base, and perform data preprocessing;

[0040] Step 2, perform knowledge extraction on the preprocessed data using multiple weak supervision methods;

[0041] Step 3, encode the extracted knowledge as labeling functions and use them for data labeling;

[0042] Step 4,...
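
Step 3 can be illustrated with a minimal sketch of two labeling functions, one based on keyword matching and one on a regular expression. The class names, keywords, pattern, and function names below are hypothetical and chosen only for illustration; they are not taken from the patent. The third weak supervision source, distantly supervised clustering, is not shown.

```python
import re

# Assumption: a labeling function returns a class index, or ABSTAIN
# when its rule does not apply to the text.
ABSTAIN = -1
COMPLAINT, INQUIRY = 0, 1  # hypothetical classes

def lf_keyword_complaint(text: str) -> int:
    """Keyword matching: vote COMPLAINT if any keyword occurs."""
    keywords = ("refund", "broken", "complaint")  # illustrative keywords
    lowered = text.lower()
    return COMPLAINT if any(k in lowered for k in keywords) else ABSTAIN

# Illustrative pattern: a question opening with a common interrogative.
_QUESTION = re.compile(r"^(how|what|when|where|can|does)\b.*\?$", re.IGNORECASE)

def lf_regex_question(text: str) -> int:
    """Regular expression: vote INQUIRY if the text looks like a question."""
    return INQUIRY if _QUESTION.search(text.strip()) else ABSTAIN

LABELING_FUNCTIONS = [lf_keyword_complaint, lf_regex_question]

def apply_lfs(texts):
    """Return one row of discrete votes per text, one column per function."""
    return [[lf(t) for lf in LABELING_FUNCTIONS] for t in texts]
```

Each row of the result holds the discrete votes for one short text, which is the kind of input a later label integration step would consume.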


Abstract

The invention discloses a short text classification method based on multiple weak supervision integration, which includes: obtaining the original data set and knowledge base, and performing data preprocessing; performing knowledge extraction on the preprocessed data; expressing the extracted knowledge as labeling functions and using them for data labeling; performing label integration through a conditionally independent model; training a classification model based on a fully connected neural network; evaluating and optimizing the classification model to obtain the optimal model; and using the optimal model to classify short texts. The method combines keyword matching, regular expressions, and distantly supervised clustering to express both explicit and implicit knowledge. The probability labels generated by the label integration mechanism not only enable automatic labeling of unlabeled data and alleviate the data sparsity problem of short texts, but also address the imbalanced classification problem of short texts.
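
As a rough illustration of how discrete votes might be turned into probability labels under a conditional independence assumption, the sketch below combines votes in a naive Bayes style. It assumes each labeling function has a fixed, known accuracy and spreads its errors uniformly over the other classes; the abstract does not state these details, so the function name `integrate_labels`, the accuracies, and the uniform error model are all assumptions rather than the patent's actual model.

```python
import math

ABSTAIN = -1  # assumption: -1 marks a labeling function that did not vote

def integrate_labels(votes, accuracies, num_classes, priors=None):
    """Combine discrete votes into a probability label.

    Assumes the votes are conditionally independent given the true class,
    0 < accuracy < 1 for every function, and num_classes >= 2.
    """
    if priors is None:
        priors = [1.0 / num_classes] * num_classes
    log_post = [math.log(p) for p in priors]
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue  # an abstaining function contributes no evidence
        wrong = (1.0 - acc) / (num_classes - 1)  # uniform error assumption
        for y in range(num_classes):
            log_post[y] += math.log(acc if vote == y else wrong)
    # Normalize in a numerically stable way.
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with two classes, votes `[0, 1, ABSTAIN]`, and assumed accuracies `[0.9, 0.6, 0.7]`, the more accurate function dominates and class 0 receives the larger probability. A soft label of this kind can then serve as a training target for the classifier in place of a hard label.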

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and in particular relates to a short text classification method based on multiple weak supervision integration.

Background technique

[0002] Under the background of mobile Internet, the development of instant messaging not only promotes the surge of short text, but also makes the research and application of short text classification more and more important.

[0003] Supervised machine learning mainly relies on manually labeled data and good feature representation. Good feature expression can be learned automatically with deep learning. However, due to the thousands of parameters that need to be learned, supervised deep learning is still inseparable from a large amount of labeled data. In fact, the training data for supervised learning is still dominated by manual annotation. Manual labeling is very expensive and time-consuming. Furthermore, as real-world applications continue to ch...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/279, G06F40/289
CPC: G06F16/35
Inventor: 修保新
Owner: 湖南董因信息技术有限公司