Multi-label text classification calculation method based on ensemble learning

A text classification calculation method applied to text database clustering/classification, unstructured text data retrieval, special data processing applications, and similar areas. It addresses the problem of high time complexity, improving training speed and generalization ability while reducing the risk of over-fitting.

Pending Publication Date: 2020-04-07
NORTH CHINA ELECTRIC POWER UNIV (BAODING) +3

AI Technical Summary

Problems solved by technology

However, the time complexity ...

Examples

Embodiment Construction

[0021] The present invention proposes a multi-label text classification calculation method based on ensemble learning which, as shown in Figure 1, includes:

[0022] Step 1: Preprocess the original data set, segment sentences into individual words, and delete non-keywords;

[0023] Step 2: Use term frequency-inverse document frequency (TF-IDF) to perform feature extraction and vectorization on the text;

[0024] Step 3: Decompose the multi-label learning problem into multiple independent binary classification problems using the binary relevance method, where each binary classification problem corresponds to one label in the label space;

[0025] Step 4: Classify the labels using an ensemble learning algorithm.
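Steps 2 through 4 can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the patented implementation: the smoothed IDF formula, the nearest-centroid weak learner, the use of bagging as the ensemble algorithm, and every function name and sample document below are choices made for this sketch. The source text does not say which ensemble algorithm is used.

```python
import math
import random
from collections import Counter

def tfidf_vectorize(tokenized_docs):
    """Step 2: map each tokenized document to a TF-IDF vector."""
    vocab = sorted({t for doc in tokenized_docs for t in doc})
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized_docs for t in set(doc))
    # Assumption: one common IDF variant, log(N / df) + 1.
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        length = max(len(doc), 1)
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors

def binary_relevance_targets(label_sets):
    """Step 3: build one 0/1 target vector per label in the label space."""
    labels = sorted(set().union(*label_sets))
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in labels}

def fit_centroid(X, y):
    """Weak learner (assumed): the mean vector of each class seen in y."""
    centroids = {}
    for cls in set(y):
        rows = [x for x, t in zip(X, y) if t == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_centroid(centroids, x):
    """Return the class whose centroid is nearest to x."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )

def fit_bagging(X, y, n_estimators=15, seed=0):
    """Step 4 (stand-in): train weak learners on bootstrap samples."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in X]
        models.append(fit_centroid([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, x):
    """Majority vote over the weak learners."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]

def fit_multilabel(tokenized_docs, label_sets):
    """Train one bagged binary classifier per label."""
    vocab, X = tfidf_vectorize(tokenized_docs)
    targets = binary_relevance_targets(label_sets)
    models = {lab: fit_bagging(X, y) for lab, y in targets.items()}
    return vocab, X, models

def predict_labels(models, x):
    """A label is predicted when its binary classifier votes 1."""
    return {lab for lab, m in models.items() if predict_bagging(m, x) == 1}

# Hypothetical toy data: already tokenized documents and their label sets.
docs = [["transformer", "fault"], ["line", "fault"],
        ["transformer", "oil"], ["line", "relay"]]
label_sets = [{"equipment", "failure"}, {"grid", "failure"},
              {"equipment"}, {"grid"}]

vocab, X, models = fit_multilabel(docs, label_sets)
targets = binary_relevance_targets(label_sets)
predicted = [predict_labels(models, x) for x in X]
```

Because each label gets its own independent binary classifier, the per-label models could be trained in parallel, and a document may receive zero, one, or several labels.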

[0026] Preprocessing is an important part of preparing the data set, and it is a crucial step before machine learning methods are applied to the data. It consists of two subtasks: (1) word segmentation and (2) stopword removal.
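The two subtasks can be illustrated with a short sketch. It assumes whitespace-delimited, English-like text and a tiny hypothetical stopword list; text in a language without word delimiters would need a dedicated word segmenter in place of the regular expression. The function name and sample sentences are illustrative only.

```python
import re

# Hypothetical stopword list; a real system would load a language-specific one.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def preprocess(sentence):
    """Segment a sentence into lowercase word tokens, then drop stopwords."""
    # Subtask 1: word segmentation (here: runs of letters and digits).
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    # Subtask 2: stopword removal.
    return [t for t in tokens if t not in STOPWORDS]

sentences = [
    "The transformer is overheating.",
    "Fault in the power line and the relay",
]
tokenized = [preprocess(s) for s in sentences]
```

The resulting token lists are the input that the feature extraction step expects.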

[0027] The purpose of word se...

Abstract

The invention belongs to the technical field of computer text classification, and particularly relates to a multi-label text classification calculation method based on ensemble learning, which comprises the following steps: step 1, preprocessing an original data set, segmenting sentences into independent words, and deleting non-keywords; step 2, performing feature extraction and vectorization on the text using term frequency-inverse document frequency (TF-IDF); step 3, decomposing the multi-label learning problem into a plurality of independent binary classification problems by the binary relevance method, wherein each binary classification problem corresponds to one label in the label space; and step 4, classifying the labels with an ensemble learning algorithm. The method reduces time complexity, improves training speed, improves the generalization ability of the weak learner, reduces the risk of over-fitting, and improves the robustness of the model.

Description

Technical field

[0001] The invention belongs to the technical field of computer text classification, and in particular relates to a multi-label text classification calculation method based on ensemble learning.

Background technology

[0002] As a form of data analysis and mining, classification technology can extract models that describe important data sets and use them to predict the category of data objects. Depending on the number of category labels assigned to a sample by the prediction, classification problems can be divided into single-label classification problems and multi-label classification problems. The goal of multi-label classification is to predict which labels are associated with an example that may belong to more than one class.

[0003] Multi-label learning algorithms can be roughly divided into two families: problem transformation methods and algorithm adaptation methods. The first group of m...
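The distinction between single-label and multi-label classification, and the way a problem transformation such as binary relevance reduces one to the other, can be written compactly. The notation below is an assumption of this sketch and does not come from the patent text.

```latex
\begin{aligned}
&\text{Instance space } \mathcal{X}, \quad \text{label space } \mathcal{Y} = \{y_1, \dots, y_q\} \\
&\text{Single-label classification: } h : \mathcal{X} \to \mathcal{Y} \\
&\text{Multi-label classification: } h : \mathcal{X} \to 2^{\mathcal{Y}} \\
&\text{Binary relevance: } h_j : \mathcal{X} \to \{0, 1\}, \quad j = 1, \dots, q, \qquad
  h(x) = \{\, y_j \in \mathcal{Y} : h_j(x) = 1 \,\}
\end{aligned}
```

Each $h_j$ is an ordinary binary classifier for label $y_j$, so any binary learner can be used for the $q$ subproblems.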

Application Information

IPC(8): G06F16/35
CPC: G06F16/35
Inventor: 马应龙, 闫君璐, 李莉敏, 张冰, 陈亮, 王乔木, 张大伟, 王玮, 郗子月
Owner: NORTH CHINA ELECTRIC POWER UNIV (BAODING)