High-dimensional multi-label data flow classification method based on online sequence kernel extreme learning machine

A kernel extreme learning machine combined with multi-label techniques, applied to the classification of multi-label data streams, addresses three problems: feature drift and concept drift are difficult to detect, the full data set is difficult to obtain at once, and classification algorithms have low accuracy.

Active Publication Date: 2021-03-30
HEFEI UNIV OF TECH
4 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0003] Challenge 1: Existing multi-label classification algorithms and multi-label feature dimensionality reduction algorithms are batch algorithms with high time complexity. In practical applications, however, a text data stream usually arrives continuously, so it is difficult to obtain all the data at once, and these algorithms do not account for the high-dimensional, dynamic features of newly arrived text. Existing multi-label classification and feature dimensionality reduction algorithms are therefore hard to apply directly to massive, high-dimensional, multi-label text data streams.
[0004] Challenge 2: Feature drift and concept drift, caused by changes in the hidden features and the label distribution of a text data stream, are difficult to detect and easily lead to low classification accuracy. Existing multi-label classification algorithms also struggle to adapt to these characteristics of text data streams and to concept drift, and therefore struggle to achieve good classification results.



Examples


Embodiment Construction

[0056] In this example, as shown in Figure 1, a high-dimensional multi-label data stream classification method based on an online sequence kernel extreme learning machine can be used to classify online news in browsers: it assigns appropriate tags to a large volume of varied news, facilitates user retrieval, and supports prediction and news recommendation for readers. Specifically, the method is carried out according to the following steps:

[0057] Step 1: Construct a BoW (bag-of-words) model from an external corpus, use a sliding window mechanism to divide the multi-label text data stream into data blocks, and then vectorize the blocks:

[0058] Step 1.1: Given a multi-label text data stream $D=\{d_1, d_2, \ldots, d_m, \ldots, d_{|D|}\}$, $m=1, 2, \ldots, |D|$, where $|D|$ denotes the total number of texts in $D$ and $d_m$ denotes the $m$-th text in $D$, with $d_m=\{W_m, V_m\}$, where $W_m$ and $V_m$ respectively represent the mth te...
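Step 1 can be illustrated with a minimal sketch in Python. It is not the patent's implementation: the function names `build_vocabulary`, `sliding_blocks`, and `vectorize` are hypothetical, the window is assumed to be non-overlapping with a fixed `block_size`, and each text is assumed to be a pair of a word list and a label list, mirroring $d_m=\{W_m, V_m\}$.

```python
from collections import Counter
from typing import Dict, Iterator, List, Sequence, Tuple

# One text d_m: (words W_m, labels V_m). This layout is an assumption.
Text = Tuple[List[str], List[str]]


def build_vocabulary(corpus: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Map each distinct word of the external corpus to a column index."""
    vocab: Dict[str, int] = {}
    for doc in corpus:
        for word in doc:
            vocab.setdefault(word, len(vocab))
    return vocab


def sliding_blocks(stream: Sequence[Text], block_size: int) -> Iterator[List[Text]]:
    """Yield consecutive, non-overlapping data blocks from the stream."""
    for start in range(0, len(stream), block_size):
        yield list(stream[start:start + block_size])


def vectorize(block: Sequence[Text], vocab: Dict[str, int]) -> List[List[int]]:
    """Turn each text into a word-count vector; unknown words are ignored."""
    vectors = []
    for words, _labels in block:
        counts = Counter(w for w in words if w in vocab)
        row = [0] * len(vocab)
        for word, n in counts.items():
            row[vocab[word]] = n
        vectors.append(row)
    return vectors


# Toy data, invented for illustration only.
corpus = [["stock", "market", "rises"], ["team", "wins", "match"]]
stream = [
    (["stock", "market", "market"], ["finance"]),
    (["team", "wins"], ["sports"]),
    (["match", "stock", "unknown"], ["sports", "finance"]),
]
vocab = build_vocabulary(corpus)
blocks = list(sliding_blocks(stream, block_size=2))
first_block_vectors = vectorize(blocks[0], vocab)
second_block_vectors = vectorize(blocks[1], vocab)
```

Building the vocabulary from an external corpus keeps the vector dimension fixed across blocks, so words that first appear in the stream are dropped here; whether the patent handles such words differently is not stated in the visible text.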



Abstract

The invention discloses a multi-label text data stream classification method based on an online sequence kernel extreme learning machine. The method comprises the following steps: 1, constructing a BoW model from an external corpus and using a sliding window mechanism to divide the multi-label text data stream into data blocks, which are then vectorized; 2, predicting the text data block $D_k$ at time $k$ using the ensemble classifier model from time $k-1$, and outputting the prediction result; 3, performing feature selection on the text feature set of the text data block $D_k$ to obtain a dimension-reduced text feature set $M_k$; 4, judging whether concept drift or feature drift has occurred, according to the cosine similarity between the class label spaces of the text data block $D_k$ at time $k$ and the text data block $D_{k-1}$ at time $k-1$ and the distribution difference between their dimension-reduced feature sets; 5, constructing an online sequence kernel extreme learning machine from all texts in the text data block $D_k$ according to the drift detection result, and updating the ensemble classifier model to time $k$. The method addresses the classification of multi-label text data streams that exhibit feature drift and concept drift.
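The label-space comparison in step 4 can be sketched as follows. This is only an illustration under stated assumptions: each block's label space is summarized as a vector of label counts, the helper names are hypothetical, and the `threshold` of 0.8 is an arbitrary value chosen for the example, since the abstract gives neither the exact representation nor a threshold. The feature-distribution comparison used for feature drift is not covered.

```python
import math
from collections import Counter
from typing import List, Sequence


def label_distribution(label_sets: Sequence[Sequence[str]],
                       label_space: Sequence[str]) -> List[float]:
    """Count how often each label occurs in one data block."""
    counts = Counter(label for labels in label_sets for label in labels)
    return [float(counts[label]) for label in label_space]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def concept_drift_detected(prev_labels: Sequence[Sequence[str]],
                           curr_labels: Sequence[Sequence[str]],
                           label_space: Sequence[str],
                           threshold: float = 0.8) -> bool:
    """Flag drift when the two label distributions are not similar enough.

    The threshold is a hypothetical parameter, not a value from the patent.
    """
    similarity = cosine_similarity(
        label_distribution(prev_labels, label_space),
        label_distribution(curr_labels, label_space),
    )
    return similarity < threshold


# Toy label sets for blocks D_{k-1} and D_k, invented for illustration.
label_space = ["finance", "sports"]
block_prev = [["finance"], ["finance", "sports"]]
block_same = [["finance", "sports"], ["finance"]]
block_shifted = [["sports"], ["sports"]]

stable = concept_drift_detected(block_prev, block_same, label_space)
drifted = concept_drift_detected(block_prev, block_shifted, label_space)
```

In this toy case the first pair of blocks has identical label counts, so no drift is flagged, while the second pair moves from mostly `finance` to only `sports` and falls below the assumed threshold.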

Description

Technical field

[0001] The invention belongs to the field of multi-label data stream mining in practical applications, and in particular relates to the classification of multi-label text data streams that exhibit feature drift and concept drift.

Background technique

[0002] Many practical application fields, such as e-mail classification, news feeds, medical diagnosis, and image recognition, generate massive, continuous, high-speed, dynamic data, that is, data streams. Such data are widely used in social production and everyday life, and have great research value. Real-world data streams usually have the following characteristics: (1) The data are generated quickly and at massive scale. According to Twitter statistics from 2012, the average number of daily Twitter users exceeded 200 million, and the total number of Tweets sent every day reached 400 million, with an average of more than 5,000 Tweets per second; (2) An instance object ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F16/35, G06F40/242, G06K9/62, G06N20/20
CPC: G06F16/3344, G06F16/35, G06F40/242, G06N20/20, G06F18/214
Inventor: 李培培, 邱士远, 张海翔, 胡学钢
Owner: HEFEI UNIV OF TECH