High-dimensional multi-label data stream classification method based on an online sequential kernel extreme learning machine

The method applies a kernel extreme learning machine and multi-label techniques to the classification of multi-label data streams. It addresses the difficulty of detecting feature drift and concept drift, the difficulty of obtaining data, and the low accuracy of existing classification algorithms.

CN112579741A | Active | Publication Date: 2021-03-30 | HEFEI UNIV OF TECH

Embodiment Construction

[0056] In this embodiment, as shown in Figure 1, a high-dimensional multi-label data stream classification method based on an online sequential kernel extreme learning machine can be used to classify online news in browsers. It assigns appropriate tags to large volumes of varied news, which facilitates user retrieval, and it can also support prediction tasks for readers and news recommendation in browsers. Specifically, the method is carried out according to the following steps:

[0057] Step 1: Construct a BoW (bag-of-words) model from an external corpus, and use a sliding-window mechanism to divide the multi-label text data stream into data blocks, which are then vectorized:

[0058] Step 1.1: Given a multi-label text data stream $D=\{d_1, d_2, \ldots, d_m, \ldots, d_{|D|}\}$, $m=1, 2, \ldots, |D|$, where $|D|$ denotes the total number of texts in $D$ and $d_m$ denotes the $m$-th text in $D$, with $d_m=\{W_m, V_m\}$; $W_m$ and $V_m$ respectively represent the mth te...
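Step 1 can be illustrated with a minimal sketch. It is not the patent's implementation: the function names `build_vocabulary`, `sliding_blocks` and `vectorize` are hypothetical, the window is assumed to be a fixed-size, non-overlapping one, tokenization is assumed to be plain whitespace splitting, and the vectors are assumed to hold raw term counts.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def build_vocabulary(corpus: Iterable[str],
                     max_features: Optional[int] = None) -> Dict[str, int]:
    """Build a BoW vocabulary (token -> column index) from an external corpus.

    Assumption: tokens are lower-cased, whitespace-separated words.
    """
    counts = Counter(tok for doc in corpus for tok in doc.lower().split())
    # Sort by descending frequency, then alphabetically, so the index is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if max_features is not None:
        ranked = ranked[:max_features]
    return {tok: idx for idx, (tok, _) in enumerate(ranked)}


def sliding_blocks(stream: Iterable[T], block_size: int) -> Iterator[List[T]]:
    """Cut a stream into consecutive data blocks of at most block_size items.

    Assumption: the window advances by a full block, so blocks do not overlap.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    block: List[T] = []
    for item in stream:
        block.append(item)
        if len(block) == block_size:
            yield block
            block = []
    if block:  # emit the final, possibly shorter, block
        yield block


def vectorize(tokens: Iterable[str], vocab: Dict[str, int]) -> List[int]:
    """Map one tokenized text to a term-count vector over the vocabulary."""
    vec = [0] * len(vocab)
    for tok in tokens:
        idx = vocab.get(tok)
        if idx is not None:  # out-of-vocabulary tokens are ignored
            vec[idx] += 1
    return vec
```

In use, each block yielded by `sliding_blocks` would be vectorized text by text with the shared vocabulary, which gives every block the same feature dimensionality before classification.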

Abstract

The invention discloses a multi-label text data stream classification method based on an online sequential kernel extreme learning machine. The method comprises the following steps: 1, constructing a BoW model from an external corpus and using a sliding-window mechanism to divide the multi-label text data stream into data blocks, which are then vectorized; 2, predicting the text data block $D_k$ at time $k$ with the ensemble classifier model from time $k-1$, and outputting the prediction result; 3, performing feature selection on the text feature set of $D_k$ to obtain a dimension-reduced text feature set $M_k$; 4, judging whether concept drift or feature drift has occurred according to the cosine similarity between the class label spaces of the text data block $D_k$ at time $k$ and the text data block $D_{k-1}$ at time $k-1$, and according to the distribution difference between their dimension-reduced feature sets; 5, depending on the drift detection result, constructing an online sequential kernel extreme learning machine from all texts in $D_k$ and updating the ensemble classifier to obtain the model at time $k$. The method thereby addresses the classification of multi-label text data streams that exhibit feature drift and concept drift.
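The drift check in step 4 can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the abstract does not give thresholds, does not say how the label space is encoded, and does not name the distribution-difference measure. Here the label space of a block is assumed to be a label-frequency vector, Jaccard distance between the selected feature sets is used as a stand-in for the distribution difference, low label similarity is assumed to signal concept drift, and the names `detect_drift`, `sim_threshold` and `dist_threshold` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set


def label_frequency(block_labels: Iterable[Set[str]],
                    label_space: Sequence[str]) -> List[int]:
    """Count how often each label occurs in one data block (assumed encoding)."""
    counts = Counter(label for labels in block_labels for label in labels)
    return [counts.get(label, 0) for label in label_space]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_distance(a: Iterable[str], b: Iterable[str]) -> float:
    """Stand-in for the feature-set distribution difference (an assumption)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def detect_drift(prev_labels: Iterable[Set[str]],
                 curr_labels: Iterable[Set[str]],
                 prev_features: Iterable[str],
                 curr_features: Iterable[str],
                 label_space: Sequence[str],
                 sim_threshold: float = 0.8,    # hypothetical value
                 dist_threshold: float = 0.5,   # hypothetical value
                 ) -> Dict[str, bool]:
    """Flag concept drift and feature drift between consecutive blocks."""
    sim = cosine_similarity(label_frequency(prev_labels, label_space),
                            label_frequency(curr_labels, label_space))
    dist = jaccard_distance(prev_features, curr_features)
    return {"concept_drift": sim < sim_threshold,
            "feature_drift": dist > dist_threshold}
```

The two flags would then decide, in step 5, how the classifier built on the current block is folded into the ensemble; that update rule is not specified in the abstract and is not sketched here.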

Description

Technical field

[0001] The invention belongs to the field of multi-label data stream mining in practical applications, and in particular relates to the classification of multi-label text data streams with feature drift and concept drift.

Background

[0002] In many practical application fields, such as e-mail classification, news feeds, medical diagnosis and image recognition, massive, continuous, high-speed and dynamic data, namely data streams, are generated. Such data streams are widely used in production and everyday life and have great research value. Data streams in the real world usually have the following characteristics: (1) The data is generated quickly and at massive scale. According to Twitter statistics from 2012, the average number of daily Twitter users exceeded 200 million, and the total number of Tweets sent per day reached 400 million, which is on the order of 5,000 Tweets per second; (2) An instance object ...

Application Information

Patent Timeline
30 Mar 2021: Publication of CN112579741A

IPC: G06F16/33; G06F16/35; G06F40/242; G06K9/62; G06N20/20
CPC: G06F16/3344; G06F16/35; G06F40/242; G06N20/20; G06F18/214
Inventors: 李培培; 邱士远