Method for handling missing values during data stream decision tree classification

A missing value processing technique, applied in the field of data stream decision tree classification, that addresses reduced transmission efficiency and the degraded time performance of the ARC method.

Status: Inactive
Publication Date: 2014-09-10
INST OF SOFTWARE - CHINESE ACAD OF SCI
3 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

[0005] However, the time performance of the ARC method decreases significantly when there are many characteristic attributes of the data samples, and the time performance is an importan



Example Embodiment

[0035] Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

[0036] The main flow of the inventive method is shown in Figure 1:

[0037] (1) Adaptive selection and establishment of the missing processor

[0038] The process of adaptively selecting and establishing the missing processor is shown in Figure 2. The steps are:

[0039] Step 1: Detect that attribute X_i in the current data sample has a missing value;

[0040] Step 2: Read all samples in the sliding window W that belong to the same class as the current data sample, and calculate the standard deviation σ(X_i) of attribute X_i over those same-class samples;

[0041] Step 3: Let the preset σ_m be the maximum acceptable sample standard deviation. If σ(X_i) does not exceed the threshold σ_m, go to Step 4; otherwise go to Step 5;

[0042] Step 4: Choose the mean substitution method to establish the missing processo...
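
The selection logic in Steps 2 to 4 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the sample representation (a dict of attributes plus a class label, with None marking a missing value), the function name select_handler, and the use of the population standard deviation are all assumptions. The source text is cut off before Step 5 is described, so the high-deviation branch simply returns a caller-supplied fallback handler.

```python
from statistics import mean, pstdev
from typing import Callable, List, Tuple

# Hypothetical representation: a sample is (attributes, class_label),
# and a missing attribute value is stored as None.
Sample = Tuple[dict, str]
Handler = Callable[[], float]


def select_handler(window: List[Sample], label: str, attr: str,
                   sigma_max: float, fallback: Handler) -> Handler:
    """Pick a handler for `attr` using same-class samples in the window."""
    # Step 2: collect observed values of the attribute among same-class samples.
    values = [attrs[attr] for attrs, cls in window
              if cls == label and attrs.get(attr) is not None]
    if not values:
        # Nothing to estimate from (this case is an assumption, not in the source).
        return fallback
    # Population standard deviation is an assumption; the source only says
    # "standard deviation".
    sigma = pstdev(values)
    # Step 3: compare against the preset maximum acceptable deviation.
    if sigma <= sigma_max:
        fill = mean(values)  # Step 4: mean substitution
        return lambda: fill
    # Step 5 is truncated in the source, so the caller supplies that strategy.
    return fallback


window = [({"x": 1.0}, "a"), ({"x": 3.0}, "a"),
          ({"x": 100.0}, "b"), ({"x": None}, "a")]
handler = select_handler(window, "a", "x", sigma_max=2.0, fallback=lambda: 0.0)
print(handler())  # 2.0
```

Here the two observed class "a" values have a standard deviation of 1.0, which is within the threshold of 2.0, so the mean 2.0 is used as the fill value.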


Abstract

The invention belongs to the technical field of data stream mining and relates in particular to a method for handling missing values during data stream decision tree classification. The method reads data samples from the data stream and updates a sliding window. When an attribute in the current data sample is detected to have a missing value, the method updates the missing handler for that attribute if one already exists; otherwise it adaptively selects and creates a missing handler according to the characteristics of the data. The missing handlers then fill in the missing values to produce complete data samples, which are used for training according to the Hoeffding decision tree classification process, and the data stream decision tree classification results are returned. Compared with existing methods, the method offers better time performance while adequately guaranteeing the classification accuracy of the decision tree model. It thereby reduces time overhead, speeds up data stream classification, and meets the requirements of practical data stream processing applications.
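
The overall loop described in the abstract might look something like the sketch below. Everything named here is hypothetical: process_stream, the create/update/learn callbacks, and the convention that a handler is a zero-argument callable returning a fill value are illustrative assumptions. The Hoeffding tree itself is not implemented; learn stands in for one training step on a completed sample.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Iterable, Tuple

# Hypothetical representation: (attributes, class label); None marks a missing value.
Sample = Tuple[dict, str]


def process_stream(stream: Iterable[Sample], window_size: int,
                   create: Callable[..., Any], update: Callable[..., Any],
                   learn: Callable[[dict, str], None]) -> None:
    window: Deque[Sample] = deque(maxlen=window_size)  # sliding window W
    handlers: Dict[str, Any] = {}  # one missing handler per attribute
    for attrs, label in stream:
        window.append((attrs, label))  # read the sample, update the window
        completed = dict(attrs)
        for attr, value in attrs.items():
            if value is not None:
                continue
            if attr in handlers:
                # A handler already exists for this attribute: update it.
                handlers[attr] = update(handlers[attr], window, label, attr)
            else:
                # No handler yet: select and create one from the data.
                handlers[attr] = create(window, label, attr)
            completed[attr] = handlers[attr]()  # fill in the missing value
        learn(completed, label)  # stand-in for a Hoeffding tree training step


def mean_of_class(window, label, attr):
    """Toy handler factory: mean of observed same-class values in the window."""
    vals = [a[attr] for a, c in window if c == label and a[attr] is not None]
    fill = sum(vals) / len(vals) if vals else 0.0
    return lambda: fill


seen = []
stream = [({"x": 2.0}, "a"), ({"x": 4.0}, "a"), ({"x": None}, "a")]
process_stream(stream, 10, create=mean_of_class,
               update=lambda h, w, l, a: mean_of_class(w, l, a),
               learn=lambda attrs, label: seen.append((attrs, label)))
print(seen[-1])  # ({'x': 3.0}, 'a')
```

Passing the handler factory and the learner in as callbacks keeps the loop independent of any particular imputation strategy or tree library.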

Description

Technical field

[0001] The invention belongs to the technical field of data stream mining, and in particular relates to a method for processing missing values in data stream decision tree classification.

Background

[0002] With the advent of the era of big data, application systems generate data streams continuously and at high speed. How to mine useful information from data streams has become a focus of technical research. Data stream decision tree classification is an important research direction in data stream mining and can be applied in many areas, such as network intrusion detection and credit card fraud detection. Real data streams contain missing values caused by network transmission failures, sensor failures, or manual operation errors. In data stream decision tree classification, missing values in the data stream can seriously affect classification accuracy. However, the data stream can only be scanned once during the mining...


Application Information

IPC(8): G06F9/44
Inventors: 吕品, 侯旭珊
Owner: INST OF SOFTWARE - CHINESE ACAD OF SCI