Streaming data classification method based on decision tree

A decision-tree-based classification method in the field of data classification. It addresses the large space and resource consumption and the complexity of existing approaches, which reduce the efficiency of stream data processing, and it improves both efficiency and accuracy.

Status: Inactive; Publication Date: 2019-07-12
NORTHEASTERN UNIV
Cites: 0; Cited by: 9

AI Technical Summary

Problems solved by technology

In fact, this assumption is difficult to satisfy in many practical applications. Moreover, as time passes, the underlying concepts in the data stream change (a phenomenon known as concept drift).
Many drift-detection algorithms are complex and cumbersome and consume considerable space and resources. In addition, in sliding-window-based stream processing algorithms the window size is either fixed or changes only with concept drift; it does not take the flow characteristics of the data stream itself into account. As a result, when the data arrives unusually fast or slowly it cannot be processed in time, which reduces the efficiency of stream data processing.

Detailed Description of the Embodiments

[0026] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples illustrate the present invention but are not intended to limit its scope.

[0027] In this embodiment, experiments are carried out in the Eclipse and Weka development environments, using synthetic data generated with the Weka tool for the simulation.

[0028] The decision tree is a classic stream data classification model. When the classification model is built, a decision tree serves as the base classifier, and an ensemble classifier composed of multiple decision trees is built from labeled training data. The current classifier then classifies the incoming stream data: an initial window of data to be classified is formed according to the sliding window size, the current classification model classifies the data in the window, a...
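Because paragraph [0028] is truncated, the following is only a minimal sketch of the ensemble-plus-window idea it describes, built on several assumptions that are not taken from the patent: each base learner is a depth-1 decision tree (a decision stump) rather than a full tree, one tree is trained per window-sized chunk of labeled data, the ensemble predicts by majority vote and keeps at most `max_trees` trees, and labels for buffered records are available by the time the buffer (standing in for the data container Wintmp named in the abstract) fills up. The names `DecisionStump`, `StreamEnsemble`, `win_tmp`, and `max_trees` are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (feature vector, class label)


class DecisionStump:
    """Depth-1 decision tree: one feature test, one class label per branch."""

    def fit(self, rows: List[Row]) -> "DecisionStump":
        best = None
        for f in range(len(rows[0][0])):
            for threshold in sorted({x[f] for x, _ in rows}):
                left = [y for x, y in rows if x[f] <= threshold]
                right = [y for x, y in rows if x[f] > threshold]
                left_label = Counter(left).most_common(1)[0][0]
                right_label = Counter(right).most_common(1)[0][0] if right else left_label
                errors = (sum(y != left_label for y in left)
                          + sum(y != right_label for y in right))
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, left_label, right_label)
        _, self.feature, self.threshold, self.left, self.right = best
        return self

    def predict(self, x: Sequence[float]) -> int:
        return self.left if x[self.feature] <= self.threshold else self.right


class StreamEnsemble:
    """Majority-vote ensemble of stumps with a Wintmp-style buffer (hypothetical)."""

    def __init__(self, window_size: int, max_trees: int = 5):
        self.window_size = window_size
        # Oldest tree drops out automatically once max_trees is reached.
        self.trees: Deque[DecisionStump] = deque(maxlen=max_trees)
        self.win_tmp: List[Row] = []  # plays the role of the Wintmp container

    def train_initial(self, labeled: List[Row]) -> None:
        # Assumption: one base tree per window-sized chunk of labeled training data.
        for i in range(0, len(labeled), self.window_size):
            self.trees.append(DecisionStump().fit(labeled[i:i + self.window_size]))

    def classify(self, x: Sequence[float]) -> int:
        votes = Counter(tree.predict(x) for tree in self.trees)
        return votes.most_common(1)[0][0]

    def process(self, row: Row) -> int:
        """Classify one arriving record, then update the ensemble once the
        buffer holds a full window (labels assumed available by then)."""
        x, _ = row
        prediction = self.classify(x)
        self.win_tmp.append(row)
        if len(self.win_tmp) >= self.window_size:
            self.trees.append(DecisionStump().fit(self.win_tmp))
            self.win_tmp = []
        return prediction
```

In this sketch, adding one new tree per full window and letting the oldest one drop out keeps each update cheap compared with retraining the whole ensemble, which matters when records arrive faster than a full retrain could finish.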

Abstract

The invention provides a streaming data classification method based on a decision tree, and relates to the technical field of data classification. The method comprises the following steps: step 1, constructing a classifier; step 2, classifying the data to be classified with the initial ensemble classification model to obtain a classification result set, and updating the current ensemble classification model when the amount of data in the data container Wintmp reaches the sliding window size; step 3, observing the distribution of the data in the classification result set within the window and using that distribution as the criterion for judging whether concept drift has occurred, thereby completing concept drift detection; step 4, acquiring historical data, determining the pattern by which the data volume rises and falls within one day, and deriving from this pattern the data volume in a preset time period; and step 5, expanding or reducing the data window according to the concept drift detection result and the preset data size. The method improves data classification accuracy, allows data to be processed in time, and improves data classification efficiency.
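Steps 3 to 5 of the abstract can be illustrated with a minimal sketch. The abstract does not specify the distribution measure, the granularity of the daily volume pattern, or the resizing rule, so all three are assumptions here: drift is flagged when the total variation distance between the predicted-class distributions of two consecutive windows exceeds a hypothetical threshold of 0.2, the daily pattern is an average record count per hour of day, and the window is halved on drift and otherwise set to the predicted volume, clamped to hypothetical bounds. All function names are illustrative.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List


def label_distribution(predictions: List[int]) -> Dict[int, float]:
    """Step 3: share of each predicted class in one window (window assumed non-empty)."""
    counts = Counter(predictions)
    total = len(predictions)
    return {label: n / total for label, n in counts.items()}


def drift_detected(prev: List[int], curr: List[int], threshold: float = 0.2) -> bool:
    """Step 3: flag drift when the total variation distance between the
    predicted-class distributions of two windows exceeds `threshold`
    (both the measure and the 0.2 default are assumptions)."""
    p, q = label_distribution(prev), label_distribution(curr)
    labels = set(p) | set(q)
    tvd = 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in labels)
    return tvd > threshold


def hourly_volume_profile(timestamps: Iterable[datetime]) -> Dict[int, float]:
    """Step 4: average number of records per hour of day over the history
    (hourly granularity is an assumption)."""
    per_day_hour = Counter((t.date(), t.hour) for t in timestamps)
    days = {day for day, _ in per_day_hour}
    totals: Dict[int, int] = defaultdict(int)
    for (_, hour), n in per_day_hour.items():
        totals[hour] += n
    n_days = max(len(days), 1)  # avoid dividing by zero on an empty history
    return {hour: totals[hour] / n_days for hour in range(24)}


def resize_window(size: int, drift: bool, expected_volume: float,
                  min_size: int = 50, max_size: int = 5000) -> int:
    """Step 5 (hypothetical rule): halve the window on drift so outdated data
    leaves quickly; otherwise size it to the volume expected in the coming
    period. The result is clamped to [min_size, max_size]."""
    new_size = size // 2 if drift else round(expected_volume)
    return max(min_size, min(max_size, new_size))
```

A call such as `resize_window(current_size, drift_detected(prev_preds, curr_preds), profile[next_hour])` ties the three steps together; shrinking on drift and otherwise tracking the expected arrival volume is one plausible reading of "expansion or reduction operation", not necessarily the patent's exact rule.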

Description

Technical Field

[0001] The invention relates to the technical field of data classification, in particular to a streaming data classification method based on a decision tree.

Background Art

[0002] With the rise and rapid development of the Internet, sensors, and the Internet of Things, large amounts of streaming data have been generated. This streaming data has attracted much attention because of its high research and commercial value. Such data must be processed and analyzed incrementally, in time order, within a sliding time window (including classification, association rule mining, and similar tasks) to extract useful information that can then guide scientific decision-making.

[0003] Streaming data is generated in real time, arrives quickly, is large in volume, and is difficult to acquire repeatedly. If traditional classification mining models and algorithms are still used for processing, a la...
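The sliding time window mentioned in [0002] can be illustrated with a minimal sketch: as each new record arrives, records older than a fixed time span are evicted, so any incremental analysis only sees recent data. The class name `TimeSlidingWindow` and the assumption that records arrive in timestamp order are illustrative and not taken from the patent.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Any, Deque, List, Tuple


class TimeSlidingWindow:
    """Keeps only the records whose timestamps lie within the last `span`."""

    def __init__(self, span: timedelta):
        self.span = span
        # (timestamp, record) pairs, oldest on the left
        self.items: Deque[Tuple[datetime, Any]] = deque()

    def add(self, ts: datetime, record: Any) -> None:
        self.items.append((ts, record))
        # Evict records that have slid out of the window; assumes arrivals
        # come in non-decreasing timestamp order.
        cutoff = ts - self.span
        while self.items and self.items[0][0] <= cutoff:
            self.items.popleft()

    def records(self) -> List[Any]:
        return [r for _, r in self.items]
```

Because eviction happens from the left of a deque, each record is added and removed once, so maintaining the window costs amortized constant time per arrival.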

Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F16/906
CPC: G06F16/906
Inventors: 张莉 (Zhang Li), 马晶莹 (Ma Jingying), 杨广明 (Yang Guangming)
Owner: NORTHEASTERN UNIV