A Distributed Big Data Classification Method Based on Multivariate Decision Tree Model

A classification method using decision tree technology, applied in database models, structured data retrieval, database distribution/replication, etc., which addresses problems such as the difficulty of ensuring the purity of learning sample sets.

Active Publication Date: 2020-07-07
LIAONING TECHNICAL UNIVERSITY

AI Technical Summary

Problems solved by technology

First, traditional classification mining methods are based on a single learning sample set, whereas the distributed collection of big data means that classification learning must also be distributed, so corresponding distributed learning strategies and methods need to be studied. Second, dynamic streaming big data differs significantly from the static data stored in traditional databases: it is impossible to store all the data at once and then mine it offline, so online real-time collection techniques and incremental mining methods that adapt over time need to be explored. Finally, traditional classification mining techniques place high demands on the learning sample set, while classification mining of distributed, streaming big data requires multi-node, multi-step collaborative processing, which makes it difficult to guarantee the purity of the learning sample set; classification techniques with good robustness and performance must therefore be explored for this type of big data. Consequently, the classification mining problem for big data that is collected in a distributed manner and aggregated as data streams over time requires integrated techniques and innovative theories and methods to solve.

Examples

Embodiment Construction

[0034] In order to make the objects and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the examples. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0035] A distributed big data classification method based on a multivariate decision tree model, as shown in Figure 7, including:

[0036] Step 1. Each local node L_z (z = 1, 2, ..., R), where R is the number of local nodes, uses the integrated classifier shared by the central node G to classify and label the samples of unknown category that arrive randomly online, and stores the labeled samples whose reliability is higher than a preset threshold in the data set D_z. When the capacity of D_z exceeds a preset threshold, D_z is sent to the central node G and then cleared. (in the initial stage, stratified sampling R times ...
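The buffering behaviour described in Step 1 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the class name `LocalNode`, the callables `classify` and `send_to_central`, and the default values for `confidence_threshold` and `capacity` are all assumptions introduced here, and the shared classifier is assumed to return a (label, confidence) pair.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
Labeled = Tuple[Sample, str]


@dataclass
class LocalNode:
    """Hypothetical local node L_z; every name and default here is illustrative."""

    # Shared integrated classifier: sample -> (label, confidence). Assumed interface.
    classify: Callable[[Sample], Tuple[str, float]]
    # Delivers the buffered data set D_z to the central node G. Assumed interface.
    send_to_central: Callable[[List[Labeled]], None]
    confidence_threshold: float = 0.9  # assumed value, not from the source
    capacity: int = 3                  # assumed value, not from the source
    buffer: List[Labeled] = field(default_factory=list)  # plays the role of D_z

    def on_sample(self, x: Sample) -> str:
        """Label one incoming sample and buffer it if the label is reliable."""
        label, confidence = self.classify(x)
        if confidence > self.confidence_threshold:
            self.buffer.append((x, label))
        # Once the buffer exceeds its capacity, ship a copy to G and clear D_z.
        if len(self.buffer) > self.capacity:
            self.send_to_central(list(self.buffer))
            self.buffer.clear()
        return label
```

A node would be driven by calling `on_sample` once per arriving sample; low-confidence samples are still labeled for the caller but never reach the central node.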

Abstract

The invention discloses a distributed big data classification method based on a multivariate decision tree model. In the method, local nodes use an integrated classifier shared by a central node to classify samples of unknown category label that are randomly received online, and store the labeled samples whose reliability is higher than a preset threshold in a data set. When the capacity of the data set exceeds a preset threshold, the data set is sent to the central node and then emptied. The central node combines the data sets sent by the local nodes to generate a training sample set, which is used to train a multivariate decision tree model based on geometric outline similarity; this model serves as a base classifier and is added to the integrated classifier, which is updated periodically. The integrated classifier is shared with the local nodes, which use it to classify the streaming big data received online. By applying the multivariate decision tree model based on geometric outline similarity to the integrated classifier, the classification problem of normalized data type in distributed streaming big data is effectively solved.
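The central-node side of this loop (merge the received data sets, train a base classifier, add it to the integrated classifier, share it for classification) might look like the sketch below. It rests on several assumptions: the source does not specify the geometric-outline-similarity tree here, so a nearest-centroid learner is swapped in as a stand-in base classifier; majority voting and the vote fraction as a reliability score are also illustrative choices, as are the names `CentralNode`, `update`, and `train_nearest_centroid`.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Sequence[float]
Labeled = Tuple[Sample, str]
Classifier = Callable[[Sample], str]


def train_nearest_centroid(data: List[Labeled]) -> Classifier:
    """Stand-in base learner (NOT the patent's multivariate decision tree)."""
    sums: Dict[str, List[float]] = {}
    counts: Counter = Counter()
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x: Sample) -> str:
        # Pick the class whose centroid has the smallest squared distance to x.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


class CentralNode:
    """Hypothetical central node G holding the integrated classifier."""

    def __init__(self, trainer: Callable[[List[Labeled]], Classifier] = train_nearest_centroid):
        self.trainer = trainer
        self.ensemble: List[Classifier] = []

    def update(self, datasets: List[List[Labeled]]) -> None:
        """Merge the data sets sent by local nodes and add one new base classifier."""
        merged = [pair for d in datasets for pair in d]
        self.ensemble.append(self.trainer(merged))

    def classify(self, x: Sample) -> Tuple[str, float]:
        """Majority vote; the vote fraction is used as an assumed reliability score."""
        if not self.ensemble:
            raise ValueError("the ensemble has no base classifiers yet")
        votes = Counter(member(x) for member in self.ensemble)
        label, count = votes.most_common(1)[0]
        return label, count / len(self.ensemble)
```

Passing a different `trainer` is the intended extension point: a real multivariate decision tree learner could be dropped in without changing the merge-and-vote logic.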

Description

Technical field

[0001] The invention relates to the technical field of big data classification, in particular to a distributed big data classification method based on a multivariate decision tree model.

Background technique

[0002] Classification is one of the important tasks of data mining, and it is also a widely studied problem in related fields such as machine learning, pattern recognition and artificial intelligence. Classification has a wide range of applications in practice, including medical diagnosis, credit evaluation, selection shopping, face recognition, etc.

[0003] The rapid development of emerging information technologies and application models such as cloud computing, the Internet of Things, mobile interconnection, and social media has led to a sharp increase in the amount of global data and pushed human society into the era of big data. Big data contains big information, and big information extracts big knowledge. Big knowledge will help users improve ins...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/28, G06F16/27
Inventor: 张宇
Owner: LIAONING TECHNICAL UNIVERSITY