Cloud platform decision forest classification method based on discrete weak correlation

A classification method for cloud platforms, applied in the field of cloud computing. It addresses unsatisfactory space efficiency and classification accuracy by reducing time and space overhead, improving classification quality, and strengthening resistance to data noise.

Active Publication Date: 2015-06-03
武汉理工数字传播工程有限公司

AI Technical Summary

Problems solved by technology

But the space efficiency and classif


Embodiment Construction

[0018] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0019] A cloud platform decision forest classification method based on discrete weak correlation (Figure 1) includes the following steps:

[0020] Step S1: generate a description file that makes the cloud platform decision forest optimal. The description file includes the optimal total number of decision trees and the new dataset for each decision tree;

[0021] The optimal total number of decision trees is obtained as follows: multiply the number of Data_Node nodes in the Hadoop cloud platform by the number of Reduce tasks configured uniformly on each node, then divide that product by 2 times the square root of m. The result is the optimal total number of decision trees in the forest, chosen so that after reduction each Reduce task can calculate the entropy of one attribute independently; where m is the value...
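The calculation in paragraph [0021] can be sketched as below. This is only one reading of the translated text, assuming the count is (Data_Node nodes × Reduce tasks per node) / (2 × √m). The function and parameter names are hypothetical, the definition of m is truncated in the source and is therefore left as a plain input, and the source does not say how a fractional result should be rounded, so the sketch returns the raw value.

```python
import math


def optimal_tree_count(data_nodes: int, reduce_tasks_per_node: int, m: float) -> float:
    """Return (data_nodes * reduce_tasks_per_node) / (2 * sqrt(m)).

    Assumption: this follows one reading of paragraph [0021].
    The meaning of m is elided in the source, so it is passed in as-is.
    """
    if data_nodes <= 0 or reduce_tasks_per_node <= 0:
        raise ValueError("node and task counts must be positive")
    if m <= 0:
        raise ValueError("m must be positive")
    total_reduce_tasks = data_nodes * reduce_tasks_per_node
    return total_reduce_tasks / (2 * math.sqrt(m))


if __name__ == "__main__":
    # Hypothetical cluster: 8 Data_Node nodes, 4 Reduce tasks each, m = 16.
    print(optimal_tree_count(8, 4, 16))
```

Returning an unrounded value keeps the sketch neutral on whether the method rounds up, down, or to the nearest integer.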


Abstract

The invention discloses a cloud platform decision forest classification method based on discrete weak correlation, which addresses weak correlation in decision forest classification on a cloud platform. The method selects randomly sampled attribute tuples according to the degree of correlation among data attributes, uses gain to update the probabilistically discretized continuous attributes of the attribute group, and finds the maximum-gain attribute; finally, it builds the cloud platform decision forest from the resulting classification attribute sequence. When a large amount of data is processed, the method reduces the time and space overhead of building the cloud platform decision forest, improves robustness and stability against data noise, and improves both classification prediction speed and classification quality.

Description

Technical field

[0001] The invention relates to the field of cloud computing, in particular to a cloud platform decision forest classification method based on discrete weak correlation.

Background

[0002] Random splitting builds multiple decision trees, and the final prediction is obtained by voting. A random forest is a classifier that combines many decision trees. If a single decision tree is regarded as an expert on a classification task, a random forest is many experts working together on that task.

[0003] With the advent of the big data era, data keeps growing in both scale and attribute dimensionality. The traditional random forest classification method cannot handle massive data effectively and cannot complete classification prediction quickly and efficiently. Therefore, for massive and high-dimensional data, many scholars have proposed distributed random fo...


Application Information

IPC(8): G06F17/30
Inventor: 袁景凌, 陈旻骋, 刘永坚, 杨光
Owner 武汉理工数字传播工程有限公司