
A Random Forest Data Processing Method Based on Attribute Subspace Weighting

A random forest data processing technology, applied in the field of data processing, that improves modeling efficiency.

Active Publication Date: 2017-11-24
SHENZHEN INST OF ADVANCED TECH
Cites: 2 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0017] In view of this, the purpose of the present invention is to provide an attribute-subspace-weighted random forest data processing method that solves the problem of effectively processing ultra-high-dimensional big data.



Examples


Embodiment Construction

[0044] To enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0045] The invention discloses a random forest data processing method with attribute subspace weighting to solve the problem of effectively processing ultra-high-dimensional big data. Its main parts include:

[0046] 1) When establishing a decision tree node, the method of attribute subspace weighting is used to improve the selection r...


Abstract

The invention discloses an attribute-subspace-weighted random forest data processing method. The method includes: S1. From the data sample set to be trained, draw N sample subsets by sampling with replacement, where N equals the number of decision trees to be built; S2. Build an unpruned decision tree model for each sample subset. When constructing a node of the decision tree model, first weight all attributes participating in the node using information gain, then select the M attributes with the highest weights to participate in node construction; S3. Merge the N constructed decision tree models into one large random forest model. The invention uses information gain for attribute subspace weighting, so that useful information can be extracted, thereby improving classification accuracy.
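Steps S1 to S3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes categorical attributes, uses only the Python standard library, and every function name (`information_gain`, `top_m_attributes`, `build_tree`, `build_forest`, `predict_forest`) is hypothetical. The abstract does not say how the split is chosen among the M selected attributes or how the forest predicts, so the sketch assumes gain ratio for the split and majority voting for prediction.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain(rows, labels, attr):
    """Information gain of splitting on the categorical attribute at index `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_m_attributes(rows, labels, attrs, m):
    """S2 weighting: score each attribute by information gain, keep the M best."""
    return sorted(attrs, key=lambda a: information_gain(rows, labels, a), reverse=True)[:m]


def gain_ratio(rows, labels, attr):
    """Assumed split criterion (not stated in the abstract): gain / split info."""
    split_info = entropy([row[attr] for row in rows])
    return information_gain(rows, labels, attr) / split_info if split_info else 0.0


def build_tree(rows, labels, attrs, m):
    """Grow an unpruned tree; each node only considers its top-M weighted attributes."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf: pure node or no attributes left
    candidates = top_m_attributes(rows, labels, attrs, m)
    best = max(candidates, key=lambda a: gain_ratio(rows, labels, a))
    if information_gain(rows, labels, best) <= 1e-12:
        return majority  # leaf: no informative split available
    parts = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = parts.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    rest = [a for a in attrs if a != best]
    children = {v: build_tree(r, l, rest, m) for v, (r, l) in parts.items()}
    return {"attr": best, "default": majority, "children": children}


def predict_tree(node, row):
    """Walk the tree; unseen attribute values fall back to the node's majority class."""
    while isinstance(node, dict):
        node = node["children"].get(row[node["attr"]], node["default"])
    return node


def build_forest(rows, labels, n_trees, m, seed=0):
    """S1: draw N bootstrap subsets; S2: one tree per subset; S3: collect the trees."""
    rng = random.Random(seed)
    attrs = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sampling with replacement
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], attrs, m))
    return forest


def predict_forest(forest, row):
    """Assumed prediction rule: majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Toy data: attribute 0 determines the class, attribute 1 is noise.
toy_rows = [("x", 0), ("x", 1), ("y", 0), ("y", 1)] * 5
toy_labels = ["a", "a", "b", "b"] * 5
toy_forest = build_forest(toy_rows, toy_labels, n_trees=15, m=1)
print(predict_forest(toy_forest, ("x", 1)))
```

With M smaller than the total number of attributes, low-gain attributes are filtered out before the split is chosen, which is the intended effect of the weighting step on very high-dimensional data.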

Description

Technical field

[0001] The invention relates to the technical field of data processing, and in particular to an attribute-subspace-weighted random forest data processing method.

Background technique

[0002] With the continuous development of computers, the Internet, and information technology, and their widespread use in all walks of life, the data that people accumulate has become larger and more complex. For example, the attribute dimensionality of biological information data, Internet text data, digital image data, and similar data can reach tens of thousands, and the amount of data is still increasing. This makes it difficult for traditional data mining classification algorithms to cope with the challenges of ultra-high dimensionality and ever-increasing computational load.

[0003] The random forest algorithm is an ensemble learning method for classification. It uses decision trees as sub-classifiers. Compared with other classification algorithms, it has t...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06F9/38
CPC: G06F18/24323
Inventor: 赵鹤, 黄哲学, 姜青山, 吴胤旭, 陈会
Owner: SHENZHEN INST OF ADVANCED TECH