Improved parallel deep forest classification method based on information theory

A classification method based on information theory, applied in the field of parallel deep forest classification. It addresses the failure of existing methods to account for redundant and irrelevant features in large data sets and for imbalance in multi-granularity scanning, and achieves the effect of improving clustering accuracy.

Active Publication Date: 2021-04-20
北京中科新天科技有限公司

AI Technical Summary

Problems solved by technology

However, the existing algorithm still has three deficiencies: it does not account for redundant and irrelevant features in large data sets, it suffers from imbalance in multi-granularity scanning, and its parallelization efficiency can be further improved.


Examples

Embodiment 1

[0118] In this embodiment, medical images are used to further illustrate the present invention. Parallel deep forest classification of medical images can quickly eliminate irrelevant features, train on the samples over multiple layers, and form a refined classification model. This helps reduce doctors' learning costs and quickly improve their experience and practical skill; it also helps share the medical workload and raise the overall level of medical care.

[0119] Let X denote the medical image feature matrix, holding n samples in the d-dimensional feature space of the original medical image dataset DB, together with the medical image label vector corresponding to X.

[0120] S1. First, use the default file block strategy in Hadoop to divide the feature space of the original medical image data set into data blocks of equal size. Each data block is then used as input data. According to Definition 1, the Mapper node calls the Map function to compute the information gain of each featu...
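As an illustration of step S1, the following is a minimal single-process sketch of computing per-feature information gain from equal-sized data blocks in a map/reduce style. Definition 1 is not reproduced in this excerpt, so the sketch assumes the standard definition IG(f) = H(Y) - H(Y | f) for discrete feature values. The function names (`map_block`, `reduce_counts`, `information_gain`) and the toy data are hypothetical and do not come from the patent or from Hadoop.

```python
import math
from collections import Counter


def map_block(rows, labels):
    """Mapper (assumed form): count (feature index, value, label) triples in one block."""
    counts = Counter()
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            counts[(j, v, y)] += 1
    return counts


def reduce_counts(partials):
    """Reducer (assumed form): merge the per-block counts into global counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def entropy(freqs):
    """Shannon entropy in bits of a collection of non-negative counts."""
    freqs = [f for f in freqs if f > 0]
    n = sum(freqs)
    return -sum(f / n * math.log2(f / n) for f in freqs)


def information_gain(total, n_features):
    """Standard information gain H(Y) - H(Y | feature) for each feature."""
    gains = []
    for j in range(n_features):
        label_counts = Counter()
        by_value = {}
        for (fj, v, y), c in total.items():
            if fj != j:
                continue
            label_counts[y] += c
            by_value.setdefault(v, Counter())[y] += c
        n = sum(label_counts.values())
        h_y = entropy(label_counts.values())
        h_cond = sum(
            sum(cy.values()) / n * entropy(cy.values()) for cy in by_value.values()
        )
        gains.append(h_y - h_cond)
    return gains


# Toy data split into two equal-sized blocks; feature 0 determines the label,
# feature 1 is independent of it.
blocks = [
    ([[0, 0], [0, 1]], [0, 0]),
    ([[1, 0], [1, 1]], [1, 1]),
]
total = reduce_counts(map_block(rows, labels) for rows, labels in blocks)
gains = information_gain(total, n_features=2)
```

In this toy case the informative feature receives a gain of 1 bit and the irrelevant one a gain of 0, which is the kind of ranking a dimension reduction step could use to discard irrelevant features.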

Abstract

The invention provides an improved parallel deep forest classification method based on information theory. First, the algorithm designs a hybrid dimension reduction strategy based on information theory to obtain a dimension-reduced data set, effectively reducing the number of redundant and irrelevant features. Second, an improved multi-granularity scanning strategy is used to scan the samples, guaranteeing that all features appear with the same frequency in the data subsets after scanning and avoiding the influence of multi-granularity scanning imbalance. Finally, in combination with the MapReduce framework, the random forests in each layer of the deep forest model's cascade structure are trained in parallel. In addition, a sample weighting strategy is proposed: according to how the random forests in the cascade evaluate each sample, samples with relatively poor evaluation results are selected to enter the next layer of training, which reduces the number of samples in each layer and improves the parallel efficiency of the algorithm. The method is simple in principle and easy to implement, significantly improves operating efficiency and clustering accuracy, and can also be of great help in biology, medicine, and astrogeography.
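The scanning imbalance mentioned in the abstract can be made concrete with a small sketch. A plain sliding window covers features near the edges of the feature vector less often than features in the middle. The abstract does not say how the improved strategy equalizes the frequencies, so the wrap-around (circular) window below is only one assumed way to obtain equal frequencies, not the patent's method. The function names are hypothetical.

```python
def sliding_windows(d, w):
    """Plain scan: windows of width w starting at positions 0 .. d - w."""
    return [list(range(s, s + w)) for s in range(d - w + 1)]


def circular_windows(d, w):
    """Assumed alternative: windows wrap around the end of the feature vector."""
    return [[(s + k) % d for k in range(w)] for s in range(d)]


def feature_frequency(windows, d):
    """Count how many windows each of the d features appears in."""
    freq = [0] * d
    for window in windows:
        for j in window:
            freq[j] += 1
    return freq


d, w = 5, 3
plain = feature_frequency(sliding_windows(d, w), d)
balanced = feature_frequency(circular_windows(d, w), d)
print(plain)     # [1, 2, 3, 2, 1]
print(balanced)  # [3, 3, 3, 3, 3]
```

With the plain scan the first and last features each appear in one window while the middle feature appears in three; with the wrap-around scan every feature appears in exactly w windows.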

Description

Technical field

[0001] The invention relates to the field of big data mining, and in particular to an improved parallel deep forest classification method based on information theory.

Background

[0002] In recent years, deep learning technology has developed rapidly thanks to abundant computing power. By training on large amounts of data, it learns human cognition and behavior patterns and can thereby partially or completely replace repetitive mechanical human labor. Today, common deep learning algorithms are based on deep neural networks. As a supervised learning algorithm, a deep neural network feeds back its calculation errors through backpropagation during training, which gives it the characteristics of self-organizing learning. Although deep neural networks are widely used in many fields because of their powerful learning ability, training the model requires a large amount of data, and its learning performance is heavily dependent on the adjustment of...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N20/00, G16H30/00
Inventor: 毛伊敏, 耿俊豪
Owner: 北京中科新天科技有限公司