Improved parallel deep forest classification method based on information theory

A parallel deep forest classification method improved with information theory. It addresses two problems of existing approaches, namely that they do not account for redundant and irrelevant features in large data sets and that multi-granularity scanning is imbalanced, and it achieves the effect of improving classification accuracy.
CN112686313A · Active · Publication Date: 2021-04-20 · 北京中科新天科技有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
北京中科新天科技有限公司
Publication Date
2021-04-20


Abstract

The invention provides an improved parallel deep forest classification method based on information theory. First, a hybrid dimensionality reduction strategy based on information theory produces a reduced data set, effectively lowering the number of redundant and irrelevant features. Second, an improved multi-granularity scanning strategy scans the samples so that every feature appears with the same frequency in the resulting data subsets, which avoids the effect of multi-granularity scanning imbalance. Finally, using the MapReduce framework, the random forests in each layer of the deep forest model's cascade structure are trained in parallel. In addition, a sample weighting strategy is proposed: based on how the random forests in the cascade evaluate each sample, only samples with relatively poor evaluation results are passed to the next layer for training, which reduces the number of samples in that layer and improves the parallel efficiency of the algorithm. The method is simple in principle and easy to implement, markedly improves running efficiency and classification accuracy, and can also be of considerable help in biology, medicine, astronomy, and geography.
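The scanning imbalance mentioned in the abstract can be illustrated with a small sketch. With an ordinary sliding window, features near the edges of the feature vector fall into fewer windows than features in the middle. The abstract does not say how the improved strategy equalizes the frequencies, so the wrap-around window below (`circular_windows`) is only one hypothetical way to get equal counts, not the patented procedure. All function names are illustrative.

```python
from collections import Counter


def sliding_windows(n_features, window, stride=1):
    """Standard scan: only windows that fit entirely inside the feature vector."""
    return [list(range(start, start + window))
            for start in range(0, n_features - window + 1, stride)]


def circular_windows(n_features, window):
    """Hypothetical balanced scan: stride-1 windows that wrap around the end.

    This is an assumption made for illustration; the source does not
    describe the actual balancing mechanism.
    """
    return [[(start + offset) % n_features for offset in range(window)]
            for start in range(n_features)]


def feature_frequency(windows):
    """Count how many windows each feature index appears in."""
    return Counter(index for w in windows for index in w)


standard = feature_frequency(sliding_windows(n_features=6, window=3))
balanced = feature_frequency(circular_windows(n_features=6, window=3))

print(sorted(standard.items()))  # edge features are under-represented
print(sorted(balanced.items()))  # every feature appears `window` times
```

In the standard scan of 6 features with a window of 3, features 0 and 5 each appear in one window while features 2 and 3 each appear in three. In the wrap-around variant, every feature appears in exactly three windows.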

Description

Technical field

[0001] The invention relates to the field of big data mining, in particular to an improved parallel deep forest classification method based on information theory.

Background

[0002] In recent years, deep learning has developed rapidly thanks to abundant computing power. By training on large amounts of data, it learns human cognitive and behavioral patterns and can thereby partially or completely replace repetitive, mechanical human labor. Today, common deep learning algorithms are based on deep neural networks. As supervised learning algorithms, deep neural networks feed computed errors back through the network during training by backpropagation, which gives them self-organizing learning characteristics. Although deep neural networks are widely used in many fields because of their powerful learning ability, training the model requires a large amount of data, and its learning performance is heavily dependent on the adjustment of...

Claims
