The invention discloses a random forest parallelization machine learning method for big data in a Spark cloud service environment. In the method, dimension reduction is performed on high-dimensional big data through feature vector importance analysis, and prediction is performed by weighted voting. A distributed memory management mechanism and a cloud computing platform are used to improve the parallelization of three stages of the random forest: model building in the training process, the splitting process of each individual decision tree, and prediction voting. Performing dimension reduction through feature vector importance analysis and predicting by weighted voting optimizes the random forest method and improves the mining effect of random forest machine learning on complex big data. On this basis, a random forest parallelization method built on the Spark cloud platform is implemented, which improves the operational efficiency of the random forest machine learning method.
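The two optimizations named above, dimension reduction by feature importance and weighted voting, can be sketched in plain Python. This is a minimal, non-authoritative illustration under stated assumptions: the abstract does not say how importance scores are computed or how each tree's vote weight is chosen, so the sketch takes importances as given (for example, scores produced by a previously trained forest) and assumes a per-tree weight such as out-of-bag accuracy. The function names `select_top_features`, `reduce_dimensions` and `weighted_vote` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def select_top_features(importances: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k most important features, in original column order.

    Assumption: importance scores are supplied by the caller; the source does not
    specify how they are computed.
    """
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


def reduce_dimensions(rows: Sequence[Sequence[float]], keep: Sequence[int]) -> List[List[float]]:
    """Project every row onto the retained feature columns."""
    return [[row[i] for i in keep] for row in rows]


def weighted_vote(predictions: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Combine per-tree class predictions, scaling each tree's vote by its weight.

    Assumption: weights reflect tree quality (e.g. out-of-bag accuracy).
    Ties go to the label that was seen first.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per tree prediction")
    totals: Dict[Hashable, float] = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


if __name__ == "__main__":
    keep = select_top_features([0.05, 0.40, 0.10, 0.30, 0.15], k=3)
    print(keep)                                    # [1, 3, 4]
    print(reduce_dimensions([[1, 2, 3, 4, 5]], keep))  # [[2, 4, 5]]
    # One strong tree outvotes two weak ones; a plain majority vote would pick "ham".
    print(weighted_vote(["spam", "ham", "ham"], [0.9, 0.4, 0.3]))  # spam
```

With equal weights `weighted_vote` reduces to ordinary majority voting, which makes clear that the weighting is the only change to the standard random forest prediction rule.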
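The abstract states that the splitting process of a single decision tree is parallelized but gives no scheme. One common decomposition is to evaluate candidate split features independently and keep the best result. The sketch below assumes the Gini impurity criterion and uses a local `ThreadPoolExecutor` as a stand-in for Spark executors; the function names and the `max_workers` parameter are hypothetical, and nothing here is Spark's own API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Hashable, Optional, Sequence, Tuple

Split = Tuple[float, int, Optional[float]]  # (weighted Gini, feature index, threshold)


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_for_feature(rows, labels, feature: int) -> Split:
    """Scan every threshold on one feature and return its lowest weighted Gini."""
    best: Split = (float("inf"), feature, None)
    n = len(labels)
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue  # a split must produce two non-empty children
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[0]:
            best = (score, feature, threshold)
    return best


def best_split(rows, labels, max_workers: int = 4) -> Split:
    """Evaluate all features concurrently; each feature is an independent task.

    Thread pool used only as a local stand-in for distributing work to Spark executors.
    """
    n_features = len(rows[0])
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves feature order, so ties resolve to the lowest feature index.
        candidates = pool.map(lambda f: best_split_for_feature(rows, labels, f), range(n_features))
        return min(candidates, key=lambda c: c[0])


if __name__ == "__main__":
    rows = [[1, 7], [2, 3], [3, 8], [4, 2]]
    labels = ["a", "a", "b", "b"]
    print(best_split(rows, labels))  # (0.0, 0, 2): feature 0 at threshold 2 separates the classes
```

Because CPython threads share one interpreter lock, this sketch shows the task decomposition rather than a real speed-up; the point is that per-feature split searches need no coordination until the final `min`, which is what makes them suitable for distribution across a cluster.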