Random forest parallelization machine learning method for big data in Spark cloud service environment

A random forest machine learning technique, applied in the computer field, that addresses the performance degradation and long running time of classification methods on big data. It reduces the amount of calculation and complexity, reduces the impact of noisy data, and improves classification accuracy.

Pending Publication Date: 2016-05-04

HUNAN UNIV

0 Cites, 51 Cited by


Problems solved by technology

[0008] Traditional classification methods can achieve ideal results on low-dimensional, small data sets, but when the data structure becomes complex, the dimensionality becomes higher, and the data size increases, their performance drops noticeably.

When faced with massive big data, traditional classification methods take a long time in both modeling and prediction.



Embodiment 1

[0038] (1) To address the high dimensionality of big data features, feature importance analysis is applied during both training and prediction to reduce the dimensionality of the data, which effectively reduces the amount of calculation and the complexity of the method. To address the large amount of noisy data in big data, weighted voting is used when predicting on the data set, which reduces the impact of noisy data on the classification voting results and improves the classification accuracy of the random forest machine learning method on complex big data.
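The weighted voting idea in [0038] can be sketched as follows. This is a minimal illustration, not the patent's procedure: the excerpt does not say how the tree weights are derived, so the sketch assumes each tree carries a weight such as its accuracy on held-out data. The function name `weighted_vote` is hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per tree.
    weights: one non-negative weight per tree (assumed here to be a
             per-tree accuracy estimate; the source does not specify this).
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # On a tie, the label that was seen first wins (dicts keep insertion order).
    return max(totals, key=totals.get)


# Two low-weight trees vote "a", one high-weight tree votes "b".
print(weighted_vote(["a", "a", "b"], [0.3, 0.3, 0.9]))
```

With equal weights this reduces to ordinary majority voting, so the weights are what let a few reliable trees outvote many trees that were trained on noisy samples.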

[0039] Step 1: Feature selection on the training data during random forest model training, as shown in Figure 1. The specific implementation steps are as follows:

[0040] Step 1.1: Sample the high-dimensional big data training set with replacement to form n training data subsets;

[0041] Step 1.2: Ca...
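Step 1.1 (sampling with replacement into n subsets) can be sketched in plain Python as below. The function name `bootstrap_subsets`, the `seed` parameter, and the choice to make each subset the same size as the original set are assumptions for illustration. The excerpt does not state the subset size, and a Spark implementation would operate on distributed data rather than an in-memory list.

```python
import random


def bootstrap_subsets(training_set, n, seed=None):
    """Draw n subsets from training_set by sampling with replacement."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not training_set:
        raise ValueError("training_set must not be empty")
    rng = random.Random(seed)
    size = len(training_set)  # assumed subset size: same as the original set
    # random.Random.choices samples with replacement, so a record can
    # appear several times in one subset and not at all in another.
    return [rng.choices(training_set, k=size) for _ in range(n)]


subsets = bootstrap_subsets(list(range(10)), n=3, seed=0)
for subset in subsets:
    print(subset)
```

Each subset would then be used to train one decision tree, which is what makes the trees in the forest differ from one another.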


Abstract

The invention discloses a random forest parallelization machine learning method for big data in a Spark cloud service environment. Dimension reduction is performed on high-dimensional big data through feature vector importance analysis, and prediction is performed by weighted voting; this optimizes the random forest method and improves its mining effect on complex big data. On this basis, a random forest parallelization method based on the Spark cloud platform is applied: through a distributed memory management mechanism and a cloud computing platform, the parallelization of model building in the random forest training process, of the splitting process of each single decision tree, and of prediction voting is improved, which raises the operating efficiency of the random forest machine learning method.

Description

Technical field

[0001] The invention belongs to the field of computers, and in particular relates to a big-data-oriented random forest parallel machine learning method in a Spark cloud service environment.

Background art

[0002] Explanation of terms:

[0003] Feature dimensionality reduction: During image or data feature extraction, extracting too many feature dimensions often makes feature matching overly complex and consumes system resources. In that case, low-dimensional features are used to represent the high-dimensional features; this is dimensionality reduction.

[0004] With the continuous emergence of new ways of publishing information, the rise of technologies such as cloud computing and the Internet of Things, and the spread of sensors all over the world, data is growing and accumulating at an unprecedented rate. The data age has arrived. As network applications deepen, the value of big data applications is becoming more and more obvious. Massi...


Application Information


IPC(8): G06F17/30, G06N99/00

CPC: G06F2216/03, G06F16/90, G06N20/00

Inventor: 唐卓, 陈建国, 李肯立, 鲁彬, 陈俊杰, 肖锦波

Owner: HUNAN UNIV
