Spark-based extreme learning machine parallelization calculation method

An extreme learning machine computing method, applied in the field of parallel computing, that addresses the low efficiency of existing parallelization schemes and improves both operating efficiency and computing efficiency.

Active Publication Date: 2017-03-15
CHINA UNIV OF MINING & TECH

AI Technical Summary

Problems solved by technology

[0005] Purpose of the invention: To overcome the still-low efficiency of current extreme learning machine parallelization schemes, the present invention provides a Spark-based extreme learning machine parallelization scheme. Compared with the existing Hadoop-based parallelization scheme, it greatly improves the running efficiency of the extreme learning machine when processing large data.



Examples


Embodiment Construction

[0028] Specific implementation

[0029] Embodiments of the present invention are further described below in conjunction with the accompanying drawings.

[0030] As shown in Figure 1, the steps of the Spark-based extreme learning machine parallelization calculation method of the present invention are as follows:

[0031] a. Using knowledge of the specific problem, convert the original attribute and category data into numeric values, and then normalize each attribute. The attributes of each sample form one row of the attribute variable matrix, and the category of each sample forms one row of the category variable matrix;
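Step a can be illustrated with a minimal sketch. The source does not say which normalization or category encoding is used, so min-max scaling to [0, 1] and one-hot encoding are assumptions here, and the names `min_max_normalize` and `one_hot` are hypothetical.

```python
from typing import List


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each attribute (column) to [0, 1]. Assumption: min-max scaling.

    A constant column has no spread, so it is mapped to 0.0.
    """
    n_cols = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_cols)]
    maxs = [max(r[j] for r in rows) for j in range(n_cols)]
    return [
        [
            (r[j] - mins[j]) / (maxs[j] - mins[j]) if maxs[j] > mins[j] else 0.0
            for j in range(n_cols)
        ]
        for r in rows
    ]


def one_hot(labels: List[str]) -> List[List[float]]:
    """Turn category names into rows of a category matrix.

    Assumption: one column per distinct category, in sorted order.
    """
    classes = sorted(set(labels))
    return [[1.0 if lab == c else 0.0 for c in classes] for lab in labels]


# One row per sample: attribute matrix and category matrix.
attributes = min_max_normalize([[1.0, 10.0], [3.0, 10.0], [2.0, 30.0]])
categories = one_hot(["b", "a", "b"])
```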

[0032] b. Randomly generate the input weight matrix ω. The number of rows of the weight matrix equals the number of attribute variables of each sample, and the number of columns equals the number of hidden layer nodes of the neural network; the product of the two is obtained by multiplying ...
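The shape of ω described in step b can be sketched as follows. The sampling distribution is not stated in the source, so a uniform draw from [-1, 1] is an assumption, and `random_input_weights` and its `seed` parameter are hypothetical names used only for illustration.

```python
import random
from typing import List


def random_input_weights(n_attributes: int, n_hidden: int, seed: int = 0) -> List[List[float]]:
    """Return an n_attributes x n_hidden matrix of random input weights.

    Assumption: entries are drawn uniformly from [-1, 1]; the seed only
    makes the sketch reproducible.
    """
    rng = random.Random(seed)
    return [
        [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        for _ in range(n_attributes)
    ]


# Rows = attribute variables per sample, columns = hidden layer nodes.
omega = random_input_weights(n_attributes=4, n_hidden=3)
```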


Abstract

The invention discloses an extreme learning machine parallelization calculation method based on the big data processing platform Spark, suited to machine learning on that platform. The method comprises the following steps: first, the sample dataset is saved to a distributed file system row by row, one sample per row, and the sample set is preprocessed to obtain a feature matrix and a classification label vector T; second, the hidden node parameters, namely the weight matrix ω and the hidden layer bias vector b, are randomly initialized according to the number of sample features and the number of hidden nodes; third, the hidden layer output matrix H is computed using a parallelized matrix multiplication scheme; finally, the parallelized matrix multiplication scheme and a single matrix inverse operation are used to obtain the unique optimal solution of the weight vector β. The steps are simple and the amount of calculation is small. The calculation steps can run in parallel on multiple computers, which improves calculation efficiency; fault tolerance is also high, and the efficiency of training extreme learning machine models on big data is greatly improved.
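The four steps in the abstract can be sketched in plain Python, with a list of row blocks standing in for Spark partitions. This is an illustration under stated assumptions, not the patented scheme: the sigmoid activation, the row-block accumulation of HᵀH and HᵀT, the small `ridge` term added for numerical stability, and all function names are assumptions. Here T is a matrix with one row per sample, which covers a label vector as the single-column case.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def hidden_rows(X: Matrix, omega: Matrix, b: List[float]) -> Matrix:
    """Rows of H = sigmoid(X * omega + b) for one partition (sigmoid is assumed)."""
    return [
        [
            1.0 / (1.0 + math.exp(-(sum(xi * omega[i][j] for i, xi in enumerate(x)) + b[j])))
            for j in range(len(b))
        ]
        for x in X
    ]


def gram(A: Matrix, B: Matrix) -> Matrix:
    """Compute A^T B for two blocks that share the same sample rows."""
    p, q = len(A[0]), len(B[0])
    return [[sum(a[i] * c[j] for a, c in zip(A, B)) for j in range(q)] for i in range(p)]


def add(A: Matrix, B: Matrix) -> Matrix:
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def solve(A: Matrix, B: Matrix) -> Matrix:
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        pivot_value = M[c][c]
        M[c] = [v / pivot_value for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def train_elm(
    partitions: Sequence[Tuple[Matrix, Matrix]],
    omega: Matrix,
    b: List[float],
    ridge: float = 1e-8,
) -> Matrix:
    """Accumulate H^T H and H^T T block by block, then solve once for beta."""
    HtH: Optional[Matrix] = None
    HtT: Optional[Matrix] = None
    for X_part, T_part in partitions:
        H = hidden_rows(X_part, omega, b)  # computed independently per block
        hh, ht = gram(H, H), gram(H, T_part)
        HtH = hh if HtH is None else add(HtH, hh)
        HtT = ht if HtT is None else add(HtT, ht)
    if HtH is None or HtT is None:
        raise ValueError("at least one partition is required")
    for i in range(len(b)):
        HtH[i][i] += ridge  # assumption: tiny ridge keeps the system solvable
    return solve(HtH, HtT)
```

Only the small L × L system (L hidden nodes) is solved in one place; everything that scales with the number of samples is done per block. On Spark, the per-block products could plausibly be computed with `mapPartitions` and combined with `reduce`, though the source text shown here does not confirm that this is how the invention distributes the multiplication.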

Description

Technical field

[0001] The invention relates to a parallel computing method, in particular to a Spark-based extreme learning machine parallel computing method for machine learning on the big data processing platform Spark.

Background technique

[0002] Machine learning is one of the most popular research fields at present. In recent years, as data volumes have kept growing, the efficiency of machine learning has attracted much attention, and the learning efficiency of neural networks is a problem in urgent need of a solution. The extreme learning machine algorithm randomly initializes the hidden node parameters and obtains the hidden node output weights directly through matrix operations. It therefore avoids the large number of iterative operations found in traditional learning algorithms, which greatly improves operation speed at the algorithm level.

[0003] Since the data to be processed in the extreme learning machine needs to be loaded into the memory ...
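The matrix operations mentioned in [0002] can be written out using the standard extreme learning machine formulation. This is the commonly published form, not necessarily the exact formulas of this patent. The symbols ω, b, H, T and β follow the abstract; the activation g, the sample count N, the hidden node count L, the all-ones vector 1 and the row-block index k are notational assumptions.

```latex
% Hidden layer output for N samples (rows of X) and L hidden nodes
H = g\left(X\omega + \mathbf{1}\,b^{\top}\right), \qquad H \in \mathbb{R}^{N \times L}

% Output weights as the least-squares solution, with H^{\dagger} the Moore-Penrose pseudoinverse
\beta = H^{\dagger} T

% When H^{\top}H is invertible
\beta = \left(H^{\top} H\right)^{-1} H^{\top} T

% Splitting the samples into row blocks H_k, T_k gives sums that can be computed independently
H^{\top} H = \sum_{k} H_k^{\top} H_k, \qquad H^{\top} T = \sum_{k} H_k^{\top} T_k
```

The last line shows why the approach suits a distributed platform: each block contributes an L × L and an L × m term (m being the number of columns of T) that can be computed on a separate machine, leaving a single L × L inversion at the end.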


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/16
CPC: G06F17/16
Inventor: 刘鹏, 王学奎, 叶帅, 赵慧含, 仰彦妍, 尹良飞, 张国鹏, 丁恩杰
Owner: CHINA UNIV OF MINING & TECH