Matrix calculation method of distributed large-scale matrix multiplication based on Spark

A matrix computing technique for efficient distributed matrix multiplication. It addresses the low performance, poor interfaces, and poor scalability of existing approaches, and reduces the cost of learning.

Status: Inactive; Publication Date: 2016-03-23
NANJING UNIV

AI Technical Summary

Problems solved by technology

[0006] Purpose of the invention: To overcome the poor support for large-scale distributed matrix multiplication in existing distributed data processing systems, the present invention provides a Spark-based matrix calculation method for distributed large-scale matrix multiplication, with which users can perform efficient larg...


Examples

Detailed Description of Embodiments

[0030] The present invention is further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that these embodiments only illustrate the invention and are not intended to limit its scope. After reading this disclosure, modifications of equivalent form made by those skilled in the art all fall within the scope defined by the appended claims of this application.

[0031] The technical scheme of the present invention consists mainly of two software modules: the distributed big data processing system Spark, and a single-machine basic linear algebra subroutine library. Spark is an open-source system of the Apache Software Foundation (project homepage http://spark.apache.org/); this software is not part of the present invention. Basic Linear Alg...
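The two-module split described in [0031] can be pictured with a minimal sketch in plain Python, with no Spark or BLAS dependency. Everything named here is an assumption made for illustration: `local_gemm` is a hypothetical stand-in for a single-machine BLAS matrix multiply, and the built-in `map` stands in for a distributed engine scheduling one task per row band. It is not the patent's implementation.

```python
from typing import List

Matrix = List[List[float]]


def local_gemm(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical stand-in for a single-machine BLAS GEMM call (dense a x b)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def distributed_multiply(a: Matrix, b: Matrix, num_tasks: int) -> Matrix:
    """Split `a` into row bands, multiply each band locally, then concatenate.

    The plain `map` below stands in for the distributed engine running one
    task per band; in this sketch the tasks simply run one after another.
    """
    band = max(1, -(-len(a) // num_tasks))  # ceiling division
    bands = [a[i:i + band] for i in range(0, len(a), band)]
    partials = map(lambda rows: local_gemm(rows, b), bands)
    return [row for part in partials for row in part]


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
result = distributed_multiply(a, identity, num_tasks=2)
```

The point of the split is that the outer layer only decides how the work is divided, while every dense product is delegated to the local kernel, which a real system would replace with an optimized library call.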

Abstract

The invention discloses a Spark-based matrix calculation method for distributed large-scale matrix multiplication. The method comprises the following steps: adopting a system framework built on the distributed computation execution engine Spark and single-machine BLAS (Basic Linear Algebra Subprograms); defining and encapsulating matrix-related operation interfaces in the distributed system; reading matrix files from a distributed file system; and selecting a suitable scheme for the distributed multiplication according to the amount of resources in the distributed computing environment and the scale of the matrices to be processed. If both matrices are small, they are gathered locally and multiplied on a single machine; if one matrix is comparatively small, that matrix is broadcast to carry out the multiplication; and if both matrices are large, block-based distributed matrix multiplication is adopted. For the latter two cases, the invention respectively provides two efficient solutions, thereby addressing the low performance and poor scalability of existing big data processing platforms for distributed matrix operations.
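The three-way choice in the abstract, and the block-based case in particular, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented method: the threshold `SMALL_ELEMENTS` is hypothetical (the source gives no numeric cut-off), the selection looks only at matrix size and ignores the available cluster resources that the abstract also mentions, and all function names are invented for this sketch.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Blocks = Dict[Tuple[int, int], Matrix]

# Hypothetical cut-off in number of elements; chosen only for illustration.
SMALL_ELEMENTS = 16


def choose_strategy(shape_a: Tuple[int, int], shape_b: Tuple[int, int],
                    small: int = SMALL_ELEMENTS) -> str:
    """Pick one of the three schemes named in the abstract, by size only."""
    size_a = shape_a[0] * shape_a[1]
    size_b = shape_b[0] * shape_b[1]
    if size_a <= small and size_b <= small:
        return "local"      # gather both matrices, multiply on one machine
    if size_a <= small or size_b <= small:
        return "broadcast"  # ship the small matrix to every task
    return "block"          # both large: block-based multiplication


def dense_multiply(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense product; a real system would call a BLAS kernel here."""
    return [[sum(row[k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for row in a]


def add(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def split_blocks(m: Matrix, bs: int) -> Blocks:
    """Cut a matrix into blocks of at most bs x bs, keyed by (block_row, block_col)."""
    return {(i // bs, j // bs): [row[j:j + bs] for row in m[i:i + bs]]
            for i in range(0, len(m), bs)
            for j in range(0, len(m[0]), bs)}


def block_multiply(a: Matrix, b: Matrix, bs: int) -> Matrix:
    """Compute C[i, j] as the sum over k of A[i, k] x B[k, j], block by block."""
    blocks_a, blocks_b = split_blocks(a, bs), split_blocks(b, bs)
    partial: Blocks = {}
    for (i, k), block_a in blocks_a.items():
        for (k2, j), block_b in blocks_b.items():
            if k != k2:
                continue  # only blocks sharing the inner index are multiplied
            product = dense_multiply(block_a, block_b)
            key = (i, j)
            partial[key] = add(partial[key], product) if key in partial else product
    # Stitch the result blocks back into one matrix.
    block_rows = 1 + max(i for i, _ in partial)
    block_cols = 1 + max(j for _, j in partial)
    result: Matrix = []
    for i in range(block_rows):
        for r in range(len(partial[(i, 0)])):
            result.append([v for j in range(block_cols) for v in partial[(i, j)][r]])
    return result


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
strategy = choose_strategy((3, 3), (3, 3), small=4)
product = block_multiply(a, b, bs=2)
```

In a distributed setting each pair of blocks sharing an inner index would be one task, and the per-key summation would be a reduce step; here both run in a single process so the arithmetic can be checked by hand.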

Description

Technical field

[0001] The invention relates to the technical field of parallel computing, and in particular to a method for efficient distributed multiplication of large-scale distributed matrices.

Background art

[0002] With the advent of the Internet age, the scale of human data has grown tremendously. In the era of big data, matrix computing lies at the core of many practical applications, such as social data analysis, web search, advertising computing, and recommendation systems. In particular, the core of neural network learning, which has attracted attention in the field of deep learning, can be realized as a series of successive matrix multiplications. However, as data volumes keep growing, traditional single-machine matrix computation can no longer meet basic needs because of hardware limitations. An efficient method that supports large-scale matrix multiplication is therefore all the more needed.

[0003] In recent years, with th...

Application Information

IPC(8): G06F17/16
CPC: G06F17/16
Inventor: 黄宜华, 顾荣, 唐云
Owner: NANJING UNIV