Flink-based large-scale matrix parallelization computing method

A computing method and matrix computing technology applied in the field of distributed computing. It addresses the lack of a large-scale distributed matrix computing library and the difficulty of mastering low-level parallel operators. Its effects are transparency and ease of use, reduced additional overhead, and improved efficiency.

Status: Inactive. Publication date: 2016-05-25
NANJING UNIV
Cites: 3; cited by: 9

AI Technical Summary

Problems solved by technology

However, the parallel operators provided by standard Flink are relatively low-level and hard for high-level algorithm analysts to master, and their usage is not as clear and concise as a linear-algebra interface such as a matrix API.
On the other hand, there is currently no large-scale distributed matrix computing library designed for Flink.

Embodiment Construction

[0023] The present invention is a Flink-based large-scale matrix parallelization computing method that provides a series of matrix operation interfaces, including matrix addition, subtraction, element-wise (point) multiplication, element-wise (point) division, matrix multiplication, and matrix inversion. It uses BLAS to accelerate matrix operations. The system framework of the present invention is shown in Figure 1.
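The element-wise operations listed above (addition, subtraction, point multiplication, point division) can be illustrated on the row-vector representation described in [0024]. The following is a minimal single-process Python sketch, not Flink code: a dictionary lookup stands in for a distributed join on the row number, and the function names (`elementwise`, `add`, `subtract`, `point_multiply`, `point_divide`) are hypothetical, not the library's actual API.

```python
import operator
from typing import Callable, Dict, List, Tuple

# One element of a row-vector matrix: (row number, row vector).
Row = Tuple[int, List[float]]


def elementwise(a: List[Row], b: List[Row],
                op: Callable[[float, float], float]) -> List[Row]:
    """Combine two row-vector matrices element by element.

    Rows are matched by row number; the dictionary below stands in for
    a distributed join keyed on that number (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("matrices have different numbers of rows")
    b_by_row: Dict[int, List[float]] = {i: vec for i, vec in b}
    result = []
    for i, va in a:
        vb = b_by_row[i]  # KeyError if b has no row with this number
        if len(va) != len(vb):
            raise ValueError(f"row {i}: vectors have different lengths")
        result.append((i, [op(x, y) for x, y in zip(va, vb)]))
    return sorted(result)  # order by row number for readability


def add(a: List[Row], b: List[Row]) -> List[Row]:
    return elementwise(a, b, operator.add)


def subtract(a: List[Row], b: List[Row]) -> List[Row]:
    return elementwise(a, b, operator.sub)


def point_multiply(a: List[Row], b: List[Row]) -> List[Row]:
    return elementwise(a, b, operator.mul)


def point_divide(a: List[Row], b: List[Row]) -> List[Row]:
    return elementwise(a, b, operator.truediv)
```

Because every output row depends only on the two input rows with the same number, these operations need no data movement beyond pairing rows, which is why they are the simple case compared with multiplication or inversion.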

[0024] The present invention uses two data structures to describe a matrix: the row vector matrix and the block matrix. The row vector matrix is the most common and intuitive representation of a matrix: it consists of a set of tuples, each made up of a row number and the row vector for that row. All matrices read from or written to files take this form. A block matrix is a set of matrix blocks, obtained by partitioning the matrix, together with their block numbers, because in some operations on the matrix, it can ofte...
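Converting between the two representations can be sketched as follows. This is a hypothetical single-process Python illustration: the square block size `block_size`, the `(block row, block column)` numbering, and the function names `to_block_matrix` and `to_row_matrix` are assumptions, since the excerpt is cut off before the patent's own conversion procedure. It also assumes a dense matrix whose rows all have the same length. In a Flink job, the first loop would plausibly map to a flatMap that emits keyed fragments, and the grouping to a groupBy on the block number.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, List[float]]       # (row number, row vector)
BlockKey = Tuple[int, int]          # (block row number, block column number)
Block = List[List[float]]           # dense block, stored row by row


def to_block_matrix(rows: List[Row], block_size: int) -> Dict[BlockKey, Block]:
    """Split a dense row-vector matrix into block_size x block_size blocks.

    Blocks on the right and bottom edges may be smaller.
    """
    fragments: Dict[BlockKey, List[Tuple[int, List[float]]]] = defaultdict(list)
    # flatMap-like step: each row emits one fragment per column block.
    for i, vec in rows:
        for start in range(0, len(vec), block_size):
            key = (i // block_size, start // block_size)
            fragments[key].append((i % block_size, vec[start:start + block_size]))
    # groupBy-like step: fragments with the same block number form one block.
    blocks: Dict[BlockKey, Block] = {}
    for key, frags in fragments.items():
        frags.sort(key=lambda f: f[0])  # order by row index inside the block
        blocks[key] = [part for _, part in frags]
    return blocks


def to_row_matrix(blocks: Dict[BlockKey, Block], block_size: int) -> List[Row]:
    """Inverse conversion: stitch block rows back into full row vectors."""
    pieces: Dict[int, List[Tuple[int, List[float]]]] = defaultdict(list)
    for (bi, bj), block in blocks.items():
        for local_i, part in enumerate(block):
            pieces[bi * block_size + local_i].append((bj, part))
    rows = []
    for i in sorted(pieces):
        ordered = sorted(pieces[i], key=lambda p: p[0])  # by block column
        rows.append((i, [x for _, part in ordered for x in part]))
    return rows
```

For a 3 x 3 matrix and `block_size=2`, this yields four blocks: a 2 x 2 block at (0, 0), a 2 x 1 block at (0, 1), a 1 x 2 block at (1, 0), and a 1 x 1 block at (1, 1).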

Abstract

The invention discloses a Flink-based large-scale matrix parallelization computing method. The method mainly comprises the following steps: storing large-scale matrix data in Flink DataSets and using BLAS to accelerate matrix calculation on a single computer; designing and implementing a series of matrix operations such as matrix addition and subtraction; and, combining the characteristics of Flink and of the algorithms, designing a parallelization scheme with three optimizations that improve the performance of matrix multiplication for matrices of different shapes: matrix block multiplication based on square block division, matrix block multiplication based on CARMA division, and matrix block multiplication based on broadcasting. The method solves the problem that conventional large-scale matrix calculation on a single computer incurs high overhead or cannot be executed at all, and it is highly scalable.
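The three multiplication strategies named above can be sketched in a simplified, single-process form. The Python below is an illustration under stated assumptions, not the patent's implementation. `block_multiply` shows the generic block product that block-division schemes rest on: join A and B blocks on the shared inner index k, multiply locally, then sum by output block (i, j). `broadcast_multiply` assumes the second matrix is small enough to be copied whole to every task. `carma_split` follows the general CARMA idea of repeatedly halving the largest dimension and assumes a power-of-two number of sub-problems. All function names are hypothetical, and `local_matmul` is a naive stand-in for the BLAS call.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Block = List[List[float]]
BlockKey = Tuple[int, int]


def local_matmul(x: Block, y: Block) -> Block:
    """Naive dense product of two in-memory blocks (stand-in for BLAS)."""
    return [[sum(x[r][t] * y[t][c] for t in range(len(y)))
             for c in range(len(y[0]))]
            for r in range(len(x))]


def block_multiply(a: Dict[BlockKey, Block],
                   b: Dict[BlockKey, Block]) -> Dict[BlockKey, Block]:
    """C[i, j] = sum over k of A[i, k] * B[k, j], computed block by block."""
    b_by_k: Dict[int, List[Tuple[int, Block]]] = defaultdict(list)
    for (k, j), blk in b.items():
        b_by_k[k].append((j, blk))
    c: Dict[BlockKey, Block] = {}
    for (i, k), a_blk in a.items():
        for j, b_blk in b_by_k.get(k, []):    # join on the inner index k
            prod = local_matmul(a_blk, b_blk)
            if (i, j) not in c:
                c[(i, j)] = prod
            else:                              # reduce: sum partial products
                acc = c[(i, j)]
                for r, row in enumerate(prod):
                    for col, v in enumerate(row):
                        acc[r][col] += v
    return c


def broadcast_multiply(a_rows: List[Tuple[int, List[float]]],
                       b: Block) -> List[Tuple[int, List[float]]]:
    """Broadcast variant: a small B is copied to every task, so each row of
    the large matrix A is multiplied locally and A is never reshuffled."""
    return [(i, local_matmul([vec], b)[0]) for i, vec in a_rows]


def carma_split(m: int, k: int, n: int, p: int) -> Tuple[int, int, int]:
    """CARMA-style partitioning of an (m x k) * (k x n) product into p parts.

    Repeatedly halves whichever dimension currently has the largest piece,
    and returns how many pieces m, k and n are cut into. p must be a power
    of two. Cuts along k yield partial products that must be summed.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    dims, cuts = (m, k, n), [1, 1, 1]
    while p > 1:
        d = max(range(3), key=lambda t: dims[t] / cuts[t])
        cuts[d] *= 2
        p //= 2
    return tuple(cuts)
```

For example, `carma_split(1000, 10, 1000, 4)` returns `(2, 1, 2)`, cutting the two long outer dimensions and leaving the short inner one whole, while `carma_split(10, 1000, 10, 4)` returns `(1, 4, 1)`, cutting only the long inner dimension. This shows how a shape-aware split adapts to the matrix shape in a way a fixed square-block grid does not.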

Description

Technical field

[0001] The invention relates to the field of distributed computing, in particular to a method for designing and implementing a Flink-based large-scale matrix computing library.

Background art

[0002] Matrix computing plays an important role in many industrial and academic fields, such as large-scale numerical analysis, data mining, physical computing, and image rendering. Researchers have long studied efficient algorithms for operations such as matrix multiplication and inversion in order to improve the performance of applications built on matrix calculations. In today's era of "big data", with the rapid growth of data volumes, matrices often become so large that a traditional single computer can neither store nor compute them.

[0003] Matrix operations include matrix addition, subtraction, multiplication, inversion and many other operations. Calculations such as addition and subtraction of matrices are relatively simple, and are...

Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/16
Inventors: 黄宜华 (Huang Yihua), 顾荣 (Gu Rong), 张海鹏 (Zhang Haipeng)
Owner: NANJING UNIV