Embedded system on chip for accelerating Cholesky decomposition

An embedded system-on-chip, applied to complex mathematical operations, that addresses long computation times and achieves shorter operation time and simpler read/write control.

Active Publication Date: 2015-07-22
HARBIN INST OF TECH

AI Technical Summary

Problems solved by technology

[0004] The present invention solves the problem that the current hardware acceleration system for sol



Examples


Specific Embodiment 1

[0014] Embodiment 1: This embodiment is described with reference to Figure 1 and Figure 2. An embedded system on chip for accelerating Cholesky decomposition mainly comprises the following modules:

[0015] ARM processor module, used for task scheduling of the entire computing process;

[0016] DDR control module, used to control data reading and writing of external DDR memory;

[0017] The Programmable Logic module (PL module), which reads the values of the positive definite symmetric matrix from external memory, performs the Cholesky decomposition, and stores the resulting lower triangular matrix back to external memory;

[0018] The AXI bus module is used for information transmission between the PL module and the ARM processor module.

Specific Embodiment 2

[0019] Embodiment 2: The PL module described in this embodiment includes:

[0020] The Control Logic sub-module is used to receive the control information of the ARM processor module, coordinate the calculation work of the calculation unit and the update unit, and control the data reading and writing work of the internal RAM storage sub-module;

[0021] The DMA sub-module is used to control the data transmission between the PL module and the external memory;

[0022] The Cholesky decomposition sub-module is used for Submatrix-Cholesky decomposition of positive definite symmetric matrices and internal data caching.

[0023] The Submatrix-Cholesky decomposition sequence is shown in Table 1.

[0024] Table 1 Submatrix-Cholesky decomposition sequence table

[0025]

[0026]

[0027] The remaining parts are the same as in Embodiment 1.
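
As a reference point for the Submatrix-Cholesky ordering named in [0023], the standard textbook (right-looking) recurrence can be written as follows. This is an assumption based on the usual meaning of the term, not a reproduction of Table 1; the symbols A^{(k)}, a_{ij}, and l_{ij} are introduced here for illustration only.

```latex
% Right-looking (submatrix) Cholesky of an n x n symmetric positive definite A.
% A^{(1)} = A; for k = 1, ..., n:
\begin{aligned}
l_{kk} &= \sqrt{a^{(k)}_{kk}}, \\
l_{ik} &= \frac{a^{(k)}_{ik}}{l_{kk}}, && i = k+1, \dots, n, \\
a^{(k+1)}_{ij} &= a^{(k)}_{ij} - l_{ik}\, l_{jk}, && k < j \le i \le n .
\end{aligned}
```

The first two lines finalize column k; the third line updates every remaining column, and those updates are independent of one another, which is where the parallelism comes from.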

Specific Embodiment 3

[0028] Embodiment 3: In the embedded system on chip for accelerating Cholesky decomposition described in this embodiment,

[0029] the Cholesky decomposition sub-module comprises:

[0030] The calculation unit performs the column computation when the positive definite symmetric matrix undergoes Submatrix-Cholesky decomposition;

[0031] The internal RAM cache unit stores the result data produced by the calculation unit so that the update unit can read it directly;

[0032] The update unit updates the columns when the positive definite symmetric matrix undergoes Submatrix-Cholesky decomposition, completing the update of all columns that follow the computed column;

[0033] The FIFO unit is used to cache the initial data of the Cholesky decomposition sub-module and the calculation result of the update unit, so as to facilitate the realization of the streaming mode of ...
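
The split between the calculation unit and the update unit can be sketched in software as two functions called once per column. This is a minimal illustration under stated assumptions: the function names `compute_column`, `update_trailing`, and `submatrix_cholesky` are hypothetical, the loop order is the standard right-looking Cholesky form, and nothing here models the RAM cache, FIFO, or DMA behavior of the actual hardware.

```python
import math


def compute_column(a, k):
    """Calculation-unit step (hypothetical name): finalize column k of L in place."""
    n = len(a)
    pivot = a[k][k]
    if pivot <= 0.0:
        raise ValueError("matrix is not positive definite")
    d = math.sqrt(pivot)
    a[k][k] = d
    for i in range(k + 1, n):
        a[i][k] /= d


def update_trailing(a, k):
    """Update-unit step (hypothetical name): apply column k to every later column."""
    n = len(a)
    # Each column j > k is updated independently of the others,
    # so these iterations could run in parallel.
    for j in range(k + 1, n):
        ljk = a[j][k]
        for i in range(j, n):
            a[i][j] -= a[i][k] * ljk


def submatrix_cholesky(matrix):
    """Return the lower triangular factor L with matrix = L * L^T."""
    n = len(matrix)
    # Work on a copy of the lower triangle; the upper triangle is left as zeros.
    a = [[float(matrix[i][j]) if j <= i else 0.0 for j in range(n)]
         for i in range(n)]
    for k in range(n):
        compute_column(a, k)
        update_trailing(a, k)
    return a


if __name__ == "__main__":
    A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
    for row in submatrix_cholesky(A):
        print(row)
```

In this sketch the column written by `compute_column` is read back immediately by `update_trailing`, which loosely mirrors the role described for the internal RAM cache unit.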


Abstract

An embedded system on chip for accelerating Cholesky decomposition. The invention aims to solve the long computing time of prior-art hardware acceleration systems that solve systems of linear equations by the Cholesky decomposition method. The system comprises an ARM processor module for task scheduling of the whole computing process, a DDR control module for controlling data reads and writes to an external DDR memory, an AXI bus module for information transmission between the programmable logic module and the ARM processor module, and a programmable logic module for performing Cholesky decomposition of a positive definite symmetric matrix. Because the programmable logic module factors the coefficient matrix of the linear system using submatrix-Cholesky decomposition, which contains a large number of parallel update operations, data access is more regular, read/write control is simpler, and computing time is greatly shortened. The system is applicable to accelerating the solution of systems of linear equations.
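
For context on how a Cholesky factor is used to solve a linear system A x = b, the usual route is to factor A = L L^T, solve L y = b by forward substitution, then solve L^T x = y by back substitution. The sketch below illustrates that route in plain Python. The names `cholesky_lower` and `solve_spd` are hypothetical, and the factorization uses a compact textbook loop rather than the hardware pipeline described in this document.

```python
import math


def cholesky_lower(a):
    """Textbook Cholesky factor L (lower triangular) of a symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = a[j][j] - sum(l[j][p] ** 2 for p in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][p] * l[j][p] for p in range(j))) / l[j][j]
    return l


def solve_spd(a, b):
    """Solve a x = b for symmetric positive definite a via L L^T."""
    l = cholesky_lower(a)
    n = len(b)
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][p] * y[p] for p in range(i))) / l[i][i]
    # Back substitution: L^T x = y (L^T[i][p] is l[p][i])
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[p][i] * x[p] for p in range(i + 1, n))) / l[i][i]
    return x


if __name__ == "__main__":
    A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
    b = [-20, -43, 192]
    print(solve_spd(A, b))
```

The factorization dominates the cost (on the order of n^3 operations against n^2 for the two substitutions), which is consistent with the factorization being the step targeted for acceleration.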

Description

Technical field

[0001] The invention relates to a system for accelerating Cholesky decomposition.

Background

[0002] The machine learning algorithm LS-SVM has been widely used in embedded high-performance computing, and its computation includes solving a system of linear equations. Many methods exist for solving linear equations, such as Cholesky decomposition, Gaussian elimination, LU decomposition, and the conjugate gradient method. Considering the characteristics of the algorithm itself and of accelerated computing on an embedded SoC platform, the solver should have a small amount of calculation, low computational complexity, and a large number of parallel operations, which makes Cholesky decomposition the best choice.

[0003] According to the calculation sequence and programming method of Cholesky decomposition of linear equations, the calculation process can b...


Application Information

IPC(8): G06F17/11
Inventor: 王少军, 王晓璐, 马宁, 刘大同, 彭宇, 彭喜元
Owner: HARBIN INST OF TECH