Multi-core parallel Crout decomposition method for ultra-large-scale matrices based on Threading Building Blocks

A Crout decomposition technique for ultra-large-scale matrices built on Threading Building Blocks, applied in the fields of program control devices and complex mathematical operations. It addresses the heavy computation, high hardware requirements, and poor real-time performance that make existing methods unsuitable for demanding applications, and aims for broad applicability and practical value.

Inactive Publication Date: 2014-10-22
TONGJI UNIV

AI Technical Summary

Problems solved by technology

For large-scale or ultra-large-scale matrices, the Gauss-Jordan elimination method and the traditional serial LU decomposition method require too much computation. This demands a high-performance computer and gives poor real-time performance, so these methods are not suitable for applications with strict real-time requirements.

Embodiment

[0022] TBB is built around the concept of tasks. When TBB task scheduling is initialized, the task scheduler object task_scheduler_init handles multi-task allocation and parallel computation, and supports dividing the work among multiple threads. When a parallel template class is called, the iteration range of the loop and the task granularity are specified through the template class parameters. The task granularity parameter determines how finely the work is divided into tasks. If the granularity is too coarse, the available parallelism is not fully exploited; if it is too fine, the overhead of allocating a large number of parallel tasks reduces efficiency. If a suitable task granularity cannot be determined, the auto_partitioner() provided by TBB can be used to choose the granularity automatically.
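The effect of the granularity parameter on the number of tasks can be illustrated with a small sketch. It uses only the C++ standard library and is not TBB code: `split_range` and `task_count` are hypothetical helpers that halve an iteration range until each piece holds at most `grain` iterations, which is an assumption about how a splittable range might be divided, not a description of the scheduler's actual behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not a TBB API): recursively halve [begin, end)
// until each piece holds at most `grain` iterations.
void split_range(std::size_t begin, std::size_t end, std::size_t grain,
                 std::vector<std::pair<std::size_t, std::size_t>>& out) {
    assert(grain > 0 && begin <= end);
    if (end - begin <= grain) {
        if (begin < end) out.emplace_back(begin, end);  // one leaf task
        return;
    }
    const std::size_t mid = begin + (end - begin) / 2;
    split_range(begin, mid, grain, out);
    split_range(mid, end, grain, out);
}

// Number of leaf tasks produced for n iterations at a given grain size.
std::size_t task_count(std::size_t n, std::size_t grain) {
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    split_range(0, n, grain, chunks);
    return chunks.size();
}
```

With 1000 iterations, a grain of 1000 yields a single task (no parallelism), while a grain of 1 yields 1000 tasks (maximum scheduling overhead); a workable setting lies somewhere in between.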

[0023] Taking the maximum pivot element template class designed with parallel_reduce as an example, the ...
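Since the paragraph above is truncated, the patent's own template class is not available here. As a rough illustration only, a reduction body for a maximum-pivot search might look like the following sketch. It is written in standard C++ without TBB: `MaxPivotBody`, `reduce_range`, and `find_pivot_row` are hypothetical names, and `reduce_range` is a serial stand-in for the scheduler that splits the range, scans the pieces, and joins the partial results.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reduction body: scans a sub-range, can be created afresh for a second
// sub-range, and merges a partial result with join(). Names are hypothetical.
struct MaxPivotBody {
    const std::vector<double>& column;  // candidate pivot entries, one per row
    const std::vector<double>& scale;   // per-row scale factors
    double best = -1.0;                 // largest scaled magnitude seen so far
    std::size_t best_row = 0;           // row where that magnitude occurs

    MaxPivotBody(const std::vector<double>& c, const std::vector<double>& s)
        : column(c), scale(s) {}

    // Scan rows [begin, end) and keep the largest scaled magnitude.
    void scan(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            const double v = scale[i] * std::fabs(column[i]);
            if (v > best) { best = v; best_row = i; }
        }
    }

    // Merge the partial result computed on another sub-range.
    void join(const MaxPivotBody& rhs) {
        if (rhs.best > best) { best = rhs.best; best_row = rhs.best_row; }
    }
};

// Serial stand-in for the scheduler: split down to `grain`, scan the
// leaves, and join partial results on the way back up.
void reduce_range(MaxPivotBody& body, std::size_t begin, std::size_t end,
                  std::size_t grain) {
    assert(grain > 0);
    if (end - begin <= grain) { body.scan(begin, end); return; }
    const std::size_t mid = begin + (end - begin) / 2;
    MaxPivotBody right(body.column, body.scale);  // fresh body for the right half
    reduce_range(body, begin, mid, grain);
    reduce_range(right, mid, end, grain);
    body.join(right);
}

// Row index in [first_row, n) with the largest scaled magnitude.
std::size_t find_pivot_row(const std::vector<double>& column,
                           const std::vector<double>& scale,
                           std::size_t first_row, std::size_t grain) {
    MaxPivotBody body(column, scale);
    reduce_range(body, first_row, column.size(), grain);
    return body.best_row;
}
```

Because each half of the range gets its own body and results are combined only in `join`, the two halves share no mutable state, which is the property that lets a real scheduler run them on different cores.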

Abstract

The invention relates to a multi-core parallel Crout decomposition method for ultra-large-scale matrices based on TBB (Threading Building Blocks), which comprises the following steps: 1) rewriting the parallel computing part of the traditional Crout method into a standard form that meets TBB requirements; 2) setting the initial values of the problem; 3) entering the row loop of the Crout method; 4) calling the parallel_reduce parallel module, storing the maximum element of each row in a temporary vector, and at the same time calculating and storing a scale factor; 5) entering the column loop of the Crout method; 6) determining the new pivot element and the scale factor, applying the TINY correction, and dividing by the pivot element; and 7) finishing the decomposition of the ultra-large-scale matrix, thereby obtaining the row and column ordering as changed by partial pivoting. Compared with the prior art, the method greatly increases the running efficiency of matrix LU decomposition, runs on different platforms, is extensible, and is widely applicable.
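The steps in the abstract resemble the familiar Crout scheme with implicit scaling and partial pivoting. The following is a minimal serial sketch of that scheme in standard C++, with no TBB dependency; it is not the patent's parallel implementation. The function name `crout_lu`, the `Matrix` alias, and the value chosen for `TINY` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
constexpr double TINY = 1.0e-20;  // assumed replacement for an exactly zero pivot

// In-place LU decomposition with scaled partial pivoting. On return, `a`
// holds U on and above the diagonal and L (unit diagonal implied) below it.
// The returned vector maps each final row to its original row index.
std::vector<std::size_t> crout_lu(Matrix& a) {
    const std::size_t n = a.size();
    std::vector<double> scale(n);
    std::vector<std::size_t> perm(n);
    // Row loop: largest magnitude per row gives the scale factor.
    // (The abstract places a parallel_reduce call at this step.)
    for (std::size_t i = 0; i < n; ++i) {
        assert(a[i].size() == n);
        double big = 0.0;
        for (double x : a[i]) big = std::max(big, std::fabs(x));
        if (big == 0.0) throw std::runtime_error("singular matrix");
        scale[i] = 1.0 / big;
        perm[i] = i;
    }
    // Column loop of the Crout method.
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < j; ++i) {  // entries of U above the diagonal
            double sum = a[i][j];
            for (std::size_t k = 0; k < i; ++k) sum -= a[i][k] * a[k][j];
            a[i][j] = sum;
        }
        double big = 0.0;
        std::size_t imax = j;
        for (std::size_t i = j; i < n; ++i) {  // pivot candidates
            double sum = a[i][j];
            for (std::size_t k = 0; k < j; ++k) sum -= a[i][k] * a[k][j];
            a[i][j] = sum;
            const double score = scale[i] * std::fabs(sum);
            if (score > big) { big = score; imax = i; }
        }
        if (imax != j) {  // bring the chosen pivot row into position
            std::swap(a[imax], a[j]);
            std::swap(scale[imax], scale[j]);
            std::swap(perm[imax], perm[j]);
        }
        if (a[j][j] == 0.0) a[j][j] = TINY;
        for (std::size_t i = j + 1; i < n; ++i) a[i][j] /= a[j][j];
    }
    return perm;
}
```

For the matrix with rows (1, 2) and (3, 4), the scaled pivot test selects the second row first, so the rows are swapped and the stored factors are U = [[3, 4], [0, 2/3]] with the multiplier 1/3 below the diagonal.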

Description

Technical field

[0001] The invention relates to an ultra-large-scale matrix decomposition method, in particular to a multi-core parallel Crout decomposition method for ultra-large-scale matrices based on Threading Building Blocks.

Background

[0002] Threading Building Blocks (TBB), recently introduced by Intel, is a C++-based multi-threaded parallel programming model that supports parallel computing on multi-core processors. It provides mature data structures, scalable nested parallelism, and scalable memory allocation, and it supports multiple system platforms and multiple compilers. TBB's programming model uses templates as the parallel iteration model. This relieves programmers of much of the work of handling synchronization, load balancing, and cache optimization, and makes it easy to write automatically scheduled parallel programs that make full use of the CPU's multi-core computing capabilities and other s...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F9/44, G06F17/16
Inventors: 马健, 张丽岩, 李克平, 孙剑
Owner: TONGJI UNIV