33 results about "Sparse matrix-vector multiplication" patented technology

Sparse matrix-vector multiplication (SpMV) of the form y = Ax is a widely used computational kernel found in many scientific applications. The input matrix A is sparse; the input vector x and the output vector y are dense. When the operation is repeated with the same input matrix A but possibly changing numerical values of its elements, A can be preprocessed to reduce both the parallel and the sequential run time of the SpMV kernel.
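As a point of reference for the entries below, a minimal CSR-based SpMV kernel in C is sketched here; the array names row_ptr, col_idx and val are generic illustrations, not tied to any patent in this list.

#include <stddef.h>

/* y = A*x for a sparse matrix A stored in Compressed Sparse Row (CSR) form.
 * row_ptr has n_rows+1 entries; col_idx/val hold the non-zeros row by row. */
void spmv_csr(size_t n_rows,
              const size_t *row_ptr, const size_t *col_idx,
              const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}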

Sparse matrix storage method CSRL (Compressed Sparse Row with Local Information) and SpMV (Sparse Matrix Vector Multiplication) realization method based on same

The invention discloses a sparse matrix storage method, CSRL (Compressed Sparse Row with Local Information), and an SpMV (Sparse Matrix Vector Multiplication) implementation method based on it. The storage method comprises the following steps: scanning a sparse matrix A row by row and storing the value of each non-zero element sequentially in an array val; treating runs of non-zero elements with consecutive column subscripts as contiguous non-zero segments, recording the column subscript of the first element of each segment in an array jas, and recording the number of non-zero elements in each segment in an array jan; and recording the index of the first segment of each row of the sparse matrix A in an array ptr. Because the column indexes of contiguous non-zero elements are stored in merged form, the storage space requirement is reduced, the data locality of the sparse matrix is fully exploited, access and computation can be performed with SIMD (Single Instruction Multiple Data) instructions, the number of memory accesses is reduced, and SpMV performance is improved.
Owner:INST OF SOFTWARE - CHINESE ACAD OF SCI
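A minimal sketch of SpMV over the CSRL layout described in the abstract, assuming val holds the non-zero values row by row, jas the column subscript of the first element of each contiguous segment, jan the segment lengths, and ptr the index of the first segment of each row; the exact array semantics are inferred from the abstract.

#include <stddef.h>

/* SpMV over a CSRL-style layout (array semantics inferred from the abstract).
 *   val : non-zero values, stored row by row
 *   jas : column subscript of the first non-zero in each contiguous segment
 *   jan : number of non-zeros in each contiguous segment
 *   ptr : index of the first segment of each row (length n_rows+1)          */
void spmv_csrl(size_t n_rows, const size_t *ptr,
               const size_t *jas, const size_t *jan,
               const double *val, const double *x, double *y)
{
    size_t p = 0;                                  /* running position in val */
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (size_t s = ptr[i]; s < ptr[i + 1]; ++s) {
            size_t col = jas[s];
            for (size_t k = 0; k < jan[s]; ++k)    /* contiguous columns: SIMD-friendly */
                sum += val[p + k] * x[col + k];
            p += jan[s];
        }
        y[i] = sum;
    }
}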

FPGA accelerator of LSTM neural network and acceleration method of FPGA accelerator

The invention provides an FPGA accelerator for an LSTM neural network and an acceleration method for the accelerator. The accelerator comprises a data distribution unit, an operation unit, a control unit and a storage unit; the operation unit comprises a sparse matrix-vector multiplication module, a nonlinear activation function module and an element-wise multiply-accumulate module. The control unit sends a control signal to the data distribution unit, which reads input activation values and neural network weight parameters from the storage unit and feeds them to the operation unit. Operation resources are distributed evenly across the operation units according to the number of non-zero weight values, so that idle operation resources are avoided and the operation performance of the whole network is improved. Meanwhile, the pruned neural network is stored as a sparse network: the weight values of each column are stored in the same address space and the network is encoded by row index, which improves operation performance and data throughput while preserving accuracy.
Owner:NANJING UNIV
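The abstract does not disclose the scheduler, but the idea of distributing work according to non-zero weight counts can be illustrated with a simple greedy balancer; the function and parameter names below are hypothetical.

#include <stddef.h>

/* Greedy balancing sketch: assign each weight column to the operation unit
 * with the smallest accumulated non-zero count, so that all units receive
 * roughly equal work (illustrative only; the patented scheduler is not public). */
void balance_by_nnz(size_t n_cols, const size_t *nnz_per_col,
                    size_t n_units, size_t *unit_of_col, size_t *unit_load)
{
    for (size_t u = 0; u < n_units; ++u) unit_load[u] = 0;
    for (size_t c = 0; c < n_cols; ++c) {
        size_t best = 0;
        for (size_t u = 1; u < n_units; ++u)
            if (unit_load[u] < unit_load[best]) best = u;
        unit_of_col[c] = best;
        unit_load[best] += nnz_per_col[c];
    }
}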

Prediction method and system for vector multiplication operation time of sparse matrix

The invention relates to a method and system for predicting the runtime of sparse matrix-vector multiplication. The method comprises the following steps: constructing a convolutional neural network comprising an input layer, a feature processing layer, a data splicing layer and an output layer, wherein the input layer receives the features of a row feature matrix, a column feature matrix and an architecture-parameter extension matrix derived from a sparse matrix; the feature processing layer extracts features from the previous layer; the data splicing layer concatenates the extracted features of the row feature matrix, the column feature matrix and the architecture-parameter extension matrix; and the output layer outputs the prediction result. Several groups of sparse matrices whose SpMV runtimes are known are collected as sample data and input into the convolutional neural network to train it; a sparse matrix to be classified is then input into the trained network to predict its SpMV runtime.
Owner:CHINA INSTITUTE OF ATOMIC ENERGY +1
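The precise feature definitions are not public; as one plausible illustration, the per-row non-zero counts of a CSR matrix can be compressed into a fixed-length histogram that could play the role of the row feature matrix fed to such a network. Everything below is an assumption for illustration.

#include <stddef.h>

/* Illustrative feature extraction: compress the per-row non-zero counts of a
 * CSR matrix into a fixed-length histogram (a stand-in for the "row feature
 * matrix"; the actual feature definition in the patent is not public). */
void row_nnz_histogram(size_t n_rows, const size_t *row_ptr,
                       size_t n_bins, double *feature)
{
    for (size_t b = 0; b < n_bins; ++b) feature[b] = 0.0;
    for (size_t i = 0; i < n_rows; ++i) {
        size_t bin = i * n_bins / n_rows;           /* map row index to a bin */
        feature[bin] += (double)(row_ptr[i + 1] - row_ptr[i]);
    }
}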

Many-core architecture-oriented sparse matrix vector multiplication many-core optimization method

Publication: CN112560356A (inactive)
The invention discloses a many-core-architecture-oriented many-core optimization method for sparse matrix-vector multiplication. Given a sparse matrix A with m rows and n columns and a vector x of length n, the method computes the vector y of length m, where y = Ax is the product of the sparse matrix A and the vector x. The method comprises the following steps: S1, defining the size blkxsize of an x-vector block and partitioning the vector x into blocks according to the element subscripts; S2, for each row of the original sparse matrix A, mapping the column numbers of its non-zero elements to the x-vector block numbers obtained in step S1, thereby counting, for each row, which x-vector blocks are required when the sparse matrix-vector product is computed. The method improves the overall many-core acceleration performance, improves the locality of data accesses, and shows an obvious optimization effect on unstructured-grid CFD applications.
Owner:JIANGNAN INST OF COMPUTING TECH
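A sketch of step S2, assuming a CSR representation of A: for each row, count how many distinct blocks of x (of size blkxsize) its non-zero columns touch. Names such as count_x_blocks_per_row are illustrative.

#include <stddef.h>
#include <stdbool.h>
#include <string.h>

/* For each row of a CSR matrix, count how many distinct x-vector blocks
 * (of size blkxsize) its non-zero columns touch. 'touched' is a caller-
 * provided scratch array of n_blocks bools, n_blocks = ceil(n_cols/blkxsize). */
void count_x_blocks_per_row(size_t n_rows, const size_t *row_ptr,
                            const size_t *col_idx, size_t blkxsize,
                            size_t n_blocks, bool *touched, size_t *blocks_per_row)
{
    for (size_t i = 0; i < n_rows; ++i) {
        memset(touched, 0, n_blocks * sizeof(bool));
        size_t count = 0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            size_t b = col_idx[k] / blkxsize;   /* block holding column col_idx[k] */
            if (!touched[b]) { touched[b] = true; ++count; }
        }
        blocks_per_row[i] = count;
    }
}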

On-chip very-large-scale power supply network parallel simulation method based on spectral graph sparsification

The invention discloses a parallel simulation method and device for very-large-scale power supply networks based on spectral graph sparsification. The method comprises the steps of: reading the SPICE netlist of a power supply network and building the corresponding Laplacian matrix, weighted undirected graph and right-hand-side vector; running a parallel spectral sparsification algorithm on the weighted undirected graph to obtain a sparse subgraph and its Laplacian matrix; decomposing the Laplacian matrix of the sparse subgraph with a domain decomposition method to obtain a global Schur complement matrix; and, given a convergence threshold, solving the power supply network linear system whose coefficient matrix is the Laplacian matrix of the SPICE netlist with a preconditioned conjugate gradient (PCG) method built on the global Schur complement matrix and the convergence threshold, obtaining simulation results such as node voltages. This addresses the problems that existing spectral sparsification algorithms run serially and that, even though the sparse matrix-vector multiplications and vector additions in the PCG iteration parallelize easily, the Cholesky factorization, forward substitution and back substitution of a sparse, irregular matrix parallelize poorly.
Owner:TSINGHUA UNIV
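The PCG iteration referred to above follows the textbook scheme; the sketch below leaves the matrix apply (an SpMV) and the preconditioner apply (which, in this method, would be the Schur-complement solve built from the sparsified subgraph) as caller-supplied callbacks, so it is a generic skeleton rather than the patented solver.

#include <stddef.h>
#include <math.h>

typedef void (*apply_fn)(const double *in, double *out, void *ctx);

/* Textbook preconditioned conjugate gradient: solves A x = b where A is
 * applied by A_mul (an SpMV) and M_inv applies the preconditioner solve.
 * Work arrays r, zv, p, q all have length n. Returns iterations used. */
size_t pcg(size_t n, apply_fn A_mul, apply_fn M_inv, void *ctx,
           const double *b, double *x, double tol, size_t max_it,
           double *r, double *zv, double *p, double *q)
{
    for (size_t i = 0; i < n; ++i) { x[i] = 0.0; r[i] = b[i]; }
    M_inv(r, zv, ctx);
    double rz = 0.0;
    for (size_t i = 0; i < n; ++i) { p[i] = zv[i]; rz += r[i] * zv[i]; }
    for (size_t it = 0; it < max_it; ++it) {
        A_mul(p, q, ctx);
        double pq = 0.0, rnorm = 0.0;
        for (size_t i = 0; i < n; ++i) pq += p[i] * q[i];
        double alpha = rz / pq;
        for (size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
            rnorm += r[i] * r[i];
        }
        if (sqrt(rnorm) < tol) return it + 1;       /* converged */
        M_inv(r, zv, ctx);
        double rz_new = 0.0;
        for (size_t i = 0; i < n; ++i) rz_new += r[i] * zv[i];
        double beta = rz_new / rz;
        rz = rz_new;
        for (size_t i = 0; i < n; ++i) p[i] = zv[i] + beta * p[i];
    }
    return max_it;
}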

Adaptive sparse matrix vector multiplication strategy selection and optimization method

The invention discloses an adaptive sparse matrix-vector multiplication strategy selection and optimization method suitable for GPU architectures, comprising the following steps: partition the matrix to be processed by rows and count the number of non-zero elements in each matrix sub-block; if the imbalance ratio between the non-zero counts of the sub-blocks exceeds a first preset threshold, process the matrix with an adaptive CSR-Vector algorithm; otherwise count the average number of non-zero elements per row, and if it is below a second preset threshold, solve with an improved CSR-Stream algorithm; otherwise count the total number of non-zero elements of the matrix, and if it exceeds a third preset threshold, solve with the hola algorithm; if none of the conditions is met, solve with the CSR-Vector algorithm. The method achieves adaptive, efficient SpMV solving for different application problems.
Owner:UNIV OF SCI & TECH BEIJING
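Read literally, the selection logic reduces to a small decision function; the threshold parameters t1, t2, t3 and the imbalance metric below are assumptions, since the abstract does not give their definitions or values.

#include <stddef.h>

typedef enum { STRATEGY_ADAPTIVE_CSR_VECTOR, STRATEGY_CSR_STREAM,
               STRATEGY_HOLA, STRATEGY_CSR_VECTOR } spmv_strategy_t;

/* Decision logic as read from the abstract: imbalance between row blocks
 * selects the adaptive CSR-Vector path, short rows select CSR-Stream, very
 * large matrices select hola, and plain CSR-Vector is the fallback. */
spmv_strategy_t select_strategy(double block_nnz_imbalance, double avg_nnz_per_row,
                                size_t total_nnz, double t1, double t2, size_t t3)
{
    if (block_nnz_imbalance > t1) return STRATEGY_ADAPTIVE_CSR_VECTOR;
    if (avg_nnz_per_row < t2)     return STRATEGY_CSR_STREAM;
    if (total_nnz > t3)           return STRATEGY_HOLA;
    return STRATEGY_CSR_VECTOR;
}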

Encoder for LDPC codes with a layered quasi-cyclic extension structure

The invention discloses an encoder for LDPC codes with a layered quasi-cyclic extension structure, comprising an input buffer, a first processing-buffering pipeline stage, a second processing pipeline stage, a third buffering pipeline stage, a fourth processing-buffering pipeline stage and an output stage. Exploiting the fact that the parity check matrix H is spliced together from quasi-cyclic shift matrices, the pipeline structure of the RU encoding method is simplified: the number of pipeline stages is reduced from six to four and the encoding delay is shortened. At the same time, the maximum pipeline delay is reduced and the encoding throughput is improved according to the implementation characteristics of the main functional modules. Exploiting the properties of quasi-cyclic shift matrix operations, the ROM consumption of the encoder is reduced: the sparse matrix-vector multiplication in the RU method is replaced by multiplication with quasi-cyclically shifted identity matrices, and the non-sparse matrix-vector multiplication in the RU method is replaced by multiplication with quasi-cyclic shift matrices. To support variable code length and variable code rate, the ping-pong RAMs between stages reserve a sufficiently large storage space.
Owner:SHANGHAI JIAOTONG UNIV
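The key substitution, multiplying a sub-vector by a quasi-cyclically shifted identity matrix, reduces to a circular rotation over GF(2); the sketch below assumes a shift amount s in [0, z) and one common shift-direction convention.

#include <stddef.h>
#include <stdint.h>

/* Multiplying a length-z binary sub-vector by a z-by-z identity matrix
 * cyclically shifted by s reduces to a circular rotation (GF(2), one block). */
void qc_shift_mul(size_t z, size_t s, const uint8_t *in, uint8_t *out)
{
    for (size_t i = 0; i < z; ++i)
        out[i] = in[(i + s) % z];
}

/* Accumulate (XOR) the contribution of one shifted sub-block into an output
 * sub-vector, as an encoder built from such blocks would do. */
void qc_shift_accumulate(size_t z, size_t s, const uint8_t *in, uint8_t *acc)
{
    for (size_t i = 0; i < z; ++i)
        acc[i] ^= in[(i + s) % z];
}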

Sparse matrix vector multiplication parallel task granularity parameter automatic tuning method and device

The invention belongs to the field of parallel computing and discloses an automatic tuning method and device for the parallel task granularity parameter of sparse matrix-vector multiplication. The method comprises: a prediction model construction step, building a prediction model with a machine learning method; a statistical feature extraction step, analyzing the raw matrix data file to obtain statistical feature values of the matrix; an optimal task granularity prediction step, feeding the obtained statistical feature values into the prediction model to predict the optimal parallel task granularity parameter of the SpMV program for that input matrix; and a configuration step, adjusting the system task granularity at parallel run time according to the prediction result. The device comprises a prediction model construction module, a statistical feature acquisition module, an optimal task granularity prediction module and a configuration module. By adaptively selecting the SpMV parallel task granularity for different input matrices, the method improves the load balance and overall computing performance of the parallel program.
Owner:NAT UNIV OF DEFENSE TECH
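One plausible way to apply a predicted granularity parameter, shown here as an assumption rather than the patented mechanism, is to use it as the dynamic-scheduling chunk size (rows per task) of a parallel CSR SpMV.

#include <stddef.h>

/* CSR SpMV where the predicted task granularity parameter is applied as the
 * OpenMP dynamic-scheduling chunk size, i.e. the number of rows per task. */
void spmv_csr_chunked(size_t n_rows, const size_t *row_ptr, const size_t *col_idx,
                      const double *val, const double *x, double *y,
                      int predicted_chunk_rows)
{
    #pragma omp parallel for schedule(dynamic, predicted_chunk_rows)
    for (long i = 0; i < (long)n_rows; ++i) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}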

Sparse matrix calculations utilizing tightly coupled memory and a gather/scatter engine

A processor for sparse matrix calculation can include an on-chip memory, a cache, a gather/scatter engine and a core. The on-chip memory can be configured to store a first matrix or vector, and the cache can be configured to store a compressed sparse second matrix data structure. The compressed sparse second matrix data structure can include: a value array including non-zero element values of the sparse second matrix, where each entry includes a given number of element values; and a column index array where each entry includes the given number of offsets matching the value array. The gather/scatter engine can be configured to gather element values of the first matrix or vector using the column index array of the sparse second matrix. In a horizontal implementation, the gather/scatter engine can be configured to gather sets of element values from different sub-banks within a same row based on the column index array of the sparse matrix. In a vertical implementation, the gather/scatter engine can be configured to gather sets of element values from different rows based on the column index array of the sparse matrix. In a hybrid horizontal/vertical implementation, the gather/scatter engine can be configured to gather sets of element values from sets of rows and from different sub-banks within the same rows based on the column index array of the sparse matrix. The core can be configured to perform sparse matrix-matrix multiplication or sparse matrix-vector multiplication using the gathered elements of the first matrix or vector and the value array of the compressed sparse second matrix.
Owner:ALIBABA GRP HLDG LTD
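A software analogue of the gather step described above, assuming a fixed group width of 8 offsets per column-index entry; the actual group width, sub-bank mapping and engine behaviour are hardware details not modeled here.

#include <stddef.h>

#define GROUP 8   /* assumed number of element values / offsets per entry */

/* For each group of GROUP column offsets, fetch the matching elements of the
 * dense vector x so the core can multiply them against the matching group in
 * the value array. */
void gather_groups(size_t n_groups, const size_t col_idx[][GROUP],
                   const double *x, double gathered[][GROUP])
{
    for (size_t g = 0; g < n_groups; ++g)
        for (size_t j = 0; j < GROUP; ++j)
            gathered[g][j] = x[col_idx[g][j]];
}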

A sparse matrix storage method on SIMD many-core processor with multi-level cache

The invention discloses a storage method for sparse matrices on SIMD many-core processors with a multi-level cache. The method includes the following steps: firstly, the maximum number a of non-zero elements in any row of a matrix A and the number b of non-zero elements that the processor's SIMD unit can process at the same time are acquired, and the smallest multiple of b that is not less than a is computed to serve as the temporary row width; secondly, arrays Value and Colidx sequentially store, row by row, the non-zero element values and their column coordinates respectively, with 0 and -1 appended to the tail of each row whose number of non-zero elements does not reach the temporary row width; thirdly, Colidx and Value are partitioned into column blocks of width b; fourthly, each column block is compressed by rows so that the rows containing non-zero elements are gathered at the top of the block; fifthly, the column blocks are partitioned into sub-blocks of b rows, all-zero sub-blocks are removed, and the remaining sub-blocks are stored by rows. By dividing the sparse matrix into dense sub-blocks, the method increases the utilization of the processor's SIMD unit and registers while keeping the locality of the non-zero elements, and improves sparse matrix-vector multiplication performance.
Owner:UNIV OF SCI & TECH OF CHINA
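A sketch of the first two steps, assuming a CSR input: compute the temporary row width as the smallest multiple of b not less than the longest row, then pad each row's values and column indices with 0 and -1; the later blocking and compaction steps are not shown, and the helper names are illustrative.

#include <stddef.h>

/* Smallest multiple of b that is >= the longest row: the "temporary row
 * width" used for padding (step 1 above). */
size_t temp_row_width(size_t n_rows, const size_t *row_ptr, size_t b)
{
    size_t max_nnz = 0;
    for (size_t i = 0; i < n_rows; ++i) {
        size_t nnz = row_ptr[i + 1] - row_ptr[i];
        if (nnz > max_nnz) max_nnz = nnz;
    }
    return ((max_nnz + b - 1) / b) * b;
}

/* Step 2 above: write each row into Value/Colidx (both sized n_rows * width),
 * padding short rows with 0 values and -1 column indices. Partitioning into
 * b-wide blocks and dropping all-zero sub-blocks would follow. */
void pad_rows(size_t n_rows, const size_t *row_ptr, const size_t *col_idx,
              const double *val, size_t width, double *Value, long *Colidx)
{
    for (size_t i = 0; i < n_rows; ++i) {
        size_t nnz = row_ptr[i + 1] - row_ptr[i];
        for (size_t j = 0; j < width; ++j) {
            Value[i * width + j]  = (j < nnz) ? val[row_ptr[i] + j] : 0.0;
            Colidx[i * width + j] = (j < nnz) ? (long)col_idx[row_ptr[i] + j] : -1L;
        }
    }
}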