A high-dimensional matrix approximate singular value calculation method based on GPU parallel acceleration

By employing a GPU-based parallel acceleration method for high-dimensional matrix approximation singular value computation, leveraging low-rank approximation properties and CUDA library functions, the high computational complexity of traditional singular value decomposition is addressed, achieving efficient computational acceleration and singular value computation with satisfactory accuracy.

CN122240987A · Pending · Publication Date: 2026-06-19 · WEST ANHUI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
WEST ANHUI UNIV
Filing Date
2026-03-23
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Traditional singular value decomposition algorithms suffer from high computational complexity and low efficiency when processing large-scale or high-dimensional data matrices, making it difficult to meet real-time processing requirements, and the GPU acceleration effect is not ideal.

Method used

A GPU-accelerated method for computing approximate singular values of high-dimensional matrices is adopted. A column-vector matrix and a row-vector matrix are randomly initialized, and Gram-Schmidt orthogonalization and matrix multiplication are then performed iteratively. Taking advantage of the low-rank approximation property, only the first r larger singular values are calculated. NVIDIA CUDA library functions are used to accelerate the matrix operations.

Benefits of technology

It significantly reduces computational complexity, decreasing it from the cubic order of traditional singular value decomposition to a linear order related to the matrix dimension, achieving a computational speedup of 3-10 times and a computational accuracy on the order of 0.001, thus meeting real-time processing requirements.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122240987A_ABST
Patent Text Reader

Abstract

This invention relates to the field of computer science and technology, and discloses a method for calculating approximate singular values of high-dimensional matrices based on GPU parallel acceleration. The method reduces computational complexity by exploiting the low-rank approximation property of the data matrix: it calculates only the first r larger singular values and skips the subsequent smaller ones. The method uses an iterative calculation based on matrix multiplication and vector inner products, which is highly parallelizable, and achieves a 3-10 times improvement in computational efficiency through GPU parallel library functions.

Description

Technical Field

[0001] This invention relates to the field of computer science and technology, and more specifically, to a method for approximate singular value calculation of high-dimensional matrices based on GPU parallel acceleration.

Background Technology

[0002] Singular Value Decomposition (SVD) is a fundamental tool in matrix factorization, widely used in machine learning, data processing, and image processing. However, traditional SVD requires simultaneously calculating all singular values, singular vectors, and the corresponding matrix factorization results, giving a computational complexity of cubic order, O(n^3). This results in extremely low computational efficiency when processing large-scale or high-dimensional data matrices. Especially in practical applications such as face recognition, background modeling, and video processing, the data matrices are often high-dimensional, and traditional SVD algorithms struggle to meet the requirements of real-time processing.

[0003] Although existing GPU-accelerated SVD can use the parallel computing capabilities of GPUs for acceleration, the traditional SVD algorithm itself has strong data dependencies, which limit its parallelism. Therefore, the GPU acceleration effect is not ideal, with a speedup ratio of only about 1.3, failing to fully utilize the parallel computing advantages of GPUs.

Summary of the Invention

[0004] This invention provides a method for approximate singular value calculation of high-dimensional matrices based on GPU parallel acceleration, which solves the technical problems of high computational complexity and low computational efficiency of traditional singular value decomposition in related technologies.

[0005] This invention discloses a method for approximating singular values of high-dimensional matrices based on GPU parallel acceleration, comprising the following steps: obtain the high-dimensional real matrix X to be decomposed and the number r of singular values to be calculated, where r is greater than 0; randomly initialize the column vector matrix A and the row vector matrix T; perform the following operations iteratively, with the iteration variable i looping from 1 to r: multiply matrix X by the i-th row of matrix T to generate the initial value of the i-th column vector of matrix A; orthogonalize the generated i-th column vector against the first i-1 column vectors of matrix A using Gram-Schmidt, by successively subtracting the projection component of the column vector onto the direction of each preceding column vector; multiply the orthogonalized i-th column vector with matrix X to generate the i-th row of matrix T; calculate the Frobenius norm of the i-th row vector of matrix T, whose value is the i-th approximate singular value of matrix X. After the loop completes, the first r approximate singular values of matrix X and their corresponding left singular vectors are obtained. This method takes advantage of the low-rank approximation property of the data matrix. By calculating only the first r larger singular values and not the subsequent smaller ones, the computational complexity is reduced from the cubic order of traditional singular value decomposition to an order linearly related to the matrix dimension.

[0006] Furthermore, the Gram-Schmidt orthogonalization process includes iterating the iteration variable j from 1 to i-1, calculating the inner product of the i-th column vector and the j-th column vector of matrix A, subtracting the product of the inner product and the j-th column vector from the i-th column vector, and successively eliminating the projection component of the i-th column vector in the directions of the preceding column vectors in this way.
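The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the filing's implementation: the helper names `dot`, `orthogonalize`, and `normalize` are hypothetical, and the sketch assumes the preceding column vectors already have unit length, which is what makes the product of the inner product and the j-th vector equal to the projection. The explicit normalization step is likewise an assumption added here; the text does not state it.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def orthogonalize(a, basis):
    """Remove from `a` its projection onto each vector in `basis`.

    Assumes every vector in `basis` has unit length, so the projection
    of `a` onto b is simply <a, b> * b.
    """
    a = list(a)
    for b in basis:                      # j = 1 .. i-1
        coeff = dot(a, b)                # inner product <a_i, a_j>
        a = [x - coeff * y for x, y in zip(a, b)]
    return a

def normalize(a):
    """Scale `a` to unit length (an assumption of this sketch)."""
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]

# Two orthonormal vectors already accepted, and a new candidate vector.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
a3 = normalize(orthogonalize([2.0, -1.0, 4.0], basis))
```

After the call, `a3` is orthogonal to both vectors in `basis` and can be appended to it before the next iteration.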

[0007] Furthermore, the method further includes the step of implementation on the GPU platform, using the cublasSgemm function in the CUDA library provided by NVIDIA to accelerate matrix multiplication operations, and using the cublasDgemv function to accelerate vector-matrix multiplication operations. By calling these efficient GPU library functions, the massive parallel computing capabilities of the GPU are fully utilized.

[0008] Furthermore, when implemented on a GPU platform, the matrix data is transferred from the CPU main memory to the GPU video memory, the cublasSgemm function is called to perform matrix multiplication, and the result is transferred back to the CPU main memory after the calculation is completed. Compared with using only the CPU for calculation, this method can achieve a significant speedup when processing large-scale matrices.

[0009] Furthermore, the random initialization includes filling matrices A and T with random numbers that follow a standard normal distribution to ensure the universality of the initial values for iteration and the stability of the iterative algorithm.

[0010] Furthermore, the matrix X is a large-scale or high-dimensional data matrix, and the method is applied to application scenarios that require matrix decomposition, such as face recognition, background modeling, video processing, and image restoration.

[0011] Furthermore, the calculation accuracy of the approximate singular values can reach the order of 0.001, which is highly consistent with the results of the standard singular value decomposition algorithm. It can be used to replace the standard singular value decomposition algorithm in various application scenarios.

[0012] This invention also discloses a GPU-based parallel acceleration system for approximating singular values of high-dimensional matrices, used to execute the aforementioned method. The system includes: a matrix acquisition module for acquiring the high-dimensional matrix to be decomposed and the number of decompositions; an initialization module for randomly initializing the left and right singular vector matrices; an iterative calculation module for performing cyclic iterative calculations, including a left singular vector calculation unit, an orthogonalization processing unit, a right singular vector calculation unit, and a singular value calculation unit; and a GPU acceleration module for calling corresponding parallel library functions on the GPU platform to accelerate matrix operations.

[0013] The beneficial effects of this invention are as follows: this invention addresses the technical problems of high computational complexity and low computational efficiency in traditional Singular Value Decomposition (SVD) by improving the structure of the algorithm and combining low-rank approximation with fundamental matrix operations. It reduces the computational complexity from the cubic order of traditional SVD to an order linearly related to the matrix dimension, reaches a computational speedup of 3-10 times, and achieves a computational accuracy on the order of 0.001.

Attached Figure Description

[0014] Figure 1 is a flowchart of the high-dimensional matrix approximate singular value calculation method based on GPU parallel acceleration of the present invention;
Figure 2 is the original grayscale image used in the present invention;
Figure 3 is the singular value curve of the original grayscale image;
Figure 4 shows the reconstruction result obtained from the first 50 singular values;
Figure 5 is an experimental effect diagram of the present invention, with dimensions of 512*512;
Figure 6 is a comparison chart of the first 50 singular values obtained by GTSVD with the SVD results;
Figure 7 is the original image used to compare the reconstruction results of GTSVD and SVD;
Figure 8 shows the SVD reconstruction result;
Figure 9 shows the GTSVD reconstruction result.

Detailed Implementation

[0015] The subject matter described herein will now be discussed with reference to exemplary embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein, and changes may be made to the function and arrangement of the elements discussed without departing from the scope of this specification. Various processes or components may be omitted, substituted, or added as needed in the examples. Furthermore, some features described in the examples may be combined in other examples.

[0016] Example 1. A GPU-based parallel acceleration method for approximating singular values of high-dimensional matrices includes the following steps. Step 100, the principle behind the algorithm improvement: the matrix data admits a low-rank approximation. In practical applications such as face recognition, background modeling, and video processing, most data matrices can be approximated well by a low-rank matrix. This means that an approximate image with acceptable error can be reconstructed using only the first few large singular values and their singular vectors. For example, Figure 2 is a grayscale image, and its singular value curve is shown in Figure 3: the magnitude of the singular values decays rapidly, and the 50th and subsequent singular values are close to 0, so they can be ignored.

[0017] Therefore, the first 50 singular values and their corresponding singular vectors can be used to reconstruct the image and apply it to subsequent intelligent processing. The image reconstructed using the first 50 singular values is shown in Figure 4.

[0018] The reconstruction error of Figure 4 is [missing information]. This indicates that reconstructing the image using only the first 50 singular values already achieves a very high reconstruction accuracy.

[0019] Therefore, to save computation time, only the first few larger singular values of the matrix and their corresponding singular vectors need to be calculated for data optimization, identification, recovery, and other tasks, which can greatly improve the speed of the algorithm.

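[0020] Step 200, GTSVD algorithm derivation: the mathematical principle of calculating the first few large singular values of a matrix. Assume a real matrix $X \in \mathbb{R}^{m \times n}$ is given; its SVD has the following form:

$$X = U \Sigma V^{T} \qquad (1)$$

where $U \in \mathbb{R}^{m \times k}$ is a column-orthogonal matrix, $V \in \mathbb{R}^{n \times k}$ is a column-orthogonal matrix, $k$ is the rank of $X$ with $k \le \min(m, n)$, and $\Sigma \in \mathbb{R}^{k \times k}$ is a diagonal matrix whose diagonal elements are the singular values of $X$, i.e.:

$$\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k) \qquad (2)$$

For ease of description, the singular values are arranged in descending order: $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k > 0$.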

[0021] If the factors of the SVD $X = U \Sigma V^{T}$ are further merged, it can be transformed into the following form:

$$X = A T \qquad (3)$$

where $A = U$ and $T = \Sigma V^{T}$. Because $\Sigma$ is a diagonal matrix, the F-norm of the $i$-th row of matrix $T$ is equal to the $i$-th singular value of $X$.
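The claim that each row norm of the merged factor recovers a singular value can be checked in one line. The sketch below assumes the notation $X = U \Sigma V^{T}$ and $T = \Sigma V^{T}$, with $v_i$ the $i$-th column of $V$, $\sigma_i$ the $i$-th diagonal entry of $\Sigma$, and $e_i$ the $i$-th standard basis vector; these symbol names are a reconstruction and may differ from those in the original filing.

```latex
t_i = e_i^{T} \Sigma V^{T} = \sigma_i v_i^{T}
\quad\Longrightarrow\quad
\lVert t_i \rVert_F = \sigma_i \lVert v_i \rVert_2 = \sigma_i ,
\qquad \text{since } V^{T} V = I \text{ gives } \lVert v_i \rVert_2 = 1 .
```

The argument uses both facts together: the diagonal structure of $\Sigma$ isolates a single column of $V$, and the column-orthogonality of $V$ makes that column a unit vector.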

[0022] Based on the physical meaning of the SVD, the following minimization model is established and solved for the $i$-th singular value of $X$ and its left singular vector: (4) [missing information], where $A$ is a column-orthogonal matrix, $a_i$ is the $i$-th column of matrix $A$, and $t_i$ is the $i$-th row of matrix $T$.

[0023] If the first 1, 2, ..., i-1 singular values of $X$ and their singular vectors are known, model (4) can be solved by the following steps. Solve for $a_i$:

$$a_i = X t_i^{T} \qquad (5)$$

Because $A$ is a column-orthogonal matrix, after $a_i$ is obtained through formula (5) it must also be orthogonalized against columns $1, 2, \ldots, i-1$ of $A$. The following update uses the Schmidt orthogonalization method:

$$a_i \leftarrow a_i - \langle a_i, a_j \rangle\, a_j, \qquad j = 1, 2, \ldots, i-1 \qquad (6)$$

where $\langle a_i, a_j \rangle$ denotes the inner product of the vectors $a_i$ and $a_j$.

[0024] Solve for $t_i$:

$$t_i = a_i^{T} X \qquad (7)$$

Repeat steps (5), (6), and (7) to obtain the $i$-th singular value of the matrix $X$ and its corresponding left singular vector:

$$\sigma_i = \lVert t_i \rVert_F \qquad (8)$$

$$u_i = a_i \qquad (9)$$

In summary, the algorithm steps for calculating the first t singular values of a matrix and their left singular vectors are summarized in Table 1.

[0025] Table 1 shows the algorithm for calculating the first t largest singular values of a matrix and their left singular vectors. This algorithm directly calculates the first t largest singular values of the matrix without calculating the remaining smaller singular values, which significantly reduces the computational load. Furthermore, GTSVD uses only matrix multiplication and requires no other matrix factorization tools to complete all the calculations. Therefore, it is highly parallelizable.

[0026] Step 300: GPU program implementation of the GTSVD algorithm. The main steps of the GTSVD algorithm are formulas (5), (6), and (7). As can be seen from these three formulas, the calculation of the GTSVD algorithm is very simple: the entire calculation task can be completed by multiplying the relevant matrices.

[0027] Clearly, matrix multiplication is a fundamental operation that can be highly parallelized. NVIDIA provides the efficient matrix multiplication function `cublasSgemm`. `cublasSgemm` is the single-precision Level-3 BLAS matrix multiplication routine of the cuBLAS library; it runs on the GPU with very high parallelism, resulting in a significant speedup.

[0028] Formula (6) of the GTSVD algorithm requires calculating the inner products of the current vector with each of the previously obtained vectors, that is, the projection coefficients, and then performing Gram-Schmidt orthogonalization. The CUDA library function `cublasDgemv` performs this task on the GPU. This function is also a parallel program executed on the GPU, with good parallel computing capabilities.
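One way to see why a matrix-vector routine fits this step is to write the orthogonalization as two matrix-vector products. The sketch below is plain Python rather than a cuBLAS call, and the mapping to `cublasDgemv` is an assumption: the source names the function but does not give its call arguments. The helper names `matvec`, `transpose`, and `orthogonalize_batched` are hypothetical, and `Q` is assumed to hold the previously accepted vectors as orthonormal columns. Note that this computes all projection coefficients from the same input vector, whereas the loop form updates the vector between inner products; the two agree in exact arithmetic when the columns of `Q` are orthonormal.

```python
def matvec(M, v):
    """Return M @ v for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]

def orthogonalize_batched(a, Q):
    """Orthogonalize `a` against the columns of Q with two mat-vec products.

    Q is m x (i-1) and is assumed to have orthonormal columns.
      coeffs = Q^T a       all projection coefficients in one product
      result = a - Q coeffs
    """
    coeffs = matvec(transpose(Q), a)
    projection = matvec(Q, coeffs)
    return [x - p for x, p in zip(a, projection)]

# Two orthonormal columns in R^3 and a candidate vector.
Q = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
a_new = orthogonalize_batched([2.0, -1.0, 4.0], Q)
```

Here `a_new` keeps only the component of the input that lies outside the span of the columns of `Q`.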

[0029] For ease of explanation, the main steps of this GPU version of the GTSVD algorithm are shown in Table 2.

[0030] Example 2. This implementation provides a method for approximating singular values of high-dimensional matrices based on GPU parallel acceleration, including the following steps. Step 1: Initialize the matrix to be processed and the singular vector matrices. Obtain the high-dimensional real matrix $X$ to be decomposed ($X \in \mathbb{R}^{m \times n}$), and randomly initialize the column vector matrix $A$ ($A \in \mathbb{R}^{m \times r}$) and the row vector matrix $T$ ($T \in \mathbb{R}^{r \times n}$).

[0031] It should be noted that matrix A is initialized to provide the initial left singular vector basis for subsequent iterative calculations, and matrix T is initialized to prepare for computing the first right singular vector row in the subsequent iteration. In practice, both can be filled with random numbers that follow a standard normal distribution to ensure the universality of the initial values for iteration.

[0032] Step 2: Iteratively calculate the first r singular values and their left singular vectors. Let the iteration counter be i, and loop i from 1 to r. In each iteration, perform the following steps. Step 2.1: Calculate the i-th left singular vector. The current i-th right singular vector row (the i-th row of matrix T) is multiplied with matrix X to generate the initial value of the i-th left singular vector. The specific calculation formula is $a_i = X t_i^{T}$, where $X$ is the matrix to be decomposed, $t_i^{T}$ is the transpose of the i-th row of matrix $T$, and $a_i$ is the generated i-th left singular vector. This formula expresses a matrix-vector multiplication of the matrix $X$ with the vector $t_i^{T}$.

[0033] Step 2.2: Perform Gram-Schmidt orthogonalization on the i-th left singular vector. The Gram-Schmidt orthogonalization algorithm is applied to the i-th left singular vector $a_i$. The input to this algorithm is the vector $a_i$ and the previously obtained orthogonal vectors $a_j$ (where $j = 1, \ldots, i-1$). By calculating the inner product of $a_i$ with each $a_j$, the projection component of $a_i$ along each preceding vector direction is eliminated successively, and the output is a new vector that is orthogonal to all preceding vectors. Specifically, for $j$ looping from 1 to $i-1$, perform the following operation: $a_i \leftarrow a_i - \langle a_i, a_j \rangle\, a_j$, where $\langle a_i, a_j \rangle$ denotes the inner product of the vectors $a_i$ and $a_j$. After this orthogonalization process, the first i mutually orthogonal vectors are obtained for use in subsequent iterative calculations.

[0034] Step 2.3: Calculate the i-th right singular vector row. The orthogonalized i-th left singular vector $a_i$ is transposed and multiplied with matrix $X$ to generate the i-th right singular vector row. The specific calculation formula is $t_i = a_i^{T} X$, where $a_i^{T}$ is the transpose of the vector $a_i$ and $X$ is the matrix to be decomposed. This formula expresses a vector-matrix multiplication of the vector $a_i^{T}$ with the matrix $X$.

[0035] Step 2.4: Calculate the i-th singular value. The Frobenius norm of the i-th row vector of matrix $T$ is the i-th approximate singular value of matrix $X$. The specific calculation formula is $\sigma_i = \lVert t_i \rVert_F$, where $\lVert t_i \rVert_F$ denotes the Frobenius norm of the vector $t_i$ and $\sigma_i$ is the i-th approximate singular value of matrix $X$.

[0036] After the i-th iteration is complete, the i-th column of matrix A serves as the i-th approximate left singular vector of matrix X, and the Frobenius norm of the i-th row of matrix T is the i-th approximate singular value of matrix X. Once the loop reaches i = r, the first r approximate singular values of matrix X and their corresponding left singular vectors are obtained.
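Steps 2.1 to 2.4 can be put together in a short CPU-only sketch in plain Python. It is an illustration under stated assumptions, not the patented GPU implementation: the function name `gtsvd_sketch` and its parameters are hypothetical; the fixed number of repeated passes per singular value (`sweeps`) stands in for the "repeat" instruction of Example 1, whose stopping rule the source does not give; and the unit-length normalization of each column of A is an assumption added so that the row norm of T can equal a singular value.

```python
import math
import random

def matvec(M, v):
    """M @ v for a list-of-rows matrix M."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vecmat(v, M):
    """v^T @ M, returning a row vector of length n."""
    return [sum(v[k] * M[k][j] for k in range(len(M))) for j in range(len(M[0]))]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def gtsvd_sketch(X, r, sweeps=200, seed=0):
    """Approximate the r largest singular values of X.

    Returns (sigma, A, T) with A stored as a list of columns and T as a
    list of rows. Assumes r does not exceed the rank of X.
    """
    rng = random.Random(seed)
    n = len(X[0])
    A, T, sigma = [], [], []
    for _ in range(r):
        t = [rng.gauss(0.0, 1.0) for _ in range(n)]   # random row of T
        for _ in range(sweeps):                       # assumed fixed pass count
            a = matvec(X, t)                          # step 2.1: a_i = X t_i^T
            for q in A:                               # step 2.2: Gram-Schmidt
                c = sum(x * y for x, y in zip(a, q))
                a = [x - c * y for x, y in zip(a, q)]
            length = norm(a)
            a = [x / length for x in a]               # assumption: unit length
            t = vecmat(a, X)                          # step 2.3: t_i = a_i^T X
        A.append(a)
        T.append(t)
        sigma.append(norm(t))                         # step 2.4: sigma_i = ||t_i||
    return sigma, A, T

# A 4 x 3 matrix whose singular values are 3, 2 and 1 by construction.
X = [[3.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
sigma, A, T = gtsvd_sketch(X, r=2)
```

For this test matrix the two returned values approach 3 and 2. On a GPU, the two products per pass are the operations the text assigns to the cuBLAS routines.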

[0037] The iterative process consisting of steps 2.1 to 2.4 (hereinafter referred to as the GTSVD iterative algorithm) is an unconventional and innovative algorithm. Its core idea is to exploit the low-rank approximation property of matrix X and to improve computational efficiency by using basic operations such as matrix multiplication and vector inner product to calculate the approximate singular values one after another. First, instead of calculating all singular values of the matrix simultaneously, each iteration computes only one singular value and its corresponding left singular vector. Second, in practical applications (such as face recognition, background modeling, and video processing), the smaller singular values of most data matrices are close to 0, so computing only the first r larger singular values is sufficient to meet the application requirements, where r is typically about 10% of the matrix dimension. Furthermore, by calculating only the first r singular values and ignoring the subsequent smaller ones, the computational workload can be significantly reduced.
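As a rough illustration of the workload argument, the floating-point operation count of the iteration can be estimated. The figures below are a back-of-the-envelope sketch, not taken from the source: they assume X is m x n, that each singular value takes s passes through steps 2.1 to 2.3, and they ignore lower-order terms such as the final norm.

```latex
\begin{aligned}
\text{step 2.1 } (a_i = X t_i^{T}) &: \approx 2mn \\
\text{step 2.2 } (i-1 \text{ projections of length } m) &: \approx 4m(i-1) \\
\text{step 2.3 } (t_i = a_i^{T} X) &: \approx 2mn \\
\text{total for } i = 1,\dots,r \text{ with } s \text{ passes each} &: \approx s\,\bigl(4mnr + 2m\,r(r-1)\bigr)
\end{aligned}
```

For fixed r and s this grows in proportion to the number of matrix entries mn, whereas a full SVD of an m x n matrix with m at least n costs on the order of m n^2 operations.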

[0038] It should also be noted that this method only involves basic mathematical operations such as matrix multiplication and vector inner product during the calculation process, without relying on other complex matrix decomposition tools (such as QR decomposition, eigenvalue decomposition, etc.). This feature makes the algorithm highly parallelizable.

[0039] In this embodiment of the application, GPU parallel acceleration can be used to further improve the computational efficiency of the algorithm; When implementing the above steps on the GPU platform, the efficient matrix multiplication function `cublasSgemm` from NVIDIA's CUDA library is used to accelerate the matrix multiplication operations in steps 2.1 and 2.3. This function is a Level-3 parallelized matrix multiplication function with very high parallelism and computational throughput. For the Gram-Schmidt orthogonalization process in step 2.2, the `cublasDgemv` function from the CUDA library is used to accelerate vector-matrix multiplication operations. By using these efficient GPU library functions, the massively parallel computing capabilities of the GPU can be fully utilized, significantly accelerating the calculation speed of matrix multiplication and vector inner products.

[0040] Specifically, the data of the matrices X, A, and T is transferred from CPU main memory to GPU video memory, then the GPU kernel corresponding to the `cublasSgemm` function is called to perform matrix multiplication. Finally, the calculation result is transferred from GPU video memory back to CPU main memory. Compared with using only the CPU for computation, this GPU acceleration method yields a speedup of 3-10 times when processing large-scale high-dimensional matrices (matrix dimensions greater than or equal to 2000*2000).

[0041] The following is an application example of the present invention, as shown in Figures 5-9. The implementation process is as follows. Experimental environment: the hardware platform for this experiment was a DELL laptop with an i9-1290h CPU, 32GB of RAM, and an RTX 3090 graphics card. The software platform used in this experiment was Matlab 2024b (GTSVD) and VS2026 + CUDA 12.9 (GTSVD).

[0042] Experimental results: calculating the first t singular values of the matrix and their error precision. Experimental results based on over 1000 real-world image datasets demonstrate that the GTSVD algorithm can accurately calculate the first t singular values of a matrix and reconstruct the image, for example as shown in Figure 5.

[0043] Let t=50, calculate the first 50 singular values of image A using GTSVD, and compare the result with the singular values obtained from SVD. The result is shown in Figure 6.

[0044] It can be seen from Figure 6 that the first 50 singular values calculated by GTSVD are highly consistent with the first 50 singular values obtained by SVD. This demonstrates that the GTSVD algorithm can accurately calculate the first few larger singular values of the matrix.

[0045] Furthermore, the GTSVD algorithm can also reconstruct an approximation of the original image using the left singular vectors and singular values. Figures 7-9 present the original image and the approximate images reconstructed by SVD and GTSVD using their respective first 50 singular values. Figure 8 is the approximate image reconstructed from the first 50 singular values of the SVD, and Figure 9 is the approximate image reconstructed from the first 50 singular values of GTSVD. The reconstruction error between Figure 8 and Figure 9 is only [missing information].
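A reconstruction and its error can be computed directly from the stored columns of A and rows of T. The sketch below is illustrative only: the source does not say which error measure it reports, so the relative Frobenius error used here is an assumption, and the helper names `reconstruct` and `relative_frobenius_error` are hypothetical.

```python
import math

def reconstruct(A_cols, T_rows):
    """Return the sum over i of a_i t_i as an m x n list-of-rows matrix."""
    m, n = len(A_cols[0]), len(T_rows[0])
    R = [[0.0] * n for _ in range(m)]
    for a, t in zip(A_cols, T_rows):
        for p in range(m):
            for q in range(n):
                R[p][q] += a[p] * t[q]     # rank-one update a_i t_i
    return R

def relative_frobenius_error(X, R):
    """||X - R||_F / ||X||_F (assumed error measure)."""
    num = sum((x - y) ** 2 for rx, ry in zip(X, R) for x, y in zip(rx, ry))
    den = sum(x * x for rx in X for x in rx)
    return math.sqrt(num / den)

# A 2 x 2 matrix with singular values 3 and 0.1, approximated by rank one.
X = [[3.0, 0.0],
     [0.0, 0.1]]
A_cols = [[1.0, 0.0]]          # first left singular vector
T_rows = [[3.0, 0.0]]          # first row of T (singular value times v_1^T)
err = relative_frobenius_error(X, reconstruct(A_cols, T_rows))
```

In this toy case the discarded singular value 0.1 is all that remains in the residual, so `err` equals 0.1 divided by the Frobenius norm of X.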

[0046] As can be seen from Figures 7-9, the reconstruction results of SVD are basically consistent with those of GTSVD, and the differences between them and the original image are negligible. This demonstrates that the convergence accuracy of the GTSVD algorithm meets the requirements.

[0047] Speed comparison between the GTSVD and SVD algorithms: the image in Figure 6 was enlarged to obtain images of different dimensions, and the GPU-accelerated GTSVD algorithm was then compared with the SVD algorithm in MATLAB. The main results are shown in Table 3.

Table 3. Comparison of CPU time for GTSVD, SVD, and MATLAB GPU SVD algorithms.

Because the image matrix has a low-rank eigenstructure, GTSVD calculates only the largest 10% of the singular values of the matrix, i.e., t = m * 10%.

[0048] The experimental results in Table 3 show that the GTSVD algorithm proposed in this invention is faster than the SVD algorithm in MATLAB. The speedup is very significant when the dimension is greater than or equal to 2000*2000, with a speedup ratio between 3 and 10. GTSVD also outperforms the GPU version of SVD built into MATLAB, which shows only a slight speedup (a ratio of about 1.6) and only when the matrix dimension is greater than 5000*5000.

[0049] It is understood that data preprocessing methods known to those skilled in the art include data cleaning, data transformation, and data reduction. Data transformation includes type conversion and normalization and standardization. Although the dimensions and types of data were omitted in the description of the preceding embodiments, data preprocessing is a technical knowledge known to those skilled in the art and a prerequisite step in data processing. Therefore, the previously described well-known data preprocessing steps were not described independently.

[0050] The embodiments of the present invention have been described above. However, the embodiments are not limited to the specific implementation methods described above. The specific implementation methods described above are merely illustrative and not restrictive. Those skilled in the art can make more equivalent embodiments under the guidance of the present embodiments, and all of them are within the protection scope of the present embodiments.

Claims

1. A method for approximate singular value calculation of high-dimensional matrices based on GPU parallel acceleration, characterized in that it includes the following steps: obtain the high-dimensional real matrix X to be decomposed and the number r of singular values to be calculated, where r is greater than 0; randomly initialize column vector matrix A and row vector matrix T; perform the following operations iteratively, looping the iteration variable i from 1 to r: multiply the i-th row of matrix T with matrix X to generate the initial value of the i-th column vector of matrix A; perform Gram-Schmidt orthogonalization on the generated i-th column vector and the first i-1 column vectors of matrix A by successively subtracting the projection component of the column vector onto the directions of the preceding column vectors to achieve mutual orthogonality between the column vectors; multiply the orthogonalized i-th column vector with matrix X to generate the i-th row of matrix T; calculate the Frobenius norm of the i-th row vector of matrix T, whose value is the i-th approximate singular value of matrix X. After completing the loop iteration, the first r approximate singular values of matrix X and their corresponding left singular vectors are obtained. This method takes advantage of the low-rank approximation property of the data matrix. By only calculating the first r larger singular values and not calculating the subsequent smaller singular values, the computational complexity is reduced from the cubic order of traditional singular value decomposition to an order linearly related to the matrix dimension.

2. The method according to claim 1, characterized in that the Gram-Schmidt orthogonalization process includes iterating the iteration variable j from 1 to i-1, calculating the inner product of the i-th column vector and the j-th column vector of matrix A, and subtracting the product of the inner product and the j-th column vector from the i-th column vector. In this way, the projection components of the i-th column vector in the directions of the preceding column vectors are eliminated one by one.

3. The method according to claim 1, characterized in that the method further includes steps implemented on the GPU platform, using the cublasSgemm function from the CUDA library provided by NVIDIA to accelerate matrix multiplication operations and the cublasDgemv function to accelerate vector-matrix multiplication operations. By calling these efficient GPU library functions, the massive parallel computing capabilities of the GPU are fully utilized.

4. The method according to claim 3, characterized in that, when implemented on a GPU platform, the matrix data is transferred from the CPU main memory to the GPU video memory, the cublasSgemm function is called to perform matrix multiplication, and the result is transferred back to the CPU main memory after the calculation is completed. Compared with using only the CPU for calculation, this method achieves a significant speedup when processing large-scale matrices.

5. The method according to claim 1, characterized in that the random initialization includes filling matrices A and T with random numbers that follow a standard normal distribution to ensure the universality of the initial values for iteration and the stability of the iterative algorithm.

6. The method according to claim 1, characterized in that the matrix X is a large-scale or high-dimensional data matrix, and the method is applied to application scenarios that require matrix decomposition, such as face recognition, background modeling, video processing, and image restoration.

7. The method according to claim 1, characterized in that the calculation accuracy of the approximate singular values can reach the order of 0.001, which is highly consistent with the results of the standard singular value decomposition algorithm. It can be used to replace the standard singular value decomposition algorithm in various application scenarios.

8. A GPU-based parallel acceleration system for high-dimensional matrix approximate singular value computation, used to execute the method described in any one of claims 1 to 7, characterized in that it includes: a matrix acquisition module, used to obtain the high-dimensional matrix to be decomposed and the number of decompositions; an initialization module, used to randomly initialize the left and right singular vector matrices; an iterative calculation module, used to perform cyclic iterative calculations, including a left singular vector calculation unit, an orthogonalization processing unit, a right singular vector calculation unit, and a singular value calculation unit; and a GPU acceleration module, used to call corresponding parallel library functions on the GPU platform to accelerate matrix operations.