46 results about "Loop unrolling" patented technology

Loop unrolling, also known as loop unwinding, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size, which is an approach known as space–time tradeoff. The transformation can be undertaken manually by the programmer or by an optimizing compiler.
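As a minimal sketch of the transformation described above, the two C functions below compute the same array sum. The function names and the unroll factor of 4 are illustrative choices, not taken from any of the listed patents. The unrolled version performs four additions per loop test and uses a short cleanup loop for the leftover elements, which is where the extra code size comes from.

```c
#include <stddef.h>

/* Baseline: one addition and one loop test per element. */
long sum_rolled(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Unrolled by 4: four additions per loop test.
 * The second loop handles the n % 4 leftover elements. */
long sum_unrolled4(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++) {
        s += a[i];
    }
    return s;
}
```

Whether the unrolled form is actually faster depends on the compiler and target; many optimizing compilers already apply this transformation on their own.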

A GEMM (general matrix-matrix multiplication) high-performance realization method based on a domestic SW 26010 many-core CPU

Active | CN107168683A | Solve the problem that the computing power of slave cores cannot be fully utilized | Improve performance | Register arrangements | Concurrent instruction execution | Function optimization | Assembly line
The invention provides a high-performance implementation method for GEMM (general matrix-matrix multiplication) on the domestic SW 26010 many-core CPU. Based on the platform characteristics of the SW 26010 (its storage structure, memory access, hardware pipelines and register-level communication mechanism), the matrix partitioning and inter-core data mapping method is optimized and a top-down, three-level partitioned parallel block matrix multiplication algorithm is designed; a data sharing method for slave-core computing resources is designed on top of the register-level communication mechanism, and a double-buffering strategy that overlaps computation with memory access is designed using the asynchronous DMA data transfer mechanism between the master and slave cores; for a single slave core, a loop unrolling strategy and a software pipelining arrangement are designed; the function is optimized using an efficient register partitioning scheme and SIMD vectorized multiply-add instructions. Compared with the single-core open-source BLAS math library GotoBLAS, the high-performance GEMM achieves an average speed-up ratio of 227.94 and a highest speed-up ratio of 296.93.
Owner:INST OF SOFTWARE - CHINESE ACAD OF SCI +1
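The single-core part of the abstract above (loop unrolling plus register partitioning with multiply-add) can be illustrated with a generic sketch. This is not the patented SW 26010 kernel: the 2x2 tile size, the row-major layout, the function names and the assumption that n is even are all illustrative choices. The i and j loops are unrolled by 2 so that four accumulators stay in local variables, and each loaded element of A and B is reused twice.

```c
#include <stddef.h>

/* Reference triple loop: C += A * B, all n x n, row-major. */
void gemm_naive(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double acc = C[i * n + j];
            for (size_t k = 0; k < n; k++) {
                acc += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = acc;
        }
    }
}

/* i and j loops unrolled by 2 (a 2x2 register tile).
 * Assumption for this sketch: n is even. */
void gemm_unrolled_2x2(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i += 2) {
        for (size_t j = 0; j < n; j += 2) {
            /* Four accumulators that a compiler can keep in registers. */
            double c00 = C[i * n + j];
            double c01 = C[i * n + j + 1];
            double c10 = C[(i + 1) * n + j];
            double c11 = C[(i + 1) * n + j + 1];
            for (size_t k = 0; k < n; k++) {
                double a0 = A[i * n + k];
                double a1 = A[(i + 1) * n + k];
                double b0 = B[k * n + j];
                double b1 = B[k * n + j + 1];
                c00 += a0 * b0;
                c01 += a0 * b1;
                c10 += a1 * b0;
                c11 += a1 * b1;
            }
            C[i * n + j] = c00;
            C[i * n + j + 1] = c01;
            C[(i + 1) * n + j] = c10;
            C[(i + 1) * n + j + 1] = c11;
        }
    }
}
```

A production kernel would use a larger tile, SIMD instructions and edge handling for odd sizes; the sketch only shows how unrolling exposes register reuse.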

High-speed accurate single-pattern character string matching method

The invention provides a high-speed exact single-pattern string matching method comprising a preprocessing phase and a search phase. The preprocessing phase comprises three main steps: preprocessing the pattern, preprocessing the text, and selecting the optimal matching action according to the matching conditions. The search phase is the string matching process itself and comprises three main steps: the Scan Loop, the Match Loop and the subsequent decision action. The invention builds on the SBNDM2 algorithm, one of the fastest methods for matching in current English corpora, and makes the following improvements: reducing the overhead of index bounds checking by introducing an index bounds protection mechanism; simplifying the algorithm by modifying the definitions of the bitmasks and bit vectors; and, by extending the loop unrolling mechanism of SBNDM2, determining a method for selecting the optimal number of loop-unrolled characters for different pattern lengths and different corpora, which improves the matching performance of the algorithm under different matching conditions. The method of the invention is a high-speed, bit-parallel, exact single-pattern string matching method with high performance and a broad application range when the pattern length does not exceed the machine word length.
Owner:HARBIN ENG UNIV
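For orientation, here is a simplified sketch of an SBNDM2-style search, written from the general description of the algorithm rather than from the patent, so none of the patent's improvements (bounds protection, modified bitmasks, tuned unroll depth) are included. The function name and the return convention are assumptions. The point of interest is the first line of the loop body: the first two backward character reads are fused into one expression, which is the unrolled step that gives SBNDM2 its "2".

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first occurrence of pat in text, or n if none.
 * Sketch restrictions: 2 <= m <= 32 (pattern fits in one 32-bit word). */
size_t sbndm2_search(const char *text, size_t n, const char *pat, size_t m) {
    if (m < 2 || m > 32 || m > n) return n;

    /* B[c] has bit (m-1-j) set when pat[j] == c. */
    uint32_t B[256] = {0};
    for (size_t j = 0; j < m; j++) {
        B[(unsigned char)pat[j]] |= (uint32_t)1 << (m - 1 - j);
    }

    size_t pos = 0; /* start of the current window */
    while (pos + m <= n) {
        /* Unrolled step: test the last two window characters at once. */
        uint32_t D = (B[(unsigned char)text[pos + m - 1]] << 1)
                   & B[(unsigned char)text[pos + m - 2]];
        if (D == 0) {
            pos += m - 1; /* this 2-gram does not occur in the pattern */
            continue;
        }
        /* Keep reading leftwards while the suffix read so far
         * is still a substring of the pattern. */
        size_t j = m - 2;
        while (j > 0) {
            j--;
            D = (D << 1) & B[(unsigned char)text[pos + j]];
            if (D == 0) break;
        }
        if (D != 0) return pos; /* all m characters were consistent */
        pos += j + 1; /* no occurrence can start at or before pos + j */
    }
    return n;
}
```

For example, searching for "brown" in "the quick brown fox" returns index 10 under this convention.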

Runtime error analytical method based on abstract interpretation and model verification

The invention discloses a runtime error analysis method based on abstract interpretation and model checking. The method includes the following steps: on the basis of abstract interpretation theory, the value ranges of the program's numerical variables are analyzed with a forward iteration method, and the value range information is obtained once the program points are stable, with the iterative computation at loop nodes carried out by combining loop unrolling with delayed widening; depending on the type of runtime error to be analyzed, the value range information at the relevant program points is converted into assertions or assumptions and inserted into the program; the program with assertions or assumptions is converted into a Boolean formula containing constraints and properties; the correctness of the properties in the Boolean formula is decided by a SAT solver. If the properties hold, the relevant runtime errors do not exist; if not, the relevant runtime errors exist and the corresponding counterexample paths are output. The method strikes a balance between the precision and the efficiency of runtime error analysis.
Owner:China Aerospace Academy of Systems Science and Engineering (中国航天系统科学与工程研究院)
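The interaction between loop unrolling and delayed widening mentioned above can be shown on a toy case. The sketch below is an assumption-laden illustration, not the patented analyzer: it uses a plain interval domain, hard-codes the loop `i = 0; while (i < n) i = i + 1;`, and treats the hypothetical `delay` parameter as the number of precise (unrolled) iterations allowed before an unstable upper bound is widened to infinity, represented here by LONG_MAX.

```c
#include <limits.h>

/* Closed interval [lo, hi]; hi == LONG_MAX stands for +infinity. */
typedef struct { long lo, hi; } Interval;

/* Interval of i at the head of: i = 0; while (i < n) i = i + 1; */
Interval loop_head_interval(long n, int delay) {
    Interval x = {0, 0}; /* value of i on loop entry */
    for (int iter = 0; ; iter++) {
        /* Apply the guard i < n to the current interval. */
        long guard_hi = x.hi < n - 1 ? x.hi : n - 1;
        if (guard_hi < x.lo) return x; /* body unreachable: stable */

        /* Body adds 1; join the result with the current interval. */
        Interval next = x;
        if (guard_hi + 1 > next.hi) next.hi = guard_hi + 1;
        if (next.hi == x.hi) return x; /* fixpoint reached */

        /* Delayed widening: after `delay` precise iterations,
         * an upper bound that is still growing jumps to +infinity. */
        if (iter >= delay) next.hi = LONG_MAX;
        x = next;
    }
}

/* Interval of i after the loop: head interval restricted to i >= n. */
Interval exit_interval(Interval head, long n) {
    Interval e = head;
    if (e.lo < n) e.lo = n;
    return e;
}
```

With n = 5 and a delay of 10 the analysis converges precisely to [0, 5] at the loop head and [5, 5] at the exit; with a delay of 2 it terminates sooner but only proves [0, +infinity). That difference is the precision versus efficiency balance the abstract refers to.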

Method for optimizing finite difference algorithm in heterogeneous many-core framework

Inactive | CN106020773A | Implement and optimize parallel computing | Solve low computing performance | Register arrangements | Concurrent instruction execution | Extensibility | Analysis data
The invention belongs to the technical field of high-performance computing and relates to a method for optimizing a finite difference algorithm on a heterogeneous many-core architecture. The method optimizes the finite difference algorithm on a hybrid heterogeneous high-performance computer system built from many-core accelerators (MIC) and multi-core general-purpose processors (CPU) by using three progressive optimization methods: a basic optimization method, a parallel optimization method and a heterogeneous collaborative optimization method. The beneficial effects are as follows: the three progressive optimization methods solve the problems of low computing performance and poor parallel efficiency caused by strided memory access and a lack of parallel execution when the finite difference algorithm is ported from a many-core system to a heterogeneous many-core system; the method is efficient and scalable, and uses basic optimizations such as branch elimination, loop unrolling and invariant switching to reduce computational strength and clear obstacles to vectorization; in the parallel optimization method, data dependencies are analyzed and loops are partitioned, the core algorithm is rewritten with a vector instruction set, and the multi-threading and long-vector mechanisms of the many-core processor are fully utilized.
Owner:THE PLA INFORMATION ENG UNIV +2
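Two of the basic optimizations named above, branch elimination and loop unrolling, can be sketched on a one-dimensional second-difference stencil. The stencil, the zero boundary values, the function names and the unroll factor of 2 are assumptions made for illustration; the patent's actual kernels are not reproduced here. The second function peels the boundary cases out of the loop so the interior loop has no branch, then unrolls it.

```c
#include <stddef.h>

/* Baseline: boundary test inside the loop body. */
void second_diff_branchy(const double *u, double *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || i == n - 1) {
            out[i] = 0.0;
        } else {
            out[i] = u[i - 1] - 2.0 * u[i] + u[i + 1];
        }
    }
}

/* Boundaries handled up front (branch elimination),
 * interior loop over 1..n-2 unrolled by 2. */
void second_diff_unrolled(const double *u, double *out, size_t n) {
    if (n == 0) return;
    out[0] = 0.0;
    if (n == 1) return;
    out[n - 1] = 0.0;

    size_t i = 1;
    for (; i + 2 <= n - 1; i += 2) {
        out[i]     = u[i - 1] - 2.0 * u[i]     + u[i + 1];
        out[i + 1] = u[i]     - 2.0 * u[i + 1] + u[i + 2];
    }
    for (; i < n - 1; i++) { /* at most one leftover interior point */
        out[i] = u[i - 1] - 2.0 * u[i] + u[i + 1];
    }
}
```

A branch-free, unit-stride interior loop of this shape is generally easier for a compiler to vectorize, which matches the motivation given in the abstract.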

Loop-unrolled AES encryption/decryption circuit based on a data redundancy real-time error detection mechanism

The invention discloses an AES encryption/decryption circuit with a loop-unrolled structure based on a data redundancy real-time error detection mechanism. It is used to resist fault injection attacks or to improve circuit reliability in extreme application environments. The circuit comprises two parts, an AES encryption/decryption unit and a detection unit. The AES encryption/decryption unit adopts the loop-unrolled structure and is formed by Nk round transformation units and a two-way selector; the detection unit is composed of Nk comparators. During data processing, the AES encryption/decryption unit applies data redundancy: two adjacent round transformation units perform the same operation on each group of data twice, and the comparators in the detection unit compare the results of the two operations. If the results are the same, the AES encryption/decryption unit is working normally; if they differ, an error has occurred in the AES encryption/decryption unit. Compared with the conventional structural redundancy error detection mechanism, the data redundancy error detection mechanism can greatly reduce the circuit area.
Owner:NANJING UNIV OF AERONAUTICS & ASTRONAUTICS

Complex matrix optimizing method

The invention discloses a complex matrix optimizing method, characterized by comprising the following steps: first, calculating the unrolling granularity specific to the Godson architecture, carrying out four-by-four loop unrolling on the complex matrix, and selecting the maximum value as the matrix partitioning block size nb so as to obtain the optimal block size on the Godson, wherein the maximum value of nb is smaller than 52 and the product of 24 and the square of nb is smaller than the 64-kilobyte capacity of the first-level data cache of the Godson processor; second, dividing and combining the matrices in the matrix multiplication in a way that exploits the continuity and locality of data storage, which reduces the number of first-level data cache accesses on the Godson; and third, carrying out the common complex additions and multiplications of the complex matrix operation using the classic algorithm for multiplying two complex numbers so as to reduce the operation count. As a result, the performance of complex matrix multiplication on the Godson is improved by about 50%, and the operating rate of BLAS (basic linear algebra subprograms) based on the Godson 3A is increased by more than 1.5 times.
Owner:UNIV OF SCI & TECH OF CHINA
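The block-size constraint quoted above is simple arithmetic and can be checked directly. The helper below is a hypothetical illustration, not code from the patent: it searches for the largest nb that is a multiple of the unroll width and still satisfies factor * nb * nb < cache_bytes. Reading the abstract's constraint as 24 * nb * nb < 64 KB with a four-wide unroll is an assumption of this sketch.

```c
/* Largest block size nb that is a multiple of `unroll` and satisfies
 * factor * nb * nb < cache_bytes. Returns 0 if no such nb exists. */
unsigned max_block_size(unsigned long cache_bytes,
                        unsigned long factor,
                        unsigned unroll) {
    unsigned nb = 0;
    for (;;) {
        unsigned long cand = (unsigned long)nb + unroll;
        if (factor * cand * cand >= cache_bytes) break;
        nb += unroll;
    }
    return nb;
}
```

Calling `max_block_size(64UL * 1024UL, 24UL, 4U)` gives 52, since 24 * 52 * 52 = 64896 bytes fits in 65536 bytes while the next multiple of four, 56, does not. This lines up with the figure of 52 quoted in the abstract.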

Data processing method, device and equipment and computer storage medium

The invention provides a data processing method, apparatus and device, and a computer storage medium. The method comprises the steps of: obtaining an intermediate representation of a deep learning model; determining a loop unrolling factor, wherein the loop unrolling factor is related to information about the execution of the intermediate representation on the back-end hardware device and/or to device information of the back-end hardware device; performing loop unrolling on the intermediate representation according to the loop unrolling factor to obtain an optimized intermediate representation; and compiling the optimized intermediate representation to obtain target code that can be executed by the back-end hardware device, so that the back-end hardware device executes the target code to realize its function. With the embodiments of the invention, the loop unrolling factor can be calculated from the execution information and/or the device information of the back-end hardware device, which yields a more accurate loop unrolling factor; unrolling the intermediate representation with this factor allows instruction scheduling over a larger range and improves the portability of the intermediate representation.
Owner:PHYTIUM TECH CO LTD
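The abstract does not say how the factor is computed from the hardware information, so the following is only a hypothetical heuristic showing the kind of calculation involved. All parameter names and the rule itself (largest power of two that fits the register budget, the trip count and a cap) are assumptions, not the patented method.

```c
/* Hypothetical heuristic: pick the largest power-of-two unroll factor
 * such that the unrolled body still fits the register budget, does not
 * exceed the loop trip count, and stays within max_factor. */
unsigned choose_unroll_factor(unsigned trip_count,
                              unsigned regs_available,
                              unsigned regs_per_iter,
                              unsigned max_factor) {
    unsigned f = 1;
    if (regs_per_iter == 0) regs_per_iter = 1; /* avoid a zero cost */
    while (f * 2 <= max_factor &&
           f * 2 <= trip_count &&
           f * 2 * regs_per_iter <= regs_available) {
        f *= 2;
    }
    return f;
}
```

For a loop of 100 iterations whose body needs 6 registers on a device with 32 available registers and a cap of 8, this returns 4: eight copies would need 48 registers, which exceeds the budget.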
