A method for accelerating the MD5 message digest algorithm, and an instruction set processor

By implementing parallel acceleration of the MD5 message digest algorithm using MD5 round function iteration instructions in a RISC architecture, the problems of low acceleration efficiency and poor flexibility in existing technologies are solved, achieving efficient execution of the MD5 message digest algorithm and simplifying software programs.

CN115525341B · Active · Publication Date: 2026-06-16 · SHANGHAI HIGH-PERFORMANCE INTEGRATED CIRCUIT DESIGN CENT

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHANGHAI HIGH-PERFORMANCE INTEGRATED CIRCUIT DESIGN CENT
Filing Date
2022-10-19
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing technologies struggle to efficiently accelerate the MD5 message digest algorithm, especially as data volume increases. Software implementations are insufficient to meet practical application needs, while existing hardware acceleration methods are costly, inflexible, and lack scalability.

Method used

The MD5 round function iteration instruction based on RISC architecture is adopted to accelerate the parallel execution of two data-independent MD5 message digest algorithms through pipelining. The MD5 message processing round function iteration is completed using fixed-length 32-bit instruction format, and intermediate iteration variables, message words and round iteration numbers are processed in parallel, simplifying the software program.

Benefits of technology

It significantly accelerates the execution speed of the MD5 message digest algorithm, simplifies the software program, and offers design flexibility and scalability, making it suitable for RISC processors and dedicated cryptographic chips.

✦ Generated by Eureka AI based on patent content.

Figure CN115525341B_ABST

Abstract

The application relates to a method for accelerating the MD5 message digest algorithm and to an instruction set processor. The method is based on an MD5 round function iteration instruction and achieves parallel acceleration of two unrelated MD5 message digest computations by pipelined execution of that instruction. The MD5 round function iteration instruction adopts a RISC architecture and executes any single message processing round function iteration of the MD5 message digest algorithm according to its source operands. The instruction uses an MD5 round function parallel algorithm: it takes as inputs the intermediate iteration variables, message words, and round iteration numbers from the 16-word message block processing of two unrelated MD5 message digest computations, completes the MD5 round function iteration for each of the two sets of data in parallel, and outputs the results in a specified format. The instruction set processor supports pipelined execution of the MD5 round function iteration instruction.

Description

Technical Field

[0001] This invention relates to the fields of processor design and information security technology, and in particular to an acceleration method for the MD5 message digest algorithm and an instruction set processor.

Background Technology

[0002] With the rapid development of information technology, information security has become increasingly important. Cryptography and related technologies are crucial for information security. Hash cryptography algorithms, represented by MD and SHA algorithms, are important and widely used in information security technology. How to efficiently implement hash cryptography algorithms has become a research hotspot. The MD5 message digest algorithm is a typical hash cryptography algorithm, the fifth version of the MD algorithm (abbreviated as MD5). Designed by Ronald Linn Rivest in 1992, it is specified in RFC 1321. The MD5 message digest algorithm can generate a 128-bit (16-byte) message digest hash value for messages of any length. The MD5 message digest algorithm features good compression, fixed length, irreversibility, high discreteness, and collision resistance, and is widely used in message integrity verification, digital signatures, and network communication security.

[0003] The MD5 message digest algorithm calculation process includes five steps: append padding bits, append length, initialize the MD buffer, process the message in 16-word blocks (512 bits), and output a 128-bit message digest value.

[0004] 1) Bit padding (bit stuffing) extends the input message M so that the length $L_M$ (in bits) of the padded message satisfies $L_M \bmod 512 = 448$, that is, the message length becomes a multiple of 512 minus 64. Padding is applied even if the original message length already meets this requirement. The padding consists of a single "1" bit appended to the message M, followed by as many "0" bits as needed for the padded length to satisfy $L_M \bmod 512 = 448$. At least 1 and at most 512 padding bits are added.

[0005] 2) Length padding represents the length of the message before padding as a 64-bit number and appends this number to the result of the previous step (the bit-padded message). If the length of the message before bit padding is greater than $2^{64}$, only its low-order 64 bits are used. After bit padding and length padding, the message length is an exact multiple of 512 bits, and therefore an integer multiple of 16 words (32 bits each). Let $M_j$ ($j \in \{0, 1, 2, \ldots, N-1\}$) denote the words of the message after bit padding and length padding, where N is an integer multiple of 16.
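The two padding steps above can be sketched in Python. This is a minimal illustration rather than anything taken from the patent: it assumes the message is a whole number of bytes, and the names `md5_pad` and `words` are hypothetical.

```python
import struct


def md5_pad(message: bytes) -> bytes:
    """Append a 1 bit, zero bits up to 448 mod 512, then the 64-bit length."""
    bit_len = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF   # keep only the low 64 bits
    padded = message + b"\x80"                           # the "1" bit plus seven "0" bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)   # zero-fill to 56 bytes mod 64
    return padded + struct.pack("<Q", bit_len)           # low-order byte first


def words(padded: bytes) -> list:
    """Split the padded message into 32-bit little-endian words M_j."""
    return list(struct.unpack(f"<{len(padded) // 4}I", padded))


padded = md5_pad(b"abc")
print(len(padded), [hex(w) for w in words(padded)[:2]])
```

Appending `0x80` unconditionally is what makes a message whose length is already 448 mod 512 grow by a full 512 bits of padding, matching the rule that padding is always applied.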

[0006] 3) Initializing the buffer assigns initial values to the four 32-bit registers (A, B, C, D) that hold the intermediate iteration results of the MD5 message digest algorithm. The initial values, written as hexadecimal bytes with the low-order byte first and the high-order byte last, are: A = 01 23 45 67, B = 89 ab cd ef, C = fe dc ba 98, D = 76 54 32 10. Read as 32-bit words, these are A = 67452301, B = efcdab89, C = 98badcfe, D = 10325476.
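The relationship between the byte-wise listing and the 32-bit word values can be checked with a few lines of Python. The dictionary names below are illustrative only.

```python
# Initial buffer bytes, low-order byte first, as listed in the paragraph above.
INIT_BYTES = {
    "A": "01 23 45 67",
    "B": "89 ab cd ef",
    "C": "fe dc ba 98",
    "D": "76 54 32 10",
}

# Interpret each 4-byte group as a little-endian 32-bit word.
INIT_WORDS = {
    name: int.from_bytes(bytes.fromhex(hex_bytes), "little")
    for name, hex_bytes in INIT_BYTES.items()
}

for name, value in INIT_WORDS.items():
    print(f"{name} = 0x{value:08x}")
```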

[0007] 4) Processing a 16-word (512-bit) message block amounts to applying a compression function to the block: four rounds of operations, executed in sequence, compress the 16-word (512-bit) message block into four 32-bit working variables. Rounds 1 to 4 use the following non-linear round functions, respectively:

F(b, c, d) = (b & c) | ((~b) & d)
G(b, c, d) = (b & d) | (c & (~d))
H(b, c, d) = b ⊕ c ⊕ d
I(b, c, d) = c ⊕ (b | (~d))

Each round performs 16 iterations in sequence using its own round function. Let the message word $M_j$ be the j-th (j ∈ [0, 15]) 32-bit word of the 16-word (512-bit) message block, let the iteration constant T[i] be the integer part of $2^{32} \cdot |\sin(i)|$ with i in radians, and let <<< s denote a cyclic left shift by s bits. When processing each message block, first copy the iteration working variables (A, B, C, D) to (a, b, c, d). In the lists below, an entry such as FF(DABC, 1, 12, 2) means that the roles (a, b, c, d) are taken by (D, A, B, C), with j = 1, s = 12, and i = 2.

The iteration function of the first round is FF(a, b, c, d, j, s, i): $a = b + ((a + F(b, c, d) + M_j + T[i]) \lll s)$, where + denotes addition modulo $2^{32}$ (the same below). The following 16 iterations are performed in sequence: FF(ABCD, 0, 7, 1), FF(DABC, 1, 12, 2), FF(CDAB, 2, 17, 3), FF(BCDA, 3, 22, 4), FF(ABCD, 4, 7, 5), FF(DABC, 5, 12, 6), FF(CDAB, 6, 17, 7), FF(BCDA, 7, 22, 8), FF(ABCD, 8, 7, 9), FF(DABC, 9, 12, 10), FF(CDAB, 10, 17, 11), FF(BCDA, 11, 22, 12), FF(ABCD, 12, 7, 13), FF(DABC, 13, 12, 14), FF(CDAB, 14, 17, 15), FF(BCDA, 15, 22, 16).

The iteration function of the second round is GG(a, b, c, d, j, s, i): $a = b + ((a + G(b, c, d) + M_j + T[i]) \lll s)$. The following 16 iterations are performed in sequence: GG(ABCD, 1, 5, 17), GG(DABC, 6, 9, 18), GG(CDAB, 11, 14, 19), GG(BCDA, 0, 20, 20), GG(ABCD, 5, 5, 21), GG(DABC, 10, 9, 22), GG(CDAB, 15, 14, 23), GG(BCDA, 4, 20, 24), GG(ABCD, 9, 5, 25), GG(DABC, 14, 9, 26), GG(CDAB, 3, 14, 27), GG(BCDA, 8, 20, 28), GG(ABCD, 13, 5, 29), GG(DABC, 2, 9, 30), GG(CDAB, 7, 14, 31), GG(BCDA, 12, 20, 32).

The iteration function of the third round is HH(a, b, c, d, j, s, i): $a = b + ((a + H(b, c, d) + M_j + T[i]) \lll s)$. The following 16 iterations are performed in sequence: HH(ABCD, 5, 4, 33), HH(DABC, 8, 11, 34), HH(CDAB, 11, 16, 35), HH(BCDA, 14, 23, 36), HH(ABCD, 1, 4, 37), HH(DABC, 4, 11, 38), HH(CDAB, 7, 16, 39), HH(BCDA, 10, 23, 40), HH(ABCD, 13, 4, 41), HH(DABC, 0, 11, 42), HH(CDAB, 3, 16, 43), HH(BCDA, 6, 23, 44), HH(ABCD, 9, 4, 45), HH(DABC, 12, 11, 46), HH(CDAB, 15, 16, 47), HH(BCDA, 2, 23, 48).

The iteration function of the fourth round is II(a, b, c, d, j, s, i): $a = b + ((a + I(b, c, d) + M_j + T[i]) \lll s)$. The following 16 iterations are performed in sequence: II(ABCD, 0, 6, 49), II(DABC, 7, 10, 50), II(CDAB, 14, 15, 51), II(BCDA, 5, 21, 52), II(ABCD, 12, 6, 53), II(DABC, 3, 10, 54), II(CDAB, 10, 15, 55), II(BCDA, 1, 21, 56), II(ABCD, 8, 6, 57), II(DABC, 15, 10, 58), II(CDAB, 6, 15, 59), II(BCDA, 13, 21, 60), II(ABCD, 4, 6, 61), II(DABC, 11, 10, 62), II(CDAB, 2, 15, 63), II(BCDA, 9, 21, 64).

After the four rounds are complete, the results a, b, c, d are added to the values A, B, C, D held before the block was processed (A = A + a, B = B + b, C = C + c, D = D + d), and the sums serve as the initial values for the next message block, until the MD5 round function iteration processing of all 16-word (512-bit) message blocks is complete.

The final values are then concatenated and output as the 128-bit message digest value (A, B, C, D), where A is at the low-order byte end and D is at the high-order byte end.
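The four rounds described above can be written as a compact software reference. The sketch below is an illustration, not the patent's implementation: the function names are hypothetical, the closed-form `msg_index` simply reproduces the j values listed for each round, and `md5_short` only handles messages that pad to a single block.

```python
import math
import struct

MASK = 0xFFFFFFFF
# T[i - 1] holds the constant T[i] = integer part of 2^32 * |sin(i)|, i = 1..64.
T = [int(abs(math.sin(i)) * 2**32) & MASK for i in range(1, 65)]
# Shift amounts s for the 64 iterations, taken from the four lists above.
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4


def rotl(x, s):
    """Cyclic left shift of a 32-bit value by s bits."""
    return ((x << s) | (x >> (32 - s))) & MASK


def msg_index(i):
    """Message word index j for 0-based iteration i."""
    return (i, (5 * i + 1) % 16, (3 * i + 5) % 16, (7 * i) % 16)[i // 16]


def compress(state, block):
    """Run the 64 iterations on one 64-byte block and add the result to state."""
    m = struct.unpack("<16I", block)
    a, b, c, d = state
    for i in range(64):
        f = ((b & c) | (~b & d),               # F, round 1
             (b & d) | (c & ~d),               # G, round 2
             b ^ c ^ d,                        # H, round 3
             c ^ (b | (~d & MASK)))[i // 16]   # I, round 4
        new_b = (b + rotl((a + f + m[msg_index(i)] + T[i]) & MASK, S[i])) & MASK
        a, b, c, d = d, new_b, b, c            # rotate roles: ABCD, DABC, CDAB, BCDA
    return tuple((x + y) & MASK for x, y in zip(state, (a, b, c, d)))


def md5_short(message):
    """Digest a message of at most 55 bytes, which pads to exactly one block."""
    if len(message) > 55:
        raise ValueError("this sketch handles single-block messages only")
    block = message + b"\x80" + b"\x00" * (55 - len(message))
    block += struct.pack("<Q", 8 * len(message))
    state = compress((0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476), block)
    return struct.pack("<4I", *state).hex()


print(md5_short(b"abc"))
```

The printed digest can be compared with `hashlib.md5(b"abc").hexdigest()` from the Python standard library as an independent check.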

[0008] While the MD5 message digest algorithm can be implemented in software, its high computational complexity requires significant computing resources, and the time required to obtain the digest value increases rapidly with message length. As data volumes and application demands grow, software implementations based on general-purpose instructions are increasingly insufficient for practical applications. Hardware implementations are therefore needed to further improve the performance of the MD5 message digest algorithm. Currently, dedicated hardware such as FPGAs, ASICs, and GPUs is commonly used to accelerate MD5 implementations. These methods offer high acceleration efficiency but suffer from high cost, limited design flexibility, and poor versatility and scalability. By contrast, using instruction set architecture (ISA) extensions to accelerate MD5 offers high efficiency together with design flexibility and scalability, effectively improving the performance of MD5 message digest implementations on RISC processors.

Summary of the Invention

[0009] The technical problem to be solved by the present invention is to provide an acceleration method and instruction set processor for the MD5 message digest algorithm, which can improve the execution speed of the MD5 message digest algorithm and simplify the software program.

[0010] The technical solution adopted by this invention to solve its technical problem is as follows: A method for accelerating the MD5 message digest algorithm is provided. Based on the MD5 round function iteration instruction, the parallel acceleration of two unrelated MD5 message digest algorithms is achieved through pipelined execution of the MD5 round function iteration instruction. The MD5 round function iteration instruction adopts a RISC architecture, uses a fixed-length 32-bit format, has three 256-bit source operands and one 256-bit target operand, and is used to execute any message processing round function iteration in the MD5 message digest algorithm according to the source operands. The MD5 round function iteration instruction adopts an MD5 round function parallel algorithm, which refers to an algorithm that takes the intermediate iteration variables, message words, and round iteration numbers in the 16-word message grouping process of two unrelated MD5 message digest algorithms as input, completes the MD5 round function iteration of each of the two sets of data in parallel, and outputs the iteration results in a specified format.

[0011] The parallel acceleration of two unrelated MD5 message digest algorithms by executing the MD5 round function iteration instructions in a pipelined manner specifically includes the following steps:

[0012] (1) Load the initial values of the intermediate iteration variables of the MD5 message digest algorithm into the register file in the format {D1,C1,B1,A1,D0,C0,B0,A0}. Then load the message words of the two unrelated messages, after bit padding and length padding, into the register file in the format {192'b0, $W_{1,j}$, $W_{0,j}$}, and load the two unrelated round iteration numbers into the register file in the format {192'b0,Row1,Row0};

[0013] (2) Using the initial values of the iteration variables {D1,C1,B1,A1,D0,C0,B0,A0}, the message words {192'b0, $W_{1,j}$, $W_{0,j}$} of the two unrelated messages, and the two unrelated round iteration numbers {192'b0,Row1,Row0} in the register file as the source operands, execute the first MD5 round function iteration instruction, generating the MD5 intermediate iteration variables $\{C_1^{(1)}, B_1^{(1)}, {A'}_1^{(1)}, D_1^{(1)}, C_0^{(1)}, B_0^{(1)}, {A'}_0^{(1)}, D_0^{(1)}\}$;

[0014] (3) Execute the subsequent MD5 round function iteration instructions in sequence. Each time, the result of the previous MD5 round function iteration instruction, $\{C_1^{(i-1)}, B_1^{(i-1)}, {A'}_1^{(i-1)}, D_1^{(i-1)}, C_0^{(i-1)}, B_0^{(i-1)}, {A'}_0^{(i-1)}, D_0^{(i-1)}\}$, becomes source operand A of the next instruction, and the data in registers Vb and Vc, updated with the new message words and round iteration numbers, is read from the register file as the new source operands B and C. This completes round function iterations 2 to 64 for the 16-word message blocks of the two unrelated MD5 message digest computations, finally yielding the MD5 intermediate iteration variables $\{C_1^{(64)}, B_1^{(64)}, {A'}_1^{(64)}, D_1^{(64)}, C_0^{(64)}, B_0^{(64)}, {A'}_0^{(64)}, D_0^{(64)}\}$;

[0015] (4) Sum the MD5 intermediate iteration variables in parallel to obtain the MD5 intermediate iteration variables $\{D_1 + D_1^{(64)}, C_1 + C_1^{(64)}, B_1 + B_1^{(64)}, A_1 + {A'}_1^{(64)}, D_0 + D_0^{(64)}, C_0 + C_0^{(64)}, B_0 + B_0^{(64)}, A_0 + {A'}_0^{(64)}\}$;

[0016] (5) If the input message contains further unprocessed 16-word message blocks, the MD5 intermediate iteration variables obtained in step (4) are used as the initial iteration values for processing the next 16-word message block. Steps (2) to (4) are repeated until the last 16-word message block of the message has been processed, yielding the execution result {D1,C1,B1,A1,D0,C0,B0,A0};

[0017] (6) Output the execution result {D1,C1,B1,A1,D0,C0,B0,A0} to complete the execution of the MD5 message digest algorithm for the two sets of unrelated data and obtain the digest values of the two unrelated messages.

[0018] The instruction format of the MD5 round function iteration instruction is MD5R Va,Vb,Vc,Vd, which is used to indicate the operation of three source operands in three 256-bit registers Va, Vb, and Vc. The result is stored in a 256-bit destination register Vd. In the 32-bit instruction, bits [31:26] represent the 6-bit opcode, bits [25:21] represent the address of a 256-bit register Va selected from a set of 32 256-bit registers, bits [20:16] represent the address of a 256-bit register Vb selected from a set of 32 256-bit registers, bits [15:10] represent the 6-bit function code used to determine the specific instruction function, bits [9:5] represent the address of a 256-bit register Vc selected from a set of 32 256-bit registers, and bits [4:0] represent the address of a 256-bit destination register Vd selected from a set of 32 256-bit registers.
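The bit layout above can be illustrated with a small encoder and decoder. This is a sketch under stated assumptions: the text does not give the actual opcode or function code values, so `HYPOTHETICAL_OPCODE` and `HYPOTHETICAL_FUNC` are placeholders, and the function names are invented for illustration.

```python
# Field layout from the paragraph above: (name, lowest bit, width in bits).
FIELDS = (("opcode", 26, 6), ("va", 21, 5), ("vb", 16, 5),
          ("func", 10, 6), ("vc", 5, 5), ("vd", 0, 5))

# Placeholder values: the real opcode and function code are not stated in the text.
HYPOTHETICAL_OPCODE = 0b011010
HYPOTHETICAL_FUNC = 0b101011


def encode_md5r(va, vb, vc, vd, opcode=HYPOTHETICAL_OPCODE, func=HYPOTHETICAL_FUNC):
    """Pack the six fields into a 32-bit instruction word."""
    values = {"opcode": opcode, "va": va, "vb": vb, "func": func, "vc": vc, "vd": vd}
    word = 0
    for name, low, width in FIELDS:
        if not 0 <= values[name] < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= values[name] << low
    return word


def decode_md5r(word):
    """Unpack a 32-bit instruction word into its six fields."""
    return {name: (word >> low) & ((1 << width) - 1) for name, low, width in FIELDS}


word = encode_md5r(va=1, vb=2, vc=3, vd=4)
print(f"{word:032b}", decode_md5r(word))
```

The six field widths (6 + 5 + 5 + 6 + 5 + 5) add up to exactly 32 bits, and each 5-bit register field can address any of the 32 registers.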

[0019] The MD5 round function iteration instruction operates as follows. Based on the intermediate iteration variables {C1,B1,A1,D1,C0,B0,A0,D0} of the MD5 message digest algorithm for two sets of unrelated data from register Va, the two unrelated message words {M1,M0} from register Vb, and the two unrelated round iteration numbers {Row1,Row0} from register Vc, the instruction executes any single MD5 message processing round function iteration of the MD5 message digest algorithm, completing one MD5 round function iteration in the 16-word message block processing, and stores the execution result {C1,B1,A1',D1,C0,B0,A0',D0} in the destination register Vd. Row1 and Row0 take valid values from 1 to 64. For i equal to 0 and 1, ${A'}_i$ is generated as ${A'}_i = B_i + ((A_i + \mathrm{Temp}_i + W_i + \mathrm{TRow}_i) \lll \mathrm{SRow}_i)$, where "+" denotes addition modulo $2^{32}$, $\lll$ denotes a circular left shift, and $W_i$ is the message word of data set i. $\mathrm{Temp}_i$ is a 32-bit intermediate variable whose round function is selected by the round iteration number $\mathrm{Row}_i$: when $0 < \mathrm{Row}_i \le 16$, $\mathrm{Temp}_i[31:0] = F(B_i, C_i, D_i)$; when $16 < \mathrm{Row}_i \le 32$, $\mathrm{Temp}_i[31:0] = G(B_i, C_i, D_i)$; when $32 < \mathrm{Row}_i \le 48$, $\mathrm{Temp}_i[31:0] = H(B_i, C_i, D_i)$; and when $48 < \mathrm{Row}_i \le 64$, $\mathrm{Temp}_i[31:0] = I(B_i, C_i, D_i)$, where F, G, H, and I are the round functions specified by the MD5 message digest algorithm. $\mathrm{TRow}_i$ is a 32-bit round iteration constant determined by table lookup from the MD5 round iteration number $\mathrm{Row}_i$. $\mathrm{SRow}_i$ is a 32-bit round iteration shift constant, likewise determined by table lookup from $\mathrm{Row}_i$. During any iteration of the MD5 message processing round function, hardware logic implements the round function processing, the table lookup operations, and the shift operation. Executing the MD5 round function iteration instruction once performs one MD5 round function iteration for each of two unrelated MD5 message digest computations, and executing it 64 times consecutively completes the MD5 round function iteration processing of one 16-word message block in each of the two unrelated MD5 message digest computations.
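The per-data-set computation of A' described in this paragraph can be modeled in software. The sketch below is illustrative only: it assumes the lookup tables hold the standard MD5 constants (T[i] as the integer part of 2^32·|sin(i)| and the shift amounts listed in the background section), and the names `t_row`, `s_row`, `temp`, `md5r_lane`, and `md5r_pair` are hypothetical.

```python
import math

MASK = 0xFFFFFFFF
SHIFTS = ((7, 12, 17, 22), (5, 9, 14, 20), (4, 11, 16, 23), (6, 10, 15, 21))


def t_row(row):
    """Assumed TRow lookup: integer part of 2^32 * |sin(Row)|."""
    return int(abs(math.sin(row)) * 2**32) & MASK


def s_row(row):
    """Assumed SRow lookup: the MD5 shift amount for iteration Row."""
    return SHIFTS[(row - 1) // 16][(row - 1) % 4]


def temp(row, b, c, d):
    """Select F, G, H, or I according to the round iteration number."""
    if not 1 <= row <= 64:
        raise ValueError("Row must be in 1..64")
    if row <= 16:
        return (b & c) | (~b & d)
    if row <= 32:
        return (b & d) | (c & ~d)
    if row <= 48:
        return b ^ c ^ d
    return c ^ (b | (~d & MASK))


def md5r_lane(a, b, c, d, w, row):
    """A' = B + ((A + Temp + W + TRow) <<< SRow), all additions modulo 2^32."""
    x = (a + temp(row, b, c, d) + w + t_row(row)) & MASK
    s = s_row(row)
    return (b + (((x << s) | (x >> (32 - s))) & MASK)) & MASK


def md5r_pair(lane0, lane1, words, rows):
    """Apply one iteration to two independent data sets; returns (A'0, A'1)."""
    return (md5r_lane(*lane0, words[0], rows[0]),
            md5r_lane(*lane1, words[1], rows[1]))


init = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)  # A, B, C, D
print([hex(v) for v in md5r_pair(init, init, (0, 1), (1, 17))])
```

Because each data set carries its own round iteration number, the two data sets may be at different iterations in the same instruction, as in the usage line where one uses Row 1 and the other Row 17.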

[0020] During any iteration of the MD5 message processing round function in the execution of the MD5 message digest algorithm, temporary parameters are obtained using the round function based on the intermediate iteration variables and the round iteration sequence number; the round iteration constant and the round iteration cyclic shift constant are obtained by looking up a table based on the round iteration sequence number; and the intermediate iteration variables are updated according to the temporary parameters, message words, intermediate iteration variables, round iteration constant, and round iteration cyclic shift constant.

[0021] The technical solution adopted by this invention to solve its technical problem is as follows: An instruction set processor is provided, including a register file and an MD5 round function iteration instruction execution unit. The register file is used to store source operands A, B, and C. The MD5 round function iteration instruction execution unit is used to receive and execute MD5 round function iteration instructions. The input signals of the MD5 round function iteration instruction execution unit include a 256-bit source operand A, a 256-bit source operand B, and a 256-bit source operand C. The output signal is a 256-bit execution result {C1,B1,A1',D1,C0,B0,A0',D0}. The MD5 round function iteration instruction execution unit implements round function processing, table lookup operations, and shift operations through hardware logic.

[0022] The MD5 round function iteration instruction execution unit includes: a round function module for obtaining temporary parameters based on intermediate iteration variables and round iteration numbers using a round function; a first lookup table module for obtaining round iteration constants based on round iteration numbers by looking up a table; a second lookup table module for obtaining round iteration cyclic shift constants based on round iteration numbers by looking up a table; and a logic operation module for performing logic operations based on temporary parameters, message words, intermediate iteration variables, round iteration constants, and round iteration cyclic shift constants to update the intermediate iteration variables.

[0023] The delay of the MD5 round function iteration instruction is 1 clock cycle. The output of the MD5 round function iteration instruction execution unit can be bypassed to the input of the MD5 round function iteration instruction execution unit and used as the source operand A of the next MD5 round function iteration instruction. The instruction set processor supports the pipelined execution of the MD5 round function iteration instruction.

[0024] Beneficial effects

[0025] By adopting the above-mentioned technical solution, the present invention has the following advantages and positive effects compared with the prior art:

[0026] This invention enables parallel processing of round functions of two unrelated MD5 message digest algorithms; a single VMD5R instruction can complete any round iteration in the MD5 message digest algorithm, and by executing the VMD5R instruction multiple times consecutively, the processing of the 16-word (512-bit) message block, which has the largest computational load in the MD5 message digest algorithm, can be completed. This significantly accelerates the execution speed of the MD5 message digest algorithm and simplifies the software program of the MD5 message digest algorithm, which is beneficial for algorithm writing and reduces the storage overhead of the algorithm.

[0027] This invention employs pipeline technology to fully realize the parallel potential of 16-word (512-bit) message packet processing in the MD5 message digest algorithm. By sequentially pipelined execution of 64 VMD5R instructions, the round function iterative processing of a group of 16-word (512-bit) message packets in two sets of unrelated MD5 message digest algorithms can be completed, significantly accelerating the execution speed of the MD5 message digest algorithm.

[0028] In this invention, the execution latency of the processor's VMD5R instruction is 1 clock cycle, and the processor supports pipelined execution of VMD5R instructions: the execution result of the preceding VMD5R instruction can be bypassed and used as an input operand of the following VMD5R instruction. The processor improves execution speed by optimizing the internal computational steps of the MD5 message block processing round function iteration, implementing the F / G / H / I round function processing with dedicated hardware logic, and using lookup tables and hardware logic to implement the shift operations in the algorithm. With this processor, the round function iteration processing of two mutually independent 16-word (512-bit) message blocks of the MD5 message digest algorithm can be completed in as few as 64 clock cycles, significantly accelerating the execution of the MD5 message digest algorithm.

[0029] This invention fully explores and realizes the parallel potential of the round function iteration in the message block processing of the MD5 message digest algorithm, effectively accelerating the execution of the MD5 message digest algorithm. The VMD5R instruction, method, and execution unit for accelerating the MD5 message digest algorithm provided by this invention are easy to port and scale well. They are easily integrated into or connected to the execution flow of a RISC processor and can be applied to RISC processors or dedicated cryptographic chips to improve their performance in executing the MD5 message digest algorithm.

Brief Description of the Drawings

[0030] Figure 1 is a flowchart of the VMD5R instruction execution process;

[0031] Figure 2 is a flowchart of the MD5 round function parallel algorithm;

[0032] Figure 3 is a flowchart of the process for accelerating the MD5 message digest algorithm;

[0033] Figure 4 is a schematic diagram of the instruction format of the VMD5R instruction;

[0034] Figure 5 is a schematic diagram of a processor that accelerates the MD5 message digest algorithm;

[0035] Figure 6 is a structural diagram of the VMD5R instruction execution unit.

Detailed Implementation

[0036] The present invention will be further illustrated below with reference to specific embodiments. It should be understood that these embodiments are for illustrative purposes only and are not intended to limit the scope of the invention. Furthermore, it should be understood that after reading the teachings of this invention, those skilled in the art can make various alterations or modifications to the invention, and these equivalent forms also fall within the scope defined by the appended claims.

[0037] The MD5 message digest algorithm consists of five steps: bit padding, length padding, buffer initialization, 16-word (512-bit) message block processing, and output of the 128-bit message digest value. The core of the algorithm is the 16-word (512-bit) message block processing, which comprises four rounds (64 iterations in total), the summation of the iteration results with the initial values, and the output of the result. Each iteration executes one of the non-linear round functions specified by the MD5 message digest algorithm (F(b,c,d), G(b,c,d), H(b,c,d), or I(b,c,d), whose main operations are logical AND, OR, NOT, and XOR), together with four modulo-$2^{32}$ additions and a circular shift. The result of each iteration is the input of the next, so different iterations within the same MD5 message digest computation cannot be executed in parallel. Therefore, the key to accelerating the MD5 message digest algorithm is to fully explore and exploit the inherent parallelism within the MD5 message processing iteration functions (FF / GG / HH / II) so as to shorten the time of each iteration, while executing different MD5 message digest computations on unrelated data in parallel as far as possible.

[0038] The inventors of this invention discovered that there is potential for acceleration in the 16-word (512-bit) message grouping process of the MD5 message digest algorithm. Dedicated instructions can be used to fully realize the inherent parallelism of the MD5 message processing iteration functions (FF / GG / HH / II). By optimizing the implementation flow of the MD5 message processing iteration functions (FF / GG / HH / II), the critical path can be shortened, thereby improving execution speed. Based on the characteristic of modern processors supporting highly data-parallel instructions, multiple data-independent MD5 algorithms can be processed in parallel to simultaneously execute multiple data-independent MD5 message processing iteration functions, thus achieving the goal of accelerating the MD5 message digest algorithm.

[0039] The embodiments of the present invention relate to an acceleration method for the MD5 message digest algorithm. This method is based on the MD5 round function iteration instruction (abbreviated as VMD5R instruction) and achieves parallel acceleration of two data-unrelated MD5 message digest algorithms by pipelined execution of the MD5 round function iteration instruction.

[0040] The VMD5R instruction adopts a RISC architecture and uses a fixed-length 32-bit format. As shown in Figure 1, it has three 256-bit source operands and one 256-bit destination operand, and it executes any single message processing round function iteration of the MD5 message digest algorithm according to its source operands. The VMD5R instruction uses the MD5 round function parallel algorithm shown in Figure 2, which completes one MD5 round function iteration for each of two unrelated MD5 message digest computations in parallel. The MD5 round function parallel algorithm takes as input the intermediate iteration variables {D1,C1,B1,A1,D0,C0,B0,A0}, the message words {$W_{1,j}$, $W_{0,j}$}, and the round iteration numbers {Row1,Row0} from the 16-word (512-bit) message block processing of the two unrelated MD5 message digest computations, performs the MD5 round function iteration on the two sets of data in parallel, and outputs the iteration results in the form {C1,B1,A'1,D1,C0,B0,A'0,D0}.

[0041] The MD5 message digest algorithm acceleration method achieves parallel acceleration of the message processing iterations for the 16-word (512-bit) message blocks of two unrelated MD5 message digest computations through pipelined execution of VMD5R instructions, as shown in Figure 3. The specific steps are as follows:

[0042] 1) Load the initial values of the intermediate iteration variables of the MD5 message digest algorithm into the register file in the format {D1,C1,B1,A1,D0,C0,B0,A0} (where A1=A0=67452301, B1=B0=efcdab89, C1=C0=98badcfe, D1=D0=10325476). Then load the message words of the two unrelated messages (16-word blocks), after bit padding and length padding, into the register file in the format {192'b0, $W_{1,j}$, $W_{0,j}$}, and load the two unrelated round iteration numbers into the register file in the format {192'b0, Row1, Row0};

[0043] 2) Using the initial values of the iteration variables {D1,C1,B1,A1,D0,C0,B0,A0}, the message words {192'b0, $W_{1,j}$, $W_{0,j}$} of the two unrelated messages (16-word blocks), and the two unrelated round iteration numbers {192'b0, Row1, Row0} in the register file as the source operands, execute the first VMD5R instruction, generating the MD5 intermediate iteration variables $\{C_1^{(1)}, B_1^{(1)}, {A'}_1^{(1)}, D_1^{(1)}, C_0^{(1)}, B_0^{(1)}, {A'}_0^{(1)}, D_0^{(1)}\}$;

[0044] 3) Following step 2), execute a further 63 VMD5R instructions in sequence. Each time, the result of the previous VMD5R instruction, $\{C_1^{(i-1)}, B_1^{(i-1)}, {A'}_1^{(i-1)}, D_1^{(i-1)}, C_0^{(i-1)}, B_0^{(i-1)}, {A'}_0^{(i-1)}, D_0^{(i-1)}\}$, becomes source operand A of the next instruction, and the data in source registers Vb and Vc, updated with the new message words and round iteration numbers, is read from the register file as the new source operands B and C. This completes round function iterations 2 to 64 for the 16-word (512-bit) message blocks of the two unrelated MD5 message digest computations, finally yielding the MD5 intermediate iteration variables $\{C_1^{(64)}, B_1^{(64)}, {A'}_1^{(64)}, D_1^{(64)}, C_0^{(64)}, B_0^{(64)}, {A'}_0^{(64)}, D_0^{(64)}\}$;

[0045] 4) Use general-purpose instructions of the processor to sum the MD5 intermediate iteration variables in parallel, obtaining the MD5 intermediate iteration variables $\{D_1 + D_1^{(64)}, C_1 + C_1^{(64)}, B_1 + B_1^{(64)}, A_1 + {A'}_1^{(64)}, D_0 + D_0^{(64)}, C_0 + C_0^{(64)}, B_0 + B_0^{(64)}, A_0 + {A'}_0^{(64)}\}$;

[0046] 5) If the input message also includes an unprocessed 16-word (512-bit) message block, use the MD5 intermediate iteration variable obtained in step 4) as the initial iteration value for processing the next 16-word (512-bit) message block, and continue to execute steps 2), 3), and 4) in a loop until the last 16-word (512-bit) message block in the message is processed, and obtain the execution result {D1,C1,B1,A1,D0,C0,B0,A0}.

[0047] 6) Output the execution result {D1,C1,B1,A1,D0,C0,B0,A0} from step 5) to complete the execution of the MD5 message digest algorithm for the two sets of unrelated data, and obtain the digest values of the two unrelated messages.
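Steps 1) to 6) can be exercised with a small software model. The sketch below is one possible reading of the text, not the patent's hardware: `vmd5r` models a register as a list of eight 32-bit words (lowest word first) and assumes that the output rotation {C,B,A',D} is what lets the next instruction see the roles (a, b, c, d) in the same word positions. It also assumes the standard MD5 constants for the two lookup tables and requires both messages to pad to the same number of blocks. All function names are hypothetical.

```python
import math
import struct

MASK = 0xFFFFFFFF
SHIFTS = ((7, 12, 17, 22), (5, 9, 14, 20), (4, 11, 16, 23), (6, 10, 15, 21))
INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)  # A, B, C, D


def vmd5r(va, vb, vc):
    """Model one instruction: va has 8 words, vb and vc use words 0 and 1."""
    out = []
    for lane in range(2):
        a, b, c, d = va[4 * lane:4 * lane + 4]
        w, row = vb[lane], vc[lane]
        rnd = (row - 1) // 16
        f = ((b & c) | (~b & d), (b & d) | (c & ~d),
             b ^ c ^ d, c ^ (b | (~d & MASK)))[rnd]
        t = int(abs(math.sin(row)) * 2**32) & MASK     # assumed TRow value
        s = SHIFTS[rnd][(row - 1) % 4]                 # assumed SRow value
        x = (a + f + w + t) & MASK
        a_new = (b + (((x << s) | (x >> (32 - s))) & MASK)) & MASK
        out += [d, a_new, b, c]   # rotated so the next iteration reads (a, b, c, d)
    return out


def msg_index(row):
    """Message word index j selected by software for iteration Row (1..64)."""
    i = row - 1
    return (i, (5 * i + 1) % 16, (3 * i + 5) % 16, (7 * i) % 16)[i // 16]


def pad(msg):
    """Bit padding and length padding for a byte-aligned message."""
    padded = msg + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack("<Q", (8 * len(msg)) & 0xFFFFFFFFFFFFFFFF)


def md5_pair(msg0, msg1):
    """Digest two independent messages in lockstep, one block pair at a time."""
    p0, p1 = pad(msg0), pad(msg1)
    if len(p0) != len(p1):
        raise ValueError("this sketch requires the same number of blocks")
    state = list(INIT) * 2                                    # step 1
    for off in range(0, len(p0), 64):                         # step 5 loop
        m0 = struct.unpack("<16I", p0[off:off + 64])
        m1 = struct.unpack("<16I", p1[off:off + 64])
        va = state
        for row in range(1, 65):                              # steps 2 and 3
            j = msg_index(row)
            va = vmd5r(va, [m0[j], m1[j]] + [0] * 6, [row, row] + [0] * 6)
        state = [(x + y) & MASK for x, y in zip(state, va)]   # step 4
    return (struct.pack("<4I", *state[:4]).hex(),             # step 6
            struct.pack("<4I", *state[4:]).hex())


print(md5_pair(b"abc", b""))
```

Each call to `vmd5r` depends on the previous result, which mirrors the bypass from one instruction's output to the next instruction's source operand A. The two digests can be compared with `hashlib.md5` from the Python standard library.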

[0048] In this implementation, the instruction format of the MD5 round function iteration instruction (MD5R) is MD5R Va,Vb,Vc,Vd, which specifies an operation on three source operands held in the three 256-bit source registers Va, Vb, and Vc, with the result stored in the 256-bit destination register Vd. As shown in Figure 4, bits [31:26] of the 32-bit instruction hold the 6-bit opcode, bits [25:21] hold the address of the 256-bit register Va selected from a set of 32 256-bit registers, bits [20:16] hold the address of the 256-bit register Vb selected from a set of 32 256-bit registers, bits [15:10] hold the 6-bit function code that determines the specific function of the instruction, bits [9:5] hold the address of the 256-bit register Vc selected from a set of 32 256-bit registers, and bits [4:0] hold the address of the 256-bit destination register Vd selected from a set of 32 256-bit registers.

[0049] The MD5 round function iteration instruction (MD5R) takes two sets of unrelated intermediate iteration variables {C1,B1,A1,D1,C0,B0,A0,D0} (each set comprising four 32-bit iteration variable words) from the source register Va, two unrelated message words {M1,M0} from the lower 64 bits of the source register Vb, and two unrelated round iteration numbers {Row1,Row0} from the lower 64 bits of the source register Vc, where Row1 and Row0 can take values from 1 to 64. It executes any MD5 message processing round function iteration of the MD5 message digest algorithm, completing the MD5 round function iteration in the 16-word (512-bit) message block processing of the MD5 message digest algorithm, and stores the execution result {C1,B1,A1,D1,C0,B0,A0,D0} in the destination register Vd. The MD5 round function iteration instruction (MD5R) performs the following operations: it obtains temporary parameters from the intermediate iteration variables and round iteration numbers using a round function; it obtains the round iteration constants and the round iteration cyclic shift constants by table lookup on the round iteration numbers; and it updates the intermediate iteration variables from the temporary parameters, message words, intermediate iteration variables, round iteration constants, and round iteration cyclic shift constants. Specifically:

[0050]

[0051]

[0052] Here, the function F_MD5RLUTT(Row_i) determines, from the MD5 round iteration number Row_i, the 32-bit round iteration constant TRow_i used when processing 16-word (512-bit) message blocks in the MD5 message digest algorithm. The specific values are shown in Table 1, where all Row_i values are decimal and all TRow_i values are hexadecimal.

[0053] Table 1. Values of the function F_MD5RLUTT(Row_i) in the MD5 round function iteration instruction (MD5R)

[0054]

[0055]
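The values of Table 1 are not reproduced in this text. In standard MD5 (RFC 1321), the constant for round i is the integer part of 2^32 × |sin(i)|, with i in radians. The sketch below assumes that Table 1 holds these standard values; the name `f_md5rlutt` is a hypothetical software stand-in for the lookup, and a hardware implementation would store the 64 constants rather than evaluate a sine.

```python
import math


def f_md5rlutt(row: int) -> int:
    """Round constant for round `row` (1..64), assuming standard MD5 constants."""
    if not 1 <= row <= 64:
        raise ValueError("round iteration number must be in 1..64")
    return int(abs(math.sin(row)) * 2**32) & 0xFFFFFFFF


# Build the full 64-entry table once, as a lookup table would hold it.
T_TABLE = [f_md5rlutt(row) for row in range(1, 65)]
```

For example, `hex(f_md5rlutt(1))` gives `0xd76aa478`, the first constant listed in RFC 1321.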

[0056] The function F_MD5RROT(Row_i) determines, from the MD5 round iteration number Row_i, the round cyclic shift constant SRow_i used when processing 16-word (512-bit) message blocks in the MD5 message digest algorithm. The specific values are shown in Table 2, where all data are decimal.

[0057] Table 2. Values of the function F_MD5RROT(Row_i) in the MD5 round function iteration instruction (MD5R)

[0058]

| Row_i | SRow_i | Row_i | SRow_i | Row_i | SRow_i | Row_i | SRow_i |
|---|---|---|---|---|---|---|---|
| 1 | 7 | 2 | 12 | 3 | 17 | 4 | 22 |
| 5 | 7 | 6 | 12 | 7 | 17 | 8 | 22 |
| 9 | 7 | 10 | 12 | 11 | 17 | 12 | 22 |
| 13 | 7 | 14 | 12 | 15 | 17 | 16 | 22 |
| 17 | 5 | 18 | 9 | 19 | 14 | 20 | 20 |
| 21 | 5 | 22 | 9 | 23 | 14 | 24 | 20 |
| 25 | 5 | 26 | 9 | 27 | 14 | 28 | 20 |
| 29 | 5 | 30 | 9 | 31 | 14 | 32 | 20 |
| 33 | 4 | 34 | 11 | 35 | 16 | 36 | 23 |
| 37 | 4 | 38 | 11 | 39 | 16 | 40 | 23 |
| 41 | 4 | 42 | 11 | 43 | 16 | 44 | 23 |
| 45 | 4 | 46 | 11 | 47 | 16 | 48 | 23 |
| 49 | 6 | 50 | 10 | 51 | 15 | 52 | 21 |
| 53 | 6 | 54 | 10 | 55 | 15 | 56 | 21 |
| 57 | 6 | 58 | 10 | 59 | 15 | 60 | 21 |
| 61 | 6 | 62 | 10 | 63 | 15 | 64 | 21 |
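Table 2 repeats a four-value pattern within each group of 16 rounds, so the lookup can be expressed compactly. The following is a small sketch of that observation; `f_md5rrot` is a hypothetical software stand-in for the table lookup, not the hardware implementation.

```python
# One four-value shift pattern per group of 16 rounds (rounds 1-16, 17-32, 33-48, 49-64).
_SHIFT_PATTERNS = (
    (7, 12, 17, 22),
    (5, 9, 14, 20),
    (4, 11, 16, 23),
    (6, 10, 15, 21),
)


def f_md5rrot(row: int) -> int:
    """Cyclic left-shift amount for round `row` (1..64)."""
    if not 1 <= row <= 64:
        raise ValueError("round iteration number must be in 1..64")
    i = row - 1
    return _SHIFT_PATTERNS[i // 16][i % 4]
```

The group index `i // 16` selects the pattern and `i % 4` selects the position within it, so the 64-entry table reduces to 16 stored values.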

[0059] Executing the MD5 round function iteration instruction (MD5R) once can perform any one MD5 round function iteration for two unrelated MD5 message digest algorithms. Executing this instruction 64 times consecutively can perform MD5 round function iteration processing for 16-word (512-bit) message blocks in two unrelated MD5 message digest algorithms.

[0060] An embodiment of the present invention relates to an instruction set processor which, as shown in Figure 5, includes an instruction unit, an instruction decoding unit, an instruction scheduling and issue unit, an instruction execution unit (including a VMD5R instruction execution unit), an instruction commit (ROB) unit, and a register file containing 32 256-bit registers. The VMD5R instruction execution unit receives and executes the MD5 round function iteration instruction (VMD5R); its function is to execute any MD5 message processing round function iteration according to the input information.

[0061] The VMD5R instruction execution unit provides pipelined execution of VMD5R instructions. The unit receives and executes VMD5R instructions. Its input signals are: a 256-bit source operand A (two sets of unrelated MD5 intermediate iteration variables {D1, C1, B1, A1, D0, C0, B0, A0}, each set comprising four 32-bit MD5 intermediate iteration variables, taken from the source register Va in the register file or bypassed from the result of the previous VMD5R instruction); a 256-bit source operand B (whose lower 64 bits are the message words {W1, W0} of the two unrelated MD5 message digest algorithms, taken from the register file); and a 256-bit source operand C (whose lower 64 bits are the round iteration numbers {Row1, Row0} of the two unrelated MD5 message digest algorithms). Its output is a 256-bit execution result {C1, B1, A'1, D1, C0, B0, A'0, D0}, which is written back to the register file; whether it is also bypassed to the input of the VMD5R instruction execution unit is determined by the information of the next instruction.
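The operand layout described above can be illustrated with a pair of packing helpers. This sketch assumes that the leftmost element of each brace list occupies the most significant 32 bits, which is consistent with {W1, W0} sitting in the lower 64 bits but is not stated explicitly in the text; `pack256` and `unpack256` are hypothetical names.

```python
WORD_MASK = 0xFFFFFFFF


def pack256(words):
    """Pack eight 32-bit words, most significant first, into one 256-bit integer."""
    if len(words) != 8:
        raise ValueError("a 256-bit operand holds exactly eight 32-bit words")
    value = 0
    for w in words:
        if not 0 <= w <= WORD_MASK:
            raise ValueError("each word must fit in 32 bits")
        value = (value << 32) | w
    return value


def unpack256(value):
    """Split a 256-bit integer into eight 32-bit words, most significant first."""
    return tuple((value >> (32 * i)) & WORD_MASK for i in range(7, -1, -1))


def operand_b(w1, w0):
    """Source operand B: message words {W1, W0} in the lower 64 bits, upper bits zero."""
    return pack256((0, 0, 0, 0, 0, 0, w1, w0))
```

Under this assumption, operand A for the standard MD5 initial values would be `pack256((D, C, B, A, D, C, B, A))`, with one group of four words per lane.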

[0062] As shown in Figure 6, the MD5 round function iteration instruction execution unit includes: a round function module for obtaining temporary parameters from the intermediate iteration variables and round iteration numbers using a round function; a first lookup table module for obtaining the round iteration constants by table lookup on the round iteration numbers; a second lookup table module for obtaining the round iteration cyclic shift constants by table lookup on the round iteration numbers; and a logic operation module for performing logic operations on the temporary parameters, message words, intermediate iteration variables, round iteration constants, and round iteration cyclic shift constants to update the intermediate iteration variables. The round function module, the first lookup table module, the second lookup table module, and the logic operation module are all implemented in hardware logic. Optimizing the operation steps of the MD5 message processing round function iteration in this way improves the execution speed.

[0063] The execution latency of the VMD5R instruction in the processor is 1 clock cycle (the number of clock cycles of the VMD5R instruction can be adjusted according to the operating frequency of the specific processor or processor core). The output of the VMD5R instruction execution unit can be bypassed to the input of the VMD5R instruction execution unit and used as source operand A of the next VMD5R instruction. The round function iteration processing of the 16-word (512-bit) message blocks of two sets of unrelated data in the MD5 message digest algorithm can therefore be completed in as few as 64 clock cycles. This invention can be used in various processors (processor cores) and dedicated cryptographic chips, including general-purpose or dedicated RISC processors (processor cores).
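The 64-cycle figure follows from the dependency chain: each round instruction needs the result of the previous one, so with result bypassing the 64 rounds take 64 times the instruction latency. The sketch below works through that arithmetic. It is a rough estimate that ignores operand loading, the summation in step 4), and any other pipeline overhead, and its function names are hypothetical.

```python
ROUNDS_PER_BLOCK = 64   # round function iterations per 512-bit block
BLOCK_BYTES = 64        # one 16-word block is 64 bytes
LANES = 2               # two unrelated messages processed per instruction


def round_cycles(latency_cycles: int = 1) -> int:
    """Cycles for the 64 dependent round instructions of one pair of blocks."""
    return ROUNDS_PER_BLOCK * latency_cycles


def bytes_per_cycle(latency_cycles: int = 1) -> float:
    """Message bytes consumed per cycle across both lanes, round instructions only."""
    return LANES * BLOCK_BYTES / round_cycles(latency_cycles)
```

With the stated 1-cycle latency this gives 64 cycles per pair of blocks, or 2 message bytes per cycle; if the latency were 2 cycles, the same chain would take 128 cycles.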

Claims

1. A method for accelerating the MD5 message digest algorithm, characterized in that: based on MD5 round function iteration instructions, the method accelerates two unrelated MD5 message digest algorithms in parallel by executing these instructions in a pipelined manner. The MD5 round function iteration instructions employ a RISC architecture and a fixed-length 32-bit instruction format with three 256-bit source operands and one 256-bit destination operand, and are used to execute any message processing round function iteration of the MD5 message digest algorithm according to the source operands. The MD5 round function iteration instructions use a parallel MD5 round function algorithm, which takes as input the intermediate iteration variables, message words, and round iteration numbers from the 16-word message block processing of two unrelated MD5 message digest algorithms, completes the MD5 round function iteration of each set of data in parallel, and outputs the iteration results in a specified format. Accelerating the two unrelated MD5 message digest algorithms in parallel by pipelined execution of the MD5 round function iteration instructions specifically includes the following steps:
(1) load the initial values of the intermediate iteration variables of the MD5 message digest algorithm into the register file in the specified format; load the message words of the two unrelated messages, after bit padding and length padding, into the register file in the specified format; and load the two unrelated round iteration numbers into the register file in the specified format;
(2) execute the first MD5 round function iteration instruction with the initial iteration variables, the message words of the two unrelated messages, and the two unrelated round iteration numbers in the register file as source operands, generating the MD5 intermediate iteration variables;
(3) execute the subsequent MD5 round function iteration instructions in sequence, each time using the execution result of the previous MD5 round function iteration instruction as source operand A of the next instruction and reading the data in registers Vb and Vc (containing the updated message words and round iteration numbers) from the register file as the new source operands B and C, thereby completing rounds 2 to 64 of the round function iteration for the 16-word message blocks of the two unrelated MD5 message digest algorithms and finally obtaining the MD5 intermediate iteration variables;
(4) sum the MD5 intermediate iteration variables in parallel to obtain the MD5 intermediate iteration variables;
(5) if the input messages still contain unprocessed 16-word message blocks, use the MD5 intermediate iteration variables obtained in step (4) as the initial values for processing the next 16-word message block, and repeat steps (2) to (4) in a loop until the last 16-word message block has been processed, obtaining the execution result;
(6) output the execution result to complete the execution of the MD5 message digest algorithm on the two sets of unrelated data and obtain the digest values of the two unrelated messages.

2. The method for accelerating the MD5 message digest algorithm according to claim 1, characterized in that the instruction format of the MD5 round function iteration instruction is MD5R Va,Vb,Vc,Vd, which specifies an operation on three source operands in the three 256-bit registers Va, Vb, and Vc, with the result stored in the 256-bit destination register Vd. In the 32-bit instruction, bits [31:26] represent the 6-bit opcode, bits [25:21] represent the address of a 256-bit register Va selected from a set of 32 256-bit registers, bits [20:16] represent the address of a 256-bit register Vb selected from a set of 32 256-bit registers, bits [15:10] represent the 6-bit function code used to determine the specific instruction function, bits [9:5] represent the address of a 256-bit register Vc selected from a set of 32 256-bit registers, and bits [4:0] represent the address of a 256-bit destination register Vd selected from a set of 32 256-bit registers.

3. The method for accelerating the MD5 message digest algorithm according to claim 1, characterized in that the MD5 round function iteration instruction specifically: takes two sets of unrelated intermediate iteration variables of the MD5 message digest algorithm from register Va, two unrelated message words from register Vb, and two unrelated round iteration numbers Row1 and Row0 from register Vc; executes any one MD5 message processing round function iteration of the MD5 message digest algorithm, completing the MD5 round function iteration in the 16-word message block processing of the MD5 message digest algorithm; and stores the execution result in the destination register Vd, where the valid values of Row1 and Row0 are 1 to 64. For i equal to 0 and 1, A'_i is generated as A'_i = B_i + ((A_i + Temp_i + W_i + TRow_i) <<< SRow_i), where "+" denotes addition modulo 2^32 and "<<<" denotes a cyclic left shift. Temp_i is a 32-bit intermediate variable produced by the round function selected by the round iteration number Row_i: Temp_i[31:0] = F(B_i, C_i, D_i) when 0 < Row_i <= 16; Temp_i[31:0] = G(B_i, C_i, D_i) when 16 < Row_i <= 32; Temp_i[31:0] = H(B_i, C_i, D_i) when 32 < Row_i <= 48; and Temp_i[31:0] = I(B_i, C_i, D_i) when 48 < Row_i <= 64, where F, G, H, and I are the round functions specified by the MD5 message digest algorithm. TRow_i is a 32-bit round iteration constant determined by table lookup on the MD5 round iteration number Row_i, and SRow_i is a 32-bit round cyclic shift constant determined by table lookup on the MD5 round iteration number Row_i. During any MD5 message processing round function iteration of the MD5 message digest algorithm, hardware logic implements the round function processing, the table lookup operations, and the shift operation. Executing the MD5 round function iteration instruction once performs any one MD5 round function iteration for two unrelated MD5 message digest algorithms, and executing the instruction 64 times consecutively performs the MD5 round function iteration processing of the 16-word message blocks of two unrelated MD5 message digest algorithms.

4. The method for accelerating the MD5 message digest algorithm according to claim 3, characterized in that, during any MD5 message processing round function iteration of the MD5 message digest algorithm, temporary parameters are obtained using the round function from the intermediate iteration variables and the round iteration number; the round iteration constant and the round iteration cyclic shift constant are obtained by table lookup on the round iteration number; and the intermediate iteration variables are updated from the temporary parameters, message words, intermediate iteration variables, round iteration constant, and round iteration cyclic shift constant.

5. An instruction set processor, characterized in that it includes a register file and an MD5 round function iteration instruction execution unit. The register file stores source operands A, B, and C. The MD5 round function iteration instruction execution unit receives and executes the MD5 round function iteration instructions. The input signals of the MD5 round function iteration instruction execution unit include a 256-bit source operand A, a 256-bit source operand B, and a 256-bit source operand C, and the output signal is a 256-bit execution result. The MD5 round function iteration instruction execution unit implements round function processing, table lookup operations, and shift operations in hardware logic, and accelerates two unrelated MD5 message digest algorithms in parallel through pipelined execution of the MD5 round function iteration instructions, specifically including the following steps:
(1) load the initial values of the intermediate iteration variables of the MD5 message digest algorithm into the register file in the specified format; load the message words of the two unrelated messages, after bit padding and length padding, into the register file in the specified format; and load the two unrelated round iteration numbers into the register file in the specified format;
(2) execute the first MD5 round function iteration instruction with the initial iteration variables, the message words of the two unrelated messages, and the two unrelated round iteration numbers in the register file as source operands, generating the MD5 intermediate iteration variables;
(3) execute the subsequent MD5 round function iteration instructions in sequence, each time using the execution result of the previous MD5 round function iteration instruction as source operand A of the next instruction and reading the data in registers Vb and Vc (containing the updated message words and round iteration numbers) from the register file as the new source operands B and C, thereby completing rounds 2 to 64 of the round function iteration for the 16-word message blocks of the two unrelated MD5 message digest algorithms and finally obtaining the MD5 intermediate iteration variables;
(4) sum the MD5 intermediate iteration variables in parallel to obtain the MD5 intermediate iteration variables;
(5) if the input messages still contain unprocessed 16-word message blocks, use the MD5 intermediate iteration variables obtained in step (4) as the initial values for processing the next 16-word message block, and repeat steps (2) to (4) in a loop until the last 16-word message block has been processed, obtaining the execution result;
(6) output the execution result to complete the execution of the MD5 message digest algorithm on the two sets of unrelated data and obtain the digest values of the two unrelated messages.

6. The instruction set processor according to claim 5, characterized in that the MD5 round function iteration instruction execution unit includes: a round function module for obtaining temporary parameters from the intermediate iteration variables and round iteration numbers using a round function; a first lookup table module for obtaining the round iteration constants by table lookup on the round iteration numbers; a second lookup table module for obtaining the round iteration cyclic shift constants by table lookup on the round iteration numbers; and a logic operation module for performing logic operations on the temporary parameters, message words, intermediate iteration variables, round iteration constants, and round iteration cyclic shift constants to update the intermediate iteration variables.

7. The instruction set processor according to claim 5, characterized in that the latency of the MD5 round function iteration instruction is 1 clock cycle, and the output result of the MD5 round function iteration instruction execution unit can be bypassed to the input of the MD5 round function iteration instruction execution unit and used as source operand A of the next MD5 round function iteration instruction; the instruction set processor supports pipelined execution of the MD5 round function iteration instructions.