Acceleration method and instruction set processor for the SM3 cryptographic hash algorithm
By designing an SM3 extended instruction set and applying parallel pipelining and instruction-level parallelism, the invention accelerates the SM3 cryptographic hash algorithm, addressing its high computational complexity and the difficulty of extending licensed instruction sets. The result is efficient execution of SM3, suitable for domestic general-purpose processors and for information security applications.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: SHANGHAI HIGH-PERFORMANCE INTEGRATED CIRCUIT DESIGN CENT
- Filing Date: 2022-10-19
- Publication Date: 2026-06-26
AI Technical Summary
Existing approaches to executing the SM3 cryptographic hash algorithm have high computational complexity and therefore high resource consumption. Furthermore, instruction set extensions for general-purpose processors built on licensed architectures face licensing-period and version-upgrade issues, making it difficult to meet the needs of large data volumes and real-time processing.
We designed an SM3 extended instruction set using parallel pipeline and instruction-level parallelism techniques. This set includes SM3 message word extension instructions and SM3 working variable word iterative update instructions. We accelerated the SM3 cryptographic hash algorithm by using multi-message word parallel extension and multi-round iterative fusion algorithms. We also implemented multi-data parallel processing using 32-bit format instructions of the RISC architecture.
It significantly improves the execution speed of the SM3 cryptographic hash algorithm, reduces the complexity and storage space of the software program, is easy to integrate into general-purpose processors, and supports information security applications of domestic processors.
Figure CN115525342B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of processor design and information security technology, and in particular to an acceleration method and instruction set processor for the SM3 cryptographic hash algorithm.
Background Art
[0002] Cryptographic security profoundly affects information security and national security. To ensure it, the State Cryptography Administration has formulated and promulgated a series of cryptographic standards, establishing a comprehensive cryptographic system architecture in China. The SM3 cryptographic hash algorithm, promulgated by the State Cryptography Administration in 2010, is widely used in digital signatures and authentication, random number generation, and the generation and verification of message authentication codes. Accelerating the speed at which processors execute the SM3 cryptographic hash algorithm is therefore of great significance.
[0003] The SM3 cryptographic hash algorithm is a typical cryptographic hash algorithm whose core consists of the SM3 message expansion function and the SM3 iterative compression function. It takes a message m of length l bits (l < 2^64), pads it, and iteratively compresses it to produce a 256-bit hash value. Its main characteristics are high security and high computational complexity. The SM3 message expansion process requires 52 rounds of iteration to generate the message words W_j (32 bits each), after which the 64 message words W'_j are generated. The SM3 compression function then requires 64 rounds of iterative compression to produce the hash value. This gives SM3 strong security, but it inevitably introduces high computational complexity. Implementations on general-purpose processors that use a general-purpose instruction set often consume a significant amount of computing resources.
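The following minimal Python sketch makes the sequential cost described above concrete. It implements the scalar SM3 message expansion as defined in the SM3 standard (GB/T 32905-2016). The function names `rotl`, `p1` and `expand_message` are illustrative and do not come from the patent. Each W_j depends on W_{j-3}, so a scalar loop produces only one word per step.

```python
MASK32 = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """32-bit circular left shift (<<<)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x: int) -> int:
    """SM3 permutation P1(X) = X XOR (X<<<15) XOR (X<<<23)."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def expand_message(block: bytes) -> tuple[list[int], list[int]]:
    """Expand one padded 512-bit (64-byte) block into W0..W67 and W'0..W'63."""
    assert len(block) == 64
    w = [int.from_bytes(block[4 * j:4 * j + 4], "big") for j in range(16)]
    # 52 sequential iterations: W_j needs W_{j-3}, so this loop is serial.
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15))
                 ^ rotl(w[j - 13], 7) ^ w[j - 6])
    # 64 further words: W'_j = W_j XOR W_{j+4}.
    w_prime = [w[j] ^ w[j + 4] for j in range(64)]
    return w, w_prime
```

Together the two lists hold the 132 expanded words (68 + 64) consumed by the compression function.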
[0004] Existing methods for accelerating the SM3 cryptographic hash algorithm mainly include software algorithm optimization, dedicated hardware acceleration, and instruction set architecture (ISA) extensions. Software algorithm optimization offers advantages such as high flexibility and low hardware overhead, but its optimization potential is limited, and it is susceptible to security threats such as side-channel attacks. In applications with large data volumes and high real-time processing requirements, software algorithm optimization is increasingly unable to meet practical application needs. Dedicated hardware acceleration typically offers good performance, but it incurs high hardware overhead and cost, and has poor scalability, making it difficult to directly port or integrate with the execution process of general-purpose processors. ISA extensions can accelerate the execution of the SM3 cryptographic algorithm while offering design flexibility and combining the advantages of both software and hardware acceleration. In particular, current mainstream general-purpose processor instruction sets provide highly data-parallel instructions, among which vector registers supporting Single Instruction Multiple Data (SIMD) are very suitable for implementing multiple data parallel processing. These can be used to design and extend instruction sets specifically for accelerating the SM3 cryptographic hash algorithm, thereby improving the performance of general-purpose processors in cryptographic security applications.
[0005] The mainstream international processor instruction set architectures (ISAs) are mainly Intel's x86, IBM's Power and ARM's ARM architectures. Most mainstream Chinese processor manufacturers develop processors using ISAs licensed from companies such as Intel, IBM and ARM. Adopting a foreign ISA has the advantage of a relatively mature application development ecosystem. However, it raises issues of licensing periods and version upgrades, and it makes it inconvenient to provide dedicated extensions for China's information security technologies.
Summary of the Invention
[0006] The technical problem to be solved by the present invention is to provide an acceleration method and instruction set processor for the SM3 cryptographic hash algorithm, which can significantly improve the speed at which the processor executes the SM3 cryptographic hash algorithm.
[0007] The technical solution adopted by this invention to solve its technical problem is as follows. A method for accelerating the SM3 cryptographic hash algorithm is provided. It is based on an SM3 extended instruction set and uses parallel pipelining and instruction-level parallelism to accelerate execution of the SM3 cryptographic hash algorithm.

The SM3 extended instruction set adopts a RISC architecture. Its instructions use a fixed-length 32-bit format, and both source and destination operands are 256 bits. The set includes an SM3 message word expansion instruction and an SM3 working variable word iterative update instruction.

The SM3 message word expansion instruction uses a multi-message word parallel expansion algorithm to accelerate the SM3 message expansion function. Taking the 16 message words of the padded message as the initial input, a single execution generates 8 message words, and executing the instruction 7 times generates the 68 message words.

The SM3 working variable word iterative update instruction uses a multi-round iterative fusion algorithm to accelerate the SM3 iterative compression function. This algorithm integrates the message words produced by the SM3 message word expansion instruction into the execution of the SM3 iterative compression function. Each execution takes 8 message words and the current working variable words as input and completes 4 rounds of iterative update of the SM3 working variable words. Executing the instruction 16 times completes rounds 0 to 63 of the iterative update, yielding the final working variable words output by the SM3 compression function.
[0008] The method of accelerating the execution of the SM3 cryptographic hash algorithm using parallel pipeline and instruction-level parallelism specifically includes the following steps:
[0009] (1) Using the 16 message words W0, W1, ..., W15 of the padded message as input, execute the SM3 message word extension instruction to generate the new message words W16, W17, ..., W23. At the same time, using the message words W0, W1, ..., W7 and the 256-bit initial working state variable V(0) as input, execute the first SM3 working variable word iterative update instruction;
[0010] (2) Using the latest 16 message words as input, execute the SM3 message word extension instruction 6 more times in succession. The message words output by each execution are denoted W_{8i+8}, W_{8i+9}, ..., W_{8i+15}, and in the end the 68 message words W0, W1, ..., W67 of the SM3 cryptographic hash algorithm are obtained. After executing for 5 cycles, the first SM3 working variable word iterative update instruction outputs the working state variable V(3). Using the new message words and working state variable as input, execute 15 further SM3 working variable word iterative update instructions in succession in the same way. The working state variable output by each execution is denoted V(4j-1);
[0011] (3) Output the working state variable V(63) as the final execution result. The final output is the 256-bit SM3 hash value y = {H,G,F,E,D,C,B,A}.
[0012] The SM3 message word extension instruction uses the register-format simple arithmetic instruction format, written VSM3MSW Va, Vb, Vc. It operates on the two operands in the 256-bit source registers Va and Vb and stores the result in the 256-bit destination register Vc. The fields of the 32-bit instruction word are:
- bits [31:26]: the 6-bit opcode;
- bits [25:21]: select one of the 32 256-bit registers in the register file as source register Va, which holds a source operand;
- bits [20:16]: select one of the 32 256-bit registers as source register Vb, which holds the other source operand;
- bits [15:13]: always 0;
- bits [12:5]: the 8-bit function code that determines the specific function of the instruction;
- bits [4:0]: select one of the 32 256-bit registers as destination register Vc, which stores the result.
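The sketch below illustrates the field layout in [0012] by packing the register-format fields into a 32-bit word. The patent does not disclose the actual opcode or function-code values. The values in the usage line (0x2A, 0x31) are hypothetical placeholders, and the function name `encode_vsm3msw` is likewise invented for illustration.

```python
def encode_vsm3msw(opcode: int, va: int, vb: int, func: int, vc: int) -> int:
    """Pack the register-format fields of VSM3MSW Va, Vb, Vc into 32 bits.

    [31:26] opcode (6 bits), [25:21] Va (5), [20:16] Vb (5),
    [15:13] always zero, [12:5] function code (8), [4:0] Vc (5).
    """
    for value, width in ((opcode, 6), (va, 5), (vb, 5), (func, 8), (vc, 5)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
    # Bits [15:13] are never set, so they stay zero as required.
    return (opcode << 26) | (va << 21) | (vb << 16) | (func << 5) | vc


# Hypothetical opcode/function values; the patent does not publish them.
word = encode_vsm3msw(opcode=0x2A, va=1, vb=2, func=0x31, vc=3)
print(f"{word:032b}")
```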
[0013] Specifically, the SM3 message word extension instruction takes the current 16 32-bit message words W15~W0 and generates the next 8 message words W23~W16 of the SM3 cryptographic hash algorithm in parallel. W7~W0 are stored in source register Va, W15~W8 in source register Vb, and the result W23~W16 in destination register Vc. Each result word W_i in W23~W16 is generated as

W_i = (Temp XOR (Temp<<<15) XOR (Temp<<<23)) XOR (W_{i-13}<<<7) XOR W_{i-6},

where Temp is a 32-bit intermediate word given by

Temp = W_{i-16} XOR W_{i-9} XOR (W_{i-3}<<<15).

XOR denotes bitwise exclusive OR and <<< denotes circular left shift. A single execution of the SM3 message word extension instruction generates 8 message words. The instruction is executed 7 times in sequence. Each time, source register Va is updated with the contents of source register Vb, and source register Vb is updated with the newly generated destination register Vc. This produces the 68 message words W0, W1, ..., W67 of the SM3 cryptographic hash algorithm; the seventh execution also produces W68~W71, which are not needed.
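The behaviour described in [0013] can be modelled in software. The sketch below is a hypothetical Python model, not the patent's circuit. Registers are represented as lists of eight 32-bit words, with lane 0 holding the lowest-numbered word; this lane order is assumed, since the patent only writes W7~W0. `vsm3msw` computes the eight new words in order, so that a lane needing W_{i-3} from the same instruction sees the freshly produced value.

```python
MASK32 = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """32-bit circular left shift (<<<)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x: int) -> int:
    """Temp XOR (Temp<<<15) XOR (Temp<<<23), i.e. the SM3 permutation P1."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def vsm3msw(va: list[int], vb: list[int]) -> list[int]:
    """Hypothetical model of VSM3MSW Va, Vb, Vc.

    va holds W_{i-16}..W_{i-9} and vb holds W_{i-8}..W_{i-1} (lane 0 first);
    the result is vc = W_i..W_{i+7}. Output lane m needs W_{i+m-3}, which for
    m >= 3 comes from an earlier lane of the same instruction.
    """
    w = list(va) + list(vb)  # 16-word window, w[0] = W_{i-16}
    for k in range(16, 24):
        temp = w[k - 16] ^ w[k - 9] ^ rotl(w[k - 3], 15)
        w.append(p1(temp) ^ rotl(w[k - 13], 7) ^ w[k - 6])
    return w[16:]


def expand_with_vsm3msw(w0_15: list[int]) -> list[int]:
    """Seven VSM3MSW executions with Va <- Vb and Vb <- Vc after each one."""
    va, vb = list(w0_15[:8]), list(w0_15[8:16])
    words = list(w0_15[:16])
    for _ in range(7):
        vc = vsm3msw(va, vb)
        words.extend(vc)
        va, vb = vb, vc
    return words[:68]  # the 7th execution's W68..W71 are unused
```

The register rotation in `expand_with_vsm3msw` is what allows the same instruction to be issued back to back without any extra data movement.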
[0014] The SM3 working variable word iterative update instruction uses the immediate-format floating-point compound arithmetic instruction format, written VSM3R Va, Vb, #c, Vd. It operates on the two operands in the 256-bit source registers Va and Vb together with a 5-bit immediate #c, and stores the result in the 256-bit destination register Vd. The fields of the 32-bit instruction word are:
- bits [31:26]: the 6-bit opcode;
- bits [25:21]: select one of the 32 256-bit registers in the register file as source register Va, which holds a source operand;
- bits [20:16]: select one of the 32 256-bit registers as source register Vb, which holds the other source operand;
- bits [15:10]: the 6-bit function code that determines the specific function of the instruction;
- bits [9:5]: the 5-bit immediate, which indicates the loop iteration number;
- bits [4:0]: select one of the 32 256-bit registers as destination register Vd, which stores the result.
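The immediate format in [0014] can be illustrated with a similar packing sketch. The opcode and function-code values are hypothetical placeholders, and `encode_vsm3r` is an illustrative name. The encoder also rejects #c > 15, following the valid range 0 to 15 that the embodiment gives for the immediate.

```python
def encode_vsm3r(opcode: int, va: int, vb: int, func: int, imm: int, vd: int) -> int:
    """Pack the immediate-format fields of VSM3R Va, Vb, #c, Vd into 32 bits.

    [31:26] opcode (6 bits), [25:21] Va (5), [20:16] Vb (5),
    [15:10] function code (6), [9:5] immediate #c (5), [4:0] Vd (5).
    """
    fields = ((opcode, 6), (va, 5), (vb, 5), (func, 6), (imm, 5), (vd, 5))
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
    if imm > 15:
        # The 5-bit field could hold 0..31, but only 0..15 is stated as valid.
        raise ValueError("#c must be in 0..15")
    return (opcode << 26) | (va << 21) | (vb << 16) | (func << 10) | (imm << 5) | vd


# Hypothetical values: one instruction per round group #c = 0..15,
# writing Vd back into the same register as Vb (Vb <- Vd each time).
program = [encode_vsm3r(0x2B, va=4, vb=5, func=0x12, imm=c, vd=5) for c in range(16)]
```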
[0015] The SM3 working variable word iterative update instruction works as follows. Based on the current eight 32-bit working variable words {H,G,F,E,D,C,B,A}, the eight message words {W7~W0} and the immediate #c, it iteratively updates the working variable words for rounds 4*#c to 4*#c+3 according to the SM3 algorithm. {W7~W0} are stored in source register Va, the working variable words {H,G,F,E,D,C,B,A} in source register Vb, and the updated working variable words {H,G,F,E,D,C,B,A} in destination register Vd. For each round, the iteration constant T and the iteration logic are selected according to #c. The iteration uses the 32-bit intermediate words SS1, SS2, TT1, TT2 and P0. The formulas below apply to each of the four rounds i = 0~3, that is, rounds 4*#c to 4*#c+3.

When #c is less than 4, the iteration constant is T = 0x79cc4519, and:
SS1 = ((A<<<12) + E + (T<<<(4*#c+i)))<<<7,
SS2 = SS1 XOR (A<<<12),
TT1 = (A XOR B XOR C) + D + SS2 + (W_i XOR W_{i+4}),
TT2 = (E XOR F XOR G) + H + SS1 + W_i,
P0 = TT2 XOR (TT2<<<9) XOR (TT2<<<17).
Here XOR denotes bitwise exclusive OR and <<< denotes circular left shift.

When #c is greater than or equal to 4, the iteration constant is T = 0x7a879d8a, and:
SS1 = ((A<<<12) + E + (T<<<(4*#c+i)))<<<7,
SS2 = SS1 XOR (A<<<12),
TT1 = ((A AND B) OR (A AND C) OR (B AND C)) + D + SS2 + (W_i XOR W_{i+4}),
TT2 = ((E AND F) OR (NOT(E) AND G)) + H + SS1 + W_i,
P0 = TT2 XOR (TT2<<<9) XOR (TT2<<<17).
Here AND, OR and NOT denote bitwise AND, OR and NOT.

In both cases, + denotes addition modulo 2^32. Each round updates the working variables as follows: H is updated to G, G to F<<<19, F to E, E to P0, D to C, C to B<<<9, B to A, and A to TT1.

A single execution of the instruction therefore completes 4 rounds of iterative update of the SM3 working variable words. The instruction is executed 16 times in sequence. Each time, source register Va is loaded with the next 8 message words, and source register Vb is updated with the contents of the generated destination register Vd. This completes rounds 0 to 63 of the iterative update and yields the final working variable words {H,G,F,E,D,C,B,A} output by the SM3 compression function.
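The round logic in [0015] can be checked against the SM3 standard with a small behavioural model. The Python sketch below illustrates it under stated assumptions and is not the patent's hardware. The Vb lane order [A, B, C, D, E, F, G, H], with A in lane 0, is assumed. `va` holds W_{4#c}..W_{4#c+7}, so `va[i]` and `va[i + 4]` play the roles of W_i and W_{i+4} in the formulas above. `vsm3r` is an illustrative name.

```python
MASK32 = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """32-bit circular left shift (<<<)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p0(x: int) -> int:
    """P0 = X XOR (X<<<9) XOR (X<<<17)."""
    return x ^ rotl(x, 9) ^ rotl(x, 17)


def vsm3r(va: list[int], vb: list[int], imm: int) -> list[int]:
    """Hypothetical model of VSM3R Va, Vb, #imm, Vd: SM3 rounds 4*imm..4*imm+3.

    va: message words W_{4imm}..W_{4imm+7} (lane 0 first).
    vb: working variables in the assumed lane order [A, B, C, D, E, F, G, H].
    Returns the updated [A, B, C, D, E, F, G, H], i.e. the Vd value.
    """
    if not 0 <= imm <= 15:
        raise ValueError("#c must be in 0..15")
    a, b, c, d, e, f, g, h = vb
    t = 0x79CC4519 if imm < 4 else 0x7A879D8A  # iteration constant T
    for i in range(4):
        j = 4 * imm + i  # global round number
        a12 = rotl(a, 12)
        ss1 = rotl((a12 + e + rotl(t, j)) & MASK32, 7)
        ss2 = ss1 ^ a12
        if imm < 4:
            ff = a ^ b ^ c
            gg = e ^ f ^ g
        else:
            ff = (a & b) | (a & c) | (b & c)
            gg = (e & f) | (~e & g)  # & g keeps the result within 32 bits
        tt1 = (ff + d + ss2 + (va[i] ^ va[i + 4])) & MASK32
        tt2 = (gg + h + ss1 + va[i]) & MASK32
        # Right-hand sides use the pre-update values, matching the update list.
        h, g, f, e = g, rotl(f, 19), e, p0(tt2)
        d, c, b, a = c, rotl(b, 9), a, tt1
    return [a, b, c, d, e, f, g, h]
```

Running this model 16 times with #c = 0..15 and then XORing the result with the chaining value (the feed-forward step of the SM3 standard) reproduces the standard SM3 digest of "abc".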
[0016] The technical solution adopted by this invention to solve its technical problem is as follows: An instruction set processor is provided, including a register file, an SM3 message word extended instruction execution unit, and an SM3 working variable word iterative update instruction execution unit; the register file is used to provide source operands and store execution results for the SM3 message word extended instruction execution unit and the SM3 working variable word iterative update instruction execution unit; the SM3 message word extended instruction execution unit and the SM3 working variable word iterative update instruction execution unit are placed on different execution pipelines and occupy different read and write ports of the register file respectively;
[0017] The SM3 message word extension instruction execution unit has:
[0018] Two sets of 256-bit inputs are used to receive operands A and B of the SM3 message word extension instruction;
[0019] A set of 256-bit outputs is used to output the execution result of the SM3 message word extension instruction;
[0020] The SM3 message word extension instruction execution unit uses hardware logic to implement shift operations and parallel processing of 8 message words; the SM3 message word extension instruction execution unit can execute the SM3 message word extension instructions in a pipelined manner;
[0021] The SM3 working variable word iteration update instruction execution unit has:
[0022] Two sets of 256-bit inputs are used to receive operands A and B of the SM3 working variable word iteration update instruction;
[0023] A set of 5-bit inputs is used to receive the immediate operand C of the SM3 working variable word iteration update instruction;
[0024] A set of 256-bit outputs is used to output the execution result of the VSM3R instruction. After four rounds of iterative updates to the SM3 working variable word, the new working variable word {H,G,F,E,D,C,B,A} is obtained.
[0025] The SM3 working variable word iterative update instruction execution unit executes the SM3 iterative compression function in hardware and implements shift operations and iterative constant processing in hardware logic; the SM3 working variable word iterative update instruction execution unit can execute the SM3 working variable word iterative update instruction in a pipelined manner.
[0026] The SM3 message word extension instruction execution unit has an execution latency of 1 clock cycle. The SM3 working variable word iterative update instruction execution unit has 4 iterative execution stages and 1 output stage, for a total execution latency of 5 clock cycles. The instruction set processor supports parallel pipelined execution of the SM3 message word extension instruction and the SM3 working variable word iterative update instruction.
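A dependency-driven schedule of one block gives a rough view of where the 80-cycle figure quoted for this processor comes from. The Python model below is a simplification built on assumptions not stated in the patent. Each instruction starts as soon as its operands are ready, and VSM3MSW number k (k = 0..6) produces W_{16+8k}..W_{23+8k}. VSM3R #c must wait for W_{4c}..W_{4c+7} and for the previous VSM3R result. The function name `single_block_cycles` is hypothetical.

```python
def single_block_cycles(msw_latency: int = 1, r_latency: int = 5) -> int:
    """Cycle in which the 16th VSM3R result of one block becomes available.

    Simplified, hypothetical model: only true data dependences are tracked,
    and an instruction starts as soon as all of its operands are available.
    """
    # VSM3MSW k needs the previous VSM3MSW result as its new Vb.
    msw_done = []
    for k in range(7):
        start = msw_done[k - 1] if k > 0 else 0
        msw_done.append(start + msw_latency)

    r_done = 0  # completion time of the previous VSM3R (V(0) is ready at 0)
    for c in range(16):
        last_word = 4 * c + 7  # VSM3R #c reads W_{4c}..W_{4c+7}
        if last_word < 16:
            words_ready = 0  # W0..W15 come straight from the padded block
        else:
            words_ready = msw_done[(last_word - 16) // 8]
        start = max(r_done, words_ready)
        r_done = start + r_latency
    return r_done


print(single_block_cycles())  # 80 with the latencies given above
```

With the default latencies, the 16 chained VSM3R instructions dominate (16 × 5 cycles). All seven VSM3MSW results are ready by cycle 7, so expansion never stalls compression. The idle slots in the 5-stage VSM3R pipeline are what interleaving independent data sets can fill.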
[0027] Beneficial effects
[0028] By adopting the above-mentioned technical solution, the present invention has the following advantages and positive effects compared with the prior art:
[0029] This invention uses two instructions with complementary roles:
- the SM3 message word extension instruction (VSM3MSW), with its multi-message word parallel extension algorithm, generates multiple message words in parallel;
- the SM3 working variable word iterative update instruction (VSM3R), with its multi-round iterative fusion algorithm, folds the expansion of the message words W'0, W'1, ..., W'63 into the execution of the SM3 iterative compression function, so that multiple rounds of iterative update of the SM3 working variables are executed together.
When the SM3 cryptographic hash algorithm is programmed with the SM3 extended instruction set of this invention, these instructions complete all of the message expansion and iterative compression functions. This greatly simplifies the software, eases algorithm development and reduces the program's storage footprint.
[0030] By employing parallel pipelining and instruction-level parallelism, this invention fully exploits the inherent parallelism of the SM3 cryptographic hash algorithm and significantly improves the performance of SM3 implementations.
[0031] In this invention, the VSM3MSW instruction has an execution latency of 1 clock cycle and the VSM3R instruction has an execution latency of 5 clock cycles. Both support pipelined execution, so the SM3 message expansion and iterative compression of the SM3 cryptographic hash algorithm can be completed in as few as 80 clock cycles (16 dependent VSM3R instructions of 5 cycles each). When five sets of independent data are pipelined through this processor, the five hash values can be generated in as few as 85 clock cycles, which greatly accelerates execution of the SM3 cryptographic hash algorithm.
[0032] This invention fully exploits the parallelism of message expansion and multi-round iteration in the SM3 cryptographic hash algorithm. It also scales well, which makes it easy to integrate into the existing execution units of general-purpose processors. It can be applied to domestically produced general-purpose processors, as well as to dedicated chips in the field of information security, to speed up execution of the SM3 cryptographic hash algorithm.
Description of the Drawings
[0033] Figure 1 is a flowchart of the execution process of the SM3 extended instruction set;
[0034] Figure 2 is a flowchart of the implementation of the method for accelerating the SM3 cryptographic hash algorithm;
[0035] Figure 3 is a flowchart of the SM3 multi-message word parallel extension algorithm;
[0036] Figure 4 is a flowchart of the SM3 multi-round iterative fusion algorithm;
[0037] Figure 5 is a schematic diagram of the register-format simple arithmetic instruction format;
[0038] Figure 6 is a schematic diagram of the immediate-format floating-point compound arithmetic instruction format;
[0039] Figure 7 is a circuit structure diagram of the VSM3MSW instruction;
[0040] Figure 8 is a circuit structure diagram of the VSM3R instruction;
[0041] Figure 9 is a block diagram of the execution pipeline of a processor or processor core according to an embodiment of the present invention.
Detailed Description of Embodiments
[0042] The present invention will be further illustrated below with reference to specific embodiments. It should be understood that these embodiments are for illustrative purposes only and are not intended to limit the scope of the invention. Furthermore, it should be understood that after reading the teachings of this invention, those skilled in the art can make various alterations or modifications to the invention, and these equivalent forms also fall within the scope defined by the appended claims.
[0043] The SM3 cryptographic hash algorithm consists mainly of two parts: padding and iterative compression. The core of the algorithm is the message expansion function and the iterative compression function within the iterative compression part. The basic unit of operation in SM3 is a 32-bit word. The iterative compression function has a 256-bit internal state, denoted V, consisting of eight 32-bit working variable words {H,G,F,E,D,C,B,A}. The internal state after the i-th iteration is denoted V(i), and V(0) is the 256-bit initial value IV defined by the SM3 cryptographic hash algorithm. Iterative compression takes each padded 512-bit message block B(i) as input. The message expansion function expands the block into 132 message words, W0, W1, ..., W67 and W'0, W'1, ..., W'63, which are used by the SM3 compression function. Taking the 256-bit state variable and the 132 expanded message words as input, the SM3 compression function performs 64 rounds of iteration and produces the final 256-bit hash value.
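The padding step mentioned above follows the SM3 standard. A single 1 bit is appended, then zero bits until the length is 448 mod 512, then the message length l as a 64-bit big-endian integer. The minimal Python sketch below assumes byte-aligned messages, although the standard also allows arbitrary bit lengths. `sm3_pad` and `split_blocks` are illustrative names.

```python
def sm3_pad(msg: bytes) -> bytes:
    """SM3 padding for a byte-aligned message of l = 8 * len(msg) bits."""
    bit_len = 8 * len(msg)
    if bit_len >= 1 << 64:
        raise ValueError("SM3 requires l < 2**64 bits")
    # 0x80 supplies the single '1' bit; zero bytes follow so that
    # len(msg) + 1 + zeros == 56 (mod 64), leaving 8 bytes for the length.
    zeros = (55 - len(msg)) % 64
    return msg + b"\x80" + bytes(zeros) + bit_len.to_bytes(8, "big")


def split_blocks(padded: bytes) -> list[bytes]:
    """Split the padded message into 512-bit (64-byte) blocks B(0), B(1), ..."""
    assert len(padded) % 64 == 0
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]


blocks = split_blocks(sm3_pad(b"abc"))  # one block: 'abc', 0x80, zeros, length 24
```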
[0044] The inventors found that, in the SM3 message expansion process, the generation of W0, W1, ..., W67 has potential for parallelism, so multiple message words can be generated simultaneously. The message words W'0, W'1, ..., W'63 are used only during the iterations of the compression function and can be derived from the first 68 message words W0, W1, ..., W67. Their generation can therefore be integrated into the iteration process of the SM3 compression function, which can then be processed in parallel using a multi-round iteration approach. Instructions designed specifically to accelerate the SM3 message expansion and SM3 compression functions can thus fully exploit the parallelism of the algorithm's internal operations.

Furthermore, SM3 message expansion and SM3 iterative compression can themselves execute in parallel. With independent circuits, the SM3 message word expansion instruction (VSM3MSW) and the SM3 working variable word iteration update instruction (VSM3R) can be started simultaneously and executed in parallel. Using parallel pipelines and instruction-level parallelism, multiple data-independent SM3 hash computations can be carried out in parallel, significantly increasing the execution speed of the SM3 cryptographic hash algorithm.
[0045] The embodiments of the present invention relate to a method for accelerating the SM3 cryptographic hash algorithm. The method is based on the SM3 extended instruction set, which employs a RISC architecture. As shown in Figure 1, the instruction set includes an SM3 message word expansion instruction (VSM3MSW) that accelerates the SM3 message expansion function and an SM3 working variable word iteration update instruction (VSM3R) that accelerates the SM3 iterative compression function. All instructions use a fixed-length 32-bit format, and both source operands and results are 256 bits. Parallel pipelining and instruction-level parallelism are used to accelerate execution of the SM3 cryptographic hash algorithm. For different data, the VSM3MSW and VSM3R instructions can be executed in a parallel pipelined manner. That is, multiple data sets with no true data dependences are processed in parallel at different stages of the pipeline, carrying out multiple SM3 hash computations and obtaining their respective hash values.
[0046] SM3 hash values are generated by parallel pipelined execution of the SM3 message word extension instruction (VSM3MSW) and the SM3 working variable word iteration update instruction (VSM3R). As shown in Figure 2, the process includes the following steps:
[0047] 1) Using the 16 message words W0, W1, ..., W15 of the padded message as input, execute the first VSM3MSW instruction to generate the new message words W16, W17, ..., W23. At the same time, using the message words W0, W1, ..., W7 and the 256-bit initial working state variable V(0) as input, execute the first VSM3R instruction;
[0048] 2) Following step 1), execute 6 more VSM3MSW instructions in succession, each using the latest 16 message words as input. The message words output by each execution are denoted W_{8i+8}, W_{8i+9}, ..., W_{8i+15}, and in the end the 68 message words W0, W1, ..., W67 of the SM3 cryptographic hash algorithm are obtained. After executing for 5 cycles, the first VSM3R instruction outputs the working state variable V(3). Then, using the new message words and working state variable as input, 15 further VSM3R instructions are executed in succession in the same manner. The working state variable output by each execution is denoted V(4j-1);
[0049] 3) Output the working state variable V(63) as the final execution result of step 2). The final output is the 256-bit SM3 hash value y = {H,G,F,E,D,C,B,A}.
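Steps 1) to 3) can be tied together in a software model of the whole flow. The Python sketch below makes several assumptions:
- It uses behavioural models `vsm3msw` and `vsm3r`, whose names are invented for illustration.
- It runs expansion and compression one after the other rather than overlapped as in hardware.
- Lane 0 holds the lowest-numbered word.
- The eight-word window W_{4c}..W_{4c+7} is simply sliced for each VSM3R. The patent does not say how Va is assembled when that window straddles two VSM3MSW results.

The sketch also applies the feed-forward V(i+1) = ABCDEFGH XOR V(i) required by the SM3 standard (GB/T 32905-2016), which the steps above do not spell out, so that its output can be compared with the standard's test vectors.

```python
MASK32 = 0xFFFFFFFF
IV = [0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E]  # V(0) as [A..H]


def rotl(x: int, n: int) -> int:
    """32-bit circular left shift (<<<)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def vsm3msw(va: list[int], vb: list[int]) -> list[int]:
    """Hypothetical VSM3MSW model: 16-word window (Va, Vb) -> next 8 words (Vc)."""
    w = va + vb
    for k in range(16, 24):
        t = w[k - 16] ^ w[k - 9] ^ rotl(w[k - 3], 15)
        w.append(t ^ rotl(t, 15) ^ rotl(t, 23) ^ rotl(w[k - 13], 7) ^ w[k - 6])
    return w[16:]


def vsm3r(va: list[int], vb: list[int], imm: int) -> list[int]:
    """Hypothetical VSM3R model: rounds 4*imm..4*imm+3 on vb = [A..H]."""
    a, b, c, d, e, f, g, h = vb
    t = 0x79CC4519 if imm < 4 else 0x7A879D8A
    for i in range(4):
        a12 = rotl(a, 12)
        ss1 = rotl((a12 + e + rotl(t, 4 * imm + i)) & MASK32, 7)
        ss2 = ss1 ^ a12
        if imm < 4:
            ff, gg = a ^ b ^ c, e ^ f ^ g
        else:
            ff, gg = (a & b) | (a & c) | (b & c), (e & f) | (~e & g)
        tt1 = (ff + d + ss2 + (va[i] ^ va[i + 4])) & MASK32
        tt2 = (gg + h + ss1 + va[i]) & MASK32
        h, g, f, e = g, rotl(f, 19), e, tt2 ^ rotl(tt2, 9) ^ rotl(tt2, 17)
        d, c, b, a = c, rotl(b, 9), a, tt1
    return [a, b, c, d, e, f, g, h]


def sm3(msg: bytes) -> str:
    """SM3 digest (hex) of a byte-aligned message using the two models."""
    bit_len = 8 * len(msg)
    data = msg + b"\x80" + bytes((55 - len(msg)) % 64) + bit_len.to_bytes(8, "big")
    v = list(IV)
    for off in range(0, len(data), 64):
        # Step 1): W0..W15 of the padded block.
        w = [int.from_bytes(data[off + 4 * k:off + 4 * k + 4], "big")
             for k in range(16)]
        # Step 2), expansion: 7 x VSM3MSW with Va <- Vb, Vb <- Vc (W16..W71).
        va, vb = w[:8], w[8:]
        for _ in range(7):
            vc = vsm3msw(va, vb)
            w += vc
            va, vb = vb, vc
        # Step 2), compression: 16 x VSM3R with #c = 0..15.
        state = v
        for imm in range(16):
            state = vsm3r(w[4 * imm:4 * imm + 8], state, imm)
        # Feed-forward from the SM3 standard: V(i+1) = ABCDEFGH XOR V(i).
        v = [x ^ y for x, y in zip(state, v)]
    # Step 3): output the 256-bit result, A first.
    return "".join(f"{x:08x}" for x in v)


print(sm3(b"abc"))
# 66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0
```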
[0050] The SM3 message word extension instruction (VSM3MSW) employs a multi-message word parallel extension algorithm and generates 8 message words (32 bits each) in a single execution. The algorithm is shown in Figure 3. Taking the 16 message words W0, W1, ..., W15 of the padded message as the initial input, a single execution generates 8 message words, and executing the instruction 7 times in sequence generates the 68 message words W0, W1, ..., W67 of the SM3 algorithm;
[0051] The SM3 working variable word iterative update instruction (VSM3R) employs a multi-round iterative fusion algorithm, so that a single execution completes four rounds of iterative update of the SM3 working variable words. The algorithm is shown in Figure 4. It integrates the expansion of the message words W'0, W'1, ..., W'63 into the execution of the SM3 iterative compression function. Each execution takes eight of the message words W0, W1, ..., W67, together with the working variable words {H,G,F,E,D,C,B,A} (32 bits each) of the current iteration, as input and completes four rounds of iterative update of the SM3 working variable words. Executing this instruction 16 times in sequence completes rounds 0 to 63 of the iterative update, yielding the final working variable words {H,G,F,E,D,C,B,A} output by the SM3 compression function.
[0052] The SM3 message word extension instruction (VSM3MSW) uses the register-format simple arithmetic instruction format. The instruction is written VSM3MSW Va, Vb, Vc. It operates on the two operands in the 256-bit source registers Va and Vb and stores the result in the 256-bit destination register Vc. As shown in Figure 5, the fields of the 32-bit instruction are:
- bits [31:26]: the 6-bit opcode;
- bits [25:21]: select one of the 32 256-bit registers in the register file as source register Va, which holds a source operand;
- bits [20:16]: select one of the 32 256-bit registers as source register Vb, which holds the other source operand;
- bits [15:13]: always 0;
- bits [12:5]: the 8-bit function code that determines the specific function of the instruction;
- bits [4:0]: select one of the 32 256-bit registers as destination register Vc, which stores the result.
[0053] Based on the current 16 32-bit message words W15~W0, the SM3 message word extension instruction (VSM3MSW) generates the next 8 message words W23~W16 of the SM3 algorithm. W7~W0 are stored in source register Va, W15~W8 in source register Vb, and the generated result W23~W16 in destination register Vc. The operation performed by the SM3 message word extension instruction (VSM3MSW) is as follows:
[0054]
[0055] A single execution of the SM3 message word extension instruction (VSM3MSW) generates 8 message words for the SM3 algorithm. Executing this instruction 7 times in sequence, each time updating Va with Vb and Vb with the generated Vc, generates the 68 message words W0, W1, ..., W67 of the SM3 algorithm.
[0056] The SM3 working variable word iteration update instruction (VSM3R) uses the immediate-format floating-point compound arithmetic instruction format. The instruction is written VSM3R Va, Vb, #c, Vd. It operates on the two operands in the 256-bit source registers Va and Vb together with the 5-bit immediate #c, and stores the result in the 256-bit destination register Vd. As shown in Figure 6, the fields of the 32-bit instruction are:
- bits [31:26]: the 6-bit opcode;
- bits [25:21]: select one of the 32 256-bit registers in the register file as source register Va, which holds a source operand;
- bits [20:16]: select one of the 32 256-bit registers as source register Vb, which holds the other source operand;
- bits [15:10]: the 6-bit function code that determines the specific function of the instruction;
- bits [9:5]: the 5-bit immediate indicating the loop iteration number;
- bits [4:0]: select one of the 32 256-bit registers as destination register Vd, which stores the result.
[0057] VSM3R performs the SM3 iterative updates of the working variable words for rounds 4*#c through 4*#c+3. Its inputs are the current eight 32-bit working variable words {H,G,F,E,D,C,B,A}, eight message words {W7~W0}, and the immediate #c, which determines the round numbers and has a valid range of 0 to 15. Here {W7~W0} denotes the eight message words consumed by the current four rounds, i.e. W_{4·#c+7}~W_{4·#c} in absolute numbering. {W7~W0} are stored in register Va, the working variable words {H,G,F,E,D,C,B,A} are stored in register Vb, and the updated working variable words {H,G,F,E,D,C,B,A} are stored in register Vd. The operation performed by VSM3R is as follows:
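[0058] For each of the four rounds i = 0, 1, 2, 3, with $W_i$ and $W_{i+4}$ taken from the message words {W7~W0} in Va:
$$\mathit{SS1} = \big((A \lll 12) + E + (T \lll (4\#c + i))\big) \lll 7, \qquad \mathit{SS2} = \mathit{SS1} \oplus (A \lll 12)$$
$$\mathit{TT1} = FF(A,B,C) + D + \mathit{SS2} + (W_i \oplus W_{i+4}), \qquad \mathit{TT2} = GG(E,F,G) + H + \mathit{SS1} + W_i$$
$$H \leftarrow G,\; G \leftarrow F \lll 19,\; F \leftarrow E,\; E \leftarrow P_0(\mathit{TT2}),\; D \leftarrow C,\; C \leftarrow B \lll 9,\; B \leftarrow A,\; A \leftarrow \mathit{TT1}$$
All assignments use the values from before the round. Here $P_0(X) = X \oplus (X \lll 9) \oplus (X \lll 17)$, $\oplus$ is bitwise XOR, $\lll$ is a 32-bit circular left shift, and $+$ is addition modulo $2^{32}$.
[0059] When #c < 4 (rounds 0 to 15): $T = \texttt{0x79cc4519}$, $FF(A,B,C) = A \oplus B \oplus C$, and $GG(E,F,G) = E \oplus F \oplus G$. When #c ≥ 4 (rounds 16 to 63): $T = \texttt{0x7a879d8a}$, $FF(A,B,C) = (A \wedge B) \vee (A \wedge C) \vee (B \wedge C)$, and $GG(E,F,G) = (E \wedge F) \vee (\neg E \wedge G)$.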
[0060] A single execution of VSM3R completes four rounds of iterative update of the SM3 working variable words. The instruction is executed 16 times in sequence. For the execution with immediate #c, Va is loaded with the message words W_{4·#c+7}~W_{4·#c}, and the generated Vd replaces Vb. This completes rounds 0 to 63 of the working variable word update and yields the final output working variable words {H,G,F,E,D,C,B,A} of the SM3 compression function.
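The round logic of [0057] to [0060] can be checked end to end with a behavioural model of VSM3R. In the Python sketch below, a state register is a list [A, B, C, D, E, F, G, H]. That lane order is our assumption, based on the {H,...,A} high-to-low notation. The names `vsm3r`, `expand` and `compress` are hypothetical. After the 16 executions, the sketch also applies the standard SM3 feed-forward step V(i+1) = ABCDEFGH XOR V(i), so that its result can be compared with the published SM3 test vector for "abc".

```python
MASK32 = 0xFFFFFFFF
T_LOW, T_HIGH = 0x79CC4519, 0x7A879D8A   # constants for #c < 4 / #c >= 4

# Standard SM3 initial value V(0), stored as [A, B, C, D, E, F, G, H].
IV = [0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E]


def rotl(x: int, n: int) -> int:
    """32-bit circular left shift; n is taken modulo 32."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p0(x: int) -> int:
    return x ^ rotl(x, 9) ^ rotl(x, 17)


def vsm3r(va: list[int], vb: list[int], c: int) -> list[int]:
    """Model of VSM3R Va, Vb, #c, Vd: rounds 4c..4c+3.

    va = [W_{4c}, ..., W_{4c+7}], vb = [A, B, C, D, E, F, G, H].
    """
    a, b, cc, d, e, f, g, h = vb
    t = T_LOW if c < 4 else T_HIGH
    for i in range(4):
        a12 = rotl(a, 12)
        ss1 = rotl((a12 + e + rotl(t, 4 * c + i)) & MASK32, 7)
        ss2 = ss1 ^ a12
        if c < 4:
            ff, gg = a ^ b ^ cc, e ^ f ^ g
        else:
            ff = (a & b) | (a & cc) | (b & cc)
            gg = (e & f) | (~e & g)          # masking with g keeps 32 bits
        tt1 = (ff + d + ss2 + (va[i] ^ va[i + 4])) & MASK32
        tt2 = (gg + h + ss1 + va[i]) & MASK32
        d, cc, b, a = cc, rotl(b, 9), a, tt1     # right-hand sides use old values
        h, g, f, e = g, rotl(f, 19), e, p0(tt2)
    return [a, b, cc, d, e, f, g, h]


def expand(block: list[int]) -> list[int]:
    """Scalar SM3 message expansion W0..W67, so this sketch runs on its own."""
    w = list(block)
    for j in range(16, 68):
        x = w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15)
        w.append(x ^ rotl(x, 15) ^ rotl(x, 23) ^ rotl(w[j - 13], 7) ^ w[j - 6])
    return w


def compress(v: list[int], block: list[int]) -> list[int]:
    w = expand(block)
    state = list(v)
    for c in range(16):                      # 16 dependent VSM3R executions
        state = vsm3r(w[4 * c: 4 * c + 8], state, c)
    return [s ^ x for s, x in zip(state, v)]  # SM3 feed-forward with V(i)


abc = [0x61626380] + [0] * 14 + [0x00000018]
print(" ".join(f"{x:08x}" for x in compress(IV, abc)))
```

The window `w[4 * c: 4 * c + 8]` overlaps the previous one by four words. That overlap is why each VSM3R needs eight message words while consuming only four rounds' worth of W_j.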
[0061] Embodiments of the present invention also relate to an instruction set processor. As shown in Figure 9, it includes a register file, an SM3 message word extension instruction execution unit, and an SM3 working variable word iteration update instruction execution unit. The register file supplies source operands to both execution units and stores their execution results. The two execution units are placed on different execution pipelines, and each uses its own read and write ports of the register file.
[0062] As shown in Figure 9, the SM3 message word extension instruction execution unit is implemented by the VSM3MSW instruction circuit, and the SM3 working variable word iteration update instruction execution unit is implemented by the VSM3R instruction circuit. The two circuits are placed on different execution pipelines, each with its own register file read and write ports. They could also share one execution pipeline and its register file ports, but then VSM3MSW and VSM3R instructions could not start or complete in the same cycle. The VSM3MSW instruction circuit has an execution latency of 1 clock cycle, and the VSM3R instruction circuit has an execution latency of 5 clock cycles. Because the two execution units run VSM3MSW and VSM3R instructions in parallel pipelines, both instructions can start in the same cycle, which gives higher computational speed. In practice, the execution latencies can be adjusted to the target operating frequency, and the pipeline stages can be redesigned to reach higher or lower operating frequencies.
[0063] As shown in Figure 7, the VSM3MSW instruction circuit has two 256-bit inputs that receive operands A and B of the VSM3MSW instruction from the register file. It has one 256-bit output that writes the execution result, 8 message words of the SM3 algorithm, back to the register file. The circuit completes one VSM3MSW instruction per execution with a latency of 1 clock cycle. It implements the shift operations directly in hardware logic and processes the 8 message words in parallel to speed up execution.
[0064] As shown in Figure 8, the VSM3R instruction circuit has two 256-bit inputs that receive operands A and B of the VSM3R instruction from the register file, and one 5-bit input that receives the immediate operand C. It has one 256-bit output that writes the execution result back to the register file. That result is the new working variable words {H,G,F,E,D,C,B,A}, obtained after four rounds of iterative update of the SM3 working variable words. The circuit executes the compression function in hardware, implements the shift operations directly in hardware logic, and handles the iteration constants in hardware to speed up execution. It contains four iterative execution stations and one output station, for a total execution latency of 5 clock cycles, and supports pipelined execution of instructions.
[0065] The present invention will be further illustrated below through a specific embodiment: an acceleration method for the SM3 cryptographic hash algorithm in a general-purpose processor.
[0066] Before the first SM3 instruction executes, the processor's registers are loaded with a padded 512-bit message block and the 256-bit initial value IV of the SM3 working state variable. The 512-bit message block is used as the 16 32-bit message words W0 to W15. The 256-bit IV is used as the initial value V(0) of the compression function iteration variable, represented by the eight 32-bit state words {H,G,F,E,D,C,B,A}. The SM3 cryptographic hash algorithm is then accelerated with the SM3 extended instruction set processor and method as follows:
[0067] (1) In the 1st clock cycle, the processor simultaneously starts the 1st VSM3MSW instruction and the 1st VSM3R instruction. The input to the VSM3MSW instruction circuit is the padded message words W15~W0, with {W7~W0} as operand A and {W15~W8} as operand B. The inputs to the VSM3R instruction circuit are the padded message words W7~W0, the initial iteration variable V(0), and the 5-bit immediate 5'b00000. W7~W0 serve as operand A, the initial iteration variables {H,G,F,E,D,C,B,A} serve as operand B, and 5'b00000 serves as operand C. Before the end of this cycle, the VSM3MSW instruction circuit generates the next 8 message words W23~W16 of the SM3 algorithm. In the same cycle, the 1st execution station of the VSM3R instruction circuit completes its part of the 1st VSM3R instruction, i.e. the 1st round defined by the SM3 compression function.
[0068] (2) In the 2nd clock cycle, the processor starts the 2nd VSM3MSW instruction and continues the 1st VSM3R instruction in the 2nd execution station. The input to the VSM3MSW instruction circuit is message words W23~W8, with {W15~W8} as operand A and {W23~W16} as operand B. While the 2nd execution station continues the 1st VSM3R instruction, the 1st execution station can begin the iterative calculation of another SM3 compression function that has no true data dependence on it. Before the end of this cycle, the VSM3MSW instruction circuit generates the next 8 message words W31~W24. The 2nd execution station completes its part of the 1st VSM3R instruction, i.e. the 2nd round defined by the SM3 compression function.
[0069] (3) In the 3rd clock cycle, the processor starts the 3rd VSM3MSW instruction and continues the 1st VSM3R instruction in the 3rd execution station. The input to the VSM3MSW instruction circuit is message words W31~W16, with {W23~W16} as operand A and {W31~W24} as operand B. While the 3rd execution station continues the 1st VSM3R instruction, the 1st and 2nd execution stations can perform the iterative calculations of two other SM3 compression functions that have no true data dependence on it. Before the end of this cycle, the VSM3MSW instruction circuit generates the next 8 message words W39~W32. The 3rd execution station completes its part of the 1st VSM3R instruction, i.e. the 3rd round defined by the SM3 compression function.
[0070] (4) In the 4th clock cycle, the processor starts the 4th VSM3MSW instruction and continues the 1st VSM3R instruction in the 4th execution station. The input to the VSM3MSW instruction circuit is message words W39~W24, with {W31~W24} as operand A and {W39~W32} as operand B. While the 4th execution station continues the 1st VSM3R instruction, the 1st, 2nd, and 3rd execution stations can perform the iterative calculations of three other SM3 compression functions that have no true data dependence on it. Before the end of this cycle, the VSM3MSW instruction circuit generates the next 8 message words W47~W40. The 4th execution station completes its part of the 1st VSM3R instruction, i.e. the 4th round defined by the SM3 compression function.
[0071] (5) In the 5th clock cycle, the processor starts the 5th VSM3MSW instruction and continues the 1st VSM3R instruction in the 5th station, which is the output station. The input to the VSM3MSW instruction circuit is message words W47~W32, with {W39~W32} as operand A and {W47~W40} as operand B. While the 5th station continues the 1st VSM3R instruction, the 1st, 2nd, 3rd, and 4th execution stations can perform the iterative calculations of four other SM3 compression functions that have no true data dependence on it. Before the end of this cycle, the VSM3MSW instruction circuit generates the next 8 message words W55~W48. The 5th station completes the 1st VSM3R instruction and outputs the working variable words obtained after the 4th round defined by the SM3 compression function.
[0072] (6) In the 6th clock cycle, the processor starts the 6th VSM3MSW instruction and starts the 2nd VSM3R instruction in the 1st execution station. The input to the VSM3MSW instruction circuit is message words W55~W40, with {W47~W40} as operand A and {W55~W48} as operand B. The inputs to the 1st execution station are message words W11~W4, the iteration variable V(3) (the execution result of the 1st VSM3R instruction), and the 5-bit immediate 5'b00001. W11~W4 serve as operand A, V(3) {H,G,F,E,D,C,B,A} serves as operand B, and 5'b00001 serves as operand C. The 2nd, 3rd, 4th, and 5th stations can perform the iterative calculations of four other SM3 compression functions that have no true data dependence on it. Before the end of this cycle, the VSM3MSW instruction circuit generates the next 8 message words W63~W56. The 1st execution station completes its part of the 2nd VSM3R instruction, i.e. the 5th round defined by the SM3 compression function.
[0073] (7) In the 7th clock cycle, the processor starts the 7th VSM3MSW instruction and continues the 2nd VSM3R instruction in the 2nd execution station. The input to the VSM3MSW instruction circuit is message words W63~W48, with {W55~W48} as operand A and {W63~W56} as operand B. While the 2nd execution station continues the 2nd VSM3R instruction, the 1st, 3rd, 4th, and 5th stations can perform the iterative calculations of four other SM3 compression functions that have no true data dependence on it. Before the end of this cycle, the VSM3MSW instruction circuit generates W71~W64. Of these, W67~W64 are the last four message words required by the SM3 algorithm, and W71~W68 are not used. The 2nd execution station completes its part of the 2nd VSM3R instruction, i.e. the 6th round defined by the SM3 compression function.
[0074] (8) In the 8th clock cycle, the processor starts the 1st VSM3MSW instruction of another SM3 hash computation that has no true data dependence on the current one. It also continues the 2nd VSM3R instruction of the previous hash computation in the 3rd execution station. The input to the VSM3MSW instruction circuit is a new set of padded message words W15~W0, with {W7~W0} as operand A and {W15~W8} as operand B. While the 3rd execution station continues the 2nd VSM3R instruction, the 1st, 2nd, 4th, and 5th stations can perform the iterative calculations of four other SM3 compression functions that have no true data dependence on it. Before the end of this cycle, the VSM3MSW instruction circuit generates the next 8 message words W23~W16 of the new message. The 3rd execution station completes its part of the 2nd VSM3R instruction, i.e. the 7th round defined by the SM3 compression function.
[0075] (9) Execution continues for another 72 cycles following steps (1) to (8) and ends in the 80th clock cycle. At that point, message expansion and all 64 rounds of compression iteration are complete. The eight 32-bit iterative state words output by the VSM3R instruction circuit are the hash values output by the iterative compression function of the SM3 cryptographic hash algorithm.
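The cycle numbers in steps (1) to (9) can be reproduced with a simple dependency-driven schedule model. This is a hedged simplification under our own assumptions. A result is usable in the cycle after the one in which it completes. Only one message block is tracked. No issue limits exist beyond the two separate pipelines. The function name `schedule_one_block` is hypothetical.

```python
MSW_LATENCY = 1   # VSM3MSW circuit latency in cycles ([0062], [0063])
R_LATENCY = 5     # VSM3R: 4 execution stations + 1 output station ([0064])


def schedule_one_block() -> tuple[list[int], int]:
    """Return the issue cycle of each of the 16 VSM3R instructions and the
    cycle in which the last one completes, for one 512-bit message block."""
    # ready[k] = cycle at whose end message word W_k is available.
    ready = {k: 0 for k in range(16)}        # W0..W15 are preloaded
    for n in range(7):
        # With a 1-cycle latency, each VSM3MSW's inputs are ready by the
        # end of the previous cycle, so the n-th one issues in cycle n + 1.
        done = (n + 1) + MSW_LATENCY - 1
        for k in range(16 + 8 * n, 24 + 8 * n):
            ready[k] = done
    issue_cycles = []
    last_done = 0                            # completion cycle of the previous VSM3R
    for c in range(16):
        operands = max(ready[k] for k in range(4 * c, 4 * c + 8))
        # Wait for both the previous state (true dependence) and W_{4c}..W_{4c+7}.
        issue = max(last_done, operands) + 1
        issue_cycles.append(issue)
        last_done = issue + R_LATENCY - 1
    return issue_cycles, last_done


issues, finish = schedule_one_block()
print(issues[:3], finish)   # [1, 6, 11] 80
```

In this model, the message words are always ready well before they are needed. Each VSM3R is therefore limited only by its dependence on the previous VSM3R result, which is why one block takes 16 × 5 = 80 cycles.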
[0076] Because the processor supports two parallel instruction pipelines, under continuous pipelined execution the VSM3MSW instruction circuit can complete one full SM3 message expansion, producing 68 message words, every 7 clock cycles. The VSM3R instruction circuit can interleave the iterative calculations of 5 SM3 compression functions that have no true data dependence on one another. That is, every 85 clock cycles it can complete 5 such compression calculations, producing 5 independent SM3 hash values. For workloads with many pipelined SM3 iterative compression calculations, the processor can provide multiple instruction set execution units (ISUs) for accelerating SM3, which further accelerates execution of the SM3 cryptographic hash algorithm.
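The interleaving of five independent compressions on the 5-station VSM3R circuit can be illustrated with a round-robin issue model. The assumptions here are ours, not the document's. One VSM3R may enter the first execution station per cycle. Stream k issues its n-th VSM3R in cycle 5n + k + 1. A dependent instruction may issue in the cycle after its predecessor leaves the output station. The function name is hypothetical.

```python
R_LATENCY = 5    # 4 iterative execution stations + 1 output station
STREAMS = 5      # independent messages with no true data dependence
INSTRS = 16      # VSM3R instructions per SM3 compression


def round_robin_issue() -> dict[tuple[int, int], int]:
    """Map (stream k, instruction n) -> issue cycle under round-robin issue."""
    return {(k, n): R_LATENCY * n + k + 1
            for k in range(STREAMS) for n in range(INSTRS)}


issue = round_robin_issue()
cycles = sorted(issue.values())
# Exactly one instruction enters the first station in every issue cycle.
print(cycles[0], cycles[-1], len(set(cycles)) == len(cycles))  # 1 80 True
```

With five streams, every issue slot from cycle 1 to cycle 80 is filled. Within each stream, consecutive instructions are still exactly one VSM3R latency apart, so no true dependence is violated.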
Claims
1. A method for accelerating the SM3 cryptographic hash algorithm, characterized in that: based on the SM3 extended instruction set, parallel pipelines and instruction-level parallelism are employed to accelerate execution of the SM3 cryptographic hash algorithm. The SM3 extended instruction set adopts a RISC architecture with fixed-length 32-bit instructions, and both source and destination operands are 256 bits. The SM3 extended instruction set includes an SM3 message word extension instruction and an SM3 working variable word iteration update instruction. The SM3 message word extension instruction uses a multi-message-word parallel extension algorithm to accelerate the SM3 message extension function. Taking 16 message words from the padded message as initial input, it generates 8 message words per execution, and executing it 7 times in sequence generates 68 message words. The SM3 working variable word iteration update instruction uses a multi-round iterative fusion algorithm to accelerate the SM3 iterative compression function. This algorithm integrates the expanded message words produced by the SM3 message word extension instruction into the execution of the SM3 iterative compression function. Each execution takes 8 message words and the current working variable words as input and completes 4 rounds of iterative update of the SM3 working variable words. Executing the instruction 16 times in sequence completes rounds 0 to 63 of the iterative update of the SM3 algorithm working variable words, yielding the final output working variable words of the SM3 iterative compression function.
2. The method for accelerating the SM3 cryptographic hash algorithm according to claim 1, characterized in that accelerating the execution of the SM3 cryptographic hash algorithm using parallel pipelines and instruction-level parallelism specifically includes the following steps:
(1) Using the 16 message words W0, W1, ..., W15 of the padded message as input, execute the SM3 message word extension instruction to generate new message words W16, W17, ..., W23. Simultaneously, using message words W0, W1, ..., W7 and the initial value V(0) of the 256-bit working state variable as input, execute the first SM3 working variable word iterative update instruction.
(2) Using the latest 16 message words as input, execute the SM3 message word extension instruction 6 more times in succession. The message words output by the i-th execution (i = 2, ..., 7) are denoted W_{8i+8}, W_{8i+9}, ..., W_{8i+15}, finally yielding the 68 message words W0, W1, ..., W67 of the SM3 cryptographic hash algorithm. After the first SM3 working variable word iterative update instruction has executed for 5 cycles, it outputs the working state variable V(3).
(3) Using the new message words and working state variable as input, execute 15 more SM3 working variable word iterative update instructions in the same way. The working state variable output by the j-th execution (j = 2, ..., 16) is denoted V(4j-1).
(4) Output the final execution result, the working state variable V(63), as the final 256-bit SM3 hash value y={H,G,F,E,D,C,B,A}.
3. The method for accelerating the SM3 cryptographic hash algorithm according to claim 1, characterized in that: the SM3 message word extension instruction adopts a register-format simple arithmetic instruction format, specifically VSM3MSW Va, Vb, Vc. It operates on the two operands in the 256-bit source registers Va and Vb and stores the result in the 256-bit destination register Vc. In the 32-bit instruction, bits [31:26] represent the 6-bit opcode. Bits [25:21] select one register of the register file of 32 256-bit registers as source register Va, which stores a source operand of the instruction. Bits [20:16] select one register of that register file as source register Vb, which stores a source operand of the instruction. Bits [15:13] are always all "0". Bits [12:5] represent the 8-bit function code that determines the specific function of the instruction. Bits [4:0] select one register of that register file as destination register Vc, which stores the result of the instruction.
4. The method for accelerating the SM3 cryptographic hash algorithm according to claim 1, characterized in that: the SM3 message word extension instruction takes the current 16 32-bit message words W15~W0 and generates the next 8 message words W23~W16 of the SM3 cryptographic hash algorithm in parallel. W7~W0 are stored in source register Va, W15~W8 are stored in source register Vb, and the generated result W23~W16 is stored in destination register Vc. Each result word W_i of W23~W16 is generated as (Temp XOR (Temp<<<15) XOR (Temp<<<23)) XOR (W_{i-13}<<<7) XOR W_{i-6}. Here XOR denotes bitwise XOR, <<< denotes circular left shift, and Temp is a 32-bit intermediate variable word generated as Temp = W_{i-16} XOR W_{i-9} XOR (W_{i-3}<<<15). A single execution of the SM3 message word extension instruction generates 8 message words of the SM3 cryptographic hash algorithm. The instruction is executed 7 times in sequence. Each time, source register Va is updated with the data in source register Vb, and source register Vb is updated with the data in the generated destination register Vc, generating the 68 message words W0, W1, ..., W67 of the SM3 cryptographic hash algorithm.
5. The method for accelerating the SM3 cryptographic hash algorithm according to claim 1, characterized in that: the SM3 working variable word iteration update instruction adopts an immediate-format floating-point compound arithmetic instruction format, specifically VSM3R Va, Vb, #c, Vd. It operates on the two operands in the 256-bit source registers Va and Vb together with a 5-bit immediate #c and stores the result in the 256-bit destination register Vd. Bits [31:26] of the 32-bit instruction represent the 6-bit opcode. Bits [25:21] select one register of the register file of 32 256-bit registers as source register Va, which stores a source operand of the instruction. Bits [20:16] select one register of that register file as source register Vb, which stores a source operand of the instruction. Bits [15:10] represent the 6-bit function code that determines the specific function of the instruction. Bits [9:5] represent the 5-bit immediate, which indicates the loop iteration. Bits [4:0] select one register of that register file as destination register Vd, which stores the result of the instruction.
6. The method for accelerating the SM3 cryptographic hash algorithm according to claim 1, characterized in that: the SM3 working variable word iterative update instruction iteratively updates the working variable words for rounds 4*#c to 4*#c+3 according to the SM3 algorithm. Its inputs are the current eight 32-bit working variable words {H,G,F,E,D,C,B,A}, eight message words {W7~W0}, and the immediate #c. {W7~W0} are stored in source register Va, the working variable words {H,G,F,E,D,C,B,A} are stored in source register Vb, and the updated working variable words {H,G,F,E,D,C,B,A} are stored in destination register Vd. Each round selects the iteration constant T and the iteration logic according to #c. The iteration uses the intermediate variable words SS1, SS2, TT1, TT2, and P0, each 32 bits. For the four rounds (i = 0~3) from 4*#c to 4*#c+3, when #c is less than 4: the iteration constant T = 0x79cc4519; SS1 = ((A<<<12) + E + (T<<<(4*#c + i)))<<<7; SS2 = SS1 XOR (A<<<12); TT1 = (A XOR B XOR C) + D + SS2 + (W_i XOR W_{i+4}); TT2 = (E XOR F XOR G) + H + SS1 + W_i; and P0 = TT2 XOR (TT2<<<9) XOR (TT2<<<17). Here XOR denotes bitwise XOR and <<< denotes circular left shift. The update is: H is updated to G, G to F<<<19, F to E, E to P0, D to C, C to B<<<9, B to A, and A to TT1. When #c is greater than or equal to 4: the iteration constant T = 0x7a879d8a; SS1 = ((A<<<12) + E + (T<<<(4*#c + i)))<<<7; SS2 = SS1 XOR (A<<<12);
TT1 = ((A AND B) OR (A AND C) OR (B AND C)) + D + SS2 + (W_i XOR W_{i+4}); TT2 = ((E AND F) OR (NOT(E) AND G)) + H + SS1 + W_i; and P0 = TT2 XOR (TT2<<<9) XOR (TT2<<<17). Here XOR denotes bitwise XOR, <<< denotes left circular shift, AND denotes bitwise AND, OR denotes bitwise OR, and NOT denotes bitwise NOT. The update is: H is updated to G, G to F<<<19, F to E, E to P0, D to C, C to B<<<9, B to A, and A to TT1. A single execution of the SM3 working variable word iterative update instruction completes 4 rounds of iterative update of the SM3 working variable words. The instruction is executed 16 times in sequence. Each time, source register Va is updated with 8 message words, and source register Vb is updated with the data in the generated destination register Vd. This completes rounds 0 to 63 of the iterative update of the SM3 algorithm working variable words and yields the final output working variable words {H,G,F,E,D,C,B,A} of the SM3 compression function.
7. An instruction set processor, characterized in that it includes a register file, an SM3 message word extension instruction execution unit, and an SM3 working variable word iterative update instruction execution unit. The register file provides source operands to both execution units and stores their execution results. The two execution units are placed on different execution pipelines, and each occupies its own read and write ports of the register file. The SM3 message word extension instruction execution unit has two 256-bit inputs for receiving operands A and B of the SM3 message word extension instruction, and one 256-bit output for outputting the execution result of that instruction. This unit uses hardware logic to implement the shift operations, processes 8 message words in parallel, and can execute SM3 message word extension instructions in a pipelined manner. The SM3 working variable word iteration update instruction execution unit has two 256-bit inputs for receiving operands A and B of the SM3 working variable word iteration update instruction, and one 5-bit input for receiving the immediate operand C of that instruction. It also has one 256-bit output for outputting the execution result of the VSM3R instruction, namely the new working variable words {H,G,F,E,D,C,B,A} obtained after four rounds of iterative update of the SM3 working variable words.
The SM3 working variable word iterative update instruction execution unit uses hardware to execute the SM3 iterative compression function and uses hardware logic to implement shift operations and iterative constant processing; the SM3 working variable word iterative update instruction execution unit can execute the SM3 working variable word iterative update instruction in a pipelined manner.
8. The instruction set processor according to claim 7, characterized in that: the execution latency of the SM3 message word extension instruction execution unit is 1 clock cycle. The SM3 working variable word iterative update instruction execution unit is equipped with four iterative execution stations and one output station, with a total execution latency of 5 clock cycles. The instruction set processor supports parallel pipelined execution of the SM3 message word extension instruction and the SM3 working variable word iterative update instruction.