A method and apparatus for implementing Montgomery multiplication, and a reduction method
By optimizing the calculation process of Montgomery multiplication, obtaining the carry value by accumulating an all-1 value and obtaining the high-order accumulated value through two consecutive multiplications, the long calculation time and wasted resources of Montgomery multiplication are reduced, and more efficient calculation is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SUNMMIO SCIENCE & TECHNOLOGY (BEIJING) CO LTD
- Filing Date
- 2025-11-18
- Publication Date
- 2026-06-26
AI Technical Summary
Montgomery multiplication is time-consuming and wastes considerable computing resources in Pasta curve calculations. In addition, strong dependencies between instructions lead to low computational efficiency.
A 2M-bit product value is obtained by arithmetic multiplication of the first and second multipliers. The M-bit all-1 value is used as the accumulated value for Montgomery reduction, and the carry value is obtained by accumulating it into the first register. The high-order accumulated value is obtained by two consecutive arithmetic multiplications. Finally, the carry value and the high-order accumulated value are added to the second register to obtain the reduced result.
It shortens computation time, improves computation efficiency by about 10%, reduces waste of computing resources, and weakens dependencies between instructions.
Figure CN121523645B_ABST
Abstract
Description
Technical Field
[0001] This application relates to, but is not limited to, large-number arithmetic techniques, and particularly to a method and apparatus for implementing Montgomery multiplication and a reduction method.
Background Technology
[0002] Zero-knowledge proofs were proposed by S. Goldwasser, S. Micali, and C. Rackoff in the early 1980s. As a highly secure cryptographic technique, zero-knowledge proofs have broad application prospects in information transmission. Zero-knowledge proofs involve a large number of finite field calculations, namely addition, subtraction, and multiplication operations modulo a very large prime number. To simplify the calculation process and eliminate the most expensive division and modulo operations, related techniques typically transform values from the finite field into the Montgomery domain, perform the corresponding addition, subtraction, multiplication, and squaring calculations within the Montgomery domain, and finally transform the result back from the Montgomery domain to the corresponding finite field.
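As a non-limiting illustration of the transformation into and out of the Montgomery domain, the following Python sketch uses a deliberately small prime and radix. The names P, R, to_mont, from_mont, and mont_mul are hypothetical and do not appear in this application. The sketch only states what each operation computes; it uses ordinary modulo operations as a reference model, whereas a real implementation would replace them with a Montgomery reduction.

```python
# Illustrative only: a small prime and radix stand in for the 256-bit case.
P = 101        # hypothetical prime modulus of the finite field
R = 1 << 8     # hypothetical Montgomery radix: a power of two greater than P


def to_mont(x: int) -> int:
    """Map a field element x to its Montgomery representation x * R mod P."""
    return (x * R) % P


def from_mont(x_mont: int) -> int:
    """Map a Montgomery representation back to the ordinary field element."""
    return (x_mont * pow(R, -1, P)) % P


def mont_mul(a_mont: int, b_mont: int) -> int:
    """Reference Montgomery product a_mont * b_mont * R^-1 mod P."""
    return (a_mont * b_mont * pow(R, -1, P)) % P


# Multiply 7 by 9 entirely inside the Montgomery domain, then convert back.
product = from_mont(mont_mul(to_mont(7), to_mont(9)))
```

Addition and subtraction need no special treatment in this representation, since (x * R) + (y * R) is already the representation of x + y.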
[0003] The calculation of Pasta curves involves 256-bit Montgomery multiplication. However, the strong dependencies between instructions in the implementation of Montgomery multiplication lead to long computation times, and computing resources are wasted during the extended wait for results. The Pasta curves are the pair of elliptic curves Pallas and Vesta, which are used in elliptic curve cryptography.
Summary of the Invention
[0004] This application provides a method and apparatus for implementing Montgomery multiplication and a reduction method, which can solve any of the above-mentioned technical problems.
[0005] This application provides a method for implementing Montgomery multiplication, including:
[0006] A 2M-bit product is obtained through arithmetic multiplication of the first and second multipliers; where the first and second multipliers are the two M-bit multipliers of the Montgomery multiplication; the product is stored in adjacent second and first registers, with the second register storing the high M bits of the product and the first register storing the low M bits of the product.
[0007] The M-bit all-1 value is used as the M-bit accumulated value for Montgomery reduction, and the carry value is obtained by accumulating the first register.
[0008] The high-order accumulated value required for Montgomery reduction is obtained by performing two consecutive arithmetic multiplications on the first register.
[0009] The carry value and the high-order accumulated value are added to the second register to obtain the reduced result, and the Montgomery multiplication result of the first multiplier and the second multiplier is determined based on the reduced result.
[0010] In one exemplary instance, the accumulation of the first register to obtain the carry value includes:
[0011] The carry value is obtained by accumulating into the first register using a first addition operation; wherein the first addition operation is an addition without carry-in that updates the carry flag after the addition is completed.
[0012] In one exemplary instance, the step of accumulating the obtained carry value and the high-order accumulated value into the second register to obtain the reduced result includes:
[0013] The high-order accumulated value and the value in the second register are added, together with the carry value, through a second addition operation; wherein the second addition operation is an addition with the carry value as carry-in, and the carry flag is not updated after the addition is completed.
[0014] In one exemplary instance, the first multiplier and the second multiplier range from 0 to […], where […] is a finite field; determining the Montgomery multiplication result of the first multiplier and the second multiplier based on the reduced result includes:
[0015] Determining whether the reduced result lies within the range from 0 to […]. If the reduced result is greater than […], the Montgomery multiplication result of the first multiplier and the second multiplier is the reduced result minus […]; if the reduced result is not greater than […], the Montgomery multiplication result of the first multiplier and the second multiplier is the reduced result.
[0016] In one exemplary instance, the M bits are 256 bits.
[0017] This application also provides a reduction method, including:
[0018] The M-bit all-1 value is used as the M-bit accumulated value for Montgomery reduction, and the carry value is obtained by accumulating the first register.
[0019] The high-order accumulated value required for Montgomery reduction is obtained by performing two consecutive arithmetic multiplications on the first register.
[0020] The carry value and the high-order accumulated value are added to the second register to obtain the reduced result;
[0021] The second register and the first register are adjacent registers; the second register is used to store the high M bits of the product value, and the first register is used to store the low M bits of the product value; the product value is the 2M-bit result of the arithmetic multiplication of the first multiplier and the second multiplier; the first multiplier and the second multiplier are the two M-bit multipliers of the Montgomery multiplication.
[0022] An embodiment of this application further provides a computer-readable storage medium storing computer-executable instructions for executing the method for implementing Montgomery multiplication described above, or the reduction method described above.
[0023] An embodiment of this application also provides a computer device, including a memory and a processor, wherein the memory stores instructions executable by the processor for performing the steps of the method for implementing Montgomery multiplication described above, or the steps of the reduction method described above.
[0024] This application also provides an apparatus for implementing Montgomery multiplication, comprising: a first multiplier, a second multiplier, a third multiplier, a first adder, a second adder, and a processing unit; wherein,
[0025] The first multiplier is used to calculate the arithmetic multiplication of the first multiplier and the second multiplier to obtain a 2M-bit product; wherein the first multiplier and the second multiplier are two M-bit multipliers of the Montgomery multiplication; the product is stored in the adjacent second register and the first register, the second register is used to store the high M bits of the product, and the first register is used to store the low M bits of the product;
[0026] The first adder is used to accumulate the first register by taking the M-bit all-1 value as the M-bit accumulated value of the Montgomery reduction to obtain the carry value.
[0027] The second and third multipliers are used to perform two consecutive arithmetic multiplications on the first register in sequence to obtain the high-order accumulated value required for Montgomery reduction;
[0028] The second adder is used to add the obtained carry value and the high-order accumulated value to the second register to obtain the reduced result;
[0029] The processing unit is used to determine the Montgomery multiplication result of the first and second multipliers based on the reduced result.
[0030] Other features and advantages of the invention will be set forth in the following description, and will be apparent in part from the description, or may be learned by practicing the invention. The objects and other advantages of the invention may be realized and obtained by means of the structures particularly pointed out in the description and the drawings.
Attached Figure Description
[0031] The accompanying drawings are used to provide a further understanding of the technical solutions of this application and constitute a part of the specification. They are used together with the embodiments of this application to explain the technical solutions of this application and do not constitute a limitation on the technical solutions of this application.
[0032] Figure 1 is a flowchart illustrating the method for implementing Montgomery multiplication in an embodiment of this application;
[0033] Figure 2 is a schematic diagram illustrating the principle of the first multiplication in the method for implementing Montgomery multiplication in the embodiments of this application;
[0034] Figure 3 is a schematic diagram illustrating the principle of implementing Montgomery multiplication in the embodiments of this application;
[0035] Figure 4 is a schematic diagram illustrating the principle of the two additions in implementing Montgomery multiplication in the embodiments of this application;
[0036] Figure 5 is a schematic diagram of the structure of the apparatus for implementing Montgomery multiplication in the embodiments of this application.
Detailed Implementation
[0037] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in detail below with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the embodiments and features described in these embodiments can be arbitrarily combined with each other.
[0038] In a typical configuration of this application, the computing device includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.
[0039] Memory may include non-persistent memory in computer-readable media, random access memory (RAM), and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.
[0040] Computer-readable media include both permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
[0041] The steps illustrated in the flowchart in the accompanying drawings can be executed in a computer system as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order than that presented here.
[0042] Consider a computing system employing a 256-bit architecture, with a series of 256-bit registers and arithmetic units. The first multiplier […] and the second multiplier […] of the Montgomery multiplication range from 0 to […], where […] is a finite field. Within a finite field, operations are modular operations. Therefore, after an addition, subtraction, or multiplication, a modular reduction is usually required to ensure that the result remains within the finite field. For Montgomery multiplication, the final result after reduction, […], lies between […]. If the result […] is greater than […], the final result is the result […] minus […]; otherwise, the final result is the result […] unchanged.
[0043] For the 256-bit first multiplier […] and second multiplier […] in Montgomery multiplication, the arithmetic multiplication of the 256-bit multiplier […] and multiplier […] yields a 512-bit product R1-R0, where registers R1 and R0 are two adjacent 256-bit registers. Register R1 stores the high 256 bits of the product R1-R0, and register R0 stores the low 256 bits. Then, two consecutive arithmetic multiplications are used to obtain the accumulated value needed for Montgomery reduction. This accumulated value is added to register R0 so that R0 becomes 0, and the resulting carry is added to register R1, thus obtaining the reduced result. However, such a reduction process must be executed step by step in sequence. The dependencies between instructions are strong, the calculation time is long, the calculation efficiency is low, and the time spent waiting for the result also wastes computing resources.
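The sequential reduction described above can be sketched in Python as follows. This is a minimal model under stated assumptions, not the instruction sequence of any particular processor: the modulus is called p, the constant used in the first of the two multiplications is taken to be inv = -p^-1 mod 2^M, and the second multiplication is taken to be by p. The names redc_sequential and neg_inverse are hypothetical.

```python
M = 256                  # register width in bits
MASK = (1 << M) - 1      # M-bit all-1 value


def neg_inverse(p: int) -> int:
    """Assumed constant: -p^-1 mod 2^M (p must be odd)."""
    return (-pow(p, -1, 1 << M)) & MASK


def redc_sequential(r1: int, r0: int, p: int, inv: int) -> int:
    """Reduce the 2M-bit value R1-R0 step by step, each step waiting on the last."""
    m = (r0 * inv) & MASK        # first multiplication, low M bits kept
    acc = m * p                  # second multiplication, 2M-bit accumulated value
    low = r0 + (acc & MASK)      # add the low half to R0 ...
    assert low & MASK == 0       # ... which cancels R0 to all 0s
    carry = low >> M             # carry out of the low half
    return r1 + (acc >> M) + carry   # add high half and carry to R1
```

Every line depends on the one before it, so the final addition cannot start until both multiplications and the low-half addition have finished.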
[0044] Figure 1 is a flowchart illustrating the method for implementing Montgomery multiplication in an embodiment of this application. As shown in Figure 1, the method may include:
[0045] Step 100: Obtain a 2M-bit product through arithmetic multiplication of the first multiplier […] and the second multiplier […]; where the first multiplier […] and the second multiplier […] are the two M-bit multipliers of the Montgomery multiplication. The product is stored in the adjacent second register R1 and first register R0. The second register R1 is used to store the high M bits of the product, and the first register R0 is used to store the low M bits of the product.
[0046] In one exemplary instance, M is a positive integer. For example, M may be 256.
[0047] In one exemplary instance, the first multiplier […] and the second multiplier […] of the Montgomery multiplication range from 0 to […], where […] is a finite field. For example, in the embodiment illustrated in Figure 2, taking M as 256 bits, the arithmetic multiplication of the 256-bit first multiplier […] and second multiplier […] (shown as gfmul in Figure 2) yields a 512-bit product value R1-R0. The second register R1 and the first register R0 are adjacent 256-bit registers. The second register R1 is used to store the high 256 bits of the product value R1-R0, and the first register R0 is used to store the low 256 bits of the product value R1-R0.
[0048] Step 101: Use the M-bit all-1 value as the M-bit accumulated value of Montgomery reduction, and accumulate it in the first register R0 to obtain the carry value.
[0049] In the modular reduction of Montgomery multiplication, the first register R0 is ultimately cancelled, that is, reduced to all 0s. The inventors of this application therefore found that what is actually needed is the carry generated while accumulating into the first register R0, not the final accumulated value of the first register R0. Analysis shows that the original value of the first register R0 can range from M bits of all 0s to M bits of all 1s. If the original value of the first register R0 is all 0s, then in the subsequent two consecutive arithmetic multiplications, the result of multiplying the first register R0 by the constant Inv is also all 0s, and the resulting accumulated value is also all 0s; in this case, the first register R0 is already cancelled and no carry is generated. Otherwise, a nonzero accumulated value is necessarily generated which, when added to the first register R0, produces all 0s in the low M bits and generates a carry. Thus, in the embodiments of this application, the addition on the first register R0 is performed in advance to generate and retain the carry. In step 101, the M-bit all-1 value is added directly to the first register R0. No carry occurs only if the original value of the first register R0 is all 0s; otherwise, a carry necessarily occurs. This yields the same result as a normal Montgomery modular multiplication, while moving an addition from the end of the critical path forward, to after the first multiplication of step 100 and before the two consecutive arithmetic multiplications.
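The equivalence argued in this paragraph can be checked numerically. The following Python sketch compares the carry produced by adding the true low-order accumulated value to R0 with the carry produced by adding the M-bit all-1 value to R0. It assumes that Inv equals -p^-1 mod 2^M for an odd modulus p and that the second multiplication is by p; the function names are hypothetical.

```python
M = 256                  # register width in bits
MASK = (1 << M) - 1      # M-bit all-1 value


def neg_inverse(p: int) -> int:
    """Assumed value of the constant Inv: -p^-1 mod 2^M (p must be odd)."""
    return (-pow(p, -1, 1 << M)) & MASK


def carry_from_accumulation(r0: int, p: int, inv: int) -> int:
    """Carry out of R0 when the real low-order accumulated value is added."""
    m = (r0 * inv) & MASK
    low_acc = (m * p) & MASK     # equals (-r0) mod 2^M
    return (r0 + low_acc) >> M


def carry_from_all_ones(r0: int) -> int:
    """Carry out of R0 when the M-bit all-1 value is added instead."""
    return (r0 + MASK) >> M
```

Both functions return 0 when r0 is 0 and 1 otherwise, but only the second can be evaluated before the two multiplications have produced their results.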
[0050] In step 101, before the low-order reduction value is calculated, the accumulated value is replaced with an integer consisting entirely of 1s in order to obtain the carry value after reduction. As shown in the upper part of Figure 3, carry represents the generated carry value, fff…ff represents all 1s, and ???…?? represents the accumulated result value. The first addition operation gfadd is the lowest-order addition in a series of additions, i.e., the starting addition. The first addition operation gfadd ignores the carry flag left by previous addition instructions, but updates the carry flag after completing the calculation. That is to say, the first addition operation gfadd is an addition without carry-in that updates the carry flag after the addition is completed. Through the embodiment of this application, one addition on the critical path is stripped out and brought forward, saving the 2 clock cycles occupied by one addition.
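The behaviour of gfadd described here can be modelled in a few lines of Python. This is only a behavioural sketch: the Flags class and the way the carry flag is stored are assumptions made for illustration, and only the name gfadd comes from this application.

```python
from dataclasses import dataclass

M = 256                  # register width in bits
MASK = (1 << M) - 1      # M-bit all-1 value


@dataclass
class Flags:
    """Hypothetical processor state holding a single carry flag."""
    carry: int = 0


def gfadd(flags: Flags, a: int, b: int) -> int:
    """Add without carry-in: any existing flag is ignored, then the flag is updated."""
    total = a + b
    flags.carry = total >> M     # carry out of the M-bit addition
    return total & MASK          # M-bit sum kept in the destination register


# Step 101: add the all-1 value to R0; only the resulting flag is of interest.
flags = Flags(carry=1)           # stale flag from earlier work, to show it is ignored
gfadd(flags, 0x1234, MASK)
```

After this call flags.carry is 1 because R0 was nonzero; with R0 equal to 0 it would be 0.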
[0051] Step 102: Obtain the high-order accumulated value required for Montgomery reduction by performing two consecutive arithmetic multiplications on the first register R0.
[0052] The implementation of this step is consistent with the related art and is not repeated here. The middle part of Figure 3 shows the two arithmetic multiplications, where N represents the high-order accumulated value obtained after the two consecutive arithmetic multiplications.
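A possible reading of the two consecutive multiplications is sketched below in Python. It assumes that the first multiplication is R0 times the constant Inv with the low M bits kept, that Inv equals -p^-1 mod 2^M, and that the second multiplication is by the modulus p with the high M bits kept as N. These details follow the usual Montgomery reduction and are assumptions; the function names are hypothetical.

```python
M = 256                  # register width in bits
MASK = (1 << M) - 1      # M-bit all-1 value


def neg_inverse(p: int) -> int:
    """Assumed value of the constant Inv: -p^-1 mod 2^M (p must be odd)."""
    return (-pow(p, -1, 1 << M)) & MASK


def high_accumulated_value(r0: int, p: int, inv: int) -> int:
    """Return N, the high M bits of the accumulated value, from R0 alone."""
    m = (r0 * inv) & MASK    # first multiplication: low M bits of R0 * Inv
    return (m * p) >> M      # second multiplication: high M bits of m * p
```

Note that N depends only on R0, never on the carry, which is why the carry can be computed in parallel with these multiplications.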
[0053] Step 103: Add the obtained carry value and high-order accumulated value to the second register R1 to obtain the reduced result.
[0054] In one exemplary instance, this step may add the high-order accumulated value and the value in the second register R1, together with the carry, using a second addition operation gfade. gfade is the final operation in a series of additions. gfade adds in the carry flag, i.e., the carry value obtained in step 101, but does not update the carry flag after the addition is completed. In other words, the second addition operation is an addition with the carry value from step 101 as carry-in, and the carry flag is not updated after the addition is completed.
[0055] In this embodiment of the application, two adders are used, one to perform the first addition operation gfadd and the other to perform the second addition operation gfade, together with a carry flag, which holds the carry value generated when the first addition operation gfadd completes. Figure 4 illustrates the working principle of the two adders in the embodiments of this application. As shown in Figure 4, for example, the carry flag generated by a gfadd issued in cycle 17 only becomes available in cycle 19, and in this embodiment it is read and used by the corresponding gfade in cycle 19; similarly, the carry flag generated by a gfadd issued in cycle 18 only becomes available in cycle 20, and it is read and used by the corresponding gfade in cycle 20. With this application, on the one hand, one addition on the critical path is removed and moved forward, saving the 2 cycles occupied by one addition and improving computational efficiency by approximately 10%; on the other hand, the newly introduced second addition operation gfade shortens the critical path.
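The behaviour of gfade can be modelled in the same spirit. In the Python sketch below, the Flags class is a hypothetical stand-in for the processor's carry flag, and truncating the sum to M bits is an assumption about the destination register; only the name gfade comes from this application.

```python
from dataclasses import dataclass

M = 256                  # register width in bits
MASK = (1 << M) - 1      # M-bit all-1 value


@dataclass
class Flags:
    """Hypothetical processor state holding a single carry flag."""
    carry: int = 0


def gfade(flags: Flags, a: int, b: int) -> int:
    """Add with carry-in: the flag is consumed but deliberately left unchanged."""
    return (a + b + flags.carry) & MASK


# Step 103: R1 + N + carry, where the flag was set earlier by step 101.
flags = Flags(carry=1)
reduced = gfade(flags, 0x10, 0x20)
```

Because the flag is not overwritten, a later instruction could still read the carry produced in step 101.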
[0056] Step 104: Determine the Montgomery multiplication result of the first multiplier […] and the second multiplier […] based on the reduced result.
[0057] In one exemplary instance, in step 104, it is determined whether the reduced result lies within the range from 0 to […]. If the reduced result is greater than […], the Montgomery multiplication result of the first multiplier […] and the second multiplier […] is the reduced result minus […]; if the reduced result is not greater than […], the Montgomery multiplication result of the first multiplier […] and the second multiplier […] is the reduced result.
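Steps 100 to 104 can be put together in a short Python sketch. The bounds in step 104 are not legible in the text above, so the sketch assumes the usual convention: operands lie in the range 0 to p - 1 for an odd modulus p, and p is subtracted once when the reduced result is at least p. The constant inv is assumed to be -p^-1 mod 2^M. The function names are hypothetical.

```python
M = 256                  # register width in bits
MASK = (1 << M) - 1      # M-bit all-1 value


def neg_inverse(p: int) -> int:
    """Assumed constant: -p^-1 mod 2^M (p must be odd)."""
    return (-pow(p, -1, 1 << M)) & MASK


def mont_mul(a: int, b: int, p: int, inv: int) -> int:
    """Montgomery product a * b * 2^-M mod p following steps 100 to 104."""
    product = a * b
    r1, r0 = product >> M, product & MASK    # step 100: high and low halves
    carry = (r0 + MASK) >> M                 # step 101: carry from all-1 addition
    m = (r0 * inv) & MASK                    # step 102: first multiplication
    n = (m * p) >> M                         # step 102: second multiplication
    reduced = r1 + n + carry                 # step 103: add N and carry to R1
    # Step 104 (assumed bound): subtract p once if the result is not below p.
    return reduced - p if reduced >= p else reduced
```

In this model the carry of step 101 and the value N of step 102 both depend only on R0, so they can be computed independently before the single addition of step 103.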
[0058] This application also provides a computer-readable storage medium storing computer-executable instructions for executing the method for implementing Montgomery multiplication as described above.
[0059] An embodiment of this application further provides a computer device, including a memory and a processor, wherein the memory stores instructions executable by the processor for performing the steps of the method for implementing Montgomery multiplication described in any of the above embodiments.
[0060] The Pallas curve and the Vesta curve are collectively known as the Pasta curves, named after two minor planets (asteroids) in the solar system. Both the Pallas and Vesta curves have a base field and a scalar field of 256 bits. The base field is the field over which the coordinates of points on the elliptic curve are defined, and is used in the point addition and doubling operations on the elliptic curve. The scalar field contains the integers that represent multiples in scalar multiplication on the elliptic curve. The base field of the Pallas curve is equal to the scalar field of the Vesta curve, meaning that the coordinates of a point on the Pallas curve are elements of the scalar field of the Vesta curve. The scalar field of the Pallas curve is equal to the base field of the Vesta curve, meaning that scalars of the Pallas curve are elements of the base field of the Vesta curve. Both the Pallas and Vesta curves admit low-degree isogenies. Both curves have the same 2-adicity, 32, meaning that for each of the two fields, the order of the multiplicative group (the field modulus minus 1) is divisible by 2 to the power of 32. Here, "2-adicity" is the exponent of the largest power of 2 that divides this order.
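The 2-adicity figure can be checked numerically on the field moduli. The Python sketch below assumes the commonly published values of the two Pasta field primes; these constants are not given in this application and are included only for illustration. The function name two_adicity is hypothetical, and the check is applied to each modulus minus 1.

```python
# Assumed constants: the commonly published Pasta field moduli (not from this application).
P_PALLAS = 0x40000000000000000000000000000000224698fc094cf91b992d30ed00000001
Q_VESTA = 0x40000000000000000000000000000000224698fc0994a8dd8c46eb2100000001


def two_adicity(n: int) -> int:
    """Return the exponent of the largest power of 2 dividing n (n > 0)."""
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count


pallas_adicity = two_adicity(P_PALLAS - 1)
vesta_adicity = two_adicity(Q_VESTA - 1)
```

Both moduli fit in a 256-bit register, which is consistent with the use of 256-bit Montgomery multiplication for these curves.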
[0061] When performing Pasta curve calculations, 256-bit Montgomery multiplication is introduced. In the process of implementing Montgomery multiplication using instructions, the method for implementing Montgomery multiplication provided in the embodiments of this application can reduce the dependencies between instructions, shorten the calculation time, and avoid the problem of wasting computing resources while waiting for results.
[0062] This application also provides a reduction method, including:
[0063] The M-bit all-1 value is used as the M-bit accumulated value for Montgomery reduction, and the carry value is obtained by accumulating the first register.
[0064] The high-order accumulated value required for Montgomery reduction is obtained by performing two consecutive arithmetic multiplications on the first register.
[0065] The carry value and the high-order accumulated value are added to the second register to obtain the reduced result;
[0066] The second register and the first register are adjacent registers; the second register is used to store the high M bits of the product value, and the first register is used to store the low M bits of the product value; the product value is the 2M-bit result of the arithmetic multiplication of the first multiplier and the second multiplier; the first multiplier and the second multiplier are the two M-bit multipliers of the Montgomery multiplication.
[0067] This application also provides a computer-readable storage medium storing computer-executable instructions for performing any of the reduction methods described above.
[0068] This application provides another computer device, including a memory and a processor, wherein the memory stores the following instructions executable by the processor: steps for performing any of the reduction methods described above.
[0069] Figure 5 is a schematic diagram of the structure of the apparatus for implementing Montgomery multiplication in the embodiments of this application. As shown in Figure 5, the apparatus includes a first multiplier, a second multiplier, a third multiplier, a first adder, a second adder, and a processing unit; wherein,
[0070] The first multiplier is used to compute the arithmetic multiplication of the first multiplier […] and the second multiplier […] to obtain a 2M-bit product; where the first multiplier […] and the second multiplier […] are the two M-bit multipliers of the Montgomery multiplication; the product is stored in the adjacent second register R1 and first register R0, where the second register R1 stores the high M bits of the product and the first register R0 stores the low M bits of the product;
[0071] The first adder is used to accumulate the first register R0 by taking the M-bit all-1 value as the M-bit accumulated value of the Montgomery reduction to obtain the carry value;
[0072] The second and third multipliers are used to perform two consecutive arithmetic multiplications on the first register R0 in sequence to obtain the high-order accumulated value required for Montgomery reduction;
[0073] The second adder is used to add the obtained carry value and the high-order accumulated value to the second register R1 to obtain the reduced result;
[0074] The processing unit is used to determine the Montgomery multiplication result of the first multiplier […] and the second multiplier […] based on the reduced result.
[0075] In one exemplary instance, M is a positive integer. For example, M may be 256.
[0076] Although the embodiments disclosed in this application are as described above, the content described is merely for the purpose of understanding this application and is not intended to limit this application. Any person skilled in the art to which this application pertains may make any modifications and changes in the form and details of the implementation without departing from the spirit and scope disclosed in this application; however, the scope of patent protection of this application shall still be determined by the scope defined in the appended claims.
Claims
1. A method for implementing Montgomery multiplication, comprising: A 2M-bit product is obtained through arithmetic multiplication of the first and second multipliers; where the first and second multipliers are the two M-bit multipliers of the Montgomery multiplication; the product is stored in adjacent second and first registers, with the second register storing the high M bits of the product and the first register storing the low M bits of the product. The M-bit all-1 value is used as the M-bit accumulated value for Montgomery reduction, and the carry value is obtained by accumulating the first register. The high-order accumulated value required for Montgomery reduction is obtained by performing two consecutive arithmetic multiplications on the first register. The carry value and the high-order accumulated value are added to the second register to obtain the reduced result, and the Montgomery multiplication result of the first multiplier and the second multiplier is determined based on the reduced result.
2. The method according to claim 1, wherein the step of accumulating the first register to obtain the carry value includes: the carry value is obtained by accumulating into the first register using a first addition operation; wherein the first addition operation is an addition without carry-in that updates the carry flag after the addition is completed.
3. The method according to claim 1, wherein the step of adding the obtained carry value and the high-order accumulated value to the second register to obtain the reduced result includes: the high-order accumulated value and the value in the second register are added, together with the carry value, through a second addition operation; wherein the second addition operation is an addition with the carry value as carry-in, and the carry flag is not updated after the addition is completed.
4. The method according to claim 1, wherein the first multiplier and the second multiplier range from 0 to […], where […] is a finite field; the determination of the Montgomery multiplication result of the first and second multipliers based on the reduced result includes: determining whether the reduced result lies within the range from 0 to […]; if the reduced result is greater than […], the Montgomery multiplication result of the first multiplier and the second multiplier is the reduced result minus […]; if the reduced result is not greater than […], the Montgomery multiplication result of the first multiplier and the second multiplier is the reduced result.
5. The method according to any one of claims 1 to 4, wherein, The M-bit is 256 bits.
6. A reduction method, comprising: The M-bit all-1 value is used as the M-bit accumulated value for Montgomery reduction, and the carry value is obtained by accumulating the first register. The high-order accumulated value required for Montgomery reduction is obtained by performing two consecutive arithmetic multiplications on the first register. The carry value and the high-order accumulated value are added to the second register to obtain the reduced result; The second register and the first register are adjacent registers; the second register is used to store the high M bits of the product value, and the first register is used to store the low M bits of the product value; the product value is the arithmetic multiplication of the first multiplier and the second multiplier to obtain a 2M-bit result; the first multiplier and the second multiplier are the two M-bit multipliers of the Montgomery multiplication.
7. A computer-readable storage medium storing computer-executable instructions for performing the method of implementing Montgomery multiplication as described in any one of claims 1 to 5, or the reduction method as described in claim 6.
8. A computer device comprising a memory and a processor, wherein, The memory stores the following instructions that can be executed by a processor: steps for performing the method of implementing Montgomery multiplication as described in any one of claims 1 to 5, or steps for the reduction method as described in claim 6.
9. An apparatus for implementing Montgomery multiplication, comprising: a first multiplier, a second multiplier, a third multiplier, a first adder, a second adder, and a processing unit; wherein, The first multiplier is used to calculate the arithmetic multiplication of the first multiplier and the second multiplier to obtain a 2M-bit product; wherein the first multiplier and the second multiplier are two M-bit multipliers of the Montgomery multiplication; the product is stored in the adjacent second register and the first register, the second register is used to store the high M bits of the product, and the first register is used to store the low M bits of the product; The first adder is used to accumulate the first register by taking the M-bit all-1 value as the M-bit accumulated value of the Montgomery reduction to obtain the carry value; The second and third multipliers are used to perform two consecutive arithmetic multiplications on the first register in sequence to obtain the high-order accumulated value required for Montgomery reduction; The second adder is used to add the obtained carry value and the high-order accumulated value to the second register to obtain the reduced result; The processing unit is used to determine the Montgomery multiplication result of the first and second multipliers based on the reduced result.
10. The apparatus according to claim 9, wherein, The M-bit is 256 bits.