Method for generating bytecode, method for generating machine code, and device
By replacing integer arithmetic statements with preset functions to generate bytecode and machine code in the virtual machine environment, the problem of poor performance in integer arithmetic overflow checking in the prior art is solved, and more efficient overflow checking and computational performance are achieved.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- ANT BLOCKCHAIN TECHNOLOGY (SHANGHAI) CO LTD
- Filing Date
- 2025-10-30
- Publication Date
- 2026-07-02
AI Technical Summary
Existing integer arithmetic overflow detection methods have poor computational performance in virtual machine environments. When checking for overflows using multi-bytecode instructions, they generate a large number of machine instructions, or when checking for overflows using the host API, they cause frequent switching and additional overhead.
The statements used for integer operations in the target source code are replaced with preset target functions to generate target bytecode containing bytecode instructions. The bytecode instructions indicate that integer operations and overflow checks are performed through a sequence of machine instructions. The generated machine code contains the corresponding sequence of machine instructions.
The number of instructions used for integer overflow checks in the bytecode has been reduced, avoiding additional function calls and memory accesses, making full use of processor features, and improving computing performance.
Description
A method for generating bytecode, a method for generating machine code, and a device for doing so.
[0001] This application claims priority to Chinese Patent Application No. 202411970936.7, filed on December 27, 2024, entitled "A method for generating bytecode, a method for generating machine code and an apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
[0002] The embodiments in this specification belong to the field of blockchain and virtual machine technology, and particularly relate to a method for generating bytecode, a method for generating machine code, and a device for such generation.
Background Technology
[0003] In scenarios where applications or programs run on virtual machines, such as smart contracts, overflow checks are frequently used to ensure the safety of integer computations. Existing methods for overflow checking of integer operations either compose the checking logic from multiple bytecode instructions or call an API (Application Programming Interface) of the virtual machine's host runtime environment. However, checking logic composed of multiple bytecode instructions is often compiled into a large number of machine instructions at execution time, leading to poor computational performance. Performing overflow checks through the host API, on the other hand, leads to frequent switching between the virtual machine and the host runtime environment, incurring additional overhead from function calls and memory operations, resulting in significant computational costs and poor performance.
Summary of the Invention
[0004] The purpose of this invention is to provide a method for generating bytecode and a method for generating machine code, which can perform overflow checks on integer operations with lower computational cost and higher efficiency in virtual machine-based application or program execution scenarios.
[0005] To achieve the above objectives, this specification provides a method for generating bytecode in a first aspect, comprising: obtaining target source code; applying a first processing to the target source code, the first processing comprising: replacing a target statement used for integer arithmetic in the target source code with a preset target function; generating target bytecode based on the first-processed target source code, the target bytecode including bytecode instructions generated according to the preset function, the bytecode instructions instructing the execution of integer arithmetic corresponding to the target statement through a preset machine instruction sequence, and performing overflow checks for the integer arithmetic.
[0006] A second aspect of this specification provides a method for generating machine code, comprising: obtaining target bytecode obtained by the method described above; and generating target machine code based on the target bytecode and a target processor type, wherein the target machine code includes a sequence of machine instructions determined according to the bytecode instructions described above.
[0007] A third aspect of this specification provides a computing device, including: a processor; and a memory storing a program, wherein when the processor executes the program, the following operations are performed: acquiring target source code; applying a first processing to the target source code, the first processing including: replacing target statements in the target source code used for integer arithmetic with a preset target function; generating target bytecode based on the first-processed target source code, the target bytecode including bytecode instructions generated according to the preset function, the bytecode instructions instructing the execution of integer arithmetic corresponding to the target statements through a preset machine instruction sequence, and overflow checking for the integer arithmetic.
[0008] In the bytecode generation scheme and machine code generation scheme provided in the embodiments of this specification, target source code can be obtained, and a first processing can be applied to the target source code, including replacing the target statements used for integer arithmetic in the target source code with a preset target function. Then, target bytecode can be generated based on the first-processed target source code. The target bytecode includes bytecode instructions generated according to the preset function, which instruct the execution of integer arithmetic corresponding to the target statements through a preset machine instruction sequence, and to perform overflow checks for integer arithmetic. After obtaining the target bytecode, target machine code can be generated based on the target bytecode and the target processor type. The target machine code includes a machine instruction sequence determined according to the aforementioned bytecode instructions. This method enables overflow checks for integer operations with lower computational cost and higher efficiency in virtual machine-based application or program execution scenarios.
Attached Figure Description
[0009] To more clearly illustrate the technical solutions of the embodiments in this specification, the drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this specification. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0010] Figure 1 is a schematic diagram of an integer arithmetic overflow check scheme in a virtual machine environment;
[0011] Figure 2 is a schematic diagram of another scheme for integer arithmetic overflow checking in a virtual machine environment;
[0012] Figure 3 is a schematic diagram of a bytecode generation method in one embodiment of this specification;
[0013] Figure 4 is a flowchart of a bytecode generation method according to an embodiment of this specification;
[0014] Figure 5 is a flowchart of a machine code generation method in one embodiment of this specification;
[0015] Figure 6 is an architectural diagram of a bytecode generation apparatus according to an embodiment of this specification;
[0016] Figure 7 is an architectural diagram of a machine code generation apparatus according to one embodiment of this specification.
Detailed Implementation
[0017] To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this specification, and not all embodiments. Based on the embodiments in this specification, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of this specification.
[0018] As mentioned earlier, in scenarios where applications or programs run on virtual machines, such as smart contract execution, integer calculations with overflow checks are frequently used to ensure the safety of integer computations. Overflow checking is used to detect and handle overflow errors that may occur during computation. Overflow errors occur when the result of a calculation exceeds the range that the data type can represent. Specifically, an integer overflow error occurs when the result of an integer operation exceeds the maximum value of a specific integer type or is less than its minimum value. Existing schemes for overflow checking of integer operations fall into two categories: one is to use bytecode instructions to compose the checking logic to check for overflow; this can be called the bytecode instruction overflow checking scheme. The other is to perform overflow checks by calling the host API (Application Programming Interface) of the virtual machine runtime environment; this can be called the host environment API overflow checking scheme. However, the bytecode instruction overflow checking scheme often requires a large number of bytecode instructions to check for overflow in actual production environments. Furthermore, due to the differences in data processing architecture between the virtual machine on which bytecode depends and the processor on which machine code depends, these bytecode instructions are often compiled into an excessive number of machine instructions before being executed during actual application runtime, resulting in poor computational performance, especially in smart contract execution scenarios that heavily utilize integer calculations.
[0019] Figure 1 is a schematic diagram of an integer overflow check scheme in a virtual machine environment. As shown in Figure 1, the overflow checking logic composed of bytecode instructions can contain multiple bytecode instructions, such as bytecode instruction 1, bytecode instruction 2, bytecode instruction 3, etc. These bytecode instructions typically perform overflow checks on integer operations based on the virtual machine's data processing mode. During actual application runtime, these bytecode instructions are often compiled into machine instructions and then executed. However, because the virtual machine on which the bytecode depends often differs in data processing architecture from the processor on which the machine code depends, bytecode instructions usually do not correspond one-to-one with the machine instructions of a specific processor. Each bytecode instruction may be compiled into one or more machine instructions, so a large number of machine instructions are generated and executed during actual application runtime, leading to poor computational performance.
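To illustrate why checking logic composed of generic instructions tends to be long, the following is a minimal C++ sketch of a signed 32-bit addition whose overflow check is written out as explicit comparisons, with no access to processor flags. The function name add_i32_compare_based and the bool return convention are assumptions made for illustration only; they do not appear in this specification.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (name and return convention are assumptions).
// The overflow check is spelled out as comparisons and branches, which is
// roughly what a check composed of several generic instructions must do.
bool add_i32_compare_based(std::int32_t a, std::int32_t b, std::int32_t* out) {
    constexpr std::int32_t kMax = std::numeric_limits<std::int32_t>::max();
    constexpr std::int32_t kMin = std::numeric_limits<std::int32_t>::min();
    if (b > 0 && a > kMax - b) return false;  // sum would exceed the maximum
    if (b < 0 && a < kMin - b) return false;  // sum would fall below the minimum
    *out = a + b;                              // no overflow is possible here
    return true;
}
```

Each comparison, subtraction, and branch in such logic becomes at least one instruction after compilation, which is the cost that the scheme of Figure 1 incurs on every checked operation.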
[0020] Figure 2 illustrates another scheme for integer overflow checking in a virtual machine environment. As shown in Figure 2, if the virtual machine's host runtime environment has pre-configured an API for overflow checking of integer operations, overflow checks can be performed on integer operations within the virtual machine by calling the host runtime environment's API. However, the problem with this scheme is that it leads to frequent switching between the bytecode runtime environment and the host runtime environment, resulting in additional overhead from function calls and memory operations, a high computational cost, and poor computational performance.
[0021] To address the aforementioned technical problems, this specification proposes a bytecode generation method and a corresponding machine code generation method. Figure 3 is a schematic diagram of a bytecode generation method according to an embodiment of this specification. As shown in Figure 3, application source code containing integer operations written in a high-level programming language can be compiled into bytecode. Specific bytecode instructions in the bytecode represent overflow checks for integer operations. These bytecode instructions themselves do not execute virtual machine-based overflow checking logic; instead, they indicate that overflow checks for integer operations are to be performed using a preset sequence of machine instructions. Subsequently, during actual application runtime, machine code can be generated and executed based on this bytecode. This machine code includes the sequence of machine instructions indicated by the bytecode instructions, and overflow checks for integer operations are performed using this sequence of machine instructions.
[0022] The advantages of this method are as follows: First, the generated bytecode does not include a large number of bytecode instructions that actually execute the overflow checking logic; instead, it uses only a specific placeholder instruction to indicate overflow checking for integer operations. While ensuring integer overflow checking during actual program execution, it significantly reduces the number of instructions used for integer overflow checking in the bytecode, especially in scenarios involving high-frequency integer operations. Second, overflow checking for integer operations can be performed directly through a sequence of machine instructions. Compared to methods that call the host environment API to check for overflow, this avoids additional function calls and memory accesses, resulting in better computational performance. Third, compared to methods that check overflow through bytecode instructions, directly performing overflow checking for integer operations through a sequence of machine instructions can also fully utilize the characteristics of different processors. For example, in some embodiments, overflow checking instructions specific to a particular processor can be used to perform overflow checks, further improving computational performance.
[0023] The following further describes a bytecode generation method provided by an embodiment of this specification. Figure 4 is a flowchart of a bytecode generation method according to an embodiment of this specification. As shown in Figure 4, the method includes at least the following steps:
[0024] Step S401: Obtain the target source code and apply a first processing to the target source code. The first processing includes: replacing the target statement used for integer arithmetic in the target source code with a preset target function.
[0025] Step S403: Generate target bytecode based on the first processed target source code. The target bytecode includes bytecode instructions generated according to the preset function. The bytecode instructions instruct the execution of integer operations corresponding to the target statement through a preset machine instruction sequence, and to perform overflow checks for the integer operations.
[0026] First, in step S401, the target source code is obtained, and a first processing step is applied to the target source code. This first processing step may involve replacing the target statements used for integer arithmetic in the target source code with a preset target function. The target function instructs the execution of integer arithmetic operations on the target statements and performs overflow checks on the integer arithmetic.
[0027] In different implementations, the target source code can be source code written in different specific high-level programming languages. In one implementation, it could be, for example, one of C++, C# (C Sharp), Python, Java, etc. A target statement is a statement in the target source code used for integer operations. In different implementations, the target statements used for integer operations may be different, and the target functions used to replace the target statements may also be different. In different embodiments, the types of integer operations in the target statements (e.g., multiplication or addition; integer division and subtraction can be treated as forms of multiplication and addition, respectively) may be different. Therefore, in different implementations, the target statement can be replaced with a target function corresponding to the type of integer operation in the target statement. In one embodiment, the function may also include the parameters used for the integer operation in the target statement.
[0028] In one example, the target statement could be a C++ statement such as "int result = a + b;". The target function used to replace the target statement could be, for example, "checked_add(a, b, &result)", where the function instructs the execution of the integer addition operation of the above target statement, and performs an overflow check on it.
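The replacement described above can be sketched in C++ as follows. Only the call shape checked_add(a, b, &result) comes from the example; the bool return value and the portable widening body are assumptions of this sketch. In the scheme of this specification the function would instead be lowered to a bytecode instruction, so the body below is only a stand-in that makes the example self-contained.

```cpp
#include <cstdint>
#include <limits>

// Stand-in for the preset target function named in the example.
// Assumption: returns true when no overflow occurred and leaves *result
// untouched otherwise. The real function body is not given in the text.
bool checked_add(std::int32_t a, std::int32_t b, std::int32_t* result) {
    const std::int64_t wide = static_cast<std::int64_t>(a) + b;
    if (wide > std::numeric_limits<std::int32_t>::max() ||
        wide < std::numeric_limits<std::int32_t>::min()) {
        return false;
    }
    *result = static_cast<std::int32_t>(wide);
    return true;
}

// Before the first processing:   int result = a + b;
// After the first processing:    the statement becomes a call.
bool add_after_rewrite(std::int32_t a, std::int32_t b, std::int32_t* result) {
    return checked_add(a, b, result);
}
```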
[0029] Then, in step S403, target bytecode can be generated based on the first processed target source code. The target bytecode may include bytecode instructions generated according to a preset function, which may instruct the execution of integer operations corresponding to the target statement through a preset machine instruction sequence, and to perform overflow checks for integer operations.
[0030] Bytecode is a low-level, hardware- and operating system-independent representation of code generated by a compiler. It consists of a series of bytecode instructions that are not executed directly on the target machine's hardware but require interpretation or further compilation within a specific virtual machine (VM) environment before execution. In different implementations, the target bytecode can be of different specific types, and this specification does not limit this. In one implementation, it could be, for example, Wasm (WebAssembly) bytecode.
[0031] In different implementations, the representation of the bytecode instructions generated according to the preset function can vary. In one implementation, the bytecode instruction can be an external call instruction (call) that points to a preset machine instruction sequence. In different implementations, the type of integer operation in the target statement can be different, and the type of the parameters involved in the integer operation (e.g., signed or unsigned 32-bit integers, signed or unsigned 64-bit integers, etc.) can also be different. Therefore, in different implementations, the bytecode instruction can indicate a machine instruction sequence corresponding to the type of integer operation and the type of the parameters involved, which performs the integer operation corresponding to the target statement and the overflow check for that operation. In one embodiment, the bytecode instruction can be specifically represented as: "(call $checked_i32_add (local.get 0) (local.get 1))", where call represents an external call instruction, and $checked_i32_add represents a pointer to a preset machine instruction sequence that performs signed 32-bit integer addition and overflow checks.
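As a rough sketch of how a compiler front end might emit such a call for a given operation type and parameter type, consider the C++ fragment below. Only the name $checked_i32_add is taken from the text; the naming pattern for other combinations, the enum names, and the function emit_checked_call are hypothetical.

```cpp
#include <string>

enum class IntOp { Add, Mul };
enum class IntType { I8, I16, I32, I64, I128, U8, U16, U32, U64, U128 };

// Textual name of a parameter type, indexed by the enum order above.
std::string type_name(IntType t) {
    static const char* const kNames[] = {"i8", "i16", "i32", "i64", "i128",
                                         "u8", "u16", "u32", "u64", "u128"};
    return kNames[static_cast<int>(t)];
}

// Builds the callee name. The "$checked_<type>_<op>" pattern is an
// assumption generalized from the single name given in the text.
std::string callee_name(IntOp op, IntType t) {
    return "$checked_" + type_name(t) + "_" + (op == IntOp::Add ? "add" : "mul");
}

// Emits the Wasm text form of the call with two local operands.
std::string emit_checked_call(IntOp op, IntType t, int lhs_local, int rhs_local) {
    return "(call " + callee_name(op, t) +
           " (local.get " + std::to_string(lhs_local) + ")" +
           " (local.get " + std::to_string(rhs_local) + "))";
}
```

For signed 32-bit addition on locals 0 and 1, this reproduces the instruction quoted in the text.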
[0032] Machine code is binary code that a computer can directly execute; it serves as the low-level interaction language between computer hardware and software. Typically, different processor architectures (such as x86, ARM, etc.) have different machine instruction sets, and therefore different machine codes. Even different generations of processors with the same architecture often have different machine instruction sets, and their corresponding machine codes can also differ. In different implementations, the preset processor type can vary. In one embodiment, it can be one or more of x86, x64, ARM32, and ARM64. Furthermore, integers typically have multiple subtypes, such as signed 8-bit integers (i8), signed 16-bit integers (i16), signed 32-bit integers (i32), signed 64-bit integers (i64), signed 128-bit integers (i128), and unsigned 8-bit (u8), unsigned 16-bit (u16), unsigned 32-bit (u32), unsigned 64-bit (u64), and unsigned 128-bit (u128) integers, etc. Here, a signed integer can represent positive numbers, negative numbers, and zero. To facilitate overflow checking for integer operations of different operation types and parameter types on different processors, different machine instruction sequences can be preset for different processor types, integer operation types, and parameter types. Therefore, in one embodiment, the preset machine instruction sequence may include a sequence of machine instructions that, on a preset processor type, performs the integer operation for the operation type and parameter type of the target statement and performs the overflow check on that operation.
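One possible way to organize such presets is a lookup table keyed by processor type, operation type, and parameter type. The C++ sketch below is illustrative only: the enum names and the function select_sequence are hypothetical, and the table holds just a few of the example sequences that appear in this specification.

```cpp
#include <map>
#include <string>
#include <tuple>

enum class Cpu { X64, Arm64 };
enum class Op { Add, Mul };
enum class Ty { I32, I128 };

using Key = std::tuple<Cpu, Op, Ty>;

// Returns the preset sequence for the given combination, or nullptr when
// this sketch has no entry for it. The table is deliberately incomplete.
const std::string* select_sequence(Cpu cpu, Op op, Ty ty) {
    static const std::map<Key, std::string> table = {
        {Key{Cpu::X64, Op::Add, Ty::I32}, "addl %edi,%esi\njo $overflowLabel"},
        {Key{Cpu::X64, Op::Mul, Ty::I32}, "imul %edi,%esi\njo $overflowLabel"},
        {Key{Cpu::Arm64, Op::Add, Ty::I32}, "adds w8,w0,w1\nb.vs $overflowLabel"},
        {Key{Cpu::Arm64, Op::Add, Ty::I128},
         "adds x8,x0,x2\nadcs x9,x1,x3\nb.vs $overflowLabel"},
    };
    const auto it = table.find(Key{cpu, op, ty});
    return it == table.end() ? nullptr : &it->second;
}
```

A machine code generator could consult such a table when it encounters the bytecode instruction and splice the selected sequence into the output for the target processor type.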
[0033] In some scenarios, the machine instruction set of the processor type (e.g., x64) upon which the machine code is based includes specific machine instructions that perform addition operations of various parameter types and perform overflow checks on them. To perform overflow checks for integer operations more efficiently, when the machine code is used to perform a specific type of integer addition operation on that processor type, integer operations and overflow checks can be performed using specific machine instructions. Therefore, in one implementation, the target statement can be a statement for integer addition. The processor type can include a first processor type (e.g., an x64 processor), and the machine instruction set corresponding to the first processor type includes a first instruction for performing integer addition and applying overflow checks to the integer addition. The machine instruction sequence includes a first machine instruction sequence for execution on the first processor type, and the first machine instruction sequence contains the first instruction. In different specific embodiments, the specific machine instructions included in the first machine instruction sequence can be different. In a specific example, the first machine instruction sequence can be, for example:
[0034] “addl %edi,%esi // Addition instruction with overflow check
[0035] jo $overflowLabel”
[0036] Here, addl represents the machine instruction to perform addition on %edi and %esi and to perform overflow check on the addition operation, and jo $overflowLabel means that if the addition operation causes an overflow, the process will jump to the location of $overflowLabel for processing.
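The effect of this two-instruction sequence can be modeled in C++ as below. This is a software model under stated assumptions, not code from the specification: the struct AddResult and the function add_i32_with_flag are hypothetical, and the overflow bit is computed with the usual sign rule (both operands share a sign that the result does not), which is how a signed overflow flag behaves for addition.

```cpp
#include <cstdint>

// Hypothetical result type: the wrapped sum plus a modeled overflow flag.
struct AddResult {
    std::int32_t value;
    bool overflow;
};

AddResult add_i32_with_flag(std::int32_t a, std::int32_t b) {
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    const std::uint32_t ur = ua + ub;  // wraps modulo 2^32, like the adder
    // Signed overflow: the result's sign differs from the sign of both operands.
    const bool overflow = (((ua ^ ur) & (ub ^ ur)) >> 31) != 0;
    // The conversion back to int32_t wraps on mainstream compilers
    // (it is guaranteed to do so from C++20 onward).
    return {static_cast<std::int32_t>(ur), overflow};
}
```

The point of the scheme is that the processor produces this flag as a side effect of the addition itself, so the generated machine code needs only the addition and one conditional jump.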
[0037] In some scenarios, the machine instruction set of the processor type (e.g., ARM64) upon which the machine code is based does not include specific machine instructions capable of performing integer addition with specific parameter types (e.g., signed or unsigned 8-bit or 16-bit integers (i8 / i16 / u8 / u16)) and performing overflow checks on the operation. In such scenarios, corresponding machine instruction sequences can be set for integer addition with these parameter types. Specifically, in one implementation, the target statement can be a statement for integer addition of a predetermined first number of bits, where the first number of bits is 8 or 16. The processor type may include a second processor type, and the machine instruction sequence includes a second machine instruction sequence for execution on the second processor type. The second machine instruction sequence can be used to execute the following process: extending the first parameter, which has the first number of bits, to 32 bits; extending the second parameter, which has the first number of bits, to 32 bits; applying a summation operation to the extended first parameter and the extended second parameter to obtain a 32-bit summation result; applying an extension operation to the low-order first number of bits of the summation result to obtain a 32-bit extended result; and determining whether the sum of the first parameter and the second parameter overflows based on whether the summation result and the extended result are the same. In different specific embodiments, the specific machine instructions included in the second machine instruction sequence can be different. In a specific example, the parameter type for integer addition is a signed 8-bit integer or a signed 16-bit integer (i8 / i16), and the second machine instruction sequence can be, for example:
[0038] “sxtb w8,w0
[0039] add w8,w8,w1,sxtb
[0040] sxtb w9,w8
[0041] cmp w9,w8
[0042] b.ne $overflowLabel”
[0043] The instruction `sxtb w8,w0` sign-extends the signed 8-bit integer in `w0` to 32 bits (for 16-bit operands, `sxth` plays the same role). `add w8,w8,w1,sxtb` sign-extends `w1` to 32 bits, performs a summation operation on the extended `w0` and `w1`, and stores the 32-bit sum in `w8`. `sxtb w9,w8` sign-extends the lower 8 bits / lower 16 bits of `w8` to 32 bits and stores the result in `w9` (the upper bits of `w9` are copies of the sign bit of the low-order part). `cmp w9,w8` followed by `b.ne $overflowLabel` determines whether the sum of `w0` and `w1` overflows based on whether `w8` and `w9` are the same; if they differ, the process jumps to $overflowLabel.
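The same widen, add, narrow, and compare process can be written in C++ for the signed 8-bit case. This is a minimal sketch; the function name add_i8_widened and the bool return convention are assumptions, and the comments map each step to the instruction it models.

```cpp
#include <cstdint>

// Hypothetical helper modeling the signed 8-bit sequence step by step.
bool add_i8_widened(std::int8_t a, std::int8_t b, std::int8_t* out) {
    const std::int32_t wa = a;        // sxtb w8, w0: sign-extend a to 32 bits
    const std::int32_t sum = wa + b;  // add w8, w8, w1, sxtb: extend b and add
    // sxtb w9, w8: sign-extend the low 8 bits of the sum. The narrowing
    // conversion wraps on mainstream compilers (guaranteed from C++20).
    const std::int32_t narrowed = static_cast<std::int8_t>(sum);
    if (narrowed != sum) return false;  // cmp w9, w8 ; b.ne $overflowLabel
    *out = static_cast<std::int8_t>(sum);
    return true;
}
```

The sum of two 8-bit values always fits in 32 bits, so the widened addition cannot itself overflow; the comparison detects whether the result still fits in 8 bits.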
[0044] In another specific example, the parameter type for integer addition is an unsigned 8-bit integer or an unsigned 16-bit integer (u8 / u16), and the second machine instruction sequence could be, for example:
[0045] “and w8,w0,#0xffff
[0046] add w8,w8,w1,uxth
[0047] and w9,w8,#0xffff
[0048] cmp w9,w8
[0049] b.ne $overflowLabel”
[0050] The instruction `and w8,w0,#0xffff` zero-extends the unsigned 8-bit / 16-bit integer `w0` to 32 bits. `add w8,w8,w1,uxth` zero-extends the unsigned 8-bit / 16-bit integer `w1` to 32 bits, performs a summation operation on the extended `w0` and `w1`, and stores the 32-bit sum in `w8`. `and w9,w8,#0xffff` extends the lower 8 bits / lower 16 bits of `w8` to 32 bits and stores the result in `w9` (where all bits except the lower 8 / lower 16 bits are 0). `cmp w9,w8` followed by `b.ne $overflowLabel` determines whether the sum of `w0` and `w1` overflows based on whether `w8` and `w9` are the same.
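For the unsigned 16-bit case, the masking steps can be modeled in C++ as follows. The function name add_u16_widened and its return convention are hypothetical; the mask 0xffff corresponds to the 16-bit variant shown in the listing.

```cpp
#include <cstdint>

// Hypothetical helper modeling the unsigned 16-bit sequence.
bool add_u16_widened(std::uint16_t a, std::uint16_t b, std::uint16_t* out) {
    const std::uint32_t wa = a & 0xffffu;          // and w8, w0, #0xffff
    const std::uint32_t sum = wa + (b & 0xffffu);  // add w8, w8, w1, uxth
    const std::uint32_t low = sum & 0xffffu;       // and w9, w8, #0xffff
    if (low != sum) return false;                  // cmp w9, w8 ; b.ne
    *out = static_cast<std::uint16_t>(sum);
    return true;
}
```

If any bit above bit 15 is set in the 32-bit sum, masking changes the value, so the comparison fails exactly when the result does not fit in 16 bits.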
[0051] In some scenarios, the machine instruction set of the processor type (e.g., ARM64) upon which the machine code is based only includes machine instructions that perform integer addition of a specific parameter type (e.g., i32 / i64 / u32 / u64) and perform overflow checks on the operation. When the machine code is used to perform integer addition of that parameter type on this processor type, integer arithmetic and overflow checks can be performed through those specific machine instructions. Therefore, in one embodiment, the target statement can be a statement for integer addition of a predetermined second number of bits, where the second number of bits is 32 or 64. The processor type includes a second processor type, and the machine instruction set corresponding to the second processor type includes a second instruction for performing integer addition of the second number of bits and applying overflow checks to that addition. The machine instruction sequence includes a third machine instruction sequence for execution on the second processor type, wherein the third machine instruction sequence contains the second instruction. In different specific embodiments, the specific machine instructions included in the third machine instruction sequence may be different.
[0052] In a specific example, the third machine instruction sequence could be:
[0053] “adds w8,w0,w1 // Addition instruction with overflow check
[0054] b.vs $overflowLabel”
[0055] Here, "adds" represents the machine instruction that performs a 32 / 64-bit addition operation on w0 and w1 and checks for overflow during the addition operation. "b.vs $overflowLabel" means that if the addition operation causes an overflow, the process will jump to the location of $overflowLabel for handling.
[0056] In some scenarios, the machine instruction set of the processor type (e.g., ARM64) upon which the machine code is based may not include specific machine instructions capable of performing integer addition with specific parameter types (e.g., signed or unsigned 128-bit integers (i128 / u128)) and performing overflow checks. However, an efficient sequence of machine instructions can be built from instructions that perform integer addition with other parameter types (e.g., signed or unsigned 64-bit integers (i64 / u64)) and perform overflow checks. Therefore, in one implementation, the target statement may be a statement for 128-bit integer addition. The processor type may include a second processor type, and the sequence of machine instructions includes a fourth sequence of machine instructions for execution on the second processor type. The fourth machine instruction sequence can be used to perform the following process: determining a carry flag by summing the lower 64 bits of the 128-bit first parameter and the lower 64 bits of the 128-bit second parameter; determining an overflow flag by summing the higher 64 bits of the first parameter, the higher 64 bits of the second parameter, and the carry flag; and determining whether the sum of the first and second parameters overflows based on the overflow flag. In different specific embodiments, the specific machine instructions included in the fourth machine instruction sequence may differ.
[0057] In a specific example, the fourth machine instruction sequence could be:
[0058] “adds x8,x0,x2
[0059] adcs x9,x1,x3
[0060] b.vs $overflowLabel”
[0061] Here, `adds x8,x0,x2` represents the summation operation on the lower 64 bits of the first parameter and the lower 64 bits of the second parameter, which also determines the carry flag. `adcs x9,x1,x3` represents the summation operation on the higher 64 bits of the first parameter, the higher 64 bits of the second parameter, and the carry flag, which also determines the overflow flag. `b.vs $overflowLabel` indicates that if the overflow flag is set, the sum of the first and second parameters is determined to have overflowed, and the process jumps to $overflowLabel.
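The carry-then-overflow process for signed 128-bit addition can be sketched in C++ with two 64-bit halves. The struct I128 and the function add_i128 are hypothetical names introduced for this illustration, and the flags are modeled in software rather than read from a processor.

```cpp
#include <cstdint>

// Hypothetical 128-bit signed value split into halves; hi carries the sign.
struct I128 {
    std::uint64_t lo;
    std::int64_t hi;
};

bool add_i128(const I128& a, const I128& b, I128* out) {
    const std::uint64_t lo = a.lo + b.lo;              // adds x8, x0, x2
    const std::uint64_t carry = (lo < a.lo) ? 1u : 0u; // modeled carry flag
    const std::uint64_t ah = static_cast<std::uint64_t>(a.hi);
    const std::uint64_t bh = static_cast<std::uint64_t>(b.hi);
    const std::uint64_t hi = ah + bh + carry;          // adcs x9, x1, x3
    // Modeled overflow flag: both high halves share a sign that the
    // resulting high half does not.
    const bool overflow = (((ah ^ hi) & (bh ^ hi)) >> 63) != 0;
    if (overflow) return false;                        // b.vs $overflowLabel
    out->lo = lo;
    out->hi = static_cast<std::int64_t>(hi);
    return true;
}
```

Only the high-half addition decides overflow; the low-half addition contributes solely through the carry.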
[0062] In addition to integer addition, integer arithmetic also includes integer multiplication. In different implementations, different machine instruction sequences can be preset for integer multiplication based on processor type and different parameter types. In one implementation, the target statement can be a statement for integer multiplication. The processor type can include a first processor type (e.g., x64), and the machine instruction set corresponding to the first processor type includes a third instruction for performing integer multiplication and applying overflow checks to the integer multiplication. The machine instruction sequence includes a fifth machine instruction sequence for execution on the first processor type, and the fifth machine instruction sequence contains the third instruction. In different specific embodiments, the specific machine instructions included in the fifth machine instruction sequence can be different.
[0063] In a specific example, the fifth machine instruction sequence could be:
[0064] “imul %edi,%esi
[0065] jo $overflowLabel”
[0066] Here, imul represents the machine instruction to perform a multiplication operation on signed %edi and %esi and to perform an overflow check on the multiplication operation, and jo $overflowLabel means that if the multiplication operation causes an overflow, the process will jump to the location of $overflowLabel for handling.
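A software model of this checked signed 32-bit multiplication is sketched below in C++. The function name mul_i32_checked is hypothetical, and the 64-bit widening is a stand-in for the overflow flag that the processor sets as part of the multiplication.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: the product is computed in 64 bits, where it
// cannot overflow, and then tested against the 32-bit range.
bool mul_i32_checked(std::int32_t a, std::int32_t b, std::int32_t* out) {
    const std::int64_t wide = static_cast<std::int64_t>(a) * b;
    if (wide > std::numeric_limits<std::int32_t>::max() ||
        wide < std::numeric_limits<std::int32_t>::min()) {
        return false;  // models the jump taken by jo $overflowLabel
    }
    *out = static_cast<std::int32_t>(wide);
    return true;
}
```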
[0067] In another implementation, the target statement can be a statement for signed integer multiplication of a predetermined first number of bits, where the first number of bits is 8 or 16. The processor type can include a second processor type (e.g., ARM64), and the machine instruction sequence includes a sixth machine instruction sequence for execution on the second processor type. The sixth machine instruction sequence can be used to perform the following process: extending a signed first parameter, which has the first number of bits, to 32 bits; extending a signed second parameter, which has the first number of bits, to 32 bits; performing a product operation on the extended first and second parameters to obtain a 32-bit product result; performing an extension operation on the low-order first number of bits of the product result to obtain a 32-bit extended result; and determining whether the product of the first and second parameters overflows based on whether the product result and the extended result are the same. In different specific embodiments, the specific machine instructions included in the sixth machine instruction sequence can be different.
[0068] In a specific example, the sixth machine instruction sequence could be:
[0069] “sxth w8,w1
[0070] sxth w9,w0
[0071] mul w8,w9,w8
[0072] sxth w9,w8
[0073] cmp w9,w8
[0074] b.ne $overflowLabel”
[0075] Here, "sxth w8,w1" means sign-extending w1 to 32 bits and storing the result in w8. "sxth w9,w0" means sign-extending w0 to 32 bits and storing the result in w9. "mul w8,w9,w8" means multiplying w9 by w8 and storing the product in w8. "sxth w9,w8" followed by "cmp w9,w8" means sign-extending the lower 8 / 16 bits of w8 (determined by the bit width of w0 and w1; sxth, as shown, extends the lower 16 bits) to a 32-bit result stored in w9, and then comparing w9 with w8. "b.ne $overflowLabel" means that if w9 and w8 are not the same, execution jumps to the location of $overflowLabel for overflow handling.
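The extend, multiply, re-extend, and compare process can be checked with a small Python model. This is a sketch under stated assumptions: the helper names sign_extend and mul_signed_narrow_overflows are hypothetical, and the bits argument stands in for the choice between an 8-bit and a 16-bit extension instruction.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul_signed_narrow_overflows(a, b, bits=16):
    """Return True if a * b does not fit in a signed integer of `bits` bits."""
    wa = sign_extend(a, bits)               # extend first parameter to 32 bits
    wb = sign_extend(b, bits)               # extend second parameter to 32 bits
    product = sign_extend(wa * wb, 32)      # 32-bit product result
    narrowed = sign_extend(product, bits)   # re-extend the low-order bits
    return product != narrowed              # mismatch means overflow
```

For example, mul_signed_narrow_overflows(200, 200) reports an overflow because 40000 exceeds the signed 16-bit maximum of 32767, while 181 * 181 = 32761 does not.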
[0076] In another embodiment, the target statement can be a statement for unsigned integer multiplication of a predetermined first bit width, where the first bit width is 8 bits or 16 bits. The processor type can include a second processor type (e.g., ARM64), and the machine instruction sequence includes a seventh machine instruction sequence for execution on the second processor type. The seventh machine instruction sequence performs the following process: zero-extending an unsigned first parameter of the first bit width to 32 bits, zero-extending an unsigned second parameter of the first bit width to 32 bits, and multiplying the extended first and second parameters to obtain a 32-bit product result; and determining whether the product of the first and second parameters overflows based on whether the high 16 bits of the product result are all 0. In different specific embodiments, the specific machine instructions included in the seventh machine instruction sequence may differ.
[0077] In a specific example, the seventh machine instruction sequence could be:
[0078] “and w8,w1,#0xffff
[0079] and w9,w0,#0xffff
[0080] mul w8,w9,w8
[0081] tst w8,#0xffff0000
[0082] b.ne $overflowLabel”
[0083] Here, "and w8,w1,#0xffff" zero-extends the unsigned w1 to 32 bits and stores the result in w8, "and w9,w0,#0xffff" zero-extends the unsigned w0 to 32 bits and stores the result in w9, and "mul w8,w9,w8" multiplies the extended w0 and w1, producing a 32-bit product stored in w8. "tst w8,#0xffff0000" checks whether the high 16 bits of w8 are all 0. "b.ne $overflowLabel" jumps to the location of $overflowLabel for overflow handling if the high 16 bits of w8 are not all 0.
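As a rough illustration of the 16-bit case, the mask, multiply, and test steps can be expressed in Python as follows. The function name mul_u16_overflows is hypothetical and the code is a model of the arithmetic only, not of the emitted instructions.

```python
def mul_u16_overflows(a, b):
    """Return True if the product of two unsigned 16-bit values needs more than 16 bits."""
    wa = a & 0xFFFF                      # zero-extend first parameter
    wb = b & 0xFFFF                      # zero-extend second parameter
    product = (wa * wb) & 0xFFFFFFFF     # 32-bit product result
    return (product & 0xFFFF0000) != 0   # any high bit set means overflow
```

Because the largest possible product, 65535 * 65535, still fits in 32 bits, testing the high 16 bits is sufficient to detect every overflow.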
[0084] In another implementation, the target statement can be a statement for signed 32-bit integer multiplication. The processor type can include a second processor type (e.g., ARM64), and the machine instruction sequence includes an eighth machine instruction sequence for execution on the second processor type. The eighth machine instruction sequence performs the following process: determining a signed 64-bit first value based on the product of a signed 32-bit first parameter and a signed 32-bit second parameter; logically right-shifting the first value by 32 bits to obtain a second value; and determining whether the product between the first and second parameters overflows based on whether the value obtained after arithmetic right-shifting the first value by 31 bits is equal to the second value. In different specific embodiments, the specific machine instructions included in the eighth machine instruction sequence may differ.
[0085] In a specific example, the eighth machine instruction sequence could be:
[0086] “smull x8,w0,w1
[0087] lsr x9,x8,#32
[0088] cmp w9,w8,asr#31
[0089] b.ne $overflowLabel”
[0090] Here, `smull x8,w0,w1` computes the product of the signed 32-bit values `w0` and `w1` and stores it in the 64-bit register `x8`. `lsr x9,x8,#32` followed by `cmp w9,w8,asr#31` means logically right-shifting `x8` by 32 bits to obtain `x9`, and then determining whether the value obtained by arithmetically right-shifting `w8` (the low 32 bits of `x8`) by 31 bits is equal to `w9`. `b.ne $overflowLabel` indicates that if the value obtained by arithmetically right-shifting `w8` by 31 bits is not equal to `w9`, execution jumps to the location of `$overflowLabel` for overflow handling.
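The reasoning behind this check is that a 64-bit product fits in 32 signed bits exactly when its high 32 bits are a copy of the sign bit of its low 32 bits. A minimal Python sketch of that comparison, with the hypothetical name mul_s32_overflows and with variables named after the registers in the example, might look like this:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mul_s32_overflows(a, b):
    """Return True if a * b does not fit in a signed 32-bit integer.

    Assumption: a and b are already in the signed 32-bit range.
    """
    x8 = (a * b) & MASK64                    # 64-bit product as a bit pattern
    w9 = (x8 >> 32) & MASK32                 # high 32 bits after the logical shift
    w8 = x8 & MASK32                         # low 32 bits of the product
    # Arithmetic shift by 31 fills all 32 bits with the sign bit of w8.
    sign_fill = MASK32 if w8 & 0x80000000 else 0
    return w9 != sign_fill                   # mismatch means overflow
```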
[0091] In one implementation, the target statement can be a statement for signed 64-bit integer multiplication. The processor type can include a second processor type (e.g., ARM64), and the machine instruction sequence can include a ninth machine instruction sequence for execution on the second processor type. The ninth machine instruction sequence is used to perform the following processes: determining a first signed 64-bit value based on the product of a first signed 64-bit parameter and a second signed 64-bit parameter; determining a third signed 64-bit value based on the high 64 bits of the 128-bit product between the first and second signed 64-bit parameters; and determining whether the product between the first and second parameters overflows based on whether the value obtained by arithmetically right-shifting the first value by 63 bits is equal to the third value. In different specific embodiments, the specific machine instructions included in the ninth machine instruction sequence can be different.
[0092] In a specific example, the ninth machine instruction sequence could be:
[0093] “mul x8,x0,x1
[0094] smulh x9,x0,x1
[0095] cmp x9,x8,asr#63
[0096] b.ne $overflowLabel”
[0097] Here, `mul x8,x0,x1` calculates the 64-bit product of the signed 64-bit x0 and the signed 64-bit x1, and stores the result in the 64-bit x8. `smulh x9,x0,x1` followed by `cmp x9,x8,asr#63` calculates the high 64 bits of the 128-bit product of the signed 64-bit x0 and the signed 64-bit x1, stores the result in x9, and then compares the value obtained by arithmetically right-shifting x8 by 63 bits with x9. `b.ne $overflowLabel` indicates that if the value obtained by arithmetically right-shifting x8 by 63 bits is not equal to x9, execution jumps to the location of `$overflowLabel` for overflow handling.
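The same sign-fill idea applies at 64 bits: the high half of the 128-bit product must equal the sign bit of the low half replicated 64 times. The following Python model is a sketch only; the name mul_s64_overflows is hypothetical, and Python's unbounded integers stand in for the 128-bit product.

```python
MASK64 = (1 << 64) - 1


def mul_s64_overflows(a, b):
    """Return True if a * b does not fit in a signed 64-bit integer.

    Assumption: a and b are already in the signed 64-bit range.
    """
    full = a * b                         # exact product, at most 128 bits
    x8 = full & MASK64                   # low 64 bits of the product
    x9 = (full >> 64) & MASK64           # high 64 bits of the signed product
    # Arithmetic shift by 63 fills all 64 bits with the sign bit of x8.
    sign_fill = MASK64 if x8 >> 63 else 0
    return x9 != sign_fill               # mismatch means overflow
```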
[0098] In another implementation, the target statement can be a statement for unsigned 32-bit integer multiplication. The processor type includes a second processor type (e.g., ARM64), and the machine instruction sequence includes a tenth machine instruction sequence for execution on the second processor type. The tenth machine instruction sequence performs the following process: determining an unsigned 64-bit first value based on the product of an unsigned 32-bit first parameter and an unsigned 32-bit second parameter; and determining whether the product between the first and second parameters overflows based on whether all bits after logically shifting the first value 32 bits to the right are all 0. In different specific embodiments, the specific machine instructions included in the tenth machine instruction sequence may differ.
[0099] In a specific example, the tenth machine instruction sequence could be:
[0100] “umull x8,w0,w1
[0101] cmp xzr,x8,lsr#32
[0102] b.ne $overflowLabel”
[0103] Here, "umull x8,w0,w1" means computing the full 64-bit product of the unsigned 32-bit integers w0 and w1 and storing the result in the 64-bit x8. "cmp xzr,x8,lsr#32" means determining whether all bits are 0 after logically shifting x8 32 bits to the right. "b.ne $overflowLabel" means that if the bits are not all 0 after logically shifting x8 32 bits to the right, execution jumps to the location of $overflowLabel for overflow handling.
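In arithmetic terms, the check asks whether anything remains above bit 31 of the widened product. A short Python sketch, using the hypothetical name mul_u32_overflows, is:

```python
def mul_u32_overflows(a, b):
    """Return True if the product of two unsigned 32-bit values needs more than 32 bits."""
    x8 = (a & 0xFFFFFFFF) * (b & 0xFFFFFFFF)   # full 64-bit product
    return (x8 >> 32) != 0                     # nonzero high half means overflow
```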
[0104] In one implementation, the target statement can be a statement for unsigned 64-bit integer multiplication. The processor type includes a second processor type (e.g., ARM64), and the machine instruction sequence includes an eleventh machine instruction sequence for execution on the second processor type. The eleventh machine instruction sequence performs the following process: determining a first value based on the high 64 bits of the 128-bit product of an unsigned 64-bit first parameter and an unsigned 64-bit second parameter; and determining whether the product between the first and second parameters overflows based on whether all bits of the first value are set to 0. In different specific embodiments, the specific machine instructions included in the eleventh machine instruction sequence may differ.
[0105] In a specific example, the eleventh machine instruction sequence could be:
[0106] “umulh x8,x0,x1
[0107] cmp xzr,x8
[0108] b.ne $overflowLabel”
[0109] Here, "umulh x8,x0,x1" followed by "cmp xzr,x8" means calculating the 128-bit product of the unsigned 64-bit integers x0 and x1, storing the high 64 bits of the result in the 64-bit x8, and then checking whether all bits of x8 are 0. "b.ne $overflowLabel" indicates that if the bits of x8 are not all 0, execution jumps to the location of $overflowLabel for overflow handling.
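The 64-bit unsigned case follows the same pattern with only the high half of the product being computed. As a hedged sketch (the name mul_u64_overflows is hypothetical):

```python
MASK64 = (1 << 64) - 1


def mul_u64_overflows(a, b):
    """Return True if the product of two unsigned 64-bit values needs more than 64 bits."""
    x8 = ((a & MASK64) * (b & MASK64)) >> 64   # high 64 bits of the 128-bit product
    return x8 != 0                             # nonzero high half means overflow
```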
[0110] According to another embodiment of this specification, a corresponding machine code generation method is also provided. Figure 5 is a flowchart of a machine code generation method according to an embodiment of this specification. As shown in Figure 5, the method includes at least the following steps:
[0111] S501: Obtain the target bytecode obtained by the method shown in Figure 1;
[0112] S503: Generate target machine code based on the target bytecode and target processor type, wherein the target machine code includes a sequence of machine instructions determined according to the bytecode instructions described in the method shown in Figure 1.
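Steps S501 and S503 can be pictured as a table lookup keyed by bytecode instruction and processor type. The sketch below is an assumption-laden illustration: the bytecode instruction names ("i32.mul_checked", "u64.mul_checked"), the processor type strings, and the function generate_machine_code are all hypothetical, while the instruction sequences are the examples given above.

```python
# Hypothetical table: (bytecode instruction, processor type) -> preset sequence.
PRESET_SEQUENCES = {
    ("i32.mul_checked", "x64"): [
        "imul %edi,%esi",
        "jo $overflowLabel",
    ],
    ("i32.mul_checked", "arm64"): [
        "smull x8,w0,w1",
        "lsr x9,x8,#32",
        "cmp w9,w8,asr#31",
        "b.ne $overflowLabel",
    ],
    ("u64.mul_checked", "arm64"): [
        "umulh x8,x0,x1",
        "cmp xzr,x8",
        "b.ne $overflowLabel",
    ],
}


def generate_machine_code(bytecode, processor_type):
    """Expand each bytecode instruction into its preset machine instruction sequence."""
    machine_code = []
    for instruction in bytecode:
        key = (instruction, processor_type)
        if key not in PRESET_SEQUENCES:
            raise ValueError(f"no preset sequence for {key}")
        machine_code.extend(PRESET_SEQUENCES[key])
    return machine_code
```

Keeping the sequences in a table keyed by processor type reflects the idea that one bytecode instruction can expand differently on each target; a real code generator would also handle register allocation, which is omitted here.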
[0113] According to another embodiment, a bytecode generation apparatus is also provided. Figure 6 is an architectural diagram of a bytecode generation apparatus according to an embodiment of this specification. As shown in Figure 6, the apparatus 600 includes:
[0114] The first acquisition unit 61 is configured to acquire target source code and apply a first processing to the target source code, wherein the first processing includes: replacing the target statement used for integer arithmetic in the target source code with a preset target function;
[0115] The first generation unit 62 is configured to generate target bytecode based on the first processed target source code. The target bytecode includes bytecode instructions generated according to the preset function. The bytecode instructions indicate that integer operations corresponding to the target statement are performed through a preset machine instruction sequence, as well as overflow checks for the integer operations.
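The first processing performed by the first acquisition unit, replacing arithmetic statements with preset functions, can be illustrated with a source-to-source rewrite. The sketch below uses Python's own ast module purely as a stand-in for the target source language, which this specification does not tie to Python; the preset function names checked_add and checked_mul and the helper first_processing are hypothetical, and no type information is consulted, so every + and * is rewritten.

```python
import ast


class CheckedArithmeticRewriter(ast.NodeTransformer):
    """Replace a + b and a * b with calls to hypothetical preset functions."""

    PRESET_FUNCTIONS = {ast.Add: "checked_add", ast.Mult: "checked_mul"}

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested operations first
        name = self.PRESET_FUNCTIONS.get(type(node.op))
        if name is None:
            return node           # leave other operators untouched
        return ast.Call(
            func=ast.Name(id=name, ctx=ast.Load()),
            args=[node.left, node.right],
            keywords=[],
        )


def first_processing(source):
    """Return the source text with arithmetic statements replaced."""
    tree = CheckedArithmeticRewriter().visit(ast.parse(source))
    return ast.unparse(tree)      # requires Python 3.9 or later
```

After this rewrite, a later compilation stage can recognize each call to a preset function and emit a single dedicated bytecode instruction for it.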
[0116] According to another embodiment, a machine code generation apparatus is also provided. Figure 7 is an architectural diagram of a machine code generation apparatus according to an embodiment of this specification. As shown in Figure 7, the apparatus 700 includes:
[0117] The second acquisition unit 71 is configured to acquire the target bytecode obtained by the method shown in Figure 1.
[0118] The second generation unit 72 is configured to generate target machine code based on the target bytecode and the target processor type, wherein the target machine code includes a sequence of machine instructions determined according to the bytecode instructions in the method shown in Figure 1.
[0119] This specification also provides a computing device, including: a processor; and a memory storing a program, wherein when the processor executes the program, the following operations are performed: acquiring target source code; applying a first processing to the target source code, the first processing including: replacing target statements in the target source code used for integer arithmetic with a preset target function; generating target bytecode based on the first-processed target source code, the target bytecode including bytecode instructions generated according to the preset function, the bytecode instructions instructing the execution of integer arithmetic corresponding to the target statements through a preset machine instruction sequence, and overflow checking for the integer arithmetic.
[0120] This specification also provides a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform any of the methods described above.
[0121] This specification also provides a computer program product, including a computer program / instructions that, when executed by a processor, implement any of the methods described above.
[0122] In the 1990s, improvements to a technology could be clearly distinguished as either hardware improvements (e.g., improvements to the circuit structure of diodes, transistors, switches, etc.) or software improvements (improvements to the methodology). However, with technological advancements, many methodological improvements today can be considered direct improvements to the hardware circuit structure. Designers almost always obtain the corresponding hardware circuit structure by programming the improved methodology into the hardware circuit. Therefore, it cannot be said that a methodological improvement cannot be implemented using hardware physical modules. For example, a Programmable Logic Device (PLD) (such as a Field Programmable Gate Array (FPGA)) is such an integrated circuit whose logic function is determined by the user programming the device. Designers can program and "integrate" a digital system onto a PLD themselves, without needing chip manufacturers to design and manufacture dedicated integrated circuit chips. Furthermore, nowadays, instead of manually manufacturing integrated circuit chips, this programming is mostly implemented using "logic compiler" software. Similar to the software compiler used in program development, the original code before compilation must also be written in a specific programming language, called a Hardware Description Language (HDL). There are many HDLs, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language). Currently, the most commonly used are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. 
Those skilled in the art should also understand that by simply performing some logic programming on the method flow using one of these hardware description languages and programming it into an integrated circuit, the hardware circuit implementing the logical method flow can be easily obtained.
[0123] The controller can be implemented in any suitable manner. For example, it can take the form of a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, and embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory. Those skilled in the art will also recognize that, in addition to implementing the controller in purely computer-readable program code form, the same functionality can be achieved by logically programming the method steps to make the controller take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers. Therefore, such a controller can be considered a hardware component, and the means included therein for implementing various functions can also be considered as structures within the hardware component. Alternatively, the means for implementing various functions can be considered as both software modules implementing the method and structures within the hardware component.
[0124] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or physical entities, or by products with certain functions. A typical implementation device is a server system. Of course, this application does not exclude the possibility that, with the future development of computer technology, the computer implementing the functions of the above embodiments can be, for example, a personal computer, a laptop computer, an in-vehicle human-machine interaction device, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.
[0125] While one or more embodiments of this specification provide the operational steps of the methods described in the embodiments or flowcharts, more or fewer operational steps may be included based on conventional or non-inventive means. The order of steps listed in the embodiments is merely one possible order of execution among many steps and does not represent the only possible order. In actual device or end product execution, the methods shown in the embodiments or drawings may be executed sequentially or in parallel (e.g., in a parallel processor or multi-threaded processing environment, or even a distributed data processing environment). The terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, product, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, product, or apparatus. Without further limitations, the presence of other identical or equivalent elements in the process, method, product, or apparatus that includes the elements is not excluded. For example, the use of terms such as "first," "second," etc., is to denote names and does not indicate any particular order.
[0126] For ease of description, the above devices are described in terms of function, divided into various modules. Of course, when implementing one or more of these specifications, the functions of each module can be implemented in one or more software and / or hardware components, or a module that performs the same function can be implemented by a combination of multiple sub-modules or sub-units. The device embodiments described above are merely illustrative. For example, the division of units is only a logical functional division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces, indirect coupling or communication connection between devices or units, and may be electrical, mechanical, or other forms.
[0127] This invention is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more blocks of the flowchart illustrations and / or one or more blocks of the block diagrams.
[0128] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flowcharts and / or one or more block diagrams.
[0129] These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions, which execute on the computer or other programmable apparatus, provide steps for implementing the functions specified in one or more flowcharts and / or one or more block diagrams.
[0130] In a typical configuration, a computing device includes one or more processors (CPU), input / output interfaces, network interfaces, and memory.
[0131] Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.
[0132] Computer-readable media includes both permanent and non-permanent, removable and non-removable media that can store information by any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage, graphene storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
[0133] Those skilled in the art will understand that one or more embodiments of this specification can be provided as a method, system, or computer program product. Therefore, one or more embodiments of this specification may take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0134] One or more embodiments of this specification can be described in the general context of computer-executable instructions, such as program modules, that are executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform a particular task or implement a particular abstract data type. One or more embodiments of this specification can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.
[0135] The various embodiments in this specification are described in a progressive manner. Similar or identical parts between embodiments can be referred to mutually. Each embodiment focuses on describing the differences from other embodiments. In particular, system embodiments are basically similar to method embodiments, so the description is relatively simple; relevant parts can be referred to the descriptions in the method embodiments. In the description of this specification, the terms "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., refer to specific features, structures, materials, or characteristics described in connection with that embodiment or example, which are included in at least one embodiment or example of this specification. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described can be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification and the features of different embodiments or examples.
[0136] The above description is merely an embodiment of one or more embodiments of this specification and is not intended to limit the scope of these embodiments. Various modifications and variations can be made to these embodiments by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this specification should be included within the scope of the claims.
Claims
1. A method for generating bytecode, comprising: Obtain the target source code, and apply a first process to the target source code, the first process including: replacing the target statement used for integer arithmetic in the target source code with a preset target function; Based on the first processed target source code, target bytecode is generated. The target bytecode includes bytecode instructions generated according to the preset function. The bytecode instructions indicate that integer operations corresponding to the target statement are performed through a preset machine instruction sequence, as well as overflow checks for the integer operations.
2. The method according to claim 1, wherein, The preset machine instruction sequence includes: a machine instruction sequence for performing integer operations on a preset processor type, for the integer operation type and operation parameter type corresponding to the target statement, and for performing overflow checks on the integer operations.
3. The method according to claim 1, wherein, The target statement is a statement used for integer addition; The processor type includes a first processor type, and the machine instruction set corresponding to the first processor type includes a first instruction for performing integer addition and applying an overflow check to the integer addition. The machine instruction sequence includes a first machine instruction sequence for execution on the first processor type, and the first machine instruction sequence contains the first instruction.
4. The method according to claim 2, wherein, The target statement is a statement for integer addition of a predetermined first bit width, which is 8 bits or 16 bits; The processor type includes a second processor type, and the machine instruction sequence includes a second machine instruction sequence for execution on the second processor type; The second machine instruction sequence is used to perform the following process: extend a first parameter of the first bit width to 32 bits, extend a second parameter of the first bit width to 32 bits, apply a summation operation to the extended first parameter and the extended second parameter to obtain a 32-bit summation result; apply an extension operation to the low-order bits (as many bits as the first bit width) of the summation result to obtain a 32-bit extended result, and determine whether the sum of the first parameter and the second parameter overflows based on whether the summation result and the extended result are the same.
5. The method according to claim 2, wherein, The target statement is a statement for integer addition of a predetermined second bit width, which is 32 bits or 64 bits; The processor type includes a second processor type, and the machine instruction set corresponding to the second processor type includes a second instruction for performing integer addition of the second bit width and applying an overflow check to the integer addition of the second bit width. The machine instruction sequence includes a third machine instruction sequence for execution on the second processor type, and the third machine instruction sequence contains the second instruction.
6. The method according to claim 2, wherein, The target statement is a statement used for 128-bit integer addition; The processor type includes a second processor type, and the machine instruction sequence includes a fourth machine instruction sequence for execution on the second processor type; The fourth machine instruction sequence is used to execute the following process: determining a carry flag by summing the lower 64 bits of a 128-bit first parameter and the lower 64 bits of a 128-bit second parameter; determining an overflow flag by summing the higher 64 bits of the first parameter, the higher 64 bits of the second parameter, and the carry flag; and determining whether the sum of the first parameter and the second parameter overflows based on the overflow flag.
7. The method according to claim 2, wherein, The target statement is a statement used for integer multiplication; The processor type includes a first processor type, and the machine instruction set corresponding to the first processor type includes a third instruction for performing integer multiplication and applying overflow checks to the integer multiplication. The machine instruction sequence includes a fifth machine instruction sequence for execution on the first processor type, and the fifth machine instruction sequence contains the third instruction.
8. The method according to claim 2, wherein, The target statement is a statement for signed integer multiplication of a predetermined first bit width, where the first bit width is 8 or 16 bits. The processor type includes a second processor type, and the machine instruction sequence includes a sixth machine instruction sequence for execution on the second processor type; The sixth machine instruction sequence is used to execute the following process: extend a signed first parameter of the first bit width to 32 bits, extend a signed second parameter of the first bit width to 32 bits, and apply a product operation to the extended first parameter and the extended second parameter to obtain a 32-bit product result. An extension operation is performed on the low-order bits (as many bits as the first bit width) of the product result to obtain a 32-bit extended result. Based on whether the product result and the extended result are the same, it is determined whether the product of the first parameter and the second parameter overflows.
9. A machine code generation method, comprising: Obtain the target bytecode obtained by the method described in claim 1; Based on the target bytecode and the target processor type, a target machine code is generated, wherein the target machine code includes a sequence of machine instructions determined by the bytecode instructions according to claim 1.
10. A computer device, comprising: a processor; and a memory containing a program, wherein when the processor executes the program, the following operations are performed: Obtain the target source code, and apply a first process to the target source code, the first process including: replacing the target statement used for integer arithmetic in the target source code with a preset target function; Based on the first processed target source code, target bytecode is generated. The target bytecode includes bytecode instructions generated according to the preset function. The bytecode instructions indicate that integer operations corresponding to the target statement are performed through a preset machine instruction sequence, as well as overflow checks for the integer operations.