A method and system for smart contract fuzzing
By employing a smart contract fuzzing method based on genetic algorithms and hybrid strategies, the problems of state dependency and block information dependency are solved, achieving efficient and accurate smart contract vulnerability detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL
- Filing Date
- 2022-11-18
- Publication Date
- 2026-06-19
AI Technical Summary
Existing fuzz testing methods for smart contracts have failed to effectively address the issues of state dependency and block information dependency, resulting in poor testing results.
A smart contract fuzz testing method based on genetic algorithms and hybrid strategies is adopted. When generating test cases, block information data and transaction execution data are considered. Fitness is calculated through key data flow analysis and combined with symbolic execution methods to improve coverage and efficiency.
It improves the efficiency and accuracy of smart contract fuzz testing: it can generate test cases that explore unknown branches while retaining test cases that influence the contract state, and it uses a hybrid strategy to quickly generate high-coverage test cases.
Figure CN115794625B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of blockchain technology, and in particular to a method and system for fuzz testing of smart contracts.
Background Technology
[0002] As an emerging field, smart contract vulnerability detection largely follows the principles of traditional software vulnerability detection methods, with optimizations and improvements based on the characteristics of Ethereum smart contracts. Currently, mainstream smart contract vulnerability detection methods are based on symbolic execution or fuzzing. In recent years, the field has gradually transitioned from traditional methods to deep learning, machine learning, and multi-domain approaches. Unlike traditional vulnerability detection methods such as static and dynamic analysis, novel detection methods that draw on knowledge from other fields or integrate multiple technologies can effectively improve the efficiency and accuracy of smart contract vulnerability detection.
[0003] Current fuzzing techniques often fail to achieve the desired results when fuzzing smart contracts. There are significant differences between fuzzing traditional software and fuzzing smart contracts. When fuzzing smart contracts, the generation of test cases faces two main challenges: state dependency and block information dependency. A smart contract is essentially a transaction-driven state machine. Each transaction executes a smart contract, potentially changing its internal state, including the actual values of variables in its storage area and the contract's Ether value. Different transaction sequences can also lead to different final states, and exposing some vulnerabilities often requires the contract to execute specific transaction sequences, further posing a challenge to fuzzing. Furthermore, smart contract execution sometimes requires additional input related to the underlying blockchain protocol, such as the current block timestamp or block number. Therefore, fuzzing smart contracts should also consider block information beyond transaction information. Current technologies rarely address these issues, and significant improvements are needed.
Summary of the Invention
[0004] To address the aforementioned problems, this invention provides a method and system for fuzz testing of smart contracts. Specifically, it relates to a smart contract fuzz testing method based on genetic algorithms and hybrid strategies, which solves the two problems of state dependency and block information dependency in existing smart contract fuzz testing.
[0005] A first aspect of the present invention provides a method for fuzz testing of smart contracts, the method comprising the following steps:
[0006] Compile the smart contract source code into EVM runtime bytecode and obtain the application binary interface (ABI) of the smart contract source code;
[0007] A fuzz testing engine based on genetic algorithms is used to generate test cases based on ABI information. Specifically, the fuzz testing engine generates an initial population based on the ABI; the fuzz testing engine selects, crosses over, and mutates the initial population based on genetic algorithms to generate new individuals, which are the test cases. Multiple test cases form a new population.
[0008] Execute test cases using the contract execution monitor, specifically including: performing code coverage and key data flow analysis on the test cases, and returning the code coverage and key data flow information to the fuzz testing engine;
[0009] The fuzzing engine continuously repeats selection, crossover, and mutation on each new population, generating a large number of individuals. The contract execution monitor continuously executes test cases until at least one of two termination conditions is met: a given number of population generations has been generated, or a given fuzzing time has been reached.
[0010] The contract execution monitor outputs information about the detected vulnerabilities and provides the actual exploit path.
[0011] Before performing crossover and mutation on test cases, the fuzz testing engine first calculates the fitness of each individual in the population based on key data flow information, and then selects individuals based on the fitness.
[0012] A further technical solution of the present invention is as follows: the test case is represented as a series of transaction inputs, each transaction including block information data and transaction execution data. The block information data includes the current timestamp and block number; the transaction execution data includes the address of the sending account, the amount of Ether sent by the user, the maximum Gas consumption during contract execution, and the input data for contract execution; wherein the input data for contract execution includes a function selector and the parameter types and actual values required by the function selector. The function selector is obtained through the ABI, and the function selector identifies each function in the smart contract.
[0013] A further technical solution of the present invention is as follows: in the process of generating the initial population according to the ABI, the fuzz testing engine generates a population consisting of N test cases, each test case containing only a single transaction input, and the function to be executed by each test case is selected from all smart contract functions in a cyclical manner; for each parameter value in the smart contract function, the fuzz testing engine generates it according to the parameter type in the smart contract ABI.
[0014] A further technical solution of the present invention is: when there is no significant increase in smart contract code coverage in the past M rounds of population iteration, the fuzz testing engine will regenerate the initial population.
[0015] A further technical solution of the present invention is: the step of calculating the fitness of each individual in the population based on key data flow information and selecting individuals based on the fitness specifically includes:
[0016] The formula for calculating the fitness of test case x is:
[0017] $\mathrm{Fitness}(x)=\mathrm{Fitness}_{\mathrm{coverage}}(x)+\mathrm{Fitness}_{\mathrm{write}}(x)$
[0018] where $\mathrm{Fitness}_{\mathrm{coverage}}(x)$ represents the ability of test case x to explore new branches, and $\mathrm{Fitness}_{\mathrm{write}}(x)$ represents the ability of test case x to change the state of the contract;
[0019] After calculating the fitness value for each test case in the population, the fuzzing engine selects the first test case based on the fitness value, choosing the individual with the highest fitness value as the basis for crossover and mutation. For the selection of the second test case, the fuzzing engine selects a test case that has a data dependency relationship with the first test case based on the execution information of the key data flow fed back by the contract execution monitor. If there is no test case that has a data dependency relationship with the first test case, the fuzzing engine then selects one based on the fitness value.
[0020] A further technical solution of the present invention is as follows: for the crossover of two test cases, if there is a data dependency between them, the test case that performs the write operation is placed first, the test case that performs the read operation is placed second, and the two are spliced together to form a new test case. The fuzz testing engine sets a maximum transaction execution sequence length L, and the test cases are merged only when the length of the newly generated test case is less than or equal to L.
[0021] A further technical solution of the present invention is: after the crossover is completed, the fuzzing engine randomly modifies each data item of the test case with the mutation probability to create a new individual, wherein each mutated value either replaces the original value with a random value or replaces it with a value from the mutation cache pool. The mutation cache pool records high-frequency values and high-fitness values observed or learned during past contract executions.
[0022] A further technical solution of the present invention is that the method also integrates a symbolic execution smart contract vulnerability detection method. The specific integration method includes: alternating execution of symbolic execution and fuzzing to generate test cases, or introducing probability parameters, and selecting whether to use the symbolic execution method or the fuzzing method according to a certain probability distribution each time test cases are generated.
[0023] A further technical solution of the present invention is that the contract execution monitor also includes a vulnerability detector for identifying vulnerability patterns in smart contract code.
[0024] A second aspect of the present invention is a smart contract fuzz testing system, comprising an ABI generation unit, a fuzz testing engine based on a genetic algorithm, and a contract execution monitor;
[0025] The ABI generation unit is used to compile the smart contract source code into EVM runtime bytecode and obtain the application binary interface (ABI) of the smart contract source code.
[0026] The fuzz testing engine based on genetic algorithms is used to generate test cases based on ABI information. Specifically, the fuzz testing engine generates an initial population based on the ABI; the fuzz testing engine selects, crosses over, and mutates the initial population based on genetic algorithms to generate new individuals, which are the test cases. Multiple test cases form a new population.
[0027] The contract execution monitor is used to execute test cases, specifically including: performing code coverage and key data flow analysis on test cases, and returning the code coverage and key data flow information to the fuzz testing engine;
[0028] The fuzzing engine is also used to continuously repeat selection, crossover, and mutation on each new population, generating a large number of individuals. The contract execution monitor continuously executes test cases until at least one of two termination conditions is met: a given number of population generations has been generated, or a given fuzzing time has been reached.
[0029] The contract execution monitor outputs information about the detected vulnerabilities and provides the actual exploit path.
[0030] Before performing crossover and mutation on test cases, the fuzz testing engine first calculates the fitness of each individual in the population based on key data flow information, and then selects individuals based on the fitness.
[0031] This invention provides a method and system for fuzz testing smart contracts that addresses two problems in existing smart contract fuzz testing: state dependency and block information dependency. To address block information dependency, both block information data and transaction execution data are considered when generating test cases. To address state dependency, a fitness calculation method based on key data flow analysis is proposed: the final fitness value of a test case equals the sum of its ability to explore new branches and its ability to change the contract state. This combination enables the fuzz testing engine to generate test cases that explore unknown branches while retaining test cases that have a real impact on the smart contract state, thus resolving the state dependency problem. In addition, the fuzz testing engine selects two test cases with data dependencies based on the key data flow information fed back by the contract execution monitor. The method of this invention also integrates a symbolic execution smart contract vulnerability detection method, namely a hybrid strategy that combines the characteristics of symbolic execution and fuzzing. It learns from the high-coverage transaction sequences generated by the symbolic execution engine so that similar sequences can be generated quickly and in large numbers. This keeps fuzzing test case generation fast and efficient while maintaining coverage comparable to that of symbolic execution.
Attached Figure Description
[0032] Figure 1 is a schematic diagram of a smart contract fuzz testing method in an embodiment of the present invention;
[0033] Figure 2 is a schematic diagram illustrating how block information data and transaction execution data are considered simultaneously when generating test cases in an embodiment of the present invention;
[0034] Figure 3 is a schematic diagram of the key data flow analysis method in an embodiment of the present invention;
[0035] Figure 4 is a schematic diagram of a smart contract vulnerability detection method that integrates symbolic execution and fuzz testing in an embodiment of the present invention;
[0036] Figure 5 is a schematic diagram of the structure of the smart contract fuzz testing system in an embodiment of the present invention.
Detailed Implementation
[0037] The present invention will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it. Furthermore, it should be noted that, for ease of description, only the parts relevant to the invention are shown in the drawings, not the entire structure.
[0038] Before discussing the exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although the flowcharts describe the steps as sequential processes, many of these steps can be performed in parallel, concurrently, or simultaneously. Furthermore, the order of the steps can be rearranged. The process can be terminated when its operations are complete, but may also have additional steps not included in the figures. The process can correspond to a method, function, procedure, subroutine, subprogram, etc.
[0039] This invention provides a method and system for fuzz testing of smart contracts, and offers the following embodiments:
[0040] Embodiment 1 of the present invention
[0041] This embodiment illustrates a method for fuzz testing of smart contracts. As shown in Figure 1, the overall implementation process of the method includes the following steps:
[0042] Compile the smart contract source code into EVM runtime bytecode and obtain the application binary interface (ABI) of the smart contract source code;
[0043] In the specific implementation process, the source code of the smart contract is compiled into EVM runtime bytecode, and the application binary interface (ABI) of the contract code is obtained at the same time. The smart contract ABI defines the signatures, parameter lists, and return values of all functions in the contract code.
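To make the role of the ABI concrete, the following Python sketch derives canonical function signatures (such as `transfer(address,uint256)`) from a compiler-produced ABI JSON array. The names `canonical_type` and `function_signatures` are illustrative and not part of the described system. A function selector is the first 4 bytes of the Keccak-256 hash of such a signature; Python's `hashlib.sha3_256` implements NIST SHA-3, which pads differently from Ethereum's Keccak-256, so computing real selectors would need a separate Keccak implementation, and the sketch stops at the signature strings.

```python
import json
from typing import List


def canonical_type(param: dict) -> str:
    """Return the canonical ABI type string, expanding tuple components recursively."""
    t = param["type"]
    if t.startswith("tuple"):
        inner = ",".join(canonical_type(c) for c in param.get("components", []))
        return f"({inner}){t[len('tuple'):]}"  # keep any array suffix such as [] or [2]
    return t


def function_signatures(abi_json: str) -> List[str]:
    """List the canonical signature of every function entry in an ABI JSON array."""
    sigs = []
    for entry in json.loads(abi_json):
        # In the ABI JSON format, "type" defaults to "function" when omitted.
        if entry.get("type", "function") != "function":
            continue  # skip constructor, events, errors, fallback, receive
        args = ",".join(canonical_type(p) for p in entry.get("inputs", []))
        sigs.append(f"{entry['name']}({args})")
    return sigs


# Hypothetical ABI fragment used for illustration.
example_abi = json.dumps([
    {"type": "function", "name": "transfer",
     "inputs": [{"name": "to", "type": "address"}, {"name": "amount", "type": "uint256"}]},
    {"type": "event", "name": "Transfer", "inputs": []},
    {"name": "submit",
     "inputs": [{"name": "order", "type": "tuple[]",
                 "components": [{"name": "id", "type": "uint64"}, {"name": "ok", "type": "bool"}]}]},
])

print(function_signatures(example_abi))  # ['transfer(address,uint256)', 'submit((uint64,bool)[])']
```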
[0044] A fuzz testing engine based on genetic algorithms is used to generate test cases based on ABI information. Specifically, the fuzz testing engine generates an initial population based on the ABI; the fuzz testing engine selects, crosses over, and mutates the initial population based on genetic algorithms to generate new individuals, which are the test cases. Multiple test cases form a new population.
[0045] Before performing crossover and mutation on test cases, the fuzz testing engine first calculates the fitness of each individual in the population based on key data flow information, and then selects individuals based on the fitness.
[0046] Execute test cases using the contract execution monitor, specifically including: performing code coverage and key data flow analysis on the test cases, and returning the code coverage and key data flow information to the fuzz testing engine;
[0047] The fuzzing engine continuously repeats selection, crossover, and mutation on each new population, generating a large number of individuals. The contract execution monitor continuously executes test cases until at least one of two termination conditions is met: a given number of population generations has been generated, or a given fuzzing time has been reached.
[0048] In practice, the fuzz testing engine can generate a large number of test cases based on the ABI information. The genetic-algorithm-based fuzz testing engine first generates an initial population based on the smart contract's ABI, then applies selection, crossover, and mutation to the population to generate new individuals, i.e., contract test cases. All test cases run in the contract execution monitor.
[0049] While the contract execution monitor executes test cases, it performs code coverage and key data flow analysis on them and returns the code coverage and key data flow information to the fuzzing engine. The fuzzing engine calculates the fitness of each individual in the population based on this runtime information. Fitness is used to select superior individuals: individuals with high fitness have a higher probability of being selected by the fuzzing engine, while individuals with low fitness are eliminated. The fuzzing engine continuously repeats the selection, crossover, and mutation process, generating a large number of individuals. The contract execution monitor continuously executes test cases until at least one of two termination conditions is met: a given number of population generations has been generated, or a given fuzzing time has been reached.
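The generation loop and its two termination conditions can be sketched as follows. This is a minimal illustration that assumes execution and evolution are supplied as callbacks; `fuzz_loop`, `evolve`, and `execute` are hypothetical names, and the real feedback handling (code coverage and key data flow) is hidden inside them.

```python
import time
from typing import Callable, List, TypeVar

Individual = TypeVar("Individual")


def fuzz_loop(initial_population: List[Individual],
              evolve: Callable[[List[Individual]], List[Individual]],
              execute: Callable[[Individual], None],
              max_generations: int,
              time_budget_s: float) -> int:
    """Run generations until the generation limit OR the time budget is reached.

    Returns the number of generations completed.
    """
    start = time.monotonic()
    population = initial_population
    generation = 0
    while generation < max_generations and time.monotonic() - start < time_budget_s:
        for individual in population:
            execute(individual)          # the monitor runs each test case and records feedback
        population = evolve(population)  # selection, crossover, and mutation
        generation += 1
    return generation
```

Whichever condition trips first ends the run, matching "at least one of two termination conditions"; using `time.monotonic()` keeps the time budget immune to wall-clock adjustments.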
[0050] The contract execution monitor outputs information about the detected vulnerabilities and provides the actual exploit path.
[0051] Preferably, the test case is represented as a series of transaction inputs, each transaction including block information data and transaction execution data. The block information data includes the current timestamp and block number; the transaction execution data includes the address of the sending account, the amount of Ether sent by the user, the maximum Gas consumption during contract execution, and the input data for contract execution; wherein the input data for contract execution includes a function selector and the parameter types and actual values required by the function selector, the function selector being obtained through the ABI, and the function selector identifying each function in the smart contract.
[0052] In the specific implementation process, in order to solve the block information dependency problem, both block information data and transaction execution data are considered when generating test cases. As shown in Figure 2, test cases are represented as a series of transaction inputs, each containing block information data and transaction execution data. Block information data includes the current timestamp (BlockTimestamp) and block number (BlockNumber), while transaction execution data includes the sending account address (From), the amount of Ether sent by the user (Value), the maximum Gas consumption during contract execution (GasLimit), and the input data for contract execution (Data). The input data for contract execution includes the function selector and all its required parameter types and actual values. The function selector can be obtained through the smart contract's ABI and uniquely identifies each function in the contract.
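The transaction layout of Figure 2 maps naturally onto a small record type. The Python sketch below is one possible representation; the class and field names are assumptions, the paragraph's labels (BlockTimestamp, BlockNumber, From, Value, GasLimit, Data) are noted in comments, and Data is split into the selector plus its parameter types and values rather than stored as raw calldata.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Transaction:
    # Block information data
    block_timestamp: int   # BlockTimestamp
    block_number: int      # BlockNumber
    # Transaction execution data
    sender: str            # From
    value: int             # Value (Ether attached, in wei)
    gas_limit: int         # GasLimit
    # Data: function selector plus the parameter types and actual values it requires
    selector: bytes
    arg_types: List[str] = field(default_factory=list)
    args: List[Any] = field(default_factory=list)


@dataclass
class FuzzTestCase:
    """A test case is an ordered sequence of transaction inputs."""
    transactions: List[Transaction] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.transactions)


# Illustrative values; 0xa9059cbb is the selector of transfer(address,uint256).
tx = Transaction(block_timestamp=1_700_000_000, block_number=18_000_000,
                 sender="0x" + "11" * 20, value=0, gas_limit=3_000_000,
                 selector=bytes.fromhex("a9059cbb"),
                 arg_types=["address", "uint256"], args=["0x" + "22" * 20, 5])
case = FuzzTestCase([tx])
```

Keeping block fields on every transaction (rather than once per test case) lets the engine mutate the timestamp or block number between consecutive calls, which is what the block information dependency problem calls for.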
[0053] Preferably, during the process of generating the initial population based on the ABI, the fuzzing engine generates a population consisting of N test cases, each test case containing only a single transaction input, and the function to be executed for each test case is selected from all smart contract functions in a cyclical manner; for each parameter value in the smart contract function, the fuzzing engine generates it according to the parameter type in the smart contract ABI.
[0054] In practice, the fuzzing engine initially generates a population of N test cases, each initially containing only a single transaction input. The function to be executed for each test case is selected cyclically from all contract functions. For each parameter value in the contract function, the fuzzing engine generates it based on its parameter type in the smart contract's ABI. Depending on the parameter type and size, the fuzzing engine applies different strategies to generate valid values for the parameters. For example, if the parameter type is the fixed-size uint8, a value is randomly selected from the valid input range $0$ to $2^8-1$ as the parameter input. In addition, the fuzzing engine also considers randomly selecting a value from commonly used input-domain edge values, such as 0, 1, -1, 255, and 256, as the fuzz input.
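Initial population generation as described above can be sketched like this. It is an illustration only: functions are given as hypothetical `(name, [abi_types])` pairs, only `uintN`, `bool`, and `address` parameters are handled, the 20% edge-value probability is an arbitrary choice, and edge values that do not fit the parameter type (for example -1 or 256 for a `uint8`) are filtered out, an assumption made because an ABI encoder would reject them.

```python
import random
from typing import Any, List, Tuple

EDGE_VALUES = [0, 1, -1, 255, 256]  # edge values named in the text


def random_uint(bits: int, rng: random.Random, edge_prob: float = 0.2) -> int:
    max_val = 2 ** bits - 1
    edges = [v for v in EDGE_VALUES if 0 <= v <= max_val]  # assumption: keep in-range edges only
    if edges and rng.random() < edge_prob:
        return rng.choice(edges)
    return rng.randint(0, max_val)  # uniform over the valid range 0 .. 2^bits - 1


def random_arg(abi_type: str, rng: random.Random) -> Any:
    if abi_type.startswith("uint"):
        bits = int(abi_type[4:] or 256)  # bare "uint" is an alias for uint256
        return random_uint(bits, rng)
    if abi_type == "bool":
        return rng.choice([True, False])
    if abi_type == "address":
        return "0x" + "".join(rng.choice("0123456789abcdef") for _ in range(40))
    raise NotImplementedError(f"type not covered by this sketch: {abi_type}")


def initial_population(functions: List[Tuple[str, List[str]]], n: int,
                       seed: int = 0) -> List[List[Tuple[str, List[Any]]]]:
    rng = random.Random(seed)
    population = []
    for i in range(n):
        name, types = functions[i % len(functions)]  # cycle through all contract functions
        args = [random_arg(t, rng) for t in types]
        population.append([(name, args)])            # each test case holds a single transaction
    return population
```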
[0055] Preferably, if the smart contract code coverage has not increased significantly in the past M rounds of population iterations, the fuzzing engine will regenerate the initial population. Specifically, if the contract code coverage has not increased significantly in the past M rounds of population iterations, the fuzzing engine will restart the population initialization, delaying the time when individual population members fall into homogeneity and local optima, and further increasing the diversity of population evolution.
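A restart trigger for the "no significant increase in the past M rounds" rule might look like the class below. The text does not define "significant", so the `min_gain` threshold (the minimum number of newly covered branches across the window) is a hypothetical parameter, as is the class name.

```python
from collections import deque


class StagnationDetector:
    """Signal a population restart when coverage has grown by less than
    `min_gain` branches over the last `m` generations."""

    def __init__(self, m: int, min_gain: int = 1):
        self.history = deque(maxlen=m + 1)  # coverage at the window start plus m later rounds
        self.min_gain = min_gain

    def should_restart(self, coverage: int) -> bool:
        self.history.append(coverage)
        if len(self.history) < self.history.maxlen:
            return False                    # fewer than M rounds observed so far
        if self.history[-1] - self.history[0] < self.min_gain:
            self.history.clear()            # begin a fresh window after re-initialization
            return True
        return False
```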
[0056] Preferably, the step of calculating the fitness of each individual in the population based on key data flow information and selecting individuals based on fitness specifically includes:
[0057] The formula for calculating the fitness of test case x is:
[0058] $\mathrm{Fitness}(x)=\mathrm{Fitness}_{\mathrm{coverage}}(x)+\mathrm{Fitness}_{\mathrm{write}}(x)$
[0059] where $\mathrm{Fitness}_{\mathrm{coverage}}(x)$ represents the ability of test case x to explore new branches, and $\mathrm{Fitness}_{\mathrm{write}}(x)$ represents the ability of test case x to change the contract state;
[0060] After calculating the fitness value for each test case in the population, the fuzzing engine selects the first test case based on the fitness value, choosing the individual with the highest fitness value as the basis for crossover and mutation. For the selection of the second test case, the fuzzing engine selects a test case that has a data dependency relationship with the first test case based on the execution information of the key data flow fed back by the contract execution monitor. If there is no test case that has a data dependency relationship with the first test case, the fuzzing engine then selects one based on the fitness value.
[0061] In practical implementation, before performing crossover and mutation on test cases, the fuzzing engine first needs to select suitable individuals through fitness function calculation. Generally, fuzzing methods aim to fully explore the program code and achieve the highest possible code coverage. In addition, for the smart contract state dependency problem, a fitness calculation method based on key data flow analysis is proposed, as shown in the formula above, where $\mathrm{Fitness}_{\mathrm{coverage}}(x)$ measures the ability of test case x to explore new branches. As the contract execution monitor executes test cases, it saves all branches explored by the test cases executed so far. For a new test case, the contract execution monitor analyzes its execution trace, focusing on conditional jump instructions (i.e., JUMPI instructions) in the contract code, and determines the branch jump destinations by analyzing these instructions.
[0062] For each destination not in the previously saved branch jump list, $\mathrm{Fitness}_{\mathrm{coverage}}(x)$ is increased by 1. This approach prioritizes test cases capable of exploring unknown branches. $\mathrm{Fitness}_{\mathrm{write}}(x)$ measures the ability of test case x to change the contract state. Whenever test case x writes to a variable in the contract, the modified value may be accessed again by other test cases, introducing a vulnerability risk. Therefore, for every storage write operation performed by test case x during its execution, $\mathrm{Fitness}_{\mathrm{write}}(x)$ is incremented by 1, indicating that the test case can change the state of the contract.
[0063] The final fitness value of a test case is equal to the sum of the two fitness values mentioned above. This combination of values allows the fuzzing engine to generate test cases that explore unknown branches while retaining test cases that have a real impact on the smart contract state, thus addressing the state dependency problem.
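The two fitness components can be computed from an execution trace roughly as follows. This sketch assumes the monitor produces a simplified list of `TraceStep` records (a hypothetical format) in which each `JUMPI` carries the destination actually taken; a branch is identified here by the pair (instruction address, destination), and each `SSTORE` counts as one storage write.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class TraceStep:
    opcode: str       # e.g. "JUMPI", "SSTORE", "SLOAD"
    pc: int           # program counter of the instruction
    target: int = -1  # for JUMPI: the jump destination actually taken


def fitness(trace: List[TraceStep], seen_branches: Set[Tuple[int, int]]) -> int:
    """Fitness(x) = Fitness_coverage(x) + Fitness_write(x) for one executed test case.

    `seen_branches` is the monitor's global branch list and is updated in place.
    """
    coverage = 0
    write = 0
    for step in trace:
        if step.opcode == "JUMPI":
            branch = (step.pc, step.target)
            if branch not in seen_branches:
                seen_branches.add(branch)  # remember it so later test cases get no credit
                coverage += 1
        elif step.opcode == "SSTORE":
            write += 1                     # every storage write may change contract state
    return coverage + write
```

Because the branch list is shared, a test case is rewarded only for branches nobody has reached before, while its write count is rewarded on every run.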
[0064] After calculating the fitness value for each test case in the population, the fuzzing engine selects the individual with the highest fitness value as the basis for the next step of crossover and mutation. To address the state dependency problem, the fuzzing engine selects two test cases with data dependencies based on key data flow information fed back by the contract execution monitor. Key data flow analysis is shown in Figure 3: functions Function1() and Function2() perform write operations on variables A and B respectively, while Function3() performs read operations on variables A and B. This means that Function3() has a data dependency on the first two functions, whose modifications to these variables may affect the execution result of Function3(). Therefore, when generating fuzz data, it is preferable to generate transaction sequences with data dependencies, especially sequences with critical data flow dependencies such as Ether forwarding or block information (Block.Timestamp).
[0065] Therefore, when selecting the first test case, the fuzz testing engine will select it based on the fitness value. For the second test case, it will try to select a test case that has a data dependency relationship with the first one. If no such test case exists, it will then select one based on the fitness value.
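The two-step selection (highest fitness first, then a data-dependent partner, falling back to fitness order) is sketched below. The per-candidate read and write sets stand in for the key data flow information returned by the monitor; `Candidate`, `depends`, and `select_pair` are illustrative names, and treating a dependency in either direction as valid is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class Candidate:
    name: str
    fitness: int
    writes: FrozenSet[str] = frozenset()  # storage variables written (key data flow info)
    reads: FrozenSet[str] = frozenset()   # storage variables read


def depends(a: Candidate, b: Candidate) -> bool:
    """True if one candidate reads a variable that the other writes."""
    return bool(a.writes & b.reads) or bool(b.writes & a.reads)


def select_pair(population: List[Candidate]) -> Tuple[Candidate, Optional[Candidate]]:
    ranked = sorted(population, key=lambda c: c.fitness, reverse=True)
    first, rest = ranked[0], ranked[1:]
    dependent = [c for c in rest if depends(first, c)]
    if dependent:
        return first, dependent[0]              # best-scoring data-dependent partner
    return first, (rest[0] if rest else None)   # no dependency: fall back to fitness order
```

With the Figure 3 example, a candidate calling `Function3()` (reading A and B) would be paired with one calling `Function1()` (writing A) even if some unrelated candidate scored higher.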
[0066] Preferably, for the crossover of two test cases, if there is a data dependency between them, the test case that performs the write operation is placed first, and the test case that performs the read operation is placed second. The two test cases are then concatenated to form a new test case. The fuzz testing engine sets a maximum transaction execution sequence length L, and the test cases are merged only if the length of the newly generated test case is less than or equal to L.
[0067] In practice, when two test cases are crossed over and there is a data dependency between them, the test case performing the write operation is placed first, followed by the test case performing the read operation, and the two are concatenated to form a new transaction execution sequence. To prevent the transaction sequence from growing indefinitely, the fuzzing engine sets a maximum transaction execution sequence length L; merging occurs only when the length of the newly generated test case is less than or equal to L. L is adjustable: decreasing L improves the efficiency and speed of fuzzing, while increasing L helps improve code coverage.
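Crossover with the write-before-read ordering and the length cap L can be sketched as follows, under the assumption that each transaction is summarized as a hypothetical `(function name, writes, reads)` triple; returning `None` signals that no merge takes place.

```python
from typing import List, Optional, Set, Tuple

Tx = Tuple[str, Set[str], Set[str]]  # hypothetical summary: (function name, writes, reads)


def crossover(a: List[Tx], b: List[Tx], max_len: int) -> Optional[List[Tx]]:
    """Concatenate two data-dependent test cases, writer first and reader second.

    Returns None if there is no dependency or if the result would exceed max_len (L).
    """
    writes_a = set().union(*(t[1] for t in a))
    reads_a = set().union(*(t[2] for t in a))
    writes_b = set().union(*(t[1] for t in b))
    reads_b = set().union(*(t[2] for t in b))
    if writes_a & reads_b:
        merged = a + b  # a writes what b reads
    elif writes_b & reads_a:
        merged = b + a  # b writes what a reads
    else:
        return None
    return merged if len(merged) <= max_len else None
```

The check against `max_len` happens after ordering, so a smaller L rejects more merges and keeps sequences short, which reflects the speed versus coverage trade-off described above.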
[0068] Preferably, after the crossover is completed, the fuzzing engine will randomly modify each piece of data in the test case with the mutation probability to create a new individual. The mutation value is obtained by replacing the original value with a random value or by replacing the original value with a value in the mutation cache pool, which records high-frequency values and values with high fitness observed or learned during past contract execution.
[0069] In practice, after the crossover is complete, the fuzz testing engine randomly modifies each data item in the test case with mutation probability $P_m$ to create a new individual. This step introduces uncertainty and diversity into the population. The mutated value can be selected in two ways: replacing the original value with a random value, or replacing the original value with a value from the mutation cache pool. The mutation cache pool records high-frequency values and values with high fitness observed or learned during past contract executions.
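Mutation with probability $P_m$ and a mutation cache pool can be sketched as below. For simplicity each data item is treated as an integer, and the split between cache-pool values and fresh random values (`p_cache`, default 0.5) is an assumed parameter that the text does not specify.

```python
import random
from typing import List, Sequence


def mutate(items: List[int], p_m: float, cache_pool: Sequence[int],
           rng: random.Random, bits: int = 256, p_cache: float = 0.5) -> List[int]:
    """Return a mutated copy in which each item is replaced with probability p_m."""
    mutated = list(items)
    for i in range(len(mutated)):
        if rng.random() < p_m:
            if cache_pool and rng.random() < p_cache:
                mutated[i] = rng.choice(cache_pool)   # reuse a high-frequency / high-fitness value
            else:
                mutated[i] = rng.randrange(2 ** bits)  # fresh uniformly random value
    return mutated
```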
[0070] Furthermore, the method also incorporates a symbolic execution smart contract vulnerability detection method. The specific integration method includes: alternating between symbolic execution and fuzzing to generate test cases, or introducing probability parameters, and selecting between symbolic execution and fuzzing methods based on a certain probability distribution each time test cases are generated.
[0071] Specifically, symbolic execution and fuzzing, two important methods in traditional software vulnerability detection, have been applied to smart contract vulnerability detection in recent years. Symbolic execution, influenced by program structures such as branches and loops, often faces path state explosion, making it difficult to explore deeper branches. Furthermore, symbolic execution is time-consuming and has low throughput, making it unsuitable for large-scale smart contract analysis. Conversely, fuzzing generates a large number of test cases with fast execution speed and high efficiency. However, fuzzing often gets stuck on invalid inputs, leading to low vulnerability detection accuracy. The two technologies are complementary; therefore, a hybrid strategy-based smart contract vulnerability detection method that integrates the characteristics of symbolic execution and fuzzing is proposed. As shown in Figure 4, the fuzzing learner learns from the large amount of valid input generated by symbolic execution, thereby acquiring the expert knowledge of symbolic execution and generating similar test cases while avoiding invalid ones. In actual vulnerability detection, symbolic execution and fuzzing are used alternately, and the execution strategy can be changed flexibly. At the same time, the collected runtime information is fed into a fine-grained vulnerability detector to achieve fine-grained vulnerability detection of smart contracts. The fuzzing learner is essentially a learning model based on temporal relationships. It learns from the high-coverage transaction sequences generated by the symbolic execution engine so that similar sequences can be generated quickly and in large numbers, maintaining coverage comparable to symbolic execution while keeping fuzzing test case generation fast and efficient.
[0072] Once trained, the fuzz learner can be viewed as a smart contract transaction sequence sampler based on a probability distribution. It decides what kind of transaction data to generate next from knowledge of historical transaction sequences, and this judgment is learned under the supervision of symbolic execution. Initiating a smart contract transaction typically requires three parameters: the transaction address, the attached Ether, and the contract function called by the transaction together with its arguments. The fuzz learner therefore learns from symbolic execution and then generates these three parameters automatically. The transaction case $T_{t_i}$ generated at each time step $t_i$ depends on the transaction sequence executed up to time $t_{i-1}$ and on the state $S_{i-1}$ of the entire contract at that time. Treating symbolic execution as the source of knowledge, the fuzz learner uses time-series models such as Bi-GRU, Bi-LSTM, and TCN to capture the relationships within historical transaction sequences and predict the next transaction case.
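The paragraph above names Bi-GRU, Bi-LSTM, and TCN as the learner's sequence model. The sketch below swaps in a much simpler first-order Markov (bigram) model, chosen only because it fits in a few lines without a deep learning framework, to illustrate the idea of a probability distribution-based transaction sampler: it is fitted to transaction sequences assumed to come from a symbolic execution engine and then samples new sequences. It models only the called function, not the address or the attached Ether, and the class and method names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List

START = "<start>"  # hypothetical marker for "no previous transaction"


class BigramTransactionSampler:
    """Hypothetical stand-in for the fuzz learner: a first-order Markov model
    over called function names, fitted to sequences from a symbolic executor."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, sequences: List[List[str]]) -> None:
        # Count transitions previous function -> next function.
        for seq in sequences:
            prev = START
            for fn in seq:
                self.counts[prev][fn] += 1
                prev = fn

    def next_probs(self, prev: str) -> Dict[str, float]:
        # Empirical distribution of the next call given the previous one.
        counter = self.counts.get(prev)
        if not counter:
            return {}
        total = sum(counter.values())
        return {fn: n / total for fn, n in counter.items()}

    def sample(self, length: int, rng: random.Random) -> List[str]:
        # Draw up to `length` calls, stopping if no continuation was learned.
        seq: List[str] = []
        prev = START
        for _ in range(length):
            probs = self.next_probs(prev)
            if not probs:
                break
            fns, weights = zip(*probs.items())
            prev = rng.choices(fns, weights=weights, k=1)[0]
            seq.append(prev)
        return seq
```

A recurrent or convolutional model would condition on the whole history and on contract state rather than only on the previous call; the bigram version just makes the "learn a distribution, then sample from it" loop concrete.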
[0073] In practice, because some expert knowledge is lost during learning, the fuzz learner cannot generate transaction sequences fully consistent with those of the symbolic execution engine. Its generalization ability on unseen contracts is also weaker than that of the symbolic execution engine. A hybrid execution strategy is therefore introduced that combines the efficiency of fuzzing with the accuracy of symbolic execution. The specific strategies are given in Formulas 1 and 2 below, where $P_E$ denotes the symbolic execution strategy, $P_F$ denotes the fuzzing strategy, and Policy denotes the overall execution strategy.
[0074] Formula 1:
[0075] Formula 1 employs a strategy of alternating between symbolic execution and fuzzing to generate test cases.
[0076] Formula 2:
[0077] Formula 2 introduces a probability parameter p that, at each execution, determines according to a given probability distribution whether the symbolic execution method or the fuzzing method is used.
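Formulas 1 and 2 themselves did not survive in this text, so the following is only a minimal sketch of the two policies as the prose in [0075] and [0077] describes them, assuming each round simply needs to know which engine generates the next batch of test cases. The strategy labels and function names are hypothetical.

```python
import random
from typing import Iterator

SYMBOLIC = "symbolic"  # P_E: generate test cases with symbolic execution
FUZZ = "fuzz"          # P_F: generate test cases with the fuzzing engine


def alternating_policy() -> Iterator[str]:
    """Formula 1 as described: alternate between symbolic execution and fuzzing."""
    while True:
        yield SYMBOLIC
        yield FUZZ


def probabilistic_policy(p: float, rng: random.Random) -> Iterator[str]:
    """Formula 2 as described: choose symbolic execution with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")

    def rounds() -> Iterator[str]:
        while True:
            yield SYMBOLIC if rng.random() < p else FUZZ

    return rounds()
```

Validating p outside the inner generator makes an invalid probability fail when the policy is created rather than on its first round.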
[0078] In a real-world smart contract vulnerability detection environment, the above execution policy can be adapted to the specific scenario, so that execution policies can be swapped flexibly. For example, if code coverage stops improving while the contract execution monitor is executing test cases, the policy is switched: test cases are generated by symbolic execution, and the fuzz testing engine then selects, crosses over, and mutates them.
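One way to realize the coverage-triggered switch in [0078] is sketched below, assuming fuzzing is the default and that coverage is reported as a fraction after each round. The `patience` threshold and the class name are hypothetical; the text does not say how long coverage must stall before the policy is switched.

```python
class AdaptivePolicy:
    """Hypothetical sketch: fuzz by default and hand one round to symbolic
    execution whenever coverage has not improved for `patience` rounds."""

    def __init__(self, patience: int) -> None:
        if patience < 1:
            raise ValueError("patience must be >= 1")
        self.patience = patience
        self.best_coverage = 0.0
        self.stale_rounds = 0

    def next_strategy(self, coverage: float) -> str:
        # Track whether the latest round raised the best coverage seen so far.
        if coverage > self.best_coverage:
            self.best_coverage = coverage
            self.stale_rounds = 0
        else:
            self.stale_rounds += 1
        if self.stale_rounds >= self.patience:
            # Coverage has stalled: let symbolic execution seed the next round,
            # then give the fuzzing engine a fresh window.
            self.stale_rounds = 0
            return "symbolic"
        return "fuzz"
```

The symbolic round's test cases would then enter the genetic algorithm's population for selection, crossover, and mutation, as the paragraph describes.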
[0079] Furthermore, the contract execution monitor also includes a vulnerability detector for identifying vulnerability patterns in the smart contract code.
[0080] In practice, during fuzz testing, the contract execution monitor analyzes how the contract executes. The contract execution monitor is essentially a modified Ethereum client that collects runtime information by instrumenting the underlying virtual machine, providing code coverage and key data flow information to the fuzzing engine. The program's runtime information is also fed into the vulnerability detector.
[0081] A vulnerability detector encodes a precise definition of a smart contract vulnerability pattern and identifies potential instances of that pattern in smart contract code. To achieve fine-grained vulnerability identification, the detector must model and analyze the key statements and variables of each vulnerability type. This embodiment describes vulnerability detectors for several important vulnerability types as follows:
[0082] A vulnerability detector for reentrancy attacks analyzes smart contracts from three perspectives. First, it checks whether the contract calls functions such as `call.value` (transfer-like functions); their presence increases the risk of a reentrancy attack. If the contract does contain such transfer functions, the detector next checks whether the contract checks and updates the relevant critical variables before and after the transfer call: before the transfer, it should verify that the user has sufficient funds, and after the transfer, it should promptly update the user's balance. If the contract fails to perform these checks around the transfer, a reentrancy vulnerability is strongly indicated.
[0083] A vulnerability detector for timestamp dependency analyzes the contract from several angles: whether the contract code invokes the block header timestamp instruction, whether it uses the timestamp as a temporary variable, and whether it uses the timestamp as a constraint on critical paths. This vulnerability is primarily related to the TIMESTAMP instruction.
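As a sketch of the third angle (the timestamp used as a constraint on a critical path), the detector below performs simple taint tracking over an executed trace: values produced by TIMESTAMP are tainted, taint flows through any instruction that consumes a tainted value, and a JUMPI whose condition is tainted is reported. The register-style trace format, in which each instruction names the values it consumes and produces, is an assumption made for readability; a real EVM monitor would track taint on the operand stack, memory, and storage. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Instr:
    """Hypothetical register-style view of one executed EVM instruction."""
    op: str                      # e.g. "TIMESTAMP", "GT", "JUMPI"
    dst: Optional[str] = None    # value produced, if any
    srcs: Tuple[str, ...] = ()   # values consumed


def timestamp_dependency(trace: List[Instr]) -> List[int]:
    """Return indices of JUMPI instructions whose condition derives from TIMESTAMP."""
    tainted: Set[str] = set()
    hits: List[int] = []
    for i, ins in enumerate(trace):
        if ins.op == "TIMESTAMP" and ins.dst is not None:
            tainted.add(ins.dst)
        elif ins.op == "JUMPI":
            # Convention assumed here: srcs = (destination, condition).
            if len(ins.srcs) == 2 and ins.srcs[1] in tainted:
                hits.append(i)
        elif ins.dst is not None:
            # Any result computed from a tainted input is itself tainted;
            # a value recomputed from clean inputs loses its taint.
            if any(s in tainted for s in ins.srcs):
                tainted.add(ins.dst)
            else:
                tainted.discard(ins.dst)
    return hits
```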
[0084] For integer overflow vulnerabilities, the vulnerability detector checks whether the smart contract code uses correct integer types and safe integer arithmetic. More specifically, the contract execution monitor can detect integer overflows by observing the ADD, MUL, and SUB instructions during execution and recomputing their results in simulation.
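The simulated calculation can be sketched as recomputing each ADD, MUL, or SUB with Python's unbounded integers and comparing the exact result against the 256-bit unsigned word range used by the EVM. The function name and interface are hypothetical; a flagged operation is only a candidate, since some contracts rely on wrap-around intentionally.

```python
from typing import Tuple

WORD = 2 ** 256  # EVM stack words are 256-bit unsigned integers


def check_arith(op: str, a: int, b: int) -> Tuple[int, bool]:
    """Simulate an EVM arithmetic opcode on unsigned 256-bit operands.

    Returns (wrapped_result, overflowed): the wrapped result is what the EVM
    actually stores, and `overflowed` is True when the exact result does not
    fit in a 256-bit unsigned word."""
    if not (0 <= a < WORD and 0 <= b < WORD):
        raise ValueError("operands must be unsigned 256-bit words")
    if op == "ADD":
        exact = a + b
    elif op == "MUL":
        exact = a * b
    elif op == "SUB":
        exact = a - b  # a negative exact result is an underflow
    else:
        raise ValueError(f"unsupported opcode {op}")
    return exact % WORD, not (0 <= exact < WORD)
```

For example, `check_arith("SUB", 0, 1)` returns the wrapped value `2**256 - 1` together with an overflow flag, which is exactly the balance-underflow pattern a detector would want to surface.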
[0085] For improper exception handling, the vulnerability detector checks during program execution whether any child call under a parent call throws an exception while the parent call does not. If such a case is found, the corresponding vulnerability is reported. This behavior involves the CALL and JUMPI instructions, which the contract execution monitor observes in order to detect the vulnerability.
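A minimal sketch of this check, assuming the monitor has already reconstructed a call tree in which each frame records whether it reverted. It reports every parent that finished normally although one of its child calls threw; inspecting how the CALL return value reaches a JUMPI, which the text also mentions, is left out. The data structure and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallFrame:
    """Hypothetical record of one message call observed by the monitor."""
    name: str
    reverted: bool = False
    children: List["CallFrame"] = field(default_factory=list)


def unhandled_exceptions(frame: CallFrame) -> List[str]:
    """Report child calls that threw while their parent completed normally."""
    findings: List[str] = []
    for child in frame.children:
        if child.reverted and not frame.reverted:
            findings.append(f"{frame.name} -> {child.name}")
        # Recurse so that nested calls are checked as well.
        findings.extend(unhandled_exceptions(child))
    return findings
```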
[0086] The vulnerability detector is a standalone module within the contract execution monitor and is therefore extensible: given a precise definition of a vulnerability behavior, further vulnerability detectors can easily be added to extend the detection capabilities of fuzz testing.
[0087] Embodiment 2 of the present invention
[0088] The smart contract fuzz testing system 500 provided in Embodiment 2 of the present invention can execute the smart contract fuzz testing method provided in any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method. The system can be implemented in software and/or hardware (integrated circuits) and can generally be integrated into a server or terminal device. Figure 5 is a schematic structural diagram of the smart contract fuzz testing system 500 according to Embodiment 2 of the present invention. Referring to Figure 5, the smart contract fuzz testing system 500 according to an embodiment of the present invention may specifically include an ABI generation unit 510, a genetic-algorithm-based fuzz testing engine 520, and a contract execution monitor 530.
[0089] The ABI generation unit 510 is used to compile the smart contract source code into EVM runtime bytecode and obtain the application binary interface (ABI) of the smart contract source code.
[0090] The genetic-algorithm-based fuzz testing engine 520 is used to generate test cases based on ABI information. Specifically, the fuzz testing engine 520 generates an initial population based on the ABI; it then applies genetic-algorithm selection, crossover, and mutation to the initial population to generate new individuals, which are the test cases; multiple test cases form a new population.
[0091] The contract execution monitor 530 is used to execute test cases, specifically by performing code coverage and key data flow analysis on the test cases and returning the code coverage and key data flow information to the fuzz testing engine 520.
[0092] The fuzz testing engine 520 is also used to repeatedly apply selection, crossover, and mutation to each new population, generating a large number of individuals. The contract execution monitor 530 keeps executing test cases until at least one of two termination conditions is met: a given number of population generations has been produced, or a given fuzzing time has elapsed.
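The loop in [0092] can be sketched as a driver with the two stop conditions, assuming hypothetical `evolve` (selection, crossover, and mutation) and `execute` (the contract execution monitor) callbacks. The clock is injectable so that the time budget can be tested deterministically; all names are illustrative.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_fuzz_loop(
    population: List[T],
    evolve: Callable[[List[T]], List[T]],
    execute: Callable[[T], None],
    max_generations: int,
    time_budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Hypothetical driver: execute and evolve populations until the generation
    limit or the time budget is reached. Returns the generations completed."""
    start = clock()
    generations = 0
    while generations < max_generations and clock() - start < time_budget_s:
        for test_case in population:
            execute(test_case)  # monitor collects coverage and data-flow feedback
        population = evolve(population)  # selection, crossover, mutation
        generations += 1
    return generations
```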
[0093] The contract execution monitor 530 outputs information about the detected vulnerabilities and provides the actual exploit path.
[0094] In particular, before performing crossover and mutation on test cases, the fuzz testing engine 520 first calculates the fitness of each individual in the population based on key data flow information, and then selects individuals based on the fitness.
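The selection step can be sketched as follows, using the fitness definition stated in [0097] and in claim 1 (the sum of a test case's ability to explore new branches and its ability to change the contract state). Treating a data dependency as "one test case writes a storage variable that the other reads" is an assumption about what the key data flow feedback contains; the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-test-case feedback from the contract execution monitor."""
    name: str
    new_branches: float                   # ability to explore new branches
    state_change: float                   # ability to change the contract state
    reads: FrozenSet[str] = frozenset()   # storage variables read
    writes: FrozenSet[str] = frozenset()  # storage variables written

    @property
    def fitness(self) -> float:
        # Fitness is the sum of the two abilities.
        return self.new_branches + self.state_change


def depends(a: Candidate, b: Candidate) -> bool:
    """True if one test case writes a variable that the other reads."""
    return bool(a.writes & b.reads or b.writes & a.reads)


def select_parents(pop: List[Candidate]) -> Tuple[Candidate, Optional[Candidate]]:
    if not pop:
        raise ValueError("population is empty")
    ranked = sorted(pop, key=lambda c: c.fitness, reverse=True)
    first, rest = ranked[0], ranked[1:]
    # Prefer the fittest remaining candidate that is data-dependent on the first.
    for cand in rest:
        if depends(first, cand):
            return first, cand
    # No dependency found: fall back to plain fitness order.
    return first, (rest[0] if rest else None)
```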
[0095] In addition to the units and modules described above, the smart contract fuzz testing system 500 may also include other components; however, since these components are not relevant to the content of this disclosure, their illustrations and descriptions are omitted here.
[0096] The specific working process of a smart contract fuzz testing system 500 is described in the above-described embodiment 1 of the smart contract fuzz testing method, and will not be repeated here.
[0097] This invention provides a method and system for smart contract fuzz testing that address two problems in existing smart contract fuzzing: state dependency and block information dependency. For the block information dependency problem, both block information data and transaction execution data are considered when generating test cases. For the smart contract state dependency problem, a fitness calculation method based on key data flow analysis is proposed: the final fitness value of a test case is the sum of its ability to explore new branches and its ability to change the contract state. This combination lets the fuzzing engine generate test cases that explore unknown branches while retaining test cases that actually affect the contract state, thereby resolving the state dependency problem. To address state dependency further, the fuzzing engine uses the key data flow information fed back by the contract execution monitor to select pairs of test cases that have data dependencies. The method also integrates symbolic execution-based vulnerability detection in the form of a hybrid strategy that combines the characteristics of symbolic execution and fuzzing. By learning from the high-coverage transaction sequences generated by the symbolic execution engine, it can quickly generate large numbers of similar sequences, keeping fuzzing test case generation fast and efficient while maintaining coverage comparable to that of symbolic execution.
[0098] Note that the above description is merely a preferred embodiment of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in detail through the above embodiments, the present invention is not limited to the above embodiments, and may include many other equivalent embodiments without departing from the concept of the present invention, the scope of which is determined by the scope of the appended claims.
Claims
1. A smart contract fuzz testing method, comprising the following steps: compiling the smart contract source code into EVM runtime bytecode and obtaining the application binary interface (ABI) of the smart contract source code; generating test cases from the ABI information using a genetic-algorithm-based fuzz testing engine, specifically: the fuzz testing engine generates an initial population based on the ABI and applies genetic-algorithm selection, crossover, and mutation to the initial population to generate new individuals, which are the test cases, multiple test cases forming a new population; executing the test cases with a contract execution monitor, specifically: performing code coverage and key data flow analysis on the test cases and returning the code coverage and key data flow information to the fuzz testing engine; the fuzzing engine repeatedly applying selection, crossover, and mutation to each new population, generating a large number of individuals, and the contract execution monitor continuously executing test cases until at least one of two termination conditions is met: a given number of population generations has been produced, or a given fuzzing time has elapsed; and the contract execution monitor outputting information about the detected vulnerabilities and providing the actual exploit path. In this process, before performing crossover and mutation on test cases, the fuzz testing engine first calculates the fitness of each individual in the population based on key data flow information and then selects individuals according to their fitness. The test cases are represented as sequences of transaction inputs. Each transaction includes block information data and transaction execution data. The block information data includes the current timestamp and block number.
The transaction execution data includes the address of the sending account, the amount of Ether sent by the user, the maximum gas consumption during contract execution, and the input data for contract execution. The input data for contract execution includes a function selector together with the parameter types and actual values required by the selected function. The function selector is obtained from the ABI and identifies each function in the smart contract. Calculating the fitness of each individual in the population based on key data flow information and selecting individuals according to fitness specifically includes: for a test case $x$, the fitness is calculated as $F(x) = F_{\mathrm{branch}}(x) + F_{\mathrm{state}}(x)$, where $F_{\mathrm{branch}}(x)$ represents the ability of test case $x$ to explore new branches and $F_{\mathrm{state}}(x)$ represents the ability of test case $x$ to change the contract state; after calculating the fitness value of each test case in the population, the fuzzing engine selects the first test case by fitness, choosing the individual with the highest fitness value as the basis for crossover and mutation; for the second test case, the fuzzing engine selects, based on the key data flow execution information fed back by the contract execution monitor, a test case that has a data dependency relationship with the first test case; if no test case has a data dependency relationship with the first test case, the fuzzing engine instead selects one by fitness value.
2. The smart contract fuzz testing method of claim 1, wherein, in generating the initial population based on the ABI, the fuzzing engine generates a population of N test cases, each containing only a single transaction input; the function executed by each test case is selected cyclically from all functions of the smart contract; and each parameter value of the smart contract function is generated by the fuzzing engine according to the corresponding parameter type in the smart contract ABI.
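A minimal sketch of claim 2's initial population, assuming the contract ABI is available as the standard Solidity JSON ABI (a list of entries with `type`, `name`, `inputs`, and `stateMutability`). Only `uint<N>`, `address`, and `bool` parameters are handled, and the base timestamp, block range, gas limit, and Ether amounts are placeholder values chosen for illustration; the function name stands in for the 4-byte selector. All function and class names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Transaction:
    # Block information data
    timestamp: int
    block_number: int
    # Transaction execution data
    sender: str
    value_wei: int
    gas_limit: int
    function: str   # function name, standing in for the 4-byte selector
    args: List[Any]


def random_arg(abi_type: str, rng: random.Random) -> Any:
    """Generate a value for a few Solidity ABI types (others are not covered)."""
    if abi_type == "uint" or (abi_type.startswith("uint") and abi_type[4:].isdigit()):
        bits = int(abi_type[4:] or "256")
        return rng.randrange(2 ** bits)
    if abi_type == "address":
        return "0x" + "".join(rng.choice("0123456789abcdef") for _ in range(40))
    if abi_type == "bool":
        return rng.choice([True, False])
    raise NotImplementedError(f"ABI type {abi_type!r} is not handled in this sketch")


def initial_population(abi: List[Dict[str, Any]], n: int, senders: List[str],
                       rng: random.Random) -> List[List[Transaction]]:
    """Build n single-transaction test cases, cycling through the ABI functions."""
    functions = [entry for entry in abi if entry.get("type") == "function"]
    if not functions:
        raise ValueError("ABI contains no functions")
    population = []
    for i in range(n):
        fn = functions[i % len(functions)]  # round-robin over contract functions
        payable = fn.get("stateMutability") == "payable"
        tx = Transaction(
            timestamp=1_700_000_000 + rng.randrange(86_400),  # placeholder base time
            block_number=rng.randrange(1, 20_000_000),         # placeholder range
            sender=rng.choice(senders),
            value_wei=rng.choice([0, 10 ** 18]) if payable else 0,
            gas_limit=3_000_000,                               # placeholder limit
            function=fn["name"],
            args=[random_arg(p["type"], rng) for p in fn.get("inputs", [])],
        )
        population.append([tx])  # each initial test case holds one transaction
    return population
```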
3. The smart contract fuzz testing method of claim 1, wherein, if the smart contract code coverage has not increased significantly over the past M rounds of population iteration, the fuzzing engine regenerates the initial population.
4. The smart contract fuzz testing method of claim 1, wherein, for the crossover of two test cases, if there is a data dependency between the two test cases, the test case that performs the write operation is placed first and the test case that performs the read operation is placed second, and the two test cases are concatenated to form a new test case; the fuzz testing engine sets a maximum transaction sequence length L, and the test cases are merged only if the length of the newly generated test case is less than or equal to L.
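Claim 4's crossover can be sketched as below, assuming each transaction is summarized by the storage variables it reads and writes (a hypothetical representation standing in for the key data flow information). The claim does not say how to order two test cases without a data dependency; this sketch keeps the given order in that case. Function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical summary of one transaction: (function name, reads, writes).
Tx = Tuple[str, Set[str], Set[str]]


def _reads(tc: List[Tx]) -> Set[str]:
    return {var for _, reads, _ in tc for var in reads}


def _writes(tc: List[Tx]) -> Set[str]:
    return {var for _, _, writes in tc for var in writes}


def crossover(a: List[Tx], b: List[Tx], max_len: int) -> Optional[List[Tx]]:
    """Concatenate two test cases, writer first and reader second.

    Returns None if the merged sequence would exceed max_len (L in claim 4)."""
    if len(a) + len(b) > max_len:
        return None
    # If only b writes something that a reads, b must run first.
    if _writes(b) & _reads(a) and not _writes(a) & _reads(b):
        a, b = b, a
    return a + b
```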
5. The smart contract fuzz testing method of claim 1, wherein, after crossover, the fuzzing engine randomly modifies each data item of the test case with a mutation probability to create a new individual; the mutated value is either a random value or a value from a mutation cache pool, which records high-frequency values and high-fitness values observed or learned during past contract executions.
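Claim 5's mutation step can be sketched as follows, assuming each data item is an unsigned integer of `bits` width and that a hypothetical parameter `cache_prob` decides between the cache pool and a fresh random value; the claim does not specify how that choice is made. The function name is likewise hypothetical.

```python
import random
from typing import List, Sequence


def mutate(items: List[int], rate: float, cache: Sequence[int], rng: random.Random,
           cache_prob: float = 0.5, bits: int = 256) -> List[int]:
    """Mutate each data item independently with probability `rate`.

    A mutated item takes either a value from the cache pool (high-frequency or
    high-fitness values seen earlier) or a fresh random `bits`-wide value."""
    out = list(items)  # leave the parent test case untouched
    for i in range(len(out)):
        if rng.random() < rate:
            if cache and rng.random() < cache_prob:
                out[i] = rng.choice(cache)
            else:
                out[i] = rng.randrange(2 ** bits)
    return out
```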
6. The smart contract fuzz testing method of claim 1, wherein the method further incorporates a symbolic execution-based smart contract vulnerability detection method, integrated either by alternating between symbolic execution and fuzzing to generate test cases, or by introducing a probability parameter and, each time test cases are generated, selecting between symbolic execution and fuzzing according to a given probability distribution.
7. The smart contract fuzz testing method of claim 1, wherein the contract execution monitor further includes a vulnerability detector for identifying vulnerability patterns in smart contract code.
8. A smart contract fuzz testing system, comprising an ABI generation unit, a genetic-algorithm-based fuzz testing engine, and a contract execution monitor, wherein: the ABI generation unit is used to compile the smart contract source code into EVM runtime bytecode and obtain the application binary interface (ABI) of the smart contract source code; the genetic-algorithm-based fuzz testing engine is used to generate test cases based on ABI information, specifically: the fuzz testing engine generates an initial population based on the ABI and applies genetic-algorithm selection, crossover, and mutation to the initial population to generate new individuals, which are the test cases, multiple test cases forming a new population; the contract execution monitor is used to execute test cases, specifically by performing code coverage and key data flow analysis on the test cases and returning the code coverage and key data flow information to the fuzz testing engine; the fuzzing engine is further used to repeatedly apply selection, crossover, and mutation to each new population, generating a large number of individuals, and the contract execution monitor continuously executes test cases until at least one of two termination conditions is met: a given number of population generations has been produced, or a given fuzzing time has elapsed; and the contract execution monitor outputs information about the detected vulnerabilities and provides the actual exploit path. In this process, before performing crossover and mutation on test cases, the fuzz testing engine first calculates the fitness of each individual in the population based on key data flow information and then selects individuals according to their fitness. The test cases are represented as sequences of transaction inputs. Each transaction includes block information data and transaction execution data. The block information data includes the current timestamp and block number.
The transaction execution data includes the address of the sending account, the amount of Ether sent by the user, the maximum gas consumption during contract execution, and the input data for contract execution. The input data for contract execution includes a function selector together with the parameter types and actual values required by the selected function. The function selector is obtained from the ABI and identifies each function in the smart contract. Calculating the fitness of each individual in the population based on key data flow information and selecting individuals according to fitness specifically includes: for a test case $x$, the fitness is calculated as $F(x) = F_{\mathrm{branch}}(x) + F_{\mathrm{state}}(x)$, where $F_{\mathrm{branch}}(x)$ represents the ability of test case $x$ to explore new branches and $F_{\mathrm{state}}(x)$ represents the ability of test case $x$ to change the contract state; after calculating the fitness value of each test case in the population, the fuzzing engine selects the first test case by fitness, choosing the individual with the highest fitness value as the basis for crossover and mutation; for the second test case, the fuzzing engine selects, based on the key data flow execution information fed back by the contract execution monitor, a test case that has a data dependency relationship with the first test case; if no test case has a data dependency relationship with the first test case, the fuzzing engine instead selects one by fitness value.