A defective program repair processing method and apparatus

By determining the similarity between the defective program and the target similar program and using a multi-population genetic algorithm to generate code patches, the overfitting problem of genetic algorithms in defective program repair is solved, and the repair efficiency is improved.

CN115185853B · Active · Publication Date: 2026-06-19 · INDUSTRIAL AND COMMERCIAL BANK OF CHINA

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
INDUSTRIAL AND COMMERCIAL BANK OF CHINA
Filing Date
2022-07-27
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, genetic algorithms are prone to overfitting in the repair of defective programs, resulting in low repair efficiency.

Method used

By determining the similarity between the defective program and the target similar program, a multi-population genetic algorithm is used to generate code patches. The fitness of individuals is calculated using a preset fitness function, and excellent individuals are transferred to other subpopulations until the algorithm termination condition is met, thus avoiding overfitting and improving repair efficiency.

Benefits of technology

This effectively avoids the overfitting phenomenon of genetic algorithms in defective program repair and improves repair efficiency.

Abstract

This invention provides a method and apparatus for repairing defective programs, relating to the field of data processing technology, and applicable to the financial field or other technical fields. The method includes: determining a target similar program corresponding to the defective program, and determining a repair code block in the target similar program corresponding to the repairable program block of the defective program; determining each repairable program block and each repair code block as an initial subpopulation, generating code patches based on a multi-population genetic algorithm, and obtaining test results of the code patches based on test data; calculating the fitness of individuals according to a preset fitness function, and deleting individuals with fitness values greater than a preset threshold from their respective subpopulations and migrating them to other subpopulations. The apparatus executes the above method. The method and apparatus provided by this invention can avoid the overfitting phenomenon that occurs when using genetic algorithms to generate code patches, thereby improving the efficiency of defective program repair.

Description

Technical Field

[0001] This invention relates to the field of data processing technology, and specifically to a method and apparatus for repairing defective programs.

Background Technology

[0002] Defective programs are widespread. To improve the efficiency of defective program repair, existing methods use code patches to repair defective programs. Existing technologies also use genetic algorithms to generate code patches. However, genetic algorithms are prone to overfitting.

Summary of the Invention

[0003] In view of the problems in the prior art, the present invention provides a method and apparatus for repairing defective programs, which can at least partially solve the problems existing in the prior art.

[0004] On one hand, the present invention proposes a method for repairing defective programs, comprising:

[0005] Identify the target similar program corresponding to the defective program, and identify the repair code block in the target similar program that corresponds to the repairable program block of the defective program;

[0006] Each program block to be repaired and each repair code block is determined as an initial subpopulation. Code patches are generated based on a multi-population genetic algorithm, and the test results of the code patches are obtained based on the test data.

[0007] The fitness of each individual is calculated according to a preset fitness function. Individuals with fitness values greater than a preset threshold are deleted from their respective subpopulations and migrated to other subpopulations. The step of generating code patches based on the multi-population genetic algorithm and the subsequent steps are then repeated until the preset algorithm termination condition is met, and the code patch obtained at that point is used to repair the defective program.

[0008] The preset fitness function includes a test factor; the test factor is calculated based on the test data and the test results.

[0009] The step of determining the target similar program corresponding to the defective program includes:

[0010] Determine the initial similar program corresponding to the defective program, and divide the defective program and the initial similar program into blocks respectively to obtain the first key module structure diagram and the second key module structure diagram corresponding to the defective program and the initial similar program respectively;

[0011] Based on the first key module structure diagram and the second key module structure diagram, the similarity between the defective program and the initial similar program is calculated, and the initial similar program with a similarity greater than a preset similarity threshold is determined as the target similar program.

[0012] The step of calculating the similarity between the defective program and the initial similar program based on the first key module structure diagram and the second key module structure diagram includes:

[0013] Based on the parameters of the first key module structure diagram and the second key module structure diagram, the parameter similarity is calculated, and based on the structure of the first key module structure diagram and the second key module structure diagram, the structural similarity is calculated.

[0014] The similarity between the defective program and the initial similar program is calculated based on the parameter similarity, the structural similarity, and their respective weights.

[0015] The step of calculating parameter similarity based on the parameters of the first key module structure diagram and the second key module structure diagram includes:

[0016] Obtain the program parameter type and number of parameters in the start node of the first key module structure diagram and the second key module structure diagram respectively, and calculate the difference in the number of parameters and the number of program parameter types that are the same.

[0017] The parameter similarity is calculated based on the difference in the number of parameters, the number of program parameters of the same type, and their respective weights.

[0018] The step of calculating the structural similarity based on the structures of the first key module structure diagram and the second key module structure diagram includes:

[0019] Obtain the common subgraphs of the first key module structure diagram and the second key module structure diagram respectively, and calculate the structural similarity based on the number of nodes in the common subgraphs and the number of nodes in the target program;

[0020] The target program node number is the number of nodes corresponding to the structure diagram with more nodes in the first key module structure diagram and the second key module structure diagram.

[0021] The defect repair method further includes:

[0022] Individuals with fitness values less than or equal to a preset threshold are not processed.

[0023] The defect repair method further includes:

[0024] If the preset algorithm termination condition is not met, genetic operations will continue to be performed on the subpopulation.

[0025] In another aspect, the present invention proposes a defective program repair processing device, comprising:

[0026] The determining unit is used to determine a target similar program corresponding to the defective program, and to determine the repair code block in the target similar program corresponding to the program block to be repaired in the defective program;

[0027] The generation unit is used to determine each program block to be repaired and each repair code block as an initial subpopulation, generate code patches based on a multi-population genetic algorithm, and obtain the test results of the code patches based on test data;

[0028] The repair unit is used to calculate the fitness of each individual according to a preset fitness function, delete individuals with fitness values greater than a preset threshold from their respective subpopulations, migrate them to other subpopulations, and continue to execute the step of generating code patches based on the multi-population genetic algorithm and the subsequent steps until the preset algorithm termination condition is met, and use the code patch obtained at that point to repair the defective program.

[0029] The preset fitness function includes a test factor; the test factor is calculated based on the test data and the test results.

[0030] In another aspect, embodiments of the present invention provide an electronic device, including: a processor, a memory, and a bus, wherein,

[0031] The processor and the memory communicate with each other via the bus;

[0032] The memory stores program instructions that can be executed by the processor, and the processor can execute the following methods by calling the program instructions:

[0033] Identify the target similar program corresponding to the defective program, and identify the repair code block in the target similar program that corresponds to the repairable program block of the defective program;

[0034] Each program block to be repaired and each repair code block is determined as an initial subpopulation. Code patches are generated based on a multi-population genetic algorithm, and the test results of the code patches are obtained based on the test data.

[0035] The fitness of each individual is calculated according to a preset fitness function. Individuals with fitness values greater than a preset threshold are deleted from their respective subpopulations and migrated to other subpopulations. The step of generating code patches based on the multi-population genetic algorithm and the subsequent steps are then repeated until the preset algorithm termination condition is met, and the code patch obtained at that point is used to repair the defective program.

[0036] The preset fitness function includes a test factor; the test factor is calculated based on the test data and the test results.

[0037] This invention provides a non-transitory computer-readable storage medium, comprising:

[0038] The non-transitory computer-readable storage medium stores computer instructions that cause the computer to perform the following methods:

[0039] Identify the target similar program corresponding to the defective program, and identify the repair code block in the target similar program that corresponds to the repairable program block of the defective program;

[0040] Each program block to be repaired and each repair code block is determined as an initial subpopulation. Code patches are generated based on a multi-population genetic algorithm, and the test results of the code patches are obtained based on the test data.

[0041] The fitness of each individual is calculated according to a preset fitness function. Individuals with fitness values greater than a preset threshold are deleted from their respective subpopulations and migrated to other subpopulations. The step of generating code patches based on the multi-population genetic algorithm and the subsequent steps are then repeated until the preset algorithm termination condition is met, and the code patch obtained at that point is used to repair the defective program.

[0042] The preset fitness function includes a test factor; the test factor is calculated based on the test data and the test results.

[0043] The defective program repair processing method and apparatus provided in this invention determine a target similar program corresponding to the defective program, and determine a repair code block in the target similar program corresponding to the program block to be repaired in the defective program; determine each program block to be repaired and each repair code block as an initial subpopulation, generate code patches based on a multi-population genetic algorithm, and obtain the test results of the code patches according to test data; calculate the individual fitness according to a preset fitness function, delete individuals with fitness values greater than a preset threshold from their respective subpopulations, migrate them to other subpopulations, and continue to execute the code patch generation based on the multi-population genetic algorithm and subsequent steps until a preset algorithm termination condition is met, and use the code patch obtained at that point to repair the defective program; wherein the preset fitness function includes a test factor term, and the test factor term is calculated based on the test data and the test results. This can avoid the overfitting phenomenon that occurs when using a genetic algorithm to generate code patches, thereby improving the efficiency of defective program repair.

Brief Description of the Drawings

[0044] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort. In the drawings:

[0045] Figure 1 is a flowchart illustrating a defective program repair method provided in an embodiment of the present invention.

[0046] Figure 2 is a schematic diagram illustrating the control flow graph provided in an embodiment of the present invention.

[0047] Figure 3 is a schematic diagram illustrating the key module structure provided in the embodiments of the present invention.

[0048] Figure 4 is a flowchart illustrating a defective program repair method provided in another embodiment of the present invention.

[0049] Figure 5 is a schematic diagram of the structure of a defective program repair processing device provided in an embodiment of the present invention.

[0050] Figure 6 is a schematic diagram of the physical structure of an electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0051] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments of the present invention will be further described in detail below with reference to the accompanying drawings. Here, the illustrative embodiments and descriptions of the present invention are used to explain the present invention, but are not intended to limit the present invention. It should be noted that, unless otherwise specified, the embodiments and features in the embodiments of this application can be arbitrarily combined with each other.

[0052] Figure 1 is a flowchart illustrating a defective program repair method provided in an embodiment of the present invention. As shown in Figure 1, the defective program repair method provided in this embodiment of the invention includes:

[0053] Step S1: Determine the target similar program corresponding to the defective program, and determine the repair code block in the target similar program corresponding to the program block to be repaired in the defective program.

[0054] Step S2: Determine each program block to be repaired and each repair code block as an initial subpopulation, generate code patches based on a multi-population genetic algorithm, and obtain the test results of the code patches based on the test data.

[0055] Step S3: Calculate the individual fitness according to the preset fitness function, delete individuals with fitness values greater than the preset threshold from their respective subpopulations, migrate them to other subpopulations, and continue to execute the step of generating code patches based on the multi-population genetic algorithm and the subsequent steps until the preset algorithm termination condition is met, and use the code patch obtained at that point to repair the defective program;

[0056] The preset fitness function includes a test factor; the test factor is calculated based on the test data and the test results.

[0057] In step S1 above, the device determines a target similar program corresponding to the defective program, and determines a repair code block in the target similar program corresponding to the repairable program block of the defective program. The device can be a computer device executing the method, for example, it may include a server. It should be noted that the acquisition and analysis of data involved in this embodiment of the invention are authorized by the user.

[0058] The defective program can be instrumented, and the test dataset Γ, which fully covers the designed program, is input. The program blocks to be repaired are determined from the test results and denoted B_1, B_2, ..., B_n. The repair code blocks in the target similar program can correspond one-to-one with these program blocks to be repaired.

[0059] The process of determining the target similar program corresponding to the defective program includes:

[0060] An initial similar program corresponding to the defective program is determined, and both the defective program and the initial similar program are divided into blocks to obtain a first key module structure diagram and a second key module structure diagram corresponding to the defective program and the initial similar program, respectively. The initial similar program can be understood as a preliminary similar program corresponding to the defective program, which can be determined using existing similar program comparison methods. The method for dividing the defective program and the initial similar program into blocks is the same. Taking the defective program as an example, it is explained as follows:

[0061] As shown in Figure 2, a defective program can be segmented according to the keywords defined in its programming language to obtain a control flow graph. Keywords can be program statements in programming languages, including conditional statements, jump statements, and loop statements. Taking Java as an example, keywords include if, for, switch, while, and do.

[0062] A control flow graph can be described as G = (V, L, s, e). Here, V is the set of nodes, each node corresponding to a program statement. L is the set of edges, each edge representing the direction of program statement flow. s and e are the start and end points of the control flow graph, respectively.
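The tuple G = (V, L, s, e) described above can be sketched as a small data structure. This is only an illustrative sketch: the class name `ControlFlowGraph`, its methods, and the example statements are assumptions introduced here and do not come from the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class ControlFlowGraph:
    """Illustrative container for G = (V, L, s, e); all names are hypothetical."""
    nodes: set[str] = field(default_factory=set)               # V: one node per statement
    edges: set[tuple[str, str]] = field(default_factory=set)   # L: direction of statement flow
    start: str = "s"                                            # s: start point
    end: str = "e"                                              # e: end point

    def add_edge(self, src: str, dst: str) -> None:
        """Record a flow edge and register both endpoints as nodes."""
        self.nodes.update((src, dst))
        self.edges.add((src, dst))

    def successors(self, node: str) -> list[str]:
        """Statements that may execute directly after `node`."""
        return sorted(dst for src, dst in self.edges if src == node)


# Hypothetical fragment: if (a == b) x = 1; else x = 2;
g = ControlFlowGraph()
g.add_edge("s", "if a == b")
g.add_edge("if a == b", "x = 1")
g.add_edge("if a == b", "x = 2")
g.add_edge("x = 1", "e")
g.add_edge("x = 2", "e")
```

The keyword node `if a == b` has two successors, one per branch, which is the shape the key module structure diagram is later derived from.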

[0063] As shown in Figure 3, the first key module structure diagram is constructed from the control flow graph according to the following rules: the start node contains the program parameter types and the number of parameters; when only one of the true/false branches of a keyword node has an executable statement, a new empty node is inserted in the branch without an executable statement; a loop node is decomposed into several keyword nodes; and each branch must eventually return to some keyword node.

[0064] The following triangle classification program is given as an example, as shown in Table 1. The corresponding control flow graph and key module structure diagram are shown in Figure 2 and Figure 3, respectively.

[0065] Table 1

[0066]

[0067] Based on the first key module structure diagram and the second key module structure diagram, the similarity between the defective program and the initial similar program is calculated, and the initial similar program with a similarity greater than a preset similarity threshold is determined as the target similar program. The preset similarity threshold can be set independently according to actual conditions.

[0068] The step of calculating the similarity between the defective program and the initial similar program based on the first key module structure diagram and the second key module structure diagram includes:

[0069] Based on the parameters of the first key module structure diagram and the second key module structure diagram, parameter similarity is calculated, and based on the structures of the first key module structure diagram and the second key module structure diagram, structural similarity is calculated; the calculation of parameter similarity based on the parameters of the first key module structure diagram and the second key module structure diagram includes:

[0070] Obtain the program parameter types and the number of parameters in the start node of the first key module structure diagram and of the second key module structure diagram respectively, and calculate the difference in the number of parameters and the number of program parameter types that are the same; that is, the absolute value of the difference between the numbers of parameters of the first and second key module structure diagrams is used as the parameter number difference.

[0071] For example, suppose the first key module structure diagram and the second key module structure diagram each have 10 program parameter types, and 6 pairs of program parameter types are the same. The number of program parameter types that are the same is then 6.

[0072] The parameter similarity is calculated based on the difference in the number of parameters, the number of parameters of the same type, and their respective weights. The difference in the number of parameters (denoted as parNum) and the number of parameters of the same type (denoted as parType) are used as evaluation indicators to calculate the parameter similarity (denoted as parSim), as follows:

[0073]

[0074] Where α and β are the weights corresponding to the difference in the number of parameters and the number of program parameters of the same type, respectively.
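The two evaluation indicators named above, parNum and parType, can be computed as in the following sketch. The function name and the multiset-based pairing of identical types are assumptions made for illustration. The formula that combines the indicators with the weights α and β is not reproduced in this text, so the sketch stops at the indicators.

```python
from collections import Counter


def parameter_indicators(params_a: list[str], params_b: list[str]) -> tuple[int, int]:
    """Return (parNum, parType) for two start-node parameter type lists.

    parNum  : absolute difference in the number of parameters.
    parType : number of parameter types the two programs share.
    """
    par_num = abs(len(params_a) - len(params_b))
    # ASSUMPTION: identical types are paired one-to-one (multiset intersection).
    shared = Counter(params_a) & Counter(params_b)
    par_type = sum(shared.values())
    return par_num, par_type


# Hypothetical start nodes of a defective program and a candidate similar program.
defective_params = ["int", "int", "double"]
candidate_params = ["int", "double", "String", "long"]
par_num, par_type = parameter_indicators(defective_params, candidate_params)
print(par_num, par_type)
```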

[0075] The step of calculating structural similarity based on the structures of the first key module structure diagram and the second key module structure diagram includes:

[0076] The common subgraph of the first key module structure diagram and the second key module structure diagram is obtained, and the structural similarity is calculated based on the number of nodes in the common subgraph and the number of nodes in the target program. The common subgraph can be understood as the largest common subgraph, containing the most common parts of the two programs. The structural similarity structSim is calculated as follows:

[0077]

[0078] Where sameNodeNum is the number of nodes in the common subgraph and nodeMax is the number of nodes in the target program.

[0079] The target program node count is the number of nodes corresponding to the structure diagram with more nodes in the first key module structure diagram and the second key module structure diagram. That is, if the number of nodes in the first key module structure diagram is greater than the number of nodes in the second key module structure diagram, then the target program node count is the number of nodes in the first key module structure diagram.

[0080] Conversely, if the number of nodes in the second key module structure diagram is greater than the number of nodes in the first key module structure diagram, then the target program node count is the number of nodes in the second key module structure diagram.
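The selection of nodeMax described above can be sketched as follows. The structSim formula itself is not reproduced in this text, so the ratio sameNodeNum / nodeMax used below is an assumption that is merely consistent with the two quantities the description names. Finding the largest common subgraph is outside the scope of the sketch, so its node count is passed in as an input. All function names are hypothetical.

```python
def node_max(nodes_first: int, nodes_second: int) -> int:
    """Node count of whichever key module structure diagram has more nodes."""
    return max(nodes_first, nodes_second)


def struct_sim(same_node_num: int, nodes_first: int, nodes_second: int) -> float:
    """Structural similarity under the ASSUMED form sameNodeNum / nodeMax."""
    n_max = node_max(nodes_first, nodes_second)
    if n_max == 0:
        raise ValueError("both structure diagrams are empty")
    return same_node_num / n_max


# Hypothetical sizes: 8 and 10 nodes, 6 of which lie in the common subgraph.
example = struct_sim(same_node_num=6, nodes_first=8, nodes_second=10)
```

Under this assumed form the value lies between 0 and 1, and reaches 1 only when the larger diagram is entirely contained in the common subgraph.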

[0081] Based on the parameter similarity, the structural similarity, and their respective weights, the similarity between the defective program and the initial similar program is calculated. The similarity sim is calculated as follows:

[0082] sim = A × parSim + B × structSim

[0083] Where A and B are the weights corresponding to parameter similarity and structural similarity, respectively.
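The weighted sum sim = A × parSim + B × structSim and the threshold test that selects target similar programs can be sketched as below. The function names, the candidate names, and all numeric values are hypothetical and chosen only for illustration.

```python
def similarity(par_sim: float, struct_sim: float, weight_a: float, weight_b: float) -> float:
    """sim = A * parSim + B * structSim."""
    return weight_a * par_sim + weight_b * struct_sim


def select_target_programs(
    candidates: dict[str, tuple[float, float]],
    weight_a: float,
    weight_b: float,
    threshold: float,
) -> list[str]:
    """Keep initial similar programs whose similarity exceeds the preset threshold."""
    return [
        name
        for name, (par_sim, struct_sim) in candidates.items()
        if similarity(par_sim, struct_sim, weight_a, weight_b) > threshold
    ]


# Hypothetical (parSim, structSim) pairs for two initial similar programs.
candidates = {"prog_x": (0.9, 0.8), "prog_y": (0.2, 0.5)}
targets = select_target_programs(candidates, weight_a=0.4, weight_b=0.6, threshold=0.7)
```

With these illustrative weights, prog_x scores about 0.84 and is kept, while prog_y scores about 0.38 and is discarded.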

[0084] In step S2 above, the device determines each program block to be repaired and each repair code block as an initial subpopulation, generates code patches based on a multi-population genetic algorithm, and obtains the test results of the code patches based on test data. Individual migration: In the multi-population genetic algorithm, not only is it determined whether an individual in the current subpopulation is the best, but excellent individuals can also be periodically migrated to other subpopulations.

[0085] For the subpopulation cluster pop = {pop_1, pop_2, ..., pop_n}, consider an individual l_ij (1≤j≤m) in the i-th (1≤i≤n) subpopulation pop_i = {l_i1, l_i2, ..., l_im}. In addition to determining whether it is the optimal solution of its corresponding fitness function max(F_i), superior individuals can be periodically migrated to other subpopulations to participate in evolution, where it is determined whether the individual is the optimal solution of the fitness function max(F_k) (1≤k≤n and k≠i) corresponding to another subpopulation.

[0086] Each program block to be repaired B_i (i∈{1...n}) of the defective program, together with the n corresponding repair code blocks in the target similar program, is used as an initial subpopulation (n subpopulations). The multi-population genetic algorithm starts iterative calculation from these initial subpopulations. Each time a calculation is completed, a code patch (individual) is generated, denoted C_n. The test results of the code patch are obtained based on the test data. N_tot denotes the total number of test data, i.e., the total number of data items in the test dataset Γ mentioned above, and N_suc denotes the number of successful test results. Based on the test results and test data, the preset fitness function is constructed as follows:

[0087]

[0088] Here, F(C_n) is the preset fitness function; distance is the branch distance between C_n and B_i; approach is the layer proximity between C_n and B_i; the remaining term is the test factor term; and a and b are the corresponding weights.
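The quantities N_suc and N_tot that feed the test factor term can be obtained by running a candidate patch against the test data, as in the sketch below. The fitness formula is not reproduced in this text, so the pass ratio N_suc / N_tot computed at the end is an assumption about how the two counts might be combined, not the patented expression. The helper names and the toy patch are hypothetical.

```python
from collections.abc import Callable, Sequence

Case = tuple[tuple, object]  # (inputs, expected output)


def count_test_results(patch: Callable, test_data: Sequence[Case]) -> tuple[int, int]:
    """Run a candidate patch on every test case and return (N_suc, N_tot)."""
    n_tot = len(test_data)
    n_suc = 0
    for inputs, expected in test_data:
        try:
            if patch(*inputs) == expected:
                n_suc += 1
        except Exception:
            pass  # a patch that raises is counted as a failed test
    return n_suc, n_tot


def candidate_max(a: int, b: int, c: int) -> int:
    """Hypothetical patch for a max-of-three routine; it wrongly ignores c."""
    return a if a >= b else b


gamma = [((1, 2, 3), 3), ((3, 2, 1), 3), ((2, 3, 1), 3)]
n_suc, n_tot = count_test_results(candidate_max, gamma)
pass_ratio = n_suc / n_tot  # ASSUMPTION: one plausible shape for the test factor
```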

[0089] In step S3 above, the device calculates the fitness of individuals according to the preset fitness function. Individuals with fitness values greater than a preset threshold are deleted from their respective subpopulations and migrated to other subpopulations, and the step of generating code patches based on the multi-population genetic algorithm and the subsequent steps continue to be executed until the preset algorithm termination condition is met. The defective program is then repaired using the code patch obtained at that point. The preset threshold can be set independently according to actual conditions. For example, let the total number of subpopulations be n, denoted d1…dn. If, during the first fitness calculation, an individual with a fitness value greater than the preset threshold is found in d1, this individual is deleted from d1 and migrated to d2, and the code patch generation step and the subsequent steps are executed again. After n iterations, all subpopulations have generated such individuals, and the preset algorithm termination condition is satisfied. The preset algorithm termination condition can also include a maximum iteration time or a maximum number of iterations, both of which can be set independently according to actual conditions.
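One migration round of the kind described in step S3 can be sketched as follows. The d1 to d2 move follows the example in the text; sending individuals from the last subpopulation back to the first is an assumption, as are the function name, the string representation of individuals, and the fitness scores.

```python
from collections.abc import Callable


def migrate(
    subpops: list[list[str]],
    fitness: Callable[[str], float],
    threshold: float,
) -> list[list[str]]:
    """Delete individuals above the threshold from their subpopulation and
    append them to the next one; all other individuals are left untouched."""
    n = len(subpops)
    staying: list[list[str]] = [[] for _ in range(n)]
    incoming: list[list[str]] = [[] for _ in range(n)]
    for i, pop in enumerate(subpops):
        for individual in pop:
            if fitness(individual) > threshold:
                # ASSUMPTION: d_i migrates to d_(i+1), wrapping from d_n to d_1.
                incoming[(i + 1) % n].append(individual)
            else:
                staying[i].append(individual)
    return [staying[i] + incoming[i] for i in range(n)]


# Hypothetical patches and fitness scores.
scores = {"p1": 0.9, "p2": 0.3, "p3": 0.5, "p4": 0.95}
before = [["p1", "p2"], ["p3"], ["p4"]]
after = migrate(before, scores.__getitem__, threshold=0.8)
```

Collecting migrants in a separate list before merging keeps an individual from being moved twice within the same round.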

[0090] The preset fitness function includes a test factor term; the test factor term is calculated based on the test data and the test results. Refer to the above description; further details are omitted. As shown in Figure 4, the defective program repair method further includes:

[0091] Individuals with fitness values less than or equal to a preset threshold are not processed.

[0092] The defective program repair method also includes:

[0093] If the preset algorithm termination condition is not met, genetic operations will continue to be performed on the subpopulation. Genetic operations may include crossover, mutation, etc., on the subpopulation.
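The crossover and mutation operations mentioned above can be sketched on individuals represented as lists of statements. The text does not specify the operators, so single-point crossover and per-statement replacement mutation are assumptions chosen for brevity, as are the function names and the example statements.

```python
import random


def crossover(
    parent_a: list[str], parent_b: list[str], rng: random.Random
) -> tuple[list[str], list[str]]:
    """Single-point crossover; both parents need at least two statements."""
    cut = rng.randint(1, min(len(parent_a), len(parent_b)) - 1)
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def mutate(
    individual: list[str], pool: list[str], rate: float, rng: random.Random
) -> list[str]:
    """Replace each statement with a random one from `pool` with probability `rate`."""
    return [rng.choice(pool) if rng.random() < rate else stmt for stmt in individual]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
block_to_repair = ["a = x", "if a > b", "return a"]
repair_block = ["a = y", "if a >= b", "return b"]
child_a, child_b = crossover(block_to_repair, repair_block, rng)
mutant = mutate(child_a, pool=repair_block, rate=0.2, rng=rng)
```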

[0094] This invention implements automatic defect repair strategies based on individual migration and similar code reuse to repair defective programs. The relevant terminology is explained below:

[0095] Automatic defect repair: Without human intervention, the computer automatically generates the correct repair package to fix the bugs in the target software.

[0096] Individual migration: In multi-population genetic algorithms, it is not only necessary to determine whether an individual in the current population is the best, but also to migrate some individuals in the current population to other populations at certain intervals.

[0097] Reuse: If program a is similar to program b, the existing test data of program b can be reused as heuristic data for program a to guide the generation of test data for program a.

[0098] First, the program is divided into blocks based on the keywords in the code, and a key module structure diagram is constructed. The structural features and parameter features of the code are used as similarity indicators to determine the similarity between the code to be repaired and the existing code. Second, similar code is used as heuristic information (initial population) to guide multi-population genetic algorithms to generate code patches. During the algorithm evolution process, excellent individuals periodically migrate to other populations.

[0099] Existing technologies suffer from low patch generation efficiency due to the lack of guidance from the initial population. Therefore, this invention proposes a strategy that reuses program code with similar structure and parameters to guide patch generation, and enriches the population genes in the genetic algorithm through individual migration to avoid overfitting. This strategy aims to generate patches that can more accurately locate and repair defects.

[0100] The beneficial effects of the embodiments of the present invention are as follows:

[0101] 1) Propose a key module structure diagram to facilitate the calculation of the similarity between the defective program and other programs.

[0102] The program is divided into key sections, and a key module structure diagram is constructed. This diagram provides a convenient and intuitive way to reflect the similarity between the defective program and other programs. When the similarity between two programs exceeds a threshold, it indicates the existence of a similar program to the defective one. The code blocks of the defective program and the code blocks of the similar programs can be used as an initial population to evolve and generate patches.

[0103] 2) Utilize individual migration to avoid overfitting in subpopulation evolution.

[0104] This embodiment uses individual migration to optimize the generation of defective program patches by a multi-population genetic algorithm. When the number of defects to be repaired is large, using only a standard multi-population genetic algorithm to generate patches does not work well. After multiple generations of evolution, the genes of the population tend to become similar, and the algorithm may overfit, potentially requiring more time to generate patches. Therefore, after a patch for the target defect has been generated in a certain subpopulation, individuals in that subpopulation need to migrate to other subpopulations to enrich the gene diversity of those subpopulations and inject superior genes.

[0105] The defective program repair method provided in this embodiment of the invention determines a target similar program corresponding to the defective program, and determines a repair code block in the target similar program corresponding to the program block to be repaired in the defective program; determines each program block to be repaired and each repair code block as an initial subpopulation, generates code patches based on a multi-population genetic algorithm, and obtains the test results of the code patches based on test data; calculates the individual fitness according to a preset fitness function, deletes individuals with fitness values greater than a preset threshold from their respective subpopulations, migrates them to other subpopulations, and continues to execute the code patch generation based on the multi-population genetic algorithm and subsequent steps until a preset algorithm termination condition is met, and uses the code patch obtained at that point to repair the defective program; wherein the preset fitness function includes a test factor term, and the test factor term is calculated based on the test data and the test results. This can avoid the overfitting phenomenon that occurs when using genetic algorithms to generate code patches, thereby improving the efficiency of defective program repair.

[0106] Furthermore, determining the target similar program corresponding to the defective program includes:

[0107] Determine an initial similar program corresponding to the defective program, and divide the defective program and the initial similar program into blocks to obtain a first key module structure diagram corresponding to the defective program and a second key module structure diagram corresponding to the initial similar program. See the description above; details are not repeated here.

[0108] Based on the first key module structure diagram and the second key module structure diagram, calculate the similarity between the defective program and the initial similar program, and determine an initial similar program whose similarity is greater than a preset similarity threshold as the target similar program. See the description above; details are not repeated here.

[0109] The defective program repair method provided in this embodiment of the invention uses the target similar program as heuristic information (the initial population), which can further improve the efficiency of defective program repair.

[0110] Further, the step of calculating the similarity between the defective program and the initial similar program based on the first key module structure diagram and the second key module structure diagram includes:

[0111] Calculate the parameter similarity based on the parameters of the first key module structure diagram and the second key module structure diagram, and calculate the structural similarity based on the structures of the first key module structure diagram and the second key module structure diagram. See the description above; details are not repeated here.

[0112] Calculate the similarity between the defective program and the initial similar program based on the parameter similarity, the structural similarity, and their respective weights. See the description above; details are not repeated here.
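A minimal Python sketch of the weighted combination and the threshold-based selection of target similar programs follows. The equal default weights, the function names, and the dictionary layout of the candidates are assumptions for illustration; the text only states that the two similarities are combined using their respective weights and compared against a preset similarity threshold.

```python
from typing import Dict, List, Tuple


def overall_similarity(
    param_sim: float,
    struct_sim: float,
    w_param: float = 0.5,   # assumed weight, not specified in the text
    w_struct: float = 0.5,  # assumed weight, not specified in the text
) -> float:
    """Weighted sum of parameter similarity and structural similarity."""
    return w_param * param_sim + w_struct * struct_sim


def select_target_programs(
    candidates: Dict[str, Tuple[float, float]],
    threshold: float,
) -> List[str]:
    """Keep the initial similar programs whose overall similarity is
    strictly greater than the preset similarity threshold.

    `candidates` maps a program name to (param_sim, struct_sim).
    """
    return [
        name
        for name, (param_sim, struct_sim) in candidates.items()
        if overall_similarity(param_sim, struct_sim) > threshold
    ]
```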

[0113] The defective program repair method provided in this embodiment of the invention, based on the parameters and structure of the key module structure diagram, helps to improve the similarity of similar programs and can further improve the efficiency of defective program repair.

[0114] Further, the step of calculating parameter similarity based on the parameters of the first key module structure diagram and the second key module structure diagram includes:

[0115] Obtain the program parameter types and the number of parameters in the start nodes of the first key module structure diagram and the second key module structure diagram, and calculate the difference in the number of parameters and the number of program parameters of the same type. See the description above; details are not repeated here.

[0116] Calculate the parameter similarity based on the difference in the number of parameters, the number of program parameters of the same type, and their respective weights. See the description above; details are not repeated here.
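The exact formula for parameter similarity is given earlier in the specification and is not repeated in this passage. The sketch below is therefore only one plausible reading: it normalizes both quantities by the larger parameter count and combines them with assumed equal weights. The function name, the normalization, and the weights are all hypothetical.

```python
from collections import Counter
from typing import Sequence


def parameter_similarity(
    types_first: Sequence[str],
    types_second: Sequence[str],
    w_count: float = 0.5,  # assumed weight for the count-difference term
    w_type: float = 0.5,   # assumed weight for the same-type term
) -> float:
    """Compare the parameter lists of two start nodes.

    Each argument lists the parameter types of one start node,
    for example ["int", "str"].
    """
    n_first, n_second = len(types_first), len(types_second)
    larger = max(n_first, n_second)
    if larger == 0:
        return 1.0  # assumption: two parameterless start nodes match fully

    count_diff = abs(n_first - n_second)
    # Multiset intersection counts parameters of the same type in both lists.
    same_type = sum((Counter(types_first) & Counter(types_second)).values())

    count_score = 1.0 - count_diff / larger
    type_score = same_type / larger
    return w_count * count_score + w_type * type_score
```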

[0117] The defective program repair method provided in this embodiment of the invention, based on the parameters and structure of the key module structure diagram, helps to improve the parameter similarity of similar programs and can further improve the efficiency of defective program repair.

[0118] Further, the step of calculating structural similarity based on the structures of the first key module structure diagram and the second key module structure diagram includes:

[0119] Obtain the common subgraph of the first key module structure diagram and the second key module structure diagram, and calculate the structural similarity based on the number of nodes in the common subgraph and the number of target program nodes. See the description above; details are not repeated here.

[0120] The number of target program nodes is the node count of whichever of the first key module structure diagram and the second key module structure diagram has more nodes. See the description above; details are not repeated here.
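As a small illustration, the sketch below assumes that structural similarity is the ratio of the common subgraph's node count to the number of target program nodes. The ratio form and the function name are assumptions; this passage only names the two inputs. Finding the common subgraph itself is not shown and its node count is taken as given.

```python
def structural_similarity(
    common_subgraph_nodes: int,
    nodes_first: int,
    nodes_second: int,
) -> float:
    """Assumed ratio: common-subgraph nodes over the node count of the
    larger of the two key module structure diagrams."""
    target_nodes = max(nodes_first, nodes_second)
    if target_nodes == 0:
        return 0.0  # assumption: empty diagrams carry no structural evidence
    return common_subgraph_nodes / target_nodes
```

Dividing by the larger diagram keeps the score in the range 0 to 1 and penalizes a small program that happens to be fully contained in a much larger one.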

[0121] The defective program repair method provided in this embodiment of the invention, based on the parameters and structure of the key module structure diagram, helps to improve the structural similarity of similar programs and can further improve the efficiency of defective program repair.

[0122] Furthermore, the defective program repair method also includes:

[0123] Individuals with fitness values less than or equal to the preset threshold are not processed. See the description above; details are not repeated here.

[0124] The defective program repair method provided in this embodiment of the invention can further improve the efficiency of defective program repair.

[0125] Furthermore, the defective program repair method also includes:

[0126] If the preset algorithm termination condition is not met, genetic operations continue to be performed on the subpopulations. See the description above; details are not repeated here.
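The control flow of repeating genetic operations until the termination condition holds can be sketched as follows. The callables `evolve`, `migrate`, and `is_done`, and the `max_generations` safety cap, are hypothetical placeholders; the text does not specify the termination condition in this passage.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
Subpops = List[List[T]]


def run_until_terminated(
    subpops: Subpops,
    evolve: Callable[[Subpops], Subpops],   # genetic operations (placeholder)
    migrate: Callable[[Subpops], Subpops],  # migration step (placeholder)
    is_done: Callable[[Subpops], bool],     # preset termination condition
    max_generations: int = 100,             # assumed safety cap
) -> Tuple[Subpops, int]:
    """Repeat evolve-then-migrate until `is_done` holds or the cap is hit.

    Returns the final subpopulations and the number of generations run.
    """
    generation = 0
    while generation < max_generations and not is_done(subpops):
        subpops = migrate(evolve(subpops))
        generation += 1
    return subpops, generation
```

For example, with integers standing in for individuals, `evolve` adding 1 to each, `migrate` as the identity, and `is_done` checking for any value of at least 3, the call `run_until_terminated([[0], [1]], ...)` stops after two generations.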

[0127] The defective program repair method provided in this embodiment of the invention can further improve the efficiency of defective program repair.

[0128] It should be noted that the defective program repair method provided in this embodiment of the invention can be used in the financial field, or in any technical field other than the financial field. This embodiment of the invention does not limit the application field of the defective program repair method.

[0129] Figure 5 is a schematic diagram of the structure of a defective program repair processing apparatus provided in an embodiment of the present invention. As shown in Figure 5, the defective program repair processing apparatus provided in this embodiment of the invention includes a determining unit 501, a generating unit 502, and a repair unit 503, wherein:

[0130] The determining unit 501 is used to determine the target similar program corresponding to the defective program, and to determine the repair code block in the target similar program that corresponds to the program block to be repaired in the defective program; the generating unit 502 is used to determine each program block to be repaired and each repair code block as an initial subpopulation, generate code patches based on a multi-population genetic algorithm, and obtain the test results of the code patches according to the test data; the repair unit 503 is used to calculate the individual fitness according to a preset fitness function, delete individuals with fitness values greater than a preset threshold from their respective subpopulations, migrate them to other subpopulations, and continue to execute the code patch generation based on the multi-population genetic algorithm and the subsequent steps until the preset algorithm termination condition is met, and use the code patch at that point to repair the defective program; wherein the preset fitness function includes a test factor term, and the test factor term is calculated based on the test data and the test results.

[0131] Specifically, the determining unit 501 in the apparatus is used to determine the target similar program corresponding to the defective program, and to determine the repair code block in the target similar program that corresponds to the program block to be repaired in the defective program; the generating unit 502 is used to determine each program block to be repaired and each repair code block as an initial subpopulation, generate code patches based on a multi-population genetic algorithm, and obtain the test results of the code patches according to the test data; the repair unit 503 is used to calculate the individual fitness according to a preset fitness function, delete individuals with fitness values greater than a preset threshold from their respective subpopulations, migrate them to other subpopulations, and continue to execute the code patch generation based on the multi-population genetic algorithm and the subsequent steps until the preset algorithm termination condition is met, and use the code patch at that point to repair the defective program; wherein the preset fitness function includes a test factor term, and the test factor term is calculated based on the test data and the test results.

[0132] The defective program repair processing apparatus provided in this embodiment of the invention determines a target similar program corresponding to the defective program, and determines the repair code block in the target similar program that corresponds to the program block to be repaired in the defective program; determines each program block to be repaired and each repair code block as an initial subpopulation, generates code patches based on a multi-population genetic algorithm, and obtains the test results of the code patches based on test data; calculates the individual fitness according to a preset fitness function, deletes individuals with fitness values greater than a preset threshold from their respective subpopulations, migrates them to other subpopulations, and continues to execute the code patch generation based on the multi-population genetic algorithm and the subsequent steps until a preset algorithm termination condition is met, at which point the current code patch is used to repair the defective program. The preset fitness function includes a test factor term, which is calculated based on the test data and the test results. This avoids the overfitting that occurs when genetic algorithms are used to generate code patches, thereby improving the efficiency of defective program repair.

[0133] Furthermore, the determining unit 501 is specifically used for:

[0134] Determine the initial similar program corresponding to the defective program, and divide the defective program and the initial similar program into blocks respectively to obtain the first key module structure diagram and the second key module structure diagram corresponding to the defective program and the initial similar program respectively;

[0135] Based on the first key module structure diagram and the second key module structure diagram, the similarity between the defective program and the initial similar program is calculated, and the initial similar program with a similarity greater than a preset similarity threshold is determined as the target similar program.

[0136] The defective program repair processing device provided in this embodiment of the invention uses target similar programs as heuristic information (initial population), which can further improve the efficiency of defective program repair.

[0137] Furthermore, the determining unit 501 is also specifically used for:

[0138] Based on the parameters of the first key module structure diagram and the second key module structure diagram, the parameter similarity is calculated, and based on the structure of the first key module structure diagram and the second key module structure diagram, the structural similarity is calculated.

[0139] The similarity between the defective program and the initial similar program is calculated based on the parameter similarity, the structural similarity, and their respective weights.

[0140] The defective program repair processing device provided in this embodiment of the invention, based on the parameters and structure of the key module structure diagram, helps to improve the similarity of similar programs and can further improve the efficiency of defective program repair.

[0141] Furthermore, the determining unit 501 is also specifically used for:

[0142] Obtain the program parameter types and the number of parameters in the start nodes of the first key module structure diagram and the second key module structure diagram, and calculate the difference in the number of parameters and the number of program parameters of the same type.

[0143] The parameter similarity is calculated based on the difference in the number of parameters, the number of program parameters of the same type, and their respective weights.

[0144] The defective program repair processing device provided in this embodiment of the invention, based on the parameters and structure of the key module structure diagram, helps to improve the parameter similarity of similar programs and can further improve the efficiency of defective program repair.

[0145] Furthermore, the determining unit 501 is also specifically used for:

[0146] Obtain the common subgraphs of the first key module structure diagram and the second key module structure diagram respectively, and calculate the structural similarity based on the number of nodes in the common subgraphs and the number of nodes in the target program;

[0147] The target program node number is the number of nodes corresponding to the structure diagram with more nodes in the first key module structure diagram and the second key module structure diagram.

[0148] The defective program repair processing device provided in this embodiment of the invention, based on the parameters and structure of the key module structure diagram, helps to improve the structural similarity of similar programs and can further improve the efficiency of defective program repair.

[0149] Furthermore, the defective program repair processing device is also used for:

[0150] Individuals with fitness values less than or equal to the preset threshold are not processed.

[0151] The defective program repair processing device provided in this embodiment of the invention can further improve the efficiency of defective program repair.

[0152] Furthermore, the defective program repair processing device is also used for:

[0153] If the preset algorithm termination condition is not met, genetic operations continue to be performed on the subpopulations.

[0154] The defective program repair processing device provided in this embodiment of the invention can further improve the efficiency of defective program repair.

[0155] The defective program repair processing device provided in the embodiments of the present invention can be used to execute the processing flow of the method embodiments described above. Its functions are not repeated here; see the detailed description of those method embodiments.

[0156] Figure 6 is a schematic diagram of the physical structure of an electronic device provided in an embodiment of the present invention. As shown in Figure 6, the electronic device includes: a processor 601, a memory 602, and a bus 603;

[0157] The processor 601 and the memory 602 communicate with each other via the bus 603.

[0158] The processor 601 is used to call program instructions in the memory 602 to execute the methods provided in the above-described method embodiments, including, for example:

[0159] Identify the target similar program corresponding to the defective program, and identify the repair code block in the target similar program that corresponds to the repairable program block of the defective program;

[0160] Each program block to be repaired and each repair code block is determined as an initial subpopulation. Code patches are generated based on a multi-population genetic algorithm, and the test results of the code patches are obtained based on the test data.

[0161] The fitness of each individual is calculated according to a preset fitness function. Individuals with fitness values greater than a preset threshold are deleted from their respective subpopulations and migrated to other subpopulations. The code patch generation based on the multi-population genetic algorithm and the subsequent steps then continue to be executed until the preset algorithm termination condition is met, and the code patch at that point is used to repair the defective program.

[0162] The preset fitness function includes a test factor term; the test factor term is calculated based on the test data and the test results.

[0163] This embodiment discloses a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions, and when the program instructions are executed by a computer, the computer can perform the methods provided in the above-described method embodiments, such as:

[0164] Identify the target similar program corresponding to the defective program, and identify the repair code block in the target similar program that corresponds to the repairable program block of the defective program;

[0165] Each program block to be repaired and each repair code block is determined as an initial subpopulation. Code patches are generated based on a multi-population genetic algorithm, and the test results of the code patches are obtained based on the test data.

[0166] The fitness of each individual is calculated according to a preset fitness function. Individuals with fitness values greater than a preset threshold are deleted from their respective subpopulations and migrated to other subpopulations. The code patch generation based on the multi-population genetic algorithm and the subsequent steps then continue to be executed until the preset algorithm termination condition is met, and the code patch at that point is used to repair the defective program.

[0167] The preset fitness function includes a test factor term; the test factor term is calculated based on the test data and the test results.

[0168] This embodiment provides a computer-readable storage medium storing a computer program that causes the computer to execute the methods provided in the above-described method embodiments, including, for example:

[0169] Identify the target similar program corresponding to the defective program, and identify the repair code block in the target similar program that corresponds to the repairable program block of the defective program;

[0170] Each program block to be repaired and each repair code block is determined as an initial subpopulation. Code patches are generated based on a multi-population genetic algorithm, and the test results of the code patches are obtained based on the test data.

[0171] The fitness of each individual is calculated according to a preset fitness function. Individuals with fitness values greater than a preset threshold are deleted from their respective subpopulations and migrated to other subpopulations. The code patch generation based on the multi-population genetic algorithm and the subsequent steps then continue to be executed until the preset algorithm termination condition is met, and the code patch at that point is used to repair the defective program.

[0172] The preset fitness function includes a test factor term; the test factor term is calculated based on the test data and the test results.

[0173] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0174] This invention is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0175] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0176] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0177] In the description of this specification, the references to terms such as "an embodiment," "a specific embodiment," "some embodiments," "for example," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0178] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above descriptions are merely specific embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.

Claims

1. A method for repairing defective programs, characterized in that it comprises: determining a target similar program corresponding to the defective program, and determining the repair code block in the target similar program that corresponds to the program block to be repaired in the defective program; determining each program block to be repaired and each repair code block as an initial subpopulation, generating code patches based on a multi-population genetic algorithm, and obtaining the test results of the code patches based on test data; calculating the fitness of each individual according to a preset fitness function, deleting individuals with fitness values greater than a preset threshold from their respective subpopulations, migrating them to other subpopulations, and continuing to execute the code patch generation based on the multi-population genetic algorithm and the subsequent steps until a preset algorithm termination condition is met, and using the code patch at that point to repair the defective program; wherein the preset fitness function includes a test factor term, and the test factor term is calculated based on the test data and the test results; wherein determining the target similar program corresponding to the defective program comprises: determining an initial similar program corresponding to the defective program, and dividing the defective program and the initial similar program into blocks to obtain a first key module structure diagram corresponding to the defective program and a second key module structure diagram corresponding to the initial similar program; calculating parameter similarity based on the parameters of the first key module structure diagram and the second key module structure diagram, and calculating structural similarity based on the structures of the first key module structure diagram and the second key module structure diagram; and calculating the similarity between the defective program and the initial similar program based on the parameter similarity, the structural similarity, and their respective weights, and determining an initial similar program with a similarity greater than a preset similarity threshold as the target similar program.

2. The defective program repair method according to claim 1, characterized in that the step of calculating parameter similarity based on the parameters of the first key module structure diagram and the second key module structure diagram comprises: obtaining the program parameter types and the number of parameters in the start nodes of the first key module structure diagram and the second key module structure diagram, and calculating the difference in the number of parameters and the number of program parameters of the same type; and calculating the parameter similarity based on the difference in the number of parameters, the number of program parameters of the same type, and their respective weights.

3. The defective program repair method according to claim 1, characterized in that the step of calculating structural similarity based on the structures of the first key module structure diagram and the second key module structure diagram comprises: obtaining the common subgraph of the first key module structure diagram and the second key module structure diagram, and calculating the structural similarity based on the number of nodes in the common subgraph and the number of target program nodes; wherein the number of target program nodes is the node count of whichever of the first key module structure diagram and the second key module structure diagram has more nodes.

4. The defective program repair method according to any one of claims 1 to 3, characterized in that the defective program repair method further comprises: not processing individuals with fitness values less than or equal to the preset threshold.

5. The defective program repair method according to any one of claims 1 to 3, characterized in that the defective program repair method further comprises: if the preset algorithm termination condition is not met, continuing to perform genetic operations on the subpopulations.

6. A defective program repair processing apparatus, characterized in that it comprises: a determining unit, used to determine a target similar program corresponding to the defective program, and to determine the repair code block in the target similar program that corresponds to the program block to be repaired in the defective program; a generating unit, used to determine each program block to be repaired and each repair code block as an initial subpopulation, generate code patches based on a multi-population genetic algorithm, and obtain the test results of the code patches based on test data; and a repair unit, used to calculate the fitness of each individual according to a preset fitness function, delete individuals with fitness values greater than a preset threshold from their respective subpopulations, migrate them to other subpopulations, and continue to execute the code patch generation based on the multi-population genetic algorithm and the subsequent steps until a preset algorithm termination condition is met, and use the code patch at that point to repair the defective program; wherein the preset fitness function includes a test factor term, and the test factor term is calculated based on the test data and the test results; wherein the determining unit is specifically used to: determine an initial similar program corresponding to the defective program, and divide the defective program and the initial similar program into blocks to obtain a first key module structure diagram corresponding to the defective program and a second key module structure diagram corresponding to the initial similar program; calculate parameter similarity based on the parameters of the first key module structure diagram and the second key module structure diagram, and calculate structural similarity based on the structures of the first key module structure diagram and the second key module structure diagram; and calculate the similarity between the defective program and the initial similar program based on the parameter similarity, the structural similarity, and their respective weights, and determine an initial similar program with a similarity greater than a preset similarity threshold as the target similar program.

7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 5.

8. A computer-readable storage medium having a computer program stored thereon, characterized in that when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 5.

Citation Information

Patent Citations

  • Fuzz testing system on basis of multi-swarm collaboration evolution genetic algorithm

    CN103914383A

  • Reuse method of test cases between similar programs and implementation system thereof

    CN110262957A