Symbolic execution based cross-architecture binary executable file vulnerability detection method and system

By constructing a reference tree and using a taint propagation algorithm to detect embedded firmware vulnerabilities, the problem of firmware vulnerability detection is solved, and efficient vulnerability discovery is achieved.

CN115344866B · Active · Publication Date: 2026-06-30 · STATE GRID HEILONGJIANG ELECTRIC POWER COMPANY +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
STATE GRID HEILONGJIANG ELECTRIC POWER COMPANY
Filing Date
2022-07-20
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies are insufficient to effectively detect vulnerabilities in embedded firmware, especially due to the difficulty in obtaining firmware source code and the challenges in security protection caused by the diversity of processor architectures.

Method used

A cross-architecture binary executable vulnerability detection method based on symbolic execution is adopted. By constructing a reference tree, setting taint sources, using taint propagation algorithms and state functions, it detects whether function parameters are affected by tainted data and determines whether there are potential risks in function calls.

Benefits of technology

It has achieved firmware vulnerability detection for different processor architectures, with an accuracy rate of over 90% and a recall rate of over 80%, effectively discovering potential vulnerability risks.


Smart Images

  • Figure CN115344866B_ABST

Abstract

The application discloses a cross-architecture binary executable file vulnerability detection method and system based on symbolic execution. The method comprises the following steps: constructing a reference tree based on a call graph, and inputting an executable file to be analyzed and a target function into the reference tree for reference tree analysis; setting the taint source as the buffer of a socket data receiving function to complete taint source data marking; based on a taint propagation algorithm using backward symbolic execution, introducing a state function to record the taint analysis process from node $i_k$ in a reference chain to the target function, and obtaining the data analysis result of the taint convergence point; and extracting the actual parameters of the target function to construct the taint convergence points into a taint convergence point forest, and traversing each tree in the forest to detect whether the value on each node is tainted. The method detects whether function parameters at predefined, frequently vulnerable functions are affected by tainted data, to determine whether the function call under that path carries a potential risk.

Description

Technical Field

[0001] This invention relates to the fields of software testing and vulnerability discovery, and in particular to a method for detecting specific vulnerabilities in binary executable files of desktop application software and embedded firmware.

Background Technology

[0002] The development of computer technology has brought embedded computers into all aspects of life. Firmware is special software that runs in embedded systems. Due to limitations such as system resources and programming methods, it is difficult to protect firmware with common mitigation measures. Furthermore, the high privileges of firmware and the difficulty of updating it make firmware vulnerabilities extremely dangerous. Therefore, there is an urgent need for an effective way to detect and fix firmware vulnerabilities in a timely manner.

Summary of the Invention

[0003] The present invention aims to at least partially solve one of the technical problems in the related art.

[0004] Therefore, the first objective of this invention is to propose a cross-architecture binary executable vulnerability detection method based on symbolic execution.

[0005] The second objective of this invention is to propose a cross-architecture binary executable vulnerability detection system based on symbolic execution.

[0006] The third objective of this invention is to provide a computer device.

[0007] The fourth objective of this invention is to provide a non-transitory computer-readable storage medium.

[0008] To achieve the above objectives, one embodiment of the present invention proposes a cross-architecture binary executable vulnerability detection method based on symbolic execution, comprising the following steps: Step S1, constructing a reference tree based on the call graph, and inputting the executable file to be analyzed and the target function into the reference tree for reference tree parsing; Step S2, setting the taint source as the buffer of the socket data receiving function to complete taint source data marking; Step S3, based on a taint propagation algorithm using backward symbolic execution along the call chain, introducing a state function to record the taint analysis process from node $i_k$ of a reference chain in the reference tree to the target function, and obtaining the data analysis results of the taint convergence points; Step S4, extracting the actual parameters of the target function, constructing the taint convergence points into a taint convergence point forest, traversing each tree in the forest, and detecting whether the value on each node is tainted.

[0009] The cross-architecture binary executable vulnerability detection method based on symbolic execution in this invention addresses the challenges of obtaining firmware source code and the diversity of processor architectures. It marks network data as taint sources, uses symbolic execution methods for taint tracking, and detects whether function parameters at predefined vulnerability-prone functions are affected by tainted data to determine whether there is a potential risk in the function call under that path.

[0010] In addition, the cross-architecture binary executable vulnerability detection method based on symbolic execution according to the above embodiments of the present invention may also have the following additional technical features:

[0011] Furthermore, in one embodiment of the present invention, for a target function x, the node set of the reference tree is a set of tuples (function, call address), with (x, null) as the root node. The reference tree contains a directed edge from (i, addr1) to (j, addr2) if and only if the following two conditions are met simultaneously:

[0012] (1) There exists a directed edge from j to i in the call graph, and the call address corresponding to the edge is addr2;

[0013] (2) The function j is not included in the function set corresponding to the node set formed by the path from the root node to (i, addr1).

[0014] Furthermore, in one embodiment of the present invention, the reference tree parsing process is as follows: the executable file to be analyzed and the target function are input, the root node of the reference tree is initialized to (f, null), and the referencing functions (callers) of the current function are obtained with a Ref function and traversed in turn, wherein the Ref function is implemented using the binary cross-reference parsing capability of IDAPython.

[0015] Furthermore, in one embodiment of the present invention, step S2 uses the function hooking functionality of the angr framework to fill the buffer used to receive messages with symbolic values marked as dangerous.

[0016] Further, in one embodiment of the present invention, step S3 specifically includes: step S301, the taint analysis function in the taint propagation algorithm based on backward symbolic execution of the call chain tracks each reference chain in the reference tree and analyzes the security of that chain; step S302, the tracing function in the same algorithm updates the state function according to the result returned by the exploration function, thereby classifying the security of the call chain; step S303, the exploration function in the same algorithm uses the symbolic execution engine to execute from node $i_k$ in the reference chain to the target function and returns the data analysis results of the taint convergence point.

[0017] Further, in one embodiment of the present invention, the states of the state function include UNKNOWN, CONTINUE, USEUNINITDATA, DANGEROUS, and SAFE, wherein UNKNOWN means that node i has not yet been resolved; CONTINUE means that, with node i as the starting point of symbolic execution, the parameters of the target function when it is reached contain the symbolic values initially used as input parameters, but no uninitialized symbolic values or symbolic values marked as dangerous; USEUNINITDATA means that, with node i as the starting point, the parameters of the target function when it is reached contain uninitialized symbolic values, i.e., an illegal memory address has been accessed, but no symbolic values marked as dangerous; DANGEROUS means that, with node i as the starting point, the parameters of the target function when it is reached contain symbolic values marked as dangerous; SAFE means that, with node i as the starting point, all parameters of the target function are concrete values when it is reached.

[0018] Furthermore, in one embodiment of the present invention, the initial state of the state function is:

[0019] $$S(i)=\begin{cases}\mathrm{CONTINUE}, & i = x\\ \mathrm{UNKNOWN}, & i \neq x\end{cases}$$

[0020] where i is a node and x is the target function.

[0021] To achieve the above objectives, another embodiment of the present invention proposes a cross-architecture binary executable vulnerability detection system based on symbolic execution, comprising: a construction and parsing module, used to construct a reference tree based on a call graph and input the executable file to be analyzed and the target function into the reference tree for reference tree parsing; a marking module, used to set taint sources as the buffers of socket data receiving functions to complete taint source data marking; a taint analysis module, used to introduce, based on a taint propagation algorithm using backward symbolic execution along the call chain, a state function that records the taint analysis process from node $i_k$ of a reference chain in the reference tree to the target function, and to obtain the data analysis results of the taint convergence points; and a detection module, used to extract the actual parameters of the target function, construct the taint convergence points into a taint convergence point forest, traverse each tree in the forest, and detect whether the value on each node is tainted.

[0022] The cross-architecture binary executable vulnerability detection system based on symbolic execution in this invention addresses the challenges of obtaining firmware source code and the diversity of processor architectures. It uses network data as taint sources, employs symbolic execution methods for taint tracking, and detects whether function parameters at predefined vulnerability-prone functions are affected by tainted data to determine whether there is a potential risk in the function call under that path.

[0023] Another aspect of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the cross-architecture binary executable vulnerability detection method based on symbolic execution as described in the above embodiments.

[0024] Another aspect of the present invention provides a non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the cross-architecture binary executable vulnerability detection method based on symbolic execution as described in the above embodiments.

[0025] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Description of the Drawings

[0026] The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:

[0027] Figure 1 is a flowchart of a cross-architecture binary executable vulnerability detection method based on symbolic execution according to an embodiment of the present invention;

[0028] Figure 2 is a schematic diagram of a reference tree parsing algorithm according to an embodiment of the present invention;

[0029] Figure 3 is a schematic diagram of a taint analysis function according to an embodiment of the present invention;

[0030] Figure 4 is a schematic diagram of a tracing function according to an embodiment of the present invention;

[0031] Figure 5 is a schematic diagram of an exploration function according to an embodiment of the present invention;

[0032] Figure 6 is a schematic diagram of the structure of a cross-architecture binary executable vulnerability detection system based on symbolic execution according to an embodiment of the present invention.

Detailed Description of the Embodiments

[0033] Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain the present invention, and should not be construed as limiting the present invention.

[0034] The following describes, with reference to the accompanying drawings, a method and system for detecting vulnerabilities in cross-architecture binary executable files based on symbolic execution, according to embodiments of the present invention. First, the method for detecting vulnerabilities in cross-architecture binary executable files based on symbolic execution, according to embodiments of the present invention, will be described with reference to the accompanying drawings.

[0035] Figure 1 is a flowchart of a cross-architecture binary executable vulnerability detection method based on symbolic execution according to an embodiment of the present invention.

[0036] As shown in Figure 1, the cross-architecture binary executable vulnerability detection method based on symbolic execution includes the following steps:

[0037] In step S1, a reference tree is constructed based on the call graph, and the executable file and target function to be analyzed are input into the reference tree for reference tree parsing.

[0038] Specifically, a reference relationship is defined as follows: if function i calls function j, then function j is said to be referenced by function i. The call graph (CG) of a program is a graph describing the function call relationships in the program. Its node set is the set of functions, and there exists a directed edge from node i to node j if and only if function i calls function j. Multiple edges may point from node i to node j, representing multiple calls from i to j at different addresses. The reference tree $RT_x$ of function x is defined on the CG as follows: its node set is a set of tuples (function, call address), with (x, null) as the root node. A directed edge exists in the reference tree from (i, addr1) to (j, addr2) if and only if the following two conditions are met simultaneously:

[0039] (1) There exists a directed edge from j to i in the call graph, and the call address corresponding to the edge is addr2;

[0040] (2) The function j is not included in the function set corresponding to the node set formed by the path from the root node to (i, addr1).
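The two edge conditions above can be sketched in Python as follows. This is a minimal illustration, assuming the call graph is already available as a hypothetical dictionary mapping each caller to its (callee, call address) pairs; the names `CallGraph`, `RTNode` and `build_reference_tree` are invented for this sketch and are not taken from the patent.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical call-graph encoding: caller -> [(callee, call address), ...].
CallGraph = Dict[str, List[Tuple[str, int]]]


class RTNode:
    """Reference-tree node: (function, address of its call into the parent's function)."""

    def __init__(self, func: str, addr: Optional[int]) -> None:
        self.func = func
        self.addr = addr
        self.children: List["RTNode"] = []


def build_reference_tree(cg: CallGraph, target: str) -> RTNode:
    # Invert the call graph once: callee -> [(caller, call address), ...].
    callers: Dict[str, List[Tuple[str, int]]] = {}
    for caller, calls in cg.items():
        for callee, addr in calls:
            callers.setdefault(callee, []).append((caller, addr))

    def expand(node: RTNode, on_path: FrozenSet[str]) -> None:
        for caller, addr in callers.get(node.func, []):
            # Condition (1): caller j calls node.func at addr (edge j -> i in the CG).
            # Condition (2): j must not already be on the root-to-node path,
            # which drops recursive (i -> i) and circular (i -> j -> i) calls.
            if caller in on_path:
                continue
            child = RTNode(caller, addr)
            node.children.append(child)
            expand(child, on_path | {caller})

    root = RTNode(target, None)  # the root is (x, null)
    expand(root, frozenset({target}))
    return root
```

With this encoding, a function such as `main` can appear more than once in the tree, once per distinct call chain, while a self-call or a call back to a function already on the path is simply skipped.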

[0041] In $RT_x$, each path from the root node to a leaf node represents an execution path in the program from a top-level caller (the leaf node) to the target function (the root node), and is called a reference chain. This reference tree is a simplification of the reference graph of function x: it ignores recursive calls (i→i) and circular calls (e.g., i→j→i). The reasons are: (1) the simplification makes every path of every function finite in length, reducing the complexity of the analysis; (2) recursive and circular calls are generally used to implement algorithms rather than business logic, so ignoring them in vulnerability analysis usually does not affect the final result.
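Enumerating the reference chains of a tree amounts to listing every root-to-leaf path. A minimal sketch, assuming a hypothetical nested-tuple encoding `(function, call address, children)` for the tree; each returned chain starts with the target function x and ends with a top-level caller.

```python
from typing import List, Optional, Tuple

# Hypothetical reference-tree encoding: (function, call address, children).
Tree = Tuple[str, Optional[int], list]


def reference_chains(tree: Tree) -> List[List[str]]:
    """Return every root-to-leaf path as a list of function names, x first."""
    func, _addr, children = tree
    if not children:
        return [[func]]  # a leaf is a top-level caller
    chains: List[List[str]] = []
    for child in children:
        for tail in reference_chains(child):
            chains.append([func] + tail)
    return chains
```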

[0042] As shown in Figure 2, the algorithm accepts two inputs: the executable file B to be analyzed and the target function f. First, the root node of the reference tree is initialized to (f, null). The second component of the tuple represents the address of the call into the parent node's function, so this value is null for the root node. Next, each referencing function of the current function x is parsed and traversed. The Ref function is implemented using IDAPython's binary cross-reference parsing capability. IDAPython is the Python interface to IDA Pro (Interactive DisAssembler), a disassembler now developed by Hex-Rays. As a disassembler, IDA Pro displays the binary instructions executed by the processor in symbolic form (assembly language) and can even generate pseudo-high-level code, improving readability. A cross-reference records that an address a1 refers to an address a2. There are generally two types: data cross-references and code cross-references; Table 1 shows several common cross-reference types. For reference tree parsing, the main concern is code cross-references such as far/near calls and jumps. Referencing functions that are not on the path between x and the root node are then added to the reference tree. This process is executed recursively until the reference tree parsing is complete.
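The patent obtains the Ref function from IDAPython's cross-reference parsing. The sketch below is a stand-in that does not depend on IDA: it assumes function boundaries and cross-references have already been exported as plain tuples, and the reference-kind labels (`call_near`, `jump_far`, and so on) are hypothetical names for the far/near call and jump code references mentioned above. In an actual IDAPython script these records would come from IDA's own API (for example `idautils.XrefsTo`) rather than from lists.

```python
from bisect import bisect_right
from typing import Callable, List, Optional, Tuple

# Hypothetical record formats (not IDA's own types):
#   function: (start address, end address exclusive, name)
#   xref:     (from address, to address, kind)
Function = Tuple[int, int, str]
Xref = Tuple[int, int, str]

# Assumed labels for control-transfer code references; data references are ignored.
CODE_XREF_KINDS = {"call_far", "call_near", "jump_far", "jump_near"}


def make_ref(functions: List[Function], xrefs: List[Xref]) -> Callable[[str], List[Tuple[str, int]]]:
    funcs = sorted(functions)
    starts = [start for start, _end, _name in funcs]
    entry = {name: start for start, _end, name in funcs}

    def owner(addr: int) -> Optional[str]:
        # Locate the function whose [start, end) range contains addr.
        i = bisect_right(starts, addr) - 1
        if i >= 0 and addr < funcs[i][1]:
            return funcs[i][2]
        return None

    def ref(callee: str) -> List[Tuple[str, int]]:
        """Return (referencing function, reference address) pairs for callee."""
        target = entry[callee]
        result = []
        for frm, to, kind in xrefs:
            if to != target or kind not in CODE_XREF_KINDS:
                continue  # not a code reference to the callee's entry
            caller = owner(frm)
            if caller is not None:  # skip references from outside any known function
                result.append((caller, frm))
        return result

    return ref
```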

[0043] Table 1 Cross-reference types

[0044]

[0045] In step S2, the taint source is set as the buffer of the socket data receive function to complete the taint source data marking.

[0046] Furthermore, in one embodiment of the present invention, step S2 uses the function hooking functionality of the angr framework to fill the buffer used to receive messages with symbolic values marked as dangerous.

[0047] Specifically, network input from sockets is one of the primary input sources for firmware programs. However, since network data originates from outside the device, its security cannot be guaranteed. Therefore, this embodiment of the invention sets the taint source as the buffer of the socket data receiving function. Table 2 lists the functions in the C language library used to receive messages from sockets. The `recv` function is typically used on connection-oriented sockets, such as the TCP protocol. The `recvfrom` and `recvmsg` functions are typically used on connectionless sockets, such as the UDP protocol.

[0048] Table 2 shows the functions for receiving messages from a socket.

[0049]

[0050]

[0051] Taint source data marking is implemented using the function hooking feature of the angr framework. angr provides a mechanism to replace (hook) a function or library function at a specific address in the program with a custom or predefined Python function. When the program calls the function at that address, control flow is transferred to the Python function instead of the original function; upon return, control flow goes back to the original program, thereby rewriting or optimizing the function. For taint source data marking, this invention implements hook functions for the three message receiving functions in Table 2 with the following logic: the buffer used to receive messages (i.e., the memory area pointed to by the void *buf pointer) is filled with symbolic values marked as dangerous, and the filling length is given by the size_t len parameter.
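The hooking logic can be illustrated without angr by a toy memory model. This sketch assumes a hypothetical `ToyState` whose memory maps addresses to either concrete bytes or symbol names, and models only `recv`'s signature (`ssize_t recv(int sockfd, void *buf, size_t len, int flags)`). In angr itself such a hook would typically be a `SimProcedure` that stores a fresh symbolic bitvector into `buf`; that code is not shown here.

```python
from typing import Dict, List


class ToyState:
    """Hypothetical stand-in for a symbolic-execution state (not angr's SimState)."""

    def __init__(self) -> None:
        self.memory: Dict[int, object] = {}  # address -> concrete byte or symbol name
        self.tainted: List[str] = []         # names of symbols marked dangerous
        self.calls = 0                       # number of hooked receive calls so far


def hook_recv(state: ToyState, sockfd: int, buf: int, length: int, flags: int) -> int:
    """Toy hook for recv(sockfd, buf, len, flags): taint buf[0:len], report len bytes."""
    for offset in range(length):
        name = f"taint_recv_{state.calls}_{offset}"  # fresh symbol per byte
        state.memory[buf + offset] = name
        state.tainted.append(name)
    state.calls += 1
    return length  # pretend the peer sent exactly len bytes
```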

[0052] In step S3, based on the taint propagation algorithm using backward symbolic execution along the call chain, a state function is introduced to record the taint analysis process from node $i_k$ of a reference chain in the reference tree to the target function, and the data analysis results of the taint convergence point are obtained.

[0053] Furthermore, in one embodiment of the present invention, step S3 specifically includes:

[0054] Step S301: The taint analysis function in the taint propagation algorithm based on backward symbolic execution of the call chain tracks each reference chain in the reference tree and analyzes the security of the reference chain;

[0055] Step S302: the tracing function in the taint propagation algorithm based on backward symbolic execution of the call chain updates the state function according to the result returned by the exploration function, thereby classifying the security of the call chain;

[0056] Step S303: the exploration function in the taint propagation algorithm based on backward symbolic execution of the call chain uses the symbolic execution engine to execute from node $i_k$ in the reference chain to the target function and returns the data analysis results of the taint convergence point.

[0057] A state function is introduced to record the analysis results of the nodes in the reference tree during a single taint analysis process. For the reference tree $RT_x$ of function x, the state function S(i) maps each node i to one of five states, whose names and meanings are as follows:

[0058] (1) UNKNOWN: the state of function i has not yet been resolved;

[0059] (2) CONTINUE: with function i as the starting point of symbolic execution, when x is reached its parameters contain the symbolic values initially input as parameters, but no uninitialized symbolic values or symbolic values marked as dangerous;

[0060] (3) USEUNINITDATA: with function i as the starting point of symbolic execution, when x is reached its parameters contain uninitialized symbolic values, i.e., an illegal memory address has been accessed, but no symbolic values marked as dangerous;

[0061] (4) DANGEROUS: with function i as the starting point of symbolic execution, when x is reached its parameters contain symbolic values marked as dangerous;

[0062] (5) SAFE: with function i as the starting point of symbolic execution, all parameters are concrete values when x is reached.
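The five states and the conditions that distinguish them can be written down as a small classifier. In this sketch the contents of the target function's arguments are summarized by hypothetical labels (`concrete`, `input`, `uninit`, `tainted`) that a symbolic execution engine would have to compute; the precedence DANGEROUS, then USEUNINITDATA, then CONTINUE, then SAFE follows the exclusions in the definitions above. UNKNOWN is never produced here, since it only marks nodes not yet analyzed.

```python
from enum import Enum, auto
from typing import Iterable


class State(Enum):
    UNKNOWN = auto()
    CONTINUE = auto()
    USEUNINITDATA = auto()
    DANGEROUS = auto()
    SAFE = auto()


# Hypothetical labels describing what the arguments of x contain when
# symbolic execution started at node i reaches x.
CONCRETE, INPUT, UNINIT, TAINTED = "concrete", "input", "uninit", "tainted"


def classify(arg_labels: Iterable[str]) -> State:
    labels = set(arg_labels)
    if TAINTED in labels:
        return State.DANGEROUS      # dangerous symbolic values reach x
    if UNINIT in labels:
        return State.USEUNINITDATA  # uninitialized values, but no taint
    if INPUT in labels:
        return State.CONTINUE       # still depends on i's own inputs
    return State.SAFE               # every argument is concrete
```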

[0063] The state function is initialized as:

[0064] $$S(i)=\begin{cases}\mathrm{CONTINUE}, & i = x\\ \mathrm{UNKNOWN}, & i \neq x\end{cases}$$

[0065] That is, the target function x is in the CONTINUE state and all other nodes are in the unanalyzed (UNKNOWN) state.

[0066] The state function guides the taint analysis process. For a reference chain $x \to i_1 \to i_2 \to \cdots \to i_n$:

[0067] (1) If $S(i_k)$ is DANGEROUS or SAFE, the path starting at $i_k$ and ending at x has either been found dangerous or been proven safe; then for any $i_{k'}$ with $k' > k$, the path starting at $i_{k'}$ and ending at x is likewise either dangerous or safe;

[0068] (2) If $S(i_k)$ is CONTINUE or USEUNINITDATA, the parameters used by x on this path still originate outside the partial reference chain $x \to \cdots \to i_k$, so the analysis should continue with $i_{k+1}$ as the new starting point;

[0069] (3) If $S(i_k)$ is UNKNOWN, the path starting from $i_k$ has not yet been analyzed.
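Rules (1) to (3), together with the initial state in which only x is CONTINUE, can be sketched as a chain-tracing loop. This is a minimal sketch, assuming reference-tree nodes are identified by unique hypothetical string IDs and that `explore` is a caller-supplied stand-in for the exploration function that returns one of the five states. Returning the top-level node's state when no node settles the verdict is an assumption of the sketch, not something the text specifies.

```python
from enum import Enum, auto
from typing import Callable, Dict, List


class State(Enum):
    UNKNOWN = auto()
    CONTINUE = auto()
    USEUNINITDATA = auto()
    DANGEROUS = auto()
    SAFE = auto()


def init_state(nodes: List[str], x: str) -> Dict[str, State]:
    # S(x) = CONTINUE; S(i) = UNKNOWN for every other node.
    return {n: (State.CONTINUE if n == x else State.UNKNOWN) for n in nodes}


def trace_chain(chain: List[str], S: Dict[str, State],
                explore: Callable[[str], State]) -> State:
    """Walk x -> i_1 -> ... -> i_n from the direct caller towards the top-level caller."""
    for node in chain[1:]:                 # chain[0] is the target function x
        if S[node] is State.UNKNOWN:
            S[node] = explore(node)        # rule (3): not analysed yet, explore it
        if S[node] in (State.DANGEROUS, State.SAFE):
            return S[node]                 # rule (1): verdict for the chain is settled
        # rule (2): CONTINUE / USEUNINITDATA, move on to the next caller
    return S[chain[-1]]                    # assumption: unresolved chain keeps top state
```

Because the state map is shared across chains, a node already resolved while tracing one chain is not explored again when it appears as a shared prefix of another chain.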

[0070] As shown in Figures 3 to 5, these elements together form the core of the taint propagation algorithm. The taint analysis process is the process of updating the state function, and the final output of the algorithm is the updated state function, in which each reference chain contains at least one node whose state is DANGEROUS or SAFE. The task of the exploration function in Figure 5 is to use the symbolic execution engine to execute from node $i_k$ in the reference chain to the target function x and return the data analysis results of the taint convergence point. Exploration proceeds step by step: when exploring from function $i_j$, the target is set to the call site of $i_{j-1}$; once it is reached, exploration continues from $i_{j-1}$ with the call site of $i_{j-2}$ as the target, and so on until the target function x is reached. This approach can significantly reduce path explosion in symbolic execution and improve its efficiency. The tracing function in Figure 4 is responsible for updating the state function based on the return value of the exploration function, thereby classifying the safety of the call chain. For a call chain $x \to i_1 \to i_2 \to \cdots \to i_n$, the tracing function explores the chain in order from the direct caller ($i_1$) to the top-level caller ($i_n$); depending on each node's state, it ends the analysis of the chain, attempts exploration from the next-level caller, or explores the current node. The return value of the exploration function determines whether the chain analysis ends. In Figure 3, the taint analysis function tracks each reference chain in the reference tree to analyze the safety of that chain. Meanwhile, during symbolic execution, the library function hooking mechanism is also used to replace complex library function calls, further improving symbolic execution efficiency.
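The step-by-step exploration order can be made concrete by listing the (start function, call site to reach) pairs for each bounded symbolic execution run. A sketch under stated assumptions: `call_site` is a hypothetical map from (caller, callee) to the call address, and the chain is encoded as `[x, i_1, ..., i_n]`. In angr each step might correspond to one simulation manager `explore(find=...)` run, but that wiring is not shown.

```python
from typing import Dict, List, Tuple


def exploration_steps(chain: List[str], k: int,
                      call_site: Dict[Tuple[str, str], int]) -> List[Tuple[str, int]]:
    """Plan the exploration from i_k down to x as a list of short, bounded runs.

    chain = [x, i_1, ..., i_n]; each step is (function to start in, address to reach).
    """
    steps = []
    for j in range(k, 0, -1):
        caller, callee = chain[j], chain[j - 1]
        # Run inside i_j only until its call into i_{j-1}
        # (for j == 1 that callee is the target function x itself).
        steps.append((caller, call_site[(caller, callee)]))
    return steps
```

Keeping every run short, instead of one long run from $i_k$ to x, is what limits how many paths each symbolic execution step has to explore.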

[0071] In step S4, the actual parameters of the target function are extracted, and the taint convergence points are constructed into a taint convergence point forest. Each tree in the forest is traversed, and the value at each node is checked to see whether it is tainted.

[0072] Specifically, the ultimate goal of taint convergence point data inspection is to determine whether the target function uses tainted data (i.e., symbolic values marked as dangerous) as parameters. However, functions that take pointers as parameters complicate this. First, in addition to the literal value of the pointer, the function may use the data in the memory region the pointer points to. Second, the pointer may point not to a single element that fits within the architecture's word size but to an array. Third, the pointer may point to a pointer table, in which each pointer points to an independent memory region. The taint convergence point should cover all of these memory regions so that no tainted data is missed. Therefore, this embodiment models the taint convergence point as a forest in which each tree represents one parameter of the target function, the root node of a tree is the outermost pointer of the parameter, and the leaf nodes are the data that the pointer ultimately points to.
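The forest model can be sketched as one tree per parameter, with pointers as inner nodes and the data they finally point to as leaves. This sketch assumes, purely for illustration, that symbolic values are represented by their string names and that the set of tainted symbol names is known; `SinkNode` and `forest_tainted` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SinkNode:
    """Node in one parameter tree: an inner node is a pointer, a leaf is pointed-to data."""
    value: object  # concrete int/bytes, or a symbolic value modelled as its name (str)
    children: List["SinkNode"] = field(default_factory=list)


def node_tainted(value: object, tainted: Set[str]) -> bool:
    # Toy rule: a value is tainted if it is a symbol name in the tainted set.
    return isinstance(value, str) and value in tainted


def tree_tainted(node: SinkNode, tainted: Set[str]) -> bool:
    # Check the pointer itself and everything reachable through it.
    if node_tainted(node.value, tainted):
        return True
    return any(tree_tainted(child, tainted) for child in node.children)


def forest_tainted(forest: List[SinkNode], tainted: Set[str]) -> List[bool]:
    """Return one flag per target-function parameter (one tree per parameter)."""
    return [tree_tainted(tree, tainted) for tree in forest]
```

For example, a pointer-table parameter becomes a root whose children are the table entries, each with the pointed-to data as its own child.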

[0073] When the symbolic execution engine reaches the target function, it performs taint convergence point analysis, which consists of two steps: extraction and inspection. First, the actual parameters of the target function are extracted to construct the taint convergence point forest. In addition to the model described above, the extraction process follows these strategies:

[0074] (1) For some functions, only some of the parameters are extracted. For example, when detecting format string vulnerabilities in formatted output functions such as printf, extracting all parameters is neither necessary (whether a format string vulnerability exists depends only on the format string, i.e., the first parameter) nor possible (printf is a variadic function and its argument list has no end marker).

[0075] (2) Both concrete and symbolic values may appear when extracting parameters, so the behavior on symbolic values must be specified. If an undefined symbolic value is encountered while traversing the characters of a string, an illegal address has been accessed and no meaningful string exists at that address, so the parameter extractor stops extracting the string. Likewise, if an undefined symbolic value is encountered while traversing a pointer table, no attempt is made to extract data from the address that pointer points to.
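The two extraction strategies can be sketched over a flat toy memory. Assumptions, all hypothetical: memory cells hold concrete bytes (ints), symbolic labels (strings) or a sentinel `UNDEF` standing for uninitialized symbolic values, and `ARGS_TO_EXTRACT` is an illustrative per-function policy (for `sprintf`, the format string is the second argument).

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical sentinel for an uninitialized (undefined) symbolic value.
UNDEF = object()

# Hypothetical per-function policy: which argument indices to extract.
# For format string checks only the format string matters.
ARGS_TO_EXTRACT = {"printf": [0], "sprintf": [1]}


def select_args(func: str, args: Sequence[object]) -> List[object]:
    indices = ARGS_TO_EXTRACT.get(func)
    if indices is None:
        return list(args)  # default: extract every argument
    return [args[i] for i in indices if i < len(args)]


def extract_string(memory: Sequence[object], addr: int, limit: int = 256) -> List[object]:
    """Read a NUL-terminated string, stopping early at an undefined value."""
    out: List[object] = []
    for pos in range(addr, min(addr + limit, len(memory))):
        cell = memory[pos]
        if cell is UNDEF:
            break  # illegal address: no meaningful string beyond this point
        if cell == 0:
            break  # concrete NUL terminator
        out.append(cell)  # concrete byte or symbolic label
    return out


def extract_pointer_table(table: Sequence[object],
                          deref: Callable[[object], List[object]]) -> List[Optional[List[object]]]:
    """Dereference each pointer-table entry, but never an undefined pointer."""
    return [None if entry is UNDEF else deref(entry) for entry in table]
```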

[0076] Under the above model, checking the taint convergence point data means traversing each tree in the taint convergence point forest and checking whether the value at each node is tainted. In angr's symbolic computation, applying an n-ary operation to symbolic values yields an abstract syntax tree (AST) whose root node is the operator and whose child nodes are the operands. Besides arithmetic and logical operations, angr also treats conditional expressions (if statements) as operations, which unifies the propagation of data dependencies and control dependencies of tainted data. Therefore, checking whether the symbolic value at a node (actually a symbolic AST) is tainted only requires traversing the AST: if it contains a symbolic value marked as dangerous by the taint source, the parameter is tainted; otherwise, it is not.
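The inspection step reduces to a search over the expression AST. The sketch below uses small hypothetical `Sym` and `Op` classes in place of angr's own (claripy) ASTs, and treats a conditional as just another operation named `"if"`, so a tainted value that reaches the target function only through a branch is still detected. With real angr ASTs one would instead inspect the symbolic variables the AST contains; that API usage is not shown here.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Sym:
    """A symbolic leaf; tainted=True models a value created by the taint source hook."""
    name: str
    tainted: bool = False


@dataclass(frozen=True)
class Op:
    """An operation node: the operator is the root, the operands are the children."""
    op: str  # e.g. "add", "concat", "eq", "if"
    args: Tuple["Expr", ...]


Expr = Union[int, Sym, Op]


def is_tainted(expr: Expr) -> bool:
    """Depth-first search of the AST for a leaf symbol marked dangerous."""
    stack = [expr]
    while stack:
        e = stack.pop()
        if isinstance(e, Sym):
            if e.tainted:
                return True
        elif isinstance(e, Op):
            stack.extend(e.args)
        # concrete ints are never tainted
    return False
```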

[0077] The cross-architecture binary executable vulnerability detection method based on symbolic execution proposed in this invention addresses the difficulty of obtaining firmware source code and the diversity of processor architectures. It marks network data as taint sources and employs symbolic execution for taint tracking. By detecting whether function parameters at predefined, frequently vulnerable functions are affected by tainted data, it determines whether the function call under a given path carries a potential risk. On a test set of 1152 programs from the NIST SARD dataset, the accuracy rate exceeds 90% and the recall rate exceeds 80%.

[0078] Next, referring to the accompanying drawings, a cross-architecture binary executable vulnerability detection system based on symbolic execution, according to an embodiment of the present invention, is described.

[0079] Figure 6 is a schematic diagram of the structure of a cross-architecture binary executable vulnerability detection system based on symbolic execution according to an embodiment of the present invention.

[0080] As shown in Figure 6, the system 10 includes: a construction and parsing module 100, a marking module 200, a taint analysis module 300, and a detection module 400.

[0081] The construction and parsing module 100 is used to construct a reference tree based on the call graph and to input the executable file and target function to be analyzed into the reference tree for parsing. The marking module 200 is used to set taint sources as the buffers of socket data receiving functions to mark taint source data. The taint analysis module 300 is used to introduce, based on a taint propagation algorithm using backward symbolic execution along the call chain, a state function that records the taint analysis process from node $i_k$ of a reference chain in the reference tree to the target function, obtaining the data analysis results of the taint convergence points. The detection module 400 is used to extract the actual parameters of the target function, construct a taint convergence point forest from the taint convergence points, traverse each tree in the forest, and detect whether the value at each node is tainted.

[0082] It should be noted that the foregoing explanation of the embodiment of the cross-architecture binary executable vulnerability detection method based on symbolic execution also applies to the system of this embodiment, and will not be repeated here.

[0083] The cross-architecture binary executable vulnerability detection system based on symbolic execution proposed in this invention addresses the challenges of obtaining firmware source code and the diversity of processor architectures. It uses tagged network data as taint sources and employs symbolic execution methods for taint tracking. By detecting whether function parameters at predefined frequently vulnerable functions are affected by tainted data, it determines whether a function call under that path carries potential risks. Using 1152 programs from the NIST SARD dataset as a test set, the accuracy rate reaches over 90%, and the recall rate reaches over 80%.

[0084] To implement the above embodiments, the present invention also proposes a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the cross-architecture binary executable vulnerability detection method based on symbolic execution as described in the foregoing embodiments.

[0085] To implement the above embodiments, the present invention also proposes a non-transitory computer-readable storage medium storing a computer program thereon, wherein the computer program, when executed by a processor, implements the cross-architecture binary executable vulnerability detection method based on symbolic execution as described in the foregoing embodiments.

[0086] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., refer to specific features, structures, materials, or characteristics described in connection with that embodiment or example, which are included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.

[0087] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this invention, "N" means at least two, such as two, three, etc., unless otherwise explicitly specified.

[0088] Any process or method description in the flowchart or otherwise described herein can be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing custom logic functions or process steps. The scope of the preferred embodiments of the invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the invention pertain.

[0089] The logic and/or steps represented in the flowchart or otherwise described herein, for example a sequenced list of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in conjunction with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection having one or more wires (electronic device), a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM). Alternatively, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in a computer memory.

[0090] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If, as in another embodiment, they are implemented in hardware, any one or a combination of the following techniques known in the art can be used: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0091] Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, the program includes one or a combination of the steps of the method embodiments.

[0092] Furthermore, the functional units in the various embodiments of the present invention can be integrated into a processing module, or each unit can exist physically separately, or two or more units can be integrated into a module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.

[0093] The storage medium mentioned above can be a read-only memory, a disk, or an optical disk, etc. Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and should not be construed as limiting the present invention. Those skilled in the art can make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims

1. A method for detecting vulnerabilities in cross-architecture binary executables based on symbolic execution, characterized in that it includes the following steps: Step S1: construct a reference tree based on the call graph, and input the executable file to be analyzed and the target function into the reference tree for reference tree parsing; Step S2: set the taint source as the buffer of the socket data receive function to complete taint source data marking; Step S3: based on a taint propagation algorithm using backward symbolic execution along the call chain, introduce a state function to record the taint analysis process from node i_k of a reference chain in the reference tree to the target function, and obtain the data analysis result of the taint sink point, wherein Step S3 specifically includes: Step S301: the taint analysis function of the taint propagation algorithm tracks each reference chain in the reference tree and analyzes the security of the reference chain; Step S302: the tracing function of the taint propagation algorithm updates the state function according to the result returned by the exploration function, thereby completing the security classification of the call chain; Step S303: the exploration function of the taint propagation algorithm uses the symbolic execution engine to reach the target function from a node of the reference chain and returns the data analysis result of the taint sink point; Step S4: extract the actual parameters of the target function, construct a taint sink point forest, traverse each tree in the taint sink point forest, and detect whether the value on each node is tainted.
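The division of labor in Steps S301 to S303 can be sketched in Python. This is a minimal sketch under stated assumptions: each reference chain is a list of (function, call address) nodes ordered from the caller nearest the target function outward; a CONTINUE result from a node means the sink arguments depend on that node's own inputs, so tracing moves one caller further back; and the exploration function, which in the method runs a symbolic execution engine, is replaced here by a hypothetical lookup table (FAKE_RESULTS). All function and variable names are illustrative.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple

Node = Tuple[str, int]                  # (function, call address) pair from the reference tree


class State(Enum):
    UNKNOWN = "unknown"
    CONTINUE = "continue"
    USEUNINITDATA = "use_uninit_data"
    DANGEROUS = "dangerous"
    SAFE = "safe"


# Exploration function type: symbolically execute from a node to the target, report the sink result.
Explore = Callable[[Node, str], State]
StateTable = Dict[Tuple[Node, str], State]


def trace(chain: List[Node], target: str, explore: Explore, state: StateTable) -> State:
    """Tracing function: climb one chain, updating the state function from explore() results."""
    verdict = State.UNKNOWN
    for node in chain:                           # assumed order: nearest caller first
        key = (node, target)
        if state.get(key, State.UNKNOWN) is State.UNKNOWN:
            state[key] = explore(node, target)   # each (node, target) pair is explored once
        verdict = state[key]
        if verdict is not State.CONTINUE:        # definitive result: stop climbing
            break
    return verdict


def taint_analysis(chains: List[List[Node]], target: str,
                   explore: Explore) -> Dict[Tuple[Node, ...], State]:
    """Analysis function: classify every reference chain, sharing one state table."""
    state: StateTable = {}
    return {tuple(chain): trace(chain, target, explore, state) for chain in chains}


# Hypothetical exploration results standing in for a real symbolic execution engine.
FAKE_RESULTS = {
    (("handle_cmd", 0x1040), "strcpy"): State.CONTINUE,   # depends on handle_cmd's caller
    (("net_loop", 0x2010), "strcpy"): State.DANGEROUS,    # recv() buffer reaches strcpy
    (("log_msg", 0x1100), "strcpy"): State.SAFE,          # only constant data reaches strcpy
}

verdicts = taint_analysis(
    [[("handle_cmd", 0x1040), ("net_loop", 0x2010)], [("log_msg", 0x1100)]],
    "strcpy",
    lambda node, target: FAKE_RESULTS.get((node, target), State.UNKNOWN),
)
for chain, verdict in verdicts.items():
    print(chain[0][0], verdict.name)   # → handle_cmd DANGEROUS, then log_msg SAFE
```

Sharing one state table across chains means a node that appears in several reference chains is explored only once, which is one plausible reason to record results in a state function rather than recomputing them per chain.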

2. The method of claim 1, wherein the node set of the reference tree is a set of (function, call address) pairs with the root node, and there is a directed edge from to in the reference tree if and only if both of the following conditions hold: (1) there is a directed edge from to in the call graph, and the call address corresponding to the edge is ; (2) the node set corresponding to the path from the root node to does not contain the function in the function set.
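Because the symbols in the edge conditions above are missing, the following sketch rests on an interpretation that should be treated as an assumption: a child node (g, b) may hang under a parent node whose function is f when (1) g calls f at call address b in the call graph, and (2) g does not already occur on the path from the root to the parent, which keeps recursive calls from producing infinite chains. The call graph layout, the placeholder root address, and the name may_add_edge are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]                       # (function, call address)
CallGraph = Dict[Tuple[str, str], Set[int]]  # (caller, callee) -> call-site addresses


def may_add_edge(call_graph: CallGraph, path: List[Node], child: Node) -> bool:
    """Edge test for parent = path[-1] -> child, under the interpretation stated above."""
    parent_func = path[-1][0]
    child_func, call_addr = child
    # (1) child_func calls parent_func in the call graph at exactly call_addr
    calls_parent = call_addr in call_graph.get((child_func, parent_func), set())
    # (2) child_func does not already occur on the root-to-parent path (blocks recursion cycles)
    acyclic = all(func != child_func for func, _ in path)
    return calls_parent and acyclic


cg: CallGraph = {
    ("handle_cmd", "strcpy"): {0x1040},
    ("parse", "handle_cmd"): {0x2200},
    ("handle_cmd", "handle_cmd"): {0x1080},   # a recursive call
}
root = ("strcpy", 0)                          # placeholder call address for the root
path = [root, ("handle_cmd", 0x1040)]
print(may_add_edge(cg, [root], ("handle_cmd", 0x1040)))   # → True
print(may_add_edge(cg, [root], ("handle_cmd", 0x9999)))   # → False (no such call site)
print(may_add_edge(cg, path, ("handle_cmd", 0x1080)))     # → False (recursion)
print(may_add_edge(cg, path, ("parse", 0x2200)))          # → True
```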

3. The method of claim 1, wherein the reference tree parsing process is as follows: the executable file to be analyzed and the target function are input into the reference tree, the root node of the reference tree is initialized as , and each referencing function of the current function is parsed and traversed, wherein the Ref function is implemented using the binary function cross-reference parsing capability of IDAPython.
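The parsing process can be sketched as a breadth-first traversal from the target function. In this sketch the Ref function is passed in as a callable returning (caller, call address) pairs; inside IDA it could plausibly be backed by idautils.CodeRefsTo together with ida_funcs.get_func, but that wiring, the RefNode class, the placeholder root address, and the max_depth bound are assumptions, and the demo uses a hypothetical dictionary of cross-references instead.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

RefFn = Callable[[str], List[Tuple[str, int]]]   # callee -> [(caller, call address)]


@dataclass
class RefNode:
    func: str
    call_addr: int
    children: List["RefNode"] = field(default_factory=list)


def build_reference_tree(target: str, ref: RefFn, max_depth: int = 8) -> RefNode:
    """Breadth-first reference tree parsing rooted at the target function."""
    root = RefNode(target, 0)                    # placeholder call address for the root
    queue = deque([(root, (target,))])           # node plus the functions on its root path
    while queue:
        node, path_funcs = queue.popleft()
        if len(path_funcs) > max_depth:          # hypothetical bound on chain length
            continue
        for caller, call_addr in ref(node.func):
            if caller in path_funcs:             # skip recursive / cyclic references
                continue
            child = RefNode(caller, call_addr)
            node.children.append(child)
            queue.append((child, path_funcs + (caller,)))
    return root


def reference_chains(node: RefNode) -> List[List[Tuple[str, int]]]:
    """Every path below `node` down to a leaf, nearest caller first."""
    if not node.children:
        return [[]]
    return [[(c.func, c.call_addr)] + tail for c in node.children for tail in reference_chains(c)]


XREFS = {  # hypothetical cross-reference table
    "strcpy": [("handle_cmd", 0x1040), ("log_msg", 0x1100)],
    "handle_cmd": [("net_loop", 0x2010), ("handle_cmd", 0x1080)],
}
tree = build_reference_tree("strcpy", lambda f: XREFS.get(f, []))
print(reference_chains(tree))
# → [[('handle_cmd', 4160), ('net_loop', 8208)], [('log_msg', 4352)]]
```

Storing the tree as explicit node objects, rather than a dictionary keyed by (function, call address), keeps two occurrences of the same caller under different parents as distinct tree nodes.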

4. The method of claim 1, wherein, in step S2, the function hooking feature of the angr framework is used to fill the buffer that receives messages with symbolic values marked as dangerous.
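In angr, this kind of marking is commonly done by subclassing angr.SimProcedure, registering it with Project.hook_symbol, and storing a claripy.BVS whose name carries a taint prefix into the receive buffer. Since that requires a loaded firmware binary, the sketch below models the same idea in plain Python: a hook table keyed by symbol name and a replacement recv that fills the buffer with named symbolic bytes. SymByte, Machine, TAINT_PREFIX, and tainted_recv are hypothetical stand-ins, not angr classes.

```python
from dataclasses import dataclass
from typing import Callable, Dict

TAINT_PREFIX = "taint_recv"   # hypothetical naming convention that marks data as dangerous


@dataclass(frozen=True)
class SymByte:
    """A one-byte symbolic value; taint is encoded in its name."""
    name: str

    @property
    def tainted(self) -> bool:
        return self.name.startswith(TAINT_PREFIX)


class Machine:
    """Toy stand-in for a symbolic state: byte-addressed memory plus symbol hooks."""

    def __init__(self) -> None:
        self.memory: Dict[int, SymByte] = {}
        self.hooks: Dict[str, Callable[..., int]] = {}
        self.recv_count = 0

    def hook_symbol(self, name: str, proc: Callable[..., int]) -> None:
        self.hooks[name] = proc

    def call(self, name: str, *args: int) -> int:
        return self.hooks[name](self, *args)


def tainted_recv(m: Machine, sockfd: int, buf: int, length: int, flags: int) -> int:
    """Replacement for recv(): fill buf with fresh symbolic bytes marked as tainted."""
    for off in range(length):
        m.memory[buf + off] = SymByte(f"{TAINT_PREFIX}_{m.recv_count}_{off}")
    m.recv_count += 1
    return length             # pretend the peer sent a full buffer


m = Machine()
m.hook_symbol("recv", tainted_recv)
n = m.call("recv", 3, 0x8000, 4, 0)
print(n, all(m.memory[0x8000 + i].tainted for i in range(n)))   # → 4 True
```

Encoding taint in the symbol name mirrors a common angr idiom: a sink argument is considered tainted if any variable name in its expression starts with the chosen prefix.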

5. The method of claim 1, wherein the states of the state function include UNKNOWN, CONTINUE, USEUNINITDATA, DANGEROUS, and SAFE, where: UNKNOWN denotes an unresolved node; CONTINUE denotes a node that is reached as a symbolic execution starting point; USEUNINITDATA denotes a node that is reached as a symbolic execution starting point; DANGEROUS denotes a node that is reached as a symbolic execution starting point; and SAFE denotes a node that is reached as a symbolic execution starting point.

6. The method of claim 5, wherein the initial state of the state function is: , wherein is a node and is the target function.
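The state function can be sketched as a table keyed by (node, target function). The comments on each state are inferred from the state names and are assumptions, as is the choice of UNKNOWN as the initial state for every pair (consistent with UNKNOWN describing an unresolved node). The rule that a final verdict cannot be overwritten is a hypothetical design choice, not taken from the claims.

```python
from enum import Enum, auto
from typing import Dict, Tuple

Node = Tuple[str, int]  # (function, call address)


class State(Enum):
    UNKNOWN = auto()        # unresolved node
    CONTINUE = auto()       # assumed: result depends on callers, keep tracing backward
    USEUNINITDATA = auto()  # assumed: uninitialized data reaches the sink
    DANGEROUS = auto()      # assumed: tainted data reaches the sink
    SAFE = auto()           # assumed: sink reached with untainted arguments


FINAL = {State.USEUNINITDATA, State.DANGEROUS, State.SAFE}


class StateFunction:
    """state(node, target); every pair starts as UNKNOWN (assumed initial state)."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[Node, str], State] = {}

    def get(self, node: Node, target: str) -> State:
        return self._table.get((node, target), State.UNKNOWN)

    def update(self, node: Node, target: str, new: State) -> None:
        old = self.get(node, target)
        if old in FINAL and new is not old:   # hypothetical rule: verdicts are not overwritten
            raise ValueError(f"{node} already classified as {old.name}")
        self._table[(node, target)] = new


sf = StateFunction()
n = ("handle_cmd", 0x1040)
print(sf.get(n, "strcpy").name)   # → UNKNOWN
sf.update(n, "strcpy", State.CONTINUE)
sf.update(n, "strcpy", State.DANGEROUS)
print(sf.get(n, "strcpy").name)   # → DANGEROUS
```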

7. A system for cross-architecture binary executable vulnerability detection based on symbolic execution, the system comprising: a construction and parsing module, configured to construct a reference tree based on the call graph and to input the executable file to be analyzed and the target function into the reference tree for reference tree parsing; a marking module, configured to set taint sources as buffers of socket data receive functions in order to complete taint source data marking; a taint analysis module, configured to apply a taint propagation algorithm based on backward symbolic execution along the call chain, introduce a state function to record the taint analysis process from node i_k of a reference chain in the reference tree to a target function, and obtain the data analysis result of the taint sink point, wherein the taint analysis module specifically includes: the taint analysis function of the taint propagation algorithm, which tracks each reference chain in the reference tree and analyzes the security of the reference chain; the tracking function of the taint propagation algorithm, which updates the state function based on the result returned by the exploration function, thereby completing the security classification of the call chain; and the exploration function of the taint propagation algorithm, which uses a symbolic execution engine to reach the target function from a node in the reference chain and returns the data analysis result of the taint sink point; and a detection module, configured to extract the actual parameters of the target function, construct a taint sink point forest, traverse each tree in the taint sink point forest, and detect whether the value on each node is tainted.

8. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein, when the processor executes the computer program, it implements the symbolic execution-based cross-architecture binary executable vulnerability detection method as described in any one of claims 1-6.

9. A non-transitory computer-readable storage medium having stored thereon a computer program, characterized in that, when the computer program is executed by a processor, it implements the cross-architecture binary executable vulnerability detection method based on symbolic execution as described in any one of claims 1-6.