A non-direct call target identification method and device based on static features

By converting the source code into an LLVM intermediate representation, extracting static features, and constructing an undirected call graph, the method uses multiple static features to filter indirect call targets. This addresses the limited accuracy of indirect call target identification in existing techniques and achieves a low false alarm rate and high identification accuracy.

CN116360888B · Active · Publication Date: 2026-06-16 · PEKING UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
PEKING UNIV
Filing Date
2021-12-27
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing static analysis techniques struggle to accurately identify indirect call targets in C/C++ programs, impacting the accuracy of vulnerability detection and control flow integrity protection technologies.

Method used

By converting the source code into an intermediate representation of LLVM, static features such as type information, function names, and code metrics are extracted to construct an undirected call graph. Indirect call targets are filtered out using call distance, path reachability, and function similarity. A two-stage filtering rule is used to reduce the false alarm rate.

Benefits of technology

It effectively reduces the false alarm rate of indirect target identification, improves the accuracy and reliability of identification, and ensures a low false negative rate.

Generated by Eureka AI based on patent content.

Figure CN116360888B_ABST

Abstract

The application discloses an indirect call target identification method and device based on static features. Source code is converted into an LLVM intermediate representation, and static features are extracted from it. Indirect call targets are identified according to the type information in the static features. An undirected call graph, with functions as vertices and call relationships as edges, is constructed from the function names, the direct call instruction targets, and the identified indirect call targets. For each target of an indirect call instruction, the call distance, the call path reachability, and the similarity between functions are calculated according to the function names and the code metrics. Screening rules are formulated from these three quantities and used to screen the indirect call targets. By combining multiple static feature analyses, the application identifies indirect call targets in a program with a low false negative rate, reduces the false positive rate through the screening rules, and effectively improves identification accuracy.

Description

Technical Field

[0001] This invention relates to the field of computer security technology, and in particular to a method and apparatus for indirect call target identification based on static features.

Background Technology

[0002] C/C++ programs commonly use function pointers to implement dynamic behavior, and different programs do so in different ways. Dynamic function calls mainly take three forms: callback functions, jump tables, and virtual functions. These dynamic function calls are collectively referred to as indirect calls. Because the target of an indirect call is determined only at runtime, current static analysis techniques have difficulty identifying it. Inaccurate indirect call targets reduce the accuracy of some vulnerability detection techniques and the effectiveness of Control Flow Integrity (CFI) protection techniques.

[0003] Currently, the main method for identifying indirect call targets is based on type matching. The basic idea is to extract all address-taken functions in the source code and classify them according to function type. When identifying the target of an indirect call instruction, the previously extracted address-taken functions are matched according to the type of the function pointer loaded by the instruction. Functions with the same type are the targets of the call instruction.

[0004] This type-matching-based method is efficient and produces no false negatives, but because it does not make full use of the static features in the program, its false positive rate is high.

Summary of the Invention

[0005] To overcome the shortcomings of the prior art, the present invention provides a method and apparatus for identifying indirect call targets based on static features, which can identify targets of indirect call instructions in source code and effectively reduce the false alarm rate of identification based on a variety of static features.

[0006] The technical solution provided by this invention is an indirect call target identification method based on static features, comprising the following steps:

[0007] The source code is converted into an LLVM intermediate representation, and static features are extracted from the LLVM intermediate representation. The static features include: type information, function name, code metric, and direct call instruction target.

[0008] Indirect call targets are identified based on the type information in the static features, wherein the type information includes: function type information and function pointer type information;

[0009] Based on the function name, the direct call instruction target, and the identified indirect call targets, an undirected call graph is constructed with functions as vertices and call relationships as edges; and the call distance, call path reachability, and similarity between functions for each target of the indirect call instruction are calculated based on the function name and the code metric.

[0010] Filtering rules are established based on the call distance, call path reachability, and similarity between functions, and are used to filter the indirect call targets.

[0011] Furthermore, the type information also includes: the type of the address-taken function and the type of the memory region where the function pointer loaded by the indirect call instruction resides;

[0012] After identifying the indirect call targets based on the type information in the static features, the method further includes:

[0013] Filtering the identified indirect call targets based on the type of the address-taken function and the type of the memory region where the function pointer loaded by the indirect call instruction is located: if an indirect call target has never appeared in memory of the corresponding type, that target is invalid.

[0014] The filtering rules include:

[0015] If the call distance of an indirect call target is 0, that target is a seed call target;

[0016] If the call path to an indirect call target has good reachability, that target is a seed call target;

[0017] If the similarity between the caller and the target function of an indirect call instruction exceeds a set threshold, that target is a seed call target.

[0018] Furthermore, after the seed call targets have been selected, the method further includes:

[0019] Selecting further call targets based on the seed call targets and a second filtering rule; the second filtering rule includes:

[0020] The indirect call instruction target is compared with the seed call target. If the similarity exceeds a set threshold, the target is a valid indirect call target.

[0021] The call path of the indirect call instruction target is compared with the call path of the seed call target. If the overlap of the call paths exceeds a set threshold, then the target is a valid indirect call target.

[0022] Furthermore, determining good reachability of a call path includes: calculating the call distance from each target to the call point to obtain the shortest path; if the proportion of functions on that path that have function pointer type parameters or function pointer type return values, relative to the total number of vertices on the path, exceeds a set threshold, the path is judged to have good reachability.

[0023] On the other hand, the present invention also provides an indirect call target identification device based on static features, comprising: a source code static feature extraction subsystem, an indirect call target identification subsystem based on type analysis, and an indirect call target identification subsystem based on static features; wherein,

[0024] The source code static feature extraction subsystem is used to convert the source code into an LLVM intermediate representation and extract static features from the LLVM intermediate representation. The static features include: type information, function name, code metric and direct call instruction target.

[0025] The indirect call target identification subsystem based on type analysis includes a single-layer type analysis module, which is used to identify indirect call targets based on the type information in the static features. The type information includes: function type information and function pointer type information.

[0026] The static feature-based indirect call target identification subsystem includes a call graph construction module and an indirect call target identification module. The call graph construction module is used to construct an undirected call graph with functions as vertices and call relationships as edges based on the function name, the direct call instruction target, and the identified indirect call targets. The indirect call target identification module is used to calculate the call distance, call path reachability, and function similarity of each target of the indirect call instruction based on the function name and the code metric; and to formulate filtering rules based on the call distance, call path reachability, and function similarity to filter out indirect call targets.

[0027] The type information also includes: the type of the address-taken function and the type of the memory region where the function pointer loaded by the indirect call instruction is located; the indirect call target identification subsystem based on type analysis also includes a multi-level type analysis module, which is used to filter the identified indirect call targets according to the type of the address-taken function and the type of the memory region where the function pointer loaded by the indirect call instruction is located; if an indirect call target has never appeared in memory of the corresponding type, the indirect call target is invalid.

[0028] The filtering rules defined by the indirect call target identification module include:

[0029] If the call distance of an indirect call target is 0, that target is a seed call target;

[0030] If the call path to an indirect call target has good reachability, that target is a seed call target;

[0031] If the similarity between the caller and the target function of an indirect call instruction exceeds a set threshold, that target is a seed call target.

[0032] After the seed call targets have been selected, the indirect call target identification module is further used for:

[0033] Selecting further call targets based on the seed call targets and a second filtering rule; the second filtering rule includes:

[0034] The indirect call instruction target is compared with the seed call target. If the similarity exceeds a set threshold, the target is a valid indirect call target.

[0035] The call path of the indirect call instruction target is compared with the call path of the seed call target. If the overlap of the call paths exceeds a set threshold, then the target is a valid indirect call target.

[0036] The indirect call target identification module is further used to: calculate the call distance from each target to the call point to obtain the shortest path; if the proportion of functions on the path that have function pointer type parameters or function pointer type return values exceeds a set threshold, the path is judged to have good reachability.

[0037] The beneficial effects of this invention are:

[0038] This invention provides a method and apparatus for identifying indirect call targets based on static features. By converting source code into an LLVM intermediate representation and extracting, to the greatest extent possible, the static features in the source code that help identify indirect call targets, a batch of indirect call targets is identified using type information, ensuring a low false negative rate. By constructing an undirected call graph from existing call relationships and further pruning it using static features contained in the program, the false positive rate of indirect call target identification is effectively reduced. Furthermore, in the final identification stage, a two-stage screening method can be used to keep the false negative rate of indirect call target identification low. This invention can statically identify indirect call targets in C/C++ programs and makes full use of the various static features contained in the source code to reduce the false positive rate, significantly improving identification accuracy.

Attached Figure Description

[0039] Figure 1 is a flowchart of the identification method of the present invention.

[0040] Figure 2 is a system structure block diagram of an embodiment of the identification device of the present invention.

Detailed Implementation

[0041] The present invention will be further described below with reference to the accompanying drawings and embodiments, but the scope of the invention is not limited in any way.

[0042] The technical solution provided by this invention is a method and apparatus for indirect call target identification based on static features. As shown in Figure 1, the method compiles the source code into an LLVM intermediate representation (LLVM IR) and extracts static features from it, including type information, data flow information, and code metrics. A first batch of indirect call targets is then obtained from the type information, a coarse-grained call graph is constructed from this first batch of results, and other information is finally used to prune the call graph, thereby effectively identifying the targets of indirect call instructions in the source code. The process mainly comprises three phases: extracting static features from the source code, identifying indirect call targets through type analysis, and constructing and pruning the call graph to obtain the final target results.

[0043] The following steps are performed during the static feature extraction phase of the source code:

[0044] Step 1: Convert the source code into an LLVM intermediate representation, that is, use a compiler to compile the source code into an LLVM-analyzable .bc file, and avoid information loss during the compilation process;

[0045] Step 2: Extract static features from the source code. These are the features used in subsequent stages, including but not limited to the source and function type of each address-taken function, the type of the function pointer loaded by each indirect call instruction, the function names, code metrics, and the targets of direct call instructions.

[0046] The type-analysis phase for identifying indirect call targets performs the following steps:

[0047] Step 3: Identify the target of the indirect call instruction based on the type information obtained in Step 2. The type information involved in this step only includes function type information and function pointer type information.

[0048] Step 4: Filter out invalid targets obtained in Step 3 based on the type information obtained in Step 2. The information used in this step is the type of the memory regions where the address-taken functions and the function pointer loaded by the indirect call instruction are located. If a target has never appeared in memory of the corresponding type, the target is invalid. Filtering out invalid targets reduces the false alarm rate.

[0049] The specific steps for constructing and pruning the call graph and obtaining the final target result are as follows:

[0050] Step 5: Construct the program call graph based on the direct call instruction targets obtained in Step 2 and the indirect call instruction targets obtained in Step 4;

[0051] Step 6: Calculate the similarity between the caller and the callee, and the similarity between equivalent callees, based on the function name and code metrics obtained in Step 2.

[0052] Step 7: Calculate the call distance of each target of the indirect call instruction based on the program call graph constructed in Step 5. The call distance is the path length from the source of the target to the call point in the graph.

[0053] Step 8: Calculate the path from each target to the call point based on the type information obtained in Step 2 and the program call graph constructed in Step 5, and calculate the reachability of the path.

[0054] Step 9: Based on the information calculated in Steps 6-8, filter the call targets obtained in Step 4 to select a batch of seed call targets. The filtering rules are as follows: Rule 1: If the call distance of the target is 0, then the target is a seed call target; Rule 2: If the path reachability from the target to the call point is good / poor, then the target is / is not a seed call target; Rule 3: If the caller and the target function are highly similar, then the target is a seed call target.

[0055] Step 10: Select other calling targets based on the information calculated in Steps 6 and 8 and the seed targets selected in Step 9. The selection rules are as follows: Rule 1: If a target has a high degree of similarity to a certain seed calling target, then the target is a valid indirect calling target; Rule 2: If the calling path of a target has a high degree of overlap with the calling path of a certain seed calling target, then the target is a valid indirect calling target.

[0056] Through Steps 9 and 10, this embodiment of the invention provides a preferred method for identifying indirect call targets. It employs a two-step screening method, and the screening rules can be satisfied simultaneously or individually. Those skilled in the art can also formulate other screening rules based on existing technologies; this invention does not impose specific limitations on the screening rules.

[0057] This invention provides a method for identifying indirect call targets based on static features. By converting the source code into an LLVM intermediate representation and extracting the static features contained in the source code that are helpful for identifying indirect call targets to the greatest extent, a batch of indirect call targets are identified using type information, ensuring a low false negative rate. By constructing an undirected call graph using existing call relationships and pruning the call graph using static features contained in the program, the false positive rate of indirect call target identification is effectively reduced. In addition, the further identification stage adopts a two-stage screening method to ensure a low false negative rate of indirect call target identification.

[0058] As shown in Figure 2, using the above indirect call target identification method based on static features, this invention implements a corresponding indirect call target identification device based on static features, comprising: a source code static feature extraction subsystem (subsystem one), an indirect call target identification subsystem based on type analysis (subsystem two), and an indirect call target identification subsystem based on static features (subsystem three); wherein,

[0059] Subsystem one includes the following modules:

[0060] The static feature extraction module extracts static features from LLVM IR, such as type information, direct call targets, function names, and code metrics.

[0061] Subsystem 2 includes the following modules:

[0062] The single-layer type analysis module identifies the target of indirect call instructions by matching the function type of the address-taken function with the type of the function pointer;

[0063] Furthermore, it also includes a multi-level type analysis module, which uses the type information of the memory region where the function pointer is located to filter the indirect call targets obtained by the single-layer type analysis module. To avoid missed detections, this analysis is skipped for types that have undergone type conversion.

[0064] Subsystem 3 includes the following modules:

[0065] The call graph construction module uses existing function information, direct call targets, and indirect call targets to construct an undirected call graph with functions as vertices and call relationships as edges;

[0066] The indirect call target identification module obtains the program's call graph and calculates the following information based on the call graph and other existing information, including but not limited to call distance, call path reachability, and function similarity. Then, it identifies indirect call targets in two stages. First, it identifies a batch of seed call targets based on information such as call distance, function similarity, and path reachability. Then, it identifies all indirect call targets based on the seed call targets and information such as function similarity and path overlap.

[0067] The present invention provides a specific embodiment to illustrate the implementation of the method of the present invention.

[0068] Step 1: Convert the source code into an LLVM intermediate representation, that is, use a compiler to compile the source code into an LLVM-analyzable .bc file, and avoid information loss during the compilation process;

[0069] Furthermore, the detailed compilation method is as follows:

[0070] (1) Use wllvm to compile and link the source code into a single bc file, avoiding the problem that LLVM tends to miss address-taken functions when performing cross-module analysis;

[0071] (2) During the configuration phase, use the default configuration of the source code;

[0072] (3) Compile with the flags “-g” and “-O0” and with inlining disabled (“no-inlining”), so that the generated bc file contains accurate debugging information, including but not limited to line numbers and function names.

[0073] Step 2: Extract static features from the source code. These are the features needed in subsequent stages, including but not limited to the source and function type of each address-taken function, the type of the function pointer loaded by each indirect call instruction, the function names, code metrics, and the targets of direct call instructions.

[0074] Furthermore, static features include, but are not limited to:

[0075] The source of an address-taken function is the location where its address is taken; here, a location means the function in which this happens. In particular, if the address of the function is assigned to a global variable, the sources of that global variable are searched recursively.

[0076] The type of the address-taken function refers to the type of the function, including the return type and the types of all parameters.

[0077] The type of a function pointer loaded by an indirect call instruction, including the return type and the types of all parameters;

[0078] The function name refers to the unique identifier of each function in the bc file after compilation;

[0079] Code metrics, including but not limited to the cyclomatic complexity and lines of code for each function;

[0080] The direct call target indicates the caller and callee of the direct call instruction in the program;
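The recursive source lookup in [0075] can be illustrated with a small Python sketch. It assumes a hypothetical in-memory model rather than the patent's own data structures: `taken_at` maps each address-taken function to the places where its address is taken, each place being either ("func", name) or ("global", name), and `global_users` maps each global variable to the places that reference it. The function and global names in the example are likewise hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed encoding of a place: ("func", name) or ("global", name).
Place = Tuple[str, str]


def resolve_sources(
    target: str,
    taken_at: Dict[str, List[Place]],
    global_users: Dict[str, List[Place]],
) -> Set[str]:
    """Collect the functions that are sources of an address-taken function.

    A function place is a source directly; a global-variable place is
    followed through every place that references that global, transitively.
    """
    sources: Set[str] = set()
    seen_globals: Set[str] = set()  # stops cycles such as g1 -> g2 -> g1
    worklist: List[Place] = list(taken_at.get(target, []))
    while worklist:  # explicit worklist instead of Python recursion
        kind, name = worklist.pop()
        if kind == "func":
            sources.add(name)
        elif kind == "global" and name not in seen_globals:
            seen_globals.add(name)
            worklist.extend(global_users.get(name, []))
    return sources


# Hypothetical example: `minus` is taken in `init` and stored in `g_handler`.
taken_at = {"minus": [("func", "init"), ("global", "g_handler")]}
global_users = {
    "g_handler": [("func", "dispatch"), ("global", "g_table")],
    "g_table": [("func", "setup"), ("global", "g_handler")],
}
print(sorted(resolve_sources("minus", taken_at, global_users)))
# ['dispatch', 'init', 'setup']
```

The explicit worklist avoids Python's recursion limit on long chains of globals, and the visited set keeps mutually referencing globals from looping forever.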

[0081] Phase Two includes the following steps:

[0082] Step 3: Identify the targets of the indirect call instruction based on the type information obtained in Step 2. The type information involved in this step includes only function type information and function pointer type information. The function type information is stored in a dictionary DictF = <key, value>, where `key = hash(type)` is the hash value of a function type, calculated by converting the function type to a string and hashing that string, and `value = {f1, f2, f3, ..., fn}` is the set of all functions of that type; the hash of the type of each function in the set equals its key. For example, in the code below, the set `DictF[hash("int(int)")]` contains the functions `reverse` and `minus`; the targets of an indirect call through a function pointer `fptr_t` are all the functions in the set `DictF[hash(type(fptr_t))]`.
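A minimal Python sketch of the DictF lookup in Step 3 follows. The patent does not name a hash function, so SHA-1 over the type string is used here only as a deterministic stand-in. The type-string spelling and the extra function `log_msg` are hypothetical.

```python
from collections import defaultdict
from hashlib import sha1
from typing import Dict, Set


def type_hash(type_str: str) -> str:
    """key = hash(type): hash of the function type rendered as a string."""
    return sha1(type_str.encode("utf-8")).hexdigest()


def build_dictf(func_types: Dict[str, str]) -> Dict[str, Set[str]]:
    """DictF: type hash -> set of address-taken functions with that type."""
    dictf: Dict[str, Set[str]] = defaultdict(set)
    for func, type_str in func_types.items():
        dictf[type_hash(type_str)].add(func)
    return dict(dictf)


def match_targets(dictf: Dict[str, Set[str]], fptr_type: str) -> Set[str]:
    """Targets of an indirect call: DictF[hash(type(fptr))], or empty."""
    return set(dictf.get(type_hash(fptr_type), set()))


# Hypothetical address-taken functions and their type strings.
address_taken = {"reverse": "int(int)", "minus": "int(int)", "log_msg": "void(char*)"}
dictf = build_dictf(address_taken)
print(sorted(match_targets(dictf, "int(int)")))  # ['minus', 'reverse']
```

Because every address-taken function of a matching type is kept, this stage favors completeness over precision, and the later steps prune the candidate set.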

[0083] Step 4: Filter out the invalid targets obtained in Step 3 based on the type information obtained in Step 2. The information used in this step is the type of the memory regions where the address-taken functions and the function pointer loaded by the indirect call instruction are located. If a target has never appeared in memory of the corresponding type, the target is invalid.

[0084] The device in this embodiment of the invention maintains a dictionary DictM = <key, value> that stores the type information of the memory regions where address-taken functions reside. Here `key = hashIdx(type, Idx)`, where `type` is the type of the memory region where an address-taken function resides and `Idx` is its offset within that region, and `value = {f1, f2, f3, ..., fn}` is the set of all address-taken functions stored at offset `Idx` in objects of type `type`. For example, in the following code, the function `minus` is an address-taken function stored in the first field of the structure type `Example`, so the set `DictM[hashIdx(Example,0)]` contains `minus`. The function `reverse`, by contrast, is never stored in this structure type and is therefore not in the set.

[0085]

[0086]

[0087] In this embodiment of the invention, the device maintains a set Set = {t1, t2, t3, ..., tn} containing all types that have undergone type conversion. For example, in the following code, a type conversion occurs between two variables of different composite types; therefore, both `struct B` and `struct A` belong to `Set`.

[0088]

[0089] In this step, the device of this embodiment first determines whether the type of the memory region where the function pointer loaded by the indirect call instruction is located is a composite type (struct, class, array, etc.), and then whether that composite type belongs to Set. If the type is not composite, or if it belongs to Set (that is, it has undergone type conversion, see [0063]), this step is skipped. Otherwise, the intersection of the target set from Step 3 and DictM[hashIdx(type, Idx)] is taken as the new target set.
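The Step 4 filter can be sketched in Python as follows. It follows the rule in [0063] that types involved in type conversion are skipped. Several details are assumptions made for illustration. A Python tuple (type, Idx) stands in for hashIdx(type, Idx), and types are written as C-like strings. A type counts as composite if it starts with "struct " or "class " or ends with "]". The `converted` argument plays the role of Set.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, int]  # stands in for hashIdx(type, Idx)


def is_composite(type_name: str) -> bool:
    """Assumed test for struct, class, and array types."""
    return type_name.startswith(("struct ", "class ")) or type_name.endswith("]")


def build_dictm(stores: Iterable[Tuple[str, str, int]]) -> Dict[Key, Set[str]]:
    """DictM: (region type, offset) -> address-taken functions stored there."""
    dictm: Dict[Key, Set[str]] = defaultdict(set)
    for func, region_type, idx in stores:
        dictm[(region_type, idx)].add(func)
    return dict(dictm)


def filter_by_region(
    candidates: Set[str],
    region_type: str,
    idx: int,
    dictm: Dict[Key, Set[str]],
    converted: Set[str],
) -> Set[str]:
    """Keep only candidates that were ever stored at the same (type, offset) slot."""
    if not is_composite(region_type) or region_type in converted:
        return set(candidates)  # skipped: not composite, or type was converted
    return candidates & dictm.get((region_type, idx), set())


# Example from [0084]: `minus` is stored in field 0 of struct Example.
dictm = build_dictm([("minus", "struct Example", 0)])
converted = {"struct A", "struct B"}
step3 = {"reverse", "minus"}
print(sorted(filter_by_region(step3, "struct Example", 0, dictm, converted)))
# ['minus']
```

Returning the candidates unchanged for converted types trades some precision for a lower risk of missed targets, which matches the stated goal of avoiding missed detections.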

[0090] The third stage includes the following steps:

[0091] Step 5: Based on the direct call instruction targets obtained in Step 2 and the indirect call instruction targets obtained in Step 4, construct the program call graph CG = <V, E>. CG is an undirected graph. V = {f1, f2, f3, ..., fn} is the set of vertices, each representing a function in the program. E is the set of undirected edges, each representing a call relationship between two vertices;
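A minimal Python sketch of Step 5 follows. It represents CG = <V, E> as an adjacency map from each function to its neighbours. The call pairs and function names in the example are hypothetical; in the method they would come from Step 2 (direct calls) and Step 4 (indirect calls).

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_call_graph(
    functions: Iterable[str],
    direct_calls: Iterable[Tuple[str, str]],
    indirect_calls: Iterable[Tuple[str, str]],
) -> Graph:
    """Undirected call graph: one vertex per function, one edge per call relation."""
    cg: Graph = {f: set() for f in functions}  # keep functions with no calls too
    for edges in (direct_calls, indirect_calls):
        for caller, callee in edges:
            # Undirected: record the relation on both endpoints.
            cg.setdefault(caller, set()).add(callee)
            cg.setdefault(callee, set()).add(caller)
    return cg


cg = build_call_graph(
    ["main", "init", "dispatch", "minus", "reverse", "unused"],
    direct_calls=[("main", "init"), ("main", "dispatch")],
    indirect_calls=[("dispatch", "minus"), ("dispatch", "reverse")],
)
print(sorted(cg["dispatch"]))  # ['main', 'minus', 'reverse']
```

Because the edges are undirected, the shortest-path searches in Steps 7 and 8 may traverse a call relation in either direction.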

[0092] Step 6: Calculate the similarity between the caller and the callee, and the similarity between equivalent callees, based on the function name and code metrics obtained in Step 2.

[0093] Furthermore, the similarity calculation method based on function names is as follows:

[0094] (1) Splitting: Split the name of each function into words, treating each case switch or "_" as the beginning of a new word. For example, the function name "ZnGet_UserModeInformation" is split into the word list wl = ["zn", "get", "user", "mode", "information"];

[0095] (2) Filtering: Filter out common conjunctions in the word list, such as “and”, “or”, “for”, etc., and filter out symbols added by the LLVM compiler in the word list, such as “zn”, “ztsd”, etc.

[0096] (3) Calculate the similarity of function names. Compare the words in the word lists wl1 and wl2 of function names pairwise and find their longest common substring. The similarity calculation formula is Equation 1:

[0097]

[0098] where Sim is the similarity between the two function names, wl1 and wl2 are the word lists generated from the two function names, word_i and word_j are words in wl1 and wl2 respectively, and common_sub_string(word_i, word_j) is the longest common substring of the two words. Note that the longest common substring here is counted only from the first letter, so it is in effect the longest common prefix of the two words. For example, the similarity between the function names `make_complete` and `get_completion` is:

[0099]
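The splitting, filtering, and prefix-matching steps can be sketched in Python as follows. Equation 1 itself is not reproduced in this text, so the aggregation in `name_similarity` is a hypothetical normalization chosen only for illustration. It takes each word's best prefix match on the other side, sums these in both directions, and divides by the total word length, which keeps the score in [0, 1]. It should not be read as the patent's formula. The stop-word and compiler-token sets contain only the examples named in step (2).

```python
import re
from typing import List

STOP_WORDS = {"and", "or", "for"}   # conjunctions named in step (2)
COMPILER_TOKENS = {"zn", "ztsd"}    # compiler-added symbols named in step (2)
# A word is an optional capital followed by lowercase letters/digits,
# or a run of capitals not followed by a lowercase letter (an acronym).
WORD_RE = re.compile(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])")


def split_name(name: str) -> List[str]:
    """Step (1): split at '_' and at case switches, lower-casing each word."""
    words: List[str] = []
    for part in name.split("_"):
        words.extend(w.lower() for w in WORD_RE.findall(part))
    return words


def filter_words(words: List[str]) -> List[str]:
    """Step (2): drop conjunctions and compiler-added tokens."""
    return [w for w in words if w not in STOP_WORDS and w not in COMPILER_TOKENS]


def prefix_len(a: str, b: str) -> int:
    """Common substring counted only from the first letter, i.e. common prefix."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def name_similarity(name1: str, name2: str) -> float:
    """Hypothetical normalization for illustration, NOT the patent's Equation 1."""
    wl1 = filter_words(split_name(name1))
    wl2 = filter_words(split_name(name2))
    total = sum(map(len, wl1)) + sum(map(len, wl2))
    if total == 0:
        return 0.0
    best1 = sum(max((prefix_len(w, v) for v in wl2), default=0) for w in wl1)
    best2 = sum(max((prefix_len(v, w) for w in wl1), default=0) for v in wl2)
    return (best1 + best2) / total


print(split_name("ZnGet_UserModeInformation"))
# ['zn', 'get', 'user', 'mode', 'information']
print(name_similarity("make_complete", "get_completion"))  # 0.56 under this normalization
```

In this example only "complete" and "completion" share a prefix ("complet", 7 letters). Each direction therefore contributes 7, giving 14 out of 25 letters in total.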

[0100] Step 7: Calculate the call distance of each target of the indirect call instruction based on the program call graph constructed in Step 5. The call distance is the path length from the source of the target to the call point in the graph.

[0101] Furthermore, the method for calculating the call distance is as follows:

[0102] If an indirect call instruction appears in the function `func`, and it has one call target `callee`, then according to step 2, the source of `callee` is known. Let the set of its sources be:

[0103] Src[callee]={srcfunc1, srcfunc2, srcfunc3,..., srcfuncn}

[0104] In the graph CG obtained in Step 5, calculate the shortest path from each vertex in this set to the vertex `func`, and take the length of the shortest of these paths as the call distance of `callee` with respect to `func`.
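Step 7 can be sketched in Python as a breadth-first search over the undirected call graph, since every edge has unit length. The sketch assumes that the graph is an adjacency map from function name to neighbour set and that call distance is counted in edges; the graph and names in the example are hypothetical. A distance of 0 arises when a source of `callee` is `func` itself, that is, when the function pointer is taken in the function that makes the indirect call. The path returned by `shortest_path` is also what Step 8 inspects.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set

Graph = Dict[str, Set[str]]


def shortest_path(cg: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Unit-weight shortest path by breadth-first search; None if unreachable."""
    if src == dst:
        return [src]
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in cg.get(node, ()):
            if nxt in prev:
                continue
            prev[nxt] = node
            if nxt == dst:
                # Walk predecessor links back to src, then reverse.
                path = [dst]
                step = prev[dst]
                while step is not None:
                    path.append(step)
                    step = prev[step]
                return path[::-1]
            queue.append(nxt)
    return None


def call_distance(cg: Graph, sources: Iterable[str], func: str) -> Optional[int]:
    """Shortest edge count from any vertex in Src[callee] to func."""
    best: Optional[int] = None
    for src in sources:
        path = shortest_path(cg, src, func)
        if path is not None and (best is None or len(path) - 1 < best):
            best = len(path) - 1
    return best


cg = {
    "main": {"init", "dispatch"},
    "init": {"main"},
    "dispatch": {"main", "minus", "reverse"},
    "minus": {"dispatch"},
    "reverse": {"dispatch"},
}
print(shortest_path(cg, "init", "dispatch"))         # ['init', 'main', 'dispatch']
print(call_distance(cg, {"init"}, "dispatch"))       # 2
print(call_distance(cg, {"init", "dispatch"}, "dispatch"))  # 0
```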

[0105] Step 8: Calculate the path from each target to the call point based on the type information obtained in Step 2 and the program call graph constructed in Step 5, and calculate the reachability of the path.

[0106] Furthermore, the metrics for path reachability are as follows:

[0107] The path is the shortest path obtained in Step 7. If the functions on the path that have function pointer type parameters or function pointer type return values account for 50% or more of the total number of vertices on the path (this threshold can be adjusted), the path is considered to have good reachability.
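The reachability test of Step 8 reduces to a ratio over the vertices of the Step 7 shortest path. The Python sketch below uses a hypothetical signature table and assumes that a type string containing "(*)" denotes a function pointer. The default of 0.5 mirrors the adjustable 50% threshold above.

```python
from typing import Dict, List, Tuple

# Hypothetical signature table entry: (return type, parameter types).
Signature = Tuple[str, List[str]]


def is_fptr(type_str: str) -> bool:
    """Assumed convention: function pointer types are written like 'int (*)(int)'."""
    return "(*)" in type_str


def handles_fptr(sig: Signature) -> bool:
    """True if the function takes or returns a function pointer."""
    ret, params = sig
    return is_fptr(ret) or any(is_fptr(p) for p in params)


def good_reachability(path: List[str], sigs: Dict[str, Signature],
                      threshold: float = 0.5) -> bool:
    """Good if fptr-handling functions make up >= threshold of the path's vertices."""
    if not path:
        return False
    hits = sum(1 for f in path if f in sigs and handles_fptr(sigs[f]))
    return hits / len(path) >= threshold


sigs = {
    "init": ("void", ["int (*)(int)"]),
    "main": ("int", ["int", "char **"]),
    "dispatch": ("int", ["int (*)(int)", "int"]),
}
path = ["init", "main", "dispatch"]
print(good_reachability(path, sigs))        # True: 2 of 3 vertices
print(good_reachability(path, sigs, 0.75))  # False
```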

[0108] Step 9: Based on the information calculated in Steps 6-8, filter the call targets obtained in Step 4 to select a batch of seed call targets. The filtering rules are as follows: Rule 1: If the call distance of the target is 0, then the target is a seed call target; Rule 2: If the path reachability from the target to the call point is good / poor, then the target is / is not a seed call target; Rule 3: If the similarity between the caller and the target function exceeds 75% (this threshold can be adjusted), then the target is a seed call target.
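The first-stage (seed) filter of Step 9 can be sketched in Python as follows. It treats the three rules as alternatives, which is one of the combinations allowed by [0056]. The per-target inputs are assumed to be precomputed by Steps 6 to 8. The target names and values in the example are hypothetical.

```python
from typing import Dict, Optional, Set


def select_seeds(
    candidates: Set[str],
    distance: Dict[str, Optional[int]],  # Step 7 call distance per target
    good_path: Dict[str, bool],          # Step 8 reachability per target
    caller_sim: Dict[str, float],        # Step 6 caller/target similarity
    sim_threshold: float = 0.75,         # adjustable; 75% in the embodiment
) -> Set[str]:
    """Return the candidates that satisfy at least one of Rules 1 to 3."""
    seeds: Set[str] = set()
    for t in candidates:
        if distance.get(t) == 0:                          # Rule 1
            seeds.add(t)
        elif good_path.get(t, False):                     # Rule 2
            seeds.add(t)
        elif caller_sim.get(t, 0.0) > sim_threshold:      # Rule 3
            seeds.add(t)
    return seeds


seeds = select_seeds(
    {"minus", "reverse", "handler", "log_msg"},
    distance={"minus": 0, "reverse": 2, "handler": 2, "log_msg": 3},
    good_path={"minus": False, "reverse": False, "handler": True, "log_msg": False},
    caller_sim={"minus": 0.1, "reverse": 0.8, "handler": 0.2, "log_msg": 0.2},
)
print(sorted(seeds))  # ['handler', 'minus', 'reverse']
```

A stricter variant could instead require several rules to hold at once. Since the text leaves the combination open, the threshold and the rule combination are the natural tuning points.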

[0109] Step 10: Based on the information calculated in Steps 6 and 8 and the seed targets selected in Step 9, select other calling targets. The selection rules are as follows: Rule 1: If the similarity between a target and a certain seed calling target exceeds 75% (this threshold can be adjusted), then the target is a valid indirect calling target; Rule 2: If the overlap between the calling path of the target and the calling path of a certain seed calling target is >50% (overlap is the ratio of the number of vertices contained in the intersection of the two paths to the number of vertices contained in the shorter path, this threshold can be adjusted), then the target is a valid indirect calling target.
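The second stage of Step 10 can be sketched in Python as follows, using the overlap definition given in Rule 2. The sketch makes four assumptions. `paths` holds the Step 7 shortest path of every target. `target_sim` is any pairwise similarity function from Step 6, here a hypothetical lookup table. Seeds are kept in the final result. The example data are hypothetical.

```python
from typing import Callable, Dict, List, Set


def path_overlap(p1: List[str], p2: List[str]) -> float:
    """Shared vertices divided by the vertex count of the shorter path."""
    shorter = min(len(p1), len(p2))
    if shorter == 0:
        return 0.0
    return len(set(p1) & set(p2)) / shorter


def expand_from_seeds(
    candidates: Set[str],
    seeds: Set[str],
    target_sim: Callable[[str, str], float],
    paths: Dict[str, List[str]],
    sim_threshold: float = 0.75,
    overlap_threshold: float = 0.5,
) -> Set[str]:
    """Seeds plus every other candidate close to some seed by Rule 1 or Rule 2."""
    valid = set(seeds)
    for t in candidates - seeds:
        for s in seeds:
            similar = target_sim(t, s) > sim_threshold                    # Rule 1
            overlapping = path_overlap(paths.get(t, []),
                                       paths.get(s, [])) > overlap_threshold  # Rule 2
            if similar or overlapping:
                valid.add(t)
                break
    return valid


sims = {frozenset({"reverse", "minus"}): 0.3, frozenset({"log_msg", "minus"}): 0.1}
paths = {
    "minus": ["init", "main", "dispatch"],
    "reverse": ["setup", "main", "dispatch"],
    "log_msg": ["logger", "util", "helper", "dispatch"],
}
result = expand_from_seeds(
    {"minus", "reverse", "log_msg"},
    {"minus"},
    lambda a, b: sims.get(frozenset((a, b)), 0.0),
    paths,
)
print(sorted(result))  # ['minus', 'reverse']
```

Here `reverse` shares two of three vertices with the seed's path (overlap about 0.67), so Rule 2 accepts it. `log_msg` shares only `dispatch` (overlap about 0.33) and has low similarity, so it is rejected.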

[0110] It should be noted that the purpose of disclosing the embodiments is to help further understand the present invention. However, those skilled in the art will understand that various substitutions and modifications are possible without departing from the spirit and scope of the present invention and the appended claims. Therefore, the present invention should not be limited to the content disclosed in the embodiments, and the scope of protection of the present invention is defined by the scope of the claims.

Claims

1. A static-feature-based non-direct call target identification method, characterized in that it comprises the following steps: converting source code into an LLVM intermediate representation and extracting static features from the LLVM intermediate representation, the static features including type information, function names, code metrics, and direct call instruction targets; identifying indirect call targets based on the type information in the static features, wherein the type information includes function type information and function pointer type information; constructing, based on the function names, the direct call instruction targets, and the identified indirect call targets, an undirected call graph with functions as vertices and call relationships as edges; calculating, based on the function names and the code metrics, the call distance, the call path reachability, and the similarity between functions for each target of an indirect call instruction; and establishing filtering rules based on the call distance, the call path reachability, and the similarity between functions to filter the indirect call targets, wherein the filtering rules include: if the call distance of an indirect call target is 0, the target is a seed call target; if the call path to an indirect call target has good reachability, the target is a seed call target; and if the similarity between the caller and an indirect call target function exceeds a set threshold, the target is a seed call target. After the seed call targets are filtered out, the method further comprises selecting call targets again based on the seed call targets and a second filtering rule, the second filtering rule including: comparing an indirect call instruction target with a seed call target, wherein if their similarity exceeds a set threshold, the target is a valid indirect call target;
and comparing the call path of an indirect call instruction target with the call path of a seed call target, wherein if the overlap of the call paths exceeds a set threshold, the target is a valid indirect call target.
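The type-based identification step in claim 1 can be illustrated with a single-layer type match, which compares the function pointer type at a call site against the types of candidate functions. This sketch simplifies the step under stated assumptions. Types are reduced to hypothetical `FuncType` records whose string spellings are loosely modeled on LLVM IR. Candidates are limited to address-taken functions, and matching uses exact equality. The patent's actual matching criteria may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FuncType:
    """Simplified function type; a real analysis would compare LLVM types."""
    ret: str
    params: Tuple[str, ...]
    variadic: bool = False


def single_layer_targets(fp_type: FuncType,
                         address_taken: List[Tuple[str, FuncType]]) -> List[str]:
    """Return every address-taken function whose type equals the
    function pointer type used at the indirect call site."""
    return [name for name, ty in address_taken if ty == fp_type]


# Hypothetical call site through a pointer of type i32 (ptr, i32).
fp = FuncType("i32", ("ptr", "i32"))
funcs = [
    ("parse_a", FuncType("i32", ("ptr", "i32"))),
    ("parse_b", FuncType("i32", ("ptr", "i32"))),
    ("log_msg", FuncType("void", ("ptr",))),
]
print(single_layer_targets(fp, funcs))  # ['parse_a', 'parse_b']
```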

2. The method of claim 1, wherein the type information further includes the types of address-taken functions and the type of the memory region from which an indirect call instruction loads its function pointer; and after identifying the indirect call targets based on the type information in the static features, the method further comprises: filtering the identified indirect call targets based on the types of the address-taken functions and the type of the memory region from which the indirect call instruction loads its function pointer, wherein if an indirect call target has never appeared in memory of the corresponding type, the indirect call target is invalid.
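The memory-region filter of claim 2 can be illustrated with a small sketch. It assumes a hypothetical encoding of a memory-region type as a `(struct type, field index)` pair. It also assumes a pre-collected list of stores that records which function addresses were written into which region types. A target survives only if its address was stored into a region of the type from which the indirect call loads its pointer.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical encoding of a memory-region type: (struct type, field index).
RegionType = Tuple[str, int]


def record_stores(stores: Iterable[Tuple[str, RegionType]]) -> Dict[RegionType, Set[str]]:
    """Map each memory-region type to the functions whose address was
    stored into a region of that type."""
    stored: Dict[RegionType, Set[str]] = {}
    for fn, region in stores:
        stored.setdefault(region, set()).add(fn)
    return stored


def filter_by_region(candidates: List[str], load_region: RegionType,
                     stored: Dict[RegionType, Set[str]]) -> List[str]:
    """Drop candidates that never appeared in memory of the region type
    from which the indirect call loads its function pointer."""
    allowed = stored.get(load_region, set())
    return [fn for fn in candidates if fn in allowed]


stores = [
    ("tcp_send", ("struct net_ops", 0)),
    ("udp_send", ("struct net_ops", 0)),
    ("file_write", ("struct file_ops", 1)),
]
stored = record_stores(stores)
# Candidates come from the type-signature match; the call loads net_ops field 0.
print(filter_by_region(["tcp_send", "udp_send", "file_write"],
                       ("struct net_ops", 0), stored))  # ['tcp_send', 'udp_send']
```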

3. The method of claim 1, wherein determining that a call path has good reachability comprises: calculating the call distance from each target to the call point to obtain the shortest path; and if the number of functions on the path that have function-pointer-typed parameters or function-pointer-typed return values exceeds a set threshold, judging that the path has good reachability.

4. A static-feature-based non-direct call target identification apparatus, characterized in that it comprises a source code static feature extraction subsystem, a type-analysis-based indirect call target identification subsystem, and a static-feature-based indirect call target identification subsystem, wherein: the source code static feature extraction subsystem is configured to convert source code into an LLVM intermediate representation and extract static features from the LLVM intermediate representation, the static features including type information, function names, code metrics, and direct call instruction targets; the type-analysis-based indirect call target identification subsystem includes a single-layer type analysis module configured to identify indirect call targets based on the type information in the static features, the type information including function type information and function pointer type information; and the static-feature-based indirect call target identification subsystem includes a call graph construction module and an indirect call target identification module, wherein the call graph construction module constructs an undirected call graph with functions as vertices and call relationships as edges based on the function names, the direct call instruction targets, and the identified indirect call targets, and the indirect call target identification module calculates the call distance, call path reachability, and function similarity for each target of an indirect call instruction based on the function names and the code metrics.
The indirect call target identification module then uses the call distance, call path reachability, and function similarity to formulate filtering rules for identifying indirect call targets, wherein the filtering rules include: if the call distance of an indirect call target is 0, the target is a seed call target; if the call path to an indirect call target has good reachability, the target is a seed call target; and if the similarity between the caller and an indirect call target function exceeds a set threshold, the target is a seed call target. After the seed call targets are filtered out, the indirect call target identification module is further configured to select call targets again based on the seed call targets and a second filtering rule, the second filtering rule including: comparing an indirect call instruction target with a seed call target, wherein if their similarity exceeds a set threshold, the target is a valid indirect call target; and comparing the call path of an indirect call instruction target with the call path of a seed call target, wherein if the overlap of the call paths exceeds a set threshold, the target is a valid indirect call target.
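The call graph construction module of claim 4 can be sketched with an adjacency-set map and a breadth-first search for shortest paths. This is only a sketch, because the patent does not specify the shortest-path algorithm. It is also an assumption that the hop count returned here corresponds exactly to the "call distance" of Step 6. This section does not settle, for example, how a distance of 0 arises, or whether the indirect edge under evaluation is excluded from the graph.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def build_undirected_call_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Functions are vertices; each (caller, callee) pair, direct or
    type-identified indirect, becomes an undirected edge."""
    graph: Dict[str, Set[str]] = {}
    for caller, callee in edges:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set()).add(caller)
    return graph


def shortest_path(graph: Dict[str, Set[str]], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search; returns the vertex list from src to dst,
    or None if dst is unreachable."""
    if src not in graph or dst not in graph:
        return None
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:  # walk predecessors back to src
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in sorted(graph[node]):  # sorted for deterministic output
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


direct = [("main", "parse"), ("main", "dispatch"), ("parse", "handler")]
indirect = [("init", "log")]  # hypothetical type-identified indirect edge
g = build_undirected_call_graph(direct + indirect)
p = shortest_path(g, "dispatch", "handler")
print(p, len(p) - 1)  # ['dispatch', 'main', 'parse', 'handler'] 3
```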

5. The apparatus of claim 4, wherein the type information further includes the types of address-taken functions and the type of the memory region from which an indirect call instruction loads its function pointer; and the type-analysis-based indirect call target identification subsystem further includes a multi-level type analysis module configured to filter the identified indirect call targets according to the types of the address-taken functions and the type of the memory region from which the indirect call instruction loads its function pointer, wherein if an indirect call target has never appeared in memory of the corresponding type, the indirect call target is invalid.

6. The apparatus of claim 4, wherein the indirect call target identification module is configured to: calculate the call distance from each target to the call point to obtain the shortest path; and if the ratio of the number of functions on the path that have function-pointer-typed parameters or function-pointer-typed return values to the total number of vertices on the path exceeds a set threshold, determine that the path has good reachability.