High throughput embedding generation system for executable code and applications
An embedding generation system and high-throughput technology, applied in computing, complex mathematical operations, instruments, etc. It addresses two problems: existing binary code similarity detection methods lack accuracy, and they are far from scalable enough to handle the enormous amount of executable code in the wild. The system speeds up the embedding generation process and achieves high throughput.
Examples
embodiment 103
[0036] Raw Feature Extraction embodiment 103 can be implemented in many ways. Raw features include, but are not limited to, the Control Flow Graph and the Attributed Control Flow Graph. This invention presents one implementation of a raw feature: the Bi-Directional Attributed Control Flow Graph (BACFG) 104, defined as follows.
[0037] Definition 1. (Bi-directional Attributed Control Flow Graph) The bi-directional attributed control flow graph, or BACFG for short, is a directed graph with two edge sets, G = <V, E1, E2, φ>, where V is a set of basic blocks; E1 ⊆ V×V is a set of edges representing the connections between these basic blocks; E2 = E1^T ⊆ V×V is a set of edges representing the reversed connections between these basic blocks; and φ: V→Σ is the labeling function that maps a basic block in V to a set of attributes in Σ.
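Definition 1 can be sketched as a small data structure. The following is a minimal illustration, not the patented implementation: the class and field names, the example edges among n1 to n4, and the attribute values are all assumptions made for the example. The only point it shows is that E2 is derived from E1 by transposition rather than supplied separately.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


@dataclass
class BACFG:
    """G = <V, E1, E2, phi> as in Definition 1 (field names are illustrative)."""

    blocks: Set[str]                        # V: basic blocks
    forward_edges: Set[Edge]                # E1: control-flow edges
    attributes: Dict[str, Dict[str, int]]   # phi: block -> attribute set
    reverse_edges: Set[Edge] = field(init=False)  # E2 = transpose of E1

    def __post_init__(self) -> None:
        # Every edge must connect two blocks that belong to V.
        for src, dst in self.forward_edges:
            if src not in self.blocks or dst not in self.blocks:
                raise ValueError(f"edge ({src}, {dst}) leaves V")
        # phi must label every block in V.
        unlabeled = self.blocks - set(self.attributes)
        if unlabeled:
            raise ValueError(f"blocks without attributes: {sorted(unlabeled)}")
        # E2 is derived, never supplied: flip every edge in E1.
        self.reverse_edges = {(dst, src) for src, dst in self.forward_edges}

    def successors(self, block: str) -> Set[str]:
        return {dst for src, dst in self.forward_edges if src == block}

    def predecessors(self, block: str) -> Set[str]:
        # Following E2 out of a block yields its predecessors under E1.
        return {dst for src, dst in self.reverse_edges if src == block}


# Hypothetical four-block function; edges and attribute values are made up.
graph = BACFG(
    blocks={"n1", "n2", "n3", "n4"},
    forward_edges={("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4")},
    attributes={
        "n1": {"instructions": 4, "calls": 0},
        "n2": {"instructions": 2, "calls": 1},
        "n3": {"instructions": 3, "calls": 0},
        "n4": {"instructions": 1, "calls": 0},
    },
)
```

Storing E2 explicitly costs memory proportional to E1, but it lets a consumer propagate information from a block to both its successors and its predecessors without recomputing the transpose.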
embodiment 102
[0038] Bi-directional ACFG extraction embodiment 102 can be implemented using different approaches. One approach relies on disassemblers such as IDA Pro and Binary Ninja to disassemble the executable code 101. Every function in the executable code is recovered, and its raw features (control flow graph, basic block information) are extracted. Finally, BACFG 104 is constructed from this information for every function in the executable code 101. FIG. 2 presents an example of constructing BACFG 104 from executable code. The disassembled raw code 201 is extracted from a piece of OpenSSL executable code using IDA Pro, a commercial disassembler. It contains the control flow graph of function SSL_get_psk_identity_hint and the information for its basic blocks (n1, n2, n3, n4). 202 is the corresponding BACFG constructed for function SSL_get_psk_identity_hint. Every node in 201 is represented as a set of attributes. The edges in 201 are kept in the generated BACFG 202. The dotted arrow line in 202 represents the rev...
embodiment 10
[0066] Since k is usually a small number (<20), expensive program analysis can be applied to determine exactly whether the functions in the candidate list are indeed vulnerable. The conditional-formula-based function identification embodiment 10 is implemented to identify the truly vulnerable functions in the candidate list.
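The two-stage idea in this paragraph, a cheap filter down to k candidates followed by an expensive exact check, can be sketched as follows. This is a hedged illustration only: ranking by cosine similarity over embeddings, the function names, the toy two-dimensional embeddings, and the stand-in `expensive_check` callback are all assumptions, and the callback merely marks where a conditional-formula analysis would run.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_candidates(
    query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int
) -> List[str]:
    """Cheap stage: rank every function by similarity and keep the best k."""
    ranked = sorted(
        corpus, key=lambda name: cosine(query, corpus[name]), reverse=True
    )
    return ranked[:k]


def confirm_vulnerable(
    candidates: List[str], expensive_check: Callable[[str], bool]
) -> List[str]:
    """Expensive stage: run the exact analysis on the k candidates only."""
    return [name for name in candidates if expensive_check(name)]


# Toy data: hypothetical function names and made-up 2-D embeddings.
corpus = {
    "f_a": [1.0, 0.0],
    "f_b": [0.9, 0.1],
    "f_c": [0.0, 1.0],
    "f_d": [-1.0, 0.0],
}
query_embedding = [1.0, 0.0]

candidates = top_k_candidates(query_embedding, corpus, k=2)
# Stand-in for the exact analysis: pretend only f_a and f_c are vulnerable.
confirmed = confirm_vulnerable(candidates, lambda name: name in {"f_a", "f_c"})
```

Because the expensive check runs on at most k functions regardless of corpus size, its cost stays bounded even when the first stage scans a very large collection.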
[0067] Generally speaking, a conditional formula consists of an If-clause and a Then-clause. Each clause is a symbolic formula, and together they describe under what condition (stated in the If-clause) a given action (in the Then-clause) will take place. A conditional formula explicitly captures two cardinal factors of buggy code: (1) erroneous data dependencies, and (2) missing or incorrect condition checks. Instead of treating the vulnerable function as a whole, searching on structured conditional formulas can effectively localize the possibly vulnerable code logic. By contrasting conditional formulas between the vulnerable function and a target candidate, we can quickly diagnose whethe...


