
High throughput embedding generation system for executable code and applications

A high-throughput embedding generation system, applied in computing, complex mathematical operations, instruments, etc. It addresses the problems that existing binary code similarity detection approaches lack accuracy and are far from scalable enough to handle the enormous amount of executable code in the wild, and it speeds up the embedding generation process to achieve high throughput.

Inactive Publication Date: 2021-02-25
DEEPBITS TECH INC +1
0 Cites, 2 Cited by

AI Technical Summary

Benefits of technology

The patent describes technical features that speed up the vulnerability search process. One feature uses matrix manipulation to speed up the comparison of vulnerability-related functions. Another stacks functions to speed up embedding generation for the whole system. Combining high-throughput embedding generation and comparison with conditional formula comparison allows for a precise and scalable vulnerability search.
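The matrix-based comparison mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each function embedding is a fixed-length vector and that similarity is cosine similarity. The names `similarity_matrix` and `top_k` and the sample vectors are hypothetical.

```python
import math

def l2_normalize(rows):
    """Scale each row vector to unit length (all-zero rows are left unchanged)."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm else list(row))
    return out

def similarity_matrix(queries, corpus):
    """Cosine similarity of every query embedding against every corpus embedding.

    On unit-length rows this is the matrix product Q_hat * C_hat^T, so all
    pairs are scored in one matrix operation instead of pair by pair.
    """
    q_hat = l2_normalize(queries)
    c_hat = l2_normalize(corpus)
    return [[sum(a * b for a, b in zip(q, c)) for c in c_hat] for q in q_hat]

def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical 3-dimensional embeddings (assumption: real embeddings are much larger).
vulnerable = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
targets = [[2.0, 0.0, 0.0], [0.0, 3.0, 3.0], [1.0, 1.0, 0.0]]

sims = similarity_matrix(vulnerable, targets)
candidates = [top_k(row, 2) for row in sims]
```

Pure Python lists keep the sketch dependency-free. With a numerical library the same product would typically be a single call on stacked arrays, which is where the throughput gain would come from.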

Problems solved by technology

Existing binary code similarity detection approaches are far from scalable enough to handle the enormous amount of executable code in the wild.
These methods all lack accuracy, and most of them are fairly expensive and do not satisfy the need to process large volumes of malware samples and to search over a large code base.



Examples


embodiment 103

[0034] The raw features extracted by Raw Feature Extraction embodiment 103 can be implemented in many ways. Raw features include, but are not limited to, the Control Flow Graph, the Attributed Control Flow Graph, etc. This invention presents one implementation of a raw feature: the Bi-Directional Attributed Control Flow Graph (BACFG) 104, defined as follows.

[0035] Definition 1. (Bi-directional Attributed Control Flow Graph) The bi-directional attributed control flow graph, or BACFG for short, is a special directed graph with two sets of edges, G = <V, E1, E2, φ>, where V is a set of basic blocks; E1 ⊆ V×V is a set of edges representing the connections between these basic blocks; E2 = E1^T ⊆ V×V is a set of edges representing the reversed connections between these basic blocks; and φ: V→Σ is the labeling function that maps a basic block in V to a set of attributes in Σ.
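Definition 1 can be illustrated with a small data structure. The sketch below is one possible reading of the definition, not the patent's implementation: the class name `BACFG`, the field names, the four-block example graph, and the attribute `num_instructions` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class BACFG:
    """G = <V, E1, E2, phi>, where E2 is derived as the reverse of E1."""
    blocks: list        # V: basic-block identifiers
    forward: set        # E1: (source, destination) pairs
    attributes: dict    # phi: block identifier -> attribute dict
    backward: set = field(init=False)  # E2 = E1^T, computed below

    def __post_init__(self):
        unknown = {n for edge in self.forward for n in edge} - set(self.blocks)
        if unknown:
            raise ValueError(f"edges reference unknown blocks: {sorted(unknown)}")
        # Reversing every pair is the set form of transposing the adjacency matrix.
        self.backward = {(dst, src) for src, dst in self.forward}

    def adjacency(self, edges):
        """0/1 adjacency matrix of the given edge set, indexed in block order."""
        index = {b: i for i, b in enumerate(self.blocks)}
        matrix = [[0] * len(self.blocks) for _ in self.blocks]
        for src, dst in edges:
            matrix[index[src]][index[dst]] = 1
        return matrix

# Hypothetical four-block function; the edges and attributes are invented.
graph = BACFG(
    blocks=["n1", "n2", "n3", "n4"],
    forward={("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4")},
    attributes={b: {"num_instructions": 1} for b in ["n1", "n2", "n3", "n4"]},
)
a1 = graph.adjacency(graph.forward)
a2 = graph.adjacency(graph.backward)
```

Here `a2` is the matrix transpose of `a1`, which matches E2 = E1^T in the definition.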

embodiment 102

[0036] Bi-directional ACFG extraction embodiment 102 can be implemented using different approaches. One approach relies on disassemblers such as IDA Pro and Binary Ninja to disassemble the executable code 101. Every function in the executable code is recovered and its raw features (control flow graph, basic block information) are extracted. Finally, BACFG 104 is constructed from this information for every function in the executable code 101. FIG. 2 presents an example of constructing BACFG 104 from executable code. The disassembled raw code 201 is extracted, using the commercial disassembler IDA Pro, from a piece of OpenSSL executable code. It contains the control flow graph of function SSL_get_psk_identity_hint and basic block (n1, n2, n3, n4) information. 202 is the corresponding BACFG constructed for function SSL_get_psk_identity_hint. Every node in 201 is represented as a set of attributes. The edges in 201 are kept in the generated BACFG 202. The dotted arrow line in 202 represents the reve...
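The construction step can be sketched as below, assuming the disassembler output has already been exported as plain records per basic block. Everything specific here is hypothetical: the record layout, the attribute names, the mnemonic groups, and the sample instructions, which are invented and are not the actual disassembly of SSL_get_psk_identity_hint. No disassembler API is called.

```python
# Assumed mnemonic groups used to derive per-block attributes.
ARITHMETIC = {"add", "sub", "mul", "imul", "div", "idiv", "inc", "dec"}
TRANSFER = {"mov", "lea", "push", "pop"}

def block_attributes(instructions):
    """Map one basic block's instruction lines to a small attribute dict."""
    mnemonics = [line.split()[0].lower() for line in instructions if line.strip()]
    return {
        "num_instructions": len(mnemonics),
        "num_calls": sum(m == "call" for m in mnemonics),
        "num_arithmetic": sum(m in ARITHMETIC for m in mnemonics),
        "num_transfer": sum(m in TRANSFER for m in mnemonics),
    }

def build_bacfg(function):
    """Build node attributes plus forward and reversed edge lists for one function."""
    nodes = {name: block_attributes(b["instructions"]) for name, b in function.items()}
    forward = [(name, succ) for name, b in function.items() for succ in b["successors"]]
    backward = [(dst, src) for src, dst in forward]
    return {"nodes": nodes, "forward": forward, "backward": backward}

# Invented four-block function in the assumed record layout.
disassembly = {
    "n1": {"instructions": ["push rbp", "mov rbp, rsp", "test rdi, rdi", "jz n3"],
           "successors": ["n2", "n3"]},
    "n2": {"instructions": ["mov rax, [rdi+8]", "call helper", "jmp n4"],
           "successors": ["n4"]},
    "n3": {"instructions": ["xor eax, eax"], "successors": ["n4"]},
    "n4": {"instructions": ["pop rbp", "ret"], "successors": []},
}
bacfg = build_bacfg(disassembly)
```

In a real pipeline the `disassembly` records would come from a tool such as IDA Pro or Binary Ninja, and the attribute set would be whatever the embedding model expects.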

embodiment 10

[0063] Since k is usually a small number (10 [...] is implemented to identify the true vulnerable functions in the candidate list.

[0064] Generally speaking, a conditional formula consists of an If-clause and a Then-clause, and each clause is a symbolic formula, describing under what condition (stated in the If-clause) a given action (in the Then-clause) will take place. A conditional formula explicitly captures two cardinal factors of buggy code: (1) erroneous data dependencies, and (2) missing or incorrect condition checks. Instead of treating the vulnerable function as a whole, searching on structured conditional formulas can effectively localize the possibly vulnerable code logic. By contrasting conditional formulas between the vulnerable function and a target candidate, we can quickly diagnose whether the target is vulnerable or a false positive.
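The contrast between conditional formulas might look like the sketch below. It is a deliberately simplified assumption: clauses are plain strings compared by equality, whereas a real system would compare symbolic formulas, possibly with a solver. The class `ConditionalFormula`, the function `diagnose`, the decision rule, and the `memcpy` example are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConditionalFormula:
    conditions: frozenset  # If-clause: checks that guard the action
    action: str            # Then-clause: the guarded effect

def diagnose(vulnerable, candidate):
    """Contrast the guards of each vulnerable action with the candidate's guards.

    Assumed rule: if the candidate reaches the same action with no check beyond
    those in the vulnerable version, it is flagged as likely vulnerable; if every
    path adds at least one extra check, it is treated as a likely false positive.
    """
    guards = {}
    for formula in candidate:
        guards.setdefault(formula.action, []).append(formula.conditions)
    findings = {}
    for v in vulnerable:
        if v.action not in guards:
            findings[v.action] = "action not present"
        elif any(c <= v.conditions for c in guards[v.action]):
            findings[v.action] = "likely vulnerable"
        else:
            findings[v.action] = "likely false positive"
    return findings

# Invented example: a copy guarded only by a non-zero length check.
action = "memcpy(dst, src, len)"
vuln = [ConditionalFormula(frozenset({"len > 0"}), action)]
unpatched = [ConditionalFormula(frozenset({"len > 0"}), action)]
patched = [ConditionalFormula(frozenset({"len > 0", "len <= sizeof(dst)"}), action)]

result_unpatched = diagnose(vuln, unpatched)
result_patched = diagnose(vuln, patched)
```

The point of the sketch is only the shape of the comparison: the missing bounds check in `unpatched` is what separates it from `patched`, which mirrors the "missing or incorrect condition checks" factor named in the text.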

[0065] The embodiment 10 first utilizes a binary lifting tool (such as Binary Ninja) to convert vulnerable functions and the candidate list to the...



Abstract

A novel high-throughput embedding generation and comparison system for executable code is presented in this invention. More specifically, the invention relates to a deep-neural-network-based graph embedding generation and comparison system. A novel bi-directional code graph embedding generation method is proposed to enrich the information extracted from the code graph. Furthermore, by deploying matrix manipulation, the throughput of embedding generation is significantly increased. Potential applications, such as executable file similarity calculation and vulnerability search, are also presented in this invention.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] The present application claims priority to U.S. Provisional Application No. 62/875,830, filed on Jul. 18, 2019.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

[0002] Not applicable

THE NAMES OF THE PARTIES TO A JOINT RESEARCH AGREEMENT

[0003] Not applicable

INCORPORATION-BY-REFERENCE OF MATERIAL SUBMITTED ON A COMPACT DISC

[0004] Not applicable

BACKGROUND OF THE INVENTION

[0005] Given two binary functions, we would like to detect whether they are semantically equivalent or similar. This problem is known as “binary code similarity detection” or “binary code search”, and it has many security applications, such as plagiarism detection, malware detection, and vulnerability search. For example, binary code similarity detection can be applied to determine whether newly incoming code binaries are variants of known examples of malware.

[0006] In the cybersecurity industry, to process the huge volume of executable code (e.g., malware, firmware images, etc....


Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F16/14, G06K9/62, G06F17/16
CPC: G06F16/148, G06K9/6215, G06F17/16, G06K9/6247, G06K9/6296, G06K9/6202, G06F17/10, G06F16/2237, G06F18/2135, G06F18/22, G06F18/29
Inventor: YIN, HENG; HU, XUNCHAO; YU, SHENG; ZHENG, YU
Owner: DEEPBITS TECH INC