High throughput embedding generation system for executable code and applications

A generation system and high-throughput technology, applied in computing, complex mathematical operations, instruments, etc. It addresses the problems that existing binary code similarity detection approaches lack accuracy and are far from scalable enough to handle the enormous amount of executable code in the wild. It speeds up the embedding generation process to achieve high throughput.

Active Publication Date: 2021-07-15
DEEPBITS TECH INC +1

AI Technical Summary

Benefits of technology

The present invention relates to a method for efficiently identifying vulnerabilities in a computer system. One technical effect of the invention is to use matrix manipulation to speed up the process of comparing vulnerability patterns to a computer system's code. Another technical effect is to combine high-throughput embedding generation and comparison with conditional formula comparison to enable precise and scalable vulnerability search. These technical effects lead to faster and more comprehensive vulnerability identification, which can help to improve computer system security.

Problems solved by technology

Existing binary code similarity detection approaches are far from scalable enough to handle the enormous amount of executable code in the wild.
These methods all lack accuracy, and most of them are fairly expensive and cannot meet the needs of processing large volumes of malware samples and searching over a large code base.

Examples

embodiment 103

[0036] Raw Feature Extraction embodiment 103 can be implemented in many ways. Raw features include, but are not limited to, the Control Flow Graph and the Attributed Control Flow Graph. This invention presents one implementation of a raw feature: the Bi-Directional Attributed Control Flow Graph (BACFG) 104, defined as follows.

[0037] Definition 1. (Bi-directional Attributed Control Flow Graph) The bi-directional attributed control flow graph, or BACFG for short, is a special directed graph with two edge sets, $G = \langle V, E_1, E_2, \varphi \rangle$, where $V$ is a set of basic blocks; $E_1 \subseteq V \times V$ is a set of edges representing the connections between these basic blocks; $E_2 = E_1^{T} \subseteq V \times V$ is a set of edges representing the reversed connections between these basic blocks; and $\varphi: V \to \Sigma$ is the labeling function, which maps a basic block in $V$ to a set of attributes in $\Sigma$.
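Definition 1 can be illustrated with a small data structure. The following is a minimal Python sketch, not the patent's implementation: the class name `BACFG`, its field names, the four-block example graph, and the attribute keys are all hypothetical. It only shows that the reversed edge set is derived mechanically from the forward edge set, and that each basic block carries a set of attributes.

```python
from dataclasses import dataclass, field


@dataclass
class BACFG:
    """Toy bi-directional attributed control flow graph <V, E1, E2, phi>."""
    blocks: set          # V: basic block identifiers
    forward_edges: set   # E1: (source, destination) pairs
    attributes: dict     # phi: block identifier -> attribute mapping
    reverse_edges: set = field(init=False)  # E2, derived as the transpose of E1

    def __post_init__(self):
        # Every edge endpoint must be a known basic block.
        for src, dst in self.forward_edges:
            if src not in self.blocks or dst not in self.blocks:
                raise ValueError(f"edge ({src}, {dst}) uses an unknown block")
        # E2 = E1^T: flip the direction of every forward edge.
        self.reverse_edges = {(dst, src) for src, dst in self.forward_edges}


# Hypothetical diamond-shaped function with four basic blocks.
# These edges and attribute values are invented for illustration.
graph = BACFG(
    blocks={"n1", "n2", "n3", "n4"},
    forward_edges={("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4")},
    attributes={
        "n1": {"num_instructions": 4, "num_calls": 0},
        "n2": {"num_instructions": 2, "num_calls": 1},
        "n3": {"num_instructions": 3, "num_calls": 0},
        "n4": {"num_instructions": 1, "num_calls": 0},
    },
)
print(sorted(graph.reverse_edges))
```

Storing both edge sets explicitly lets a consumer walk predecessors as cheaply as successors, at the cost of doubling the edge storage.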

embodiment 102

[0038] Bi-directional ACFG extraction embodiment 102 can be implemented using different approaches. One approach relies on disassemblers such as IDA Pro and Binary Ninja to disassemble the executable code 101. Every function in the executable code is recovered and its raw features (control flow graph, basic block information) are extracted. Finally, BACFG 104 is constructed from this information for every function in the executable code 101. FIG. 2 presents an example of constructing BACFG 104 from executable code. The disassembled raw code 201 is extracted, using IDA Pro, a commercial disassembler, from a piece of OpenSSL executable code. It contains the control flow graph of function SSL_get_psk_identity_hint and basic block (n1, n2, n3, n4) information. 202 is the corresponding BACFG constructed for function SSL_get_psk_identity_hint. Every node in 201 is represented as a set of attributes. The edges in 201 are kept in the generated BACFG 202. The dotted arrow line in 202 represents the rev...

embodiment 10

[0066] Since k is usually a small number (<20), expensive program analysis can be applied to determine exactly whether the functions in the candidate list are indeed vulnerable. The conditional formula based function identification embodiment 10 is implemented to identify the truly vulnerable functions in the candidate list.
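The two-stage idea in this paragraph (a cheap ranking that yields a short candidate list, followed by an expensive exact check on only k functions) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: cosine similarity is assumed as the ranking score, and the function names, the toy two-dimensional embeddings, and the stand-in `expensive_check` are all hypothetical rather than taken from the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_candidates(query, embeddings, k):
    """Stage 1: rank all functions by similarity to the query, keep the best k."""
    ranked = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def search_vulnerable(query, embeddings, k, expensive_check):
    """Stage 2: run the costly exact check only on the k candidates."""
    return [name for name in top_k_candidates(query, embeddings, k)
            if expensive_check(name)]


# Toy data: four functions with invented 2-dimensional embeddings.
embeddings = {
    "f_a": [1.0, 0.0],
    "f_b": [0.9, 0.1],
    "f_c": [0.0, 1.0],
    "f_d": [-1.0, 0.0],
}
query = [1.0, 0.0]

# Stand-in for the expensive analysis: a lookup in a hand-written set.
confirmed = {"f_b", "f_c"}
candidates = top_k_candidates(query, embeddings, k=2)
hits = search_vulnerable(query, embeddings, k=2, expensive_check=confirmed.__contains__)
print(candidates, hits)
```

Because the expensive check runs on at most k functions, its cost stays bounded no matter how large the searched code base is. The trade-off is that a truly vulnerable function ranked below position k (here `f_c`) is never examined.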

[0067] Generally speaking, a conditional formula consists of an If-clause and a Then-clause. Each clause is a symbolic formula, and together they describe under what condition (stated in the If-clause) a given action (in the Then-clause) will take place. A conditional formula explicitly captures two cardinal factors of buggy code: (1) erroneous data dependencies, and (2) missing or incorrect condition checks. Instead of treating the vulnerable function as a whole, searching on structured conditional formulas can effectively localize the possibly vulnerable code logic. By contrasting conditional formulas between the vulnerable function and a target candidate, we can quickly diagnose whethe...
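As a rough illustration of contrasting conditional formulas, the Python sketch below models a formula as a set of If-clause conditions guarding one Then-clause action. Everything in it is an assumption made for illustration: the class and function names, the use of plain strings in place of symbolic formulas, the exact-match comparison (a real system would more plausibly reason about symbolic equivalence), and the `memcpy` example itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionalFormula:
    """If all conditions in if_clause hold, the action in then_clause occurs."""
    if_clause: frozenset  # conditions, written here as plain strings
    then_clause: str      # the guarded action, also a plain string


def matches_vulnerable(vulnerable, candidate):
    """True if the candidate performs the same action under the same conditions."""
    return (candidate.then_clause == vulnerable.then_clause
            and candidate.if_clause == vulnerable.if_clause)


def extra_checks(vulnerable, candidate):
    """Conditions the candidate enforces that the vulnerable formula lacks."""
    return candidate.if_clause - vulnerable.if_clause


# Invented example: a copy guarded only by a non-empty-length check.
vulnerable = ConditionalFormula(frozenset({"len > 0"}), "memcpy(buf, src, len)")
# A candidate with identical logic, and one that adds a bounds check.
unpatched = ConditionalFormula(frozenset({"len > 0"}), "memcpy(buf, src, len)")
patched = ConditionalFormula(frozenset({"len > 0", "len <= 16"}),
                             "memcpy(buf, src, len)")

print(matches_vulnerable(vulnerable, unpatched))  # same conditions, same action
print(matches_vulnerable(vulnerable, patched))    # differs by an added check
print(extra_checks(vulnerable, patched))
```

The sketch covers only the "missing or incorrect condition checks" factor. Erroneous data dependencies would need a richer representation of the Then-clause than a single string.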

Abstract

A novel high-throughput embedding generation and comparison system for executable code is presented in this invention. More specifically, the invention relates to a deep-neural-network based graph embedding generation and comparison system. A novel bi-directional code graph embedding generation method is proposed to enrich the information extracted from the code graph. Furthermore, by deploying matrix manipulation, the throughput of the system for embedding generation is significantly increased. Potential applications, such as executable file similarity calculation and vulnerability search, are also presented in this invention.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] The present application is a continuation of U.S. patent application Ser. No. 15/930,321, filed May 12, 2020, currently pending, which claims priority to U.S. Provisional Application No. 62/875,830, filed on Jul. 18, 2019, the disclosures of both of which are hereby incorporated by reference in their entireties into the present disclosure.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

[0002] This invention was made with government support under Contract No. 1719175 awarded by the National Science Foundation and under Contract No. N00014-17-1-2893 awarded by the Office of Naval Research. The government has certain rights in the invention.

THE NAMES OF THE PARTIES TO A JOINT RESEARCH AGREEMENT

[0003] Not applicable

INCORPORATION-BY-REFERENCE OF MATERIAL SUBMITTED ON A COMPACT DISC

[0004] Not applicable

BACKGROUND OF THE INVENTION

[0005] Given two binary functions, we would like to detect whether they are semantically equivalent or sim...

Application Information

Patent Type & Authority: Applications (United States)
IPC (8): G06F16/14, G06K9/62, G06F17/16
CPC: G06F16/148, G06K9/6215, G06K9/6202, G06K9/6247, G06K9/6296, G06F17/16, G06F17/10, G06F16/2237, G06F18/2135, G06F18/22, G06F18/29
Inventors: YIN, HENG; HU, XUNCHAO; YU, SHENG; ZHENG, YU
Owner: DEEPBITS TECH INC