N-Gram-based method for detecting bytecode similarity

A bytecode similarity technology, applied in character and pattern recognition, electrical digital data processing, instruments, etc. It addresses the loss of running speed caused by structure-based program comparison and offers fast execution, strong scalability, and simple operation.

Status: Inactive
Publication Date: 2019-10-22
SHANGHAI JIAO TONG UNIV
6 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

Code clone detection is a long-established research direction in program analysis and has matured considerably, moving from early text-based detection to control-flow-graph-based detection and now to combinations of multiple methods. In general, however, any static analysis of a program requires building structural representations and comparing them for similarity (for example, comparing abstract syntax trees), which greatly reduces the running speed of the whole program.



Examples


Embodiment Construction

[0023] This embodiment uses Android R8 (version 1.4.9). Android R8 is Google's new user-customizable bytecode obfuscator, which directly obfuscates Java executable programs and converts them into Dalvik VM bytecode, i.e., .dex files. The operating system is Ubuntu 16.04 with JDK 1.8.

[0024] This embodiment includes the following steps:

[0025] Step 1) Data preparation: First, 50 Android R8 obfuscation configuration files are randomly generated; different obfuscation configuration files lead to differences in the generated .dex files. Second, Android R8 is run without any obfuscation configuration to convert the source Java program into a baseline .dex file.
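The random generation of configuration files in Step 1 might be sketched as follows. The source does not say which options are randomized or how, so the ProGuard-style rule names in `CANDIDATE_RULES`, the function name `random_configs`, and the fixed seed are all illustrative assumptions rather than the patent's procedure.

```python
import random

# Assumption: an illustrative pool of ProGuard-style rules; the patent
# does not list the options that were actually varied.
CANDIDATE_RULES = [
    "-dontshrink",
    "-dontoptimize",
    "-allowaccessmodification",
    "-overloadaggressively",
    "-repackageclasses",
]

def random_configs(count=50, seed=0):
    """Return `count` configuration texts, each a random subset of rules."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    configs = []
    for _ in range(count):
        k = rng.randint(1, len(CANDIDATE_RULES))  # at least one rule
        rules = rng.sample(CANDIDATE_RULES, k)
        configs.append("\n".join(sorted(rules)) + "\n")
    return configs

configs = random_configs()
```

Each returned string could then be written to its own file and passed to the obfuscator; how the files are named and supplied is left open here.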

[0026] The conversion described above uses the Binary2Bytecode interface shown in Figure 2, which identifies the type of the executable file and calls the corresponding decompilation tool. This design hides the differences between executable file formats and is transparent to users of the method.
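A minimal sketch of such a type-based dispatch is shown below. Only the interface name Binary2Bytecode comes from the source. The use of magic-number prefixes for detection, the helper names `detect_kind` and `binary2bytecode_command`, and the choice of `javap -c` and `dexdump -d` as disassemblers are assumptions made for illustration.

```python
# Magic-number prefixes used to recognize the file type.
MAGIC = {
    b"\xca\xfe\xba\xbe": "class",  # Java class file
    b"dex\n": "dex",               # Dalvik executable
}

# Assumption: one disassembler command per file kind; the patent does
# not name the decompilation tools it calls.
DISASSEMBLERS = {
    "class": ["javap", "-c"],
    "dex": ["dexdump", "-d"],
}

def detect_kind(header):
    """Map the first bytes of a file to a supported file kind."""
    for magic, kind in MAGIC.items():
        if header.startswith(magic):
            return kind
    raise ValueError("unsupported executable format")

def binary2bytecode_command(path):
    """Build the disassembler command for `path` without running it."""
    with open(path, "rb") as f:
        header = f.read(8)
    return DISASSEMBLERS[detect_kind(header)] + [path]
```

Callers only see `binary2bytecode_command`, so supporting another format means adding one entry to each table rather than changing the calling code.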

[0027] Step 2) Perform N-Gra...


Abstract

The invention provides an N-Gram-based method for detecting bytecode similarity. The method comprises the following steps: converting the executable binary files to be compared into bytecode; analyzing the bytecode with an N-gram model and a hash algorithm to obtain the corresponding hash values; and finally extracting features from the hashed bytecode with the winnowing algorithm and calculating the similarity. The method enables similarity judgment at the bytecode level for Java executable files. Because the analysis relies on hashing rather than structural comparison, execution efficiency is improved, so the method can be widely applied to evaluating the degree of Java bytecode obfuscation, code clone detection, and related tasks.
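The three stages named in the abstract (N-gram hashing, winnowing, similarity calculation) can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the input is taken to be a list of opcode mnemonics, CRC32 stands in for the unspecified hash algorithm, the values of `n` and `window` are arbitrary, and Jaccard overlap of fingerprint sets stands in for the unspecified similarity formula.

```python
import zlib

def ngram_hashes(opcodes, n=3):
    """Hash every run of n consecutive opcodes (the N-grams)."""
    return [
        zlib.crc32(" ".join(opcodes[i:i + n]).encode("utf-8"))
        for i in range(len(opcodes) - n + 1)
    ]

def winnow(hashes, window=4):
    """Winnowing: keep the minimum hash of each sliding window."""
    if not hashes:
        return set()
    if len(hashes) < window:
        return {min(hashes)}
    return {
        min(hashes[start:start + window])
        for start in range(len(hashes) - window + 1)
    }

def similarity(a, b, n=3, window=4):
    """Assumed metric: Jaccard overlap of the two fingerprint sets."""
    fa = winnow(ngram_hashes(a, n), window)
    fb = winnow(ngram_hashes(b, n), window)
    if not fa and not fb:
        return 1.0  # two inputs too short to fingerprint count as equal
    return len(fa & fb) / len(fa | fb)

original = ["aload_0", "iload_1", "iadd", "istore_2", "iload_2",
            "ireturn", "nop", "return"]
obfuscated = ["aload_0", "iload_1", "iadd", "istore_3", "iload_3",
              "ireturn", "nop", "return"]
score = similarity(original, obfuscated)
```

Because only window minima are kept, the comparison works on a small fingerprint set rather than on every N-gram, which is where the speed advantage over structural comparison would come from.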

Description

Technical field

[0001] The invention relates to a technology in the field of computer information processing, in particular to a method for detecting bytecode similarity based on a language model (N-Gram).

Background technique

[0002] Bytecode similarity calculation is a research direction of program analysis. It is of great significance for code clone detection and obfuscation evaluation: it can help programmers reduce redundant code, improve coding efficiency, improve code security, and protect intellectual property rights in code. The biggest advantage of the Java language is its platform independence (compile once, run anywhere), but this feature also makes Java programs easy to decompile. Obfuscation technology came into being to protect the intellectual property rights of Java programs and safeguard the interests of programmers.

[0003] Obfuscation, that is, under the premise of ensuring that the original semantics of the bytecode program remains unchanged, it is ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/75, G06K9/62
CPC: G06F8/751, G06F18/22
Inventors: 彭艳茹, 陈雨亭, 沈备军
Owner: SHANGHAI JIAO TONG UNIV