Method for discriminating re-package of application based on keyword context frequency matrix

Keyword context frequency matrix technology, applied to identifying application repackaging, addresses the problems of existing approaches (complexity and inefficiency, reliance on the order of the code text, and inability to handle inserted useless code). Its effects are improved execution efficiency, no extra overhead, and reduced space overhead.

Active Publication Date: 2013-12-25
PEKING UNIV

AI Technical Summary

Problems solved by technology

Existing methods need to analyze all of the code, which is complex and inefficient; they depend on the order of the code text and cannot handle inserted useless code.

Examples

Specific embodiment

[0045] A. When preprocessing the application file, do the following:

[0046] A1. Extract the Android application's binary code file and the author's signature information file, which is located in the META-INF folder;

[0047] A2. Use an existing tool such as baksmali (https://code.google.com/p/smali/) to convert the binary code (.dex file) into smali code files;

[0048] A3. Use an existing tool such as keytool (a tool shipped with the JDK, the Java Development Kit) to extract the author's signature content from the corresponding file (CERT.RSA);

[0049] A4. Construct the keyword vector. Keywords are selected on the following basis: choose statements with a relatively high frequency of occurrence; keep the keywords free of obvious semantic overlap, so that the keyword set covers statements with different functions; and choose semantically important instructions, such as operation instructions and function call instructions.

[0050] B. In the part of generating the smali operator sequence, perfo...
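As an illustration of the frequency criterion in step A4 only, the following is a minimal Python sketch, not the patent's procedure. It assumes that each smali instruction line begins with its opcode, and the helper names `opcode_of` and `select_keywords` are hypothetical. The other two criteria in A4 (semantic non-overlap and semantic importance) would still need manual or rule-based filtering and are not modeled here.

```python
from collections import Counter


def opcode_of(line):
    """Return the opcode of a smali instruction line, or None for non-instructions."""
    s = line.strip()
    # Skip blanks, comments (#), directives (.method, .line, ...) and labels (:cond_0).
    if not s or s[0] in "#.:":
        return None
    return s.split()[0]


def select_keywords(smali_lines, top_k):
    """Pick the top_k most frequent opcodes as candidate keywords (assumed criterion)."""
    counts = Counter(op for op in map(opcode_of, smali_lines) if op is not None)
    return [op for op, _ in counts.most_common(top_k)]


sample = [
    ".method public foo()V",
    "    const/4 v0, 0x1",
    "    invoke-virtual {p0}, Lcom/a;->b()V",
    "    invoke-virtual {p0}, Lcom/a;->c()V",
    "    :cond_0",
    "    # a comment",
    "    invoke-virtual {p0}, Lcom/a;->d()V",
    "    const/4 v1, 0x0",
    "    return-void",
    ".end method",
]
candidates = select_keywords(sample, 2)
```

In practice the candidate list would be computed over a large corpus of applications rather than a single method body, and then pruned by hand.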

Embodiment 1

[0063] Suppose an Android application whose Chinese name is "Automatic Desktop Photo Filter" must be checked to determine whether it is a repackaged application. Its package file name is AutodeskzhaopianlvjingPixlr_o_matic_V2.2.1_mumayi_ac32a.apk.

[0064] A. The preprocessing process includes the following steps:

[0065] A1. Decompressing the apk package yields several files and folders. Among them, classes.dex is the application's binary code file, and CERT.RSA under the META-INF folder holds the author's signature information;

[0066] A2. Use the baksmali tool to convert classes.dex into smali code; this generates a folder containing multiple .smali files;

[0067] A3. Use keytool to extract the author's signature content from CERT.RSA: entering the command keytool -printcert -file CERT.RSA prints some information about the application, including the owner, publisher, serial number, validity period, and certi...
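Step A1 can be sketched in Python, relying on the fact that an apk is a ZIP archive. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the demo archive contains placeholder bytes rather than a real application. Converting the .dex file and printing the certificate would still be done with baksmali and keytool as described in A2 and A3.

```python
import io
import zipfile


def extract_code_and_signature(apk_bytes):
    """Return (dex_bytes, cert_bytes) from an APK given as raw bytes.

    Either element is None if the corresponding entry is missing.
    """
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        names = apk.namelist()
        dex = apk.read("classes.dex") if "classes.dex" in names else None
        # The signature block is usually CERT.RSA, but other names and the
        # .DSA / .EC extensions also occur, so match on folder and extension.
        cert_name = next(
            (n for n in names
             if n.startswith("META-INF/")
             and n.upper().endswith((".RSA", ".DSA", ".EC"))),
            None,
        )
        cert = apk.read(cert_name) if cert_name else None
    return dex, cert


def make_demo_apk():
    """Build a tiny in-memory ZIP that mimics an APK layout (placeholder bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("classes.dex", b"dex-placeholder")
        z.writestr("META-INF/CERT.RSA", b"cert-placeholder")
        z.writestr("AndroidManifest.xml", b"<manifest/>")
    return buf.getvalue()


dex, cert = extract_code_and_signature(make_demo_apk())
```

For a real file, the bytes would come from reading the apk from disk, and the two results would be written out for baksmali and keytool to process.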

Abstract

The invention provides a method for identifying application repackaging based on a keyword context frequency matrix, applied to the Android system. The method first processes the application file to obtain smali code files, then processes the smali code, extracts the operator sequence, and counts keyword information. For each keyword of a specific type it constructs context-related feature triples, from which a feature matrix based on context frequency is generated. The feature matrices of applications are compared pairwise, and the similarity of two applications is obtained from the similarity of their feature matrices. Finally, this similarity is combined with other content, such as author information, to judge whether a repackaging relationship exists between the applications. With this technical scheme, repackaged Android applications can be identified while avoiding the extra overhead of hashing large strings over an entire application; the method does not depend on the binary code order of the original file; limiting the size of the feature matrices reduces space overhead; and the execution efficiency of Android repackaging identification is improved.
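The pipeline in the abstract can be illustrated with a short Python sketch. The excerpt does not define the triple, the matrix layout, or the similarity measure, so everything specific here is an assumption: the triple is taken to be (previous operator, keyword, next operator), the matrix is stored as sparse counts, similarity is cosine similarity, and the 0.9 threshold and the rule "similar code plus different signer" are illustrative choices only. All function names are hypothetical.

```python
import math
from collections import Counter


def context_counts(ops, keywords):
    """Count (previous, keyword, next) triples for every keyword occurrence.

    ASSUMPTION: this triple definition is illustrative, not taken from the patent.
    """
    counts = Counter()
    for i, op in enumerate(ops):
        if op in keywords:
            prev_op = ops[i - 1] if i > 0 else "<start>"
            next_op = ops[i + 1] if i + 1 < len(ops) else "<end>"
            counts[(prev_op, op, next_op)] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def looks_repackaged(ops_a, ops_b, signer_a, signer_b, keywords, threshold=0.9):
    """Flag a pair as suspicious if the code is similar but the signers differ."""
    sim = cosine_similarity(context_counts(ops_a, keywords),
                            context_counts(ops_b, keywords))
    return sim >= threshold and signer_a != signer_b


keywords = {"invoke-virtual"}
ops_original = ["const/4", "invoke-virtual", "move-result", "return-void"]
ops_other = ["nop", "invoke-virtual", "nop"]
same_code_sim = cosine_similarity(context_counts(ops_original, keywords),
                                  context_counts(ops_original, keywords))
different_code_sim = cosine_similarity(context_counts(ops_original, keywords),
                                       context_counts(ops_other, keywords))
```

Because only counts of local contexts are compared, reordering methods within a file leaves the features largely unchanged, which is consistent with the abstract's claim of not depending on the code order of the original file.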

Description

Technical field

[0001] The present invention relates to a method for identifying application repackaging based on a keyword context frequency matrix, and in particular to a processing method that identifies application repackaging on the Android platform by using a frequency matrix of application code keywords.

Background

[0002] Android is a Linux-based, free and open-source operating system developed and promoted by Google, mainly used on mobile devices such as smartphones and tablets. Android is currently the mobile phone operating system with the largest market share in the world. According to official data, there are more than 975,000 applications on the Android system.

[0003] Android applications are usually developed and released by third-party developers, which gives rise to a problem: application repackaging. Application repackaging means that some developers grab applications released by other developers throu...

Application Information

IPC(8): G06F9/445
Inventor: 郭耀, 吕骁博, 王浩宇, 刘梦馨, 陈向群
Owner: PEKING UNIV