Method for extracting a privacy policy embedded in a mobile application APK file

A mobile application and extraction technology, applied in special data processing applications, unstructured text data retrieval, text database clustering/classification, etc. It addresses problems such as slow extraction speed, and achieves convenient data collection and acquisition, an increased submission rate, and a high extraction success rate.

Active Publication Date: 2021-12-14
BEIJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0008] However, few existing studies address the extraction of privacy policies embedded in mobile applications. The main existing method first analyzes the application file structure through static analysis: it preprocesses the input samples, extracts the required information, and generates an Activity tree diagram. It then runs an Activity traversal script, written according to a traversal strategy based on the Activity tree diagram and its hierarchy, whose main task is to find the page where the privacy agreement is located. The privacy policy file inside the mobile application is obtained by matching keywords against the text of the controls on that page.
[0009] Because the accuracy of the Activity tree diagram decreases with depth, and judging privacy policy links by matching control text may miss some links, the success rate of privacy policy discovery still has room for improvement.
In addition, this method requires two steps, static analysis and automated testing, for each input sample. Because the steps are complex and automated testing is time-consuming, extraction is slow.
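The control-text keyword matching described in [0008] can be illustrated with a minimal sketch. The keyword list and the function name `looks_like_privacy_entry` are hypothetical, since the source does not enumerate the keywords used; the example only shows how a control with an unexpected label can be missed, which is the omission that [0009] refers to.

```python
# Hypothetical keyword list; the source does not state which keywords are matched.
PRIVACY_KEYWORDS = ("privacy policy", "privacy agreement", "隐私政策", "隐私协议")


def looks_like_privacy_entry(control_text: str) -> bool:
    """Return True if a UI control's text contains any privacy keyword."""
    # Lowercase and collapse runs of whitespace before matching.
    normalized = " ".join(control_text.lower().split())
    return any(keyword in normalized for keyword in PRIVACY_KEYWORDS)


# Control texts as they might appear on a settings page (illustrative data).
controls = ["Settings", "Privacy  Policy", "Legal notices", "About us"]
matches = [text for text in controls if looks_like_privacy_entry(text)]
print(matches)
```

In this toy data, a control labelled "Legal notices" would not match even if it led to the privacy policy, so a purely keyword-based judgment can miss it.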



Embodiment Construction

[0045] To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawings.

[0046] The present invention targets Android applications. It combines dynamic and static detection to automatically discover and extract the privacy policy link embedded in the APK file of a mobile application. The overall workflow is shown in Figure 1. First, the APK file of the mobile application under test is statically analyzed, and the set of all URL links contained in the APK file is obtained through decompilation and rule matching. A privacy policy crawler extracts the features of each page and feeds them into a classification model to train the privacy policy page judgment model. Then, through the privacy policy page judgment model based on machine learning algorithms,...
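The rule-matching step that collects URL links from decompiled output could be sketched as follows. This is an illustration only: the regular expression `URL_RULE`, the function names, and the list of file suffixes are assumptions, since the source does not specify the matching rules.

```python
import re
from pathlib import Path
from typing import Iterable, Set

# Assumed rule: an http(s) URL runs until whitespace, a quote, an angle
# bracket, or a backslash. The patent's actual rules are not given.
URL_RULE = re.compile(r"https?://[^\s\"'<>\\]+")


def extract_urls_from_text(text: str) -> Set[str]:
    """Collect the distinct URLs that appear in one decompiled file's text."""
    # Strip trailing punctuation that commonly follows a URL in prose or code.
    return {match.group(0).rstrip(".,;)") for match in URL_RULE.finditer(text)}


def extract_urls_from_dir(
    root: str,
    suffixes: Iterable[str] = (".smali", ".xml", ".html", ".js"),
) -> Set[str]:
    """Walk a decompiled APK directory and merge the URLs found in each file."""
    wanted = set(suffixes)
    urls: Set[str] = set()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in wanted:
            text = path.read_text(encoding="utf-8", errors="ignore")
            urls |= extract_urls_from_text(text)
    return urls


# Illustrative input resembling a smali string constant and an HTML link.
sample = (
    'const-string v0, "https://example.com/privacy.html"\n'
    '<a href="http://example.com/terms">Terms</a>'
)
print(sorted(extract_urls_from_text(sample)))
```

Returning a set matches the "URL link set" wording above: a link that appears in several files is only crawled once.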


Abstract

The invention discloses a method for extracting a privacy policy embedded in the APK file of a mobile application, belonging to the field of analysis and detection of Android mobile terminal application software. The method crawls the content of each web page and extracts the feature words of the privacy policy text. A binary classification model is trained in advance on feature words collected from a number of web pages. The feature words obtained for the APK file under test are input into the trained binary classification model one by one, and the output determines whether a privacy policy page exists. If so, the privacy policy is output and the process ends. Otherwise, automated dynamic testing is performed: the corresponding URL links are extracted by monitoring the request addresses in the traffic, the content of each page is crawled to extract feature words, and these are input into the binary classification model, until a privacy policy page is found or the set traversal depth is exceeded. By combining dynamic and static testing, the invention improves the efficiency and success rate of privacy policy extraction.
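The control flow in the abstract (static candidates first, then dynamic testing up to a set traversal depth) can be sketched as below. All names are hypothetical. The crawler, the classifier, and the traffic monitor are passed in as plain callables, and the demo uses a keyword-counting stand-in rather than the trained binary classification model that the source describes. The sketch returns the URL of the page judged to be a privacy policy, which is a simplification of "output the privacy policy".

```python
from typing import Callable, Iterable, List, Optional


def find_privacy_policy(
    static_urls: Iterable[str],
    fetch_features: Callable[[str], List[str]],
    is_privacy_page: Callable[[List[str]], bool],
    dynamic_urls_at_depth: Callable[[int], Iterable[str]],
    max_depth: int,
) -> Optional[str]:
    """Return the first URL judged to be a privacy policy page, or None."""
    seen = set()
    # Stage 1: URLs obtained from static analysis of the APK.
    for url in list(static_urls):
        seen.add(url)
        if is_privacy_page(fetch_features(url)):
            return url
    # Stage 2: URLs observed in traffic during automated dynamic testing,
    # explored one traversal depth at a time until the limit is reached.
    for depth in range(1, max_depth + 1):
        for url in dynamic_urls_at_depth(depth):
            if url in seen:
                continue
            seen.add(url)
            if is_privacy_page(fetch_features(url)):
                return url
    return None


# Illustrative stand-ins for the crawler, the traffic monitor, and the model.
PAGES = {
    "https://example.com/home": ["welcome", "download"],
    "https://example.com/legal/privacy": ["privacy", "personal", "information", "collect"],
}
TRAFFIC = {1: ["https://example.com/home"], 2: ["https://example.com/legal/privacy"]}


def toy_classifier(words: List[str]) -> bool:
    # Stand-in only: counts hits against a tiny hand-picked vocabulary.
    return len({"privacy", "personal", "collect"} & set(words)) >= 2


result = find_privacy_policy(
    static_urls=["https://example.com/home"],
    fetch_features=lambda url: PAGES.get(url, []),
    is_privacy_page=toy_classifier,
    dynamic_urls_at_depth=lambda depth: TRAFFIC.get(depth, []),
    max_depth=3,
)
print(result)
```

Running the cheaper static stage first means the slower dynamic test is only needed when no static URL is classified as a privacy policy page.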

Description

Technical field

[0001] The invention belongs to the field of analysis and detection of Android mobile terminal application software, and relates to a method for extracting a privacy policy embedded in an APK file of a mobile application.

Background

[0002] Static analysis refers to techniques that scan program files without running them, using means such as lexical analysis or syntax analysis, to generate decompiled code of the program, which is then read to understand the program's function. It is essentially static text analysis, so it is highly efficient.

[0003] Common static decompilation tools include apktool, baksmali, and dex2jar, among which apktool is the most commonly used for static analysis. It is written in Java and can decompile and recompile APK files. It also supports functions such as installing a specific framework-res framework and cleaning up the previously decompiled folder.

[0004]...
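As an illustration of the apktool usage mentioned in [0003], the following sketch builds and runs a decode command from Python. It assumes `apktool` is installed and on the `PATH`; the function names and file paths are hypothetical. The `d` (decode) subcommand and the `-o` (output directory) and `-f` (force, which deletes an existing output directory) options are standard apktool options.

```python
import subprocess
from typing import List


def build_apktool_decode_command(
    apk_path: str, output_dir: str, force: bool = True
) -> List[str]:
    """Build the argument list for decoding an APK with apktool."""
    command = ["apktool", "d", apk_path, "-o", output_dir]
    if force:
        # -f removes a previously decompiled output folder before decoding.
        command.append("-f")
    return command


def decompile_apk(apk_path: str, output_dir: str) -> None:
    """Run apktool; raises CalledProcessError if decoding fails."""
    subprocess.run(build_apktool_decode_command(apk_path, output_dir), check=True)


# Only the command is printed here, so the example runs without apktool installed.
print(build_apktool_decode_command("app.apk", "app_decoded"))
```

Passing the command as a list rather than a shell string avoids quoting problems when APK file names contain spaces.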


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F21/56, G06F21/62, G06F16/35, G06F16/951, G06F16/955, G06F40/284
CPC: G06F21/563, G06F21/566, G06F21/6245, G06F16/951, G06F16/955, G06F40/284, G06F16/35, G06F2221/033
Inventors: 郭燕慧, 徐国爱, 徐国胜, 张淼, 王皓月
Owner: BEIJING UNIV OF POSTS & TELECOMM