A system for applet privacy policy and permission call consistency analysis

By constructing a consistency analysis system for mini-program privacy policies and permission calls, the problems of low automation and insufficient accuracy in existing technologies for mini-program privacy compliance detection are solved, achieving efficient and accurate consistency detection of privacy policies and permission calls.

CN117056966B · Active · Publication Date: 2026-06-26 · XIDIAN UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
XIDIAN UNIV
Filing Date
2023-08-10
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies for detecting privacy compliance in mini-programs suffer from low automation, insufficient accuracy, and inadequate granularity in parsing privacy policies, making it impossible to comprehensively detect the consistency between the mini-program's privacy policy and permission calls. This results in low detection efficiency and high costs.

Method used

This invention provides a system for analyzing the consistency of privacy policies and permission calls in mini-programs. The system includes a privacy policy extraction module, an automated parsing module, a program analysis module, and a consistency analysis module. By simulating user behavior, constructing page transition graphs, and setting sensitive permission sets, it achieves automated and accurate detection.

Benefits of technology

It achieves comprehensive and accurate detection of consistency between the privacy policy and permission calls of mini-programs, improving detection efficiency, reducing labor costs, and increasing detection accuracy.


Figure CN117056966B_ABST

Abstract

The application provides a system for analyzing the consistency between the privacy policy and the permission call of an applet, comprising: a privacy policy extraction module for extracting the privacy policy of the privacy page in the applet to be analyzed; a privacy policy automatic analysis module for identifying the privacy policy and extracting the sensitive permission to form a first sensitive permission call set; a program analysis module for constructing a guide page transition graph and a dynamic page transition graph, and combining the two to obtain an overall page transition graph; determining a second sensitive permission call set from the overall transition graph by simulating the call process of a specified page; and a consistency analysis module for performing consistency analysis on the first sensitive permission call set and the second sensitive permission call set to determine whether the applet to be analyzed has risks. The privacy policy analysis of the application is more comprehensive, the entity recognition model is more accurate, and the program analysis depth is combined, so that the permission call detection is more comprehensive and accurate.

Description

Technical Field

[0001] This invention belongs to the field of mini-program analysis technology, specifically relating to a system for analyzing the consistency between mini-program privacy policies and permission calls.

Background Technology

[0002] In recent years, regulatory agencies have continuously carried out special governance work on data security and strengthened law enforcement and supervision, resulting in the promulgation of various regulations in the security field. Data security, exemplified by mobile application security, is related to the protection of individual user privacy information. Mobile application development must comply with relevant regulations and documents to protect privacy. Leading internet companies such as Tencent, Huawei, and Baidu, as well as some security service providers, have launched compliance testing products. Meanwhile, mini-programs, as a rising star in the internet industry, have rapidly captured the mobile application market, with their user base surging by hundreds of millions, naturally making them a key focus of data security supervision and governance. A series of mini-program security products have emerged on the market, aiming to improve the overall security level of mini-programs based on the current state of the industry.

[0003] A survey of several representative compliance testing products currently on the market, combined with the actual needs of compliance testing scenarios, summarized the industry's pain points. The survey results of seven mainstream privacy compliance products show that the testing capabilities and coverage of existing privacy compliance products vary considerably. Furthermore, current compliance products primarily provide compliance testing services for Android / iOS applications, with very few supporting privacy compliance testing for mini-programs. Given the significant differences in architecture and operation between mini-programs and applications, privacy compliance testing frameworks built on applications are difficult to migrate and apply to mini-programs. Among the few products that offer mini-program privacy compliance services, most rely on manual auditing. Currently, only Tencent T-Sec Application Compliance Platform provides an automated mini-program compliance testing solution.

[0004] Tencent's T-Sec application compliance platform, based on static code analysis and dynamic sandbox testing technologies, performs automated privacy compliance checks on mini-programs along four dimensions: privacy policy compliance, user authorization compliance, personal data collection, and data subject rights. However, the T-Sec platform has significant shortcomings: 1) Regarding privacy policy compliance, T-Sec focuses on whether the mini-program has a dedicated privacy policy, whether the privacy policy is shown in a pop-up, and whether the privacy policy clearly informs users about the collection of sensitive personal data. It does not sufficiently analyze the privacy policy content and does not extract or analyze the content related to sensitive data processing within the privacy policy. 2) If the mini-program reasonably declares the purpose, method, and scope of collecting sensitive data in its privacy policy, then collecting and using such sensitive information does not violate legal regulations. However, the T-Sec platform does not establish a link between the mini-program's privacy policy and its actual behavior, leading to false positives for such compliant behavior and a low detection accuracy rate.

[0005] Meanwhile, the survey results show that most products neglect the text parsing of privacy policies. A privacy policy is an agreement between a company and its users regarding how to handle and protect users' personal information. It typically exists as part of legal requirements or industry standards and serves as an important legal basis in cases of disputes concerning the handling of personal information. However, privacy policies are often lengthy, complex, and difficult to understand, requiring users to possess a certain level of legal knowledge and professional expertise to fully comprehend their content. Therefore, automatically extracting information related to personal data processing from privacy policies is of significant importance and practical value. However, current privacy policy parsing solutions offered by products have considerable limitations: some products rely on manual evaluation, which is inefficient and time-consuming; others, while employing automated natural language processing technology, fail to extract relevant privacy statement information effectively or with sufficient granularity, lacking a comprehensive understanding of the privacy policy.

[0006] In summary, there is still significant room for improvement in the industry's compliance testing of privacy statements and permission calls for mini-programs: 1) Although a series of privacy compliance products have been launched, research on privacy compliance focuses primarily on Android / iOS applications, with little attention paid to mini-programs; 2) Existing research on automated parsing of privacy policies is insufficient and superficial, lacking comprehensiveness and granularity; 3) Because of the unique architecture of mini-programs, compliance testing frameworks based on native applications cannot be directly applied to them, hindering automated guided analysis. Among the few products that offer mini-program privacy compliance services, most rely on manual auditing, resulting in low testing efficiency and high time and labor costs, failing to meet the industry's testing needs.

Summary of the Invention

[0007] To address the aforementioned problems in the existing technology, this invention provides a system for consistency analysis of privacy policies and permission calls in mini-programs. The technical problem to be solved by this invention is achieved through the following technical solution:

[0008] This invention provides a system for analyzing the consistency between privacy policies and permission calls in mini-programs, including:

[0009] The privacy policy extraction module is used to obtain the mini-program to be analyzed and extract the privacy policy of the privacy page in the mini-program to be analyzed;

[0010] The privacy policy automated parsing module is used to identify and classify the privacy policy using a trained privacy policy recognition model to obtain a privacy policy seven-tuple, and extract sensitive permissions from the privacy policy seven-tuple to form a first sensitive permission call set;

[0011] The program analysis module is used to perform guided analysis to construct a guided page transition diagram and dynamic analysis to construct a dynamic page transition diagram for the program to be analyzed, and to combine the dynamic page transition diagram with the guided page transition diagram to obtain an overall page transition diagram; by simulating the calling process of a specified page, the second sensitive permission call set of the mini-program to be analyzed is determined from the overall transition diagram.

[0012] The consistency analysis module is used to perform consistency analysis between the first sensitive permission call set and the second sensitive permission call set to determine whether the mini-program to be analyzed has any risks.

[0013] This invention provides a system for consistency analysis of privacy policies and permission calls in mini-programs, comprising: a privacy policy extraction module for acquiring the mini-program to be analyzed and extracting the privacy policy from the privacy pages of the mini-program; an automated privacy policy parsing module for identifying and classifying the privacy policy using a trained privacy policy recognition model to obtain a seven-tuple of privacy policies, and extracting sensitive permissions from the seven-tuple to form a first sensitive permission call set; a program analysis module for performing guided analysis to construct a guided page transition graph and dynamic analysis to construct a dynamic page transition graph, and combining the dynamic page transition graph with the guided page transition graph to obtain an overall page transition graph; determining a second sensitive permission call set from the overall transition graph by simulating the call process of a specified page; and a consistency analysis module for performing consistency analysis between the first and second sensitive permission call sets to determine whether the mini-program to be analyzed has any risks. This invention provides more comprehensive privacy policy parsing, more accurate entity recognition models, and deep integration with program analysis, resulting in more comprehensive and accurate detection of permission calls. The system of this invention has a high degree of automation, strong usability, a user-friendly UI, and broad application prospects.

[0014] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Attached Figure Description

[0015] Figure 1 is a system diagram, provided by the present invention, for analyzing the consistency between the privacy policy and permission calls of a mini-program;

[0016] Figure 2 is a schematic architecture diagram of the mobile mini-program privacy policy and permission call consistency compliance detection system provided by an embodiment of this invention;

[0017] Figure 3 is a dynamic analysis framework diagram of the mobile mini-program privacy policy and permission call consistency compliance detection system provided by an embodiment of this invention;

[0018] Figure 4a is an example diagram of the ontology and sensitive API mapping provided by an embodiment of this invention;

[0019] Figure 4b is a schematic diagram of the consistency analysis scheme for the privacy policy and permission calls of mini-programs provided by an embodiment of this invention;

[0020] Figure 5 is an example diagram of privacy policy parsing results from the mobile mini-program privacy policy and permission call consistency compliance detection system provided by an embodiment of this invention;

[0021] Figure 6 is an example diagram of automated dynamic test logs from the mobile mini-program privacy policy and permission call consistency compliance detection system provided by an embodiment of this invention;

[0022] Figure 7 shows the homepage of the mobile mini-program privacy policy and permission call consistency compliance detection system provided by an embodiment of this invention.

Detailed Implementation

[0023] The present invention will be further described in detail below with reference to specific embodiments, but the implementation of the present invention is not limited thereto.

[0024] Example 1

[0025] With reference to Figures 1 to 3, this invention provides a system for analyzing the consistency between a mini-program's privacy policy and permission invocation, comprising:

[0026] The privacy policy extraction module is used to obtain the mini-program to be analyzed and extract the privacy policy of the privacy page in the mini-program to be analyzed;

[0027] It is worth noting that the privacy policy extraction module uses OpenCV template matching together with the UI automation testing tools UIAutomator and Chrome DevTools to parse the mini-program pages, simulates user click behavior with Appium to trigger page redirection, and then parses the content of the redirected page and extracts the privacy policy.

[0028] The privacy policy automated parsing module is used to identify and classify the privacy policy using a trained privacy policy recognition model to obtain a privacy policy seven-tuple, and extract sensitive permissions from the privacy policy seven-tuple to form a first sensitive permission call set;

[0029] It is worth noting that this invention is based on a fine-grained Chinese privacy policy annotation dataset, trains a BERT-BiLSTM-CRF model, performs named entity recognition on the privacy policies embedded in mini-programs, and extracts the privacy policy seven-tuple.

[0030] The program analysis module is used to perform guided analysis to construct a guided page transition diagram and dynamic analysis to construct a dynamic page transition diagram for the program to be analyzed, and to combine the dynamic page transition diagram with the guided page transition diagram to obtain an overall page transition diagram; by simulating the calling process of a specified page, the second sensitive permission call set of the mini-program to be analyzed is determined from the overall transition diagram.

[0031] It's worth noting that in the guided analysis section, the relationship between page transition states and function calls in the mini-program is modeled, constructing a UI Transition Graph (UTG) and a Function Call Graph (FCG) to determine all pages reachable through UI interactions within the mini-program, as well as all reachable functions / sensitive API calls on specific pages. In the dynamic analysis section, the mini-program's page state space is traversed in a depth-first manner to further refine the UTG obtained from the guided analysis. Guided by the sensitive API analysis results from the guided analysis section and the aforementioned UTG, user clicks are simulated using the Appium framework to trigger sensitive APIs on specific pages, thereby obtaining the second set of sensitive permission calls during mini-program runtime.

[0032] The consistency analysis module is used to perform consistency analysis between the first sensitive permission call set and the second sensitive permission call set to determine whether the mini-program to be analyzed has any risks.

[0033] It is worth noting that this invention performs consistency analysis on the sensitive permission set obtained from guidance analysis and the sensitive permission set obtained from privacy policy parsing to identify redundantly declared permissions and undeclared sensitive permission calls. Using the FCG obtained from guidance analysis, bounded backtracking is performed on permission request points to find pages that call sensitive permissions, accurately identifying unauthorized sensitive permission calls. Combining taint analysis and sensitive API hooking technology, relevant data related to sensitive permissions is tracked to analyze potential data leakage or permission abuse risks.
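The bounded backtracking over the FCG described above can be sketched as a depth-limited backward search. This is a minimal illustration only: the reverse call graph, the function and page names, and the depth bound are hypothetical, and the invention's actual backtracking rules are not specified in this text.

```python
from collections import deque


def bounded_backtrack(callers, request_point, page_entries, max_depth=3):
    """Walk the function call graph backwards from a permission request point.

    callers maps each function to the functions that call it (reverse FCG edges).
    Returns the page entry functions found within max_depth caller hops.
    """
    found = set()
    seen = {request_point}
    queue = deque([(request_point, 0)])
    while queue:
        func, depth = queue.popleft()
        if func in page_entries:
            found.add(func)
        if depth == max_depth:
            continue  # bound reached: do not expand further callers
        for caller in callers.get(func, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append((caller, depth + 1))
    return found


# Hypothetical reverse call graph: callee -> list of callers.
callers = {
    "wx.getLocation": ["locateUser"],
    "locateUser": ["pages/map/index.onLoad", "helper"],
    "helper": ["pages/home/index.onShow"],
}
page_entries = {"pages/map/index.onLoad", "pages/home/index.onShow"}

near_pages = bounded_backtrack(callers, "wx.getLocation", page_entries, max_depth=2)
all_pages = bounded_backtrack(callers, "wx.getLocation", page_entries, max_depth=3)
```

With a bound of 2 only the map page is found; raising the bound to 3 also reaches the home page through the intermediate helper. The bound trades completeness for analysis cost.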

[0034] The system of the present invention supports the adaptation of simulators and real devices, and the simulators and real devices are connected to the system via USB or network.

[0035] Example 2

[0036] As an optional embodiment of the present invention, the privacy policy extraction module is used for:

[0037] The mini-program to be analyzed is parsed to obtain multiple trigger pages;

[0038] Simulate user click behavior to trigger the trigger page, so as to redirect to the next page;

[0039] Determine whether the redirected page is a privacy page; if so, extract the privacy policy of the privacy page.

[0040] With reference to Figures 3 to 7, the privacy policy extraction module of this invention uses the Appium engine to perform native mobile application interactions that trigger page redirection, uses UIAutomator and Chrome DevTools to parse the page, and performs template matching based on OpenCV to ultimately locate the privacy policy page and extract the privacy policy of the mini-program. The privacy policy extraction module of this invention extracts the privacy policy in three steps:

[0041] 1) Location of the Privacy Policy Page

[0042] The OpenCV template matching method is used to find whether the page contains key elements such as "privacy policy". For pages that do not contain key elements, UI automation testing tools are used to further parse the page. Based on the above search results or parsing results, Appium is used to simulate user click behavior to trigger page redirection, thereby locating the privacy policy page.

[0043] 2) Privacy Policy Page Determination: Pages containing keywords such as "privacy" and "policy" with a keyword length exceeding a certain threshold are determined to be privacy policy pages;

[0044] 3) Extraction of privacy policy text: If the privacy policy is an external link, this invention crawls the privacy policy text by accessing the link without displaying it; if it is a mini-program page, the page text is extracted directly or the privacy policy is extracted using OCR text recognition method.

[0045] The testing architecture for this mini-program can be implemented using Appium, or other mainstream UI automation testing tools such as Selenium and Airtest. Selenium is an open-source web application automation testing tool that can run directly on multiple browser platforms, supporting almost all major browsers. Airtest, like Appium, is an automation testing tool for mobile apps. Both can record and play back automated test scripts. Their advantage lies in the fact that even testers without programming or scripting knowledge can complete script recording through normal user clicks and drags, thus significantly reducing automation maintenance costs.

[0046] Example 3

[0047] As an optional embodiment of the present invention, the privacy policy extraction module is used for:

[0048] The UI automation testing tool was used to parse the page of the mini-program to be analyzed, resulting in multiple trigger pages;

[0049] The Appium-based system simulates user clicks to trigger a page, which then redirects the user to a different page.

[0050] If the redirected page contains multiple keywords related to privacy policies and the number of keywords exceeds a preset threshold, then the redirected page is determined to be a privacy page.

[0051] If the privacy page is an external link, the privacy policy of the privacy page is crawled by accessing the external link without displaying it; if the privacy page is a mini-program page, the privacy policy of the privacy page is extracted directly or extracted using OCR text recognition method.
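The keyword-threshold determination in this embodiment can be illustrated with a short sketch. The keyword list and the threshold value below are assumptions chosen for illustration; the text does not disclose the actual keywords or the preset threshold.

```python
# Hypothetical keyword list; the real system would use its own (likely Chinese) keywords.
PRIVACY_KEYWORDS = ("privacy", "policy", "personal information")


def count_keyword_hits(text, keywords=PRIVACY_KEYWORDS):
    """Count how many times any privacy-related keyword occurs in the page text."""
    lowered = text.lower()
    return sum(lowered.count(keyword.lower()) for keyword in keywords)


def is_privacy_page(text, threshold=3):
    """Treat a redirected page as a privacy page when keyword hits exceed the threshold."""
    return count_keyword_hits(text) > threshold


policy_text = (
    "Privacy Policy: this privacy policy explains how we handle "
    "personal information."
)
shop_text = "Welcome to the shop. Adjust your privacy settings here."

policy_hits = count_keyword_hits(policy_text)
policy_flag = is_privacy_page(policy_text)
shop_flag = is_privacy_page(shop_text)
```

Counting occurrences rather than distinct keywords is one possible reading of "the number of keywords"; either variant fits the same structure.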

[0052] Example 4

[0053] As an optional embodiment of the present invention, the privacy policy automated parsing module is used for:

[0054] Select the CA4P-483 Chinese privacy policy annotation set and the BERT-BiLSTM-CRF model;

[0055] The BERT-BiLSTM-CRF model was trained using the CA4P-483 Chinese privacy policy annotation set to obtain a trained privacy policy recognition model, and a pre-defined binary classifier was trained using the CA4P-483 Chinese privacy policy annotation set to obtain a trained binary classifier.

[0056] It is worth noting that: before parsing, the automatic privacy policy parsing module of this invention selects the CA4P-483 Chinese privacy policy annotation set for model training, then uses the BERT large language model to train a binary classifier for privacy policy preprocessing, and then selects the BERT-BiLSTM-CRF model to identify each named entity in the privacy policy, thus training the privacy policy recognition model and the binary classifier. The process is as follows:

[0057] 1) The CA4P-483 Chinese privacy policy annotation set was selected for model training. CA4P-483, based on relevant laws and regulations concerning Chinese privacy policies, organizes and extracts seven fine-grained annotation entities related to personal information processing, including data controller, data entity, collection, sharing, condition, purpose, and data receiver. These seven fine-grained annotations strictly adhere to regulatory requirements, encompassing all content related to personal information processing, providing a more comprehensive and accurate understanding of privacy policy content.

[0058] 2) In the privacy policy preprocessing section, privacy policies are divided into privacy-related and privacy-unrelated categories based on whether the statements contain personal information processing information. A binary classifier is trained using the CA4P-483 Chinese privacy policy annotation set to remove statements in the privacy policy that are unrelated to personal information processing.

[0059] 3) After that, the named entity recognition model is trained only using sentences related to personal information processing in the privacy policy, making the model's task clearer and more explicit, thereby improving the model's performance.

[0060] In the named entity recognition section of the privacy policy, this invention uses the BERT-BiLSTM-CRF model. The main architecture of the model consists of three modules: the BERT language model, the BiLSTM module, and the CRF module. The BERT language model pre-trains on the corpus, obtains the corresponding word vectors, and inputs them into the BiLSTM module for further processing. The BiLSTM module encodes the word vectors, captures the contextual semantics through long-distance dependencies, and obtains the hidden layer representation. The CRF module performs Viterbi decoding on the output of the BiLSTM module to obtain the predicted labeled sequence, extracts and classifies each entity in the sequence, and outputs the final labeled result of the model.
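The Viterbi decoding step performed by the CRF module can be shown with a small pure-Python sketch. The label set and all scores are made up for illustration; in the trained model the emission scores would come from the BiLSTM output and the transition scores would be learned by the CRF layer.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions: list of {label: score}, one dict per token.
    transitions: {(previous_label, current_label): score}; missing pairs score 0.
    """
    if not emissions:
        return []
    # best[-1][y] holds the best score of any path ending in label y so far.
    best = [{y: emissions[0][y] for y in labels}]
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in labels:
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


labels = ["O", "B-Data", "I-Data"]
# Hypothetical transition score: strongly penalise "O" followed by "I-Data".
transitions = {("O", "I-Data"): -10.0}
emissions = [
    {"O": 2.0, "B-Data": 0.5, "I-Data": 0.0},
    {"O": 0.2, "B-Data": 1.0, "I-Data": 1.2},
    {"O": 0.1, "B-Data": 0.2, "I-Data": 1.5},
]
path = viterbi_decode(emissions, transitions, labels)
```

Picking the best label per token independently would yield the invalid sequence O, I-Data, I-Data; the transition penalty makes the decoder prefer O, B-Data, I-Data instead, which is the benefit of decoding the sequence jointly.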

[0061] This invention trains a named entity recognition model based on a fine-grained Chinese privacy policy annotation set. By adhering to relevant legal and regulatory requirements, it comprehensively extracts seven types of entities related to data processing behavior within the privacy policy, overcoming the limitations of previous coarse-grained approaches in Chinese privacy policy parsing and achieving a comprehensive analysis of Chinese privacy policies. The BERT-BiLSTM-CRF model trained in this privacy policy parsing module combines the strong word vector representations of the pre-trained BERT model with the contextual semantic capture capabilities of the BiLSTM-CRF model. It achieves an F1-score of 91.1% for the identification of the seven data processing-related entities, further advancing research in Chinese privacy policy parsing.

[0062] Example 5

[0063] As an optional embodiment of the present invention, the privacy policy automated parsing module is used for:

[0064] The privacy policy of the privacy page is divided into privacy-related and privacy-irrelevant parts by using a trained binary classifier.

[0065] The privacy-related policies are input into the trained privacy policy recognition model to obtain a privacy policy seven-tuple.

[0066] In the automated privacy policy parsing module, a BERT-BiLSTM-CRF model trained on a fine-grained Chinese privacy policy annotation dataset performs named entity recognition on the privacy policies embedded in mini-programs and extracts the privacy policy seven-tuple.
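As an illustration of how a tagged sentence might be turned into a seven-tuple, the sketch below groups BIO-style tags into the seven entity types named earlier (data controller, data entity, collection, sharing, condition, purpose, data receiver). The tag names, the BIO scheme, and the sample sentence are assumptions; the text does not specify the model's output format.

```python
# Hypothetical short names for the seven annotated entity types.
SEVEN_TYPES = (
    "Controller", "Data", "Collect", "Share", "Condition", "Purpose", "Receiver",
)


def tags_to_seven_tuple(tokens, tags):
    """Group BIO-tagged tokens into a dict with one list of spans per entity type."""
    result = {entity_type: [] for entity_type in SEVEN_TYPES}
    current_type, current_tokens = None, []

    def flush():
        nonlocal current_type, current_tokens
        if current_type is not None:
            result[current_type].append(" ".join(current_tokens))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") and tag[2:] in result:
            flush()  # a new entity begins; close the previous one
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and tag[2:] == current_type:
            current_tokens.append(token)  # continue the open entity
        else:
            flush()  # "O" or an inconsistent tag ends the open entity
    flush()
    return result


tokens = ["We", "collect", "your", "location", "to", "provide", "navigation"]
tags = ["B-Controller", "B-Collect", "O", "B-Data", "O", "B-Purpose", "I-Purpose"]
seven_tuple = tags_to_seven_tuple(tokens, tags)
```

The "Data" entries of the resulting seven-tuple are the natural starting point for building the first sensitive permission call set. For Chinese text the tokens would be joined without spaces.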

[0067] Example 6

[0068] As an optional embodiment of the present invention, the program analysis module is used for:

[0069] The source code of the mini-program to be analyzed is used to obtain the original page;

[0070] The original pages are analyzed to obtain the guided transition states between the original pages and the guided function call relationships;

[0071] Construct a guided page transition graph based on the guided transition states, and a guided function call graph based on the guided function call relationships;

[0072] Identify, from the guided page transition graph, all interactive pages reachable through UI interaction, all reachable functions of the interactive pages, and all sensitive API calls;

[0073] Traverse all trigger pages and the redirected pages from each trigger page in a depth-first manner to obtain the dynamic transition states and dynamic function call relationships;

[0074] A dynamic page transition graph is constructed based on the dynamic transition states and the dynamic function call relationships.

[0075] The overall page transition diagram is obtained by combining the guided page transition diagram and the dynamic page transition diagram;

[0076] Identify all sensitive API calls based on the overall page transition graph and the guided page transition graph;

[0077] Simulate the triggering process of sensitive API calls on a specified page to obtain the set of second sensitive permission calls during the runtime of the mini-program to be analyzed.

[0078] This invention traverses the state space of a mini-program page in a depth-first manner, optionally combining user-provided guidance analysis results and simulating user clicks based on the Appium framework, thereby triggering sensitive APIs on specific pages. At the same time, it uses Frida Hook technology to detect them in real time and obtain the set of sensitive permission calls during mini-program runtime.

[0079] This invention uses dynamic analysis to filter out unreachable pages based on user-provided guidance analysis, creating a complete and accurate page transition graph, and further constructing the correct sensitive API trigger paths. After determining the trigger paths, it simulates clicks to trigger sensitive permission calls, while simultaneously monitoring the call activity using Frida Hook technology to determine the set of sensitive permission calls during mini-program runtime. The dynamic analysis module can also be enabled independently for full-page dynamic testing. This solution overcomes both the high false positive rate of guidance analysis and the low detection coverage of dynamic analysis, thus achieving comprehensive and accurate detection of sensitive permission calls in mini-programs.
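The combination of the two page transition graphs and the filtering of unreachable pages can be sketched as follows. The merge rule used here (a union of edges followed by a depth-first reachability check from the entry page) is an assumption, as are the page names and the per-page API sets; the text does not state the exact combination rule.

```python
def merge_graphs(guided, dynamic):
    """Union the edges of the guided and dynamic page transition graphs."""
    merged = {}
    for graph in (guided, dynamic):
        for page, targets in graph.items():
            merged.setdefault(page, set()).update(targets)
            for target in targets:
                merged.setdefault(target, set())
    return merged


def reachable_pages(graph, start):
    """Depth-first traversal returning every page reachable from the start page."""
    seen, stack = set(), [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        stack.extend(graph.get(page, ()))
    return seen


# Hypothetical graphs: page -> pages it can transition to.
guided = {"index": {"profile", "map"}, "orphan": {"map"}}
dynamic = {"index": {"profile"}, "profile": {"settings"}}
# Hypothetical sensitive APIs found on each page by guided analysis.
page_apis = {
    "map": {"wx.getLocation"},
    "orphan": {"wx.chooseImage"},
    "settings": {"wx.startBluetoothDevicesDiscovery"},
}

overall = merge_graphs(guided, dynamic)
reachable = reachable_pages(overall, "index")
# Only reachable pages remain as targets for simulated clicks.
trigger_targets = {p: apis for p, apis in page_apis.items() if p in reachable}
```

The "orphan" page appears in the guided graph but cannot be reached from the entry page, so its API is dropped from the trigger targets. This corresponds to removing a false positive of guided analysis.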

[0080] Example 7

[0081] As an optional embodiment of the present invention, the system further includes a monitoring module, which is used for:

[0082] Run the Frida Hook script to monitor the sensitive API calls made by the mini-program to be analyzed at runtime, together with their specific parameters;

[0083] It's worth noting that Frida Hook can also be replaced by the Xposed or LSPosed frameworks to perform runtime monitoring of sensitive APIs. Both achieve this by replacing the `/system/bin/app_process` program to control the Zygote process, causing it to load a JAR file of the framework during system startup. This hijacks the Zygote process and the Dalvik virtual machine it creates, allowing developers to independently replace any class, such as those of the framework itself, the system UI, or any app. Furthermore, this method has the advantage of not altering the ROM or the app, and removing the changes is very easy; simply disabling the framework completely restores the original state.

[0084] The MITM proxy is used as a packet capture module to intercept the various packets downloaded by the mini-program to be analyzed in real time, and to intercept all network traffic of the mini-program to be analyzed.

[0085] The MITM packet capture module of this invention can achieve the same purpose using other network packet capture tools, such as tcpdump, Wireshark, Fiddler, and Charles. Although their implementation principles differ from the man-in-the-middle approach of mitmproxy, they can still achieve the same objective in this invention.

[0086] Detect potential malicious third-party domain request behavior in the mini-program to be analyzed.
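One simple way to surface third-party domain requests from captured traffic is to compare each request host against a first-party allowlist. The sketch below assumes the captured URLs are already available as strings and uses placeholder domains. It only flags hosts as third-party; deciding whether a host is malicious would need additional information that the text does not describe.

```python
from urllib.parse import urlsplit


def third_party_hosts(request_urls, first_party_suffixes):
    """Return the hosts that do not belong to any first-party domain suffix."""
    flagged = set()
    for url in request_urls:
        host = (urlsplit(url).hostname or "").lower()
        if not host:
            continue  # skip entries without a parsable host
        is_first_party = any(
            host == suffix or host.endswith("." + suffix)
            for suffix in first_party_suffixes
        )
        if not is_first_party:
            flagged.add(host)
    return flagged


# Placeholder URLs standing in for traffic captured by the proxy.
captured = [
    "https://api.example.com/user",
    "https://example.com/",
    "https://tracker.ads-example.net/collect?id=1",
]
flagged = third_party_hosts(captured, {"example.com"})
```

Matching on "." plus the suffix avoids treating a look-alike host such as "notexample.com" as first-party.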

[0087] It is worth noting that the monitoring module is used for feasibility verification, interface search, and parameter reception.

[0088] Feasibility Verification: A preliminary feasibility verification of the above solutions is conducted based on the Objection framework. Objection is a Frida-based runtime mobile exploration toolkit used to help security researchers quickly assess the security status of mobile applications. It includes powerful features such as memory exploration, class and function monitoring, and generation of template hook scripts.

[0089] Interface discovery: In order to distinguish it from system-level API calls, this invention does not adopt the method of directly hooking sensitive APIs at the Android system level; at the same time, considering the stability and portability of Hook, this invention must bypass obfuscation classes and find a relatively stable interface.

[0090] Parameter reception: Simple communication between JavaScript and Python can be achieved through Frida's RPC mechanism.
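On the Python side, parameters reported by a hook script have to be received and recorded. Besides RPC exports, a Frida script can push data with `send()`, which the Python bindings deliver as a dict to a callback registered through `script.on("message", callback)`. The sketch below shows only such a callback. The payload fields `api` and `args` are a hypothetical convention for the hook script, not something defined by Frida or by this text, and the messages are simulated so that the example runs without a device.

```python
class SensitiveCallCollector:
    """Collects sensitive API calls reported by a hook script."""

    def __init__(self):
        self.calls = []
        self.errors = []

    def on_message(self, message, data):
        # Frida delivers script messages as dicts with a "type" key.
        if message.get("type") == "send":
            payload = message.get("payload")
            # Hypothetical payload convention: {"api": name, "args": [...]}
            if isinstance(payload, dict) and "api" in payload:
                self.calls.append((payload["api"], payload.get("args", [])))
        elif message.get("type") == "error":
            self.errors.append(message.get("description", ""))

    def triggered_apis(self):
        """Return the distinct API names seen so far."""
        return {api for api, _ in self.calls}


collector = SensitiveCallCollector()
# Simulated messages in the shape the callback would receive at runtime.
collector.on_message(
    {"type": "send", "payload": {"api": "wx.getLocation", "args": ["gcj02"]}}, None
)
collector.on_message({"type": "send", "payload": "unrelated log line"}, None)
collector.on_message({"type": "error", "description": "ReferenceError: x"}, None)
```

In a live session, `collector.on_message` would be the function passed to `script.on("message", ...)` before the script is loaded. The set returned by `triggered_apis()` then feeds the second sensitive permission call set.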

[0091] Example 8

[0092] As an optional embodiment of the present invention, the program analysis module is further configured to:

[0093] Based on the aforementioned guided page transition graph and the aforementioned guided function call graph, the call process of a specified page is simulated to obtain a set of sensitive permission calls for the user-provided guided analysis.

[0094] With reference to Figures 3 to 7, the system of this invention provides two dynamic analysis methods for users to choose from: First, the program analysis module can be loaded independently, in which case the dynamic analysis will automatically click on all clickable areas on the page to comprehensively trigger sensitive APIs. Second, users can provide guidance for the dynamic analysis, such as a page state transition diagram of the program. In this case, the dynamic analysis acts as an aid to gray-box testing, verifying the correctness of the guided analysis and providing more granular privacy analysis for this invention. The process by which this invention obtains the sensitive permission call set for user-provided guided analysis is as follows:

[0095] (1) Using guided testing, a model-based mini-program exploration strategy is established and automated test cases are generated from it. Appium is used to simulate the user's interaction with the mini-program, so that the sensitive permissions acquired by the mini-program at runtime can be analyzed dynamically. The implementation scheme for this part is shown in Figure 3.

(2) Run as an independent module. This invention also supports dynamic analysis running as an independent module. In this case, the system uses UIAutomator and the Chrome DevTools engine to parse all clickable controls on the page and automatically simulate clicks, and combines the output of the Frida hooks to filter the sensitive permissions triggered at runtime.

[0096] Example 9

[0097] As an optional embodiment of the present invention, the consistency analysis module is used for:

[0098] Consistency analysis is performed on the first sensitive permission call set, the sensitive permission call set provided by the user for guidance analysis, and the second sensitive permission call set to obtain redundant declared permissions and undeclared sensitive permissions, thereby determining whether the mini-program to be analyzed has any risks.

[0099] The goal of this invention is to combine privacy policy analysis with program analysis to perform consistency analysis on the set of sensitive permissions declared in the privacy policy of a mini-program and the set of sensitive permissions actually invoked obtained from dynamic guidance analysis, thereby detecting privacy compliance and permission abuse issues in mini-programs.

[0100] Privacy policies often use different terms to describe the same information; therefore, this invention needs to determine the relationships between these terms in order to decide whether two different expressions refer to the same thing. To address this problem, this invention constructs an ontology of privacy-related terms to represent the generalization relationships between terms, which facilitates mapping to sensitive APIs. An ontology is a formal description of an entity and its attributes, relationships, and behaviors. For example, as shown in Figure 4a, both "Bluetooth access" and "beacon" in the privacy statement refer to "Bluetooth", and the corresponding sensitive API is wx.startBluetoothDevicesDiscovery.

[0101] In addition, this invention summarizes sensitive APIs and the sensitive information they collect, mapping the set of sensitive APIs obtained from dynamic guidance analysis to the privacy entities they collect. Then, using the ontology constructed by this invention, the privacy information corresponding to the sensitive APIs is compared with the sensitive information declared in the privacy policy to determine whether there are violations such as redundant declarations in the privacy policy or collection of personal information beyond the declared scope. The overall solution architecture is shown in Figure 4b.
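
The two mappings described above can be sketched as a pair of lookup tables. This is a minimal illustration, not the ontology of this invention: the table contents beyond the Bluetooth example in paragraph [0100], the lowercase matching rule, and the function names `declared_entities` and `invoked_entities` are assumptions.

```python
from typing import Dict, Iterable, Set

# Hypothetical ontology fragment: policy term (lowercased) -> canonical entity.
TERM_TO_ENTITY: Dict[str, str] = {
    "bluetooth access": "Bluetooth",
    "beacon": "Bluetooth",
    "bluetooth": "Bluetooth",
}

# Hypothetical API table: sensitive API -> privacy entity it collects.
API_TO_ENTITY: Dict[str, str] = {
    "wx.startBluetoothDevicesDiscovery": "Bluetooth",
}


def declared_entities(terms: Iterable[str]) -> Set[str]:
    """Normalize policy terms to canonical entities; unknown terms are skipped."""
    result: Set[str] = set()
    for term in terms:
        key = term.strip().lower()
        if key in TERM_TO_ENTITY:
            result.add(TERM_TO_ENTITY[key])
    return result


def invoked_entities(apis: Iterable[str]) -> Set[str]:
    """Map observed sensitive API names to the entities they collect."""
    return {API_TO_ENTITY[api] for api in apis if api in API_TO_ENTITY}


declared = declared_entities(["Bluetooth access", "beacon"])
invoked = invoked_entities(["wx.startBluetoothDevicesDiscovery"])
```

Once both sides are expressed as canonical entities, the comparison between declared and invoked information reduces to ordinary set operations.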

[0102] Based on the analysis results of the privacy policy statement permission set (A), i.e. the first sensitive permission call set, the user-provided guidance analysis permission call set (B), and the dynamic runtime permission set (C), i.e. the second sensitive permission call set, permission abuse in mini-programs can be divided into the following two categories:

[0103] • Redundant privacy policy statement (weak violation): there is a sensitive permission P that is declared in the mini-program's privacy policy, but no related permission call is found in the user-provided guidance analysis results, so the privacy policy contains a redundant permission declaration;

[0104] • Failure to declare all privacy permissions (serious violation): there is a sensitive permission P that is invoked in the user-provided guidance analysis results, but there is no relevant statement in the privacy policy, which constitutes collection of user privacy information beyond the declared scope.
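
The two categories above can be sketched as set differences. This is a minimal illustration under one stated assumption: the text defines both categories against the guidance analysis results, and the sketch treats the union of sets B and C as the observed calls. The function name `classify_violations`, the dictionary keys, and the sample permission names are hypothetical.

```python
from typing import Dict, Set


def classify_violations(
    declared: Set[str], guided: Set[str], dynamic: Set[str]
) -> Dict[str, Set[str]]:
    """Compare declared permissions (A) with observed calls (B and C)."""
    # Assumption: a permission counts as observed if either analysis saw it.
    observed = guided | dynamic
    return {
        # Weak violation: declared in the policy but never observed.
        "redundant_declaration": declared - observed,
        # Serious violation: observed at runtime but never declared.
        "undeclared_permission": observed - declared,
    }


A = {"location", "camera", "Bluetooth"}   # declared in the privacy policy
B = {"location"}                          # user-provided guidance analysis
C = {"location", "contacts"}              # dynamic runtime analysis
report = classify_violations(A, B, C)
```

For these sample sets, `camera` and `Bluetooth` fall into the weak category and `contacts` into the serious one.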

[0105] Example 10

[0106] As an optional embodiment of the present invention, the consistency analysis module is further used to upload the result of whether the mini-program to be analyzed has risks and the user information that causes the risks to a third-party server.

[0107] With reference to Figures 3 to 7, this invention performs consistency analysis between the sensitive permission set obtained from program analysis and the sensitive permission set obtained from privacy policy parsing, identifying redundantly declared permissions and undeclared sensitive permission calls. Optionally, it uses user-provided function click guidance to perform bounded backtracking from permission request points, locating the pages that call sensitive permissions and accurately identifying unauthorized sensitive permission calls. By combining network traffic monitoring with sensitive API hooking, it tracks data related to sensitive permissions and analyzes potential data leakage or permission abuse risks. Ultimately, it detects violations in mini-programs, such as undeclared collection of user information and redundant permission declarations, and can also reveal malicious behaviors such as uploading user information to third-party servers and unauthorized access to the system camera.
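
The text does not specify how the bounded backtracking is carried out. One possible reading, sketched below, is a breadth-first walk over the reversed page transition graph that stops after a fixed number of transitions. The function name `backtrack_pages`, the `max_depth` parameter, and the page paths are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def backtrack_pages(
    graph: Dict[str, List[str]], target: str, max_depth: int
) -> Dict[str, int]:
    """Return pages that reach `target` within `max_depth` transitions.

    `graph` maps each page to the pages it can transition to. The result
    maps each found page to its distance (in transitions) from `target`.
    """
    # Reverse the edges so we can walk from the request point backwards.
    reverse: Dict[str, Set[str]] = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, set()).add(src)

    dist: Dict[str, int] = {target: 0}
    queue = deque([target])
    while queue:
        page = queue.popleft()
        if dist[page] == max_depth:
            continue  # bound reached; do not expand further
        for prev in reverse.get(page, ()):
            if prev not in dist:
                dist[prev] = dist[page] + 1
                queue.append(prev)
    return dist


graph = {
    "pages/index": ["pages/profile", "pages/scan"],
    "pages/profile": ["pages/scan"],
    "pages/scan": ["pages/result"],
}
# Pages within one transition of the page where a sensitive call was seen.
near = backtrack_pages(graph, "pages/scan", max_depth=1)
```

The depth bound keeps the search cheap on large transition graphs, at the cost of missing entry points that are further away than `max_depth`.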

[0108] This invention, relying on natural language processing technology and JavaScript dynamic guidance analysis technology, implements a lightweight, efficient, and automated framework for analyzing the consistency of privacy policies and permission calls in mini-programs. Using this framework, an empirical evaluation of privacy compliance issues in the mini-program ecosystem was conducted, which identified privacy compliance issues in 1265 mini-programs. Of these, 700 mini-programs underwent manual review, which confirmed that 663 had issues, a detection accuracy of 94.7%, thus demonstrating the usability of this invention.

[0109] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.

[0110] Although this application has been described herein in conjunction with various embodiments, those skilled in the art will understand and implement other variations of the disclosed embodiments by reviewing the accompanying drawings, the disclosure, and the appended claims in carrying out the claimed application. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality.

[0111] The above description, in conjunction with specific preferred embodiments, provides a further detailed explanation of the present invention. It should not be construed that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, various simple deductions or substitutions can be made without departing from the concept of the present invention, and all such modifications and substitutions should be considered within the scope of protection of the present invention.

Claims

1. A system for analyzing the consistency between privacy policies and permission calls in mini-programs, characterized in that it comprises:
a privacy policy extraction module, used to obtain the mini-program to be analyzed and to extract the privacy policy of the privacy page in the mini-program to be analyzed;
a privacy policy automated parsing module, used to identify and classify the privacy policy using a trained privacy policy recognition model to obtain a privacy policy seven-tuple, and to extract sensitive permissions from the privacy policy seven-tuple to form a first sensitive permission call set;
a program analysis module, used to perform guided analysis to construct a guide page transition diagram and dynamic analysis to construct a dynamic page transition diagram for the mini-program to be analyzed, to combine the dynamic page transition diagram with the guide page transition diagram to obtain an overall page transition diagram, and to determine, by simulating the calling process of a specified page, a second sensitive permission call set during the runtime of the mini-program to be analyzed from the overall page transition diagram;
a consistency analysis module, used to perform consistency analysis between the first sensitive permission call set and the second sensitive permission call set to determine whether the mini-program to be analyzed has any risks;
wherein the program analysis module is used for: obtaining the original pages from the source code of the mini-program to be analyzed; analyzing the original pages to obtain the guide transition states between the original pages and the relationships between guide call functions; constructing a guide page transition diagram based on the guide transition states and a guide function call diagram based on the guide call functions; identifying, from the guide page transition diagram, all interactive pages reachable through UI interaction, all reachable functions of the interactive pages, and all sensitive API calls; traversing all trigger pages and the pages redirected to from each trigger page in a depth-first manner to obtain the dynamic transition states and dynamic function call relationships; constructing a dynamic page transition diagram based on the dynamic transition states and the dynamic function call relationships; combining the guide page transition diagram and the dynamic page transition diagram to obtain the overall page transition diagram; identifying all sensitive API calls based on the overall page transition diagram and the guide page transition diagram; and simulating the triggering process of sensitive API calls on a specified page to obtain the second sensitive permission call set during the runtime of the mini-program to be analyzed;
the program analysis module is also used for: based on the guide page transition diagram and the guide function call diagram, simulating the page flow under user guidance to obtain the sensitive permission call set for user-provided guided analysis;
and the consistency analysis module is used for: performing consistency analysis on the first sensitive permission call set, the sensitive permission call set for user-provided guided analysis, and the second sensitive permission call set to obtain redundantly declared permissions and undeclared sensitive permissions, thereby determining whether the mini-program to be analyzed has any risks.

2. The system for consistency analysis of privacy policies and permission calls in mini-programs according to claim 1, characterized in that the privacy policy extraction module is used for: parsing the mini-program to be analyzed to obtain multiple trigger pages; simulating user click behavior to trigger each trigger page so as to redirect to the next page; and determining whether the redirected page is a privacy page and, if so, extracting the privacy policy of the privacy page.

3. The system for consistency analysis of privacy policies and permission calls in mini-programs according to claim 2, characterized in that the privacy policy extraction module is used for: parsing the pages of the mini-program to be analyzed with a UI automation testing tool to obtain multiple trigger pages; simulating user clicks with Appium to trigger a trigger page, which then redirects to another page; if the redirected page contains keywords related to privacy policies and the number of such keywords exceeds a preset threshold, determining that the redirected page is a privacy page; if the privacy page is an external link, crawling the privacy policy of the privacy page by accessing the external link without displaying it; and if the privacy page is a mini-program page, extracting the privacy policy of the privacy page directly or by an OCR text recognition method.

4. The system for consistency analysis of privacy policies and permission calls in mini-programs according to claim 1, characterized in that the privacy policy automated parsing module is used for: selecting the CA4P-483 Chinese privacy policy annotation set and the BERT-BiLSTM-CRF model; training the BERT-BiLSTM-CRF model with the CA4P-483 Chinese privacy policy annotation set to obtain the trained privacy policy recognition model; and training a predefined binary classifier with the CA4P-483 Chinese privacy policy annotation set to obtain a trained binary classifier.

5. The system for consistency analysis of privacy policies and permission calls in mini-programs according to claim 4, characterized in that the privacy policy automated parsing module is used for: dividing the privacy policy of the privacy page into privacy-related and privacy-irrelevant parts using the trained binary classifier; and inputting the privacy-related parts into the trained privacy policy recognition model to obtain the privacy policy seven-tuple.

6. The system for consistency analysis of privacy policies and permission calls in mini-programs according to claim 1, characterized in that the system further comprises a monitoring module, which is used for: running a Frida hook script to monitor the sensitive API calls of the mini-program to be analyzed and their specific parameters during its running; using a MITM proxy as a packet capture module to intercept, in real time, the packets downloaded by the mini-program to be analyzed and all of its network traffic; and detecting potential malicious third-party domain requests made by the mini-program to be analyzed.

7. The system for consistency analysis of privacy policies and permission calls in mini-programs according to claim 1, characterized in that the consistency analysis module is also used to upload, to a third-party server, the result of whether the mini-program to be analyzed has risks and the user information that causes the risks.