Android SDK version detection method and device, and storage medium
By decompiling APK files and generating signature codes, and using an inverted index library of SDK names to determine the Android SDK version, the problem of introducing insecure SDKs in Android application development is solved. This enables security alerts and version identification, improving both security and computational efficiency.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents(China)
- Current Assignee / Owner: CHINA TELECOM CORP LTD
- Filing Date: 2021-05-07
- Publication Date: 2026-06-26
AI Technical Summary
The lack of standards in Android application development may lead to the introduction of insecure third-party SDKs, necessitating version identification and security alerts.
By decompiling APK files, directory hierarchy features, source code features, and hash signatures are generated. The SDK name is determined using an inverted index library based on SDK name, and similarity is calculated to determine the SDK version.
It implements Android SDK version detection and security alerts, improving security, reducing computational load, and increasing computational efficiency.
Figure CN115309438B_ABST
Description
TECHNICAL FIELD
[0001] The present disclosure relates to the field of security, and more particularly, to an Android SDK version detection method, device, and storage medium.
BACKGROUND
[0002] Due to the lack of corresponding specifications and standards in the current Android application development process, there is a risk of introducing unsafe third-party SDKs during application development. Therefore, it is necessary to identify third-party SDKs and to issue security warnings about them.
SUMMARY
[0003] A brief summary of the present disclosure is presented in the following to provide a basic understanding of some aspects of the present disclosure. It should be understood that this summary is not an extensive overview of the present disclosure. It is not intended to identify key or critical elements of the present disclosure or to delineate the scope of the present disclosure. Its sole purpose is to present some concepts of the present disclosure in a simplified form as a prelude to the more detailed description presented later.
[0004] According to an aspect of the present disclosure, a method for detecting an Android software development kit (SDK) version is provided, including: a decompilation step of decompiling an Android installation file (APK) to be detected into source code and resource files; a feature generation step of generating a directory hierarchy feature of the APK, a source code feature, and a hash feature code of the source code according to the decompiled source code and resource files, the directory hierarchy feature including paths of code files, the source code feature including at least classes, functions, and parameters used in the source code, and the hash feature code being obtained by hashing and weighting the source code; an SDK name determination step of determining an SDK name in the APK by reverse searching the directory hierarchy feature using an SDK name inverted index library, the SDK name inverted index library being an index library previously established with code file paths in the directory hierarchy feature of the SDK as keys and SDK names as values; a similarity generation step of calculating a directory hierarchy feature similarity, a source code feature similarity, and a hash feature code similarity between the APK and each version of the SDK according to the directory hierarchy feature of the APK, the source code feature, and the hash feature code of the source code; a total similarity generation step of generating a total similarity for each version of the SDK according to the directory hierarchy feature similarity, the source code feature similarity, and the hash feature code similarity; and an SDK version determination step of determining the SDK version in the APK according to the total similarity.
[0005] According to another aspect of the present disclosure, there is provided an apparatus for detecting an Android Software Development Kit (SDK) version, comprising: a memory having instructions stored thereon; and a processor configured to execute the instructions stored on the memory to perform the method according to the above aspect of the present disclosure.
[0006] According to yet another aspect of the present disclosure, there is provided a computer-readable storage medium comprising computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the method according to the above aspect of the present disclosure.
[0007] According to the present disclosure, Android SDK version detection and security warning can be achieved.
BRIEF DESCRIPTION OF DRAWINGS
[0008] The accompanying drawings, which constitute a part of this specification, illustrate embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
[0009] The present disclosure can be more clearly understood with reference to the following detailed description when considered in conjunction with the following drawings, in which:
[0010] Figure 1 shows a general system framework of an embodiment of the present disclosure.
[0011] Figure 2 is a flowchart showing a method for detecting an Android Software Development Kit (SDK) version according to an embodiment of the present disclosure.
[0012] Figure 3 shows an exemplary configuration of a computing device that can implement an embodiment according to the present disclosure.
DETAILED DESCRIPTION
[0013] The following detailed description is made with reference to the accompanying drawings, and is provided to help a comprehensive understanding of various example embodiments of the present disclosure. The following description includes various details to help the understanding, but these details are only considered as examples, and the present disclosure is defined by the appended claims and their equivalents. The words and phrases used in the following description are used only to enable a clear and consistent understanding of the present disclosure. In addition, descriptions of well-known structures, functions, and configurations can be omitted for clarity and conciseness. Those of ordinary skill in the art will recognize that various changes and modifications can be made to the examples described herein without departing from the spirit and scope of the present disclosure.
[0014] Currently, there is a lack of corresponding specifications and standards in the Android application development process, which may lead to the introduction of insecure third-party SDKs during application development. Therefore, version identification and security alerts for third-party SDKs are essential. For example, a third-party SDK for a payment module might have three versions: XX Payment Module 1.0, XX Payment Module 2.0, and XX Payment Module 2.1, where XX Payment Module represents the SDK name, and 1.0, 2.0, and 2.1 represent the SDK versions. These three versions contain different vulnerabilities due to design flaws. Therefore, it is necessary to check the SDK name and its version to formulate different security strategies. The three versions mentioned above are just an example; there is no limit to the number of versions; there can be any number.
[0015] Figure 1 shows the overall system framework of an embodiment of the present invention.
[0016] In this invention, an SDK feature library is constructed by extracting features from the different versions of each SDK. The SDK feature library includes SDK directory hierarchy features, SDK source code features, and SDK hash feature codes. The SDK directory hierarchy features include the paths of the files in the SDK, for example, "/sources/com/test.java". The SDK source code features include at least the classes, functions, and parameters used in the source code. The SDK hash feature code is obtained by hashing and weighting the source code.
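The directory hierarchy part of such a feature library can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `collect_directory_features` and the choice to store paths relative to the decompiled root with a leading "/" are assumptions made to match the "/sources/com/test.java" example.

```python
import os


def collect_directory_features(root: str) -> set[str]:
    """Collect the path of every file under `root`, relative to `root`.

    Paths are normalized to forward slashes with a leading "/" so that
    features extracted on different operating systems stay comparable.
    (Hypothetical helper; the source does not prescribe a path format.)
    """
    features = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            features.add("/" + rel.replace(os.sep, "/"))
    return features
```

Running the same helper over each decompiled SDK version and over the APK to be detected would give path sets that can be compared directly.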
[0017] Figure 2 is a flowchart illustrating a method for detecting the version of an Android software development kit (SDK) according to an embodiment of the present invention.
[0018] As shown in Figure 2, first, in decompilation step S101, the Android installation file to be detected, i.e., the APK, is decompiled into source code and resource files. There are no particular restrictions on the decompilation method, as long as the APK can be decompiled into source code and resource files.
[0019] Next, in feature generation step S103, directory hierarchy features, source code features, and a hash feature code of the source code are generated for the APK based on the decompiled source code and resource files. Here, the directory hierarchy features include file paths, the source code features include at least the classes, functions, and parameters used in the source code, and the hash feature code is obtained by hashing and weighting the source code. In one embodiment, the directory hierarchy features of the APK may store the file paths of the decompiled source code and resource files. In another embodiment, the source code features may store the classes, functions, and parameters used in the source code. In one embodiment, the hash feature code can be generated as follows. First, the source code is normalized: package and import statements, comments, and blank lines are deleted, function access keywords are changed to public (i.e., PUBLIC), and the source code is converted to lowercase. Tokens are then obtained through word segmentation, and an N-dimensional binary vector is generated for each token using random projection. Next, the term frequency-inverse document frequency (TF-IDF) weight of each token is calculated. If the TF-IDF weight is greater than a specified threshold, the binary vector is multiplied by a preset truncation weight; otherwise, the binary vector is multiplied by the TF-IDF weight. This yields the weighted feature vector of the token. Finally, the weighted feature vectors are accumulated to obtain the hash feature code.
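The hash feature code embodiment resembles a SimHash-style fingerprint, and a minimal sketch under several assumptions is given here. The source does not specify the random projection, the IDF statistics, the threshold values, or how the accumulated vector is turned into a code. In this sketch a SHA-256 digest stands in for the random projection, bits are mapped to +1/-1, the IDF values are passed in as a dictionary (defaulting to 1.0), and the sign of each accumulated dimension gives one bit of the code. All names and default values are hypothetical.

```python
import hashlib
import re
from collections import Counter

N_BITS = 64  # assumed code length; the source only says "N-dimensional"


def normalize(source: str) -> str:
    """Strip comments, package/import lines, and blank lines; unify access keywords; lowercase."""
    # Naive comment removal: does not handle comment markers inside string literals.
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", "", source)
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("package ", "import ")):
            continue
        kept.append(stripped)
    text = re.sub(r"\b(private|protected)\b", "public", "\n".join(kept))
    return text.lower()


def tokenize(text: str) -> list[str]:
    """Split normalized source into identifier and number tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*|\d+", text)


def token_bits(token: str) -> list[int]:
    """Map a token to N_BITS values in {+1, -1} (stand-in for random projection)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    value = int.from_bytes(digest[: N_BITS // 8], "big")
    return [1 if (value >> i) & 1 else -1 for i in range(N_BITS)]


def hash_feature_code(source: str, idf: dict[str, float],
                      threshold: float = 5.0,
                      truncation_weight: float = 5.0) -> int:
    """Accumulate TF-IDF-weighted token vectors and keep the sign of each dimension."""
    tokens = tokenize(normalize(source))
    if not tokens:
        return 0
    acc = [0.0] * N_BITS
    for token, count in Counter(tokens).items():
        tfidf = (count / len(tokens)) * idf.get(token, 1.0)
        # Truncate large weights so that a single token cannot dominate the code.
        weight = truncation_weight if tfidf > threshold else tfidf
        for i, bit in enumerate(token_bits(token)):
            acc[i] += weight * bit
    code = 0
    for i, total in enumerate(acc):
        if total > 0:
            code |= 1 << i
    return code
```

Because of the normalization, two files that differ only in comments, imports, letter case, or access modifiers produce the same code, which is the property that makes the code useful for matching SDK versions inside a decompiled APK.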
[0020] Next, in the SDK name determination step S105, the SDK name inverted index is used to perform a reverse search on the directory hierarchy features to determine the SDK name in the APK. This SDK name inverted index is a pre-built index using the code file paths in the SDK's directory hierarchy features as keys and the SDK name as the value. In one embodiment, the determined SDK name is "XX Payment Module". In other embodiments, a dictionary of {directory hierarchy, SDK package name} can be generated based on the directory hierarchy features, and the dictionary can be searched in reverse to obtain the SDK name.
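The inverted index lookup in step S105 can be sketched as a plain dictionary from code file path to SDK name. The function names and the idea of counting hits per SDK are assumptions for illustration. The sketch also assumes each path belongs to a single SDK, so a path shared by several SDKs would keep only the last name indexed.

```python
from collections import Counter


def build_inverted_index(sdk_paths: dict[str, set[str]]) -> dict[str, str]:
    """Build {code file path: SDK name} from {SDK name: set of code file paths}."""
    index = {}
    for sdk_name, paths in sdk_paths.items():
        for path in paths:
            index[path] = sdk_name
    return index


def find_sdk_names(apk_paths: set[str], index: dict[str, str]) -> Counter:
    """Look up every APK file path in the index and count the hits per SDK name."""
    hits = Counter()
    for path in apk_paths:
        name = index.get(path)
        if name is not None:
            hits[name] += 1
    return hits
```

Each lookup is a constant-time dictionary access, so the cost of naming the SDKs grows with the number of files in the APK rather than with the size of the feature library.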
[0021] Next, in the similarity generation step S107, based on the directory hierarchy structure features, source code features, and hash code of the APK, the similarity of the directory hierarchy structure features, source code features, and hash code between the APK and each version of the SDK determined in the SDK name determination step S105 is calculated. In some embodiments, when calculating the similarity of the directory hierarchy structure features between the APK and each version of the SDK determined in the SDK name determination step S105, the ratio of the total number of files with the same directory hierarchy structure between the APK and each version of the SDK to the total number of files can be used as the directory hierarchy structure feature similarity. In some embodiments, when calculating the similarity of the source code features between the APK and each version of the SDK determined in the SDK name determination step S105, the ratio of the total number of files with the same source code features between the APK and each version of the SDK to the total number of files can be used as the source code feature similarity. In some embodiments, when calculating the hash signature similarity between the APK and each version of the SDK determined in the SDK name determination step S105, the ratio of the Hamming distance to the hash code length between the APK and each version of the SDK can be used as the hash signature similarity. It should be understood that the methods for calculating directory hierarchy feature similarity, source code feature similarity, and hash signature similarity are not limited to the above embodiments, as long as they can represent the degree of similarity between the APK and each version of the SDK determined in the SDK name determination step S105 in terms of directory hierarchy feature, source code feature, and hash signature.
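The three similarity measures of step S107 can be sketched as follows, with two assumptions that the text leaves open. First, "the total number of files" is taken to be the number of files in the SDK version being compared. Second, the ratio of Hamming distance to code length grows as the codes diverge, so this sketch exposes that ratio as `hamming_ratio` and additionally offers `hash_similarity = 1 - ratio` for use where a larger value should mean a closer match. That inversion is an interpretation, not something the source states.

```python
def directory_similarity(apk_paths: set[str], sdk_paths: set[str]) -> float:
    """Share of the SDK version's file paths that also appear in the APK."""
    if not sdk_paths:
        return 0.0
    return len(apk_paths & sdk_paths) / len(sdk_paths)


def source_feature_similarity(apk_features: dict[str, frozenset[str]],
                              sdk_features: dict[str, frozenset[str]]) -> float:
    """Share of the SDK version's files whose feature set is identical in the APK.

    Each mapping is {file path: set of class, function, and parameter names}.
    """
    if not sdk_features:
        return 0.0
    same = sum(1 for path, feats in sdk_features.items()
               if apk_features.get(path) == feats)
    return same / len(sdk_features)


def hamming_ratio(code_a: int, code_b: int, n_bits: int = 64) -> float:
    """Hamming distance between two codes divided by the code length."""
    return bin(code_a ^ code_b).count("1") / n_bits


def hash_similarity(code_a: int, code_b: int, n_bits: int = 64) -> float:
    """Assumed inversion of the ratio so that identical codes score 1.0."""
    return 1.0 - hamming_ratio(code_a, code_b, n_bits)
```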
[0022] Next, in the total similarity generation step S109, the total similarity for each version of the SDK is generated based on the directory hierarchy feature similarity, source code feature similarity, and hash feature code similarity. In one embodiment, the directory hierarchy feature similarity, source code feature similarity, and hash feature code similarity can be weighted and summed separately for each version of the SDK to generate the total similarity.
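The weighted sum of step S109 is simple enough to show in a few lines. The weights below are placeholders chosen only so that they add up to 1; the source does not give concrete values.

```python
def total_similarity(dir_sim: float, src_sim: float, hash_sim: float,
                     weights: tuple[float, float, float] = (0.3, 0.3, 0.4)) -> float:
    """Weighted sum of the three per-version similarities (weights are assumed)."""
    w_dir, w_src, w_hash = weights
    return w_dir * dir_sim + w_src * src_sim + w_hash * hash_sim
```

With weights that sum to 1 and inputs in [0, 1], the total also stays in [0, 1], which keeps it comparable with a fixed threshold such as 0.9.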
[0023] Next, in the SDK version determination step S111, the SDK version in the APK is determined based on the total similarity in the total similarity generation step S109. In some embodiments, the SDK version with the highest total similarity can be determined as the SDK version in the APK to be detected. For example, if the calculated total similarity of XX payment module 2.0 is the highest, the SDK version in the APK is determined to be XX payment module 2.0. In some embodiments, the maximum value in the total similarity can also be compared with a predetermined threshold. If the maximum value is greater than or equal to the threshold, the SDK version corresponding to the maximum value is determined as the SDK version in the APK to be detected. The threshold can be flexibly set as needed. In some embodiments, the threshold can be, for example, 0.9. For example, if the calculated total similarity of XX payment module 2.0 is the highest and greater than or equal to the threshold (e.g., 0.9), the SDK version in the APK is determined to be XX payment module 2.0. Or, for example, if the calculated total similarity of XX payment module 2.0 is the highest but less than the threshold (e.g., 0.9), the SDK version in the APK to be detected is determined to be not included in the current SDK feature library, and an error is reported to the user.
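The threshold variant of step S111 can be sketched as below. The function name is hypothetical, and returning `None` stands in for "not included in the current SDK feature library", leaving the error report to the caller.

```python
from typing import Optional


def determine_version(totals: dict[str, float],
                      threshold: float = 0.9) -> Optional[str]:
    """Return the version with the highest total similarity if it reaches the threshold.

    `totals` maps each SDK version to its total similarity. None means that
    no version in the feature library matches well enough.
    """
    if not totals:
        return None
    best = max(totals, key=totals.get)
    return best if totals[best] >= threshold else None
```

For example, with totals of 0.62, 0.95, and 0.88 for versions 1.0, 2.0, and 2.1, the function returns "2.0"; if the best total were only 0.85, it would return `None`.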
[0024] According to the present invention, Android SDK version detection and security warning can be realized. After obtaining the name and version of the SDK in the APK, corresponding security strategies can be adopted to improve security.
[0025] Furthermore, according to the present invention, the SDK name is determined first, and the SDK version is then determined by comparing/matching the APK against the versions of that SDK only, which reduces the amount of computation and improves computational efficiency.
[0026] Figure 3 shows an exemplary configuration of a computing device 1200 capable of implementing embodiments of the present disclosure.
[0027] Computing device 1200 is an example of a hardware device capable of applying the above aspects of this disclosure. Computing device 1200 can be any machine configured to perform processing and/or computation. Computing device 1200 can be, but is not limited to, a workstation, server, desktop computer, laptop computer, tablet computer, personal data assistant (PDA), smartphone, in-vehicle computer, or a combination thereof.
[0028] As shown in Figure 3, computing device 1200 may include one or more components that can be connected to or communicate with bus 1202 via one or more interfaces. Bus 1202 may include, but is not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus. Computing device 1200 may include, for example, one or more processors 1204, one or more input devices 1206, and one or more output devices 1208. The one or more processors 1204 may be any type of processor and may include, but are not limited to, one or more general-purpose processors or special-purpose processors (such as dedicated processing chips). Input devices 1206 may be any type of input device capable of inputting information to the computing device and may include, but are not limited to, a mouse, keyboard, touchscreen, microphone, and/or remote controller. Output devices 1208 may be any type of device capable of presenting information and may include, but are not limited to, a monitor, speaker, video/audio output terminal, vibrator, and/or printer.
[0029] The computing device 1200 may also include or be connected to a non-transitory storage device 1214, which may be any non-transitory storage device capable of storing data, and may include, but is not limited to, disk drives, optical storage devices, solid-state storage, floppy disks, flexible disks, hard disks, magnetic tapes or any other magnetic media, compact discs or any other optical media, cache memory and/or any other storage chip or module, and/or any other medium from which a computer may read data, instructions, and/or code. The computing device 1200 may also include random access memory (RAM) 1210 and read-only memory (ROM) 1212. ROM 1212 may store executable programs, utilities, or processes in a non-volatile manner. RAM 1210 provides volatile data storage and stores instructions related to the operation of the computing device 1200. The computing device 1200 may also include a network/bus interface 1216 coupled to a data link 1218. Network/bus interface 1216 can be any kind of device or system capable of enabling communication with external devices and/or networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication devices, and/or chipsets (such as Bluetooth™ devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication facilities, etc.).
[0030] This disclosure can be implemented as any combination of apparatus, system, integrated circuit, and computer program on a non-transitory computer-readable medium. One or more processors can be implemented as integrated circuits (ICs), application-specific integrated circuits (ASICs), or large-scale integrated circuits (LSIs), system LSIs, super LSIs, or ultra LSI components that perform some or all of the functions described in this disclosure.
[0031] This disclosure includes the use of software, application programs, computer programs, or algorithms. Software, application programs, computer programs, or algorithms may be stored on a non-transitory computer-readable medium to cause a computer, such as one or more processors, to perform the steps described above and in the accompanying drawings. For example, one or more memories may store the software or algorithm in executable instructions, and one or more processors may be associated with executing a set of instructions of the software or algorithm to provide various functionalities according to embodiments described in this disclosure.
[0032] Software and computer programs (also referred to as programs, software applications, applications, components, or code) include machine instructions for programmable processors and can be implemented in high-level procedural languages, object-oriented programming languages, functional programming languages, logic programming languages, assembly languages, or machine languages. The term "computer-readable medium" means any computer program product, apparatus, or device used to provide machine instructions or data to a programmable data processor, such as magnetic disks, optical disks, solid-state storage devices, memories, and programmable logic devices (PLDs), including computer-readable media that receive machine instructions as computer-readable signals.
[0033] For example, computer-readable media may include dynamic random access memory (DRAM), random access memory (RAM), read-only memory (ROM), electrically erasable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage devices, magnetic disk storage devices or other magnetic storage devices, or any other medium that can be used to carry or store required computer-readable program code in the form of instructions or data structures, and that can be accessed by a general-purpose or special-purpose computer or a general-purpose or special-purpose processor. As used herein, disk and disc include compact discs (CD), laser discs, optical discs, digital versatile discs (DVD), floppy disks, and Blu-ray discs, wherein a disk typically reproduces data magnetically, while a disc reproduces data optically using a laser. Combinations of the above are also included within the scope of computer-readable media.
[0034] The subject matter of this disclosure is provided as examples of apparatus, systems, methods, and programs for performing the features described herein. However, other features or variations are contemplated in addition to those described above. It is anticipated that the components and functions of this disclosure can be implemented using any emerging techniques that may replace any of the above-described implementations.
[0035] Furthermore, the above description provides examples and does not limit the scope, applicability, or configuration set forth in the claims. Changes may be made to the function and arrangement of the elements discussed without departing from the spirit and scope of this disclosure. Various processes or components may be appropriately omitted, substituted, or added in various embodiments. For example, features described with respect to certain embodiments may be combined in other embodiments.
[0036] Furthermore, in the description of this disclosure, the terms “first,” “second,” “third,” etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance or order.
[0037] Similarly, although the operations are depicted in a specific order in the accompanying drawings, this should not be construed as requiring the operations to be performed in the specific order shown or in sequential order, or requiring the execution of all illustrated operations to achieve the desired result. In some cases, multitasking and parallel processing can be advantageous.
Claims
1. A method for detecting the version of an Android software development kit (SDK), comprising: a decompilation step of decompiling the Android installation file (APK) to be detected into source code and resource files; a feature generation step of generating a directory hierarchy feature of the APK, a source code feature, and a hash feature code of the source code based on the decompiled source code and resource files, wherein the directory hierarchy feature includes the path of each code file, the source code feature includes at least the classes, functions, and parameters used in the source code, and the hash feature code is obtained by hashing and weighting the source code; an SDK name determination step of determining the SDK name in the APK by performing a reverse retrieval of the directory hierarchy feature using an SDK name inverted index library, wherein the SDK name inverted index library is a pre-built index library that uses the code file paths in the directory hierarchy feature of the SDK as keys and the SDK names as values; a similarity generation step of calculating a directory hierarchy feature similarity, a source code feature similarity, and a hash feature code similarity between the APK and each version of the SDK based on the directory hierarchy feature of the APK, the source code feature, and the hash feature code of the source code; a total similarity generation step of generating a total similarity for each version of the SDK based on the directory hierarchy feature similarity, the source code feature similarity, and the hash feature code similarity; and an SDK version determination step of determining the SDK version in the APK based on the total similarity.
2. The method according to claim 1, wherein, in the feature generation step, package and import statements are deleted from the source code, comments are deleted, blank lines are deleted, the function access keyword is changed to public (PUBLIC), the source code is changed to lowercase, tokens are obtained through word segmentation, and each token is used to generate an N-dimensional binary vector using the random projection method; the term frequency-inverse document frequency (TF-IDF) weight is calculated; if the TF-IDF weight is greater than a specified threshold, the binary vector is multiplied by a preset truncation weight, and otherwise the binary vector is multiplied by the TF-IDF weight, to obtain the weighted result of the feature vector; and the weighted results of the feature vectors are accumulated to obtain the hash feature code.
3. The method according to claim 1, wherein, in the total similarity generation step, the directory hierarchy feature similarity, the source code feature similarity, and the hash feature code similarity are weighted and summed for each version of the SDK to generate the total similarity.
4. The method according to claim 1, wherein, in the SDK version determination step, the maximum value of the total similarity is compared with a specified threshold, and if the maximum value is greater than or equal to the threshold, the SDK version corresponding to the maximum value is determined as the SDK version in the APK.
5. The method according to claim 4, wherein the threshold is 0.9.
6. The method according to claim 1, wherein, in the SDK name determination step, a dictionary of {directory hierarchy, SDK package name} is generated based on the directory hierarchy feature, and the SDK name is obtained by reverse retrieval of the dictionary.
7. The method according to claim 1, wherein, in the similarity generation step, when calculating the directory hierarchy feature similarity between the APK and each version of the SDK, the ratio of the total number of files with the same directory hierarchy between the APK and that version of the SDK to the total number of files is used as the directory hierarchy feature similarity.
8. The method according to claim 1, wherein, in the similarity generation step, when calculating the source code feature similarity between the APK and each version of the SDK, the ratio of the total number of files with the same source code features between the APK and that version of the SDK to the total number of files is used as the source code feature similarity.
9. The method according to claim 1, wherein, in the similarity generation step, when calculating the hash feature code similarity between the APK and each version of the SDK, the ratio of the Hamming distance between the hash feature codes of the APK and that version of the SDK to the hash code length is used as the hash feature code similarity.
10. A device for detecting the version of an Android software development kit (SDK), comprising: a memory that stores instructions; and a processor configured to execute the instructions stored in the memory to perform the method according to any one of claims 1 to 9.
11. A computer-readable storage medium comprising computer-executable instructions, which, when executed by one or more processors, cause the one or more processors to perform the method according to any one of claims 1 to 9.