Method, device, system, and medium for obtaining abnormal applications
An application acquisition technique, applied in the computer field, that addresses the limited function and efficiency of existing crawlers and their inability to adapt to the diversity of black- and gray-market platforms, and that improves the search rate.
Pending Publication Date: 2022-02-15
EVERSEC BEIJING TECH
AI-Extracted Technical Summary
Problems solved by technology
[0004] However, neither the traditional crawler nor the focused crawler can adapt to the diversity of...
Method used
In the technical solution of this embodiment, a target URL address parsing template matching the website uniform resource locator (URL) address of an abnormal application download platform is obtained, and that template is used to parse and obtain the platform page content corresponding to the website URL address. A page decoding result corresponding to the platform page content is then obtained, and each download page corresponding to the website URL address is obtained by traversal according to the page decoding result. Finally, the download link in each download page is extracted, and each abnormal application included in the abnormal application download platform is downloaded to a target storage space according to the download links. This solves the problem that traditional acquisition methods cannot adapt to the diversity of abnormal application download platforms. It provides more analysis samples for the abnormal application sample library, automatically obtains more abnormal applications from abnormal application download platforms, and improves the search rate of website URL addresses.
The benefit of setting up the template library is that the library can be continuously enriched and improved, so that the application acquisition method is not limited to any one abnormal application download platform and can automatically obtain the abnormal applications of more abnormal application download pl...
Abstract
The invention discloses a method, device, system, and medium for obtaining abnormal applications. The method comprises the following steps: obtaining a target URL (Uniform Resource Locator) address parsing template matched with the website URL address of an abnormal application download platform, and parsing with the target URL address parsing template to obtain the platform page content corresponding to the website URL address; obtaining a page decoding result corresponding to the platform page content, and traversing according to the page decoding result to obtain each download page corresponding to the website URL address; and extracting the download links in all the download pages, and downloading all the abnormal applications included in the abnormal application download platform to a target storage space according to the download links. With this technical solution, more analysis samples are provided for the abnormal application sample library, more abnormal applications of abnormal application download platforms can be obtained automatically, and the search rate of website URL addresses is increased.
Application Domain
Web data indexing; Software testing/debugging
Technology Topic
Information retrieval; Address resolution
Examples
Example Embodiment
[0032] Embodiment 1
[0033] Figure 1 is a flow chart of the method for obtaining an abnormal application provided by an embodiment of the present invention. This embodiment is applicable to cases where abnormal applications are obtained automatically. The method can be performed by an abnormal application acquisition device, which can be implemented in software and/or hardware and can be configured in a server. The method includes:
[0034] S110: obtain the target URL address parsing template that matches the website uniform resource locator (URL) address of the abnormal application download platform, and parse with the target URL address parsing template to obtain the platform page content corresponding to the website URL address.
[0035] Here, the abnormal application download platform can be a web page containing illegal or non-compliant applications, for example, a phishing website used for telecom fraud. The target URL address parsing template can be a template used to parse the target URL address; it describes the standardized format of the target URL address. The platform page content can be the specific content information contained in the one or more pages of the platform, for example, the applications listed on a page and other information.
[0036] Specifically, the matching target URL address parsing template is obtained according to the URL address of the abnormal application download platform. The URL address of the platform can then be filled in according to the format of the target URL address parsing template and parsed, so that the platform page content can be read.
[0037] In an optional embodiment of the present invention, obtaining the target URL address parsing template that matches the website URL address may include: determining whether a target URL address parsing template matching the website URL address exists in the URL address parsing template library; if so, obtaining the target URL address parsing template from the URL address parsing template library; otherwise, sending the website URL address to the URL address parsing template generation platform, obtaining the target URL address parsing template fed back by that platform, and storing the target URL address parsing template in the URL address parsing template library.
[0038] The URL address parsing template library can be a pre-built database containing URL address templates for various abnormal application download platforms. The URL address parsing template generation platform can be a platform for generating URL address parsing templates for unknown abnormal application download platforms; for example, a corresponding URL address parsing template can be generated by manually analyzing the features of the URL address.
[0039] Optionally, the website URL address of the abnormal application download platform can be matched against the URL address parsing template library to determine whether the library contains a target URL address parsing template that matches the website URL address. If it does, the matching target URL address parsing template is obtained from the library. Otherwise, the website URL address can be sent to the URL address parsing template generation platform (for example, a manual generation platform), the target URL address parsing template is obtained from that platform, and the template is stored in the URL address parsing template library.
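The lookup-then-fallback flow of [0037] to [0039] can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes that a template can be represented as a named regular expression, and the names `UrlTemplateLibrary`, `get_target_template`, and the `generate` callback (a stand-in for the template generation platform) are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class UrlTemplateLibrary:
    """Hypothetical in-memory stand-in for the URL address parsing template library."""

    # Assumption: each template is a named regular expression describing a URL format.
    templates: Dict[str, re.Pattern] = field(default_factory=dict)

    def find(self, url: str) -> Optional[str]:
        """Return the name of the first template whose pattern matches the whole URL."""
        for name, pattern in self.templates.items():
            if pattern.fullmatch(url):
                return name
        return None

    def store(self, name: str, pattern: str) -> None:
        self.templates[name] = re.compile(pattern)


def get_target_template(
    url: str,
    library: UrlTemplateLibrary,
    generate: Callable[[str], Tuple[str, str]],
) -> str:
    """Look the URL up in the library; on a miss, ask the generation platform."""
    name = library.find(url)
    if name is not None:
        return name
    # No match: the generation platform (possibly manual) returns a new template,
    # which is stored so that later URLs of the same shape are matched directly.
    name, pattern = generate(url)
    library.store(name, pattern)
    return name
```

Because the generated template is stored before it is returned, a second URL from the same kind of platform is resolved from the library without involving the generation platform again.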
[0040] S120: obtain the page decoding result corresponding to the platform page content, and traverse, according to the page decoding result, to obtain each download page corresponding to the website URL address.
[0041] Here, the page decoding result can be the result of transcoding the page encoding of the abnormal application download platform. The download page can be the download interface of each application included in the abnormal application download platform.
[0042] In this embodiment of the present invention, the page content of the abnormal application download platform can be parsed to obtain the corresponding page decoding result, and the pages of the platform can then be traversed according to the page decoding result to obtain the download page of each application.
[0043] In an optional embodiment of the present invention, obtaining the page decoding result corresponding to the platform page content may include: parsing the platform page content to obtain the page encoding result of the platform page content; and extracting the page character set from the page encoding result and transcoding the page encoding by setting the page character set, thereby forming the page decoding result corresponding to the platform page content.
[0044] Here, the page encoding result can be the encoding obtained by preliminary parsing of a page of the abnormal application download platform, for example, GBK (Chinese Internal Code Specification). The page decoding result can be the result of transcoding the obtained page encoding.
[0045] Optionally, the page content of the abnormal application download platform can be parsed to obtain the corresponding page encoding; the page character set can then be extracted from the obtained page encoding and used to transcode it, forming the page decoding result for the platform page content. For example, the character set can be set through the HTML meta tag. The advantage of this arrangement is that it greatly reduces the probability of obtaining garbled data during page encoding processing.
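As an illustration of the character-set step, the sketch below reads the charset declared in an HTML meta tag and uses it to decode the raw page bytes. It is one possible reading of [0043] to [0045], not the patent's code; the function names, the 4096-byte search window, and the UTF-8 fallback are assumptions.

```python
import re

# Matches both <meta charset="gbk"> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=gbk"> form.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_-]+)""", re.IGNORECASE
)


def extract_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the character set declared in the page's meta tag, or the default."""
    # Assumption: the declaration appears near the top of the document.
    match = _META_CHARSET.search(raw[:4096])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()


def decode_page(raw: bytes) -> str:
    """Decode the raw page bytes with the declared character set."""
    charset = extract_charset(raw)
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The page declared a codec Python does not know; fall back to UTF-8.
        return raw.decode("utf-8", errors="replace")
```

Decoding with the declared character set, rather than a fixed one, is what keeps GBK-encoded pages from turning into garbled text.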
[0046] S130: extract the download link in each download page, and download each abnormal application included in the abnormal application download platform to the target storage space according to the download links.
[0047] The target storage space can be a database that stores the obtained abnormal applications, for example, a MongoDB database.
[0048] Specifically, the download link of each application can be extracted from the download pages corresponding to the website URL address of the abnormal application download platform; according to the download links, the abnormal applications included in the platform are downloaded and stored in the target storage space.
[0049] Optionally, extracting the download link in each download page can include: for the target download page currently being processed, obtaining the target download page parsing template corresponding to that page; and extracting the matching download link from the target download page according to the target download page parsing template.
[0050] Here, the target download page can be the download interface of the application currently being processed. The target download page parsing template can be a template used to parse the target download page; it describes the standardized format of the target download page.
[0051] In this embodiment of the present invention, the matching target download page parsing template can be obtained according to the target download page currently being processed, and the target download page is parsed according to that template to extract the download link.
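Template-driven link extraction could look roughly like the sketch below. The patent does not specify what a download page parsing template contains, so the `DownloadPageTemplate` fields (tag, attribute, and a link pattern) and the function name are purely hypothetical; only the Python standard library is used.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


@dataclass(frozen=True)
class DownloadPageTemplate:
    """Hypothetical template: where the link lives and what it looks like."""

    tag: str           # e.g. "a"
    attribute: str     # e.g. "href"
    link_pattern: str  # regular expression the attribute value must match


class _LinkCollector(HTMLParser):
    def __init__(self, template: DownloadPageTemplate, base_url: str) -> None:
        super().__init__()
        self._template = template
        self._base_url = base_url
        self._pattern = re.compile(template.link_pattern)
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != self._template.tag:
            return
        for name, value in attrs:
            if name == self._template.attribute and value and self._pattern.search(value):
                # Resolve relative links against the download page's own URL.
                link = urljoin(self._base_url, value)
                if link not in self.links:
                    self.links.append(link)


def extract_download_links(
    html: str, base_url: str, template: DownloadPageTemplate
) -> List[str]:
    """Return the de-duplicated download links that match the template."""
    collector = _LinkCollector(template, base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

For example, a template of `DownloadPageTemplate("a", "href", r"\.apk$")` would select only anchors whose target ends in `.apk`.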
[0052] In an optional embodiment of the present invention, obtaining the target download page parsing template corresponding to the target download page currently being processed may include: determining whether a target download page parsing template matching the target download page exists in the download page parsing template library; if so, obtaining the target download page parsing template from the download page parsing template library; otherwise, sending the target download page to the download page parsing template generation platform, obtaining the target download page parsing template fed back by that platform, and storing the target download page parsing template in the download page parsing template library.
[0053] Here, the download page parsing template library can be a database containing download page templates for various abnormal applications. The download page parsing template generation platform can be a platform for generating download page parsing templates for unknown abnormal application download pages; for example, a corresponding download page parsing template can be generated by manually analyzing the features of the download page.
[0054] Optionally, the abnormal application download page can be matched against the download page parsing template library to determine whether the library contains a target download page parsing template that matches the target download page. If it does, the target download page parsing template is obtained from the download page parsing template library. Otherwise, the target download page can be sent to the download page parsing template generation platform (for example, a manual generation platform), the target download page parsing template is obtained from that platform, and the generated template is stored in the download page parsing template library.
[0055] The advantage of setting up the template libraries is that they can be continuously improved, so that the application acquisition method is not limited to a single abnormal application download platform, can automatically obtain the abnormal applications of more abnormal application download platforms, and improves the search rate of website URL addresses.
[0056] In the technical solution of this embodiment, a target URL address parsing template matching the website uniform resource locator (URL) address of the abnormal application download platform is obtained and used to parse and obtain the platform page content corresponding to the website URL address; the page decoding result corresponding to the platform page content is obtained, and each download page corresponding to the website URL address is obtained by traversal according to the page decoding result; and the download link in each download page is extracted, and each abnormal application included in the platform is downloaded to the target storage space according to the download links. This solves the problem that traditional acquisition methods cannot adapt to the diversity of abnormal application download platforms, provides more analysis samples for the abnormal application sample library, automatically obtains more abnormal applications from abnormal application download platforms, and improves the search rate of website URL addresses.
[0057] Embodiment 2
[0058] Figure 2 is a flow chart of the method for obtaining abnormal applications in a distributed manner according to Embodiment 2 of the present invention. This embodiment builds on the above embodiments and can perform the abnormal application acquisition method described above. It is applicable to cases where the abnormal applications of multiple abnormal application download platforms are obtained simultaneously. The method can be performed by a distributed abnormal application acquisition device, which can be configured in a server. The method includes:
[0059] S210: obtain the website uniform resource locator (URL) address of each abnormal application download platform, and store the website URL addresses in a message queue.
[0060] Here, the message queue can be a container that stores website URL addresses.
[0061] Optionally, the website URL address of each abnormal application download platform can be obtained and stored in the message queue.
[0062] S220: obtain the website URL addresses contained in the message queue in turn and distribute them to the distributed processing nodes.
[0063] Here, the distributed processing nodes can be multiple separate web crawlers in a distributed cluster.
[0064] Optionally, each website URL address can be taken from the message queue in turn and distributed to the processing nodes in the distributed cluster.
[0065] S230: through each distributed processing node, perform the abnormal application acquisition method described in any embodiment of the present invention according to the received website URL address.
[0066] Optionally, each distributed processing node can perform the abnormal application acquisition method described in any embodiment of the present invention according to the website URL address it receives.
[0067] The advantage of this arrangement is that combining a distributed cluster with a message queue maximizes the crawling rate and increases the number of websites processed.
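Steps S210 to S230 can be illustrated on a single machine with a thread-safe queue and a few worker threads. This is only a sketch under strong simplifying assumptions: threads stand in for the distributed processing nodes, `queue.Queue` stands in for the message queue, and the `acquire` callback is a hypothetical placeholder for the single-platform acquisition method of Embodiment 1.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List


def run_workers(
    site_urls: Iterable[str],
    acquire: Callable[[str], List[str]],
    node_count: int = 3,
) -> Dict[str, List[str]]:
    """Distribute website URLs from a queue to worker nodes and collect results."""
    # S210: store every website URL address in the message queue.
    url_queue: "queue.Queue[str]" = queue.Queue()
    for url in site_urls:
        url_queue.put(url)

    results: Dict[str, List[str]] = {}
    results_lock = threading.Lock()

    def node() -> None:
        # S220: each node takes the next URL from the queue until it is empty.
        while True:
            try:
                url = url_queue.get_nowait()
            except queue.Empty:
                return
            # S230: run the acquisition method on the received URL.
            outcome = acquire(url)
            with results_lock:
                results[url] = outcome

    threads = [threading.Thread(target=node) for _ in range(node_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

In a real deployment the queue would be a networked message broker and the nodes separate crawler processes; the pull-from-queue loop is the part that carries over.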
[0068] In the technical solution of this embodiment, the website uniform resource locator (URL) address of each abnormal application download platform is obtained and stored in the message queue; the website URL addresses contained in the message queue are obtained in turn and distributed to the distributed processing nodes; and each distributed processing node performs the abnormal application acquisition method described in any embodiment of the present invention according to the website URL address it receives. This solves the problem that traditional crawlers and focused crawlers cannot adapt to the diversity of abnormal application download platforms, which limits the function and efficiency of website crawling; it maximizes the crawling rate and increases the number of websites processed.
[0069] In an optional embodiment of the present invention, storing the website URL addresses in the message queue may further include: adding a task ID to each website URL address and storing each task ID. After the website URL addresses are obtained in turn and distributed to the distributed processing nodes, the method may further include: querying the processing status of each website URL address using the currently stored task IDs; and obtaining, from the target storage space, the abnormal applications corresponding to the website URL addresses whose processing is complete.
[0070] Here, the task ID can be a symbol that identifies a task. The processing status refers to the execution status of the abnormal application acquisition method on each distributed processing node, and may include processed and unprocessed.
[0071] Specifically, a task ID can be added for each website URL address stored in the message queue, and the task ID can be stored, for example, in a Redis (Remote Dictionary Server) database; the processing progress can then be queried through each task ID.
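The task-ID bookkeeping of [0069] to [0071] might be sketched as below. Plain dictionaries stand in for the Redis database and the target storage space, and the class name, status strings, and use of UUIDs as task IDs are all assumptions made for illustration.

```python
import uuid
from typing import Dict, Iterable, List

UNPROCESSED = "unprocessed"
PROCESSED = "processed"


class TaskTracker:
    """Hypothetical task-ID store; dictionaries stand in for the Redis database."""

    def __init__(self) -> None:
        self._url_by_task: Dict[str, str] = {}
        self._status_by_task: Dict[str, str] = {}

    def register(self, site_url: str) -> str:
        """Attach a fresh task ID to a website URL and record it as unprocessed."""
        task_id = uuid.uuid4().hex
        self._url_by_task[task_id] = site_url
        self._status_by_task[task_id] = UNPROCESSED
        return task_id

    def mark_processed(self, task_id: str) -> None:
        self._status_by_task[task_id] = PROCESSED

    def status(self, task_id: str) -> str:
        return self._status_by_task[task_id]

    def url(self, task_id: str) -> str:
        return self._url_by_task[task_id]


def collect_finished(
    tracker: TaskTracker,
    task_ids: Iterable[str],
    storage: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Return the stored applications for every task whose URL has been processed."""
    finished: Dict[str, List[str]] = {}
    for task_id in task_ids:
        if tracker.status(task_id) == PROCESSED:
            # `storage` maps a website URL to its downloaded applications; here it
            # is a dict standing in for the target storage space.
            finished[task_id] = storage.get(tracker.url(task_id), [])
    return finished
```

Calling `collect_finished` repeatedly with the same task IDs corresponds to polling: unfinished tasks are simply absent from the result until a worker marks them processed.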
[0072] Figure 3 is a workflow diagram of a specific application scenario of the distributed abnormal application acquisition method provided by an embodiment of the present invention. The terminal server determines whether each black- and gray-market platform to be crawled is still live and filters out the inactive platforms. The URL addresses of the live platforms are stored in the message queue, and a task ID is added to each platform URL address in the message queue and saved in the Redis database. Each URL address in the message queue is distributed to a distributed crawler, which identifies the platform URL address through the URL address template library: if the URL address template can be identified, the page is downloaded; otherwise, the template is added to the URL address template library. The page encoding is then obtained and transcoded, and the application download page is identified through the download page template library: if the download page template can be identified, the download link is extracted and the Android application package (APK) is downloaded and its information analyzed; otherwise, the download page template is added to the download page template library, after which the download link is extracted and the APK is downloaded and analyzed. The APK information is stored in the MongoDB database, from which it is retrieved by the terminal server.
[0073] Figure 4 is a schematic diagram of a specific application scenario of the distributed abnormal application acquisition method provided by an embodiment of the present invention. The terminal server filters out the inactive black- and gray-market platforms, stores the URL addresses of the live platforms in the message queue, adds a task ID to each task in the message queue and stores it in the Redis database, and distributes the tasks in the message queue to the distributed cluster. The web crawlers traverse the download pages, extract the application download links, download the APKs and analyze their information, and store the results in the MongoDB database. The task IDs are then polled to check whether each task is complete; for a completed task, the results can be looked up in the MongoDB database by task ID.
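The first step of both scenarios, filtering out platforms that are no longer live, can be sketched as follows. The patent does not say how liveness is determined, so treating any successful HTTP response as "live" is an assumption, as are the function names; the probe is injectable so the filter can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def is_alive(url: str, timeout: float = 5.0) -> bool:
    """Assumed liveness check: the platform answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, timeout, HTTP error status, or malformed URL.
        return False


def filter_live_platforms(
    urls: Iterable[str], probe: Callable[[str], bool] = is_alive
) -> List[str]:
    """Keep only the platform URLs that the probe reports as live."""
    return [url for url in urls if probe(url)]
```

Only the URLs returned by `filter_live_platforms` would then be placed in the message queue.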
[0074] Embodiment 3
[0075] Figure 5 is a schematic structural diagram of the abnormal application acquisition device provided by Embodiment 3 of the present invention. The device can perform the abnormal application acquisition method of the above embodiments. As shown in the diagram, the device includes: a platform page content acquisition module 310, a download page traversal module 320, and an abnormal application download module 330.
[0076] The platform page content acquisition module 310 can be used to obtain a target URL address parsing template that matches the website uniform resource locator (URL) address of the abnormal application download platform, and to parse with the target URL address parsing template to obtain the platform page content corresponding to the website URL address;
[0077] The download page traversal module 320 can be used to obtain the page decoding result corresponding to the platform page content, and to traverse, according to the page decoding result, to obtain each download page corresponding to the website URL address;
[0078] The abnormal application download module 330 can be used to extract the download link in each download page, and to download each abnormal application included in the abnormal application download platform to the target storage space according to the download links.
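One way to picture how the three modules fit together is as a small pipeline object. The sketch below is illustrative only: the class name `AcquisitionDevice` and the callable signatures chosen for modules 310, 320, and 330 are assumptions, since the patent describes the modules functionally and does not define their interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AcquisitionDevice:
    """Hypothetical composition of modules 310, 320 and 330 as injected callables."""

    # Module 310: website URL address -> raw platform page content.
    fetch_platform_page: Callable[[str], bytes]
    # Module 320: platform page content -> URLs of the download pages.
    traverse_download_pages: Callable[[bytes], List[str]]
    # Module 330: download pages -> downloaded applications keyed by page URL.
    download_applications: Callable[[List[str]], Dict[str, bytes]]

    def acquire(self, site_url: str) -> Dict[str, bytes]:
        """Run the three modules in order for one platform."""
        content = self.fetch_platform_page(site_url)
        download_pages = self.traverse_download_pages(content)
        return self.download_applications(download_pages)
```

Keeping each module behind its own callable makes it possible to replace one stage, for example the template matching in module 310, without touching the others.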
[0079] In the technical solution of this embodiment, a target URL address parsing template matching the website uniform resource locator (URL) address of the abnormal application download platform is obtained and used to parse and obtain the platform page content corresponding to the website URL address; the page decoding result corresponding to the platform page content is obtained, and each download page corresponding to the website URL address is obtained by traversal according to the page decoding result; and the download link in each download page is extracted, and each abnormal application included in the platform is downloaded to the target storage space according to the download links. This solves the problem that traditional acquisition methods cannot adapt to the diversity of abnormal application download platforms, provides more analysis samples for the abnormal application sample library, automatically obtains more abnormal applications from abnormal application download platforms, and improves the search rate of website URL addresses.
[0080] In the above device, optionally, the platform page content acquisition module 310 can be specifically used to:
[0081] Determine whether a target URL address parsing template that matches the website URL address exists in the URL address parsing template library;
[0082] If so, obtain the target URL address parsing template from the URL address parsing template library;
[0083] Otherwise, send the website URL address to the URL address parsing template generation platform, obtain the target URL address parsing template fed back by the URL address parsing template generation platform, and store the target URL address parsing template in the URL address parsing template library.
[0084] In the above device, optionally, the download page traversal module 320 can be specifically used to:
[0085] Parse the platform page content and obtain the page encoding result of the platform page content;
[0086] Extract the page character set from the page encoding result, and transcode the page encoding by setting the page character set, forming the page decoding result corresponding to the platform page content.
[0087] In the above device, optionally, the abnormal application download module 330 may include:
[0088] A target download page parsing template acquisition unit, used to obtain, for the target download page currently being processed, the target download page parsing template corresponding to that page;
[0089] A download link extraction unit, used to extract the matching download link from the target download page according to the target download page parsing template.
[0090] In the above device, optionally, the target download page parsing template acquisition unit can be specifically used to:
[0091] Determine whether a target download page parsing template that matches the target download page exists in the download page parsing template library;
[0092] If so, obtain the target download page parsing template from the download page parsing template library;
[0093] Otherwise, send the target download page to the download page parsing template generation platform, obtain the target download page parsing template fed back by the download page parsing template generation platform, and store the target download page parsing template in the download page parsing template library.
[0094] The abnormal application acquisition device provided in this embodiment of the present invention can perform the abnormal application acquisition method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to performing the method.