URL merging processing method and device
A URL merging processing method and device, applied in the field of information processing, address the problem of excessive bandwidth and storage resource consumption and achieve the effect of reducing bandwidth and storage consumption.
Examples
Example 1
[0025] Figure 1a is a flowchart of a URL merging processing method provided in the first embodiment of the present invention. The method of this embodiment can be executed by a URL merging processing device, which can be implemented in hardware and/or software and can generally be integrated in a server used to perform the URL merging processing function. The method of this embodiment specifically includes:
[0026] 110. Obtain a set of URLs corresponding to the target website.
[0027] Generally speaking, a website is a collection of multiple web pages, and each web page corresponds to an independent URL address. In the prior art, the URL addresses corresponding to a target website (for example, www.baidu.com) are obtained mainly by crawling the network with a web crawler, which yields the URL set corresponding to the target website. The URL set includes at least one URL address corresponding to a web page in the target website.
[0028] However, o...
Example 2
[0066] Figure 2a is a flowchart of a URL merging processing method according to the second embodiment of the present invention. This embodiment is optimized on the basis of the above embodiments. In this embodiment, obtaining the URL set corresponding to the target website is specified as: obtaining the URL set corresponding to the target website according to the browsing log information of users. The method preferably also includes: sequentially obtaining one of the URL merge clusters as a verification cluster; obtaining at least two URLs from the verification cluster as verification URLs; downloading the webpage content of the at least two verification webpages corresponding to the verification URLs; if the webpage content shows that the webpage structures of the verification webpages differ, unmerging the URLs in the verification cluster;
[0067] In addition, according to the content of the webpage, identify...
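The verification steps described in this embodiment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the excerpt does not say how webpage structure is compared, so the sketch uses the sequence of opening HTML tags as a crude structure signature, and it treats "unmerging" as splitting the cluster into single-URL clusters. The names `structure_signature`, `verify_cluster`, and the caller-supplied `fetch` function are hypothetical.

```python
import random
from html.parser import HTMLParser
from typing import Callable, List, Optional, Sequence, Tuple


class TagSequence(HTMLParser):
    """Collects opening tag names in document order (assumed structure signature)."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def structure_signature(html: str) -> Tuple[str, ...]:
    """Reduce a page to its tag sequence, ignoring text and attributes."""
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return tuple(parser.tags)


def verify_cluster(
    cluster: Sequence[str],
    fetch: Callable[[str], str],
    sample_size: int = 2,
    rng: Optional[random.Random] = None,
) -> List[List[str]]:
    """Sample verification URLs from a merge cluster and compare page structure.

    Returns [cluster] when the sampled pages share a signature; otherwise
    returns one single-URL cluster per URL (an assumed meaning of "unmerging").
    """
    if len(cluster) < 2:
        return [list(cluster)]
    rng = rng or random.Random(0)
    sample = rng.sample(list(cluster), min(sample_size, len(cluster)))
    signatures = {structure_signature(fetch(url)) for url in sample}
    if len(signatures) > 1:
        return [[url] for url in cluster]
    return [list(cluster)]
```

Here `fetch` stands in for the download step, so the sketch can be exercised with an in-memory mapping of URLs to HTML instead of real network requests.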
Example 3
[0088] Figure 3 is a flowchart of a URL merging processing method according to the third embodiment of the present invention. This embodiment is optimized on the basis of the above embodiments. In this embodiment, obtaining the generalization identifier among the structure identifiers according to the data characteristics of the structure values corresponding to the structure identifiers is specified as: generating, according to the feature set corresponding to each URL in the URL set, a set of structure values corresponding to each structure identifier; calculating, according to the data characteristics of each structure value in the structure value set, the generalization weight value of the structure identifier corresponding to that set; obtaining the generalization identifier among the structure identifiers according to the generalization weight values corresponding to each structure identifier;
[0089] At the same time, acco...
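The three steps of this embodiment (collect structure values per structure identifier, compute a generalization weight, select generalization identifiers) can be sketched as below. Everything specific here is an assumption made for illustration, since the excerpt does not define these terms: structure identifiers are taken to be path-segment positions and query-parameter keys, the "data characteristic" is taken to be the share of distinct values, and the threshold of 0.5 is arbitrary. The function names and the example.com URLs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set
from urllib.parse import parse_qsl, urlsplit


def feature_set(url: str) -> Dict[str, str]:
    """Map assumed structure identifiers to their structure values for one URL."""
    parts = urlsplit(url)
    features: Dict[str, str] = {}
    segments = [seg for seg in parts.path.split("/") if seg]
    for index, segment in enumerate(segments):
        features[f"path{index}"] = segment
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        features[f"query:{key}"] = value
    return features


def generalization_weights(urls: List[str]) -> Dict[str, float]:
    """Weight each identifier by distinct values / observed values (assumed metric)."""
    values: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        for identifier, value in feature_set(url).items():
            values[identifier].append(value)
    return {ident: len(set(vals)) / len(vals) for ident, vals in values.items()}


def generalization_identifiers(urls: List[str], threshold: float = 0.5) -> Set[str]:
    """Identifiers whose values vary enough to be treated as wildcards."""
    weights = generalization_weights(urls)
    return {ident for ident, weight in weights.items() if weight > threshold}
```

With a URL set in which only the second path segment varies, for instance `http://example.com/item/1?src=home` through `http://example.com/item/4?src=home`, this sketch selects only `path1`, so the four URLs could be merged under one pattern that generalizes that segment.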