Method and system for automatically reconstructing a website site map
A technology for automatically reconstructing site maps, applied in fields such as network data query, network data retrieval, and network data navigation. It addresses the problem that site maps are neither updated promptly nor complete, and improves the website's SEO friendliness.
Examples
Embodiment 1
[0034] Figure 1 is a flow chart of the method for automatically reconstructing a website site map according to the present invention. The method specifically includes the following steps:
[0035] S1. Collect website pages: starting from the website's homepage, collect pages in breadth-first order, to a depth of at most N layers (N=4 for small websites; N=5 for large websites). For large commercial websites, take care to exclude large user-interaction areas such as BBS forums, so that the crawler does not waste effort collecting large numbers of invalid web pages.
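Step S1 can be sketched as a depth-limited breadth-first traversal. The sketch below is illustrative only and rests on several assumptions not stated in the source: an in-memory link graph (the hypothetical `LinkGraph`) stands in for real HTTP fetching and link extraction, the homepage counts as layer 1, and forum areas are excluded by hypothetical regular-expression patterns such as `/bbs/`. Each collected page is recorded with the PAGE fields [url, anchor, depth, referer] used later in the description.

```python
import re
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for fetching: url -> list of (href, anchor_text) on that page.
LinkGraph = Dict[str, List[Tuple[str, str]]]


@dataclass
class Page:
    url: str
    anchor: str              # anchor text of the link on the referer page
    depth: int               # homepage is layer 1 (assumption)
    referer: Optional[str]   # None for the homepage


def crawl_bfs(graph: LinkGraph, homepage: str, max_depth: int,
              exclude: Tuple[str, ...] = (r"/bbs/",)) -> List[Page]:
    """Collects pages breadth-first, at most max_depth layers deep.

    URLs matching any exclude pattern (e.g. forum areas) are neither
    recorded nor expanded, so the crawler does not descend into them.
    """
    patterns = [re.compile(p) for p in exclude]
    pages = [Page(homepage, "", 1, None)]
    seen = {homepage}
    queue = deque(pages)
    while queue:
        page = queue.popleft()
        if page.depth >= max_depth:
            continue  # children would exceed the layer limit
        for href, anchor in graph.get(page.url, []):
            if href in seen or any(p.search(href) for p in patterns):
                continue
            seen.add(href)
            child = Page(href, anchor, page.depth + 1, page.url)
            pages.append(child)
            queue.append(child)
    return pages
```

Usage: with `site = {"/": [("/news/", "News"), ("/bbs/", "Forum"), ("/about", "About")], "/news/": [("/news/1", "Story 1")]}`, calling `crawl_bfs(site, "/", max_depth=3)` collects `/`, `/news/`, `/about` and `/news/1`, and skips the forum. A queue (rather than recursion) keeps the traversal strictly layer by layer, so every page is recorded at its shallowest depth.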
[0036] S2. For each collected web page, extract its digital identifier to obtain the page's unique digital identifier DOM_ID, then classify and save the pages as key-value pairs <DOM_ID, PAGEs> to obtain the website page information set MAP, where DOM_ID is the unique digital identifier of the web page and PAGEs is the description inf...
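The text does not say how DOM_ID is computed, so the following is only a hedged sketch of step S2. It assumes, purely for illustration, that DOM_ID is a 64-bit integer digest of a page's start-tag sequence (text and attributes ignored), so that pages rendered from the same template share one DOM_ID. The names `dom_id` and `build_map` are hypothetical. Pages are then grouped into MAP as key-value pairs <DOM_ID, PAGEs>, with PAGE=[url, anchor, depth, referer].

```python
import hashlib
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# PAGE = [url, anchor, depth, referer]
Page = Tuple[str, str, int, Optional[str]]


class _TagCollector(HTMLParser):
    """Records the sequence of start tags, ignoring text and attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def dom_id(html: str) -> int:
    """Hypothetical DOM_ID: integer digest of the page's tag structure."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    digest = hashlib.sha1(" ".join(collector.tags).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_map(pages: List[Tuple[Page, str]]) -> Dict[int, List[Page]]:
    """Groups (PAGE, html) pairs into MAP = {DOM_ID: PAGEs}."""
    site_map: Dict[int, List[Page]] = defaultdict(list)
    for page, html in pages:
        site_map[dom_id(html)].append(page)
    return dict(site_map)
```

Under this assumption, two article pages that differ only in their text fall into the same PAGEs list, while a page with a different layout (for example a list page) gets its own DOM_ID. A real implementation might instead fingerprint only part of the DOM, or use a different digest.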
Embodiment 2
[0052] Figure 2 shows a system for automatically reconstructing a website site map provided by the present invention. The system specifically includes the following modules:
[0053] Web page collection module;
[0054] Website page information set generation module: for each collected web page, extracts its digital identifier to obtain the page's unique digital identifier DOM_ID, classifies and saves the pages as key-value pairs <DOM_ID, PAGEs>, and obtains the website page information set MAP. Here PAGEs is a list of web page descriptions, and each item in the list is a PAGE describing one web page: PAGE=[url, anchor, depth, referer], where url is the link of the web page; referer is the url of the upper-layer web page that links to the current page; anchor is the anchor text of the link to the current page on the referer page; and depth is the depth of the current web page;
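Because each PAGE stores its referer, the chain of links by which the crawler reached a page can be recovered by following referer fields back to the homepage. The sketch below uses this to illustrate how the PAGE fields relate. It assumes that the homepage is at depth 1 with referer `None`, and that referers form a tree, as they do when pages are collected breadth-first with each page recorded once. The function name `crawl_path` is hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Page(NamedTuple):
    """PAGE = [url, anchor, depth, referer]."""
    url: str
    anchor: str
    depth: int
    referer: Optional[str]  # None for the homepage (assumption)


def crawl_path(pages: List[Page], url: str) -> List[str]:
    """Returns the anchor texts followed from the homepage down to url.

    Walks referer links upward; terminates because referers form a tree
    rooted at the homepage.
    """
    by_url: Dict[str, Page] = {p.url: p for p in pages}
    trail: List[str] = []
    page = by_url[url]
    while page.referer is not None:
        trail.append(page.anchor)
        page = by_url[page.referer]
    trail.reverse()
    return trail
```

For `pages = [Page("/", "", 1, None), Page("/news/", "News", 2, "/"), Page("/news/1", "Story 1", 3, "/news/")]`, `crawl_path(pages, "/news/1")` gives `["News", "Story 1"]`. Its length equals depth minus 1, which shows that depth counts the layers from the homepage under the stated assumption.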
[0055] The column object list determination module of ...