Method and system for automatic reconstruction of website site map

An automatic site map reconstruction technology, applied in the fields of network data query, retrieval, and navigation. It addresses the problem that site maps are not timely or comprehensive, and it improves the SEO friendliness of the website.

Active Publication Date: 2019-07-30
BEIJING UCAP INTERNET TECH +1

AI Technical Summary

Problems solved by technology

Existing site map generation methods, such as online generation and software generation, produce site maps that are not timely or comprehensive enough.



Examples


Embodiment 1

[0034] Figure 1 is a flow chart of the method for automatic reconstruction of a website site map according to the present invention. The method comprises the following steps:

[0035] S1. Collection of website pages: starting from the homepage of the website, collect pages in breadth-first order, down to at most N layers (N=4 for small websites; N=5 for large websites). For large commercial websites, take care to exclude high-volume user discussion areas such as BBS forums, so that the crawler does not waste resources collecting large numbers of invalid pages.
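The breadth-first collection in step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `collect_pages`, the `get_links` callback, the `blocked` substring filter, the convention that the homepage is layer 1, and the toy link graph `SITE` are all assumptions made for the example. A real crawler would fetch and parse pages over the network instead of reading a dictionary.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Tuple


def collect_pages(
    homepage: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 4,
    blocked: Tuple[str, ...] = ("/bbs",),
) -> Dict[str, int]:
    """Collect pages breadth-first from the homepage; returns {url: depth}.

    Assumption: the homepage counts as layer 1, so max_depth=4 keeps
    pages on layers 1 through 4.
    """
    depths = {homepage: 1}
    queue = deque([homepage])
    while queue:
        url = queue.popleft()
        depth = depths[url]
        if depth >= max_depth:
            continue  # do not expand pages on the deepest allowed layer
        for link in get_links(url):
            # Skip pages already seen and shielded areas such as forums.
            if link in depths or any(part in link for part in blocked):
                continue
            depths[link] = depth + 1
            queue.append(link)
    return depths


# Hypothetical in-memory link graph standing in for a real website.
SITE = {
    "/": ["/news", "/bbs/thread-1", "/about"],
    "/news": ["/news/a", "/"],
    "/news/a": ["/news/a/x"],
    "/news/a/x": ["/deep"],
    "/deep": ["/deeper"],
}

pages = collect_pages("/", lambda url: SITE.get(url, []), max_depth=4)
```

Because the queue is first-in, first-out, every page is first reached by a shortest link path, so the recorded depth is the page's minimum distance from the homepage.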

[0036] S2. For each collected web page, extract a digital identifier to obtain the unique digital identifier DOM_ID of the page, then classify and store the pages as key-value pairs DOM_ID: PAGEs to obtain the website page information set MAP, where DOM_ID is the unique digital identifier of the web page and PAGEs is the description inf...

Embodiment 2

[0052] Figure 2 shows a system for automatic reconstruction of a website site map provided by the present invention. The system comprises the following modules:

[0053] Web page collection module;

[0054] Website page information set generation module: for each collected web page, extract a digital identifier to obtain the unique digital identifier DOM_ID of the page, then classify and store the pages as key-value pairs DOM_ID: PAGEs to obtain the website page information set MAP. Here PAGEs is a list of web page descriptions, and each item in the list is a PAGE, where PAGE=[url, anchor, depth, referer]: url is the link of the web page, referer is the url of the upper-layer page that links to the current page, anchor is the anchor text of the current page on the referer page, and depth is the depth of the current page;

[0055] The column object list determination module of ...
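The MAP structure built by the page information module can be sketched as a dictionary from DOM_ID to a list of PAGE records. The text available here does not say how DOM_ID is extracted, so the `dom_id` function below is a stand-in assumption: it hashes the sequence of HTML start tags, so that pages sharing a template fall under the same key. The names `Page`, `dom_id`, and `build_map` are hypothetical and are used only for this illustration.

```python
import hashlib
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Page(NamedTuple):
    """PAGE=[url, anchor, depth, referer] as described in the text."""
    url: str
    anchor: str
    depth: int
    referer: str


class _TagCollector(HTMLParser):
    """Records the name of every start tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def dom_id(html: str) -> str:
    """Stand-in fingerprint (an assumption, not the patent's method):
    a hash of the start-tag sequence of the page."""
    collector = _TagCollector()
    collector.feed(html)
    return hashlib.md5("/".join(collector.tags).encode("utf-8")).hexdigest()


def build_map(pages: Iterable[Tuple[Page, str]]) -> Dict[str, List[Page]]:
    """Group PAGE records under their DOM_ID to form MAP."""
    site_map: Dict[str, List[Page]] = defaultdict(list)
    for page, html in pages:
        site_map[dom_id(html)].append(page)
    return dict(site_map)


# Two list pages that share a template, plus one differently structured page.
LIST_A = "<html><body><ul><li>story one</li></ul></body></html>"
LIST_B = "<html><body><ul><li>story two</li></ul></body></html>"
ABOUT = "<html><body><p>about us</p></body></html>"

MAP = build_map([
    (Page("/news", "News", 2, "/"), LIST_A),
    (Page("/sports", "Sports", 2, "/"), LIST_B),
    (Page("/about", "About", 2, "/"), ABOUT),
])
```

Grouping by key keeps all pages of one kind together, which is what lets a later step run statistics over each PAGEs list.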



Abstract

The invention provides a method for automatically reconstructing a website site map. The method comprises the following steps: S1, collecting website pages; S2, extracting a digital identifier from each collected web page to obtain the unique digital identifier DOM_ID of each page, and classifying and storing the pages as key-value pairs DOM_ID: PAGEs to obtain the web page information set MAP of the website; S3, statistically analyzing the web page information set MAP using judgment rules to determine the column object list COLUMNs of the website; S4, for the column object list COLUMNs determined in step S3, reconstructing the column tree from the hierarchical relationships between columns to obtain a complete site map. The invention also provides a system for automatically reconstructing a website site map. With this technical solution, the site map of a website is constructed automatically, so that a crawler can collect the key column pages of the website promptly and comprehensively, collecting more articles with fewer resources, improving the SEO friendliness of the website, and bringing more users to the website.
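Step S4 of the abstract, rebuilding a column tree from the column list, can be sketched as below. The abstract does not spell out how the hierarchical relationship is determined, so this example makes its own assumptions: each column is a dictionary with `url`, `anchor`, `depth`, and `referer` fields (mirroring PAGE), column URLs are unique, and a column's parent is the column at its `referer` URL when that column is shallower. The function name `build_column_tree` is hypothetical.

```python
from typing import Dict, List


def build_column_tree(columns: List[dict], root_url: str) -> dict:
    """Link each column under the column that refers to it; returns the root.

    Assumptions: column URLs are unique, and a referer counts as the parent
    only when it is a known column on a shallower layer.
    """
    nodes: Dict[str, dict] = {
        c["url"]: {"url": c["url"], "anchor": c["anchor"], "children": []}
        for c in columns
    }
    depth = {c["url"]: c["depth"] for c in columns}
    # Create a root node if the homepage is not itself in the column list.
    root = nodes.setdefault(
        root_url, {"url": root_url, "anchor": "Home", "children": []}
    )
    depth.setdefault(root_url, 0)
    for c in columns:
        if c["url"] == root_url:
            continue
        parent_url = c["referer"]
        # Requiring a strictly shallower parent rules out cycles; anything
        # without a usable parent is attached directly to the root.
        if parent_url in nodes and depth[parent_url] < c["depth"]:
            parent = nodes[parent_url]
        else:
            parent = root
        parent["children"].append(nodes[c["url"]])
    return root


# Hypothetical column list for illustration.
COLUMNS = [
    {"url": "/", "anchor": "Home", "depth": 1, "referer": ""},
    {"url": "/news", "anchor": "News", "depth": 2, "referer": "/"},
    {"url": "/news/local", "anchor": "Local", "depth": 3, "referer": "/news"},
    {"url": "/orphan", "anchor": "Orphan", "depth": 2, "referer": "/missing"},
]

tree = build_column_tree(COLUMNS, "/")
```

Attaching columns with an unknown or deeper referer to the root is a design choice of this sketch: it keeps every column reachable in the resulting site map rather than dropping it.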

Description

Technical field

[0001] The invention belongs to the technical field of Internet information collection, and in particular relates to a method and system for automatic reconstruction of a website site map.

Background art

[0002] A site map is a navigation web page file generated from the structure, framework, and content of a website. It is generally stored in the root directory and named sitemap. A site map is a container for all links on a website. Many websites have deep link hierarchies that are difficult for crawlers to traverse; a site map exposes the structure of the website clearly, which makes it easier for crawlers to collect its pages. The site map is important both for users browsing the website and for search engine indexing, and search engines such as Baidu and Google expect each website to provide a clear one. With a site map, web crawlers can reduce the number of collections and reduce the pressure on the websi...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/954, G06F16/953
Inventors: 汪敏, 刘鹏飞, 李伦凉, 李绪祥, 尹娜
Owner: BEIJING UCAP INTERNET TECH