Method and system for automatically reconstructing site map of website

A technology for automatically reconstructing a site map, applied in fields such as special data processing applications, instruments, and electronic digital data processing. It addresses the problem that existing site maps are neither timely nor comprehensive enough, and it improves the SEO friendliness of the website.

Active Publication Date: 2018-12-21
BEIJING UCAP INTERNET TECH +1

AI Technical Summary

Problems solved by technology

Site map generation methods, such as online generation, software gene



Examples


Embodiment 1

[0034] Figure 1 is a flowchart of the method for automatically reconstructing a site map according to the present invention. The method specifically includes the following steps:

[0035] S1. Website page collection: starting from the homepage, collect the pages of the website in breadth-first order, to a depth of at most N levels (N=4 for small websites; N=5 for large websites). For large commercial websites, take care to exclude high-volume user discussion areas such as BBS forums, so that the crawler does not waste resources collecting large numbers of invalid pages.
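Step S1 can be illustrated with a minimal sketch of a depth-limited breadth-first traversal. It is not the patented implementation: the function name `crawl_breadth_first`, the `get_links` callback, the `blocked_fragments` filter, and the choice to count the homepage as level 1 are all assumptions made for illustration, and the sample site is an in-memory dictionary rather than a real website.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def crawl_breadth_first(
    home_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 4,
    blocked_fragments: Tuple[str, ...] = ("/bbs",),
) -> List[Tuple[str, int]]:
    """Return (url, depth) pairs in breadth-first order.

    Assumption: the homepage counts as level 1, so max_depth=4
    collects the homepage plus three further levels of links.
    """
    visited = {home_url}
    order: List[Tuple[str, int]] = []
    queue = deque([(home_url, 1)])
    while queue:
        url, depth = queue.popleft()
        order.append((url, depth))
        if depth >= max_depth:
            continue  # do not follow links below the deepest allowed level
        for link in get_links(url):
            if link in visited:
                continue
            # Skip high-volume areas such as forums (hypothetical filter).
            if any(fragment in link for fragment in blocked_fragments):
                continue
            visited.add(link)
            queue.append((link, depth + 1))
    return order


# Hypothetical in-memory site used instead of real HTTP requests.
SAMPLE_SITE = {
    "/": ["/news", "/about", "/bbs/1"],
    "/news": ["/news/a", "/"],
    "/about": [],
    "/news/a": ["/news/a/deep"],
    "/news/a/deep": [],
}

collected = crawl_breadth_first("/", lambda u: SAMPLE_SITE.get(u, []), max_depth=3)
```

In a real crawler, `get_links` would fetch the page and extract its hyperlinks. Keeping it as a callback lets the traversal logic be tested without network access.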

[0036] S2. Extract a digital identifier from each collected web page to obtain its unique digital identifier DOM_ID, and save the pages by category as key-value pairs DOM_ID: PAGEs to obtain the website page information set MAP, where DOM_ID is the unique digital identifier of the page and PAGEs is the list of description information for the page; each item in the list ...

Embodiment 2

[0052] Figure 2 shows a system for automatically reconstructing a website site map provided by the present invention. The system specifically includes the following modules:

[0053] Website webpage collection module;

[0054] Website page information set generation module: extracts a digital identifier from each collected web page to obtain its unique digital identifier DOM_ID, and saves the pages by category as key-value pairs DOM_ID: PAGEs to obtain the website page information set MAP, where PAGEs is the list of description information for the page. Each item in the list is a PAGE, which describes one page: PAGE=[url, anchor, depth, referer], where url is the link of the web page, referer is the url of the previous page that links to the current page, anchor is the anchor text of the current page on the referer page, and depth is the depth of the current web page;

[0055] Column object list determination module of the website: use the judgment rules t...
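The MAP structure described in [0054] can be sketched as a dictionary from DOM_ID to a list of PAGE records. The source does not say how DOM_ID is computed, so the `dom_id` function below is purely an assumption: it hashes the sequence of opening tag names, so that pages with the same template share an identifier. The names `Page`, `dom_id`, and `build_map` are hypothetical.

```python
import hashlib
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Page(NamedTuple):
    """PAGE=[url, anchor, depth, referer] as listed in [0054]."""
    url: str
    anchor: str
    depth: int
    referer: str


class _TagCollector(HTMLParser):
    """Collects the names of opening tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def dom_id(html: str) -> str:
    """Assumed fingerprint: hash of the tag sequence, ignoring text and attributes."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    digest = hashlib.sha256("/".join(collector.tags).encode("utf-8"))
    return digest.hexdigest()[:12]


def build_map(pages: Iterable[Tuple[Page, str]]) -> Dict[str, List[Page]]:
    """Group PAGE records under their DOM_ID to form the MAP."""
    site_map: Dict[str, List[Page]] = defaultdict(list)
    for page, html in pages:
        site_map[dom_id(html)].append(page)
    return dict(site_map)


# Two list pages that share a template, and one article page.
LIST_A = "<html><body><ul><li><a href='/n/1'>A</a></li></ul></body></html>"
LIST_B = "<html><body><ul><li><a href='/s/9'>Z</a></li></ul></body></html>"
ARTICLE = "<html><body><h1>Title</h1><p>Text</p></body></html>"

example_map = build_map([
    (Page("/news", "News", 2, "/"), LIST_A),
    (Page("/sports", "Sports", 2, "/"), LIST_B),
    (Page("/n/1", "A", 3, "/news"), ARTICLE),
])
```

Grouping by a structural fingerprint is one plausible reading of "save by category": pages generated from the same template fall into the same PAGEs list, which later statistical analysis could examine as a group.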


Abstract

The invention provides a method for automatically reconstructing a website site map. The method specifically comprises the following steps: S1, collecting the pages of the website; S2, extracting a digital identifier from each collected web page to obtain the unique digital identifier DOM_ID of each page, and saving the pages by category as key-value pairs DOM_ID: PAGEs to obtain the website page information set MAP; S3, statistically analyzing the website page information set MAP using judgment rules to determine the column object list COLUMNs of the website; S4, for the column object list COLUMNs determined in step S3, reconstructing the column tree from the hierarchical relationships between columns to obtain a complete site map. In addition, the invention provides a system for automatically reconstructing a website site map. With the technical solution of the invention, the site map of a website is constructed automatically, so that a crawler can collect the key column pages of the website promptly and comprehensively, collect more articles with fewer resources, improve the SEO friendliness of the website, and bring more users to the website.
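Step S4 can be illustrated with a short sketch that rebuilds a column tree from a flat column list. The available text does not specify how the hierarchical relationship is derived, so this sketch assumes that each column's referer identifies its parent column. The names `Column`, `build_column_tree`, and `render_tree` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Column(NamedTuple):
    url: str
    anchor: str
    referer: Optional[str]  # assumed parent column url; None for the homepage


def build_column_tree(columns: List[Column]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (root urls, children-by-url), linking each column to its referer."""
    children: Dict[str, List[str]] = {c.url: [] for c in columns}
    roots: List[str] = []
    for c in columns:
        if c.referer is not None and c.referer in children and c.referer != c.url:
            children[c.referer].append(c.url)
        else:
            roots.append(c.url)  # no known parent: treat as a top-level entry
    return roots, children


def render_tree(columns: List[Column]) -> List[str]:
    """Render the tree as indented text lines, two spaces per level."""
    anchors = {c.url: c.anchor for c in columns}
    roots, children = build_column_tree(columns)
    lines: List[str] = []

    def walk(url: str, level: int) -> None:
        lines.append("  " * level + f"{anchors[url]} ({url})")
        for child in children[url]:
            walk(child, level + 1)

    for root in roots:
        walk(root, 0)
    return lines


# Hypothetical COLUMNs list for a small site.
COLUMNS = [
    Column("/", "Home", None),
    Column("/news", "News", "/"),
    Column("/about", "About", "/"),
    Column("/news/local", "Local", "/news"),
]

site_map_lines = render_tree(COLUMNS)
```

Because every column has at most one parent in this sketch, columns whose referers form a cycle are never reached from a root and are left out of the rendered tree. A production version would need an explicit rule for such cases.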

Description

Technical field

[0001] The invention belongs to the technical field of Internet information collection, and in particular relates to a method and system for automatically reconstructing a website site map.

Background art

[0002] A site map is a navigation web page file generated from the structure, framework, and content of a website. It is generally stored in the root directory and named sitemap. The site map is a container for all of a website's links. Many websites have deep link hierarchies that are difficult for crawlers to reach; a site map lays out the structure of the website clearly and makes it easier for crawlers to crawl its pages. A site map therefore plays an important role both for users browsing the website and for indexing by search engines. Search engines such as Baidu and Google prefer that every website provide a clear site map. With a site map, web crawlers can reduce the number of collection requests and so reduce the load on the website. At the sa...


Application Information

IPC(8): G06F17/30
Inventor 汪敏, 刘鹏飞, 李伦凉, 李绪祥, 尹娜
Owner BEIJING UCAP INTERNET TECH