Method and equipment for crawling page
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- BEIJING BAIDU NETCOM SCI & TECH CO LTD
- Publication Date
- 2013-07-31
- Estimated Expiration
- Not applicable · inactive patent
Figures 1 to 3
Abstract
Description
Technical Field
[0001] The invention relates to the technical field of the Internet, and in particular to a technology for crawling pages.
Background Art
[0002] Current methods crawl web pages using a random breadth-first strategy. For directed crawling, this causes several problems: diffusion is slow, its direction and speed are hard to control, and it is difficult to reach the desired pages within the desired time. For example, when crawling data from a vertical site, if the different dimensions of the data are distributed across different pages, the crawled data will be seriously incomplete. Moreover, because the crawl status of the current data is not recorded during crawling, it is impossible to tell whether incomplete crawled data reflects gaps in the source data itself or pages that have simply not been crawled yet.
Contents of the ...
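The random breadth-first strategy criticized above can be sketched roughly as follows. This is a minimal illustration of the prior-art approach, not the method of this patent. The in-memory dict `site` stands in for fetching and parsing real pages, and `random_bfs_crawl`, its `max_pages` budget and all page names are hypothetical. Each page's out-links are shuffled before being queued, and the crawl stops once the page budget is spent.

```python
import random
from collections import deque


def random_bfs_crawl(graph, seed, max_pages, rng=None):
    """Crawl a toy link graph breadth-first, shuffling each page's out-links.

    graph: hypothetical dict mapping a page id to the page ids it links to.
    Returns the pages in the order they were "fetched", stopping at max_pages.
    """
    if rng is None:
        rng = random.Random()
    visited = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < max_pages:
        page = queue.popleft()
        order.append(page)  # stands in for fetching the page
        links = list(graph.get(page, []))
        rng.shuffle(links)  # random order among links found on this page
        for link in links:
            if link not in visited:
                visited.add(link)
                queue.append(link)
    # The remaining queue (uncrawled pages) is discarded, so the caller
    # cannot tell which pages were still pending when the budget ran out.
    return order


# Hypothetical vertical site: one item's data is split between a detail
# page and a reviews page that sits one link further down.
site = {
    "home": ["list1", "list2", "list3", "list4"],
    "list1": ["item1"],
    "list2": [],
    "list3": [],
    "list4": [],
    "item1": ["item1_reviews"],
}

crawled = random_bfs_crawl(site, "home", max_pages=6, rng=random.Random(0))
print("item1" in crawled, "item1_reviews" in crawled)  # True False
```

With a six-page budget, the crawler spends most of it on the shallow list pages, so the detail page `item1` is fetched but its reviews page is not. Because the returned list does not record that `item1_reviews` was still queued, the result by itself cannot distinguish "the item has no reviews" from "the crawl stopped early", which mirrors the problem described in [0002].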