Network data crawling method and apparatus
A network data crawling technology, applied in the Internet field, that addresses problems such as the inability to crawl network data and the low efficiency with which crawling servers crawl network data, and thereby improves crawling efficiency.
Examples
Embodiment 1
[0022] An embodiment of the present invention provides a method for crawling network data. As shown in Figure 1, the processing flow of the method may include the following steps:
[0023] Step 101: according to a preset polling sequence, select domain names to be crawled one by one from a pre-stored domain name queue.
[0024] Step 102: each time a domain name to be crawled is selected, compare the interval between the time the selected domain name was last crawled and the current time against a preset time interval threshold. If the interval exceeds the threshold, extract a URL to be crawled from the URL queue corresponding to the selected domain name and crawl the network data of that URL. If the interval does not exceed the threshold, select the next domain name to be crawled.
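Steps 101 and 102 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `crawl_round`, its parameters, and the injected `fetch` and `now` callables are hypothetical names introduced here for illustration. The sketch also assumes that a domain name that has never been crawled is treated as eligible, and that a domain name whose URL queue is empty is skipped; the text above does not specify either case.

```python
import time
from collections import deque


def crawl_round(domain_queue, url_queues, last_crawled, threshold, fetch,
                now=time.monotonic):
    """Poll every domain once, in queue order, and return the URLs crawled.

    domain_queue: iterable of domain names in the preset polling order
    url_queues:   dict mapping domain name -> deque of URLs to be crawled
    last_crawled: dict mapping domain name -> time of the last crawl
    threshold:    preset time interval threshold, in seconds
    fetch:        callable that crawls one URL (hypothetical, injected)
    now:          callable returning the current time in seconds
    """
    crawled = []
    for domain in domain_queue:                      # Step 101: select in order
        current = now()
        last = last_crawled.get(domain)
        # Step 102: crawl only if the interval exceeds the threshold.
        # Assumption: a domain with no recorded crawl time is eligible.
        if last is not None and current - last <= threshold:
            continue                                 # select the next domain
        urls = url_queues.get(domain)
        if not urls:                                 # assumption: skip if empty
            continue
        url = urls.popleft()                         # extract a URL to crawl
        fetch(url)
        last_crawled[domain] = current               # record the crawl time
        crawled.append(url)
    return crawled
```

Calling `crawl_round` repeatedly with the same `last_crawled` dictionary gives the polling behaviour described above: a domain name crawled too recently is passed over, and the round continues with the next one.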
[0025] In the embodiment of the present invention, according to the preset polling order, the domain n...
Embodiment 2
[0027] An embodiment of the present invention provides a method for crawling network data, and the method is executed by a crawling server. The crawling server may be a background server of a browser or a background server of a website, and it may be a single server or a server group composed of multiple servers.
[0028] The processing flow shown in Figure 1 is described in detail below with reference to specific implementations:
[0029] Step 101: according to a preset polling sequence, select domain names to be crawled one by one from a pre-stored domain name queue.
[0030] In implementation, technicians may pre-store domain names of multiple websites in the crawling server, and these domain names may be stored in the form of a domain name queue according to a preset polling sequence. The crawling server can also store a plurality of URLs under the domain name corresponding to each website do...
Embodiment 3
[0051] Based on the same technical idea, an embodiment of the present invention also provides a device for crawling network data. As shown in Figure 3, the device includes:
[0052] A selection module 310, configured to select domain names to be crawled one by one from the pre-stored domain name queue according to the preset polling sequence;
[0053] A crawling module 320, configured to, each time a domain name to be crawled is selected: if the interval between the time the selected domain name was last crawled and the current time exceeds a preset time interval threshold, extract a URL to be crawled from the URL queue corresponding to the selected domain name and crawl the network data of that URL; and if the interval does not exceed the preset time interval threshold, select the next domain name to be crawled.
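The division of work between the two modules might be sketched as two small Python classes. This is an illustration under stated assumptions rather than the device itself: the class names `SelectionModule` and `CrawlingModule`, their methods, and the injected `fetch` and `clock` callables are hypothetical. The sketch assumes the polling sequence is cyclic (round-robin) and that a domain name with no recorded crawl time may be crawled immediately.

```python
import time
from collections import deque


class SelectionModule:
    """Hypothetical counterpart of selection module 310."""

    def __init__(self, domains):
        self._domains = deque(domains)   # pre-stored domain name queue

    def next_domain(self):
        # Assumption: the polling sequence is cyclic, so the selected
        # domain name moves to the back of the queue.
        domain = self._domains[0]
        self._domains.rotate(-1)
        return domain


class CrawlingModule:
    """Hypothetical counterpart of crawling module 320."""

    def __init__(self, url_queues, threshold, fetch, clock=time.monotonic):
        self._url_queues = url_queues    # domain name -> deque of URLs
        self._threshold = threshold      # preset interval threshold, seconds
        self._fetch = fetch              # callable that crawls one URL
        self._clock = clock
        self._last_crawled = {}          # domain name -> last crawl time

    def handle(self, domain):
        """Crawl one URL of `domain` if allowed; return it, else None."""
        now = self._clock()
        last = self._last_crawled.get(domain)
        if last is not None and now - last <= self._threshold:
            return None                  # caller selects the next domain
        queue = self._url_queues.get(domain)
        if not queue:
            return None                  # assumption: nothing left to crawl
        url = queue.popleft()
        self._fetch(url)
        self._last_crawled[domain] = now
        return url
```

A driver loop would call `next_domain()` and pass the result to `handle()`; a `None` return corresponds to moving on to the next domain name to be crawled.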
[0054] Optionally, the crawling module 320 is confi...