
Crawler realization method and system capable of breaking through IP limit

A crawler implementation method and system for breaking through IP restrictions, applied in the field of web crawlers. It addresses the high cost and poor economy of existing approaches and achieves low cost.

Status: Inactive
Publication Date: 2017-01-11
银恭敬

AI Technical Summary

Problems solved by technology

Existing ways of getting around per-IP access limits can solve the problem, but they are very expensive and uneconomical.



Examples


Embodiment Construction

[0019] As shown in Figure 1, the crawler implementation method for breaking through IP restrictions includes the following steps:

[0020] (1) The crawler scheduling server issues a crawl task, which includes the task ID, the URL of the HTTP request, all request parameters, and the maximum waiting time;

[0021] (2) After the client receives the crawl task, it immediately initiates the HTTP request to fetch the corresponding page;

[0022] (3) Once the page has been fetched, check whether the maximum waiting time has been exceeded; if it has not, go to step (4), otherwise return to step (1);

[0023] (4) Send the fetched data to the crawler scheduling server, tagged with the task ID; the fetched data is the string returned in the HTTP response.
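
Steps (2) to (4) on the client side can be sketched roughly as follows. This is an illustration only, not the patent's implementation: the names `CrawlTask`, `CrawlResult`, `run_task`, and `max_wait_seconds` are hypothetical, the HTTP call is passed in as a `fetch` function so the sketch stays independent of any particular HTTP library, and the waiting time is assumed to be measured from the moment the client starts the request.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class CrawlTask:
    """Fields listed in step (1); the attribute names are assumptions."""
    task_id: str
    url: str
    params: Dict[str, str] = field(default_factory=dict)
    max_wait_seconds: float = 10.0


@dataclass
class CrawlResult:
    task_id: str  # step (4): the result is tagged with the task ID
    body: str     # the string returned in the HTTP response


def run_task(
    task: CrawlTask,
    fetch: Callable[[str, Dict[str, str]], str],
    clock: Callable[[], float] = time.monotonic,
) -> Optional[CrawlResult]:
    """Run steps (2)-(4) for one task.

    Returns None when the maximum waiting time was exceeded, which
    tells the caller that the task has to be issued again (step 1).
    """
    started = clock()
    body = fetch(task.url, task.params)      # step (2): fetch the page
    elapsed = clock() - started
    if elapsed > task.max_wait_seconds:      # step (3): deadline check
        return None
    return CrawlResult(task_id=task.task_id, body=body)  # step (4)
```

In a real client, `fetch` would wrap whatever HTTP library the app already uses; injecting it also makes the deadline logic easy to exercise with a fake clock.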

[0024] The present invention sends page-crawling tasks to clients (for example, an app installed on a user's mobile phone), and breaks through the limit by the huge number...


Abstract

A crawler implementation method capable of breaking through IP limits comprises the following steps: 1) a crawler scheduling server issues a capture task, which comprises the task ID, the URL of an HTTP request, all request parameters, and the maximum waiting time; 2) after receiving the capture task, a client immediately initiates the HTTP request to capture the corresponding page; 3) after the page capture is finished, a check determines whether the maximum waiting time has been exceeded; if not, step 4) is carried out, otherwise step 1) is carried out; and 4) a sending module sends the captured data to the crawler scheduling server, labeled with the task ID, wherein the captured data is the character string returned in the HTTP response. The invention also provides a crawler implementation system capable of breaking through IP limits.
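
The scheduling-server side of this loop, in particular the return to step 1) after a timeout, might look like the sketch below. It is a simplified illustration under several assumptions that the source does not state: clients are modeled as plain callables that return the response string or `None` on timeout, tasks are handed out round-robin, and a hypothetical `max_attempts` cap stops a task from being reissued forever.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

# (task_id, url); parameters and the waiting time are omitted for brevity.
Task = Tuple[str, str]
Client = Callable[[Task], Optional[str]]


def schedule(
    tasks: List[Task],
    clients: List[Client],
    max_attempts: int = 3,
) -> Dict[str, str]:
    """Issue each task to a client and collect results keyed by task ID.

    A client returning None stands for "maximum waiting time exceeded";
    the task is then queued again, mirroring the return to step 1).
    """
    pending: Deque[Tuple[Task, int]] = deque((t, 0) for t in tasks)
    results: Dict[str, str] = {}
    turn = 0
    while pending:
        task, attempts = pending.popleft()
        client = clients[turn % len(clients)]  # round-robin over clients
        turn += 1
        body = client(task)
        if body is not None:
            results[task[0]] = body            # result labeled with task ID
        elif attempts + 1 < max_attempts:
            pending.append((task, attempts + 1))
    return results
```

Because a reissued task goes to the back of the queue, it will usually land on a different client than the one that timed out.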

Description

Technical field

[0001] The invention belongs to the technical field of web crawlers, and in particular relates to a crawler implementation method and system for breaking through IP restrictions.

Background

[0002] A web crawler (also known as a web spider or web robot, and in the FOAF community more often called a web chaser) is a program or script that automatically fetches information from the World Wide Web according to certain rules.

[0003] Capturing users' credit data from the Internet is an important input to credit rating. For example, transaction records captured from the Alipay website can indirectly reflect a user's financial strength. Capturing this information, however, runs into deliberately imposed technical obstacles.

[0004] Some websites impose IP restrictions to prevent crawlers from fetching information. For example, a single IP address may be allowed only 100 accesses per minute, so a crawler server can only initiate 100 netwo...
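
To make the 100-accesses-per-minute example concrete, the following sketch models a per-IP limit on the website side. The source does not say how such limits are implemented; the fixed one-minute window and the `PerIpLimiter` class are assumptions chosen for simplicity.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class PerIpLimiter:
    """Fixed-window counter: at most `limit_per_minute` requests per IP
    address in each one-minute window (an assumed limiting scheme)."""

    def __init__(self, limit_per_minute: int = 100) -> None:
        self.limit = limit_per_minute
        self.counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, ip: str, now_seconds: float) -> bool:
        window = int(now_seconds // 60)  # index of the current minute
        key = (ip, window)
        if self.counts[key] >= self.limit:
            return False                 # this IP has used up its quota
        self.counts[key] += 1
        return True
```

Under this model, 250 requests from one address within a single minute would see only 100 accepted, while the same 250 requests spread evenly over five addresses would all pass, which is why the per-address cap bounds the throughput of a single crawler server.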


Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/08, G06F17/30
CPC: H04L67/02, G06F16/951, H04L67/60
Inventors: 周灏, 董超
Owner: 银恭敬