Efficient crawler method based on IP

An IP-based efficient crawler technology that addresses the low IP utilization rate of conventional crawlers. It improves IP utilization and crawler efficiency and saves the time otherwise spent on frequent IP switching.

Inactive Publication Date: 2019-08-20
上海睿翎法律咨询服务有限公司
5 Cites, 7 Cited by
AI Technical Summary

Problems solved by technology

[0002] The conventional method in the prior art is to take an arbitrary IP to request the target website and to switch to another IP only once data can no longer be obtained because of frequent use, which leads to a low IP utilization rate.

Examples

Detailed Description of Embodiments

[0027] To describe the technical content of the present invention more clearly, specific embodiments are described below.

[0028] The IP-based efficient crawler method of the present invention comprises:

[0029] (1) Obtain a proxy IP, put it into the availability detection queue, send a request to the locally built server, and put high-quality proxy IPs into the common IP pool;

[0030] (1.1) Obtain a proxy IP and put it into the availability detection queue;

[0031] (1.2) Send a request to the locally built server and judge whether a response is obtained within 2 seconds. If so, the IP is a high-quality proxy: add it to the target-website quality detection queue, put it into the common IP pool, and continue to step (2). Otherwise, the IP is not a high-quality proxy and is put back into the availability detection queue;

[0032] (1.3) Judge whether the number of times the IP has been put into the availability detec...
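
Steps (1.1) and (1.2) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `check_once`, `http_probe`, and `PROBE_URL` are hypothetical, the probe URL is a placeholder for the locally built server, and the retry limit of step (1.3) is not modeled because the source text is truncated at that point. Only the 2-second threshold comes from the text.

```python
import http.client
import time
import urllib.request
from collections import deque
from typing import Callable, Deque, List, Tuple

TIMEOUT_SECONDS = 2.0  # the 2-second threshold stated in step (1.2)

# Hypothetical placeholder for the "server built locally"; not from the source.
PROBE_URL = "http://probe.example.invalid/ping"


def http_probe(proxy: str, timeout: float, url: str = PROBE_URL) -> bool:
    """Return True if `url` answers through `proxy` within `timeout` seconds."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy})
    )
    start = time.monotonic()
    try:
        with opener.open(url, timeout=timeout) as response:
            response.read()
    except (OSError, ValueError, http.client.HTTPException):
        return False
    # The socket timeout applies per operation, so also check total elapsed time.
    return time.monotonic() - start <= timeout


def check_once(
    detection_queue: Deque[str],
    probe: Callable[[str, float], bool],
    timeout: float = TIMEOUT_SECONDS,
) -> Tuple[List[str], Deque[str]]:
    """Run one pass over the availability detection queue.

    Proxies that pass the probe go to the common pool; the rest are
    collected for re-queuing. The cap on re-queue attempts (step 1.3)
    is left to the caller because the source does not state it here.
    """
    common_pool: List[str] = []
    requeued: Deque[str] = deque()
    while detection_queue:
        ip = detection_queue.popleft()
        if probe(ip, timeout):
            common_pool.append(ip)
        else:
            requeued.append(ip)
    return common_pool, requeued
```

Passing the probe in as a function keeps the queue logic testable without network access; in practice `http_probe` (or any equivalent check) would be supplied as the `probe` argument.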

Abstract

The invention relates to an IP-based efficient crawler method comprising the following steps: (1) obtaining a proxy IP, putting the IP into an availability detection queue, sending a request to a locally built server, and putting high-quality proxy IPs into a common IP pool; (2) designating, according to the actual collection task, the IP pool used by each specified website; and (3) sending requests to the server with the IPs in each proxy pool and deleting the invalid IPs. With this method, an IP pool is added for each specified website according to the websites being collected. Because different websites use different IP pools, IPs are utilized to the maximum extent and the time spent frequently switching IPs when data cannot be obtained is saved, which greatly improves crawler efficiency. Monitoring the IP pool used by each designated website addresses the problem well and improves both utilization and efficiency.
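
Steps (2) and (3) of the abstract might look like the sketch below. It is an assumption-laden illustration only: the class `SitePools`, its methods, and the two callback predicates are hypothetical names, and the source does not say how an IP is judged usable or invalid for a given website.

```python
from typing import Callable, Dict, Iterable, List, Set


class SitePools:
    """Keeps one IP pool per target website on top of a shared common pool."""

    def __init__(self, common_pool: Iterable[str]) -> None:
        self.common: Set[str] = set(common_pool)
        self.by_site: Dict[str, Set[str]] = {}

    def assign(self, site: str, works_for_site: Callable[[str, str], bool]) -> None:
        """Step (2): build the pool for `site` from the common pool."""
        self.by_site[site] = {
            ip for ip in self.common if works_for_site(ip, site)
        }

    def prune(self, site: str, is_alive: Callable[[str, str], bool]) -> List[str]:
        """Step (3): drop IPs that no longer work for `site`; return them."""
        pool = self.by_site.get(site, set())
        dead = sorted(ip for ip in pool if not is_alive(ip, site))
        pool.difference_update(dead)
        return dead
```

In this sketch an IP pruned from one website's pool stays in the common pool and in every other website's pool, which is one way to read the abstract's claim that per-website pools let each IP be used to the maximum extent.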

Description

Technical field

[0001] The present invention relates to the field of data collection, in particular to the field of IP usage, and specifically to an IP-based efficient crawler method.

Background

[0002] The conventional method in the prior art is to take an arbitrary IP to request the target website and to switch to another IP only once data can no longer be obtained because of frequent use, which leads to a low IP utilization rate.

Summary of the invention

[0003] The purpose of the present invention is to overcome the above shortcoming of the prior art and to provide an IP-based efficient crawler method with high efficiency, a high utilization rate, and simple operation.

[0004] To achieve the above purpose, the IP-based efficient crawler method of the present invention is as follows:

[0005] The main feature of this IP-based efficient crawler method is that it comprises:

[0006] (1) Obtain the proxy IP, put the IP into ...

Application Information

IPC(8): H04L29/12, H04L29/08, G06F16/951
CPC: H04L67/02, G06F16/951, H04L61/5007, H04L61/5061, H04L61/59
Inventor: 张臣
Owner: 上海睿翎法律咨询服务有限公司