Method and device for improving quality of crawler proxy and computer readable storage medium

A technology for improving crawler proxy quality, with an associated storage medium, applied in computing, multi-programming arrangements, program control design, etc. It addresses the inability to identify the quality and availability of proxies, the waste of network and server resources, and the failure of crawler requests, with the effects of effective proxy management, reduced resource occupation, and successful requests.

Active Publication Date: 2019-08-20
重庆八戒传媒有限公司

AI Technical Summary

Problems solved by technology

A web crawler is a web browsing robot. Each time a crawler program requests a new proxy from the proxy pool, it receives the assigned proxy and uses it directly to request network resources from the target server. Because the quality and availability of the proxy cannot be identified, a large number of crawler requests fail, and the crawler keeps retrying frequently after each failure, which causes most of the waste of network and server resources.



Examples


Detailed Description of Embodiments

[0040] Embodiments of the technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings. The following examples are only used to illustrate the technical solutions of the present invention more clearly, and therefore are only examples, rather than limiting the protection scope of the present invention.

[0041] It should be noted that, unless otherwise specified, the technical terms or scientific terms used in this application shall have the usual meanings understood by those skilled in the art to which the present invention belongs.

[0042] As shown in Figure 1, the present invention proposes a method for improving crawler proxy quality, comprising the following steps:

[0043] Set different priorities for several proxy pools;

[0044] Invoke a proxy pool based on the proxy pool priority and the pool's utilization rate.
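
The two steps above can be illustrated with a minimal Python sketch. The source does not state how utilization is computed or when a switch between pools happens, so the following are assumptions made only for illustration: utilization is the fraction of a pool's proxies currently in use, a lower `priority` number means a higher priority, and a pool is skipped once its utilization reaches a hypothetical `max_utilization` threshold. The names `ProxyPool` and `select_pool` are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProxyPool:
    """A graded proxy pool (hypothetical structure, not from the patent text)."""
    name: str
    priority: int                      # assumption: lower number = higher priority
    proxies: List[str] = field(default_factory=list)
    in_use: int = 0                    # proxies currently handed out to crawlers

    @property
    def utilization(self) -> float:
        # Assumption: utilization = proxies in use / proxies in the pool.
        # An empty pool is treated as fully utilized so it is never selected.
        if not self.proxies:
            return 1.0
        return self.in_use / len(self.proxies)


def select_pool(pools: List[ProxyPool],
                max_utilization: float = 0.8) -> Optional[ProxyPool]:
    """Return the highest-priority pool whose utilization is below the threshold.

    Returns None when every pool is at or above the threshold.
    """
    for pool in sorted(pools, key=lambda p: p.priority):
        if pool.utilization < max_utilization:
            return pool
    return None


if __name__ == "__main__":
    pools = [
        ProxyPool("premium", priority=0, proxies=["p1", "p2"], in_use=2),
        ProxyPool("standard", priority=1, proxies=["s1", "s2", "s3"], in_use=1),
    ]
    chosen = select_pool(pools)
    print(chosen.name if chosen else "no pool available")
```

In this sketch the premium pool is saturated, so the call falls through to the next priority level. Whether the actual method switches on a fixed threshold or on some other rule is not recoverable from the text shown here.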

[0045] As shown in Figure 2, the present invention proposes a m...


Abstract

The invention discloses a method and device for improving the quality of a crawler proxy, and a computer-readable storage medium. The method comprises: setting different priorities for a plurality of proxy pools; and invoking a proxy pool based on the proxy pool priority and the utilization rate of the proxy pool. The proxy pools are graded, proxies are classified and placed into the pools according to their quality, and switching between proxy pools is carried out according to each pool's proxy utilization rate. This achieves effective management of network proxies, maximizes the utilization of high-quality proxies, and improves the efficiency with which the crawler program obtains network resources. The availability of each proxy is also detected so that requests succeed and unavailable proxies are excluded, which reduces the occupation of network resources to a certain extent and reduces the harm that network requests cause to the target server.
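
The grading and availability steps in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how quality is measured or how availability is tested, so the sketch assumes a per-proxy success rate as the quality score, hypothetical cutoffs of 0.9 and 0.6 between grades, and a caller-supplied `probe` function that reports whether a proxy is currently usable. The function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def exclude_unavailable(proxies: Iterable[str],
                        probe: Callable[[str], bool]) -> List[str]:
    """Keep only the proxies that the caller-supplied probe reports as usable."""
    return [proxy for proxy in proxies if probe(proxy)]


def grade_proxies(success_rates: Dict[str, float],
                  cutoffs: Sequence[float] = (0.9, 0.6)) -> Dict[int, List[str]]:
    """Sort proxies into graded pools by success rate.

    Assumption: `cutoffs` is in descending order; grade 0 is the best pool,
    and a proxy below every cutoff lands in the last grade.
    """
    graded: Dict[int, List[str]] = {level: [] for level in range(len(cutoffs) + 1)}
    for proxy, rate in success_rates.items():
        # First cutoff the rate meets decides the grade; otherwise lowest grade.
        level = next((i for i, cutoff in enumerate(cutoffs) if rate >= cutoff),
                     len(cutoffs))
        graded[level].append(proxy)
    return graded


if __name__ == "__main__":
    # Stand-in probe for illustration; a real probe would issue a test request.
    alive = exclude_unavailable(["a", "b", "c"], probe=lambda p: p != "b")
    print(alive)
    print(grade_proxies({"a": 0.95, "c": 0.2}))
```

Passing the probe in as a parameter keeps the sketch independent of any particular HTTP client and makes the availability rule easy to swap.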

Description

Technical field

[0001] The invention relates to the field of computer software, in particular to a method, device and computer-readable storage medium for improving the quality of crawler proxies.

Background

[0002] In the rapidly developing Internet era, users efficiently collect public network data through web crawlers, but the continuous collection of network data by a large number of web crawlers consumes a lot of network resources and puts heavy pressure on normal website servers. Many websites have therefore adopted anti-crawling technology, which does not allow the same IP address to make high-frequency requests to the website and limits the access speed of crawlers. To get around anti-crawling technology, crawler programs began to use proxies to request the target server and download web pages normally.

[0003] Most of the existing technologies use a common proxy pool to serve crawler calls. A web crawler is a web browsing robot. Each time...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/48, G06F9/50, G06F16/951
CPC: G06F9/4881, G06F9/5038, G06F16/951
Inventor: 刘希龙
Owner: 重庆八戒传媒有限公司