Data crawling method and device, computer equipment and storage medium

A technology in the Internet field, relating to data crawling and data reception, that addresses the inability to obtain data in time and achieves the effect of reducing interception and carrying out data crawling operations.

Active Publication Date: 2019-03-01
ONE CONNECT SMART TECH CO LTD SHENZHEN
Cites 6 documents; cited by 7.

AI Technical Summary

Problems solved by technology

[0004] Based on this, and in view of the problem that a web crawler's IP address may be blocked so that data cannot be obtained in time, it is necessary to provide a data crawling method, device, computer equipment and storage medium that can supply valid IP addresses to web crawlers.

Embodiment Construction

[0066] In order to make the purpose, technical solution and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, and are not intended to limit the present application.

[0067] The data crawling method provided by this application can be applied in the application environment shown in Figure 1, in which the terminal 102 communicates with the server 104 over a network. The server 104 receives a data crawling request sent by the terminal 102, obtains, according to the data crawling request, the parameter value of the user agent used for normal access, sets the value of the web crawler's user agent to that parameter value to obtain an available web crawler, and uses the available web crawler to capture valid IP address...
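
As a rough illustration of the user-agent step, the following Python sketch copies the User-Agent value observed on a normal browser access into the crawler's own requests. The header dictionary, the browser string and the helper names (extract_user_agent, build_crawler_request) are hypothetical; the patent does not specify how the parameter value is collected or which HTTP library the crawler uses. Without such an override, Python's urllib identifies itself as "Python-urllib/x.y", exactly the kind of attribute a website's crawler detection can key on.

```python
import urllib.request

# Hypothetical value: a User-Agent string as sent by an ordinary desktop browser.
# In the described method, this value is taken from a normal (non-crawler) access.
NORMAL_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def extract_user_agent(normal_request_headers: dict) -> str:
    """Return the User-Agent parameter value observed on a normal access."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {k.lower(): v for k, v in normal_request_headers.items()}
    return lowered["user-agent"]


def build_crawler_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a crawler request whose User-Agent mimics the normal access."""
    # No network I/O happens here; the request is only constructed.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


ua = extract_user_agent({"Host": "example.com", "User-Agent": NORMAL_USER_AGENT})
req = build_crawler_request("https://example.com/list", ua)
# urllib stores header names capitalised, i.e. "User-agent".
print(req.get_header("User-agent"))
```

The crawler would then pass req to urllib.request.urlopen (optionally through a proxy handler) so that every request carries the borrowed user-agent value.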

Abstract

The invention relates to a data resource-based data crawling method and device, computer equipment and a storage medium. The method includes: receiving a data crawling request, and obtaining, according to the data crawling request, the parameter values of the user agent used for normal access; setting the values of the web crawler's user agent to these parameter values to obtain an available web crawler; using the available web crawler to capture valid IP addresses from a proxy website within a preset time; binding the multiple valid IP addresses to a proxy cache server and generating a proxy IP address table from them; and using the available web crawler to connect through the multiple valid IP addresses corresponding to the proxy cache server to perform data crawling. The method helps the web crawler pass the websites' attribute detection and reduces interception; it also replaces the IP address used by the web crawler in time, ensuring that the crawler always uses valid addresses, so that data crawling operations can be carried out.
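
The capture, table-generation and replacement steps can be sketched in Python as follows, under several assumptions: candidate proxies appear on the proxy website as plain "ip:port" text, liveness is checked by a caller-supplied function, the "preset time" is treated as a wall-clock budget, and the proxy cache server binding is stood in for by an in-memory table. All names (extract_candidates, harvest, ProxyTable) are hypothetical and not taken from the patent.

```python
import ipaddress
import re
import time
from typing import Callable, Iterable, List, Optional

# Assumption: proxy-list pages show candidates as "a.b.c.d:port" text.
CANDIDATE_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")


def extract_candidates(html: str) -> List[str]:
    """Return syntactically valid 'ip:port' strings found in a page."""
    found = []
    for ip, port in CANDIDATE_RE.findall(html):
        try:
            ipaddress.IPv4Address(ip)  # rejects octets > 255
        except ipaddress.AddressValueError:
            continue
        if 0 < int(port) <= 65535:
            found.append(f"{ip}:{port}")
    return found


def harvest(pages: Iterable[str], is_alive: Callable[[str], bool],
            budget_s: float) -> List[str]:
    """Collect live, de-duplicated proxies until the time budget runs out."""
    deadline = time.monotonic() + budget_s
    valid: List[str] = []
    for html in pages:
        for proxy in extract_candidates(html):
            if time.monotonic() >= deadline:
                return valid  # preset time exhausted: keep what we have
            if proxy not in valid and is_alive(proxy):
                valid.append(proxy)
    return valid


class ProxyTable:
    """Hypothetical proxy IP address table with in-time replacement."""

    def __init__(self, proxies: Iterable[str]):
        self._active = list(dict.fromkeys(proxies))  # keep order, drop dupes
        self.blocked = set()

    def current(self) -> Optional[str]:
        """Proxy the crawler should use now, or None if the table is empty."""
        return self._active[0] if self._active else None

    def mark_blocked(self, proxy: str) -> Optional[str]:
        """Drop a blocked proxy and return its replacement (if any)."""
        if proxy in self._active:
            self._active.remove(proxy)
            self.blocked.add(proxy)
        return self.current()


page = ("<tr><td>192.0.2.10:8080</td></tr><tr><td>999.1.1.1:80</td></tr>"
        "<tr><td>192.0.2.11:3128</td></tr>")
table = ProxyTable(harvest([page], is_alive=lambda p: True, budget_s=5.0))
print(table.current())                        # 192.0.2.10:8080
print(table.mark_blocked("192.0.2.10:8080"))  # 192.0.2.11:3128
```

Keeping the table ordered and dropping a proxy as soon as it is reported blocked is one simple way to achieve the "replace the IP address in time" effect; a real crawler would likely also re-verify proxies periodically and refill the table when it runs low.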

Description

Technical field

[0001] The present application relates to the technical field of the Internet, in particular to a data crawling method, device, computer equipment and storage medium.

Background

[0002] A web crawler is a tool used to automatically obtain data from websites. For a website, data acquisition by a web crawler consumes the same resources as a visit by a real user, and a web crawler that captures large amounts of data consumes far more resources than normal user access. Many websites are therefore designed with anti-crawler strategies, including limiting the access speed of visitors suspected of being web crawlers, verifying identities through verification codes, and even blocking access from certain IP addresses. These strategies pose problems for data crawling by web crawlers.

[0003] The traditional method of dealing with website anti-crawler strategies, when the crawler speed is limited for the website, d...
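
The speed-limiting and IP-blocking strategies mentioned in [0002] can be illustrated with a generic per-IP sliding-window limiter, written from the website's side. This is a hypothetical sketch of one common throttling technique, not the behaviour of any particular website and not part of the patent; the class name, the limits and the strike-based ban rule are all assumptions. It shows why a crawler that keeps using a single IP address is first slowed down and then cut off.

```python
from collections import defaultdict, deque


class PerIpLimiter:
    """Illustrative anti-crawler throttle: at most max_hits per window_s per IP.

    Assumption: an IP that is rejected ban_threshold times is blocked for good.
    """

    def __init__(self, max_hits: int, window_s: float, ban_threshold: int = 3):
        self.max_hits = max_hits
        self.window_s = window_s
        self.ban_threshold = ban_threshold
        self._hits = defaultdict(deque)   # ip -> timestamps of accepted hits
        self._strikes = defaultdict(int)  # ip -> number of rejected hits
        self.blocked = set()

    def allow(self, ip: str, now: float) -> bool:
        """Return True if a request from ip at time now is served."""
        if ip in self.blocked:
            return False
        hits = self._hits[ip]
        # Forget accepted hits that have slid out of the time window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) < self.max_hits:
            hits.append(now)
            return True
        # Over the limit: throttle, and escalate to a block after repeats.
        self._strikes[ip] += 1
        if self._strikes[ip] >= self.ban_threshold:
            self.blocked.add(ip)
        return False


limiter = PerIpLimiter(max_hits=2, window_s=1.0, ban_threshold=2)
results = [limiter.allow("203.0.113.7", t) for t in (0.0, 0.1, 0.2, 0.3, 5.0)]
print(results)                             # [True, True, False, False, False]
print(limiter.allow("198.51.100.9", 0.3))  # True: a different IP is unaffected
```

Because the limit is keyed on the client IP, spreading requests over several valid proxy addresses is what lets a crawler stay under such limits, which is the motivation for the proxy IP address table described in this application.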

Application Information

Patent Type & Authority: Applications (China)
IPC (8): H04L29/08, H04L29/12, G06F16/953
CPC: H04L67/02, H04L61/4594, H04L61/58, H04L67/56, H04L67/568
Inventor: 李晨光 (Li Chenguang)
Owner: ONE CONNECT SMART TECH CO LTD SHENZHEN