Network data acquisition method and device, computer equipment and storage medium

A network data acquisition method, applied in the field of web crawlers, that addresses problems such as wasted resources, high data crawling cost, and data crawling failure.

Pending Publication Date: 2021-03-23
广州市创乐信息技术有限公司

AI Technical Summary

Problems solved by technology

[0002] A web crawler is a program or script that crawls network information according to predetermined rules. However, with the continuous advancement of anti-crawler technology, data crawling failures often occur when using a predetermined crawler system for web data crawling.
[0003] At present, the main way of dealing with crawling failures is to analyze their causes and then apply various anti-crawler technologies for continuous or cyclic access, which wastes a great deal of resources and increases data capture costs.


Examples

Embodiment Construction

[0028] In order to make the purpose, technical solution and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, and are not intended to limit the present application.

[0029] The network data acquisition method provided by this application can be applied to a terminal or a server alone, or partially applied to a terminal or a server. Taking communication between a terminal and a server as an example, the application environment is shown in Figure 1. The terminal 102 communicates with the server 104 through a network. The user accesses the server 104 through the terminal 102 to acquire a web page, thereby acquiring web data. The terminal 102 can be, but is not limited to, various personal compute...

Abstract

The invention relates to a network data acquisition method and device, computer equipment, and a storage medium. The method comprises: monitoring the data capture state under the current download strategy in a crawling system, where the data capture state comprises page download failure, page parsing failure, and/or page parsing success; determining a failure index corresponding to the current download strategy according to the data capture state; and, if the failure index is greater than or equal to a first preset threshold, switching the current download strategy to a target download strategy, where the target download strategy is one of a plurality of download strategies preset in the crawling system. This method avoids invalid crawling and the resulting waste of data crawling resources.
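
The monitoring and switching steps in the abstract can be illustrated with a minimal Python sketch. The source does not say how the failure index is computed or how the target strategy is chosen, so several things here are assumptions made only for illustration: the failure index is taken to be the fraction of non-successful captures in a sliding window, the index is only evaluated once the window is full, the target strategy is simply the next one in the preset list, and the names `CaptureState` and `StrategySwitcher` and the strategy labels are hypothetical. "Page analysis" in the abstract is read as page parsing.

```python
from collections import deque
from enum import Enum


class CaptureState(Enum):
    """The three data capture states named in the abstract."""
    DOWNLOAD_FAILURE = "page download failure"
    PARSE_FAILURE = "page parsing failure"
    PARSE_SUCCESS = "page parsing success"


class StrategySwitcher:
    """Monitors capture states and switches the download strategy (illustrative only)."""

    def __init__(self, strategies, threshold=0.5, window=10):
        if not strategies:
            raise ValueError("at least one download strategy is required")
        self.strategies = list(strategies)   # download strategies preset in the crawler
        self.threshold = threshold           # the "first preset threshold"
        self.current = 0                     # index of the current download strategy
        self.states = deque(maxlen=window)   # recent states under the current strategy

    @property
    def current_strategy(self):
        return self.strategies[self.current]

    def failure_index(self):
        # Assumption: failure index = share of non-successful captures in the window.
        if not self.states:
            return 0.0
        failures = sum(1 for s in self.states if s is not CaptureState.PARSE_SUCCESS)
        return failures / len(self.states)

    def record(self, state):
        """Record one capture state; switch strategy if the index reaches the threshold."""
        self.states.append(state)
        window_full = len(self.states) == self.states.maxlen
        if window_full and self.failure_index() >= self.threshold:
            # Assumption: the target strategy is the next preset one, wrapping around.
            self.current = (self.current + 1) % len(self.strategies)
            self.states.clear()              # start monitoring the new strategy afresh
        return self.current_strategy


if __name__ == "__main__":
    switcher = StrategySwitcher(["direct", "proxy", "headless"], threshold=0.5, window=4)
    for state in (CaptureState.PARSE_SUCCESS, CaptureState.DOWNLOAD_FAILURE,
                  CaptureState.PARSE_SUCCESS, CaptureState.PARSE_FAILURE):
        print(state.value, "->", switcher.record(state))
```

Clearing the window after a switch keeps failures observed under the old strategy from counting against the new one; a real system might instead keep per-strategy statistics.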

Description

Technical field

[0001] The present application relates to the technical field of web crawlers, in particular to a network data acquisition method, device, computer equipment and storage medium.

Background technique

[0002] A web crawler is a program or script that crawls network information according to predetermined rules. However, with the continuous advancement of anti-crawler technology, data crawling failures often occur when using established crawler systems for web data crawling.

[0003] At present, to deal with crawling failures, we mainly analyze the reasons for crawling failures and use various anti-crawler technologies for continuous or cyclic access, resulting in a great waste of resources and an increase in data capture costs.

Contents of the invention

[0004] Based on this, it is necessary to provide a network data acquisition method, device, computer equipment and storage medium for the above technical problems.

[0005] A network data acquisition method...

Application Information

IPC(8): G06F16/951, G06F8/30, G06F8/41, G06F16/955
CPC: G06F8/37, G06F8/427, G06F16/951, G06F16/955
Inventor: 曾文清, 杨濠兴, 朱光岳, 廖梓鸿, 虞孝伟
Owner 广州市创乐信息技术有限公司