Data crawling method and device, storage medium and terminal

A data crawling technology, applied in the fields of network data retrieval, network data indexing, and other database retrieval, that addresses problems such as rising crawler maintenance costs, data omission, and compromised data integrity, with the effect of improving the crawler's automatic response capability, access success rate, and crawling efficiency.

Inactive Publication Date: 2020-09-15
JINGZAN ADVERTISING SHANGHAI CO LTD
5 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, with the development and popularization of anti-crawler technology, crawlers face several problems when crawling data: 1. Crawler maintenance costs increase; 2. R&D personnel must devise a different cracking technique for each website's anti-crawler technology; 3. Data is omitted, which affects data integrity.



Examples


Embodiment Construction

[0025] As mentioned in the background, with the development and popularization of anti-crawler technology, crawlers face several problems when crawling data: 1. Crawler maintenance costs increase; 2. R&D personnel must devise a different cracking technique for each website's anti-crawler technology; 3. Data is omitted, which affects data integrity.

[0026] In the technical solution of the present invention, the anti-crawler configuration set by the target website can be determined from the status code or page data that the website returns. The cracking operation for that configuration can therefore be chosen according to the status code or page data, and the access request is then updated and used to revisit the target website and obtain its content. This improves the crawler's automatic response capability and the efficiency with which the web crawler crawls data.
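
As a rough illustration of the determination step in [0026], the sketch below classifies a response by its status code or page data. It is not the patent's implementation: the category names, the status-code mapping, and the marker strings are all assumptions chosen for the example, and a real website would need its own rules.

```python
from enum import Enum


class AntiCrawler(Enum):
    """Hypothetical categories of anti-crawler configuration."""
    NONE = "none"
    RATE_LIMIT = "rate_limit"          # access-frequency restriction
    LOGIN_REQUIRED = "login_required"  # account login verification
    CAPTCHA = "captcha"                # verification-code input
    BLOCKED = "blocked"                # request refused outright


# Assumed page markers; these strings are illustrative only.
CAPTCHA_MARKERS = ("captcha", "verification code")
LOGIN_MARKERS = ("please log in", "sign in to continue")


def detect_anti_crawler(status_code: int, page_data: str) -> AntiCrawler:
    """Guess the anti-crawler configuration from the status code or page data."""
    text = page_data.lower()
    # Page data is checked first, because a challenge page can arrive
    # with any status code, including 200.
    if any(marker in text for marker in CAPTCHA_MARKERS):
        return AntiCrawler.CAPTCHA
    if any(marker in text for marker in LOGIN_MARKERS):
        return AntiCrawler.LOGIN_REQUIRED
    # Fall back to the standard meaning of common HTTP status codes.
    if status_code == 429:
        return AntiCrawler.RATE_LIMIT
    if status_code == 401:
        return AntiCrawler.LOGIN_REQUIRED
    if status_code == 403:
        return AntiCrawler.BLOCKED
    return AntiCrawler.NONE
```

The result of `detect_anti_crawler` could then be used to select how the access request is updated before the website is revisited.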

[0027] In order to make the above objects, features and advantages of the present invention more comprehensible, specific embodiments of...



Abstract

The invention discloses a data crawling method and device, a storage medium, and a terminal. The data crawling method comprises the steps of: simulating a browser to send an access request to a target website; receiving a response message from the target website for the access request, wherein the response message comprises a status code and page data; updating the access request according to the status code or the page data; and obtaining the content of the target website by using the updated access request. This technical solution can improve the data crawling efficiency of the web crawler.
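
The four steps in the abstract can be sketched as a small retry loop. This is a minimal sketch under stated assumptions, not the claimed method: `AccessRequest`, `update_request`, `crawl`, the header values, and the specific update rules (doubling a delay on 429, adding a `Referer` on 403) are hypothetical. The `fetch` callable is injected so that the loop runs without network access; a real `fetch` would send the HTTP request and would be responsible for honoring `delay_seconds`.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical description of one access request."""
    url: str
    headers: Dict[str, str] = field(default_factory=dict)
    delay_seconds: float = 0.0  # how long fetch should wait before sending


Response = Tuple[int, str]  # (status code, page data)
Fetch = Callable[[AccessRequest], Response]

# Assumed browser-like headers used to simulate a browser.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "text/html",
}


def update_request(request: AccessRequest, status_code: int,
                   page_data: str) -> Optional[AccessRequest]:
    """Return an updated request, or None if no update rule applies."""
    if "captcha" in page_data.lower():
        return None  # this sketch does not attempt verification codes
    if status_code == 429:
        # Frequency limit: slow down before revisiting.
        return replace(request, delay_seconds=max(1.0, request.delay_seconds * 2))
    if status_code == 403 and "Referer" not in request.headers:
        # Assumed rule: retry once with a Referer header added.
        return replace(request, headers={**request.headers, "Referer": request.url})
    return None


def crawl(url: str, fetch: Fetch, max_attempts: int = 3) -> Optional[str]:
    """Send, inspect the response, update the request, and revisit."""
    request = AccessRequest(url=url, headers=dict(BROWSER_HEADERS))
    for _ in range(max_attempts):
        status_code, page_data = fetch(request)
        if status_code == 200 and "captcha" not in page_data.lower():
            return page_data  # content obtained
        updated = update_request(request, status_code, page_data)
        if updated is None:
            return None
        request = updated
    return None
```

Keeping `update_request` a pure function of the request, status code, and page data makes each rule easy to test in isolation and easy to extend per website.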

Description

Technical field

[0001] The present invention relates to the technical field of data processing, in particular to a data crawling method and device, a storage medium, and a terminal.

Background

[0002] A web crawler is a program or script that automatically and efficiently fetches Internet information according to certain rules.

[0003] With the rapid development of big data, more and more enterprises and websites want to prevent their data from being collected in bulk and at high speed by web crawlers, and anti-crawler technology has emerged in response. Anti-crawler technologies vary; examples include restricting the access frequency of an Internet Protocol (IP) address, restricting the speed of browsing web pages, account login verification, and verification code input.

[0004] However, with the development and popularization of anti-crawler technology, there are problems with crawler crawling data: 1. The cost of crawler...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F16/951
Inventors: 汤奇峰, 陈泽顺
Owner: JINGZAN ADVERTISING SHANGHAI CO LTD