Data crawling method and device, computer equipment and storage medium

A data crawling technology, applied in the field of data processing, that solves the problem that pages cannot be drilled down automatically while the target data is obtained at the same time.

Pending Publication Date: 2021-10-19
PINGAN INT SMART CITY TECH CO LTD
Cites: 0; Cited by: 2

AI Technical Summary

Problems solved by technology

[0005] The application addresses the technical problem that, in prior-art data crawling, pages cannot be drilled down automatically while the target data of all pages to be crawled is obtained at the same time.


Examples


Embodiment Construction

[0030] In order to make the objects, technical solutions, and advantages of the present application clearer, the technical solutions are described below in connection with the embodiments of the present application. The described embodiments are some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of the present application. It should be understood that the specific embodiments described herein are intended to explain the present application and are not intended to limit it.

[0031] Figure 1 shows an application environment for the data crawling method in accordance with an embodiment of the present application. Referring to Figure 1, the data crawling method is applied to a data crawling system. The data crawling system includes terminal 110 and server 120. Terminal 110 and server 120 are...



Abstract

The invention provides a data crawling method and device, computer equipment, and a storage medium. The method comprises: reading and parsing a crawler configuration file to obtain the uniform resource locator (URL) of a starting target page, which serves as the current page; obtaining the target data of the page preceding the current page, then generating and sending, according to the URL of the current page, an access request that carries the target data of the preceding page; extracting the target data of the current page from the response data of the access request; determining whether the current page has a next page; if a next page exists, acquiring the URL of the next page, taking the next page as the current page, and repeating the above steps until the current page has no next page; and obtaining the final target data from the target data obtained on the last page. With this method and device, each target page can be drilled down automatically, the page data of each level can be extracted and parsed, and the data can be merged and integrated automatically for output.
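
The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the in-memory `FAKE_SITE`, the functions `send_request`, `extract_target_data`, and `crawl`, the `start_url` configuration key, the JSON configuration format, and the example.com URLs are all hypothetical stand-ins chosen so the example runs offline. A real crawler would send HTTP requests and parse real responses.

```python
import json

# Hypothetical in-memory "site": URL -> response data. A real crawler would
# issue HTTP requests; this stand-in keeps the sketch runnable offline.
FAKE_SITE = {
    "https://example.com/province": {
        "fields": {"province": "Guangdong"},
        "next": "https://example.com/city",
    },
    "https://example.com/city": {
        "fields": {"city": "Shenzhen"},
        "next": "https://example.com/district",
    },
    "https://example.com/district": {
        "fields": {"district": "Futian"},
        "next": None,
    },
}


def send_request(url, carried):
    """Simulate an access request that carries the previous page's target data."""
    response = dict(FAKE_SITE[url])
    response["carried"] = dict(carried)  # echoed back with the response
    return response


def extract_target_data(response):
    """Merge the carried data with the fields found on the current page."""
    merged = dict(response["carried"])
    merged.update(response["fields"])
    return merged


def crawl(config_text):
    """Drill down from the configured start page until no next page exists."""
    config = json.loads(config_text)   # read and parse the crawler configuration
    current_url = config["start_url"]  # the starting target page is the current page
    previous_data = {}                 # the starting page has no previous page
    while True:
        response = send_request(current_url, previous_data)
        current_data = extract_target_data(response)
        next_url = response.get("next")
        if next_url is None:           # no next page: this is the last page
            return current_data        # final data comes from the last page
        previous_data = current_data   # carry the merged data to the next level
        current_url = next_url


result = crawl(json.dumps({"start_url": "https://example.com/province"}))
print(result)
```

Because each request carries the data merged so far, the data extracted on the last page already contains the fields of every level above it, which is why the final target data can be taken from the last page alone.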

Description

Technical field

[0001] The present application relates to the field of data processing and, in particular, to a data crawling method, device, computer equipment, and storage medium.

Background technique

[0002] Currently, the crawling methods commonly built on the Scrapy framework fall into two categories.

[0003] The first is a crawler written by inheriting the Spider class. Its advantage is that data from different pages can be assembled. However, each time the crawler drills down one level, a function must be written by hand to perform the crawl. If a website has many levels, every level of page needs its own extraction function, so a large number of extraction functions must be written and spliced together, which clearly falls short of automated data extraction and splicing.

[0004] The second is a crawler written by inheriting the CrawlSpider class. Its advantage is that drilling down and extraction are performed automatically, and the links of each hierarch...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/951, G06F16/955
CPC: G06F16/951, G06F16/955
Inventor: 贾波涛
Owner: PINGAN INT SMART CITY TECH CO LTD