Data crawling method and device, electronic equipment and storage medium

A data crawling technology for electronic devices, applied in the Internet field, that addresses the problems of high crawling cost and low crawling efficiency.

Pending Publication Date: 2021-03-30
BEIJING GRIDSUM TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] Related data crawling technologies usually raise the crawling success rate by deploying more crawler clients and limiting the crawling frequency. This approach suffers from high crawling cost and low crawling efficiency.


Examples

Detailed Description of the Embodiments

[0048] To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in these embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments that persons of ordinary skill in the art obtain from these embodiments without creative effort fall within the protection scope of the present application.

[0049] A server implementing various embodiments of the present invention will now be described with reference to the accompanying drawings. In the following description, use of suffixes such as 'module', 'part' or 'unit' for denoting elements is only for facilitating description of the present invention and has no specific meaning by i...

Abstract

The invention relates to a data crawling method and device, electronic equipment, and a storage medium. The method comprises the following steps: obtaining information about a to-be-crawled webpage, where the information includes a uniform resource locator (URL); converting the URL into a target URL of a target webpage, where the to-be-crawled webpage and the target webpage correspond to the same target resource and the first crawling difficulty level, that of the target webpage, is lower than the second crawling difficulty level, that of the to-be-crawled webpage; and crawling data in the target webpage according to the target URL. By converting the URL of the to-be-crawled webpage into the target URL of a webpage with a lower crawling difficulty level and crawling the data there, the method reduces crawling difficulty, resource consumption, and cost, and improves data crawling efficiency.
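The three steps in the abstract (obtain the URL, convert it, crawl the target) can be illustrated with a minimal Python sketch. The source text available here does not say how the conversion is performed, so everything specific below is an assumption made for illustration: the `SiteVariant` table, the numeric difficulty scale, the `example.com` hosts, and the rule that the path and query identify the same resource on every variant.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class SiteVariant:
    """One published form of the same resource (hypothetical model)."""
    host: str         # placeholder host, not taken from the patent
    difficulty: int   # assumed scale: lower means easier to crawl


# Hypothetical table: variants listed together serve the same resources.
VARIANTS = [
    SiteVariant(host="www.example.com", difficulty=3),  # PC-side web site
    SiteVariant(host="m.example.com", difficulty=1),    # mobile-side web site
]


def convert_url(url: str, variants=VARIANTS) -> str:
    """Return the URL of the easiest known variant of the same resource.

    Assumes the path and query identify the resource on every variant.
    Returns the input unchanged if its host is unknown or already easiest.
    """
    parts = urlsplit(url)
    current = next((v for v in variants if v.host == parts.hostname), None)
    if current is None:
        return url
    target = min(variants, key=lambda v: v.difficulty)
    if target.difficulty >= current.difficulty:
        return url
    return urlunsplit(parts._replace(netloc=target.host))


def crawl(url: str, fetch: Callable[[str], str]) -> str:
    """Convert the URL first, then fetch the target page with `fetch`."""
    return fetch(convert_url(url))
```

Passing the fetcher in as a parameter keeps the sketch free of network access; a real crawler would supply its own HTTP client there, and a real conversion rule may need to rewrite paths or identifiers rather than only the host.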

Description

Technical field

[0001] The present application relates to the technical field of the Internet, and in particular to a data crawling method and device, electronic equipment, and a storage medium.

Background

[0002] With the development of the Internet, a network resource is usually published on multiple platforms, such as a PC-side web site, a mobile-side web site, and an app. Different data sources differ in crawling difficulty.

[0003] Related data crawling technologies usually raise the crawling success rate by deploying more crawler clients and limiting the crawling frequency. This approach suffers from high crawling cost and low crawling efficiency.

Summary of the invention

[0004] To solve the above technical problems, or at least partly solve them, embodiments of the present application provide a data crawling method and device, electronic equipment, and a storage medium.

[0005] In view of this, in th...

Application Information

IPC(8): G06F16/951, G06F16/9535, G06F16/955
CPC: G06F16/955, G06F16/9535, G06F16/951
Inventor: 武玉博
Owner: BEIJING GRIDSUM TECH CO LTD