Page crawling method, device, storage medium and processor

A page crawling technology in the computer field, covering a method, device, storage medium and processor. It addresses the problem of low crawling efficiency and improves both crawling efficiency and crawling speed.

Active Publication Date: 2021-07-30
BEIJING GRIDSUM TECH CO LTD
Cites 6; cited by 0

AI Technical Summary

Problems solved by technology

[0006] Embodiments of the present invention provide a page crawling method, device, storage medium and processor, so as to at least solve the problem in the related art of low efficiency when pages are crawled using the same IP address.



Examples


Embodiment 1

[0030] This embodiment provides a page crawling method. Figure 1 is a flow chart of the page crawling method according to an embodiment of the present invention. As shown in Figure 1, the process includes the following steps:

[0031] Step S102: obtain a page crawling task, where the page crawling task includes a task of crawling multiple web pages, and the task of crawling the multiple web pages must be performed using the same IP address;

[0032] Step S104: acquire a target proxy IP address from a preset proxy IP pool, where the proxy IP pool stores one or more proxy IP addresses;

[0033] Step S106: generate a target task carrying the target proxy IP address, and execute the target task using the target proxy IP address.
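Steps S102 to S106 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `PageCrawlTask`, `ProxyPool`, `TargetTask`, `generate_target_task`, `execute_target_task` and `urllib_fetch` are hypothetical, the pool's selection strategy (uniform random choice) is assumed because the source does not specify one, and page fetching is passed in as a callable so the sketch runs without network access.

```python
import random
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PageCrawlTask:
    """S102: pages that must all be crawled from the same IP address."""
    urls: List[str]


@dataclass
class TargetTask:
    """S106: the crawl task with the chosen proxy IP address attached."""
    urls: List[str]
    proxy_ip: str


class ProxyPool:
    """Hypothetical preset proxy IP pool storing one or more "host:port" entries."""

    def __init__(self, proxies: List[str], seed: Optional[int] = None) -> None:
        if not proxies:
            raise ValueError("the proxy pool must hold at least one address")
        self._proxies = list(proxies)
        self._rng = random.Random(seed)

    def acquire(self) -> str:
        # S104: the selection strategy is an assumption (uniform random choice).
        return self._rng.choice(self._proxies)


def generate_target_task(task: PageCrawlTask, pool: ProxyPool) -> TargetTask:
    # S106: pick one proxy for the whole task so every page shares it.
    return TargetTask(urls=list(task.urls), proxy_ip=pool.acquire())


def execute_target_task(
    target: TargetTask, fetch: Callable[[str, str], str]
) -> Dict[str, str]:
    # Each page is fetched through the proxy carried by the target task.
    return {url: fetch(url, target.proxy_ip) for url in target.urls}


def urllib_fetch(url: str, proxy_ip: str, timeout: float = 10.0) -> str:
    """Hypothetical real fetcher: routes HTTP and HTTPS through one proxy."""
    handler = urllib.request.ProxyHandler(
        {"http": f"http://{proxy_ip}", "https": f"http://{proxy_ip}"}
    )
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

To crawl real pages, `urllib_fetch` can be passed as the `fetch` argument; it uses the standard-library `urllib.request.ProxyHandler` so that every request in the target task leaves through the same proxy address.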

[0034] Optionally, the above page crawling method may be applied to, but is not limited to, scenarios in which page content is crawled, for example crawling the content of web pages on a website.

[0035...

Embodiment 2

[0049] This embodiment also provides a page crawling device, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function. Although the device described below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

[0050] Figure 3 is a structural block diagram of a page crawling device according to an embodiment of the present invention. As shown in Figure 3, the device includes:

[0051] A first acquisition module 32, configured to obtain a page crawling task, where the page crawling task includes a task of crawling multiple web pages, and the task of crawling the multiple web pages must be performed using the same IP address;

[0052] A second acquisition module 34, coupled to the first acquisition module 32 and configured to acquire a t...
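The module structure of this embodiment can be pictured as plain objects composed into a device. This is a hypothetical sketch: only modules 32 and 34 appear in the excerpt, and because the description of module 34 is cut off, it is assumed here to mirror step S104 (acquiring the target proxy IP address). Round-robin selection from the pool is likewise an assumption, and all class and method names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple


class FirstAcquisitionModule:
    """Module 32: obtains the page crawling task (URLs that must share one IP)."""

    def __init__(self, source: Callable[[], List[str]]) -> None:
        self._source = source  # hypothetical task source, e.g. a scheduler queue

    def acquire_task(self) -> List[str]:
        return self._source()


class SecondAcquisitionModule:
    """Module 34 (assumed role): acquires a proxy IP address from the pool."""

    def __init__(self, pool: Sequence[str]) -> None:
        if not pool:
            raise ValueError("empty proxy pool")
        self._pool = list(pool)
        self._next = 0

    def acquire_proxy(self) -> str:
        # Round-robin over the pool; the real selection rule is not given.
        proxy = self._pool[self._next % len(self._pool)]
        self._next += 1
        return proxy


class PageCrawlingDevice:
    """Composes the modules; module 34 is used after module 32 yields a task."""

    def __init__(
        self, first: FirstAcquisitionModule, second: SecondAcquisitionModule
    ) -> None:
        self.first = first
        self.second = second

    def prepare(self) -> Tuple[List[str], str]:
        urls = self.first.acquire_task()
        return urls, self.second.acquire_proxy()
```

Keeping each module behind its own small interface matches the patent's note that modules may be realized in software, hardware, or both: any object with the same methods could be swapped in.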

Embodiment 3

[0092] Embodiments of the present invention also provide a storage medium including a stored program, where the method of any one of the above embodiments is performed when the program runs.

[0093] Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:

[0094] S1: obtain a page crawling task, where the page crawling task includes a task of crawling multiple web pages, and the task of crawling the multiple web pages must be performed using the same IP address;

[0095] S2: obtain a target proxy IP address from a preset proxy IP pool, where the proxy IP pool stores one or more proxy IP addresses;

[0096] S3: divide the task of crawling the multiple web pages into multiple task packages, where each task package carries the task of crawling one of the multiple web pages together with the target proxy IP address;

[0097] S4: execute the page crawling task according to the multiple task packages.
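Steps S3 and S4 split the same-IP task into per-page task packages that each carry the target proxy. A minimal sketch, assuming a thread pool stands in for the distributed crawler nodes and that fetching is injected as a callable; `TaskPackage`, `split_into_packages` and `execute_packages` are illustrative names, not taken from the source.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TaskPackage:
    """S3: one web page plus the shared target proxy IP address."""
    url: str
    proxy_ip: str


def split_into_packages(urls: List[str], proxy_ip: str) -> List[TaskPackage]:
    # S3: one package per page, every package carrying the same proxy.
    return [TaskPackage(url=u, proxy_ip=proxy_ip) for u in urls]


def execute_packages(
    packages: List[TaskPackage],
    fetch: Callable[[str, str], str],
    workers: int = 4,
) -> Dict[str, str]:
    # S4: packages are independent, so any worker (standing in for any
    # crawler node) can run any package; the proxy travels with the package.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: (p.url, fetch(p.url, p.proxy_ip)), packages)
        return dict(results)
```

Because each package carries the target proxy IP address itself, no package is bound to a particular worker. This is one plausible reading of how the scheme avoids the fixed-crawler-node dependency described in the background section, while still guaranteeing that all pages of the task are requested from the same IP address.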

[0098] Optionally, in the present...


Abstract

The present invention provides a page crawling method, device, storage medium and processor. The method includes: acquiring a page crawling task, where the page crawling task includes a task of crawling multiple web pages, and the task of crawling the multiple web pages must be performed using the same IP address; obtaining a target proxy IP address from a preset proxy IP pool; and generating a target task carrying the target proxy IP address and executing the target task using the target proxy IP address. This technical solution solves the problem of low efficiency when pages must be crawled using the same IP address, and improves the efficiency of such crawling.

Description

Technical field

[0001] The present invention relates to the field of computers, and in particular to a page crawling method, device, storage medium and processor.

Background

[0002] When a distributed crawler crawls web pages, some pages sometimes have to be crawled using the same IP address; which pages these are can be determined from certain page rules. One traditional approach locks the crawling of the entire current site to a fixed crawler node, so that every page is crawled from the same IP address. Another packages the pages that require the same IP address and sends them to a fixed crawler node, which likewise guarantees that the same IP address is used during crawling.

[0003] Of these traditional methods, the first (crawling the entire site with a fixed crawler node) keeps the system logic relatively simple, but the crawling of some pages depends on the fixed crawler node: if, during crawling, the crawler no...


Application Information

Patent type and authority: Patents (China)
IPC (8): G06F16/951, G06F16/953, G06F16/9535
CPC: G06F16/951
Inventor: 崔志伸
Owner: BEIJING GRIDSUM TECH CO LTD