Page crawling method, device, storage medium and processor
A page crawling method, device, storage medium and processor, applied in the computer field, that address the problem of low crawling efficiency and thereby improve crawling efficiency and speed.
Examples
Embodiment 1
[0030] In this embodiment, a page crawling method is provided. FIG. 1 is a flow chart of the page crawling method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
[0031] Step S102: obtain a page crawling task, where the page crawling task includes a task of crawling multiple web pages, and the task of crawling multiple web pages must be performed using the same IP address;
[0032] Step S104: acquire a target proxy IP address from a preset proxy IP pool, where the proxy IP pool stores one or more proxy IP addresses;
[0033] Step S106: generate a target task carrying the target proxy IP address, and perform the target task based on the target proxy IP address.
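Steps S102 to S106 can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `CrawlTask`, `TargetTask`, `ProxyPool` and `run_crawl` are hypothetical, the random selection policy is an assumption (the text only says an address is acquired from the pool), and the `fetch` callable stands in for whatever HTTP client actually routes a request through the proxy.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class CrawlTask:
    """A page crawling task whose pages must all use the same IP (S102)."""
    urls: List[str]


@dataclass(frozen=True)
class TargetTask:
    """The crawling task bound to the proxy IP chosen for it (S106)."""
    urls: List[str]
    proxy_ip: str


class ProxyPool:
    """Preset pool storing one or more proxy IP addresses (S104)."""

    def __init__(self, addresses: List[str]) -> None:
        if not addresses:
            raise ValueError("the proxy IP pool must hold at least one address")
        self._addresses = list(addresses)

    def acquire(self) -> str:
        # Assumption: pick at random; the source does not fix a policy.
        return random.choice(self._addresses)


def run_crawl(
    task: CrawlTask,
    pool: ProxyPool,
    fetch: Callable[[str, str], str],
) -> Dict[str, str]:
    """Acquire one proxy, build the target task, and fetch every page with it."""
    proxy_ip = pool.acquire()                                     # S104
    target = TargetTask(urls=list(task.urls), proxy_ip=proxy_ip)  # S106
    # Every page in the task goes through the same target proxy IP.
    return {url: fetch(url, target.proxy_ip) for url in target.urls}
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library; a real crawler would supply a function that sends the request through the given proxy.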
[0034] Optionally, the above page crawling method can be applied to, but is not limited to, scenarios in which page content is crawled, for example crawling the content of the web pages of a website.
[0035...
Embodiment 2
[0049] A page crawling device is also provided in this embodiment. The device is used to implement the above embodiments and preferred implementations, and what has already been described is not repeated. As used below, the term "module" refers to a combination of software and/or hardware that achieves a predetermined function. Although the device described below is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also conceivable.
[0050] FIG. 3 is a first structural block diagram of a page crawling device according to an embodiment of the present invention. As shown in FIG. 3, the device includes:
[0051] a first acquisition module 32, configured to obtain a page crawling task, where the page crawling task includes a task of crawling multiple web pages, and the task of crawling multiple web pages must be performed using the same IP address;
[0052] a second acquisition module 34, coupled to the first acquisition module 32 and configured to acquire a t...
Embodiment 3
[0092] Embodiments of the present invention also provide a storage medium including a stored program, wherein the program, when run, performs any of the methods described above.
[0093] Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
[0094] S1: obtain a page crawling task, where the page crawling task includes a task of crawling multiple web pages, and the task of crawling multiple web pages must be performed using the same IP address;
[0095] S2: acquire a target proxy IP address from a preset proxy IP pool, where the proxy IP pool stores one or more proxy IP addresses;
[0096] S3: split the task of crawling multiple web pages into multiple task packages, where each task package carries the task of crawling one of the multiple web pages together with the target proxy IP address;
[0097] S4: perform the page crawling task according to the multiple task packages.
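Steps S3 and S4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: `TaskPackage`, `split_into_packages` and `run_packages` are hypothetical names, the `fetch` callable is a placeholder for a proxy-aware HTTP client, and running the packages concurrently on a thread pool is an assumption about how the split might yield the claimed speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TaskPackage:
    """One task package: a single web page plus the target proxy IP (S3)."""
    url: str
    proxy_ip: str


def split_into_packages(urls: List[str], proxy_ip: str) -> List[TaskPackage]:
    """Split a multi-page crawling task into one package per page (S3)."""
    return [TaskPackage(url=url, proxy_ip=proxy_ip) for url in urls]


def run_packages(
    packages: List[TaskPackage],
    fetch: Callable[[str, str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Perform the crawling task package by package (S4).

    Assumption: packages are independent, so they run concurrently.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as its input.
        bodies = list(executor.map(lambda p: fetch(p.url, p.proxy_ip), packages))
    return {package.url: body for package, body in zip(packages, bodies)}
```

Because each package carries the target proxy IP itself, any worker can process any package and all pages are still requested from the same address.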
[0098] Alternatively, in the present...