Distributed webpage downloading method and system

A distributed webpage downloading technology, applied in the field of computer networks, that addresses the inability of existing systems to download webpages effectively. It achieves high real-time response performance, breaks through limits on concurrent requests, and scales well.

Active Publication Date: 2014-06-18
XIAMEN MEIYA PICO INFORMATION

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a distributed web page download method that solves the problem that existing web page download systems cannot download web pages effectively because of limited IP address resources.

Embodiment Construction

[0024] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0025] Method example:

[0026] Referring to Figure 1, which shows the flow of an embodiment of the distributed web page downloading method of the present invention: the system that executes the method embodiment includes a web crawler, a task scheduling service unit, and Internet access clients (including personal computers, network servers, etc.) connected to the Internet. This preferred method embodiment includes the following steps:

[0027] Step S101: the web crawler sends a webpage download request to the task scheduling service unit;

[0028] In this preferred embodiment, to improve web page download capacity, more than two web crawlers are set up, each requesting to grab and save web page data from the Internet. ...

Abstract

The invention provides a distributed webpage downloading method and system. The method includes the following steps: a web crawler sends a webpage download request to a task scheduling service unit; the task scheduling service unit receives the webpage download request and stores it in a first information queue; an Internet client obtains the webpage download request from the first information queue of the task scheduling service unit, downloads the corresponding webpage data, and stores the webpage data in a second information queue of the task scheduling service unit; the task scheduling service unit feeds the webpage data in the second information queue back to the web crawler that requested the download. Based on this point-to-point distributed download mode built on two information queues, webpages can be captured by Internet machines distributed in different places, webpage download requests can be answered accurately in real time, and websites' limits on concurrent download requests can be effectively broken through.
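The two-queue flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the class and function names (`TaskScheduler`, `DownloadRequest`, `DownloadResult`, `run_client`), the `crawler_id` field used to route results back, and the injected `fetch` callable are all assumptions introduced here. Everything runs in one process, whereas the patent describes clients distributed across the Internet.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DownloadRequest:
    crawler_id: str  # hypothetical field: identifies the requesting crawler
    url: str


@dataclass
class DownloadResult:
    crawler_id: str
    url: str
    body: str


class TaskScheduler:
    """Holds the two queues: pending requests and downloaded page data."""

    def __init__(self) -> None:
        # First information queue: download requests from crawlers.
        self.request_queue: "queue.Queue[DownloadRequest]" = queue.Queue()
        # Second information queue: page data stored by Internet clients.
        self.result_queue: "queue.Queue[DownloadResult]" = queue.Queue()

    def submit(self, request: DownloadRequest) -> None:
        """Called by a crawler to enqueue a download request."""
        self.request_queue.put(request)

    def dispatch(self, inboxes: Dict[str, List[DownloadResult]]) -> None:
        """Drain the second queue, routing each result to its crawler's inbox."""
        while True:
            try:
                result = self.result_queue.get_nowait()
            except queue.Empty:
                return
            inboxes.setdefault(result.crawler_id, []).append(result)


def run_client(scheduler: TaskScheduler, fetch: Callable[[str], str]) -> None:
    """One Internet client: take requests, download each page, store the data."""
    while True:
        try:
            request = scheduler.request_queue.get_nowait()
        except queue.Empty:
            return
        body = fetch(request.url)  # fetch is injected; no real network call here
        scheduler.result_queue.put(
            DownloadResult(request.crawler_id, request.url, body)
        )
```

To exercise the sketch, a crawler would call `scheduler.submit(DownloadRequest("crawler-a", url))`, a client would call `run_client(scheduler, fetch)` with any function that maps a URL to page content, and `scheduler.dispatch(inboxes)` would then group the results by crawler. Passing `fetch` in as a parameter keeps the example independent of any particular HTTP library.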

Description

Technical Field

[0001] The present invention relates to the technical field of computer networks, in particular to a distributed web page downloading method and system.

Background Technique

[0002] A distributed web page download system includes multiple web crawlers, each of which needs to grab web page data from the Internet and save it. These web crawlers may be distributed in different geographical locations. According to their degree of dispersion, web crawler systems can be divided into two categories: distributed web crawler systems based on a local area network (LAN), and distributed web crawler systems based on a wide area network (WAN).

[0003] In web crawling projects, IP address resources are usually the scarcest. Most websites, especially those in specific fields such as Weibo, impose restrictions on the crawling end, such as a limit on concurrent requests from the same IP address and, over a period of time, the number of...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/08, G06F9/46, G06F17/30
Inventor: 何培林, 汤伟宾, 陈晨, 章正道, 林胜通
Owner: XIAMEN MEIYA PICO INFORMATION