A Distributed Download System for Internet Crawlers

A distributed download system, applied in the Internet field, that addresses problems such as high thread overhead and globally blocking domain name resolution, and that improves download speed, balances download services, and reduces network transmission.

Inactive Publication Date: 2019-07-19
北京云悦共创网络技术有限公司

AI Technical Summary

Problems solved by technology

[0005] 2. Traditionally, domain names are resolved through gethostbyname, the underlying C function provided by the operating system. Because this function is synchronous, while one thread is waiting for a resolution, other threads that call the function are blocked as well. So even when multiple threads are used for downloading, domain name resolution still blocks globally, and under a large number of resolution requests it becomes the bottleneck of the entire download system.
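As an illustration of the alternative to a blocking resolver call, here is a minimal Python sketch that resolves many host names concurrently from a single event loop. It is not the patent's DNS server cluster: the function names `resolve` and `resolve_all` are hypothetical, and asyncio's default `loop.getaddrinfo` hands the lookup to a worker thread pool rather than speaking the DNS protocol itself. The point is only that the caller never blocks while a lookup is pending.

```python
import asyncio
import socket


async def resolve(host):
    """Resolve one host name without blocking the event loop."""
    loop = asyncio.get_running_loop()
    try:
        # loop.getaddrinfo is the awaitable counterpart of socket.getaddrinfo.
        infos = await loop.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # A failed lookup yields an empty list instead of stopping the batch.
        return host, []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return host, sorted({info[4][0] for info in infos})


async def resolve_all(hosts):
    """Resolve every host concurrently and return {host: [ip, ...]}."""
    pairs = await asyncio.gather(*(resolve(h) for h in hosts))
    return dict(pairs)


if __name__ == "__main__":
    print(asyncio.run(resolve_all(["127.0.0.1", "localhost"])))
```

A slow or failing lookup for one host does not delay the others, which is the property the synchronous gethostbyname path lacks.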
[0007] 3. Servers on the Internet differ in location and bandwidth, so the I/O wait when downloading web pages varies from URL to URL, and this variation in latency affects the download speed and capacity of the whole system. The traditional answer to I/O waiting is the multi-threaded model, in which each thread completes a different download task without affecting the others. Its drawback is that raising the download capacity of a single machine requires opening more threads, and threads are expensive: both their memory and their CPU scheduling cost put heavy pressure on the download machine.
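The contrast with one thread per download can be sketched with an event-driven loop, in which a single thread keeps many downloads in flight and switches between them whenever one is waiting on I/O. The sketch below is an assumption-laden illustration, not the patent's implementation: `download_all`, `fake_fetch`, and the `limit` parameter are hypothetical names, and `fake_fetch` simulates network latency with a sleep so that the example runs without network access.

```python
import asyncio


async def download_all(urls, fetch, limit=100):
    """Run fetch(url) for every URL in one thread, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def one(url):
        async with sem:
            try:
                return url, await fetch(url)
            except Exception as exc:
                # Keep the error as the result so one bad URL does not stop the batch.
                return url, exc

    return dict(await asyncio.gather(*(one(u) for u in urls)))


async def fake_fetch(url):
    """Stand-in for a real HTTP request: waits, then returns a dummy page."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


if __name__ == "__main__":
    urls = [f"http://example.com/page/{i}" for i in range(500)]
    pages = asyncio.run(download_all(urls, fake_fetch, limit=200))
    print(len(pages), "pages downloaded")
```

Raising `limit` increases concurrency without creating additional threads, so the per-download cost is a small coroutine object rather than a thread stack and scheduler entry.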



Embodiment Construction

[0037] The specific embodiments of the present invention will be further described in detail below in conjunction with the accompanying drawings.

[0038] The present invention provides a distributed download system for Internet crawlers, comprising a central server (Master), several download servers (Slaves), clients (Clients) that need to download resources, and a DNS server cluster that uses an event trigger model.

[0039] The central server (Master) schedules downloads across the download servers and does not perform download tasks itself.

[0040] The download servers (Slaves) carry out the actual download tasks. Each download server periodically sends a heartbeat (a signal that identifies a node in the network and confirms that it is operating normally) and its download status to the central server. This lets the central server confirm that each download server is running normally, which prevents a client from sending a URL to be downloaded to a slave that is down.

[0041] The above download status includes the number of successful downloads...
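The heartbeat mechanism described in [0040] can be sketched as a small registry on the Master. This is a hedged illustration only: the class and method names, the timeout value, and the `pending` load figure are all assumptions introduced for the example, and the actual contents of the download status are truncated in the source. Timestamps are passed in explicitly so that the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class SlaveRecord:
    last_seen: float  # time of the most recent heartbeat, in seconds
    pending: int = 0  # hypothetical load figure reported with the heartbeat


class Master:
    """Tracks slave heartbeats and hands out only slaves that look alive."""

    def __init__(self, timeout=15.0):
        self.timeout = timeout  # assumed grace period before a slave counts as down
        self.slaves = {}

    def heartbeat(self, slave_id, now, pending=0):
        """Record a heartbeat and the status that came with it."""
        self.slaves[slave_id] = SlaveRecord(last_seen=now, pending=pending)

    def live_slaves(self, now):
        """Return the ids of slaves whose last heartbeat is recent enough."""
        return sorted(
            sid for sid, rec in self.slaves.items()
            if now - rec.last_seen <= self.timeout
        )

    def pick_slave(self, now):
        """Choose the least-loaded live slave, or None if none is alive."""
        live = self.live_slaves(now)
        if not live:
            return None
        return min(live, key=lambda sid: self.slaves[sid].pending)


if __name__ == "__main__":
    master = Master(timeout=10.0)
    master.heartbeat("slave-a", now=0.0, pending=5)
    master.heartbeat("slave-b", now=8.0, pending=1)
    print(master.pick_slave(now=9.0))
    print(master.live_slaves(now=15.0))
```

A client that asks the Master for a target before sending a URL never receives a slave whose heartbeat has expired. Picking the least-loaded slave is one possible balancing rule, chosen here for brevity.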


Abstract

The invention provides a distributed downloading system of Internet crawlers. The system comprises a central server, a client, downloading servers and a DNS server cluster utilizing an event triggering model. The system is capable of providing efficient and balanced downloading service to crawlers of a search engine.

Description

Technical field

[0001] The invention relates to a system in the Internet field, in particular to a distributed download system for Internet crawlers.

Background art

[0002] With the rapid development of the Internet, the volume of data on the Internet keeps growing. According to the China Internet Network Information Center's 2013 China Search Engine Market Research Report, the number of registered websites in China is 3.2 million, the number of domain names is 18.44 million, and the number of web pages is 150 billion. As of April 14, 2014, the total number of domain names in the world had reached 136,285,365, of which the United States ranks first with 81,136,981 domain names, and China ranks second with 7,907,696.

[0003] Search engines are the main way to obtain unknown information, and how search engine crawlers download such huge volumes of data is a very important issue. The traditional stand-alone download mode has been unable to complete the download ta...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): H04L29/08, H04L29/12
Inventor: 席齐许欢庆郭永福陈沛
Owner: 北京云悦共创网络技术有限公司