Distributed network crawler task scheduling method and apparatus

A distributed network crawler task scheduling method, applied in the field of distributed network crawler task scheduling. It addresses the problems that crawler nodes are not effectively managed and that their task-processing efficiency is consequently reduced. It achieves effective management of crawler nodes and improved usability, and is conducive to popularization and application.

Active Publication Date: 2018-06-26
NEW FOUNDER HLDG DEV LLC (+1 co-applicant)
Cites: 3 | Cited by: 21

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a scheduling method and device for distributed network crawler tasks, which can effectively overcome the problem in the prior art that crawler nodes are not effectively managed, which lowers the efficiency with which the crawler nodes process tasks.

Detailed Description of the Embodiments

[0021] Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following embodiments illustrate the present invention but are not intended to limit its scope.

[0022] Figure 1 is a schematic flowchart of a scheduling method for distributed web crawler tasks according to an embodiment of the present invention. Referring to Figure 1, this embodiment provides a scheduling method for distributed web crawler tasks. The method assigns tasks to crawler nodes according to the processing capability of each crawler node and the priority order of the tasks. Specifically, the method includes:

[0023] S101: Obtain the processing capability of each crawler node in the distributed network;

[0024] The distributed network contains multiple crawler nodes. In order to facilitate the a...
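The truncated paragraph does not say how processing capability is measured. A minimal Python sketch of S101, assuming (hypothetically) that each node periodically reports its configured task slots, its running-task count, and a heartbeat timestamp, and that "processing capability" means the number of free slots. The `NodeStatus` fields, the `obtain_processing_capability` name, and the 30-second timeout are illustrative, not taken from the patent:

```python
import time
from dataclasses import dataclass


@dataclass
class NodeStatus:
    node_id: str
    max_concurrent: int    # hypothetical: task slots the node is configured for
    running: int           # tasks the node is currently processing
    last_heartbeat: float  # UNIX timestamp of the node's latest status report


def obtain_processing_capability(statuses, heartbeat_timeout=30.0, now=None):
    """Return node_id -> number of additional tasks each node can accept (S101).

    Assumption: a node whose last heartbeat is older than heartbeat_timeout
    seconds is treated as unreachable and gets capability 0.
    """
    now = time.time() if now is None else now
    capability = {}
    for s in statuses:
        if now - s.last_heartbeat > heartbeat_timeout:
            capability[s.node_id] = 0  # stale node: do not assign it work
        else:
            # Free slots, clamped so an overloaded node never goes negative.
            capability[s.node_id] = max(0, s.max_concurrent - s.running)
    return capability
```

Passing `now` explicitly makes the function deterministic, for example in tests; in a live scheduler it defaults to `time.time()`.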

Abstract

The invention provides a distributed network crawler task scheduling method and apparatus. The method comprises: obtaining the processing capability of each crawler node in a distributed network; and allocating corresponding to-be-processed tasks to each crawler node according to a preset priority order and the processing capability of each crawler node, so that each crawler node processes the tasks allocated to it. Because the number of to-be-processed tasks allocated to each crawler node is determined by the preset priority order and that node's processing capability, the crawler nodes are managed effectively, each node processes its allocated tasks efficiently within its capability, and the practicality of the scheduling method is improved.
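The abstract does not specify the allocation rule beyond "preset priority order plus processing capability". One possible reading, sketched in Python under stated assumptions: each task carries a hypothetical integer `priority` (smaller means more urgent), `capacity` maps each node to the number of tasks it can accept now (for example, the result of S101), and each task, taken in priority order, goes to the node with the most remaining capacity. Tasks that do not fit stay queued. The greedy rule and the `Task`/`allocate_tasks` names are illustrative choices, not necessarily the patent's method:

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    priority: int  # hypothetical: smaller value = higher priority


def allocate_tasks(pending, capacity):
    """Assign pending tasks to crawler nodes by priority and capacity.

    pending:  list of Task objects waiting to be processed
    capacity: dict mapping node_id -> number of tasks the node can accept now
    Returns (assignment, leftover): assignment maps node_id -> list of Task,
    leftover holds tasks that did not fit into any node's capacity.
    """
    # Handle tasks in the preset priority order (sorted() is stable, so
    # equal-priority tasks keep their submission order).
    ordered = sorted(pending, key=lambda t: t.priority)

    # Max-heap of (remaining capacity, node_id); heapq is a min-heap, so negate.
    heap = [(-cap, node_id) for node_id, cap in capacity.items() if cap > 0]
    heapq.heapify(heap)

    assignment = {node_id: [] for node_id in capacity}
    leftover = []
    for task in ordered:
        if not heap:
            leftover.append(task)  # every node is full: keep the task queued
            continue
        neg_cap, node_id = heapq.heappop(heap)
        assignment[node_id].append(task)
        if neg_cap + 1 < 0:  # node still has free capacity after this task
            heapq.heappush(heap, (neg_cap + 1, node_id))
    return assignment, leftover
```

For example, with capacity `{"node-a": 2, "node-b": 1}` and tasks t1 to t4 with priorities 2, 0, 1, 3, node-a receives t2 and t3, node-b receives t1, and t4 is left for the next scheduling round. The heap keeps each assignment at O(log n) in the number of nodes; a proportional or weighted split would be an equally valid reading of the abstract.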

Description

Technical Field

[0001] The invention relates to the technical field of web crawlers, and in particular to a scheduling method and device for distributed network crawler tasks.

Background

[0002] In the era of big data, the value of data is self-evident. Search engines, public opinion monitoring systems, and price comparison systems all depend on acquiring large amounts of data, so crawlers have become an indispensable component. With the development of the Internet, information and knowledge are growing explosively, which poses greater challenges for crawlers. A single-node crawler architecture can no longer meet demand, and distributed crawlers have emerged.

[0003] A distributed crawler consists of multiple crawler nodes and can be divided into master-slave, autonomous, and hybrid modes according to how the nodes communicate. In the master-slave mode, one host is responsible for managing all running crawler nod...

Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F17/30
CPC: G06F16/951
Inventors: 张学颖 (Zhang Xueying), 张丹 (Zhang Dan), 于晓明 (Yu Xiaoming), 曹六一 (Cao Liuyi)
Owner NEW FOUNDER HLDG DEV LLC