Network crawler task scheduling method and device

A task scheduling technology for web crawlers, applied in the fields of network data retrieval, network data indexing, multiprogramming devices, etc. It addresses problems such as low work efficiency, frequent database reads and writes, and databases that block easily.

Status: Inactive
Publication Date: 2018-02-16
广州探迹科技有限公司

AI Technical Summary

Problems solved by technology

[0005] The embodiment of the present invention provides a network crawler task scheduling method and device to solve the problem that the exi...

Embodiment Construction

[0031] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0032] Figure 1 is a schematic flow chart of a web crawler task scheduling method provided by an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:

[0033] Step 101, the first scheduler receives the first crawler task, and determines the type of the first crawler task according to the state of the first crawler task; when it is confirmed that the type of the first crawler task is delayed processing, dete...

Abstract

The invention discloses a web crawler task scheduling method and device, and relates to the field of software engineering. It aims to solve the problem that existing crawler task scheduling must read and write the database frequently, so the database blocks easily and work efficiency is low. The method comprises the following steps: a first scheduler receives a first crawler task and determines the type of the first crawler task according to the state of the first crawler task; when the type of the first crawler task is delayed processing, the first scheduler determines the execution time corresponding to the delayed processing and stores the first crawler task in a cache database; a second scheduler traverses the cache database in each update period and, when it determines that an execution time has arrived, sends the first crawler task corresponding to that execution time into a memory priority queue; and a third scheduler uses a round-robin algorithm to obtain crawler tasks from the memory priority queue in sequence until the first crawler task is taken out of the memory priority queue.
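The three-stage flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`CrawlerTask`, `CacheDatabase`, `MemoryPriorityQueue`, the scheduler classes, the `"delayed"` and `"immediate"` state strings) is hypothetical. The cache database is replaced by an in-memory dictionary, non-delayed tasks are assumed to go straight to the queue, and the round-robin step is approximated by a simple polling loop, since the abstract does not say what the round-robin cycles over.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class CrawlerTask:
    name: str
    state: str = "immediate"   # hypothetical states: "immediate" or "delayed"
    delay: float = 0.0         # seconds to wait when state == "delayed"
    priority: int = 0          # assumption: lower value is served first
    execute_at: float = 0.0    # filled in by the first scheduler


class CacheDatabase:
    """In-memory stand-in for the cache database (assumption)."""

    def __init__(self):
        self._tasks = {}

    def put(self, task):
        self._tasks[task.name] = task

    def pop_due(self, now):
        # Traverse every cached task and remove those whose time has arrived.
        due = [t for t in self._tasks.values() if t.execute_at <= now]
        for task in due:
            del self._tasks[task.name]
        return due


class MemoryPriorityQueue:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps insertion order

    def push(self, task):
        heapq.heappush(self._heap, (task.priority, next(self._order), task))

    def pop(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


class FirstScheduler:
    def __init__(self, cache, queue):
        self.cache, self.queue = cache, queue

    def receive(self, task, now):
        if task.state == "delayed":
            # Delayed processing: fix the execution time, park in the cache.
            task.execute_at = now + task.delay
            self.cache.put(task)
        else:
            # Assumption: non-delayed tasks are queued immediately.
            self.queue.push(task)


class SecondScheduler:
    def __init__(self, cache, queue):
        self.cache, self.queue = cache, queue

    def tick(self, now):
        # Called once per update period: move due tasks into the queue.
        for task in self.cache.pop_due(now):
            self.queue.push(task)


class ThirdScheduler:
    def __init__(self, queue):
        self.queue = queue

    def poll(self):
        # Take tasks in sequence until the queue is empty.
        taken = []
        while (task := self.queue.pop()) is not None:
            taken.append(task)
        return taken


def run_demo():
    cache, queue = CacheDatabase(), MemoryPriorityQueue()
    first = FirstScheduler(cache, queue)
    second = SecondScheduler(cache, queue)
    third = ThirdScheduler(queue)

    first.receive(CrawlerTask("a", state="delayed", delay=5.0), now=0.0)
    first.receive(CrawlerTask("b"), now=0.0)
    second.tick(now=1.0)                       # "a" is not due yet
    early = [t.name for t in third.poll()]
    second.tick(now=5.0)                       # "a" is now due
    late = [t.name for t in third.poll()]
    return early, late
```

In this sketch the delayed task touches the cache only twice, once when it is stored and once when it becomes due. The third scheduler works purely against memory, which is the separation of concerns the abstract describes.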

Description

Technical field

[0001] The present invention relates to the field of software engineering, and in particular to a method and device for scheduling web crawler tasks.

Background art

[0002] A web crawler task is a program that automatically extracts web pages. It downloads web pages from the World Wide Web for search engines and is an important component of search engines.

[0003] Existing crawler task scheduling consists of only one scheduling module. The scheduling module performs various time-consuming operations such as crawler task persistence, database duplication check, task priority sorting, scheduled task execution, and crawler task status statistics in one cycle. When the number of crawler tasks reaches dozens or more, the number of concurrent crawler tasks will reach the thousand level. The scheduling module needs to read and write the database frequently to process these tasks. The burden on the database is very serious, and the overall efficiency of th...

Application Information

IPC(8): G06F9/50, G06F17/30
CPC: G06F9/5016, G06F9/5038, G06F16/951, G06F2209/5021
Inventor: 陈开冉, 邓楚健
Owner: 广州探迹科技有限公司