Distributed crawler task scheduling system and method

A scheduling system and scheduling method for distributed crawler tasks. It addresses two problems: existing services cannot meet users' individual needs, and stand-alone crawlers are too inefficient to obtain data effectively and quickly. The stated effects are fast collection speed and highly general-purpose crawlers.

Status: Inactive. Publication Date: 2020-03-27
UNIV OF ELECTRONICS SCI & TECH OF CHINA +1
9 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Because the services that large search engines provide are not customizable, these engines often cannot meet users' personalized needs when users need to crawl specific data.
Stand-alone web crawlers, in turn, are limited by their low efficiency and cannot obtain the data users need effectively and quickly.

Examples

Embodiment

[0049] Figure 1 is a flow chart of a method for scheduling distributed crawler tasks provided by an embodiment of the present application. As shown in Figure 1, the scheduling method for distributed crawler tasks provided by this embodiment includes at least the following steps:

[0050] S101, acquiring a user-defined crawler task;

[0051] In practical application scenarios, crawler tasks are generated from crawler task scripts. A crawler task script is created by the user and is composed, in sequence, of the operations the user performs in the browser. The script may contain loop operations, which indicate that the same processing is applied in turn to each URL in the loop list. Therefore, for a crawler task created by a user, the application splits the outermost loop parameters of the task into subtasks so that the task can be crawled in parallel. The processing of the loop list is described in more detail below. ...
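The splitting of the outermost loop list into subtasks might be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names CrawlerTask, SubTask, split_outer_loop, and the chunk_size parameter are hypothetical, and the source does not say how many URLs each subtask receives.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrawlerTask:
    """Hypothetical model of a user-defined crawler task."""
    task_id: str
    loop_urls: List[str]  # the outermost loop list of the task script
    operations: List[str] = field(default_factory=list)  # recorded browser steps


@dataclass
class SubTask:
    """Hypothetical subtask: a slice of the loop list plus the shared steps."""
    parent_id: str
    index: int
    urls: List[str]
    operations: List[str]


def split_outer_loop(task: CrawlerTask, chunk_size: int = 1) -> List[SubTask]:
    """Split the outermost loop list into subtasks of at most chunk_size URLs.

    chunk_size is an assumption made for this sketch.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    subtasks = []
    for index, start in enumerate(range(0, len(task.loop_urls), chunk_size)):
        subtasks.append(
            SubTask(
                parent_id=task.task_id,
                index=index,
                urls=task.loop_urls[start:start + chunk_size],
                operations=list(task.operations),  # each subtask gets its own copy
            )
        )
    return subtasks


if __name__ == "__main__":
    demo = CrawlerTask("demo", ["u1", "u2", "u3"], ["open", "extract"])
    for sub in split_outer_loop(demo, chunk_size=2):
        print(sub.index, sub.urls)
```

Because every subtask carries the same operations and a disjoint slice of the URLs, the subtasks can run on different nodes without coordinating with each other.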

Abstract

The invention relates to the technical field of crawler tasks, and in particular to a distributed crawler task scheduling method and system. The distributed crawler task scheduling method provided by the invention comprises the steps of: obtaining a crawler task defined by a user; confirming the number of virtual nodes required by the crawler task; obtaining the state of each virtual node; and, based on the state of each virtual node and the number of virtual nodes required by the crawler task, allocating the crawler task to the virtual nodes for execution. With the scheme provided by the invention, a user-defined crawler task can be obtained and executed. This arrangement provides the user with a customizable crawler task, so that data is crawled according to the user's requirements and matches those requirements more closely.
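The allocation step in the abstract might be sketched as follows. This is a hedged illustration only: the state labels "idle" and "busy", the round-robin distribution, and the function name allocate are assumptions introduced here, since the abstract does not say how node states are represented or how subtasks are divided among the selected nodes.

```python
from typing import Dict, List


def allocate(
    subtasks: List[str],
    node_states: Dict[str, str],
    required_nodes: int,
) -> Dict[str, List[str]]:
    """Assign subtasks to at most required_nodes idle virtual nodes.

    Assumptions for this sketch: a node is available when its state is
    "idle", and subtasks are dealt out round-robin among the chosen nodes.
    """
    if required_nodes < 1:
        raise ValueError("required_nodes must be at least 1")
    # Sort node names so the selection is deterministic.
    idle = [name for name, state in sorted(node_states.items()) if state == "idle"]
    chosen = idle[:required_nodes]
    if not chosen:
        raise RuntimeError("no idle virtual node available")
    assignment: Dict[str, List[str]] = {name: [] for name in chosen}
    for i, subtask in enumerate(subtasks):
        assignment[chosen[i % len(chosen)]].append(subtask)
    return assignment


if __name__ == "__main__":
    states = {"n1": "idle", "n2": "busy", "n3": "idle", "n4": "idle"}
    print(allocate(["a", "b", "c"], states, required_nodes=2))
```

If fewer idle nodes exist than the task requires, this sketch falls back to the idle nodes that are available; a real scheduler might instead queue the task until enough nodes are free.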

Description

Technical field

[0001] The present application relates to the technical field of crawler tasks, and in particular to a scheduling method and system for distributed crawler tasks.

Background art

[0002] With the rapid development of the Internet, big data has penetrated into various industries and business functional areas, and its value is becoming more and more significant. Extracting meaningful and valuable data is becoming increasingly important. Web crawlers used for Internet information collection therefore face great opportunities and challenges.

[0003] At present, some large search engines at home and abroad provide users only with non-customizable search services. This brings a certain convenience for users obtaining data. However, since the services provided by large search engines to users are not customizable, when users need to crawl some data, these large search engines often cannot meet the personalized needs of ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/50, G06F16/951
CPC: G06F9/5088, G06F9/5027, G06F16/951
Inventor: 田丹, 田俊豪, 银虹宇, 李奇宇
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA