A Region-Based Multi-Agent Internet Data Collection Task Scheduling Method

A data collection and multi-agent technology, applied to network data retrieval, network data indexing, and other database retrieval. It addresses the problems that existing schedulers do not consider the geographical differences of crawling targets and do not use a bandwidth estimation method, which degrades the data collection performance of a distributed crawler system. The effect achieved is improved crawling efficiency.

Active Publication Date: 2019-07-23
杭州倡导者网络科技有限公司

AI Technical Summary

Problems solved by technology

The problem with the random task scheduling mechanism used by traditional distributed crawler systems is that it considers neither the geographic location differences of crawling targets nor the bandwidth differences among collection nodes in collection and storage, and it uses no corresponding bandwidth estimation method. Optimal allocation of resources is therefore impossible, which degrades the data collection performance of the distributed crawler system.

Examples

Embodiment 1

[0036] Embodiment 1: In this embodiment, the environment contains one target task T1 and 6 agent nodes A1, A2, ..., A6. Because K = 6 > L = 1, the single-task allocation algorithm is executed. Its steps are as follows:

[0037] (1) For each node, calculate (number of tasks on the node + 1) / (B_C + B_S) (unit: 1/Mbps) and find the node with the smallest result. For example, if the values for the six nodes are 3, 3, 2, 4, 1, 3, the smallest result belongs to the fifth node.

[0038] (2) Assign the pending task to the node found in step (1), here the fifth node.
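
The two steps above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the `AgentNode` type, its field names, and the reading of B_C and B_S as a node's collection and storage bandwidth are assumptions, and the task counts and bandwidths are hypothetical values chosen only so that the six costs come out as 3, 3, 2, 4, 1, 3.

```python
from dataclasses import dataclass


@dataclass
class AgentNode:
    name: str
    task_count: int   # tasks already assigned to the node
    b_collect: float  # B_C in Mbps (assumed: collection bandwidth)
    b_store: float    # B_S in Mbps (assumed: storage bandwidth)


def single_task_cost(node: AgentNode) -> float:
    """(number of tasks on the node + 1) / (B_C + B_S), in 1/Mbps."""
    return (node.task_count + 1) / (node.b_collect + node.b_store)


def assign_single_task(nodes: list[AgentNode]) -> AgentNode:
    """Step (1): pick the node with the smallest cost (first one on ties)."""
    return min(nodes, key=single_task_cost)


# Hypothetical inputs that reproduce the costs 3, 3, 2, 4, 1, 3.
nodes = [
    AgentNode("A1", 2, 0.5, 0.5),
    AgentNode("A2", 5, 1.0, 1.0),
    AgentNode("A3", 1, 0.5, 0.5),
    AgentNode("A4", 3, 0.5, 0.5),
    AgentNode("A5", 0, 0.5, 0.5),
    AgentNode("A6", 2, 0.5, 0.5),
]

chosen = assign_single_task(nodes)
chosen.task_count += 1  # Step (2): the task now belongs to the chosen node
print(chosen.name)      # A5, the fifth node
```

Because the numerator counts the task about to be assigned, an idle node with modest bandwidth can still beat a busy node with high bandwidth.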

Embodiment 2

[0039] Embodiment 2: In this embodiment, the environment contains 5 target tasks T1, T2, ..., T5 and 4 agent nodes A1, A2, A3, A4. Because K = 4 < L = 5, the multi-task allocation algorithm is executed. Its steps are as follows:

[0040] (1) Tasks are assigned 4 at a time until fewer than 4 tasks remain. The remaining tasks can be allocated once the number of tasks has increased, or they can be allocated one at a time with the single-task algorithm;

[0041] (2) The cost matrix for agents A_i (i = 1, 2, 3, 4) completing target tasks T_j (j = 1, 2, 3, 4) is C = (c_ij) (i, j = 1, 2, 3, 4), where c_ij is calculated in the same way as in the single-task case; the cost matrix C is shown in Figure 4(a).

[0042] (3) Subtract from each row of (c_ij) the smallest element of that row (see Figure 4(b)): subtract 2 from the first row, 4 from the second row, 1 from the third row, and 2 from the fourth row; then subtract the smal...
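
Steps (1) and (3) can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, and because Figure 4(a) is not reproduced here, the cost matrix below is invented so that its row minima are 2, 4, 1, and 2 as in the text. The sketch stops at the row reduction, since the source text is cut off after that point.

```python
def batches(tasks, k):
    """Step (1): split tasks into full groups of k; return (groups, leftover)."""
    full = len(tasks) - len(tasks) % k
    groups = [tasks[i:i + k] for i in range(0, full, k)]
    return groups, tasks[full:]


def reduce_rows(cost):
    """Step (3), first part: subtract each row's minimum from that row."""
    return [[c - min(row) for c in row] for row in cost]


tasks = ["T1", "T2", "T3", "T4", "T5"]
groups, leftover = batches(tasks, 4)
print(groups)    # [['T1', 'T2', 'T3', 'T4']]
print(leftover)  # ['T5'], held back or handled by single-task allocation

# Hypothetical 4x4 cost matrix (rows: agents, columns: tasks).
# Only the row minima 2, 4, 1, 2 are taken from the text.
C = [
    [2, 10, 9, 7],
    [15, 4, 14, 8],
    [13, 14, 16, 1],
    [4, 15, 13, 2],
]

for row in reduce_rows(C):
    print(row)
# [0, 8, 7, 5]
# [11, 0, 10, 4]
# [12, 13, 15, 0]
# [2, 13, 11, 0]
```

After the row reduction every row contains at least one zero, which marks the cheapest task for that agent.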

Abstract

The invention discloses a region-based multi-agent Internet data collection task scheduling method. The method assumes that L target tasks T1, T2, ..., TL and K agent nodes A1, A2, ..., AK exist in an environment; if K > L, a single-task allocation algorithm is executed, otherwise a multi-task allocation algorithm is executed. When allocating multiple tasks, a conventional distributed crawler system generally adopts a random task scheduling mechanism, whose problem is that the capability differences among collection nodes are not considered, so the data collection performance of the distributed crawler system suffers. To address this deficiency, the invention provides a mechanism that estimates a bandwidth level from the regions in which the agent nodes and the data source are located. A comprehensive allocation is then performed based on bandwidth data, task information, and node state information, and a collection task is preferentially allocated to the node whose collection capability matches it best, so that the distributed data collection system achieves relatively high collection performance and the crawling efficiency of Internet information is improved.
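
The choice between the two algorithms can be sketched in Python as a single comparison. The function name and the returned labels are hypothetical; the rule itself (single-task when K > L, multi-task otherwise, which includes K = L) is the one stated in the abstract.

```python
def choose_algorithm(num_agents: int, num_tasks: int) -> str:
    """Return which allocation algorithm applies for K agents and L tasks."""
    if num_agents > num_tasks:
        return "single-task"
    return "multi-task"


print(choose_algorithm(6, 1))  # single-task (K=6 > L=1, as in Embodiment 1)
print(choose_algorithm(4, 5))  # multi-task  (K=4 < L=5, as in Embodiment 2)
```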

Description

Technical field

[0001] The invention relates to the field of data collection, and in particular to a method for scheduling the collection tasks of distributed web crawlers.

Background art

[0002] To meet the requirements of massive data crawling, modern crawler systems generally adopt a large-scale distributed architecture. In this architecture, how to efficiently allocate the resources of multiple data collection nodes has become a key issue in improving the performance of the crawler system. Traditional distributed crawler systems generally use random task scheduling or similar mechanisms. The problem with such mechanisms is that they consider neither the geographical differences of crawling targets nor the bandwidth differences among collection nodes in collection and storage, and they use no corresponding bandwidth estimation method, so optimal allocation of resources cannot be achieved. This affects the data collection perfo...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F9/48, G06F9/50, G06F16/951
CPC: G06F9/4881, G06F9/5044, G06F16/951
Inventor: 沈颂
Owner: 杭州倡导者网络科技有限公司