A load balancing distributed crawler method and crawler system

Load-balancing distributed crawler system technology, applied in the field of network search. It addresses reduced search speed, resource waste and node idleness, and aims to reduce resource waste, improve overall node utilization, and improve allocation flexibility and processing speed.

Active Publication Date: 2020-08-11
Guangdong Kejietong Information Technology Co., Ltd. (广东科杰通信息科技有限公司)
Cites: 5 · Cited by: 0

AI Technical Summary

Problems solved by technology

When a single task has to wait for another task to complete, the node responsible for that task may sit idle, wasting considerable resources and slowing down the search.

Embodiment Construction

[0062] The technical solutions of the present invention will be further described below in conjunction with the accompanying drawings and through specific implementation methods.

[0063] As shown in Figure 1, the load-balancing distributed crawler method of this embodiment uses a system comprising a main server 1 and a plurality of crawler servers 2 that communicate with the main server 1, with a plurality of crawler collection nodes 3 arranged downstream of each crawler server 2. The method includes the following distributed crawler load-balancing process:

[0064] Step A: The main server 1 decomposes a crawler task into a request page task and an analysis page task, which are performed alternately in a cycle;

[0065] Step B: The main server 1 distributes the request page tasks and the analysis page tasks to different crawler servers 2, and each crawler server 2 assigns the tasks it receives to its crawler collection nodes 3 and ...
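
Steps A and B can be illustrated with a small single-process Python sketch. It assumes details the text leaves open (Step B is cut off after "and ..."): tasks wait in a FIFO queue, each crawler server is dedicated to one task kind, and the main server picks the least-loaded server of the right kind and then that server's least-loaded collection node. `Task`, `MainServer`, `dispatch`, `crawl`, `fetch` and `extract_links` are hypothetical names, not taken from the patent, and "load" here simply counts tasks assigned so far.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model: the patent fixes only the three tiers (main server 1,
# crawler servers 2, collection nodes 3) and the two task kinds.


@dataclass(frozen=True)
class Task:
    kind: str  # "request" (fetch a page) or "analysis" (parse a fetched page)
    url: str
    html: str = ""


@dataclass
class MainServer:
    # crawler server id -> the single task kind it handles (assumption)
    server_kind: dict[str, str]
    # crawler server id -> ids of its downstream collection nodes
    server_nodes: dict[str, list[str]]
    # collection node id -> number of tasks assigned so far (never decremented here)
    load: dict[str, int] = field(default_factory=lambda: defaultdict(int))
    # (task kind, url, node id) for every dispatched task, in order
    log: list[tuple[str, str, str]] = field(default_factory=list)

    def dispatch(self, task: Task) -> str:
        """Step B sketch: choose a server for the task's kind, then its least-loaded node."""
        candidates = [s for s, k in self.server_kind.items() if k == task.kind]
        server = min(candidates, key=lambda s: sum(self.load[n] for n in self.server_nodes[s]))
        node = min(self.server_nodes[server], key=lambda n: self.load[n])
        self.load[node] += 1
        self.log.append((task.kind, task.url, node))
        return node


def crawl(main: MainServer, seed: str,
          fetch: Callable[[str], str],
          extract_links: Callable[[str], list[str]]) -> list[tuple[str, str, str]]:
    """Step A sketch: alternate request and analysis tasks until no new pages remain."""
    queue = deque([Task("request", seed)])
    seen = {seed}
    while queue:
        task = queue.popleft()
        main.dispatch(task)  # a real node would execute the task remotely
        if task.kind == "request":
            # A finished request task yields an analysis task for the same page.
            queue.append(Task("analysis", task.url, fetch(task.url)))
        else:
            # A finished analysis task yields request tasks for newly found links.
            for link in extract_links(task.html):
                if link not in seen:
                    seen.add(link)
                    queue.append(Task("request", link))
    return main.log
```

With a toy three-page site, request tasks land only on the nodes of the "request" server and analysis tasks only on the "analysis" server, while ties between nodes go to the first node listed. Running both kinds in one process shows only the routing; the concurrency benefit the patent describes would come from the crawler servers working in parallel.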

Abstract

The invention discloses a load-balanced distributed crawler method and system. The system comprises a main server and multiple crawler servers that communicate with the main server, with multiple crawler collection nodes arranged downstream of each crawler server. The method includes a distributed crawler load-balancing process in which the main server decomposes a crawler task into a request page task and an analysis page task, which are performed alternately in a cycle. Node state information for all crawler collection nodes is monitored and analyzed so that the main server can call on crawler collection nodes in time. This prevents some crawler collection nodes from sitting in a no-load state, increases the overall utilization rate of the crawler collection nodes, and reduces resource waste.
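
The abstract says node state information is monitored and analyzed so the main server can call idle nodes in time, but it does not say what that state contains. The Python sketch below assumes each node reports its number of pending tasks and a heartbeat timestamp; a node with no pending tasks is treated as being in the no-load state, and a node whose last heartbeat is older than `timeout` seconds is treated as unavailable. `NodeState`, `find_idle_nodes`, `call_idle_nodes` and the 30-second default are all hypothetical.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeState:
    """Hypothetical status report a collection node sends to the main server."""
    node_id: str
    pending: int           # tasks currently queued on the node
    last_heartbeat: float  # Unix time of the node's latest report


def find_idle_nodes(states: list[NodeState], now: float, timeout: float = 30.0) -> list[str]:
    """Return live nodes with nothing queued, i.e. nodes in the no-load state."""
    return [
        s.node_id
        for s in states
        if s.pending == 0 and now - s.last_heartbeat <= timeout
    ]


def call_idle_nodes(states: list[NodeState], backlog: deque,
                    now: Optional[float] = None, timeout: float = 30.0) -> dict[str, str]:
    """Hand one waiting task to each idle, live node; return node_id -> task."""
    now = time.time() if now is None else now
    plan: dict[str, str] = {}
    for node_id in find_idle_nodes(states, now, timeout):
        if not backlog:
            break  # nothing left to hand out
        plan[node_id] = backlog.popleft()
    return plan
```

For example, with one idle live node, one busy node and one idle node whose heartbeat is stale, only the first node receives a task from the backlog; the rest of the backlog waits for the next monitoring round.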

Description

Technical Field

[0001] The invention relates to the field of network search, and in particular to a load-balancing distributed crawler method and crawler system.

Background Art

[0002] In current distributed crawlers, each node is responsible for only a single task. When that task has to wait for another task to complete, the node handling it may sit idle, which wastes considerable resources and slows down the search. A way is therefore needed to remove this resource-waste bottleneck, so that the machines in a distributed cluster use their resources reasonably and each machine node performs its own duties.

Summary of the Invention

[0003] The purpose of the present invention is to propose a load-balancing distributed crawler method that can deploy the crawler collection nodes in time and prevent some crawler collection nodes from being in an empty st...

Application Information

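Patent Type & Authority: Patent (China)
IPC (8): G06F9/50; G06F16/951
Inventors: 曾伟英 (Zeng Weiying), 霍智杰 (Huo Zhijie), 徐国坤 (Xu Guokun)
Owner: Guangdong Kejietong Information Technology Co., Ltd. (广东科杰通信息科技有限公司)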