Distributed web crawler system and information crawling method

A distributed web crawler system technology, applied in the field of distributed crawler systems and information crawling. It addresses the low stability of the master-slave architecture, the low efficiency of the peer-to-peer architecture, and high resource occupation, with the effect of reducing resource occupation, saving cost, and providing high fault tolerance.

Inactive Publication Date: 2017-08-18
WUHAN UNIV

AI Technical Summary

Problems solved by technology

[0005] To solve the problems of low stability of the master-slave architecture, difficulty in scaling, low efficiency of the peer-to-peer architecture, and high resource occupation, the present invention provides a new type of distributed crawler system and information crawling method.

Embodiment Construction

[0018] To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawings and embodiments. The embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.

[0019] Referring to Figure 1, the distributed web crawler system provided by the present invention includes several control nodes and crawling nodes. All nodes are divided into groups, mainly on the basis of the network environment: nodes that are close to one another in network distance are placed in the same group. Each group has one control node, and the rest are crawling nodes. The control node and the crawling nodes in the same group form a master-slave relationship, and all the control nodes form a peer-to-peer network to jointly co...
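
The grouping described in [0019] can be illustrated with a minimal Python sketch. It is not the method claimed in the patent: the greedy clustering rule, the `distance` callback, the `max_distance` threshold, and the choice of the most central node as control node are all assumptions made for illustration, since the text here does not say how distance is measured or how the control node is selected.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Group:
    control: str                                   # master of the group
    crawlers: List[str] = field(default_factory=list)  # slaves of the group


def group_nodes(nodes: List[str],
                distance: Callable[[str, str], float],
                max_distance: float) -> List[Group]:
    """Greedily group nodes so that all members of a group are within
    max_distance of one another (assumed rule, not from the patent)."""
    clusters: List[List[str]] = []
    for node in nodes:
        for cluster in clusters:
            if all(distance(node, m) <= max_distance for m in cluster):
                cluster.append(node)
                break
        else:
            # No existing group is close enough: start a new one.
            clusters.append([node])

    groups = []
    for cluster in clusters:
        # Assumption: the control node is the member with the smallest
        # total distance to the others; ties are broken by node name.
        control = min(cluster,
                      key=lambda n: (sum(distance(n, m) for m in cluster), n))
        groups.append(Group(control, [n for n in cluster if n != control]))
    return groups


# Hypothetical example: positions stand in for measured network distance.
positions = {"a1": 0, "a2": 4, "a3": 8, "b1": 100, "b2": 103}
example_groups = group_nodes(
    list(positions),
    lambda x, y: abs(positions[x] - positions[y]),
    max_distance=10,
)
```

In a real deployment the distance callback might be backed by measured round-trip times between hosts; the sketch only shows how one control node and several crawling nodes per group fall out of a distance threshold.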

Abstract

The invention discloses a distributed web crawler system and an information crawling method. The distributed web crawler system comprises a plurality of control nodes and a plurality of crawling nodes. All nodes are grouped according to network distance, and the nodes within a predetermined network distance range form a group. Each group comprises one control node, and the other nodes are crawling nodes. The control node and the crawling nodes in the same group form a master-slave relationship, and all the control nodes form a peer-to-peer network that collectively controls the operation of the whole system. The system can dynamically allocate crawling tasks according to a crawling list, so that multiple nodes crawl massive data in parallel, at low cost and with high efficiency.
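
As a rough illustration of dynamic allocation from a crawling list, the sketch below shows one possible arrangement in Python. Everything specific in it is an assumption rather than something stated in the patent: the hash-based `owning_group` function for dividing URLs among peer control nodes, the pull-based `next_task` call, and the `requeue_failed` recovery step are hypothetical names and design choices.

```python
import hashlib
from collections import deque
from urllib.parse import urlsplit


def owning_group(url: str, group_count: int) -> int:
    """Map a URL to a group index by hashing its host name, so that peer
    control nodes agree on ownership without a central coordinator."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % group_count


class ControlNode:
    """Master of one group: keeps the crawling list and hands out tasks."""

    def __init__(self):
        self.crawl_list = deque()   # URLs waiting to be crawled
        self.seen = set()           # URLs ever queued, for de-duplication
        self.in_progress = {}       # url -> id of the crawler working on it

    def add_url(self, url: str) -> bool:
        if url in self.seen:
            return False
        self.seen.add(url)
        self.crawl_list.append(url)
        return True

    def next_task(self, crawler_id: str):
        """Called by an idle crawling node; faster nodes simply ask more often."""
        if not self.crawl_list:
            return None
        url = self.crawl_list.popleft()
        self.in_progress[url] = crawler_id
        return url

    def report_done(self, url: str, discovered=()):
        self.in_progress.pop(url, None)
        for link in discovered:
            self.add_url(link)

    def requeue_failed(self, crawler_id: str):
        """Return the tasks of a failed crawling node to the front of the list."""
        for url, owner in list(self.in_progress.items()):
            if owner == crawler_id:
                del self.in_progress[url]
                self.crawl_list.appendleft(url)
```

Because crawling nodes pull work when idle, the load follows each node's actual speed, which is one simple way to read "dynamic allocation"; the patent may use a different scheme.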

Description

Technical field

[0001] The invention belongs to the field of computer networks, and in particular relates to a novel distributed crawler system and an information crawling method.

Background art

[0002] With the development of Internet technology, the number of sites on the Internet keeps growing and the amount of information is huge. People urgently need a means of mining useful information, and crawler technology emerged to meet that need. A crawler running on a single machine has limited crawling capacity and struggles with complex and changeable network information, which has prompted the development of web crawlers based on distributed systems.

[0003] Existing distributed crawler system architectures can be roughly divided into two types: master-slave and peer-to-peer. In the master-slave mode, one host acts as the control node and manages all the hosts running the web crawler. The crawler only needs to receive tasks from the c...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/9566, G06F16/951
Inventors: 高靖宇, 刘科科, 李武昭
Owner: WUHAN UNIV