Distributed crawler system architecture, data crawling method and computer equipment

A crawler system using distributed technology, applied in the field of data collection, addresses the low development and maintenance efficiency of developers and the poor stability and versatility of crawler systems. The stated effects are improved stability and expansion capability, reliable and efficient operation and maintenance management, and reduced capability requirements for developers.

Pending Publication Date: 2019-11-15
重庆金融资产交易所有限责任公司

AI Technical Summary

Problems solved by technology

[0003] The main purpose of this application is to provide a distributed crawler system architecture, a data crawling method, and computer equipment, aiming to solve the poor stability and versatility of distributed crawler systems in the prior art and the low development and maintenance efficiency of developers.



Embodiment Construction

[0039] In order to make the purpose, technical solution and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, and are not intended to limit the present application.

[0040] Referring to Figure 1, the present application first proposes a distributed crawler system architecture. The architecture uses HTTP service registration to isolate the modules from one another, and the modules communicate with each other through message queues. The architecture includes:

[0041] Task publishing module 10, for publishing crawler tasks;

[0042] The crawler service module 20 is used to store different crawler services in the form of services, and different crawler services complete different crawler tasks;

[0043] The crawler module 30 is used to receive...
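The role of the crawler service module 20 can be illustrated with a minimal sketch, assuming an in-process registry as a stand-in for HTTP service registration. The class name `CrawlerServiceModule`, the methods `register` and `lookup`, and the two example services are hypothetical and do not come from the application; the sketch only shows how different crawler services could be kept behind names so that callers stay isolated from their implementations.

```python
from typing import Callable, Dict

# Hypothetical type: a crawler service takes a target URL and returns raw data.
CrawlerService = Callable[[str], str]


class CrawlerServiceModule:
    """In-process stand-in for the crawler service module (20).

    Assumption: a real deployment would register services over HTTP;
    a dictionary is used here only to keep the sketch self-contained.
    """

    def __init__(self) -> None:
        self._services: Dict[str, CrawlerService] = {}

    def register(self, name: str, service: CrawlerService) -> None:
        # Refuse duplicate names so a lookup is never ambiguous.
        if name in self._services:
            raise ValueError(f"service already registered: {name}")
        self._services[name] = service

    def lookup(self, name: str) -> CrawlerService:
        try:
            return self._services[name]
        except KeyError:
            raise LookupError(f"no crawler service named {name!r}") from None


# Two illustrative services; they return placeholder strings instead of
# performing network requests.
def static_page_service(url: str) -> str:
    return f"<static html from {url}>"


def json_api_service(url: str) -> str:
    return f"<json from {url}>"


services = CrawlerServiceModule()
services.register("static_page", static_page_service)
services.register("json_api", json_api_service)
```

A caller that only knows the name `"json_api"` can obtain the service with `services.lookup("json_api")` and call it without knowing how it is implemented.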


Abstract

The invention discloses a distributed crawler system architecture, a data crawling method and computer equipment. The method comprises the steps of: obtaining a crawler task through a task publishing module and transmitting the crawler task to a crawler module; after the crawler module obtains the crawler task, having it call a target crawler service corresponding to the crawling requirement from the crawler service module and crawl original crawler data from a target website using the target crawler service; and storing the crawled original crawler data in a preset first storage module. In the disclosed architecture, method and computer equipment, the crawler service module encapsulates the bottom-layer requirements of the whole crawler system as modular services, which reduces the workload of developers, places no limit on their development language, and lowers the capability required of them. The architecture design also improves the stability and expansion capability of the crawler system.
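The three steps in the abstract can be sketched as follows, under stated assumptions: a `queue.Queue` stands in for the message queue between the task publishing module and the crawler module, a plain dictionary stands in for the crawler service module, and a list stands in for the first storage module. The names `CrawlerTask`, `publish`, `crawl_pending` and `fake_page_service` are hypothetical, and the service returns a placeholder string rather than fetching a real website.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CrawlerTask:
    requirement: str  # hypothetical field: which crawler service is needed
    target_url: str   # hypothetical field: the target website


def publish(task_queue: "queue.Queue[CrawlerTask]", task: CrawlerTask) -> None:
    """Step 1: the task publishing module hands a task to the crawler module."""
    task_queue.put(task)


def crawl_pending(
    task_queue: "queue.Queue[CrawlerTask]",
    services: Dict[str, Callable[[str], str]],
    first_storage: List[str],
) -> int:
    """Steps 2 and 3: pick the matching service, crawl, store the raw data.

    Returns the number of tasks handled.
    """
    handled = 0
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            return handled
        service = services[task.requirement]   # select the target crawler service
        raw_data = service(task.target_url)    # crawl original data
        first_storage.append(raw_data)         # store in the first storage module
        handled += 1


def fake_page_service(url: str) -> str:
    # Placeholder for a real fetch; no network access is performed.
    return f"raw:{url}"


task_queue: "queue.Queue[CrawlerTask]" = queue.Queue()
first_storage: List[str] = []
services = {"page": fake_page_service}

publish(task_queue, CrawlerTask("page", "http://example.test/1"))
publish(task_queue, CrawlerTask("page", "http://example.test/2"))
handled = crawl_pending(task_queue, services, first_storage)
```

Because the publisher only touches the queue and the crawler only touches the queue, the service table and the storage, each part could be replaced independently, which is the kind of isolation the architecture aims for.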

Description

Technical field

[0001] This application relates to the field of data collection, in particular to a distributed crawler system architecture, a data crawling method and computer equipment.

Background

[0002] Current crawler platforms are mainly custom-developed for a single business scenario. Each crawler needs independently written modules for its own requirements, so most crawler systems do not consider the stability and versatility of the system as a whole, and the development and maintenance efficiency of developers is low.

Summary of the invention

[0003] The main purpose of this application is to provide a distributed crawler system architecture, a data crawling method, and computer equipment, aiming to solve the poor stability and versatility of distributed crawler systems in the prior art and the low development and maintenance efficiency of developers.

[0004] In order to achieve the purpose o...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/951, G06F16/953
CPC: G06F16/951, G06F16/953
Inventor: 车驰, 李钢, 权佳成, 谭瑞, 张瑜
Owner: 重庆金融资产交易所有限责任公司