Distributed crawler management system and method thereof

A distributed crawler management system and method, applied in the field of distributed crawler management. It addresses the time and cost of traditional crawler management and the repetitive, monotonous work it imposes on developers, and it optimizes crawler configuration and reduces waste.

Active Publication Date: 2017-06-20
GUOXIN YOUE DATA CO LTD

AI Technical Summary

Problems solved by technology

[0002] When traditional crawler management methods are used to crawl data on the Internet, the crawlers are blocked by the anti-crawling mechanisms of some websites. As a result, crawlers that developers worked hard to build fail to retrieve useful data, or stop retrieving it some time after the crawler has been adjusted. Repeatedly crawling and modifying crawlers is time-consuming and costly for enterprises, and the repetitive, monotonous work is of little value to developers.

Examples

Embodiment 1

[0031] As shown in Figure 1, this embodiment provides a distributed crawler management system, which includes a home page display module 1, a project management module 2, a crawler management module 3, a data management module 4, a node management module 5 and an agent management module 7.

[0032] The home page display module 1 includes a login unit and a data display unit. The login unit provides the interface through which users access the distributed crawler management system: a user enters the corresponding identity verification information in the login unit to gain access. The data display unit displays crawler-related data. Specifically, when a crawling task is required, a user logs in through the login unit with the account and password registered in the distributed crawler management system, and can then perform related operations. Users can use any mainstream web br...
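
The login check described in this paragraph could be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the class name `LoginUnit`, the in-memory account store, and the choice of salted PBKDF2 hashing are all assumptions, since the source only states that users log in with a registered account and password.

```python
import hashlib
import hmac
import os


class LoginUnit:
    """Hypothetical login unit: checks an account/password pair against registered users."""

    def __init__(self) -> None:
        # account -> (salt, derived key); the in-memory store is an assumption
        self._users: dict[str, tuple[bytes, bytes]] = {}

    @staticmethod
    def _derive(password: str, salt: bytes) -> bytes:
        # PBKDF2-HMAC-SHA256 is chosen for illustration; the source names no scheme
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def register(self, account: str, password: str) -> None:
        """Store a salted hash of the password for a new account."""
        salt = os.urandom(16)
        self._users[account] = (salt, self._derive(password, salt))

    def login(self, account: str, password: str) -> bool:
        """Return True only if the account exists and the password matches."""
        record = self._users.get(account)
        if record is None:
            return False
        salt, expected = record
        # constant-time comparison avoids leaking how many bytes matched
        return hmac.compare_digest(self._derive(password, salt), expected)
```

A caller would first `register("alice", "s3cret")` and then gate access to the other modules on `login("alice", "s3cret")` returning `True`.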

Embodiment 2

[0061] As shown in Figure 2, this embodiment provides a distributed crawler management method. The method includes: system login and data display; project creation and management; crawler deployment and management; crawler data monitoring and management; crawler node management; monitoring and recording of user operation behavior; and crawler task agent management. These steps are described in detail below.

[0062] System login and data display

[0063] System login and data display include entering the corresponding authentication information in the login unit of the home page display module to access the distributed crawler management system, and displaying crawler-related data in the data display unit of the home page display module. Specifically, when a crawling task is required, a user logs in through the login unit with the account and password registered in the distributed crawler management system, and can then perform related operations. Us...

Abstract

The invention discloses a distributed crawler management system. The system comprises a home page display module, a project management module, a crawler management module, a data management module, a node management module and an agent management module. The home page display module handles system login and data display; the project management module creates and manages projects; the crawler management module deploys and manages project crawlers; the data management module monitors and manages crawler data; the node management module manages crawler nodes; and the agent management module manages agents for a user's crawlers. The crawler nodes are virtual machines requested from a cloud platform. The invention further provides a distributed crawler management method. The distributed crawler management system can provide anti-crawling countermeasures together with management and analysis of the crawlers and their data, and thus offers enterprises and individuals a safe and efficient data crawling solution.
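
To make the division of responsibilities in the abstract concrete, here is a small sketch of how the project, crawler, node and data management modules might fit together. Everything in it is hypothetical: the names `CrawlerManagementSystem`, `Project` and `CrawlerNode`, and the least-loaded placement rule, are assumptions for illustration and are not stated in the source. Provisioning virtual machines from a cloud platform is left out.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlerNode:
    """A crawler node; per the abstract, a virtual machine requested from a cloud platform."""
    node_id: str
    crawlers: list[str] = field(default_factory=list)


@dataclass
class Project:
    """A crawling project and the names of the crawlers that belong to it."""
    name: str
    crawlers: list[str] = field(default_factory=list)


class CrawlerManagementSystem:
    """Hypothetical facade over the project, crawler, node and data management modules."""

    def __init__(self) -> None:
        self.projects: dict[str, Project] = {}
        self.nodes: dict[str, CrawlerNode] = {}
        self.record_counts: dict[str, int] = {}  # crawler name -> records crawled

    # Project management module: create and manage projects.
    def create_project(self, name: str) -> Project:
        project = Project(name)
        self.projects[name] = project
        return project

    # Node management module: register a node (VM provisioning is out of scope).
    def add_node(self, node_id: str) -> CrawlerNode:
        node = CrawlerNode(node_id)
        self.nodes[node_id] = node
        return node

    # Crawler management module: deploy a crawler onto the least-loaded node.
    # The placement rule is an assumption; the source does not specify one.
    def deploy_crawler(self, project_name: str, crawler_name: str) -> CrawlerNode:
        if not self.nodes:
            raise RuntimeError("no crawler nodes available")
        node = min(self.nodes.values(), key=lambda n: len(n.crawlers))
        node.crawlers.append(crawler_name)
        self.projects[project_name].crawlers.append(crawler_name)
        self.record_counts[crawler_name] = 0
        return node

    # Data management module: track how much data each crawler has produced.
    def report_records(self, crawler_name: str, count: int) -> None:
        self.record_counts[crawler_name] += count
```

Keeping each module's operations as separate methods on one facade mirrors the abstract's one-module-one-responsibility description; a real system would likely split these into separate services.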

Description

Technical field

[0001] The invention relates to a distributed crawler management system and method, and in particular to a distributed crawler management system and method capable of managing and analyzing crawlers and the data they crawl.

Background

[0002] When traditional crawler management methods are used to crawl data on the Internet, the crawlers are blocked by the anti-crawling mechanisms of some websites. As a result, crawlers that developers worked hard to build fail to retrieve useful data, or stop retrieving it some time after the crawler has been adjusted. Repeatedly crawling and modifying crawlers is time-consuming and costly for enterprises, and the repetitive, monotonous work is of little value to developers.

[0003] Therefore, there is an urgent need for a solution capable of effectively managing and analyzing crawlers and the data they crawl.

Contents of the invention

[0004] In order to solve the above-mentioned technical problems, the present invention ...

Application Information

IPC(8): G06F17/30
CPC: G06F16/951
Inventor: 刘希, 陈进宝, 刘光辉
Owner GUOXIN YOUE DATA CO LTD