Universal distributed crawler scheduling system

A distributed crawler scheduling technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses the low efficiency of crawlers and achieves a clear structure with simple, reliable transmission.

Active Publication Date: 2015-08-05
NANJING UNIV
Cites: 3; Cited by: 8

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is to provide a general-purpose distributed crawler system. It mainly addresses the low efficiency of crawlers amid explosive information growth and the compatibility problems among diverse crawlers, by providing a crawler framework that integrates diverse crawlers and improves crawler efficiency.

Examples

Embodiment Construction

[0023] To address the low data acquisition efficiency and insufficient compatibility of distributed frameworks in the current big data era, the present invention proposes a universal, general-purpose distributed crawler scheduling system that is compatible with diverse crawlers while ensuring speed, effectiveness and accuracy. A prototype system based on this strategy was implemented to verify the soundness of the invention.

[0024] To remedy the complex structure and weak versatility of current distributed frameworks, the present invention implements a general-purpose distributed crawler system that is simple and easy to implement. At the same time, URL information is encapsulated according to a unified communication protocol and extracted with a reflection mechanism, so that diverse URL information and other data can be transmitted to achieve the purpose of diver...
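The encapsulation and reflection step described in [0024] can be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not specify a wire format or implementation language, so the JSON envelope, the `Message` base class, and the `UrlTask` and `SearchTask` payload types are all hypothetical names chosen for the example.

```python
import json
from dataclasses import dataclass, asdict, fields


@dataclass
class Message:
    """Hypothetical base class; payload types are discovered via reflection."""


@dataclass
class UrlTask(Message):
    # Hypothetical payload for a plain crawl task.
    url: str
    depth: int = 0


@dataclass
class SearchTask(Message):
    # Hypothetical payload for a keyword-driven crawl task.
    url: str
    keyword: str
    page: int = 1


def encode(payload: Message) -> bytes:
    """Wrap any payload in one uniform envelope (assumed format: JSON)."""
    envelope = {"type": type(payload).__name__, "body": asdict(payload)}
    return json.dumps(envelope).encode("utf-8")


def decode(raw: bytes) -> Message:
    """Rebuild the payload by looking up its class by name at run time."""
    envelope = json.loads(raw.decode("utf-8"))
    for cls in Message.__subclasses__():
        if cls.__name__ == envelope["type"]:
            # Introspect the declared fields and ignore anything unknown.
            known = {f.name for f in fields(cls)}
            kwargs = {k: v for k, v in envelope["body"].items() if k in known}
            return cls(**kwargs)
    raise ValueError(f"unknown message type: {envelope['type']!r}")
```

With this arrangement, a new kind of crawler would only need to declare another `Message` subclass; the transport code in `encode` and `decode` stays unchanged, which is one plausible reading of how a unified protocol could carry diverse URL information.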

Abstract

A universal distributed crawler scheduling system is provided, which includes a controller and crawlers. The controller includes a display module and a scheduling module; the display module is used for viewing task process information and logs and for triggering control of the crawlers, and the scheduling module includes a communication area, a maintaining area, a data holding area, and a data buffering area. Each crawler includes a crawling module and a scheduling module, wherein the crawling module implements the specific crawler functions and the scheduling module includes a communication area, a maintaining area, a data holding area and a data buffering area. The communication areas of the scheduling modules are interconnected by asynchronous communication over long-lived Socket connections. To address problems such as low data acquisition efficiency and data loss in the big data era, a general-purpose distributed crawler system framework is provided; moreover, the universal distributed crawler scheduling system is compatible with diverse crawling strategies while ensuring a fast, effective and accurate crawling process.
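The long-lived, asynchronous Socket link between the communication areas might look something like the minimal sketch below. The abstract only names the communication style; the use of Python's `asyncio` streams, newline-delimited JSON framing, the per-message acknowledgement, and the function names `handle_crawler`, `crawler_session` and `demo` are all assumptions made for illustration.

```python
import asyncio
import json


async def handle_crawler(reader, writer):
    """Controller side: serve one crawler over a single long-lived connection."""
    try:
        while True:
            line = await reader.readline()
            if not line:  # the crawler closed the connection
                break
            msg = json.loads(line)
            # Assumed behavior: acknowledge every message by sequence number.
            writer.write(json.dumps({"ack": msg["seq"]}).encode() + b"\n")
            await writer.drain()
    finally:
        writer.close()


async def crawler_session(host, port, urls):
    """Crawler side: reuse one connection for every message it sends."""
    reader, writer = await asyncio.open_connection(host, port)
    acks = []
    for seq, url in enumerate(urls):
        writer.write(json.dumps({"seq": seq, "url": url}).encode() + b"\n")
        await writer.drain()
        acks.append(json.loads(await reader.readline())["ack"])
    writer.close()
    await writer.wait_closed()
    return acks


async def demo(urls):
    """Run a controller and one crawler in the same process on loopback."""
    server = await asyncio.start_server(handle_crawler, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]  # port picked by the OS
    try:
        return await crawler_session("127.0.0.1", port, urls)
    finally:
        server.close()
```

Calling `asyncio.run(demo(["http://example.com/a", "http://example.com/b"]))` sends both URLs over the same connection instead of reconnecting per message, which is the property the term "long connection" refers to.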

Description

Technical field

[0001] The present invention mainly relates to a system for the efficient collection of Internet data, and in particular to the realization of a general-purpose distributed crawler system. To address the low data acquisition efficiency and insufficient compatibility of distributed frameworks in the current big data era, a universal, general-purpose distributed crawler scheduling scheme is proposed. It is compatible with diverse crawling strategies while ensuring speed, effectiveness and accuracy, so that diverse crawling tasks can be carried out on a unified platform.

Background art

[0002] The arrival of the information age has driven the rapid development of Internet technology and the explosive growth of information. Traditional search engine technology, which has become increasingly prominent as an information retrieval tool, enables people to quickly and accurately locate the information they need. However, limited by...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 吴骏, 王涛, 刘勇, 陈嘉伟, 吴和生, 谢俊元
Owner: NANJING UNIV