Highly-decoupled method capable of dynamically managing crawlers

A dynamic crawler management technology, applied in the fields of electrical digital data processing, special data processing applications, and other database retrieval, that addresses problems such as coupling between crawler functions.

Active Publication Date: 2021-05-18
苏州市中地行信息技术有限公司

AI Technical Summary

Problems solved by technology

[0005] In current crawler implementations, functions such as data capture, data analysis, data scheduling, and updating are coupled to one another. To solve this problem, the present invention proposes a highly decoupled, dynamically manageable crawler method that achieves the corresponding decoupling through effective data scheduling and updating.

Examples

Embodiment Construction

[0015] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0016] Referring to Figure 1, a highly decoupled method capable of dynamically managing crawlers comprises:

[0017] Deploying the crawler host-side image and running the host-side service; the host-side image handles message transmission, data scheduling, storage records, and log analysis;

[0018] Deploying the crawler client image and running the client service; the client image handles message transmission, crawler control and crawler...
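The division of duties between the host side and the client side can be illustrated with a small sketch. It is not the patented implementation: the class names `Host` and `Client`, the `free_slots` resource metric, the "most free slots first" scheduling rule, and the in-process `queue.Queue` standing in for the message queue module are all assumptions made for illustration.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical stand-in for a crawler client service."""
    name: str
    free_slots: int  # assumed resource metric, not taken from the patent
    inbox: queue.Queue = field(default_factory=queue.Queue)

    def run_pending(self):
        """Drain the inbox and pretend to crawl each received task."""
        results = []
        while not self.inbox.empty():
            task = self.inbox.get()
            results.append({"url": task["url"], "client": self.name})
            self.free_slots += 1  # task finished, slot released
        return results


class Host:
    """Hypothetical host: owns the task pool, scheduling, and records."""

    def __init__(self, clients):
        self.clients = clients
        self.task_pool = []
        self.records = []

    def dispatch(self):
        """Send each pooled task to the client with the most free slots."""
        while self.task_pool:
            target = max(self.clients, key=lambda c: c.free_slots)
            if target.free_slots <= 0:
                break  # no client has capacity; remaining tasks stay pooled
            target.inbox.put(self.task_pool.pop(0))
            target.free_slots -= 1

    def collect(self):
        """Gather results from every client into the host-side records."""
        for client in self.clients:
            self.records.extend(client.run_pending())


host = Host([Client("a", free_slots=1), Client("b", free_slots=2)])
host.task_pool = [{"url": f"https://example.com/{i}"} for i in range(4)]
host.dispatch()
host.collect()
```

With three free slots in total, three of the four tasks are dispatched and the fourth stays in the task pool until capacity is released. Because the clients only see their inbox, the host's scheduling rule can be changed without touching client code.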

Abstract

The invention discloses a highly decoupled method capable of dynamically managing crawlers. The crawler is divided into two stages, data analysis and new target generation; the rules of both stages for a given collection target are compiled into JSON data according to a protocol and stored on the host end. The host runs a task and, according to a resource scheduling algorithm, sends it through a message queue module to a client with sufficient resources. The client receives the task information, converts it into executable information through the crawler protocol core, runs it through a crawler running module, and obtains the data. The host end then receives the data and any new tasks, stores them, and updates the task pool. Because the host end is separated from the crawler server, coupling within the system is reduced. Separating the crawler's functions in this way lowers the complexity of the crawler server and allows the host end to be modified while the distributed crawler system is running, which enables specific control and management. The modules as a whole are thus decoupled and extensible, which improves the robustness and stability of the framework.
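A minimal sketch of the two-stage rule idea follows. The patent text available here does not publish its JSON protocol, so the field names (`target`, `data_analysis`, `new_targets`, `link_pattern`), the use of regular expressions as the rule language, and the function `compile_rule` (playing the role of the crawler protocol core) are all hypothetical.

```python
import json
import re

# Hypothetical rule document as the host might store it. It is built with
# json.dumps so the regular expressions need no manual JSON escaping.
rule_json = json.dumps({
    "target": "example-news",
    "data_analysis": {"title": r"<h1>(.*?)</h1>"},
    "new_targets": {"link_pattern": r'href="(/news/\d+)"'},
})


def compile_rule(rule_text):
    """Turn JSON rule text into two callables, one per crawler stage."""
    rule = json.loads(rule_text)
    field_patterns = {
        name: re.compile(pattern, re.S)
        for name, pattern in rule["data_analysis"].items()
    }
    link_pattern = re.compile(rule["new_targets"]["link_pattern"])

    def analyse(html):
        """Stage 1 (data analysis): extract each configured field."""
        extracted = {}
        for name, pattern in field_patterns.items():
            match = pattern.search(html)
            extracted[name] = match.group(1) if match else None
        return extracted

    def new_targets(html):
        """Stage 2 (new target generation): collect unique follow-up links."""
        return sorted(set(link_pattern.findall(html)))

    return analyse, new_targets


page = (
    '<h1>Hello</h1>'
    '<a href="/news/1">a</a><a href="/news/2">b</a><a href="/news/1">c</a>'
)
analyse, new_targets = compile_rule(rule_json)
data = analyse(page)
targets = new_targets(page)
```

Since the rules travel as data rather than code, a host could in principle edit a stored rule and have clients pick it up on the next task without redeploying the client, which is the kind of runtime management the abstract describes.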

Description

Technical field

[0001] The invention relates to computer data mining technology, in particular to a highly decoupled method capable of dynamically managing crawlers.

Background technique

[0002] As a tool for searching network information, a search engine collects and discovers information on the Internet according to certain strategies; understands, extracts, organizes, and processes it; and provides retrieval services to users. In 1994, crawler programs were applied to indexing programs, and Yahoo, Google, and others appeared one after another. Yet however powerful search engines have become, problems such as information loss, low update rates, and low accuracy remain. Users need faster, more accurate, more convenient, and more effective query services, and meeting that need has become the goal of search engine research and development.

[0003] In this context, topic crawlers, which directionally grab related web resources, came into being. A topic crawler, also known as...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/951, G06F16/16, G06F16/172, G06F16/182
CPC: G06F16/951, G06F16/164, G06F16/172, G06F16/182
Inventor: 金智辉
Owner: 苏州市中地行信息技术有限公司