A scalable distributed data acquisition method and system

A data acquisition system and distributed network technology, applied in the computer field, that addresses the conflict among ease of use, versatility, and manageability in data acquisition systems, as well as the unreasonable allocation of tasks in distributed deployments.

Active Publication Date: 2021-09-14
INST OF INFORMATION ENG CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0006] The present invention proposes a highly scalable distributed data acquisition framework and method. It resolves the conflict among the ease of use, versatility, and manageability of existing data acquisition systems, and the problem of unreasonable task allocation in distributed deployments.

Embodiment Construction

[0079] In order to make the above objects, features and advantages of the present invention more obvious and understandable, the present invention will be further described below through specific embodiments and accompanying drawings.

[0080] To address the problem of unreasonable task allocation in distributed deployment, the present invention proposes a solution with the following steps:

[0081] S1: The system consists of a master node, several working nodes, and an intermediate node that provides communication services for the other two types of nodes. The nodes are deployed on servers as needed, in the following steps:

[0082] S101: Deploy the intermediate node to an environment that other nodes can access; the intermediate node includes programs such as message queues and databases, where the message queue includes task queues and data queues;

[0083] S102: Deploy the master node in a stable environment, which can be an internal network;

[0084] S103: Depl...
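The topology described in S1 and S101 can be pictured with a minimal in-memory sketch. Everything here is an assumption made for illustration: the class names, the `publish` and `run_one` methods, and the use of Python's `queue.Queue` and a `dict` as stand-ins for a real message queue and database. The worker pulling a task directly is a simplification and does not reflect the patent's task-application step.

```python
from dataclasses import dataclass, field
from queue import Queue


@dataclass
class IntermediateNode:
    """Shared services that every other node can reach (S101)."""
    task_queue: Queue = field(default_factory=Queue)  # collection tasks
    data_queue: Queue = field(default_factory=Queue)  # collected data
    database: dict = field(default_factory=dict)      # stand-in for a database


@dataclass
class MasterNode:
    """Deployed in a stable environment, e.g. an internal network (S102)."""
    intermediate: IntermediateNode

    def publish(self, task: dict) -> None:
        # Hypothetical helper: push a collection task to the task queue.
        self.intermediate.task_queue.put(task)


@dataclass
class WorkerNode:
    """A working node; it only ever talks to the intermediate node."""
    name: str
    intermediate: IntermediateNode

    def run_one(self) -> None:
        # Simplification: take the next task directly and emit one record.
        task = self.intermediate.task_queue.get()
        record = {"task": task["id"], "worker": self.name,
                  "payload": f"data for {task['url']}"}
        self.intermediate.data_queue.put(record)


def demo() -> dict:
    hub = IntermediateNode()
    master = MasterNode(hub)
    workers = [WorkerNode(f"worker-{i}", hub) for i in range(2)]
    master.publish({"id": 1, "url": "https://example.com"})
    workers[0].run_one()
    return hub.data_queue.get()
```

The point of the sketch is that the master node and the working nodes never reference each other; both hold only a reference to the intermediate node, which is why that node must be deployed where all others can reach it.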

Abstract

The invention relates to a scalable distributed data collection method and system. The method includes: deploying a master node, working nodes, and an intermediate node; the master node periodically generates collection tasks according to the scheduled tasks in the database and publishes them to the task queue of the message queue; each working node decides, based on the status of a collection task, whether to apply to execute it; the master node selects the best working node from those that apply for the same collection task to execute it, and removes the collection task from the task queue; the selected working node generates and executes a collection process for the task and puts the collected data into the data queue in the message queue of the intermediate node; the working node monitors the running status of the collection process and records relevant data. The invention resolves the conflict among the usability, versatility, and manageability of existing data acquisition systems and the problem of unreasonable task allocation in distributed deployment.
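The apply-then-select allocation in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define how the "best" working node is chosen, so the load-ratio rule below is hypothetical, as are the `Worker` fields, the `wants` method, the `"pending"`/`"assigned"` status strings, and the use of a `deque` in place of a real message queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    running: int   # collection processes currently running (assumed metric)
    capacity: int  # maximum concurrent processes (assumed metric)

    def wants(self, task: dict) -> bool:
        # Assumed rule: apply only with spare capacity and for a pending task.
        return self.running < self.capacity and task["status"] == "pending"


def allocate(task_queue: deque, workers: list) -> dict:
    """Assign each queued task to one applicant; return {task id: worker name}."""
    assignments = {}
    for task in list(task_queue):  # iterate over a snapshot of the queue
        applicants = [w for w in workers if w.wants(task)]
        if not applicants:
            continue  # nobody applied, so the task stays in the queue
        # Assumed definition of "best": the applicant with the lowest load ratio.
        best = min(applicants, key=lambda w: w.running / w.capacity)
        best.running += 1
        task["status"] = "assigned"
        task_queue.remove(task)  # the master removes the task once assigned
        assignments[task["id"]] = best.name
    return assignments
```

A short usage example: with workers `a` (1 of 2 slots used), `b` (0 of 2), and `c` (2 of 2), three pending tasks are spread across `a` and `b`, while the full worker `c` never applies.

```python
workers = [Worker("a", 1, 2), Worker("b", 0, 2), Worker("c", 2, 2)]
tasks = deque({"id": i, "status": "pending"} for i in (1, 2, 3))
print(allocate(tasks, workers))
```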

Description

Technical field

[0001] The present invention relates to the field of computers, in particular to a scalable distributed data collection method and system. The data collection mentioned in the present invention specifically refers to the collection of data published on the Internet.

Background technique

[0002] Network data collection refers to the process of obtaining public data resources from the Internet and then saving them to specific locations according to requirements. Network data collection is usually implemented using a web crawler, which is a computer program that automatically crawls Internet web page data according to certain rules. A crawler can download web pages from a website on the Internet, then parse and filter the web pages, and extract data from them according to requirements. With the development of artificial intelligence and big data technology, the demand for network data is increasing rapidly. A research work requires tens of millions of dat...
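The parse, filter, and extract stages of a crawler mentioned in the background can be illustrated with a small sketch built on Python's standard `html.parser` module. The download stage is deliberately omitted so the example needs no network access. The `LinkExtractor` class, the `extract_links` function, and the prefix-based filter are assumptions chosen for illustration and are not taken from the patent.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Parse a page and collect (href, text) pairs for anchor tags with an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the anchor currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str, keep_prefix: str = "https://") -> list:
    """Parse the page, then filter links by an assumed URL-prefix rule."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return [(href, text) for href, text in parser.links
            if href.startswith(keep_prefix)]
```

For example, calling `extract_links` on a page that contains one absolute link, one relative link, and one anchor without an `href` keeps only the absolute link.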

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F9/50, G06F9/54, G06F16/951
CPC: G06F9/5038, G06F9/546, G06F16/951
Inventor: 姜政伟, 江钧, 李仲举, 贺义通
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI