High-concurrency parallel-computation big data acquisition system and method

A high-concurrency parallel-computing acquisition system technology, applied in database management systems, computing, network data retrieval, etc. It addresses the low real-time performance of existing acquisition architectures and improves data acquisition and data processing capabilities.

Status: Inactive
Publication Date: 2017-12-08
GUANGZHOU TEDAO INFORMATION TECH CO LTD
5 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0002] Existing big data acquisition frameworks mainly rely on web spider technologies such as Heritrix and PySpider to capture Internet data in real time. These acquisition architectures are lightweight, and they do not deliver high real-time performance when sending Internet data to the data center.

Examples

Embodiment Construction

[0035] The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0036] As shown in Figure 1, the present invention provides a high-concurrency parallel-computing big data acquisition system, including: a business data management platform, a dispatch center, a collector, a stream service 19 and a data node. The business data management platform includes a site configuration module 10 and a collection rule base 11; the dispatch center includes a proxy server 15 and at least two distribution servers, such as the firs...

Abstract

The invention discloses a high-concurrency parallel-computation big data acquisition system. The system comprises a business data management platform, a dispatching center, a collector, a streaming server and a data node. The business data management platform comprises a site configuration module, used for configuring a task to be acquired according to different contents and website characteristics, and an acquisition rule base, used for storing the task to be acquired and distributing it to distribution servers. The dispatching center comprises at least two distribution servers used for dividing the task to be acquired into subtasks. The collector executes the subtask crawling operation based on a preset acquisition strategy in order to carry out data acquisition, and sends the acquired data to the streaming server. The streaming server aggregates the acquired data and sends it to the corresponding data node according to the business identification for retrieval and storage. The invention also discloses a high-concurrency parallel-computation big data acquisition method. The system and method improve the acquisition capability and the real-time data processing capability of the big data acquisition system.
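
The data flow in the abstract (task, subtasks, concurrent collection, aggregation, routing by business identification) can be illustrated with a minimal Python sketch. This is not the patented implementation: every name here (`Task`, `Subtask`, `split_task`, `collect`, `aggregate`, `run`) is hypothetical, the round-robin split and the thread pool are assumed stand-ins for the distribution servers and collectors, and `collect` fabricates a record instead of crawling a real site.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    business_id: str  # hypothetical identifier, used later for routing
    urls: tuple       # pages the site configuration asks to collect


@dataclass(frozen=True)
class Subtask:
    business_id: str
    url: str


def split_task(task, n_servers):
    """Distribution step (assumed round-robin): one subtask list per server."""
    shards = [[] for _ in range(n_servers)]
    for i, url in enumerate(task.urls):
        shards[i % n_servers].append(Subtask(task.business_id, url))
    return shards


def collect(subtask):
    """Collector step: placeholder for a real crawl, returns a fake record."""
    return {
        "business_id": subtask.business_id,
        "url": subtask.url,
        "body": f"<html>{subtask.url}</html>",
    }


def aggregate(records):
    """Streaming-server step: group records by business ID, one bucket per data node."""
    nodes = defaultdict(list)
    for record in records:
        nodes[record["business_id"]].append(record)
    return dict(nodes)


def run(tasks, n_servers=2, workers=4):
    """Split every task, collect all subtasks concurrently, then route the results."""
    subtasks = [s for t in tasks for shard in split_task(t, n_servers) for s in shard]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        records = list(pool.map(collect, subtasks))
    return aggregate(records)
```

Calling `run` with a list of `Task` objects returns a dictionary keyed by business ID, with each value holding the records that a data node for that business would store. A thread pool is used here only because crawling is I/O-bound; the source does not say how the collectors are scheduled.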

Description

Technical field

[0001] The invention relates to the fields of dynamic programming, parallel computing and grid computing, and in particular to a high-concurrency parallel-computing big data acquisition system and method.

Background

[0002] Existing big data acquisition frameworks mainly rely on web spider technologies such as Heritrix and PySpider to capture Internet data in real time. These acquisition architectures are all lightweight, and they do not deliver high real-time performance when sending Internet data to the data center.

Contents of the invention

[0003] In view of the above problems, the purpose of the present invention is to provide a high-concurrency parallel-computing big data acquisition system and method that improve the acquisition capability of the big data acquisition framework and the real-time performance of data processing.

[0004] In order to solve the above technical problems, the pre...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/25, G06F16/951
Inventor: 晋彤李永康
Owner: GUANGZHOU TEDAO INFORMATION TECH CO LTD