Data acquisition system

A data acquisition system in the field of data technology, applicable to network data retrieval, network data query, and other database retrieval. It addresses problems such as inconvenient data source management, excessive data volume, and degraded data timeliness, and offers reduced configuration complexity, lower learning costs, and low performance requirements.

Pending Publication Date: 2020-11-24
中央广播电视总台

AI Technical Summary

Problems solved by technology

[0005] Faced with data sources in multiple formats, Flume is complex to configure and makes data source management inconvenient. Because of the large data volume, MapReduce computation degrades data timeliness when reading data from HDFS, and when a Hive query requires a correlated query, Hive cannot support


Examples

Embodiment 1

[0023] Figure 1 shows a schematic structural diagram of the data acquisition system in Embodiment 1 of the present application.

[0024] As shown in the figure, the data collection system includes a data collection service module, a Kafka message queue, a Spark distributed processing and calculation module, and Elasticsearch full-text search engine middleware, wherein:

[0025] The data collection service module is developed in Python. It uses driver classes pre-packaged for different data sources, invoked with parameters, to retrieve the data, processes it, and sends it to the designated partition of the Kafka message queue;

[0026] The Kafka message queue includes multiple partitions for storing different types of data;

[0027] The Spark distributed processing and calculation module uses Spark Streaming to poll the data in Kafka, performs the calculation, and writes the calculated data into the Elasticsearch full-text search engine middleware;

[0028] The Elasticsearch full-text sea...
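
The collection step described in [0025] and [0026] can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the application: the driver classes `RssDriver` and `ApiDriver`, the `DRIVERS` registry, the `PARTITION_BY_TYPE` mapping, and the in-memory `FakeQueue` that stands in for Kafka are all hypothetical names introduced here, and the records are placeholder data.

```python
from collections import defaultdict


class RssDriver:
    """Hypothetical pre-packaged driver for a feed-like source."""
    data_type = "article"

    def fetch(self, **params):
        # A real driver would contact the source; this returns a placeholder
        # record built from the parameters so the sketch runs offline.
        return [{"source": "rss", "url": params["url"], "title": "  Sample headline "}]


class ApiDriver:
    """Hypothetical pre-packaged driver for a JSON API source."""
    data_type = "account_stats"

    def fetch(self, **params):
        return [{"source": "api", "account": params["account"], "followers": 0}]


# Assumed registry: source name -> driver class.
DRIVERS = {"rss": RssDriver, "api": ApiDriver}

# Assumed mapping: data type -> partition number.
PARTITION_BY_TYPE = {"article": 0, "account_stats": 1}


class FakeQueue:
    """In-memory stand-in for a partitioned message queue (not a Kafka client)."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def send(self, partition, message):
        self.partitions[partition].append(message)


def clean(record):
    """Light processing before sending: strip whitespace from string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def collect(queue, source, **params):
    """Pick the driver by source name, fetch with parameters, route by data type."""
    driver = DRIVERS[source]()
    partition = PARTITION_BY_TYPE[driver.data_type]
    for record in driver.fetch(**params):
        queue.send(partition, clean(record))
    return partition


queue = FakeQueue()
collect(queue, "rss", url="https://example.com/feed")
collect(queue, "api", account="demo")
```

Under this design, supporting a new data source means adding one driver class and one registry entry; callers only pass a source name and parameters. In a real deployment `FakeQueue.send` would be replaced by a Kafka producer call that targets an explicit partition.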

Embodiment 2

[0061] In order to facilitate the implementation of the present application, the embodiment of the present application is described with a specific example.

[0062] The embodiment of the present application provides a system for processing a TV station's new media data, comprising a data collection service, a Kafka message queue, Spark distributed processing and calculation, and Elasticsearch full-text search engine middleware.
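
How the polling and calculation stage hands results to the search store can be sketched in plain Python. This is an illustrative stand-in only: a `deque` plays the role of one Kafka partition, a `dict` plays the role of the Elasticsearch store, and the per-account message count is an assumed example calculation. The function names `poll_batch`, `compute`, `write_to_store`, and `run` are hypothetical, and none of this uses the Spark Streaming API.

```python
from collections import Counter, deque


def poll_batch(queue, max_records):
    """Take up to max_records messages from the queue (one micro-batch)."""
    batch = []
    while queue and len(batch) < max_records:
        batch.append(queue.popleft())
    return batch


def compute(batch):
    """Assumed example calculation: count messages per account in the batch."""
    return Counter(msg["account"] for msg in batch)


def write_to_store(store, counts):
    """Merge one batch's results into the store stand-in (a plain dict)."""
    for account, n in counts.items():
        store[account] = store.get(account, 0) + n


def run(queue, store, max_records=2):
    """Poll until the queue is drained; return the number of batches processed."""
    batches = 0
    while queue:
        write_to_store(store, compute(poll_batch(queue, max_records)))
        batches += 1
    return batches


messages = deque(
    [{"account": "a"}, {"account": "b"}, {"account": "a"}, {"account": "a"}, {"account": "b"}]
)
store = {}
batches = run(messages, store)
```

Processing in small batches keeps results in the store close to real time, at the cost of merging partial results on every write.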

[0063] 1. Data collection service

[0064] Figure 2 shows a schematic diagram of the principle of the data collection service in Embodiment 2 of the present application.

[0065] As shown in the figure, the data collection service in this embodiment of the application is developed in Python. Python lets developers focus on the objects being programmed and on the approach to the problem rather than on peripheral concerns such as syntax and types. Its clear and concise syntax also makes it much easier to debug than Java. For operations related to kafka, f...

Abstract

The invention relates to a data acquisition system. The system comprises a data acquisition service module, a Kafka message queue, a Spark distributed processing and calculation module, and Elasticsearch full-text search engine middleware, with the Spark module connected to the Elasticsearch middleware. The data acquisition service module is developed in Python; it uses driver classes pre-packaged for different data sources, invoked with parameters, to retrieve the data, processes the data, and sends it to a specified partition of the Kafka message queue. The Spark distributed processing and calculation module polls the data in Kafka using Spark Streaming and writes the calculated data into the Elasticsearch middleware, which stores the data and creates an index for it. Compared with a mainstream Hadoop platform, the system is lighter and more flexible.
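
To illustrate what "storing data and creating an index" makes possible, the sketch below builds a toy inverted index and answers keyword queries against it. It is a teaching aid under stated assumptions, not Elasticsearch and not the system from the application: `TinyIndex` and `tokenize` are hypothetical names, the documents are placeholders, and the tokenizer handles only lowercase ASCII words and digits.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase ASCII word tokens (toy tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index: term -> set of document ids."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Store the document and record each of its terms in the index."""
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(ids)


index = TinyIndex()
index.add(1, "Evening news broadcast")
index.add(2, "Morning news summary")
index.add(3, "Sports broadcast")
```

Because lookups go from term to documents, a keyword query touches only the posting sets for its terms instead of scanning every stored document.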

Description

Technical field

[0001] The present application relates to broadcasting and television technology, and in particular to a data collection system.

Background

[0002] As new media data sources multiply and the forms in which they provide data diversify, the volume of new media data a TV station must process is extremely large: millions of records every day, with high real-time requirements that approach real-time monitoring of account status. After processing, the system must also be able to match highly relevant articles by keyword and compute statistics. The data processing system must therefore cope with multiple data sources, large data volumes, and strict real-time requirements.

[0003] Existing big data architectures typically use the flume+kafka+mapreduce+hive processing stack. This technology builds complex clusters on the distributed infrastructure, con...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F9/54, G06F16/182, G06F16/953
CPC: G06F9/546, G06F16/182, G06F16/953, G06F2209/547, G06F2209/548
Inventors: 李伟男, 王雪京, 苏超, 王鑫, 乔立新
Owner: 中央广播电视总台