A distributed big data computing engine and an architecture method

A distributed computing engine technology, applied in computing, digital data processing, program control design, etc. It addresses the fixed time granularity of built-in aggregation, the inability to support aggregation at larger granularities, and poor throughput, and it achieves low development, deployment, and maintenance costs, fewer component dependencies, and good real-time performance.

Active Publication Date: 2019-01-11
北京博睿宏远数据科技股份有限公司

AI Technical Summary

Problems solved by technology

[0004] In the existing technology, the big data processing engine is based entirely on in-memory computing, so its throughput is lower than that of traditional batch computing frameworks such as Spark and MapReduce. Its built-in aggregation time granularity is fixed and cannot be changed, and it cannot support aggregation at granularities coarser than one day. MQ support is limited to Kafka, although other MQs may be considered in the future. In addition, the existing technology is only suitable for processing structured time-series metric data and does not support other scenarios such as unstructured big data processing.

Examples

Embodiment 1

[0044] As shown in Figure 1, the distributed data engine framework consists mainly of five parts: a distributed coordination service cluster (ZooKeeper), a message middleware cluster (Kafka), a streaming computing cluster (Storm), a data cache cluster (Redis), and a visual control module. ZooKeeper, Kafka, Storm, and Redis are currently popular open source components. Specifically:

[0045] Distributed coordination service cluster: ZooKeeper is a coordination service that provides efficient, reliable coordination for distributed applications, along with basic distributed services such as configuration services, distributed synchronization, and node monitoring. In the distributed data engine, in addition to maintaining the state of the Kafka and Storm clusters, it also stores the related plug-ins and the business schema.xml configuration files. As a result, plug-ins and schema.xml can take effect dynamically without restarting the t...
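The dynamic-reload behaviour described in [0045] can be illustrated with a watch-and-notify pattern. The sketch below is a minimal in-memory stand-in, not ZooKeeper's API and not the patent's implementation: `ConfigStore`, `Worker`, `watch`, `put`, and the path `/engine/schema.xml` are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Optional


# Hypothetical in-memory stand-in for a coordination service; NOT the ZooKeeper API.
class ConfigStore:
    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[Callable[[str], None]]] = {}

    def watch(self, path: str, callback: Callable[[str], None]) -> None:
        """Register a callback invoked every time `path` is written."""
        self._watchers.setdefault(path, []).append(callback)

    def put(self, path: str, value: str) -> None:
        """Store a new value and notify every watcher of that path."""
        self._data[path] = value
        for callback in self._watchers.get(path, []):
            callback(value)


class Worker:
    """A long-running task that swaps its schema in place instead of restarting."""

    def __init__(self, store: ConfigStore, path: str) -> None:
        self.schema: Optional[str] = None
        self.reloads = 0
        store.watch(path, self._on_change)

    def _on_change(self, new_schema: str) -> None:
        # Apply the new configuration without stopping the worker.
        self.schema = new_schema
        self.reloads += 1


store = ConfigStore()
worker = Worker(store, "/engine/schema.xml")  # hypothetical path
store.put("/engine/schema.xml", "<schema version='1'/>")
store.put("/engine/schema.xml", "<schema version='2'/>")
```

After the two writes, the worker holds the second schema version and has reloaded twice without being recreated, which is the property the paragraph attributes to storing plug-ins and schema.xml in the coordination service.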

Embodiment 2

[0061] As shown in Figure 2, implementing the architecture includes the following steps:

[0062] (1) Define the source data format. Since the architecture itself places no restrictions on the format of the source data, data sent to this architecture must be packaged in a unified format and marked with a data timestamp.

[0063] (2) Configure the schema.xml file, which holds the specific processing rules for each kind of business data in the data source. The operation and processing rules for all data metrics and dimensions are described in this file.

[0064] (3) Develop the data preprocessing plug-in by implementing the provided data preprocessing plug-in interface class. The plug-in runs in the data preprocessing topology and is responsible for applying a specific cleaning strategy to each piece of raw data. When developing this plug-in, a cleaning strategy must be applied to each raw record received according to the configuration items in schema.xml, and th...
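Steps (1) and (3) above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's actual interface: the envelope fields (`business`, `timestamp`, `payload`), the `PreprocessPlugin` class, its `clean` method, and the `required_fields` rule are all hypothetical, and the `rules` dictionary merely stands in for configuration items that would be read from schema.xml.

```python
import json
import time
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


def wrap_record(business: str, payload: Dict[str, Any],
                timestamp_ms: Optional[int] = None) -> str:
    """Package a raw record in one envelope format and stamp it with a data timestamp."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return json.dumps({"business": business, "timestamp": timestamp_ms, "payload": payload})


class PreprocessPlugin(ABC):
    """Hypothetical plug-in interface; the patent's real interface class is not shown."""

    @abstractmethod
    def clean(self, record: Dict[str, Any], rules: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Return the cleaned record, or None to drop it."""


class RequiredFieldsPlugin(PreprocessPlugin):
    """Example cleaning strategy: drop records missing any required payload field."""

    def clean(self, record: Dict[str, Any], rules: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        required = rules.get("required_fields", [])
        if any(field not in record["payload"] for field in required):
            return None
        return record


# `rules` stands in for configuration items parsed from schema.xml (assumed shape).
rules = {"required_fields": ["host", "latency_ms"]}
plugin = RequiredFieldsPlugin()
good = json.loads(wrap_record("web", {"host": "a", "latency_ms": 12}, timestamp_ms=1546300800000))
bad = json.loads(wrap_record("web", {"host": "a"}, timestamp_ms=1546300800000))
```

Calling `plugin.clean(good, rules)` returns the record unchanged, while `plugin.clean(bad, rules)` returns `None` because `latency_ms` is missing. Keeping the cleaning rule in configuration rather than in the plug-in body mirrors the text's point that the plug-in acts on the configuration items in schema.xml.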

Abstract

The invention discloses a distributed big data computing engine and an architecture method. The computing engine includes: a distributed coordination service cluster, which provides coordination services for distributed applications and stores the related plug-ins and the schema configuration files of business data objects; a message middleware cluster, which transmits different types of business data, including raw data, calculation results, snapshot data, baseline data, and alarm data; a streaming computing cluster, which is built on the Storm computing framework and abstracts the processing of time-series metric big data into several stages; a visual control module, which displays and manages the data through a web interface; and a data cache cluster, which reduces the memory overhead of the streaming computing cluster during large-batch computation. The invention helps enterprises with little accumulated big data expertise, or project teams short on time and manpower, to implement online stream processing of massive time-series metric data conveniently and quickly.

Description

Technical field

[0001] The invention relates to a computing engine architecture, in particular to a distributed big data computing engine and an architecture method.

Background technique

[0002] At present, more and more companies have realized the importance of big data for their future development, so they have begun to use, and gradually rely on, big data processing technologies. However, as the volume of data to be processed grows and business scenarios become more complex, many problems arise in actual implementation. For example, a shortage of big data talent leads to high labor costs, a lack of accumulated experience with the relevant technologies, and difficulty in building a team in the short term. At the same time, the requirements of different business departments vary widely, which leads to repeated development of code across projects, reinventing the wheel, and various technical architectures of ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/54, H04L29/08
CPC: G06F9/546, H04L67/02
Inventors: 程捷, 张念礼, 罗俊
Owner: 北京博睿宏远数据科技股份有限公司