ETL system and method based entirely on distributed in-memory computing

An ETL system built on in-memory computing and distributed technology. It addresses problems such as the inability to support distributed storage systems, insufficient architectural flexibility, and the inability to release memory, with the effect of simplifying data management, improving data processing performance, and improving flexibility.

Inactive Publication Date: 2018-07-20
广东奡风科技股份有限公司

AI Technical Summary

Problems solved by technology

However, the Hadoop+Spark architecture has many deficiencies. 1. Spark's memory allocation logic has an obvious defect: when memory is insufficient, it cannot release memory and can only terminate itself, which ends the task.
Memory consumers cannot ask other memory consumers t


Examples


Embodiment 1

[0033] The ETL system based entirely on distributed in-memory computing is an ETL product built on a memory-centric virtual distributed storage system and distributed in-memory parallel computing. The system runs Spark and Alluxio in standalone mode. Alluxio provides the underlying support and storage platform, and can store data in memory or on local storage such as SSD and HDD. Spark's core components form the data processing framework, and Spark's advanced DAG execution engine and powerful memory-based multi-round iterative computing are used to process the source data in depth.
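As a rough illustration of the arrangement described in [0033], the sketch below shows how a Spark job can address data held in Alluxio through the `alluxio://` URI scheme. It is written in Python (PySpark) for brevity, whereas the system described here is written in Scala. The host names, paths, application name, and the choice of Parquet are assumptions made for the example and do not come from the patent. It also assumes that the Alluxio client jar is on Spark's classpath.

```python
def alluxio_uri(master_host, path, port=19998):
    """Build an alluxio:// URI. 19998 is Alluxio's default master RPC port."""
    return f"alluxio://{master_host}:{port}/{path.lstrip('/')}"


def build_session(spark_master_url):
    """Connect to a standalone Spark master, e.g. spark://host:7077."""
    # Imported lazily so alluxio_uri() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(spark_master_url)
        .appName("etl-sketch")  # hypothetical application name
        .getOrCreate()
    )


def copy_dataset(spark, src_uri, dst_uri):
    """Read a Parquet dataset from one URI and write it to another."""
    df = spark.read.parquet(src_uri)
    df.write.mode("overwrite").parquet(dst_uri)


# Hypothetical locations inside the Alluxio namespace.
SOURCE = alluxio_uri("alluxio-master", "/etl/raw/orders")
TARGET = alluxio_uri("alluxio-master", "/etl/clean/orders")
```

With a cluster available, `copy_dataset(build_session("spark://spark-master:7077"), SOURCE, TARGET)` would read from and write to Alluxio, which decides whether the blocks reside in memory, on SSD, or on HDD.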

[0034] The system is written in Scala, an object-oriented, statically typed functional programming language that runs on the JVM. It is fast, has a concise API, and integrates easily with Alluxio and YARN. The Spark kernel itself is developed in Scala. The system integrates closely with Spark and works directly against the Spark kernel, which impro...

Abstract

Provided is an ETL system based entirely on distributed in-memory computing. The system comprises five functional modules (a data extraction module, a data processing module, a data integration module, a data output module, and a metadata management module) and one ETL job flow engine. The metadata management module outputs a metadata control file for use by the ETL job flow engine. The engine reads the metadata control file, computes the number of the layer in which each node is located, and groups all job nodes into layers according to their layer numbers. The engine then runs the ETL jobs of each layer in sequence, following an execution path from the lower layers to the higher layers, until all ETL jobs in the ETL job flow diagram have been executed. Because the system is based on Spark and Alluxio, the elasticity of the system framework is improved, data management is simplified, and data processing performance is improved.
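The layering step in the abstract can be sketched as follows. The excerpt does not give the exact rule for computing layer numbers, so this example assumes a common one: a node with no upstream nodes is in layer 0, and every other node is one layer above its highest upstream node. The function names, the dictionary format standing in for the metadata control file, and the job names are all hypothetical. Python is used for brevity, whereas the described system is written in Scala.

```python
from collections import defaultdict


def compute_layers(dependencies):
    """Map each node to its layer number.

    `dependencies` maps a node to the list of nodes it depends on.
    Assumed rule: layer 0 for nodes without upstream nodes, otherwise
    1 + the highest layer among the upstream nodes.
    """
    layers = {}
    visiting = set()

    def layer_of(node):
        if node in layers:
            return layers[node]
        if node in visiting:
            raise ValueError(f"cycle detected at node {node!r}")
        visiting.add(node)
        upstream = dependencies.get(node, [])
        layers[node] = 1 + max((layer_of(u) for u in upstream), default=-1)
        visiting.discard(node)
        return layers[node]

    for node in dependencies:
        layer_of(node)
    return layers


def group_by_layer(layers):
    """Return the nodes as a list of layers, lowest layer first."""
    grouped = defaultdict(list)
    for node, layer in layers.items():
        grouped[layer].append(node)
    return [sorted(grouped[k]) for k in sorted(grouped)]


def run_flow(dependencies, run_node):
    """Run every node, layer by layer, from the lowest layer upward."""
    for batch in group_by_layer(compute_layers(dependencies)):
        for node in batch:
            run_node(node)


# Hypothetical job flow: two extractions, a cleaning step, a join, an output.
flow = {
    "extract_orders": [],
    "extract_customers": [],
    "clean_orders": ["extract_orders"],
    "join": ["clean_orders", "extract_customers"],
    "output": ["join"],
}
```

Calling `run_flow(flow, print)` prints the job names in an order that respects every dependency. Nodes in the same layer do not depend on each other, so an engine could also submit each layer's jobs in parallel.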

Description

Technical field

[0001] The present application relates to an ETL system, and in particular to an ETL system based entirely on distributed in-memory computing and a method thereof.

Background

[0002] The explosive growth of data and the development of big data applications have created unprecedented opportunities for ETL (data extraction, transformation, and loading) software. Traditional ETL software is mostly based on a stand-alone architecture. When it processes massive data, it hits bottlenecks in IO throughput and system resources, and scaling it is difficult and expensive. With the emergence of distributed technology, a new generation of ETL software based on Hadoop and Spark uses the Hadoop distributed file system HDFS as the storage layer, uses the distributed in-memory computing framework Spark as the computing layer, and utilizes an advanced DAG execution engine and powerful memory-based multi-round iterative computin...


Application Information

IPC(8): G06F17/30
CPC: G06F16/182, G06F16/215, G06F16/2445, G06F16/254, G06F16/258
Inventors: 陈涛, 黄卓凡, 张志聪, 李笋, 林志广
Owner: 广东奡风科技股份有限公司