Big-data parallel computing method and system based on distributed columnar storage

A technology combining distributed columnar storage with parallel computing, applied in the field of big data processing. It addresses slow computing speed, reduces time consumption, improves data query efficiency, and supports real-time query analysis.

Inactive Publication Date: 2017-11-07
SOUTH CHINA UNIV OF TECH
4 Cites, 44 Cited by

AI Technical Summary

Problems solved by technology

[0005] In addition, traditional serial computing can no longer meet the needs of real-time query and analysis of big data, because the serial computing method requires tasks to be performed one by one in chronological o



Examples


Embodiment Construction

[0037] The present invention will be further described below in conjunction with specific examples.

[0038] The big-data parallel computing method and system based on distributed columnar storage provided by this embodiment makes full use of the in-memory query performance of the cloud server cluster and the advantages of columnar storage. It avoids both the latency caused by reading HDFS file system data directly at query time and the redundant data transmission caused by row storage, which greatly improves data reading efficiency. In addition, the solution runs a Spark-based parallel computing framework on top of NoSQL-based columnar storage to further improve the efficiency of real-time query analysis through parallel computing. At the same time, because distributed clusters are scalable, the distributed architecture can meet the elastic scaling requirements of massive data storage. The hierarchical structure of this scheme is as follows ...
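The idea in this paragraph, reading only the needed column and aggregating its partitions in parallel, can be illustrated with a minimal sketch. This is not the patented implementation: the sample records, the field names, and the helpers `to_columns`, `partition`, and `parallel_sum` are hypothetical, and a local thread pool stands in for the Spark executors and cluster partitions that the source describes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample records; the field names are illustrative only.
ROWS = [
    {"user": "a", "region": "north", "amount": 10},
    {"user": "b", "region": "south", "amount": 20},
    {"user": "c", "region": "north", "amount": 5},
    {"user": "d", "region": "east", "amount": 7},
    {"user": "e", "region": "south", "amount": 8},
    {"user": "f", "region": "east", "amount": 50},
]


def to_columns(rows):
    """Pivot row-oriented records into a column-oriented dict of lists."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def partition(values, n_parts):
    """Split one column into contiguous chunks (a stand-in for cluster partitions)."""
    size = max(1, -(-len(values) // n_parts))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(column, n_parts=2):
    """Sum each chunk on its own worker, then combine the partial sums."""
    chunks = partition(column, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)


COLUMNS = to_columns(ROWS)
# Only the "amount" column is touched; "user" and "region" are never read,
# which is the redundant transfer that a row layout would incur.
TOTAL = parallel_sum(COLUMNS["amount"], n_parts=2)
```

In a real deployment the per-chunk work would run on separate cluster nodes; the thread pool here only demonstrates the split, compute, and combine pattern.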



Abstract

The invention discloses a big-data parallel computing method and system based on distributed columnar storage. The data that is currently accessed most often is stored in memory-based NoSQL columnar storage, which acts as a cache optimization and enables fast data queries. A distributed cluster architecture meets big-data storage demands and allows the data storage capacity to scale dynamically. Combined with a Spark-based parallel computing framework, data analysis and business-layer operations run in parallel, which increases computing speed. A chart engine provides a real-time data visualization experience for large-screen rolling analysis. The method and system make full use of the memory processing performance and parallel computing advantages of distributed cloud servers, overcome the performance bottlenecks of a single server and of serial computing, avoid redundant data transmission between data nodes, increase the real-time response speed of the system, and achieve fast big-data analysis.
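The in-memory caching of frequently accessed data described in the abstract can be sketched as a small cache placed in front of a slower backing store. The class name `HotDataCache`, its parameters, and the least-recently-used eviction policy are assumptions made for illustration; the source only states that the most often accessed data is kept in memory-based NoSQL columnar storage and does not specify an eviction policy. A plain dict stands in for the slower persistent store.

```python
from collections import OrderedDict


class HotDataCache:
    """Hypothetical in-memory cache in front of a slower store (LRU eviction assumed)."""

    def __init__(self, backing_store, capacity=2):
        self.backing_store = backing_store  # stand-in for slower persistent storage
        self.capacity = capacity            # maximum number of entries held in memory
        self.cache = OrderedDict()          # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            # Cache hit: mark the entry as most recently used.
            self.cache.move_to_end(key)
            self.hits += 1
            return self.cache[key]
        # Cache miss: fall back to the backing store and keep a copy in memory.
        self.misses += 1
        value = self.backing_store[key]
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # Evict the least recently used entry.
            self.cache.popitem(last=False)
        return value


store = {"col_a": [1, 2], "col_b": [3, 4], "col_c": [5, 6]}
hot = HotDataCache(store, capacity=2)
for name in ["col_a", "col_b", "col_a", "col_c"]:
    hot.get(name)
```

After this access sequence, `col_b` has been evicted because it was the least recently used entry, while the repeated `col_a` lookup was served from memory.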

Description

Technical field

[0001] The present invention relates to the technical field of big data processing, in particular to a big-data parallel computing method and system based on distributed columnar storage.

Background technique

[0002] The rapid development of the Internet and the continuous upgrading and replacement of hardware have caused the data held by organizations such as governments and enterprises to grow explosively and gradually reach massive scale. Faced with the storage and processing requirements of massive data, traditional relational databases, which operate mainly on tables and data rows, increasingly fail to meet user needs and even restrict the storage and processing of massive data. Therefore, relying solely on traditional storage technology cannot meet the development and needs of the times. It is necessary to establish a new big data storage technology based on traditional processing technology to ensure that data sto...


Application Information

IPC(8): G06F17/30, G06F9/50
CPC: G06F9/5088, G06F16/2219, G06F16/24532
Inventor: 张星明, 陈霖, 王昊翔, 梁桂煌, 古振威, 吴世豪
Owner SOUTH CHINA UNIV OF TECH