
MPP engine-based cross-data center rapid query method and system

A cross-data-center query method, applied in the fields of database indexing, multi-dimensional databases, and database models. It addresses the problems that relational databases are not easy to expand and that Hive's problem domain is narrow because many problems cannot be expressed through SQL. Its effects are to ensure global consistency, reduce analysis time, and avoid disk I/O operations.

Active Publication Date: 2017-08-18
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT

AI Technical Summary

Problems solved by technology

[0007] Hive and relational databases differ in several ways. They store files differently: Hive uses HDFS (the Hadoop distributed file system), while a relational database uses the server's local file system. They compute differently: Hive uses the MapReduce computing model, while a relational database uses a computing model of its own design. They suit different workloads: a relational database suits real-time query services, while Hive suits mining of massive data. Finally, because it is built on Hadoop, Hive can easily expand its storage scale and computing power, while a relational database is not easy to expand.
[0008] Because Hive uses SQL, its problem domain is narrower than that of MapReduce: many problems, such as some data mining, recommendation, and image recognition algorithms, cannot be expressed in SQL and can only be solved by writing MapReduce programs.



Examples


Embodiment Construction

[0041] The present invention will be further described below through specific embodiments and accompanying drawings.

[0042] The content of the present invention mainly includes the following aspects.

[0043] First, regarding metadata identification, the present invention adopts a unified metadata identification: the Hive metadata component is used to uniformly mark the data held in different MPP engines, including the storage structure and storage type of each data table, so that the corresponding MPP engine can be used for fast queries. The different MPP engines described in the present invention include Hive, Spark, HBase, etc.; these engines are integrated together, centrally scheduled, and used in an MPP manner.
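
The unified marking described in [0043] can be pictured as a catalog that records, for each table, which engine holds it and how it is stored, and then routes a query to that engine. The following is only a minimal sketch under stated assumptions: `TableMeta`, `MetadataCatalog`, and `route_query` are hypothetical names, the engines are plain Python callables standing in for Hive, Spark, or HBase clients, and nothing here uses the real Hive metastore API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical metadata record for one table."""
    table: str
    engine: str             # which engine holds the table, e.g. "hive"
    storage_structure: str  # illustrative label, e.g. "columnar"
    storage_type: str       # illustrative label, e.g. "orc"


class MetadataCatalog:
    """Hypothetical stand-in for the unified metadata component."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableMeta] = {}

    def register(self, meta: TableMeta) -> None:
        # Mark a table with its engine and storage description.
        self._tables[meta.table] = meta

    def lookup(self, table: str) -> TableMeta:
        return self._tables[table]


def route_query(catalog: MetadataCatalog, table: str, sql: str,
                engines: Dict[str, Callable[[str], object]]) -> object:
    """Run `sql` on whichever engine the catalog records for `table`."""
    meta = catalog.lookup(table)
    return engines[meta.engine](sql)
```

A caller would register each table once and then pass a dictionary of engine callables to `route_query`; looking up an unregistered table raises `KeyError` in this sketch.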

[0044] Second, in terms of data transmission, the present invention provides efficient and reliable data transmission through confirmation, retransmission and other mechanisms. In the data center, query through JDBC / ODB...
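
The confirmation and retransmission mechanism mentioned in [0044] is not detailed in the text available here. As a rough illustration only, the sketch below assumes a simple stop-and-wait scheme: each chunk is resent until the receiver confirms it or a retry limit is reached. `send_with_retransmit` and `FlakyReceiver` are hypothetical names, and the receiver is a test double that deliberately drops the first delivery of every chunk.

```python
from typing import Callable, Dict, Sequence, Set


def send_with_retransmit(chunks: Sequence[bytes],
                         send: Callable[[int, bytes], bool],
                         max_retries: int = 3) -> int:
    """Send each chunk until it is acknowledged; return total send attempts."""
    attempts = 0
    for seq, chunk in enumerate(chunks):
        for _ in range(max_retries + 1):
            attempts += 1
            if send(seq, chunk):  # True means the receiver confirmed `seq`
                break
        else:
            # Every attempt for this chunk went unacknowledged.
            raise ConnectionError(f"chunk {seq} was not acknowledged")
    return attempts


class FlakyReceiver:
    """Test double that loses the first delivery of every chunk."""

    def __init__(self) -> None:
        self.received: Dict[int, bytes] = {}
        self._seen: Set[int] = set()

    def deliver(self, seq: int, chunk: bytes) -> bool:
        if seq not in self._seen:
            self._seen.add(seq)
            return False  # simulate a lost packet or a missing confirmation
        self.received[seq] = chunk
        return True
```

Keying the received data by sequence number means a chunk that is delivered twice simply overwrites itself, which is one common way to keep retransmission safe.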


Abstract

The invention relates to an MPP engine-based cross-data center rapid query method and system. It belongs to the field of big data retrieval and analysis and can be applied to real-time systems or offline backup systems. The method comprises the following steps: uniformly marking the data in different MPP engines and storing the marks as metadata; receiving a query request at a global center node, performing syntax analysis on the request, and sending it to the corresponding data branch center nodes; having each data branch center node run the query with the MPP engine indicated by the metadata and transmit the resulting data to the global center node; and having the global center node use the MPP engines to rapidly query the data returned by the data branch center nodes and output the query result. The method and system enable joint queries over data held in different data centers, are compatible with multiple MPP engines, exploit the characteristics of different storage modes to optimize queries, and support exporting query results in several ways, which makes it convenient for different upper-layer applications to re-analyze the result data according to their own requirements.
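
The flow in the abstract (send a sub-query to each branch, collect the returned data at the center, query the collected data again) can be sketched as a scatter-gather aggregation. This is an illustration under stated assumptions, not the patented implementation: in-memory SQLite databases stand in for the MPP engines at both the branch and global nodes, the `events` and `partials` tables are invented for the example, and the syntax analysis and routing steps are omitted.

```python
import sqlite3
from typing import List, Sequence, Tuple


def make_branch(rows: Sequence[Tuple[str, int]]) -> sqlite3.Connection:
    """Build an in-memory database standing in for one data branch center."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (region TEXT, hits INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn


def global_query(branches: Sequence[sqlite3.Connection]) -> List[Tuple[str, int]]:
    """Run a partial aggregation on each branch, then merge at the center."""
    partial_sql = "SELECT region, SUM(hits) FROM events GROUP BY region"
    center = sqlite3.connect(":memory:")
    center.execute("CREATE TABLE partials (region TEXT, hits INTEGER)")
    for branch in branches:
        # Each branch returns only its aggregated rows, not its raw data.
        rows = branch.execute(partial_sql).fetchall()
        center.executemany("INSERT INTO partials VALUES (?, ?)", rows)
    final_sql = ("SELECT region, SUM(hits) FROM partials "
                 "GROUP BY region ORDER BY region")
    return center.execute(final_sql).fetchall()
```

Aggregating at the branch before transfer keeps the data sent between centers small; this works here because a sum of partial sums equals the overall sum, which does not hold for every aggregate.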

Description

Technical field

[0001] The present invention relates to a cross-data center rapid query technology based on an MPP (Massively Parallel Processing) engine, and in particular to the key technologies of unified metadata identification, reliable and fast data transmission, an optimized query analysis engine, and support for multiple result export methods. It belongs to the field of big data retrieval.

Background technique

[0002] With the continuous popularization of network and information technology, the amount of data generated by human beings is increasing exponentially. It doubles approximately every two years, and according to monitoring, this rate will continue until 2020. This means that the amount of data generated by humans in the last two years is equivalent to the entire amount of data generated before. It can be predicted that global data will reach 40ZB by 2020. The emergence of a large number of new data sources has led to the explosive growth of unstructured and semi...


Application Information

IPC(8): G06F17/30
CPC: G06F16/22, G06F16/24524, G06F16/24532, G06F16/24542, G06F16/2455, G06F16/248, G06F16/283
Inventor: 毕慧, 付戈, 李超, 王振宇, 李斌斌, 王树鹏
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT