Mixed storage system and mixed storage method for supporting Hive DML (data manipulation language) enhancement

A hybrid storage technology in the field of electrical digital data processing. Existing Hive optimization work does not cover DML operations in the enhanced SQL interface and therefore cannot solve the data change problem; this technology achieves efficient data change operations and efficient data query operations.

Inactive Publication Date: 2014-03-26
INST OF COMPUTING TECH CHINESE ACAD OF SCI +1
7 Cites, 17 Cited by

AI Technical Summary

Problems solved by technology

[0009] None of the Hive-related optimization work and related systems described above involves DML operations in the enhanced SQL interface, so none of them can solve the data change problems encountered in enterprise-level big data processing scenarios.

Examples

Embodiment Construction

[0061] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments, but not as a limitation of the present invention.

[0062] Firstly, the terminology related to the system is introduced.

[0063] The formal description of DualTable is: DualTable := <ID Mechanism, Master Table, Attached Table, Operation Set, Cost Model>. Here, ID Mechanism is the mechanism that maintains a unique ID for each row of data; the unique ID of a row is used to link the two storage systems. Master Table (main table) is a storage system that supports efficient sequential reads. Attached Table (attached table) is a storage system that stores changed data. Operation Set is the set of operations provided by DualTable, defined as: Operation Set := Union Read (read) | UPDATE (update) | DELETE (delete) | INSERT INTO (insert) | CREATE (table creation) | DROP (table deletion) | LOAD (data import) | COMPACT (data merging). Cost Model is a cost model that supports the implementation of read and write operations.
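
To make the roles of these components concrete, the following is a minimal in-memory sketch in Python, not the patented implementation. The class name DualTableSketch, its method names, the use of None as a deletion marker, and the integer counter used as the ID mechanism are all assumptions made for illustration. The cost model is not modeled, because the text above does not describe it, and INSERT INTO, CREATE and DROP are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, object]
_DELETED = None  # hypothetical tombstone: marks a row as deleted in the attached table


@dataclass
class DualTableSketch:
    """Toy stand-in for a DualTable (illustrative names, not from the patent)."""
    # Master Table: append-only, scanned sequentially.
    master: List[Tuple[int, Row]] = field(default_factory=list)
    # Attached Table: row ID -> changed row, or the tombstone for a delete.
    attached: Dict[int, Optional[Row]] = field(default_factory=dict)
    # ID Mechanism (assumed): a monotonically increasing counter.
    next_id: int = 0

    def load(self, rows: Iterable[Row]) -> None:
        """LOAD: bulk-import rows into the master table, assigning unique IDs."""
        for row in rows:
            self.master.append((self.next_id, dict(row)))
            self.next_id += 1

    def _union_read_with_ids(self) -> Iterator[Tuple[int, Row]]:
        """Scan the master table, overlaying changes found in the attached table."""
        for row_id, row in self.master:
            if row_id in self.attached:
                changed = self.attached[row_id]
                if changed is not _DELETED:
                    yield row_id, changed
            else:
                yield row_id, row

    def union_read(self) -> List[Row]:
        """Union Read: the current logical contents of the table."""
        return [row for _, row in self._union_read_with_ids()]

    def update(self, predicate: Callable[[Row], bool], changes: Row) -> None:
        """UPDATE: write new row versions to the attached table only."""
        for row_id, row in list(self._union_read_with_ids()):
            if predicate(row):
                self.attached[row_id] = {**row, **changes}

    def delete(self, predicate: Callable[[Row], bool]) -> None:
        """DELETE: record a tombstone in the attached table only."""
        for row_id, row in list(self._union_read_with_ids()):
            if predicate(row):
                self.attached[row_id] = _DELETED

    def compact(self) -> None:
        """COMPACT: fold the attached table back into the master table."""
        self.master = list(self._union_read_with_ids())
        self.attached.clear()
```

In this sketch, UPDATE and DELETE never rewrite the master table; they only add entries keyed by row ID to the attached table, and COMPACT is the one operation that rewrites the master data.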

[0064] The...

Abstract

The invention discloses a mixed storage system and a mixed storage method for supporting Hive DML (Data Manipulation Language) enhancement. The storage system is built on DualTable and comprises a DualTable establishing module and a DualTable execution module. The establishing module creates the DualTable, which comprises a main table and an attached table, maintains a unique ID for each row of data, and exposes a specific operation set based on a cost model. The execution module reads and queries data in the DualTable using both the main table and the attached table, applies data changes through the attached table, implements the operation set provided by the DualTable based on the cost model, and, when data are read, merges the main table and the attached table using the ordered unique ID of each row.
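
The read-time merge on ordered row IDs mentioned in the abstract can be pictured as a single-pass merge of two ID-sorted streams. The Python sketch below is an illustration under stated assumptions rather than the method claimed in the patent: the function name merge_on_read, the (row_id, row) tuple layout, and the use of None to represent a deleted row are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Row = Dict[str, object]


def merge_on_read(
    master: Iterable[Tuple[int, Row]],
    attached: Iterable[Tuple[int, Optional[Row]]],
) -> Iterator[Tuple[int, Row]]:
    """Merge two streams that are both sorted by ascending row ID.

    `attached` holds changed rows; a value of None (assumed convention)
    means the row with that ID was deleted.
    """
    attached_iter = iter(attached)
    pending = next(attached_iter, None)
    for row_id, row in master:
        # Advance past attached entries whose IDs precede the current master row.
        while pending is not None and pending[0] < row_id:
            pending = next(attached_iter, None)
        if pending is not None and pending[0] == row_id:
            # A change exists for this ID: emit the new version, or skip if deleted.
            if pending[1] is not None:
                yield row_id, pending[1]
        else:
            # No change recorded: emit the master row untouched.
            yield row_id, row
```

Because both inputs are consumed in ID order, each is read exactly once and no random lookups into the attached data are needed during the scan.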

Description

Technical field

[0001] The invention relates to the development and realization of a hybrid storage system that enables Apache Hive to efficiently implement DML (Data Manipulation Language) operations in a big data environment, and in particular relates to a hybrid storage system that supports Hive DML enhancement.

Background technique

[0002] Hive provides a SQL-like interface for Hadoop-based data analysis, which reduces the amount of MapReduce development and facilitates the conversion from the existing relational data warehouse that provides SQL interfaces to the Hadoop ecosystem. Hive can map the user-defined data table schema to the underlying data storage, and implement data query and operation based on MapReduce. However, due to weak DML support, Hive cannot give full play to its performance in enterprise-level big data processing. Academia and industry have done a lot of Hive optimization work, including query scheme optimization, execution system optimization, st...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/2282
Inventor: 黄硕, 虎嵩林, 梁英, 谷丹阳, 吴凯锋, 李祥珍, 洪建光, 张春光, 肖政, 裴旭斌, 衡星辰, 崔蔚
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI