Row and column storage method and system of tree-shaped data

A row and column storage technology, applied in the field of data processing, that addresses insufficient encoding and query efficiency and low system functionality and usability, and improves efficiency.

Active Publication Date: 2017-08-18
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0015] 5. In the key-value pair mentioned in 2 above, the value of the key can only be of type (string)
[0045] 2) NoSQL data processing system is not efficient en



Examples


Embodiment Construction

[0105] In view of the above shortcomings of the prior art, the present invention redesigns and implements a semi-structured data processing system called STEED. The following describes the overall architecture of the STEED system and the functional requirements of each module, then analyzes the interface definitions between these modules, and briefly explains how STEED processes and stores data internally.

[0106] As shown in Figure 3, STEED is mainly composed of three modules:

[0107] (1) Data parsing module:

[0108] Reads text data, parses it into row or column binary format data, and stores it in the data storage module. During parsing, a syntax tree is dynamically generated to store the definition of the semi-structured data. When parsing data in JSON format, because JSON does not define a corresponding data format (syntax tree, schema tree), the present invention can only dynamically generate the definition of the data format in the process of p...
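The dynamic generation of a schema tree during parsing can be illustrated with a minimal Python sketch. It is not STEED's implementation: the node layout (`kinds`, `repeated`, `children`) and the function names `new_node`, `merge_schema`, and `infer_schema` are hypothetical, and the input is assumed to be one JSON object per line.

```python
import json


def new_node():
    # Hypothetical schema-tree node: observed value kinds, a repeated flag,
    # and child nodes keyed by field name.
    return {"kinds": set(), "repeated": False, "children": {}}


def merge_schema(node, value):
    """Fold one parsed JSON value into the schema-tree node."""
    if isinstance(value, dict):
        node["kinds"].add("object")
        for key, child in value.items():
            child_node = node["children"].setdefault(key, new_node())
            merge_schema(child_node, child)
    elif isinstance(value, list):
        # An array marks the field as repeated; its items share this node.
        node["repeated"] = True
        for item in value:
            merge_schema(node, item)
    else:
        # Primitive leaf: remember the Python type name that was seen.
        node["kinds"].add(type(value).__name__)
    return node


def infer_schema(lines):
    """Build a schema tree incrementally while reading JSON text records."""
    root = new_node()
    for line in lines:
        merge_schema(root, json.loads(line))
    return root


records = ['{"id": 1, "tags": ["a", "b"]}', '{"id": 2, "user": {"name": "x"}}']
schema = infer_schema(records)
```

Because the tree grows as records arrive, a field that first appears in a later record (here `user`) is simply added as a new child without re-reading earlier data.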


Abstract

The invention provides a row and column storage method and system for tree-structured data. The method reads text data with a tree structure and parses it into a row or column binary format for storage. During parsing, a syntax tree is generated dynamically and the definition of the semi-structured data is stored. During a query, STEED completes the query operations by combining the structure information of the original data read from the syntax tree with the contents of the binary data. In the row storage structure, the record is the unit, and a nested substructure is defined internally to represent the nesting and repeated fields of the semi-structured data. In the column storage, each path from the root to a leaf in the syntax tree is the unit, and the values of that path in all records and the structure information of that path are stored independently. By analyzing the storage structure of semi-structured data, the method and system simplify the data storage structure and improve storage efficiency.
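As a rough illustration of the column layout, the Python sketch below splits records into one column per root-to-leaf path. It is a simplification under stated assumptions: the function name `shred` is hypothetical, and the record index stored next to each value is only a stand-in for the path's structure information, whose actual encoding is not described in this excerpt.

```python
from collections import defaultdict


def shred(records):
    """Split nested records into one column per root-to-leaf path.

    Each column holds (record_index, value) pairs. The record index is an
    assumed, simplified substitute for per-path structure information.
    """
    columns = defaultdict(list)

    def walk(rec_id, path, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(rec_id, path + (key,), child)
        elif isinstance(value, list):
            # Repeated field: every item contributes to the same path column.
            for item in value:
                walk(rec_id, path, item)
        else:
            columns[".".join(path)].append((rec_id, value))

    for rec_id, record in enumerate(records):
        walk(rec_id, (), record)
    return dict(columns)


records = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "user": {"name": "x"}}]
cols = shred(records)
```

A query that touches only `id` can then scan `cols["id"]` without reading `tags` or `user.name`, whereas a row layout would keep each record's fields together.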

Description

Technical field

[0001] The present invention relates to the technical field of data processing, and in particular to a row and column storage method and system for tree-structured data.

Background art

[0002] With the development of computer networks and big data processing technology, traditional relational data has become increasingly unable to meet the requirements for data definition and use in network and big data environments. Semi-structured data represented by JSON and Protocol Buffers can not only fully express the object data of programming languages, but also modify and extend the original data format as that format changes, so it is widely used in practice.

[0003] Definition of tree-structured data:

[0004] T_value = T_primitive | T_object | T_array

[0005] T_primitive = string | number | boolean | null

[0006]

[0007]

[0008] Record = T_object

[0009] As shown above, the tree structure data is defined as follows: [0010...
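The type definitions that survive in this excerpt can be checked with a short Python sketch. The definitions of the object and array types are not present here, so the sketch assumes the usual JSON convention (an object maps string keys to values, an array is a list of values); the function names are hypothetical.

```python
def is_primitive(v):
    # T_primitive = string | number | boolean | null
    return v is None or isinstance(v, (str, int, float, bool))


def is_value(v):
    # T_value = T_primitive | T_object | T_array
    if is_primitive(v):
        return True
    if isinstance(v, dict):
        # Assumed JSON-style object: string keys mapped to values.
        return all(isinstance(k, str) and is_value(c) for k, c in v.items())
    if isinstance(v, list):
        # Assumed JSON-style array: a list of values.
        return all(is_value(item) for item in v)
    return False


def is_record(v):
    # Record = T_object: a record must be an object at the top level.
    return isinstance(v, dict) and is_value(v)
```

For example, `is_record({"a": [1, {"b": None}]})` holds, while a bare array such as `[1, 2]` is a valid value but not a record.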


Application Information

IPC(8): G06F17/30
CPC: G06F16/80
Inventor: 陈世敏, 王智义
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI