A mass data storage method based on file granularity

A technology for massive data and files, applied in the research field of massive data storage management, with the stated effects of improved data organization and management scheduling, system fault tolerance, and finer-grained management.

Active Publication Date: 2018-02-16
INST OF INFORMATION ENG CHINESE ACAD OF SCI
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

The metadata in Hive does not describe specific files; it only supports description at the data-directory level.
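To make the granularity difference concrete, the sketch below contrasts a directory-level catalogue with a file-level one. In the standard Hive metastore, a partition's storage location is recorded as a directory, and the data files under it are found by listing that directory rather than being catalogued one by one. The partition name, paths, record number, and both functions here are hypothetical illustrations, not Hive APIs.

```python
# Hypothetical layout for illustration only; these names are not from the patent or Hive.
PARTITION_DIRS = {"dt=2018-02-16": "/warehouse/orders/dt=2018-02-16"}
FILES_ON_DISK = [
    "/warehouse/orders/dt=2018-02-16/part-0",
    "/warehouse/orders/dt=2018-02-16/part-1",
    "/warehouse/orders/dt=2018-02-17/part-0",
]
RECORD_TO_FILE = {1001: "/warehouse/orders/dt=2018-02-16/part-1"}


def directory_level_lookup(partition_dirs, files_on_disk, partition):
    """Directory-level catalogue: only the partition's directory is known,
    so every file under it is a candidate and must be listed or scanned."""
    directory = partition_dirs[partition]
    # The trailing "/" keeps dt=2018-02-16 from matching a sibling like dt=2018-02-160.
    return sorted(f for f in files_on_disk if f.startswith(directory + "/"))


def file_level_lookup(record_to_file, record_no):
    """File-level catalogue: the metadata names the exact file holding a record."""
    return record_to_file[record_no]


print(directory_level_lookup(PARTITION_DIRS, FILES_ON_DISK, "dt=2018-02-16"))
# ['/warehouse/orders/dt=2018-02-16/part-0', '/warehouse/orders/dt=2018-02-16/part-1']
print(file_level_lookup(RECORD_TO_FILE, 1001))
# /warehouse/orders/dt=2018-02-16/part-1
```

The directory-level lookup can only narrow a query to a partition, while the file-level lookup resolves a record to one file, which is the gap a file-granularity metadata model is meant to close.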

Embodiment Construction

[0036] In order to make the purpose, technical solution and advantages of the present invention clearer, the hierarchical and segmented backup data organization and management method according to an embodiment of the present invention will be further described in detail below in conjunction with the accompanying drawings.

[0037] According to the first aspect of the present invention, a massive metadata management model based on file granularity is provided. As shown in Figure 1, the data model covers three aspects: physical elements, logical elements, and business elements. The physical elements are storage devices, nodes, and files. A storage device is an abstraction of a physical storage device such as a disk or disk array; a node is an abstraction of a physical or virtual host; and a file is a data file stored on a storage device of a particular node. The logical elements include databases, tables, views, indexes, and partitions, and datab...
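The physical and logical elements of [0037] can be sketched as Python dataclasses. This is a hypothetical rendering under stated assumptions: the class and field names are invented for illustration, the containment relationships (a database groups tables and views, a table holds partitions and indexes, a partition holds files) are inferred from the abstract rather than stated in the truncated paragraph, and the business elements are omitted because their description is cut off.

```python
from dataclasses import dataclass, field
from typing import List


# --- Physical elements (named in [0037]); field names are hypothetical ---
@dataclass
class StorageDevice:
    device_no: int
    kind: str  # e.g. "disk" or "disk array"


@dataclass
class Node:  # a physical or virtual host
    node_no: int
    devices: List[StorageDevice] = field(default_factory=list)


@dataclass
class DataFile:  # a data file on a storage device of a certain node
    path: str
    node_no: int
    device_no: int


# --- Logical elements (named in [0037]); relationships are assumed ---
@dataclass
class Partition:
    value: str
    files: List[DataFile] = field(default_factory=list)


@dataclass
class Index:
    name: str
    data_file: DataFile


@dataclass
class Table:
    name: str
    partitions: List[Partition] = field(default_factory=list)
    indexes: List[Index] = field(default_factory=list)


@dataclass
class View:
    name: str
    base_tables: List[str]


@dataclass
class Database:
    name: str
    tables: List[Table] = field(default_factory=list)
    views: List[View] = field(default_factory=list)


def files_of_table(table: Table) -> List[DataFile]:
    """Collect every data file backing a table, partition by partition."""
    return [f for p in table.partitions for f in p.files]


def device_of(data_file: DataFile, nodes: List[Node]) -> StorageDevice:
    """Resolve a file to the physical storage device it resides on."""
    node = next(n for n in nodes if n.node_no == data_file.node_no)
    return next(d for d in node.devices if d.device_no == data_file.device_no)


# Hypothetical usage: one node with one disk backing a two-partition table.
disk = StorageDevice(device_no=1, kind="disk")
node = Node(node_no=7, devices=[disk])
f0 = DataFile("/sales/p=0/data", node_no=7, device_no=1)
f1 = DataFile("/sales/p=1/data", node_no=7, device_no=1)
sales = Table("sales", partitions=[Partition("0", [f0]), Partition("1", [f1])])
db = Database("finance", tables=[sales], views=[View("sales_view", ["sales"])])
print(len(files_of_table(sales)))  # 2
print(device_of(f1, [node]).kind)  # disk
```

Keeping the file as the leaf of the logical hierarchy, and giving it explicit node and device numbers, is what lets a logical query be traced down to a specific physical location.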

Abstract

The invention discloses a massive data storage method based on file granularity, comprising the following steps: (1) dividing a data storage cluster into a plurality of partitions, each of which is assigned a partition value; (2) creating a business data table for the records of each department, and setting a partitioning rule for the records of each business data table; (3) storing each record of the business data to be stored into a file of the corresponding partition according to the record's number and the partitioning rule, creating an index file, and storing the record number, the file path, the storage node number, and the storage device number into a metadata file. Views are further created across the business data tables, and, according to the metadata file, the business data tables, views, record partitions, and index information belonging to the same business scenario are grouped into the same database, yielding a massive metadata management model. The method improves the accuracy of data management and the flexibility of data partitioning and organization.
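The three steps in the abstract can be sketched as follows. This is a minimal in-memory sketch, not the patented implementation: the modulo partitioning rule, the path layout `/<table>/p=<value>/data`, the class and function names, and the use of Python dicts in place of real data, index, and metadata files are all assumptions for illustration. It assumes partition values are `0..n-1` so the rule's output is a valid partition value, and it covers a single business data table; views and the grouping into databases are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PartitionInfo:
    value: int      # partition value assigned in step (1)
    node_no: int    # hypothetical: node hosting this partition
    device_no: int  # hypothetical: storage device on that node


@dataclass(frozen=True)
class MetadataEntry:
    # The four fields the abstract lists for the metadata file.
    record_no: int
    file_path: str
    node_no: int
    device_no: int


def modulo_rule(record_no: int, partition_count: int) -> int:
    """Hypothetical partitioning rule: record number modulo partition count."""
    return record_no % partition_count


class FileGranularStore:
    """In-memory stand-in for one business data table stored at file granularity."""

    def __init__(self, table: str, partitions: List[PartitionInfo],
                 rule: Callable[[int, int], int]) -> None:
        self.table = table
        self.partitions = {p.value: p for p in partitions}
        self.rule = rule                              # step (2): per-table partitioning rule
        self.files: Dict[str, List[dict]] = {}        # file path -> stored records
        self.index: Dict[str, Dict[int, int]] = {}    # index: path -> {record_no: position}
        self.metadata: Dict[int, MetadataEntry] = {}  # metadata: record_no -> location

    def store(self, record_no: int, record: dict) -> MetadataEntry:
        """Step (3): place the record in its partition's file, index it, log metadata."""
        value = self.rule(record_no, len(self.partitions))
        part = self.partitions[value]
        path = f"/{self.table}/p={value}/data"
        rows = self.files.setdefault(path, [])
        self.index.setdefault(path, {})[record_no] = len(rows)
        rows.append(record)
        entry = MetadataEntry(record_no, path, part.node_no, part.device_no)
        self.metadata[record_no] = entry
        return entry

    def fetch(self, record_no: int) -> dict:
        """Locate a record via metadata (which file) and the index (where in it)."""
        entry = self.metadata[record_no]
        position = self.index[entry.file_path][record_no]
        return self.files[entry.file_path][position]


orders = FileGranularStore(
    "orders",
    [PartitionInfo(0, 1, 1), PartitionInfo(1, 1, 2), PartitionInfo(2, 2, 1)],
    modulo_rule,
)
print(orders.store(7, {"dept": "sales", "amount": 120}))
orders.store(10, {"dept": "sales", "amount": 80})
print(orders.fetch(10))  # {'dept': 'sales', 'amount': 80}
```

Because the metadata names the exact file and the index gives the position inside it, `fetch` reads a single file instead of scanning every file in a partition directory.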

Description

Technical Field

[0001] The present invention relates to a massive data storage method based on file granularity, and in particular to an implementation scheme for a massive-data metadata management model that is compatible with the Hive metadata model, uses files as the basic unit of underlying management, and supports state management in the Hadoop ecosystem. It belongs to the research field of massive data storage management.

Background Art

[0002] According to IDC research over the past five years, the global data volume doubles approximately every two years; in 2010 it entered the zettabyte (ZB) era, and it is estimated to reach 35 ZB by 2020. As netizens become more and more engaged with Internet products and applications, the Internet will become more intelligent and the amount of data on it will grow explosively. The era of big data has arrived. Such a huge amount of data has brought great challenges to the data stora...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/122, G06F16/13, G06F16/182
Inventors: Wang Zhenyu (王振宇), Wang Shupeng (王树鹏), Wang Yong (王勇), Wang Xi (王曦)
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI