Distributed management method and system for metadata of large-scale storage system

A large-scale storage and distributed technology, applied in digital data processing, special data processing applications, file systems, etc. It addresses problems such as slow statistics and traversal operations, limited metadata storage space, and unstable metadata service nodes; it achieves high availability, removes the single point of failure, and improves scalability.

Active Publication Date: 2019-09-27
INST OF INFORMATION ENG CAS

AI Technical Summary

Problems solved by technology

HDFS (Hadoop Distributed File System) is the most common distributed file system, but it stores metadata in the memory of a single NameNode machine, which limits the metadata storage space and makes the NameNode a system performance bottleneck.
Ceph is a high-performance, highly available, and highly scalable distributed file system. It proposes a dynamic subtree partitioning method that partitions metadata according to load and dynamically migrates metadata from heavily loaded nodes to lightly loaded nodes, achieving dynamic load balancing. However, when the number of small files is large, Ceph's scaling across multiple metadata service nodes is still unstable, and it cannot provide smooth online expansion.
GlusterFS replaces the metadata management service with an elastic hash algorithm that computes a file's storage location from its path and file name. This fundamentally removes the system bottleneck caused by a metadata service, but traversal efficiency is low: when a directory contains many files, operations such as file scanning, statistics, and traversal are very slow.
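The trade-off of hash-based placement can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not GlusterFS's actual elastic hash algorithm: the `place` and `list_directory` helpers, the MD5-based hash, and the four in-memory "bricks" are all hypothetical. It shows that locating a single file needs no metadata lookup, while listing a directory has to scan every storage location.

```python
import hashlib


def place(path, num_bricks):
    """Map a file path to a brick index using only a hash of the path.

    Hypothetical stand-in for hash-based placement; no metadata service is consulted.
    """
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_bricks


def list_directory(directory, bricks):
    """List files under a directory; without a metadata index every brick is scanned."""
    prefix = directory.rstrip("/") + "/"
    names = []
    for brick in bricks:  # cost grows with the number of bricks and stored files
        names.extend(p for p in brick if p.startswith(prefix))
    return sorted(names)


# Four hypothetical bricks, each modeled as a dict of path -> file content.
bricks = [dict() for _ in range(4)]
for i in range(8):
    path = f"/logs/file{i}.txt"
    bricks[place(path, len(bricks))][path] = b""

listing = list_directory("/logs", bricks)
```

A point lookup calls `place` once, whereas `list_directory` touches all bricks, which is the traversal cost described above.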


Examples


Embodiment Construction

[0018] The present invention builds on the design ideas of HopsFS and uses a NewSQL relational database to manage the metadata of a large-scale storage system. Specifically, it proposes a distributed metadata storage management method based on Postgres-XL: the Postgres-XL distributed database expands the storage space available for metadata, and the features of Postgres-XL provide a flexible metadata sharding scheme.
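As a rough illustration of metadata sharding, the following Python sketch simulates hash distribution of inode rows across data nodes. The excerpt does not state which column the invention shards on; distributing by `parent_id` is an assumption chosen here so that all children of a directory land on the same node. The cluster size, the row layout, and the `shard_for` and `list_children` helpers are hypothetical, and a CRC32 hash stands in for whatever hash function the database uses.

```python
import zlib
from collections import defaultdict

NUM_DATANODES = 3  # hypothetical cluster size


def shard_for(key, num_nodes=NUM_DATANODES):
    """Pick a data node from a deterministic hash of the distribution key."""
    return zlib.crc32(str(key).encode("ascii")) % num_nodes


# Hypothetical inode rows: (inode_id, parent_id, name)
inodes = [
    (1, 0, ""),        # root directory
    (2, 1, "user"),
    (3, 1, "tmp"),
    (4, 2, "a.txt"),
    (5, 2, "b.txt"),
]

# Distribute rows by parent_id (an assumption, see the text above).
datanodes = defaultdict(list)
for row in inodes:
    inode_id, parent_id, name = row
    datanodes[shard_for(parent_id)].append(row)


def list_children(parent_id):
    """List a directory by querying only the node that holds its children."""
    node = shard_for(parent_id)
    return sorted(name for _, pid, name in datanodes[node] if pid == parent_id)
```

With this choice of key, a directory listing touches a single data node instead of scanning the whole cluster; a different distribution key would trade that locality for other access patterns.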

[0019] The overall architecture of the present invention is shown in Figure 1. Each component of the metadata service cluster is a component of the Postgres-XL distributed database. The global transaction manager ensures transaction consistency across the entire cluster; the coordinator coordinates and manages user sessions and parses and optimizes SQL statements; the data node is a node that stores user file data. Each component node in the metadata service cluster can...


Abstract

The invention discloses a distributed management method and system for metadata of a large-scale storage system. In the method, the HDFS metadata held in NameNode memory is abstracted into two-dimensional tables, which are stored in a distributed database; the tables are associated with one another through an inode_id. The NameNode becomes a bridge through which the client accesses metadata: the client first connects to the NameNode, the NameNode operates on the metadata in the distributed database, and the metadata is returned to the client. This solves the single-point-of-failure problem of HDFS.
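The table abstraction described in the abstract can be sketched as follows. This is a minimal illustration, not the patent's actual schema: only the `inode_id` association comes from the text, while the `inodes` and `blocks` tables, their other columns, and the `resolve` and `file_blocks` helpers are hypothetical. Python's built-in `sqlite3` module stands in for the distributed database so that the example runs on its own.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the distributed database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inodes (
    inode_id  INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL,
    name      TEXT    NOT NULL,
    is_dir    INTEGER NOT NULL,
    UNIQUE (parent_id, name)
);
CREATE TABLE blocks (
    block_id  INTEGER PRIMARY KEY,
    inode_id  INTEGER NOT NULL REFERENCES inodes(inode_id),
    block_idx INTEGER NOT NULL,
    num_bytes INTEGER NOT NULL
);
""")
conn.executemany("INSERT INTO inodes VALUES (?, ?, ?, ?)", [
    (1, 0, "", 1),        # root directory
    (2, 1, "data", 1),    # /data
    (3, 2, "part-0", 0),  # /data/part-0
])
conn.executemany("INSERT INTO blocks VALUES (?, ?, ?, ?)", [
    (100, 3, 0, 134217728),
    (101, 3, 1, 1024),
])

ROOT_ID = 1


def resolve(path):
    """Walk the path one component at a time; return the inode_id or None."""
    inode_id = ROOT_ID
    for part in filter(None, path.split("/")):
        row = conn.execute(
            "SELECT inode_id FROM inodes WHERE parent_id = ? AND name = ?",
            (inode_id, part),
        ).fetchone()
        if row is None:
            return None
        inode_id = row[0]
    return inode_id


def file_blocks(path):
    """Return (block_id, num_bytes) rows for a file, linked through inode_id."""
    inode_id = resolve(path)
    if inode_id is None:
        return []
    return conn.execute(
        "SELECT block_id, num_bytes FROM blocks WHERE inode_id = ? ORDER BY block_idx",
        (inode_id,),
    ).fetchall()
```

In this sketch the NameNode's role corresponds to calling `resolve` and `file_blocks` on behalf of a client and returning the rows; the metadata itself lives only in the tables.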

Description

Technical field

[0001] The invention belongs to the technical field of distributed data storage, and in particular relates to a method for the distributed management and organization of metadata in a large-scale storage system.

Background

[0002] With the rapid development of big data technology and applications, the Internet of Things, and cloud computing, the amount of centrally stored data commonly reaches the PB or even EB level. Distributed file systems are a common solution for storing and managing large-scale data files. A distributed file system builds a storage cluster from multiple machines, and its storage capacity grows linearly with the number of machines. Supporting large-scale data storage requires not only hardware support but also metadata management, which is one of the essential key technologies. HDFS (Hadoop Distributed File System) is the most common distributed file system, but HDFS stores metadata in the memory of ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/13, G06F16/16, G06F16/182
CPC: G06F16/134, G06F16/164, G06F16/182
Inventor: 吴广君, 李斌斌, 王树鹏, 贾思宇, 赵百强
Owner: INST OF INFORMATION ENG CAS