Method of HDFS Distributed and Centralized Hybrid Data Storage System Based on Hierarchical Governance

A hybrid data storage system technology, applied in the field of information technology, that avoids wasting computing resources and energy on HDFS capacity expansion and relieves expansion pressure

Active Publication Date: 2020-07-07
上海孚典智能科技有限公司
AI Technical Summary

Problems solved by technology

[0004] To address the limited vertical expansion capability and the data governance problems of the existing HDFS distributed file system, the purpose of the present invention is to provide an efficient automatic data governance method that automatically moves inactive HDFS data down to an NFS storage system.

Examples

Embodiment Construction

[0022] The present invention is implemented as follows:

[0023] 1. Modify the HDFS namenode (the main control component for reading and writing files) by adding a dynamic sampling mechanism and a storage location scheduling mechanism, as shown in Figure 2. Dynamic sampling occurs each time a file user sends a read or write request to the namenode: the sampling module records the read or write operation and the time it occurred, and stores the sampling records in a data table. Storage location scheduling records, in the metadata table, which of two kinds of storage device holds each copy of an HDFS file: a node in the physical server cluster running HDFS, or a back-end NFS device. When the namenode receives a user's read or write request, it first queries the location of the copy in the metadata table and then reads or writes on the data server node or NFS device at that location;
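The two mechanisms in step 1 can be illustrated with a minimal Python sketch. It is not the patent's implementation and does not use any real HDFS API: the class names `NameNodeSketch`, `ReplicaLocation`, and `Tier`, the in-memory dictionaries standing in for the sampling table and metadata table, and the example addresses are all assumptions made for illustration.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    HDFS_NODE = "hdfs_node"  # a server node in the HDFS cluster
    NFS = "nfs"              # the back-end NFS device


@dataclass
class ReplicaLocation:
    tier: Tier
    address: str  # hypothetical: a datanode host name or an NFS path


@dataclass
class NameNodeSketch:
    # Stand-in for the metadata table: file path -> replica locations.
    metadata: dict = field(default_factory=dict)
    # Stand-in for the sampling table: file path -> [(operation, timestamp)].
    samples: dict = field(default_factory=lambda: defaultdict(list))

    def register(self, path, locations):
        """Record where each copy of a file is stored."""
        self.metadata[path] = list(locations)

    def handle_request(self, path, op, now=None):
        """Sample the request, then return the replica locations to use."""
        if op not in ("read", "write"):
            raise ValueError(f"unsupported operation: {op}")
        locations = self.metadata[path]  # raises KeyError for unknown files
        timestamp = time.time() if now is None else now
        self.samples[path].append((op, timestamp))  # dynamic sampling
        return locations
```

A caller would register a file's copies once and then route every read or write through `handle_request`, so that each access is sampled before the data server node or NFS device is contacted.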

[0024] 2. Dynamic sampling According to ...

Abstract

The invention provides an HDFS distributed and centralized hybrid data storage system based on hierarchical governance. The distributed file system HDFS is combined with a centralized Network File System (NFS): high-activity data (hot data) is stored in HDFS, and low-activity data (warm-cold data) is stored in NFS. Through a user-customizable file storage placement policy, files that the user-defined policy classifies as low-activity can be moved down from HDFS to the NFS system, and the corresponding space on HDFS is released. With this method, application vendors using HDFS-based big data can effectively manage and schedule storage resources: high-activity data is managed in distributed storage, which provides high-concurrency access, while low-activity data is stored on NFS, which avoids the resource waste of adding unnecessary computing resources when HDFS capacity is expanded (horizontal expansion). Warm-cold data, which makes up the larger share of the overall data, can thus be managed in a relatively cheap and safe way, achieving effective governance of hierarchical data.
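As a rough illustration of a user-customizable placement policy, the sketch below marks a file as low-activity when it has too few accesses inside a recent time window and plans its move from HDFS to NFS. The abstract does not specify the policy's form, so the `TieringPolicy` fields, the function names, and the `"hdfs"`/`"nfs"` placement labels are hypothetical choices, not the patent's definitions.

```python
from dataclasses import dataclass


@dataclass
class TieringPolicy:
    window_seconds: float  # how far back to count accesses (assumed knob)
    min_accesses: int      # fewer accesses than this means low activity


def is_low_activity(access_times, now, policy):
    """True if the file was accessed too rarely within the policy window."""
    recent = [t for t in access_times if now - t <= policy.window_seconds]
    return len(recent) < policy.min_accesses


def plan_demotions(access_log, placement, now, policy):
    """Return the paths currently on HDFS that the policy would move to NFS."""
    return sorted(
        path
        for path, tier in placement.items()
        if tier == "hdfs"
        and is_low_activity(access_log.get(path, []), now, policy)
    )


def apply_demotions(placement, paths):
    """Mark each planned file as NFS-resident; its HDFS space can then be freed."""
    for path in paths:
        placement[path] = "nfs"
```

Separating planning from applying keeps the policy decision testable on its own; a real system would copy the data to NFS and verify it before releasing the HDFS space.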

Description

Technical field

[0001] The invention belongs to the field of information technology, and in particular relates to an HDFS distributed and centralized hybrid data storage method based on hierarchical data governance.

Background art

[0002] In recent years, with the extensive development of big data applications, the Hadoop-based computing framework has become one of the industry-standard parallel computing environments. HDFS, the distributed file system that accompanies Hadoop MapReduce, has likewise become an industry-standard distributed storage system. Its multi-replica and erasure coding mechanisms protect data conveniently and provide high concurrency. HDFS mainly relies on the disks of the computing nodes (computing server nodes) of the Hadoop cluster for storage, so it can expand horizontally, but this also restricts the expansion of storage capacity to a certain extent. Especially for scenarios that require vertical expansion capabilities, tha...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/182, G06F16/27
Inventors: 赵继胜, 吴宇
Owner: 上海孚典智能科技有限公司