Cloud environment data storage optimization method based on HDFS

A data storage optimization technology, applied in the fields of data error detection, electrical digital data processing, and data processing input/output. It addresses poor data storage and read performance, container node crashes, and storage strategies that do not consider the differences between nodes.

Inactive Publication Date: 2020-09-25
汪礼君

AI Technical Summary

Problems solved by technology

However, existing container environments have two problems. First, HDFS cluster nodes based on Docker containers are fragile: container nodes may crash due to external or internal factors, which reduces the reliability of data storage. If a physical machine hosting a large number of cluster nodes fails, a large amount of cluster data is lost; if the NameNode fails, the data of the entire cluster becomes unreadable.
Second, in the cloud environment, the existing HDFS storage strategy randomly selects DataNodes to store data block replicas and assumes that all DataNodes are homogeneous. It does not consider the differences between the nodes in the cluster: even when a node has little available storage space, or its storage and read performance is poor, block replicas are still placed on it, which degrades data storage performance.
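
The second problem can be illustrated with a minimal sketch of a placement step that filters and ranks DataNodes instead of choosing them at random. This is not the placement algorithm of the invention, whose formulas are not given in this summary; the `DataNode` fields, the `min_free_gb` threshold, the `space_weight` parameter, and the scoring rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    name: str
    free_gb: float     # available storage space (hypothetical unit)
    perf_score: float  # 0..1, higher is better (hypothetical metric)


def pick_replica_nodes(nodes, replicas=3, min_free_gb=10.0, space_weight=0.5):
    """Rank nodes by an assumed weighted mix of free space and performance."""
    # Nodes with too little free space are never candidates.
    eligible = [n for n in nodes if n.free_gb >= min_free_gb]
    if len(eligible) < replicas:
        raise ValueError("not enough eligible DataNodes for the requested replicas")
    max_free = max(n.free_gb for n in eligible)

    def score(n):
        # Free space is normalized to 0..1 so both terms share a scale.
        return space_weight * (n.free_gb / max_free) + (1 - space_weight) * n.perf_score

    return sorted(eligible, key=score, reverse=True)[:replicas]


nodes = [
    DataNode("dn1", 500.0, 0.9),
    DataNode("dn2", 5.0, 0.95),   # fast, but almost full
    DataNode("dn3", 300.0, 0.4),
    DataNode("dn4", 200.0, 0.8),
    DataNode("dn5", 450.0, 0.2),
]
chosen = [n.name for n in pick_replica_nodes(nodes)]
print(chosen)
```

A purely random strategy could place a replica on `dn2` even though it is nearly full; the sketch excludes it before ranking.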



Examples


Detailed Description of the Embodiments

[0110] It should be understood that the specific embodiments described here are only used to explain the present invention, but not to limit the present invention.

[0111] While ensuring the reliability of data storage, the data in the cloud environment is divided to achieve flexible storage and backup of data. Figure 1 is a schematic diagram of an HDFS-based cloud environment data storage optimization method provided by an embodiment of the present invention.

[0112] In this embodiment, the HDFS-based cloud environment data storage optimization method includes:

[0113] S1. Put the metadata and stored data of each node of the HDFS cluster into a pre-created data volume container, obtain the IP address of each node in the cluster, and notify all the remaining nodes in the cluster of it, so that the cluster can be established and can communicate normally.
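
The notification part of S1 can be sketched as follows, assuming the node IP addresses have already been collected into a dictionary. The function name `build_hosts_entries`, the node names, and the addresses are hypothetical, and the sketch only computes which entries each node should receive; how they are delivered to the containers is not covered by the text above.

```python
def build_hosts_entries(node_ips):
    """For each node, list '<ip> <hostname>' lines for all the other nodes."""
    return {
        name: [
            f"{ip} {peer}"
            for peer, ip in sorted(node_ips.items())
            if peer != name  # a node does not need to be told its own address
        ]
        for name in node_ips
    }


# Hypothetical cluster: one NameNode and two DataNodes.
node_ips = {
    "namenode": "172.18.0.2",
    "datanode1": "172.18.0.3",
    "datanode2": "172.18.0.4",
}
hosts = build_hosts_entries(node_ips)
for name, entries in hosts.items():
    print(name, entries)
```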

[0114] First, the present invention imports the HDFS cluster into a Docker containe...


Abstract

The invention relates to the technical field of data storage and discloses a cloud environment data storage optimization method based on HDFS, which comprises the following steps: putting the metadata and stored data of each node of an HDFS cluster into a pre-created data volume container; calculating available storage space evaluation values of the HDFS cluster nodes in the physical machines and in the data volume containers respectively; calculating an availability value of each physical machine, and calculating a performance evaluation value of each HDFS cluster node based on the physical volume container according to the availability value of each physical machine; storing the data blocks by using a data storage copy placement algorithm; calculating the information gain ratio of the data attributes of the to-be-stored data; dividing the to-be-stored data blocks in the HDFS cluster by using a KADC-KNN algorithm based on information gain ratio and weighting; and storing the data blocks divided by that algorithm into a Feeder HDFS cluster according to different storage strategies. The invention thereby optimizes data storage.
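
The information gain ratio mentioned in the abstract can be illustrated with the standard definition used in decision-tree learning: information gain divided by the split information of the attribute. The sketch below assumes that definition and uses made-up attribute values and labels; the abstract does not give the invention's exact formula, and the KADC-KNN algorithm itself is not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(attribute, labels):
    """Information gain of `attribute` about `labels`, divided by split information."""
    total = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    # Expected label entropy after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(attribute)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical data: access frequency predicts the label, file type does not.
labels = ["A", "A", "B", "B"]
access = ["hot", "hot", "cold", "cold"]
ftype = ["x", "y", "x", "y"]
informative = gain_ratio(access, labels)
uninformative = gain_ratio(ftype, labels)
print(informative, uninformative)
```

Attributes with a higher ratio carry more information about how the data should be divided, which is one plausible reason to use the ratio as a weight.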

Description

Technical field

[0001] The invention relates to the technical field of data storage, in particular to a method for optimizing cloud environment data storage based on HDFS.

Background

[0002] In today's cloud environment, data is growing explosively and in increasingly diverse forms. After being processed, computed, and stored, this data carries social value in all aspects of life. How to store this data reliably so that it retains its value has become a major research topic in the data age.

[0003] Cloud storage is a new concept derived from cloud computing. Cloud computing has become a popular computing model because of its very large scale, virtualization, high reliability, versatility, and extremely low cost. When a cloud computing system requires a large number of storage devices to store large amounts of different types of data for processing and computing, a cloud storage system is derived from the cloud computing system. Cloud storage ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F3/06, G06F11/14
CPC: G06F3/0614, G06F3/0644, G06F3/067, G06F11/1458
Inventor: 汪礼君
Owner: 汪礼君