
Method for optimizing HDFS storage structure

An optimization method for storage structures, applicable to error response generation, error detection using redundant data in operation, and special-purpose data processing applications, that addresses problems such as hash collisions.

Active Publication Date: 2018-04-20
CHENGDU RAJA NEW ENERGY AUTOMOTIVE TECH CO LTD
Cites: 3; Cited by: 8

AI Technical Summary

Problems solved by technology

[0018] 1. There is a large amount of duplicate data.
[0019] 2. Redundant data is usually eliminated by using a hash function to judge whether data is duplicated, but because of hash collisions, different blocks can have the same hash value.
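Point 2 can be illustrated with a minimal sketch. It deliberately uses a weak 8-bit checksum (a hypothetical stand-in chosen only so that a collision is easy to produce, not anything taken from the patent) and shows a common safeguard: a fingerprint match is confirmed with a byte-for-byte comparison before a block is treated as a duplicate. The helpers `weak_hash`, `store`, and `is_duplicate` are illustrative names.

```python
def weak_hash(block: bytes) -> int:
    """Deliberately weak 8-bit checksum: sum of bytes modulo 256 (illustration only)."""
    return sum(block) % 256


def store(block: bytes, index: dict[int, list[bytes]]) -> None:
    """Record a block under its hash value; colliding blocks share one bucket."""
    index.setdefault(weak_hash(block), []).append(block)


def is_duplicate(block: bytes, index: dict[int, list[bytes]]) -> bool:
    """Return True only if an identical block is already stored.

    A hash match alone is not trusted: every stored block with the same
    hash value is compared byte for byte to rule out a collision.
    """
    candidates = index.get(weak_hash(block), [])
    return any(existing == block for existing in candidates)


index: dict[int, list[bytes]] = {}
store(b"ab", index)

# b"ba" has the same byte sum as b"ab", so the weak hash collides ...
print(weak_hash(b"ab") == weak_hash(b"ba"))  # True
# ... but the byte comparison still reports it as new data.
print(is_duplicate(b"ba", index))  # False
print(is_duplicate(b"ab", index))  # True
```

The extra comparison costs a read of the stored block, which is why systems using strong cryptographic fingerprints often skip it; with a weak or truncated hash it is what prevents distinct blocks from being merged.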




Detailed Description of Embodiments

[0075] The technical solution of the present invention is described in further detail below with reference to the accompanying drawings; however, the scope of protection of the present invention is not limited to the following description.

[0076] In one embodiment, the environment is set up as follows:

[0077] The hardware environment of the cluster has been introduced earlier. The cluster consists of four servers: one master node and three data nodes, slave1, slave2, and slave3. The detailed installation steps are as follows:

[0078]

[0079]

[0080] Run the jps command to check the startup status.
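The jps check can be automated by parsing its output, which lists one "<pid> <MainClass>" pair per line. The sketch below assumes a Hadoop deployment with YARN, where the master runs NameNode, SecondaryNameNode, and ResourceManager and each data node runs DataNode and NodeManager; the patent does not list the daemons, so the `EXPECTED` table and the helper names are hypothetical.

```python
# Daemons each role is assumed to run in a typical Hadoop cluster with YARN
# (an assumption; the exact set depends on the deployment).
EXPECTED = {
    "master": {"NameNode", "SecondaryNameNode", "ResourceManager"},
    "slave": {"DataNode", "NodeManager"},
}


def running_daemons(jps_output: str) -> set[str]:
    """Parse jps output ("<pid> <MainClass>" per line) into a set of class names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(role: str, jps_output: str) -> set[str]:
    """Return the expected daemons for `role` that do not appear in the jps output."""
    return EXPECTED[role] - running_daemons(jps_output)


# On a live node the text could come from:
#   subprocess.run(["jps"], capture_output=True, text=True).stdout
sample_slave = """\
2301 DataNode
2415 NodeManager
2790 Jps
"""
print(missing_daemons("slave", sample_slave))  # set()
print(missing_daemons("master", sample_slave))
```

An empty result for every node means all assumed daemons started; any name left in the set points to the process whose logs should be checked.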

[0081] HBase is the most widely used NoSQL database on the Hadoop platform, featuring column-oriented storage, random reads and writes, load balancing, and dynamic scaling. The data management system uses HBase to store the index tables and metadata, which can effectively avoid frequent disk access and maintain high read and write rates. ...



Abstract

The invention discloses a method for optimizing an HDFS storage structure. In the first step, a fingerprint is calculated for each data block into which a data file is divided; in the second step, a hash function is used to perform fingerprint matching, and if the same value occurs, the block is determined to be a duplicate; in the third step, duplicate blocks store the corresponding index, while the fingerprints of new data blocks are stored and updated; in the fourth step, the metadata of the file is updated; in the fifth step, a hash value is calculated with the CubeHash function, and a keyword extraction strategy, feature vector weight calculation, and the cosine coefficient method are introduced to judge whether data is identical or similar; and in the sixth step, duplicate data is deleted according to labels. The method is reasonable in design, achieves label-based deduplication, and optimizes the HDFS storage structure.
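Steps one to four and the cosine part of step five can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: blocks are a hypothetical 4 bytes to keep the example small; SHA-256 from `hashlib` stands in for CubeHash, which is not in the Python standard library; in-memory dicts stand in for the index table and file metadata; keyword extraction is reduced to lowercase word tokenization with raw term-frequency weights, since the patent's weighting scheme is not given here. Label-based deletion (step six) is not modeled. `DedupStore` and the other names are hypothetical.

```python
import hashlib
import math
import re
from collections import Counter

BLOCK_SIZE = 4  # hypothetical; real HDFS blocks are far larger


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Divide a file's bytes into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block: bytes) -> str:
    """Step 1: fingerprint a block. SHA-256 stands in for CubeHash here."""
    return hashlib.sha256(block).hexdigest()


class DedupStore:
    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}        # fingerprint -> single stored copy
        self.metadata: dict[str, list[str]] = {}  # file name -> ordered fingerprints

    def put(self, name: str, data: bytes) -> int:
        """Store a file, keeping one copy of each distinct block.

        Returns the number of blocks that were new (actually written).
        """
        recipe, new_blocks = [], 0
        for block in split_blocks(data):
            fp = fingerprint(block)          # step 1: fingerprint calculation
            if fp not in self.blocks:        # step 2: hash-based fingerprint matching
                self.blocks[fp] = block      # step 3: store the new block's fingerprint
                new_blocks += 1
            recipe.append(fp)                # step 3: duplicates keep only an index entry
        self.metadata[name] = recipe         # step 4: update the file's metadata
        return new_blocks

    def get(self, name: str) -> bytes:
        """Rebuild a file from its ordered block index."""
        return b"".join(self.blocks[fp] for fp in self.metadata[name])


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Step 5 (similarity part): cosine of term-frequency keyword vectors."""
    va = Counter(re.findall(r"\w+", text_a.lower()))
    vb = Counter(re.findall(r"\w+", text_b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


store = DedupStore()
print(store.put("a.txt", b"AAAABBBBAAAA"))    # 2: the second "AAAA" is a duplicate
print(store.put("b.txt", b"BBBBCCCC"))        # 1: only "CCCC" is new
print(store.get("a.txt") == b"AAAABBBBAAAA")  # True
print(round(cosine_similarity("hdfs block data", "hdfs data block"), 3))  # 1.0
```

This sketch trusts a SHA-256 match without a byte comparison, a common trade-off because accidental collisions are negligible for a cryptographic hash. The cosine score ignores word order, so it flags reordered content as similar rather than identical, which is why exact matching and similarity judgment are kept as separate checks.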

Description

Technical field

[0001] The invention relates to a method for optimizing an HDFS storage structure.

Background

[0002] HDFS is an open-source implementation modeled on the GFS distributed file system and therefore shares the characteristics of GFS. GFS, developed by Google, has strong fault tolerance and excellent scalability and is widely used in applications that efficiently store and read massive amounts of distributed data. HDFS can therefore be understood in depth by analyzing the characteristics and principles of GFS. Typically, such a file system cluster consists of one Master and multiple Chunkservers and is accessed by multiple Clients. When a Client sends a request to store a file, the file is first divided into fixed-size chunks; the Master then assigns each chunk a unique identifier (chunk handle); finally, the chunk is stored on a local disk, and according to the corresponding chunk handle and byte range, the chunkserver...
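Because files are divided into fixed-size chunks, a byte offset maps directly to a chunk index and an offset within that chunk. The minimal sketch below assumes the 64 MB chunk size used by GFS (HDFS 2.x and later default to 128 MB blocks); `locate` and `chunk_count` are hypothetical helpers.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size used by GFS


def locate(offset: int, chunk_size: int = CHUNK_SIZE) -> tuple[int, int]:
    """Translate a file byte offset into (chunk index, offset within that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, chunk_size)


def chunk_count(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of fixed-size chunks needed for a file (the last may be partial)."""
    return -(-file_size // chunk_size)  # ceiling division on integers


print(locate(200 * 1024 * 1024))       # (3, 8388608): 200 MB lies 8 MB into chunk 3
print(chunk_count(200 * 1024 * 1024))  # 4
```

A client can compute the chunk index locally and only needs to ask the Master for that chunk's handle and locations, which keeps the Master off the data path.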


Application Information

IPC (8): G06F17/30; G06F11/14
CPC: G06F11/1448; G06F16/137; G06F16/174; G06F16/182
Inventor: 何鑫 (He Xin)
Owner: CHENGDU RAJA NEW ENERGY AUTOMOTIVE TECH CO LTD