High-performance file storage and management system based on HDFS

A file storage and management system, applied in the field of high-performance file storage and management, that addresses the problems of reduced read efficiency, complex implementation, and high required usage rights. Its effects are avoiding wasted storage capacity, improving retrieval efficiency, and improving read speed.

Pending Publication Date: 2020-01-03
GUANGDONG UNIV OF TECH

AI Technical Summary

Problems solved by technology

There are two problems with HAR file archiving. First, reading a file requires traversing two layers of indexes, which may be less efficient than reading the file directly. Second, once an archive is created, it cannot be modified; to add or delete a merged file, the archive must be created again.
Some of these methods are suitable for speci



Examples


Embodiment 1

[0052] A high-performance file storage and management system based on HDFS, as shown in Figure 1, includes an HDFS client, a file preprocessing module, a file merging module, a prefetching cache module, a defragmentation module, and a data encryption module, where:

[0053] The file preprocessing module receives the file uploaded by the client, calculates its size, determines its type, and creates its metadata information. It classifies the uploaded file as a small, medium, or large file according to its size. Small and medium files are placed in the file merge queue to await the file merging module, while large files are stored directly in the HDFS cluster;
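The routing described in [0053] can be illustrated with a minimal sketch. The size thresholds, the names `classify_by_size` and `PreprocessRouter`, and the in-memory queue are all assumptions made for illustration; the source does not state the cut-off values or how the merge queue is implemented.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical thresholds: the source does not give the actual cut-offs.
SMALL_MAX_BYTES = 1 * 1024 * 1024      # assumed upper bound for "small"
MEDIUM_MAX_BYTES = 64 * 1024 * 1024    # assumed upper bound for "medium"


def classify_by_size(size_bytes: int) -> str:
    """Return 'small', 'medium', or 'large' for a file size in bytes."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= SMALL_MAX_BYTES:
        return "small"
    if size_bytes <= MEDIUM_MAX_BYTES:
        return "medium"
    return "large"


@dataclass
class PreprocessRouter:
    """Toy stand-in for the preprocessing step: queue or store directly."""
    merge_queue: deque = field(default_factory=deque)   # read by the merge module
    direct_store: list = field(default_factory=list)    # stand-in for the HDFS cluster

    def route(self, name: str, size_bytes: int) -> str:
        category = classify_by_size(size_bytes)
        if category == "large":
            # Large files bypass merging and go straight to storage.
            self.direct_store.append(name)
        else:
            # Small and medium files wait for the file merging module.
            self.merge_queue.append(name)
        return category
```

In a real deployment the `direct_store` list would be replaced by a write through the HDFS client, and the thresholds would likely be tuned relative to the cluster's block size.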

[0054] The metadata information includes the file name (name), file type (type), file size (size), and creation date (time), as follows:

[0055]

[0056]
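The four metadata fields named in [0054] could be represented roughly as below. This is a sketch only: the class name `FileMetadata`, the helper `build_metadata`, the field types, and the fallback type `"unknown"` are assumptions, and deriving the type from the file extension follows the description only in outline.

```python
import os
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class FileMetadata:
    name: str        # file name
    type: str        # file type, derived here from the extension
    size: int        # file size in bytes (unit is an assumption)
    time: datetime   # creation date


def build_metadata(name: str, size: int,
                   created: Optional[datetime] = None) -> FileMetadata:
    """Create a metadata record for an uploaded file."""
    # Take the extension without the leading dot, lower-cased.
    extension = os.path.splitext(name)[1].lstrip(".").lower()
    return FileMetadata(
        name=name,
        type=extension or "unknown",   # hypothetical fallback for no extension
        size=size,
        time=created or datetime.now(),
    )
```

For example, `build_metadata("report.PDF", 2048)` yields a record whose `type` is `"pdf"`.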

[0057] The type determination is to classify files according to the file extension type, specifically:

...



Abstract

The invention provides a high-performance file storage and management system based on HDFS (Hadoop Distributed File System). A file preprocessing module and a file merging module are provided to overcome HDFS's poor support for small-file storage, a prefetching cache module is provided to improve file reading efficiency, a defragmentation module is provided to improve overall space utilization, and a data encryption module is provided to protect users' private files. The invention provides a universal small-file merging strategy and an HDFS client optimization scheme, and combines the advantages of the HAR file archiving method and the MapFile method so as to be suitable for any type of file. Specifically, the system optimizes the small-file merging strategy and provides an easily updated defragmentation mechanism to improve the space utilization of the HDFS cluster. In addition, a hotspot file strategy based on frequency statistics improves data reading speed, and MD5-based login encryption and AES-based file encryption improve the security of user accounts and private files.
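To make the idea of a hotspot file strategy based on frequency statistics concrete, here is a minimal sketch. It is not the patented mechanism: the class `HotspotCache`, the admission threshold, the capacity limit, and the eviction rule (replace the least-read cached file only when the newcomer has been read more often) are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Callable, Dict


class HotspotCache:
    """Cache files whose read count reaches a threshold (illustrative only)."""

    def __init__(self, capacity: int, hot_threshold: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.hot_threshold = hot_threshold
        self.hits: Counter = Counter()        # per-file read frequency
        self.cache: Dict[str, bytes] = {}     # cached hotspot files

    def read(self, name: str, fetch: Callable[[str], bytes]) -> bytes:
        """Return file contents, using `fetch` (e.g. an HDFS read) on a miss."""
        self.hits[name] += 1
        if name in self.cache:
            return self.cache[name]
        data = fetch(name)
        if self.hits[name] >= self.hot_threshold:
            self._admit(name, data)
        return data

    def _admit(self, name: str, data: bytes) -> None:
        if len(self.cache) >= self.capacity:
            # Find the cached file with the lowest read count.
            coldest = min(self.cache, key=lambda n: self.hits[n])
            if self.hits[coldest] >= self.hits[name]:
                return  # the newcomer is not hotter; keep the cache as is
            del self.cache[coldest]
        self.cache[name] = data
```

The `fetch` callback keeps the sketch independent of any particular HDFS client library; a real system would pass in a function that reads the file from the cluster.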

Description

Technical field

[0001] The present invention relates to the field of computer science and, more specifically, to an HDFS-based high-performance file storage and management system.

Background technique

[0002] With the spread of the mobile Internet, global data volumes are growing explosively, and emerging fields such as artificial intelligence, machine learning, and cloud computing are developing rapidly. Traditional computer storage systems can no longer meet the demands of generating and using such large amounts of data. Commonly used distributed file systems currently include Hadoop, Lustre, MogileFS, GoogleFS, etc. Hadoop offers advantages such as high reliability, strong scalability, fast storage speed, and high fault tolerance, and HDFS is Hadoop's distributed file system.

[0003] However, the calculation and storage of big data is not the only problem faced by the huge information flo...


Application Information

IPC(8): G06F16/11, G06F16/182
CPC: G06F16/11, G06F16/182
Inventor: 吴宗泽, 张兴斌, 李建中, 梁泽逍, 李俊彬, 黄婷婷
Owner: GUANGDONG UNIV OF TECH