Storage system based on Hadoop distributed computing platform

A distributed computing and storage system technology, applied in the file system field of the Hadoop distributed computing platform, that addresses the performance degradation caused by a large number of small files.

Inactive Publication Date: 2013-01-30
SUZHOU LIANGJIANG TECH

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a storage system based on the Hadoop distributed computing platform that solves a problem in the prior art: an excessive number of small files on the Hadoop distributed computing platform causes obvious performance degradation.

Embodiment

[0028] As shown in Figure 1, this embodiment adds a small file processing module, a file type judgment module and a timing module on top of the original HDFS. The file type judgment module judges whether a file uploaded by the user is a small file: when the size of the uploaded file is less than the block size of the HDFS file system, the module judges the file to be a small file; otherwise it judges the file to be a large file. The timing module sets a timer and, each time the predetermined period is reached, tallies the size of the small file sequence held in the small file processing module and judges whether that size is greater than the block size of the HDFS file system. The small file processing module stores each small file as a Record in the SequenceFile class to form a small file queue; when the timing module judges that the size of the small file sequence is larger than the block of the HDFS file system, the file n...
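The three added modules can be illustrated with a minimal in-memory sketch. This is not the patented implementation: it uses plain Python objects instead of HDFS and the SequenceFile class, and the names `is_small_file`, `SmallFileQueue`, `on_timer_tick` and `upload`, as well as the 64 MiB block size, are hypothetical choices made for illustration. The excerpt is cut off before it says what happens once the sequence exceeds one block, so the sketch only hands the accumulated records back to the caller at that point.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed block size for illustration only; the excerpt does not state one.
BLOCK_SIZE = 64 * 1024 * 1024


def is_small_file(size_bytes: int, block_size: int = BLOCK_SIZE) -> bool:
    """File type judgment: a file is small if it is smaller than one block."""
    return size_bytes < block_size


@dataclass
class SmallFileQueue:
    """In-memory stand-in for the small file queue (one record per small file)."""
    block_size: int = BLOCK_SIZE
    records: List[Tuple[str, bytes]] = field(default_factory=list)

    def append(self, name: str, data: bytes) -> None:
        # Each small file becomes one (name, content) record.
        self.records.append((name, data))

    def total_size(self) -> int:
        # Size of the small file sequence, counting file contents only.
        return sum(len(data) for _, data in self.records)

    def on_timer_tick(self) -> List[Tuple[str, bytes]]:
        """Timing module check, meant to be called once per period.

        Returns the queued records and empties the queue when the sequence
        is larger than one block; otherwise returns an empty list. What the
        system does with the records afterwards is not covered here.
        """
        if self.total_size() > self.block_size:
            batch, self.records = self.records, []
            return batch
        return []


def upload(name: str, data: bytes, queue: SmallFileQueue,
           large_files: List[str]) -> str:
    """Route an upload to the small file queue or the general (large file) path."""
    if is_small_file(len(data), queue.block_size):
        queue.append(name, data)
        return "small"
    large_files.append(name)  # stand-in for the unmodified HDFS write path
    return "large"
```

In a real deployment the periodic check would be driven by a timer (for example a scheduled task) rather than called by hand, and the records would be written through Hadoop's SequenceFile API; both are left out so that the sketch stays self-contained.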

Abstract

The invention discloses a storage system based on a Hadoop distributed computing platform. The storage system comprises an HDFS (Hadoop Distributed File System) general file processing module, a file type judging module, a small file processing module and a timing module. The file type judging module judges whether files uploaded by a user are small files. The timing module sets a timer and, when the preset period is reached, tallies the size of the small file sequence in the small file processing module and judges whether that size is greater than one block of the HDFS file system. The small file processing module stores each small file as a Record in the SequenceFile class to form a small file queue. The storage system reduces the number of small files in HDFS and effectively improves the performance of reading files in HDFS.
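The claimed reduction in file count can be made concrete with a rough back-of-the-envelope sketch. It assumes, purely for illustration, that merged small files end up in containers of about one block each and it ignores any per-record overhead; the function name and the example figures (one million files of 100 KiB, 64 MiB blocks) are hypothetical and do not come from the patent.

```python
def files_after_merging(num_small_files: int, avg_size_bytes: int,
                        block_size_bytes: int) -> int:
    """Estimate how many block-sized containers hold the merged small files.

    Assumption: each container holds roughly one block of data, so the
    count is the total payload divided by the block size, rounded up.
    """
    total_bytes = num_small_files * avg_size_bytes
    return -(-total_bytes // block_size_bytes)  # integer ceiling division


if __name__ == "__main__":
    before = 1_000_000  # hypothetical number of small files
    after = files_after_merging(before, 100 * 1024, 64 * 1024 * 1024)
    print(before, "small files ->", after, "merged containers")
```

With these example inputs the estimate is 1526 containers in place of 1,000,000 separate files; the real figure depends on how the system sizes and flushes its sequences.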

Description

Technical field

[0001] The invention belongs to the technical field of Hadoop distributed computing platform file systems, and in particular relates to a storage system based on the Hadoop distributed computing platform.

Background technique

[0002] The Hadoop Distributed File System, HDFS for short, is a distributed file system. HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and suits applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream (streaming access). HDFS was originally built as infrastructure for the open-source Apache Nutch project; HDFS is part of the Hadoop project, and Hadoop in turn began as part of Lucene.

[0003] As the amount of data that enterprises must process keeps growing, the idea of MapReduce is receiving more and more attention. Hadoop is ...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 陈国庆, 钱扬帆
Owner: SUZHOU LIANGJIANG TECH