Mass small file access optimization method based on Ceph

A technique for optimizing access to mass small files, applied in the field of distributed file storage. It addresses low storage and reading efficiency, and improves hit rate, storage efficiency, and reading efficiency.

Active Publication Date: 2018-10-26
GUILIN UNIV OF ELECTRONIC TECH

AI Technical Summary

Problems solved by technology

[0006] The problem to be solved by the present invention is that Ceph has low storage and reading efficiency when process...

Examples

Embodiment Construction

[0037] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in combination with specific examples and with reference to the accompanying drawings.

[0038] A Ceph-based access optimization method for massive small files. When a user stores files, the K-means clustering algorithm is first used to obtain associated groups of small files; the files in each group are then sorted in descending order, and the associated files in each group are merged and stored in Ceph. When a user initiates an access request, the system first checks whether the requested file is in the cache. If so, it reads the file directly and returns it; otherwise, it sends the request to the Ceph cluster to read the small file, prefetches and caches other small files according to the correlation between the requested file and the other small files in the merged file where it is located, and then ret...
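
The storage path in [0038] (cluster, sort, merge) can be illustrated with a minimal sketch. It is not the patented implementation, and several details are assumptions because the text does not state them: each small file is described by a hypothetical numeric feature vector, the descending sort key is taken to be file size, the K-means centroids are seeded with a farthest-first rule for determinism, and the merged object is returned in memory rather than written to a Ceph cluster. The names `kmeans` and `merge_groups` are hypothetical.

```python
from collections import defaultdict

def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iters=50):
    """Plain Lloyd's K-means with deterministic farthest-first seeding (assumed)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels

def merge_groups(files, features, k):
    """files: name -> bytes; features: name -> tuple of floats (hypothetical).

    Returns group id -> (merged blob, {name: (offset, length)}).
    """
    names = sorted(files)
    labels = kmeans([features[n] for n in names], k)
    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[label].append(name)
    merged = {}
    for label, members in groups.items():
        # Assumed sort key: file size, largest first.
        members.sort(key=lambda n: len(files[n]), reverse=True)
        blob, index, offset = bytearray(), {}, 0
        for n in members:
            index[n] = (offset, len(files[n]))
            blob += files[n]
            offset += len(files[n])
        merged[label] = (bytes(blob), index)
    return merged
```

The per-group index of offsets and lengths is what lets a single small file be sliced back out of the merged object later; in a real deployment the blob would be written as one Ceph object instead of being returned.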

Abstract

The invention discloses a mass small file access optimization method based on Ceph. When a user stores files, correlation groups of small files are obtained using a K-means clustering algorithm. The files in each group are sorted in descending order. The correlated files in each group are merged and then stored in Ceph. When the user initiates an access request, the system checks whether the requested files are in the cache. If they are, the requested files are read directly and returned. If they are not, the request is sent to the Ceph cluster, the small files are read, further small files are prefetched and cached according to the utilization rate and the correlation ratio among file blocks, and the requested files are returned. The method reduces interaction between the user and the cluster, shortens user access time, improves the access efficiency of mass small files, and improves the overall performance of the system.
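
The read path in the abstract (cache check, cluster read on a miss, prefetch of correlated files) can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the class `PrefetchingReader` is hypothetical, the Ceph cluster is replaced by an in-memory dictionary of merged objects, the cache uses LRU eviction, and prefetching ranks candidates only by a hypothetical correlation table supplied by the caller (the utilization rate mentioned in the abstract is not modeled).

```python
from collections import OrderedDict

class PrefetchingReader:
    def __init__(self, merged, correlation, capacity=4, prefetch_n=2):
        self.merged = merged            # group id -> (blob, {name: (offset, length)})
        self.correlation = correlation  # (name, other) -> score; hypothetical input
        self.capacity = capacity        # maximum number of cached small files
        self.prefetch_n = prefetch_n    # files to prefetch per cache miss
        self.cache = OrderedDict()      # LRU cache: oldest entry first
        self.cluster_reads = 0          # counts simulated round trips to the cluster
        # Reverse lookup: small-file name -> merged object that holds it.
        self.where = {n: g for g, (_, idx) in merged.items() for n in idx}

    def _put(self, name, data):
        self.cache[name] = data
        self.cache.move_to_end(name)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

    def read(self, name):
        if name in self.cache:              # cache hit: no cluster interaction
            self.cache.move_to_end(name)
            return self.cache[name]
        blob, index = self.merged[self.where[name]]
        self.cluster_reads += 1             # cache miss: one read of the merged object

        def slice_of(n):
            off, length = index[n]
            return blob[off:off + length]

        # Prefetch the most correlated neighbors from the same merged object.
        others = sorted((n for n in index if n != name),
                        key=lambda n: self.correlation.get((name, n), 0.0),
                        reverse=True)
        for n in others[:self.prefetch_n]:
            self._put(n, slice_of(n))
        data = slice_of(name)
        self._put(name, data)               # requested file ends up most recent
        return data
```

Because correlated files share one merged object, a single miss can populate the cache with several files the user is likely to request next, which is how the scheme reduces round trips between the user and the cluster.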

Description

Technical field

[0001] The invention relates to the technical field of distributed file storage, in particular to a method for optimizing access to massive small files based on Ceph.

Background

[0002] With the rapid development of cloud computing and big data, the amount of global data is increasing exponentially. Traditional storage systems are gradually becoming unable to meet people's storage needs due to factors such as equipment costs and maintenance costs. In addition, as the number of small files grows, most distributed storage systems can no longer meet the need for efficient storage and reading of small files. How to solve the storage and management problems of massive small files and improve the storage and access efficiency of small files is now the biggest challenge.

[0003] Ceph is a distributed file system that can store and manage files efficiently when processing large files. However, Ceph still has some shortcomings when storing a large...

Application Information

IPC(8): G06F17/30
Inventor: 王勇, 陆小霞, 叶苗, 郇宜鸣
Owner: GUILIN UNIV OF ELECTRONIC TECH