
Massive small file storage performance optimization method and system based on time series prediction

A technology of massive small files and time series, applied in the field of information storage, which can solve problems such as aggregation strategies that ignore the temporal characteristics of the load and the resulting limits on the storage performance (write and read performance) of massive small files, and achieves the effects of improving file access performance, reducing seek time, and accelerating training speed.

Active Publication Date: 2021-01-01
HUAZHONG UNIV OF SCI & TECH
Cites 20 · Cited by 0

AI Technical Summary

Problems solved by technology

However, the above methods did not consider the pattern by which the access load changes over time when designing their aggregation strategies; that is, they ignored the temporal characteristics of the load, so the storage performance (write and read performance) of massive small files in distributed storage systems remains restricted.

Method used




Embodiment Construction

[0065] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not constitute a conflict with each other.

[0066] The terms "first", "second", and the like (if any) in the description and drawings of the present invention are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence.

[0067] In order to optimize the storage performance (including write performance and read performance) of a large number of small files in a distributed storage sy...



Abstract

The invention discloses a method for optimizing the storage performance of massive small files based on time-series prediction, and belongs to the field of information storage. The method comprises the following steps: collecting historical file access records with time information to obtain a data set; preprocessing the data set into discrete time-series data and rolling a time window over it to generate a training data set, in which the training sample for any time t takes the data from time t−n to time t as input and the data at time t+1 as the label; establishing a time-series prediction model based on a recurrent neural network, and performing training, validation, and testing in sequence with the training, validation, and test sets obtained by dividing the training data set, thereby obtaining a target model; predicting the trend of file sizes with the target model so as to distinguish large files from small files; and storing the large files directly while aggregating the small files in time order before storing them. The method can optimize the storage performance of massive small files in a distributed storage system.
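The rolling-window training-set construction described in the abstract (inputs from time t−n to t, label at time t+1) and the subsequent size-based routing step can be sketched as follows. All names, the threshold value, and the toy data are illustrative assumptions, not the patent's actual implementation:

```python
# Sketch of the rolling-window training-set construction and the
# large/small routing step the abstract describes. Names, threshold,
# and data are hypothetical.

def make_windows(series, n):
    """For each time t, input = series[t-n+1 .. t], label = series[t+1]."""
    inputs, labels = [], []
    for t in range(n - 1, len(series) - 1):
        inputs.append(series[t - n + 1 : t + 1])
        labels.append(series[t + 1])
    return inputs, labels

# Toy sequence of observed file sizes (bytes), ordered by arrival time.
sizes = [512, 2048, 300, 4096, 128, 1024, 256, 8192]
X, y = make_windows(sizes, n=3)
# First sample: input [512, 2048, 300], label 4096

SMALL_FILE_THRESHOLD = 1024  # bytes; hypothetical cutoff

def route(predicted_size):
    """Store predicted-large files directly; mark small ones for aggregation."""
    return "direct" if predicted_size > SMALL_FILE_THRESHOLD else "aggregate"
```

In the patent's pipeline the windowed pairs would then train a recurrent-neural-network predictor; the sketch omits the model and shows only the dataset shape and the routing decision that the prediction drives.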

Description

Technical field

[0001] The invention belongs to the field of information storage, and more specifically relates to a method and system for optimizing the storage performance of massive small files based on time-series prediction.

Background technique

[0002] With the rapid development of Internet information technology, data has grown exponentially. Ever-expanding web applications and mobile applications generate massive numbers of small files as users interact with them, and distributed storage systems generally exhibit performance problems when storing large numbers of small files; this is known as the small file problem, and it poses challenges across a range of performance issues. When faced with massive small files, disk I/O efficiency is low: traditional disks are better suited to sequential large-file I/O access patterns, while the performance of random small-file I/O reads and writes is poor. The time consum...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F3/06
CPC: G06F3/0611 · G06F3/0643 · G06F3/0653 · G06F3/067
Inventor: 曾令仿、张爱乐、程倩雅、程稳、李弘南、方圣卿、杨霖、施展、冯丹
Owner: HUAZHONG UNIV OF SCI & TECH