A massive small file query method and system using suffix array index

A technology for managing massive numbers of small files with suffix arrays, applied in the field of big data management. It addresses poor query immediacy, difficulty in updating small files, and the single query mode available for small files, so as to ensure immediacy, avoid inefficient massive small file queries, and reduce I/O overhead.

Active Publication Date: 2019-01-29
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to overcome the problems caused by simply merging small files in the prior art, such as a single small file query mode, low reading efficiency, difficulty in updating small files, and poor query immediacy, and to provide a method for querying massive small files using a suffix array index.

Examples

Embodiment Construction

[0047] The accompanying drawings are for illustrative purposes only and shall not be construed as limiting this patent.

[0048] To better illustrate this embodiment, some parts in the drawings are omitted, enlarged, or reduced, and do not represent the size of the actual product.

[0049] Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings.

[0050] The technical solutions of the present invention are further described below in conjunction with the accompanying drawings and embodiments.

[0051] This embodiment applies the present invention to the Hadoop Distributed File System (HDFS).

[0052] The specific attribute data of the two small files in this embodiment are shown in Table 1: the small file name filename, the small file size filesize, the file name unionfilename under which the small file is stored on the distributed system, and is stored in the corresponding storage f...
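The attribute record described above can be illustrated with a minimal sketch. The field names filename, filesize, and unionfilename come from the text; the `offset` field and the helper functions `merge_small_files` and `read_small_file` are hypothetical, introduced only for illustration, and the merged file is held in memory rather than on HDFS.

```python
from dataclasses import dataclass


@dataclass
class SmallFileRecord:
    filename: str       # small file name (from the text)
    filesize: int       # small file size in bytes (from the text)
    unionfilename: str  # name of the merged file holding it (from the text)
    offset: int         # hypothetical: start position inside the merged file


def merge_small_files(files, unionfilename):
    """Concatenate small files into one blob and record where each one lives."""
    merged = bytearray()
    records = {}
    for filename, data in files.items():
        records[filename] = SmallFileRecord(
            filename=filename,
            filesize=len(data),
            unionfilename=unionfilename,
            offset=len(merged),
        )
        merged += data
    return bytes(merged), records


def read_small_file(merged, record):
    """Recover one small file from the merged blob using its record."""
    return merged[record.offset:record.offset + record.filesize]


files = {"a.txt": b"hello", "b.txt": b"world!"}
merged, records = merge_small_files(files, "union_0001")
print(read_small_file(merged, records["b.txt"]))
```

Keeping the size and position next to the merged file name means a single small file can be read with one ranged read instead of scanning the merged file.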

Abstract

The invention discloses a massive small file query method using a suffix array index. Because the small files are merged and stored on the distributed file system, the invention improves space utilization. At the same time, a suffix array index is established for each small file to record its storage information and the attribute information of the small file itself, and an effective small file update method is provided. Small files can therefore be queried in various ways, which avoids the traditional single, inefficient mode of massive small file query and ensures the immediacy, accuracy, and efficiency of the query. The invention solves the problems caused by simply merging small files in the prior art, such as a single small file query mode, low reading efficiency, difficulty in updating small files, and poor query immediacy.
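As a rough illustration of how a suffix array can support more than exact-name lookup, the sketch below indexes a set of small file names and answers substring queries by binary search. It is not the patented construction: the class `NameIndex`, the separator character, and the naive sorting-based construction are all assumptions made for this example.

```python
from bisect import bisect_right


def build_suffix_array(text):
    """Naive construction: sort suffix start positions by suffix content."""
    return sorted(range(len(text)), key=lambda i: text[i:])


class NameIndex:
    SEP = "\x00"  # assumed separator; must not occur in names or patterns

    def __init__(self, names):
        self.names = list(names)
        self.starts = []  # start position of each name in the joined text
        parts = []
        pos = 0
        for name in self.names:
            self.starts.append(pos)
            parts.append(name + self.SEP)
            pos += len(name) + 1
        self.text = "".join(parts)
        self.sa = build_suffix_array(self.text)

    def _bound(self, pattern, strict):
        """First suffix-array slot whose prefix is >= (or > if strict) pattern."""
        n = len(pattern)
        lo, hi = 0, len(self.sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = self.text[self.sa[mid]:self.sa[mid] + n]
            if prefix < pattern or (strict and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    def find(self, pattern):
        """Return the sorted names that contain pattern as a substring."""
        first = self._bound(pattern, strict=False)
        last = self._bound(pattern, strict=True)
        hits = set()
        for pos in self.sa[first:last]:
            # Map a text position back to the name that covers it.
            hits.add(bisect_right(self.starts, pos) - 1)
        return sorted(self.names[i] for i in hits)


index = NameIndex(["log_2019.txt", "img_001.png", "log_2018.txt", "notes.txt"])
print(index.find("log"))
print(index.find(".txt"))
```

Because all suffixes that begin with the pattern are adjacent in sorted order, prefix, suffix, and infix queries on names reduce to the same two binary searches. A production index would use a linear-time construction instead of the naive sort shown here.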

Description

Technical field

[0001] The present invention relates to the field of big data management, and more specifically to a method and system for querying massive small files using a suffix array index.

Background technique

[0002] We are currently in the era of big data. Modern information applications generate massive amounts of data, which puts pressure on storage and management. Many commonly used distributed file systems, represented by HDFS, are designed for storing large files. If small files are stored, each small file wastes space because it occupies a complete storage unit. At the same time, storing small files directly on the distributed file system consumes a large amount of server memory for the metadata created for each small file, and once the number of small files reaches a certain scale, storage and retrieval also slow down accordingly.

[0003] The common way to solve the above problem...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/182, G06F16/13, G06F16/16
Inventor: 赵鑫, 孙茜, 农革
Owner: SUN YAT SEN UNIV