Method for accessing archived files based on a B+ tree index

An access method and file indexing technology, applied to file systems, file access structures, and file metadata retrieval, that addresses excessive NameNode memory consumption, degraded NameNode performance, and poor access performance for small files.

Pending Publication Date: 2022-03-01
CHANGSHA UNIVERSITY OF SCIENCE AND TECHNOLOGY

AI Technical Summary

Problems solved by technology

When too many small files are stored in HDFS, they occupy a large amount of NameNode memory and seriously degrade NameNode performance.
[0004] (2) Poor processing performance: when processing a large number of small files, each request can handle only a single file, so the small files must be read one by one. MapReduce (the data processing framework used with HDFS) then has to perform many read and write operations on different nodes in the cluster, which takes far longer than processing a single file of the same total size as the batch of small files.
[0005] (3) Long storage time: for each small file the client uploads, a request is sent to the NameNode. On receiving the request, the NameNode must check its own metadata to confirm whether the target file exists, whether the parent directory exists, and so on, before telling the client whether it may upload the target file. This takes a lot of time.
However, after processing according to the above method, accessing a small file inside a large file becomes slower. The small file's metadata, previously held in NameNode memory for fast access, has been moved into these three file structures, so the file access mechanism has changed and access performance is poor.
[0015] Therefore, how to provide a method for accessing archived files based on a B+ tree index, one that solves the poor small-file access performance of the original solution and speeds up small-file retrieval while reducing NameNode memory consumption, has become an urgent technical problem for those skilled in the art.
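To illustrate why a B+ tree index suits this kind of lookup, the following is a minimal sketch of a bulk-loaded B+ tree that maps a small file's name to its location in a merged data file. It is not the patent's implementation: the names `Node`, `build`, and `search`, the `order` parameter, and the `(offset, length)` value layout are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # used by internal nodes only
    values: list = field(default_factory=list)    # used by leaves only

    @property
    def is_leaf(self) -> bool:
        return not self.children


def min_key(node: Node):
    """Smallest key stored anywhere below `node`."""
    while not node.is_leaf:
        node = node.children[0]
    return node.keys[0]


def build(items, order: int = 4) -> Node:
    """Bulk-load a B+ tree from (key, value) pairs.

    Each leaf holds at most `order` entries and each internal node
    at most `order` children. All values live in the leaves.
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    items = sorted(items, key=lambda kv: kv[0])
    if not items:
        return Node(keys=[])
    level = [
        Node(keys=[k for k, _ in items[i:i + order]],
             values=[v for _, v in items[i:i + order]])
        for i in range(0, len(items), order)
    ]
    # Group nodes under parents until a single root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), order):
            group = level[i:i + order]
            # Separator j is the smallest key under child j + 1.
            parents.append(Node(keys=[min_key(c) for c in group[1:]],
                                children=group))
        level = parents
    return level[0]


def search(root: Node, key):
    """Descend from the root to one leaf, then look the key up in it."""
    node = root
    while not node.is_leaf:
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None


# Hypothetical index: file name -> (offset, length) inside a merged data file.
entries = [(f"file{i:03d}", (i * 10, 10)) for i in range(25)]
root = build(entries, order=4)
print(search(root, "file007"))
```

Because every lookup visits one node per level, the number of node reads grows with the height of the tree rather than with the number of indexed files, which is the property the text relies on for fast small-file retrieval.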


Embodiment Construction

[0050] To enable those skilled in the art to better understand the technical solution of this application, the technical solution in the embodiments of the application is described clearly and completely below with reference to the drawings in those embodiments. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained from them by persons of ordinary skill in the art without creative effort fall within the scope of protection of this application.

[0051] It should be noted that when an element is referred to as being "fixed" or "disposed on" another element, it may be directly or indirectly disposed on that element; when an element is referred to as being "connected" to another element, it may be directly or indirectly connected to that element.

[0052] It i...


Abstract

The invention discloses a method for accessing archived files based on a B+ tree index, comprising the following steps. File division: a file filter is called to classify each archived file as a large file, a small file, or an ultra-small file according to its size. File merging: the small files and ultra-small files are sent to a file merging module, which uses a file merging strategy to merge and classify them into index files and data files. Index system construction: an index system for storing the index files is constructed in HDFS (Hadoop Distributed File System). File uploading: large files are uploaded directly to HDFS, and the merged and classified index files and data files are uploaded to HDFS as well. File index query: small-file information is queried through the index system according to file size. This technical scheme addresses the poor small-file access performance of the original scheme and increases small-file retrieval speed while reducing NameNode memory consumption.
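The division, merging, and query steps above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the size thresholds, the names `classify`, `route`, `merge`, `read_small_file`, and `IndexEntry`, and the use of an in-memory dict in place of the B+ tree index are all hypothetical, and no HDFS calls are made.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the source does not state the actual cut-offs.
ULTRA_SMALL_MAX = 4 * 1024          # bytes
SMALL_MAX = 16 * 1024 * 1024        # bytes


def classify(size: int) -> str:
    """File filter: label a file by its size in bytes."""
    if size <= ULTRA_SMALL_MAX:
        return "ultra_small"
    if size <= SMALL_MAX:
        return "small"
    return "large"


@dataclass(frozen=True)
class IndexEntry:
    name: str
    offset: int   # start of the payload inside the merged data file
    length: int   # payload size in bytes


def route(files: dict) -> tuple:
    """Split files into direct uploads (large) and a merge batch (the rest)."""
    direct, batch = {}, {}
    for name, payload in files.items():
        target = direct if classify(len(payload)) == "large" else batch
        target[name] = payload
    return direct, batch


def merge(batch: dict) -> tuple:
    """Concatenate payloads into one data blob and record where each one sits."""
    data = bytearray()
    index = {}
    for name, payload in batch.items():
        index[name] = IndexEntry(name, len(data), len(payload))
        data += payload
    return bytes(data), index


def read_small_file(data: bytes, index: dict, name: str):
    """Query step: find the entry, then slice the payload out of the data blob."""
    entry = index.get(name)
    if entry is None:
        return None
    return data[entry.offset:entry.offset + entry.length]


files = {"a.txt": b"hello", "b.txt": b"world!", "big.bin": b"x" * (SMALL_MAX + 1)}
direct, batch = route(files)
data, index = merge(batch)
print(sorted(direct), read_small_file(data, index, "b.txt"))
```

In this sketch only the merged data blob and its index would need to be stored, so the number of objects tracked by the NameNode no longer grows with the number of small files.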

Description

Technical field

[0001] The present application relates to the technical field of file indexing and, more specifically, to a method for accessing archived files based on a B+ tree index.

Background

[0002] When the original HDFS stores a large number of small files, each small file generates metadata in HDFS that is kept in the memory of the NameNode. This causes three main problems:

[0003] (1) Impact on NameNode performance: the metadata of small files is stored on the NameNode master server, and the metadata of each small file requires about 250 bytes of memory. Each small file exists in HDFS in the form of data blocks, and the metadata of each block, with the default three replicas, requires about 368 bytes of memory. As a result, when too many small files are stored in HDFS, they occupy a large amount of NameNode memory, which seriously degrades NameNode performance. [...
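As a rough worked example of the figures in [0003], the sketch below estimates NameNode memory for a set of files. The per-file and per-block byte counts come from the text; the function name `namenode_bytes` and the assumption that each small file occupies exactly one block are illustrative.

```python
FILE_META_BYTES = 250    # approximate metadata per file, from [0003]
BLOCK_META_BYTES = 368   # approximate metadata per block with three replicas, from [0003]


def namenode_bytes(n_files: int, blocks_per_file: int = 1) -> int:
    """Approximate NameNode memory, in bytes, used by file and block metadata."""
    return n_files * (FILE_META_BYTES + BLOCK_META_BYTES * blocks_per_file)


# Ten million small files, assuming one block each.
total = namenode_bytes(10_000_000)
print(f"{total / 10**9:.2f} GB")
```

Under these assumptions each small file costs about 618 bytes of NameNode memory regardless of how little data it holds, which is why the file count, not the data volume, drives the memory pressure.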


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/13, G06F16/14, G06F16/182, G06F16/901
CPC: G06F16/182, G06F16/148, G06F16/9027, G06F16/13
Inventor: 张经宇, 田佳宸, 王进, 李文军, 何施茗
Owner: CHANGSHA UNIVERSITY OF SCIENCE AND TECHNOLOGY