File metadata incremental scanning method and system for electron microscope data storage system

A file metadata scanning technique for data storage systems, applied in the computer field. It addresses the long time a full metadata scan takes, scan durations that system administrators cannot accept, and the resulting missed window for remedial action. Its effects are fewer metadata acquisition operations, less time required, and improved computing efficiency.

Active Publication Date: 2020-09-18
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

[0007] The traditional way to check the consumption of an electron microscope storage system is to use command-line tools that ship with the operating system. For example, the df command on Linux reports the used and remaining space of the mounted storage system, but it cannot report the consumption of each individual user.
To check each user's daily consumption, other command-line tools must be combined, such as the find and stat commands on Linux, to scan every file in the storage system before per-user usage can be derived. This leads to another problem: when the storage system holds a very large number of files, for example tens of millions, producing the final per-user statistics takes a long time.
Take ShareEM, an electron microscope storage system managed by the biological computing platform of Tsinghua University, as an example. Its total capacity is 2.5 PB, and it holds about 24 million electron microscopy data files. ShareEM consists of 4 IO nodes. Tests with IOzone showed that the storage system delivers about 4,000 IOPS (Input/Output Operations Per Second), that is, about 4,000 IO operations per second. This is the aggregate throughput of the 4 IO nodes, so each IO node averages about 1,000 IOPS. Gathering per-user space usage statistics over the 24 million files requires a strict order between the find and stat operations: find must first produce the file paths, and only then can stat be run on each file to obtain its metadata. Analyzing one file therefore takes two IO operations, and 24 million files take 48 million IO operations in total. At 1,000 IO operations per second, this takes 48,000 seconds, or about 13 hours.
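The arithmetic in this estimate can be reproduced with a short calculation. The sketch below is illustrative only; the function name `scan_seconds` and its parameters are assumptions introduced here, not part of the patent text.

```python
def scan_seconds(num_files: int, ops_per_file: int, iops: float) -> float:
    """Estimate wall-clock seconds for a sequential metadata scan.

    Assumes every file costs a fixed number of IO operations and that the
    scan sustains a constant rate of `iops` operations per second.
    """
    return num_files * ops_per_file / iops


# Figures quoted in the text: 24 million files, two IO operations per file
# (one for find, one for stat), 1,000 IO operations per second.
seconds = scan_seconds(24_000_000, ops_per_file=2, iops=1000)
print(f"{seconds:.0f} s, about {seconds / 3600:.1f} h")  # 48000 s, about 13.3 h
```

The same function can be evaluated at a lower sustained rate to see how quickly the scan time grows when the ideal rate is not reached.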
[0008] In practice, because the find and stat commands each run in a single process, and because other processes on the host running them interfere with access to ShareEM, it is usually difficult to sustain a scan at 1,000 operations per second. Actual tests found that scanning ShareEM with find and stat reaches only about 400 IOPS, so a full ShareEM scan takes more than 30 hours. System administrators usually cannot accept a 30-hour scan, because by the time an abnormal situation is found, the best time for remedial action has often passed. Fast scanning of the metadata of electron microscope data files is therefore particularly important for managing electron microscope data.
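For orientation, the traditional full scan described in paragraphs [0007] and [0008] can be approximated in a single process as follows. This is a minimal sketch, not the tooling used on ShareEM: it substitutes Python's `os.walk` and `os.lstat` for the find and stat commands, and the function name `usage_by_owner` is hypothetical.

```python
import os
from collections import defaultdict


def usage_by_owner(root: str) -> dict:
    """Walk `root` and sum file sizes in bytes per owning user ID.

    Mirrors the find-then-stat pattern: enumerate every path first,
    then read the metadata of each file one at a time.
    """
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # one metadata lookup per file
            except OSError:
                continue  # file vanished or is unreadable; skip it
            totals[st.st_uid] += st.st_size
    return dict(totals)
```

Because every file is visited on every run, the cost of this approach grows with the total number of files rather than with the number of files that changed since the last scan.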


Embodiment Construction

[0025] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0026] It should be noted that the terms "first" and "second" in the description and claims of the present invention and in the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It is to be understood that the terms so used are interchangeable under appropriate circumstances, such that the embodiments of the invention described herein can be practiced in sequences other than those illustrated or described herein.

[0027] The file system is system software, and the hardware devices of the storage system are managed through the file system. It should be pointed out that the file systems in this application include parallel file systems and non-parallel file systems.

[0028] Su...


Abstract

The invention provides a method and system for obtaining file system metadata. The method comprises: parsing a first metadata information file to obtain a first full path information set; obtaining the path information of all data files in the file system to generate a second full path information set; computing the set difference of the second full path information set minus the first to obtain a to-be-added full path information set; computing the set difference of the first full path information set minus the second to obtain a to-be-deleted full path information set; writing the metadata for each file path in the to-be-added set into the first metadata information file; and deleting from the first metadata information file the metadata related to the to-be-deleted set. Because the method and system are based on incremental updating, processing time is shortened, and file system metadata can be scanned and updated quickly.
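The two set difference operations in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: an in-memory dictionary `index` stands in for the first metadata information file, and the names `stat_metadata` and `incremental_update` are hypothetical.

```python
import os


def stat_metadata(path: str) -> dict:
    """Read a few metadata fields for one file (assumed field selection)."""
    st = os.stat(path)
    return {"size": st.st_size, "uid": st.st_uid, "mtime": st.st_mtime}


def incremental_update(index: dict, current_paths: set,
                       read_metadata=stat_metadata):
    """Bring `index` (path -> metadata) in line with `current_paths`.

    `index` plays the role of the first metadata information file, and
    `current_paths` the second full path information set.
    """
    known_paths = set(index)                 # first full path set
    to_add = current_paths - known_paths     # present now, not yet indexed
    to_delete = known_paths - current_paths  # indexed, no longer present
    for path in to_add:
        index[path] = read_metadata(path)    # metadata read only for new files
    for path in to_delete:
        del index[path]
    return to_add, to_delete
```

In this sketch, metadata is read only for paths in the to-be-added set, which is where the saving over a full scan comes from. A path-only comparison of this kind does not refresh the entry of a file whose path is unchanged.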

Description

Technical field

[0001] The invention relates to the field of computers, in particular to a file metadata incremental scanning method and system for an electron microscope data storage system.

Background technique

[0002] With the advancement of hardware technology, the application of cryo-electron microscopy to analyze the structure of biological macromolecules is becoming a new direction of structural biology research. In recent years, many research teams have published dozens of high-resolution results on protein three-dimensional structure analysis in top international academic journals such as Nature, Science, and Cell based on cryo-electron microscopy technology, which has had a significant impact on the development of life sciences. Cryo-electron microscopy plays a pivotal role in this.

[0003] In order to reconstruct a high-precision molecular structure, cryo-electron microscopy needs to take a large number of two-dimensional high-resolution images. It is difficult...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/16
CPC: G06F16/16, G06F16/164
Inventors: 阮华斌, 杨涛, 王亚坤
Owner: TSINGHUA UNIV