Efficient mass scientific data picture access method

A technology concerning scientific data and access methods, applied in fields such as electrical digital data processing, special data processing applications, and instruments. It addresses problems such as limits on high-concurrency read/write speed and restrictions on data storage capacity.

Inactive Publication Date: 2018-09-28
中国科学院电子学研究所苏州研究院

AI Technical Summary

Problems solved by technology

[0009] The technical problem to be solved by the present invention is to provide an access scheme for the massive pictures in the new-generation digital earth. The scheme addresses two problems that arise in massive data storage and high-concurrency reading and writing: the data storage volume is restricted by the metadata memory, and the high-concurrency read/write speed is limited by disk IOPS.

Examples

Detailed Description of the Embodiments

[0048] The technical scheme of the present invention is described in further detail below with reference to the accompanying drawings:

[0049] The present invention adopts different storage schemes for different sizes of picture files in scientific data:

[0050] (1) Files smaller than 10KB are stored in an HBase cell;

[0051] (2) Files between 10KB and 2MB are merged and stored in an HDFS SequenceFile or MapFile;

[0052] (3) Files larger than 2MB are each stored as a single HDFS file. The user finds the location of the required file according to the metadata record.
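
The three size tiers above can be sketched as a small selection function. This is only an illustration: the names `StorageTier` and `choose_tier` are hypothetical, 1KB is assumed to mean 1024 bytes, and the handling of files of exactly 10KB or exactly 2MB (treated here as belonging to the merged tier) is an assumption, since the text does not state which side the boundaries fall on.

```python
from enum import Enum

KB = 1024        # assumption: binary units
MB = 1024 * KB


class StorageTier(Enum):
    """Hypothetical labels for the three storage schemes in the text."""
    HBASE_CELL = "hbase_cell"      # smaller than 10KB
    MERGED_HDFS = "merged_hdfs"    # 10KB to 2MB, SequenceFile or MapFile
    SINGLE_HDFS = "single_hdfs"    # larger than 2MB


def choose_tier(size_bytes: int) -> StorageTier:
    """Pick a storage scheme from the picture size alone."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes < 10 * KB:
        return StorageTier.HBASE_CELL
    if size_bytes <= 2 * MB:       # boundary handling is an assumption
        return StorageTier.MERGED_HDFS
    return StorageTier.SINGLE_HDFS
```

A writer would call `choose_tier(len(picture_bytes))` before storing a picture, then record the chosen location in the metadata record so that later reads can find it.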

[0053] Advantages: After the files are aggregated, on the one hand, the amount of metadata is reduced, so the storage capacity increases for a fixed amount of memory. On the other hand, the number of interactions between the client and the NameNode decreases, and T_cn, T_metadata, T_nc, T_cd and T_disk all decrease.
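
The effect of aggregation on metadata can be illustrated with a pure-Python stand-in. This sketch does not use the Hadoop SequenceFile or MapFile API; `merge_files` and `read_file` are hypothetical helpers that concatenate small files into one container and keep a name-to-(offset, length) index, which is the general idea behind merged storage. Under this model, the file system would track one container object instead of one object per picture.

```python
from typing import Dict, Tuple

Index = Dict[str, Tuple[int, int]]  # file name -> (offset, length)


def merge_files(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate small files into one blob and build an offset index."""
    blob = bytearray()
    index: Index = {}
    for name, data in files.items():
        index[name] = (len(blob), len(data))  # where this file starts
        blob.extend(data)
    return bytes(blob), index


def read_file(blob: bytes, index: Index, name: str) -> bytes:
    """Recover one original file from the merged blob."""
    offset, length = index[name]
    return blob[offset:offset + length]


if __name__ == "__main__":
    pictures = {"a.png": b"abc", "b.png": b"defgh"}
    blob, index = merge_files(pictures)
    # Two pictures, but only one container for the file system to track.
    print(len(pictures), "pictures in 1 container of", len(blob), "bytes")
    print(read_file(blob, index, "b.png"))
```

Because pictures under the same path tend to be read together, a reader that has already located the container can serve several pictures from it without asking for file system metadata again for each one.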

[0054] 1. Metadata structure

[0055] Metadata stores the mapping from local files to HDF...

Abstract

The invention discloses an efficient access method for massive scientific data pictures. The method comprises the following steps: when pictures are written, establishing a metadata record that expresses the mapping from local files to HDFS files, and adopting a corresponding storage scheme for each size range of picture files in the scientific data; and finding the position of a required file according to the metadata record in order to delete, modify, or read the picture. The method provides an access scheme for the massive pictures in the new-generation digital earth, and thereby solves two problems that arise in massive data storage and high-concurrency reading and writing: the data storage volume is restricted by the metadata memory, and the high-concurrency read/write speed is limited by disk IOPS.

Description

Technical field

[0001] The invention belongs to the field of data storage, and in particular relates to a method for efficiently accessing massive scientific data pictures.

Background

[0002] Pictures have classification features: pictures under the same path have a high probability of being read in batches, for example pictures from the same website, or batches of pictures for biometric image recognition. When users need to read in batches, the files that are read are cached to improve read and write efficiency.

[0003] HDFS (Hadoop Distributed File System) is used in write-once-read-many scenarios. As shown in Figure 1, the HDFS file-read process is as follows: the client sends a request to the NameNode, and the time is recorded as T_cn; the NameNode looks up the metadata in memory, and the time is recorded as T_metadata; the NameNode returns the queried metadata information to the client, and the time is recorded as T_nc; DataNode obtains the spe...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 王龙江, 赵旦谱, 丁一鸣, 阎克栋, 台宪青
Owner: 中国科学院电子学研究所苏州研究院