Mass small tile file storage management method based on Hadoop

A storage management technology for small files, applied in digital data processing, geographic information databases, and special data processing applications. It addresses the problem that existing methods are not suited to fast, efficient reading of small files that come from multiple data sources and exist in multiple versions, which degrades data transmission and data response performance. The stated effects are improved storage speed, efficient storage, and a high-performance raster data service.

Active Publication Date: 2015-08-05
STATE GRID CORP OF CHINA +2

AI Technical Summary

Problems solved by technology

[0007] (1) Existing small-file storage and management methods mostly manage data through hierarchical indexing. The drawback of hierarchical indexing is the extra effort needed to develop a small-file storage mechanism for the Hadoop cluster. After small files are merged, that mechanism must ensure that the block file index and its corresponding block are sent to the same location on the same DataNode, so that the block file index is stored in a distributed manner. The NameNode in the Hadoop cluster also has to consume additional resources to manage the inde...



Examples


Embodiment Construction

[0020] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0021] The present invention provides a Hadoop-based method for storing and managing a large number of small tile files, comprising the following steps:

Step S1: Check the size of each file to be stored. A single file smaller than 20M is serialized and compressed before it is stored in the library; a single file of 20M or larger is stored directly.
Step S2: Introduce the Hilbert curve to sort the stored files.
Step S3: Compress the stored files and generate tile index information.
Step S4: Classify and name the tile files.
Step S5: Build a tile information index table.
Step S6: Provide an improved geographic data block service, ITMS, and perform asynchronous access through the multiple types of pre-generated geographic data unit blocks that ITMS provides.
Step S7: use Me...
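The Hilbert-curve ordering in step S2 can be illustrated with a minimal sketch. It assumes each tile is identified by integer column and row coordinates on a square grid whose side is a power of two; the function names `hilbert_index` and `sort_tiles` are hypothetical, and the patent text available here does not specify the exact curve variant or tile addressing it uses.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    distance along a Hilbert curve, using the classic iterative scheme."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def sort_tiles(tiles, n):
    """Sort (x, y) tile coordinates by Hilbert index (hypothetical helper)."""
    return sorted(tiles, key=lambda t: hilbert_index(n, t[0], t[1]))


if __name__ == "__main__":
    grid = [(x, y) for x in range(4) for y in range(4)]
    for x, y in sort_tiles(grid, 4):
        print(x, y, hilbert_index(4, x, y))
```

Tiles that are adjacent in this ordering are also adjacent on the grid, which is the usual reason for sorting spatial data this way before writing it out in sequence.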


Abstract

The invention provides a Hadoop-based storage management method for massive numbers of small tile files. The method first orders the raster tile data along a Hilbert curve and then serializes, compresses, and stores it using Hadoop's own SequenceFile technology. When tile compression blocks are generated, multiple servers compress in parallel with multiple threads and tile index information is produced, which speeds up loading of massive numbers of files into the library. Block file names follow a managed, regular naming scheme, so the method can provide efficient storage, fast reading, and a high-performance raster data service for massive multi-source, multi-version small raster tiles. ITMS (Improved Tile Map Service) is designed to solve the delay and bandwidth occupation caused by transmitting original data and by processing data in real time when responding to requests, so that the project's data retrieval and transmission requirements are met.
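The idea of compressing small tiles in parallel, concatenating them into one block, and recording an index can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's `zlib` and an in-memory byte string in place of a Hadoop SequenceFile, a local thread pool in place of multiple servers, and hypothetical names (`pack_tiles`, `read_tile`, string tile keys) that do not come from the patent.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def _compress(item):
    """Compress one (key, payload) pair; runs in a worker thread."""
    key, payload = item
    return key, zlib.compress(payload)


def pack_tiles(tiles, workers=4):
    """Compress tiles in parallel and concatenate them into one block.

    tiles: dict mapping a tile key (str) to its raw bytes.
    Returns (block, index), where index maps key -> (offset, length)
    of the compressed tile inside the block.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the block layout is deterministic.
        compressed = list(pool.map(_compress, sorted(tiles.items())))
    block = bytearray()
    index = {}
    for key, data in compressed:
        index[key] = (len(block), len(data))
        block += data
    return bytes(block), index


def read_tile(block, index, key):
    """Look up one tile through the index and decompress it."""
    offset, length = index[key]
    return zlib.decompress(block[offset:offset + length])


if __name__ == "__main__":
    sample = {"z3/x1/y2": b"\x00" * 1000, "z3/x1/y3": b"tile-bytes"}
    block, index = pack_tiles(sample)
    print(len(block), index)
    print(read_tile(block, index, "z3/x1/y3"))
```

Reading a single tile then costs one index lookup and one slice of the block, rather than one file-system entry per tile.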

Description

Technical field

[0001] The invention relates to a method for storing and managing raster data on a Hadoop distributed platform, and in particular to a method for storing and managing small raster files that are numerous, multi-source, and multi-version.

Background

[0002] With the rapid development of GIS technology, map data has grown quickly, and the corresponding tile data is large in volume, drawn from multiple data sources, and kept in multiple versions. How to store and manage massive map tile data efficiently has become a problem.

[0003] The emergence of cloud computing offers a new approach. In recent years, to solve the problem of large-scale data storage and management, many companies and institutions have proposed scalable large-scale data management solutions based on "cloud computing" technology. Hadoop, the current mainstream open source project, is a distributed system architecture that includes massive data storage and computing,...


Application Information

IPC(8): G06F17/30
CPC: G06F16/134, G06F16/29
Inventor: 汤振立陈强林承华梁曼舒罗富财吴丹
Owner: STATE GRID CORP OF CHINA