Hadoop-based massive tile small file storage management method

A technology for the storage management of small files, applied in digital data processing, geographic information databases, special data processing applications, etc. It addresses data response performance and the optimization of data transmission and data processing, and achieves faster and more efficient storage and a high-performance raster data service.

Active Publication Date: 2018-02-09
STATE GRID CORP OF CHINA and 2 others
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

[0007] (1) Most existing small-file storage and management methods manage data through hierarchical indexing. The drawback of hierarchical indexing is that extra effort is needed to develop a small-file storage mechanism for the Hadoop cluster which ensures that, after small files are merged, each block file index and its corresponding block are transmitted to the same location on the same DataNode, so that the block file indexes of the merged small files are stored in a distributed way. The NameNode in the Hadoop cluster must also spend additional resources managing the index files, which increases memory overhead;
[0008] (2) A project may involve tile files from multiple data sources and in multiple versions, and the hierarchical index method is not suited to fast, efficient reading of such multi-source, multi-version small files;
[0009] (3) The project involves a large amount of raw data and frequent real-time data requests. Existing solutions are not specifically optimized for data transmission and data processing, which degrades data transmission and data response performance while the project is running.

Embodiment Construction

[0020] The present invention will be further described below in conjunction with the drawings and specific embodiments.

[0021] The present invention provides a Hadoop-based storage and management method for massive numbers of small tile files, comprising the following steps: Step S1: determine the size of each file to be stored; a file smaller than 20M is serialized and compressed before being stored, and a file of 20M or larger is stored directly; Step S2: introduce a Hilbert curve to sort the stored files; Step S3: compress the stored files and generate tile index information; Step S4: classify and name the tile files; Step S5: build a tile information index table; Step S6: provide an improved geographic data block service, ITMS, and access the multiple types of pre-generated geographic data unit blocks provided by the ITMS asynchronously; Step S7: use Memcached as a cache area. If the requested tile data is in the...
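Steps S1 and S2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each tile is addressed by a zoom level z and a column/row (x, y) on a 2^z by 2^z grid, reads "20M" as 20 MiB, and uses the standard iterative Hilbert-curve index computation. The names `Tile`, `is_small`, `hilbert_index`, and `split_and_sort` are hypothetical.

```python
from typing import NamedTuple

# Assumption: "20M" in Step S1 is interpreted as 20 MiB.
SMALL_FILE_THRESHOLD = 20 * 1024 * 1024


class Tile(NamedTuple):
    z: int     # zoom level
    x: int     # column, 0 <= x < 2**z
    y: int     # row, 0 <= y < 2**z
    size: int  # file size in bytes


def is_small(tile: Tile) -> bool:
    """Step S1 (sketch): files below the threshold are packed; larger ones are stored directly."""
    return tile.size < SMALL_FILE_THRESHOLD


def _rot(n: int, x: int, y: int, rx: int, ry: int) -> tuple[int, int]:
    # Flip and transpose the quadrant so the sub-curve has the correct orientation.
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve of an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def split_and_sort(tiles: list[Tile]) -> tuple[list[Tile], list[Tile]]:
    """Return (small tiles ordered by zoom level, then Hilbert index; large tiles)."""
    small = [t for t in tiles if is_small(t)]
    large = [t for t in tiles if not is_small(t)]
    small.sort(key=lambda t: (t.z, hilbert_index(2 ** t.z, t.x, t.y)))
    return small, large
```

Sorting by Hilbert index tends to place spatially neighbouring tiles next to each other in the packed file, so a map view that requests a contiguous area is more likely to read from one contiguous region of storage.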

Abstract

The invention provides a Hadoop-based storage management method for massive numbers of small tile files. The method first sorts the tiles along a Hilbert curve and then serializes, compresses, and stores the raster tile data using Hadoop's own SequenceFile technology. When the compressed tile blocks are generated, multiple servers compress them in parallel with multiple threads and generate tile index information, which speeds up loading a massive file library into storage. Block file names follow a regular naming scheme, so efficient storage, fast reading, and a high-performance raster data service can be provided for massive multi-source, multi-version small raster tiles. An ITMS (Improved Tile Map Service) is also designed, which solves the latency and bandwidth consumption caused by transmitting original data and responding to real-time data processing requests, so that the project's data retrieval and transmission requirements are met.
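The abstract's combination of parallel compression, tile index generation, and regularly named block files can be illustrated with the sketch below. It is a stand-in built on assumptions: Python's `zlib` and a single-machine thread pool replace Hadoop's SequenceFile format and the multi-server compression described in the abstract, and the block naming rule, tile keys, and the names `IndexEntry`, `block_name`, `pack_block`, and `read_tile` are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple


class IndexEntry(NamedTuple):
    offset: int  # byte offset of the compressed tile inside the block
    length: int  # compressed length in bytes


def block_name(source: str, version: str, zoom: int, seq: int) -> str:
    # Hypothetical naming rule: data source, version, zoom level, block sequence number.
    return f"{source}_{version}_z{zoom:02d}_{seq:05d}.blk"


def pack_block(tiles: list[tuple[str, bytes]], workers: int = 4) -> tuple[bytes, dict[str, IndexEntry]]:
    """Compress tiles in parallel, concatenate them in input order, and build an offset index."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so a prior Hilbert ordering is preserved.
        compressed = list(pool.map(lambda kv: zlib.compress(kv[1]), tiles))
    block = bytearray()
    index: dict[str, IndexEntry] = {}
    for (key, _), data in zip(tiles, compressed):
        index[key] = IndexEntry(len(block), len(data))
        block.extend(data)
    return bytes(block), index


def read_tile(block: bytes, index: dict[str, IndexEntry], key: str) -> bytes:
    """Random access: slice out one tile at its indexed offset and decompress it."""
    entry = index[key]
    return zlib.decompress(block[entry.offset:entry.offset + entry.length])
```

Because the index records each tile's byte offset and length, a single tile can be read by seeking into the block instead of scanning it, which is the kind of lookup a tile information index table (Step S5) could support.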

Description

Technical field [0001] The present invention relates to a method for storing and managing raster data on the Hadoop distributed platform, in particular to a method for storing and managing massive, multi-source, multi-version small raster files. Background art [0002] With the rapid development of GIS technology, map data has grown rapidly; the corresponding tile data is large in volume and comes from multiple sources in multiple versions. How to store and manage massive map tile data efficiently has become a problem. [0003] The emergence of cloud computing offers a new approach. In recent years, to solve the problem of large-scale data storage and management, many companies and organizations have proposed a series of scalable large-scale data management solutions based on cloud computing technology. At present, the mainstream open-source project Hadoop is a distributed system architecture that includes massive data storage and computation, and large-scale structured...

Application Information

Patent type & authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/134; G06F16/29
Inventors: 汤振立, 陈强, 林承华, 梁曼舒, 罗富财, 吴丹
Owner: STATE GRID CORP OF CHINA