Mass vector data partition method and system based on Hadoop

The method applies vector data partitioning and Map functions in the field of spatial big data. It addresses reduced task execution efficiency, uneven distribution of the Reduce load, and the inability to guarantee consistent spatial index results. Its effects are improved storage and computing efficiency, preserved spatial distribution characteristics, and more efficient spatial indexing.

Active Publication Date: 2016-10-12
CHINA AGRI UNIV

AI Technical Summary

Problems solved by technology

Because the samples are drawn at random, the spatial index technology itself cannot guarantee consistent spatial index results, and it loses the spatial distribution characteristics of the spatial data, so the final data partition is unsatisfactory.
On the Hadoop platform, random sampling means that the derived data partition rules cannot guarantee a balanced distribution of data. The Reduce load then becomes uneven, which lowers the efficiency of the whole task and leads directly to severe data skew. In addition, although current parallel spatial partition algorithms can store adjacent elements in the same data block, they cannot guarantee that adjacent data blocks are stored on the same cluster node.
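
One simple way to quantify the uneven Reduce load described above is the ratio of the heaviest reducer's load to the mean load. The sketch below is illustrative only: the function name `skew_ratio` and the sample load figures are assumptions, not taken from the patent.

```python
def skew_ratio(reducer_loads):
    """Ratio of the heaviest reducer load to the mean load.

    A value of 1.0 means perfectly balanced; larger values mean
    one reducer holds back the whole job.
    """
    if not reducer_loads:
        raise ValueError("reducer_loads must not be empty")
    mean = sum(reducer_loads) / len(reducer_loads)
    return max(reducer_loads) / mean if mean else 1.0


# Hypothetical record counts per reducer, for illustration only.
balanced = [250, 250, 250, 250]
skewed = [700, 100, 100, 100]

print(skew_ratio(balanced))
print(skew_ratio(skewed))
```

Since a MapReduce job finishes only when its slowest reducer finishes, a high ratio translates fairly directly into wasted cluster time.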

Examples

Embodiment Construction

[0045] Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0046] Figure 1 shows a schematic flowchart of a Hadoop-based method for partitioning massive vector data according to an embodiment of the present invention. As shown in Figure 1, the method of this embodiment includes:

[0047] S11: Spatially encode the spatial elements in the spatial dataset based on the Hilbert space-filling curve;

[0048] S12: Assign key-value pairs to the spatial elements through a Map function and a Reduce function, and generate a spatial data sample information set;

[0049] S13: Generate a spatial data partition matrix according to the spatial data sample information set;

[0050] S14: Place each spatial element into its corresponding storage data block according to the spatial data partition matrix, and at the same time distribute adjacent data blocks to the same cluster node.
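
The spatial encoding in step S11 can be sketched with the standard Hilbert curve index computation on a square grid. This is a minimal illustration, not the patent's implementation: the function names, the use of a single representative point per element (for example a centroid), and the bounding-box normalization are all assumptions.

```python
def hilbert_index(n, x, y):
    """Map grid cell (x, y) to its position along a Hilbert curve.

    n is the grid side length and must be a power of two.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def encode_point(px, py, bbox, n):
    """Encode a point inside bbox = (min_x, min_y, max_x, max_y).

    Assumption: each spatial element is represented by one point.
    """
    min_x, min_y, max_x, max_y = bbox
    # Clamp so points on the upper boundary fall in the last cell.
    col = min(int((px - min_x) / (max_x - min_x) * n), n - 1)
    row = min(int((py - min_y) / (max_y - min_y) * n), n - 1)
    return hilbert_index(n, col, row)


print(encode_point(0.9, 0.1, (0.0, 0.0, 1.0, 1.0), 4))
```

Cells that are consecutive along the curve are always neighbours in the grid, which is why sorting elements by this code tends to keep spatially adjacent elements together.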

[0051] In the ...

Abstract

The invention relates to a Hadoop-based method and system for partitioning massive vector data. The method comprises the following steps: the spatial elements in a spatial dataset are spatially encoded on the basis of a Hilbert space-filling curve; key-value pairs are assigned to the spatial elements through a Map function and a Reduce function, and a spatial data sample information set is generated; a spatial data partition matrix is generated according to the spatial data sample information set; the spatial elements are placed into their corresponding storage data blocks according to the spatial data partition matrix, and at the same time adjacent data blocks are distributed to the same cluster node. The system introduces the Hilbert space-filling curve into the data sampling and partitioning rules and takes full account of factors such as the spatial relationship between adjacent objects, the size of each spatial object, and the number of spatial objects sharing the same encoding block. As a result, the spatial distribution characteristics of the sample information set are preserved, the spatial indexing efficiency for massive vector data is improved, and load balance for HDFS data block storage is ensured.
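
The partitioning and placement steps can be approximated by a greedy pass over elements sorted by Hilbert code. The sketch below is a simplified stand-in under stated assumptions: the excerpt does not show how the patent's partition matrix is built, so `build_blocks`, `assign_nodes`, the byte-size capacity, and the pairing of consecutive blocks are hypothetical.

```python
def build_blocks(elements, block_capacity):
    """Group (hilbert_code, size_bytes) pairs into size-bounded blocks.

    Elements are taken in Hilbert order, so each block holds spatially
    close elements. An element larger than the capacity gets its own block.
    """
    blocks, current, used = [], [], 0
    for code, size in sorted(elements):
        if current and used + size > block_capacity:
            blocks.append(current)
            current, used = [], 0
        current.append((code, size))
        used += size
    if current:
        blocks.append(current)
    return blocks


def assign_nodes(num_blocks, num_nodes, group=2):
    """Place each run of `group` consecutive blocks on the same node."""
    return [(i // group) % num_nodes for i in range(num_blocks)]


# Hypothetical elements: (Hilbert code, size in bytes).
elements = [(5, 40), (1, 30), (2, 30), (9, 50), (7, 20)]
blocks = build_blocks(elements, block_capacity=64)
print(blocks)
print(assign_nodes(len(blocks), num_nodes=2))
```

Bounding blocks by accumulated size rather than by element count reflects the abstract's point that object size matters for balance. This sketch may still split elements that share one Hilbert code across two blocks, which the patent's approach may handle differently.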

Description

Technical field

[0001] The invention relates to the technical field of spatial big data, in particular to a Hadoop-based method and system for partitioning massive vector data.

Background

[0002] With the advent of the era of big data, traditional data storage and processing methods face severe challenges. The characteristics of big data, such as volume, variety, velocity and low value density, sometimes leave traditional tools and processing methods helpless. In the field of geospatial vector data management, existing mature geographic information systems (GIS) mostly rely on relational databases to store spatial data. Relational databases have inherent limitations in massive data management, highly concurrent access, and scalability, so they can no longer play their due role in the era of big data.

[0003] The emergence of cloud computing technology provides an ideal solution for the storage and mana...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 李林, 姚晓闯, 朱德海, 郧文聚, 杨建宇, 叶思菁, 赵祖亮
Owner: CHINA AGRI UNIV