Dataset fragmentation method based on two-dimensional geographic position information

A technology concerning geographic location information and data sets, applied in the fields of electronic digital data processing, structured data retrieval, and special data processing applications. It addresses the problems that existing sharding schemes cannot ensure data adjacency and do not make good use of GeoHash features, and it achieves improved query efficiency and good practical value.

Active Publication Date: 2014-12-10
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0008] Current sharding schemes generally select a one-dimensional attribute, such as the data ID, as the shard key, and then divide the data into slices directly by linear ranges of that attribute. These sharding methods cannot guarantee that the data on the same slice is ...



Examples


Embodiment Construction

[0030] In order to describe the present invention more specifically, the technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0031] The fragmentation process is described in more detail below. First, the two-dimensional geographic location information (the latitude and longitude values) in each piece of data in the dataset to be fragmented is converted into a geoHash; for example, (latitude 42.6, longitude -5.6) is converted into 01101 11111 11000 00100 00010. The data is then fragmented according to the converted binary geoHash value. The length N of the geoHash value is configured according to the required location accuracy: the longer the value, the smaller the error.
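The conversion in [0031] can be illustrated with a minimal Python sketch. It assumes the conventional geoHash bit layout (repeated interval bisection, with longitude bits at even positions and latitude bits at odd positions), which reproduces the example value given above. The function name `geohash_bits` and its parameters are hypothetical and do not come from the patent text.

```python
def geohash_bits(lat: float, lon: float, n_bits: int) -> str:
    """Return an n_bits-long binary geoHash string for (lat, lon).

    Assumption: standard geoHash interleaving, where bit 0 bisects
    the longitude range, bit 1 the latitude range, and so on.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    for i in range(n_bits):
        if i % 2 == 0:
            # Even position: bisect the current longitude interval.
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append("1")
                lon_lo = mid
            else:
                bits.append("0")
                lon_hi = mid
        else:
            # Odd position: bisect the current latitude interval.
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append("1")
                lat_lo = mid
            else:
                bits.append("0")
                lat_hi = mid
    return "".join(bits)


# The example from [0031], with N = 25 bits.
code = geohash_bits(42.6, -5.6, 25)
print(" ".join(code[i:i + 5] for i in range(0, len(code), 5)))
# prints "01101 11111 11000 00100 00010"
```

Because each additional bit halves one of the two intervals, a larger N narrows the cell that contains the point, which is why a longer geoHash gives a smaller location error.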

[0032] As shown in Figure 1, the specific fragmentation process is as follows:

[0033] (1) Initially, all the data is on the same slice, and the prefix le...


Abstract

The invention discloses a dataset fragmentation method based on two-dimensional geographic position information. The method includes the following steps: (1) the two-dimensional geographic position information of each datum is converted into a binary geoHash value; (2) the dataset is fragmented according to the binary geoHash values, so that each fragment has a common geoHash prefix, and a fragment index is built or updated during fragmentation; (3) when a new datum is added, the fragment sharing the longest common geoHash prefix with the new datum is found in the index and the datum is inserted into that fragment; if the insertion causes the size of the fragment to exceed a preset value, the fragment is split again according to step (2). Because the two-dimensional geographic position information is converted into geoHash values for data fragmentation, data that are adjacent in geographic position are placed on the same fragment as far as possible, which benefits distributed applications based on geographic position.
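Steps (2) and (3) can be sketched as a small prefix-indexed shard map. This is an illustration only: the class `PrefixShardIndex`, the parameter name `max_size`, and the rule of splitting an overfull fragment into two children by extending its prefix by one bit are assumptions, not details confirmed by the text. The sketch also assumes that all geoHash keys have the same fixed length N.

```python
class PrefixShardIndex:
    """Toy fragment index keyed by binary geoHash prefix (hypothetical)."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        # Initially all data lives in one fragment with the empty prefix.
        self.shards = {"": []}

    def find_prefix(self, bits: str) -> str:
        """Return the longest indexed prefix that matches `bits`."""
        for length in range(len(bits), -1, -1):
            if bits[:length] in self.shards:
                return bits[:length]
        raise KeyError(bits)

    def insert(self, bits: str, record) -> None:
        prefix = self.find_prefix(bits)
        self.shards[prefix].append((bits, record))
        self._split_if_needed(prefix)

    def _split_if_needed(self, prefix: str) -> None:
        items = self.shards[prefix]
        # Stop if the fragment is small enough, or if a key has no
        # further bit to split on.
        if len(items) <= self.max_size:
            return
        if any(len(b) <= len(prefix) for b, _ in items):
            return
        # Assumed split rule: extend the common prefix by one bit.
        del self.shards[prefix]
        for bit in "01":
            self.shards[prefix + bit] = [
                it for it in items if it[0][len(prefix)] == bit
            ]
        # A child may still be overfull, so check both children.
        for bit in "01":
            self._split_if_needed(prefix + bit)


idx = PrefixShardIndex(max_size=2)
for key in ["0000", "0001", "0010"]:
    idx.insert(key, record=None)
print(sorted(idx.shards))  # prints ['000', '001', '01', '1']
```

Empty children are kept in the index so that every key always falls under exactly one indexed prefix. Keys that share a long prefix, and are therefore geographically close, end up in the same fragment.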

Description

Technical field

[0001] The invention belongs to the technical field of data storage, and in particular relates to a dataset fragmentation method based on two-dimensional geographic location information.

Background

[0002] In the era of big data, when the amount of data reaches the level of megabytes, a single memory or disk cannot store it. In this case, the dataset needs to be stored in a distributed manner. For distributed storage, the dataset needs to be fragmented so that data can be organized, managed, and migrated in units of slices. When the currently popular NoSQL database MongoDB performs distributed storage, it first fragments the dataset and then manages the fragments on multiple servers.

[0003] When designing a database, a large global database is usually divided according to its data items (i.e., fields) or according to certain characteristics of a keyword. Here we call it data fragmentation, and the ...


Application Information

IPC(8): G06F17/30
CPC: G06F16/21, G06F16/2246
Inventor: 吴朝晖, 刘娜, 陈华钧, 郑国轴
Owner: ZHEJIANG UNIV