Data multi-duplicate hybrid storage method and system

A hybrid storage and multi-copy technique, applied in the computer field, that addresses large-scale data redistribution and the limits it places on data processing efficiency, with the effect of reducing query and processing overhead.

Active Publication Date: 2013-12-11
SUGON INFORMATION IND
2 Cites, 19 Cited by

AI Technical Summary

Problems solved by technology

A single partition method can force large-scale data redistribution when the partitioning does not match the join operation, or when the same batch of data must be processed under different partitionings. This limits the efficiency of data processing to a certain extent.
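The redistribution cost described here can be made concrete with a small sketch. It assumes hash-based placement of records on servers, which the source does not specify; the function names `server_for` and `rows_to_move` and the record fields are hypothetical.

```python
import zlib

def server_for(value, n_servers):
    """Map a partition-key value to a server index.

    Assumption: placement by a stable hash (CRC32). The source does not
    state how partitions are mapped to servers.
    """
    return zlib.crc32(str(value).encode("utf-8")) % n_servers

def rows_to_move(records, stored_key, query_key, n_servers):
    """Count records that sit on a different server than the one the
    query's grouping or join key would require."""
    return sum(
        1
        for rec in records
        if server_for(rec[stored_key], n_servers)
        != server_for(rec[query_key], n_servers)
    )

# Hypothetical sample data.
people = [
    {"surname": "Zhang", "age_band": "0-30"},
    {"surname": "Wang", "age_band": "31-60"},
    {"surname": "Zhang", "age_band": "31-60"},
]

# Stored and queried on the same key: nothing has to move.
print(rows_to_move(people, "surname", "surname", 4))
# Stored by surname but queried by age band: some records may have to move.
print(rows_to_move(people, "surname", "age_band", 4))
```

When the stored key and the query key agree, the count is zero by construction; when they differ, the count depends on the data and on the placement function.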

Examples

Embodiment 1

[0076] As shown in Figure 6, take the Beijing population data as data 1 and the Tianjin population data as data 2. First, split the Beijing population data (data 1) into multiple partitions by surname (partition method 1): Beijing residents surnamed Zhang form partition A11, those surnamed Wang form partition A12, and so on. Then split the same data into multiple partitions by age (partition method 2): Beijing residents aged 0-30 form partition A21, those aged 31-60 form partition A22, and so on. Similarly, the Tianjin population data (data 2) is split by surname (partition method 1) into partition B11 (Tianjin, Zhang), partition B12 (Tianjin, Wang), and so on, and then by age (partition method 2) into partition B21 (Tianjin, aged 0-30), partition B22 (Tianjin, aged 31-60), and so on.
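The two partition methods in this embodiment can be sketched as follows. This is a minimal illustration, not the patented implementation: the record fields, the sample rows, and the continuation of the age bands beyond 31-60 in 30-year steps are all assumptions.

```python
from collections import defaultdict

# Hypothetical sample records; field names are assumptions for illustration.
beijing = [
    {"city": "Beijing", "surname": "Zhang", "age": 25},
    {"city": "Beijing", "surname": "Wang", "age": 42},
    {"city": "Beijing", "surname": "Zhang", "age": 58},
]

def by_surname(record):
    """Partition method 1: the partition key is the surname."""
    return record["surname"]

def by_age_band(record):
    """Partition method 2: the partition key is the age band.

    The bands 0-30 and 31-60 come from the text; continuing in 30-year
    steps (61-90, ...) is an assumption.
    """
    age = record["age"]
    if age <= 30:
        return "0-30"
    low = ((age - 31) // 30) * 30 + 31
    return f"{low}-{low + 29}"

def partition(records, key_fn):
    """Split records into partitions keyed by key_fn(record)."""
    parts = defaultdict(list)
    for rec in records:
        parts[key_fn(rec)].append(rec)
    return dict(parts)

# The same data, partitioned once per method.
copy_1 = partition(beijing, by_surname)   # partitions by surname
copy_2 = partition(beijing, by_age_band)  # partitions by age band

print(sorted(copy_1))  # surnames present in the sample
print(sorted(copy_2))  # age bands present in the sample
```

Both copies hold every record exactly once, so each one is a complete copy of the data; only the grouping differs.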

[0077] When storing the data, a load-balancing strategy is taken into account so that the Beijing population data (data 1) is stored on multiple servers, ...

Abstract

The invention provides a data multi-duplicate hybrid storage method and system. In the data loading stage, the original data is partitioned multiple times, a different partition method is used each time, and the partition data obtained from each pass is stored on a plurality of servers. The invention further provides a data processing method aimed mainly at online analysis of large-scale data. Without increasing the disk space occupied by data storage and without reducing data reliability, the method and system increase the variety of data partitionings. As a result, grouped data processing can be executed in parallel by partition in more scenarios, the overhead of data query and processing in those scenarios is reduced, and the prior-art problem that a single partition method shared by all copies makes grouped data processing inefficient in some scenarios is solved.
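The loading and query flow summarized in the abstract might look like the sketch below. It is illustrative only: round-robin placement stands in for the unspecified load-balancing strategy, and the names `load`, `place`, and `count_by`, the server labels, and the record fields are hypothetical.

```python
from collections import defaultdict

def partition(records, key):
    """Group records by the value of one field."""
    parts = defaultdict(list)
    for rec in records:
        parts[rec[key]].append(rec)
    return dict(parts)

def place(parts, servers):
    """Assign partitions to servers round-robin.

    Assumption: round-robin stands in for the load-balancing strategy,
    which the source does not describe.
    """
    placement = defaultdict(dict)
    for i, pkey in enumerate(sorted(parts)):
        placement[servers[i % len(servers)]][pkey] = parts[pkey]
    return dict(placement)

def load(records, partition_keys, servers):
    """Loading stage: build one copy of the data per partition method."""
    return {key: place(partition(records, key), servers)
            for key in partition_keys}

def count_by(copies, group_key):
    """Grouped count using the copy already partitioned on group_key.

    Each server only touches its own partitions, so the inner loop could
    run on all servers in parallel with no data movement between them.
    """
    if group_key not in copies:
        raise KeyError(f"no copy is partitioned by {group_key!r}")
    result = {}
    for server, parts in copies[group_key].items():
        for pkey, rows in parts.items():
            result[pkey] = len(rows)
    return result

# Hypothetical sample data and server names.
people = [
    {"surname": "Zhang", "age_band": "0-30"},
    {"surname": "Wang", "age_band": "31-60"},
    {"surname": "Zhang", "age_band": "31-60"},
]
copies = load(people, ["surname", "age_band"], ["server-1", "server-2"])

print(count_by(copies, "surname"))
print(count_by(copies, "age_band"))
```

The design point is that the number of copies stays the same as in conventional replication; what changes is that each copy is organized by a different key, so a query can pick the copy whose partitioning matches its grouping.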

Description

Technical field

[0001] The invention relates to the field of computers, and in particular to a data multi-copy hybrid storage method, a data multi-copy hybrid storage system and a data processing method.

Background art

[0002] In the field of big data processing, data is usually stored in shards, and the purpose of data partitioning is not only to store the data in a distributed manner. To further ensure reliability, replication is used. Existing big data partitioned storage technology uses the same partition method for all copies of a piece of data. This reduces the amount of computation in the loading phase. In addition, if a copy is corrupted or lost, it can be restored by directly copying another correct copy, and when the data is modified, the corresponding copies can all be modified directly at the same time. In some operations, reasonable data partitioning can reduce the data transmission overhead between nodes in the data processing...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F11/14, H04L29/08
Inventor: 王颖, 狄静舒, 宋怀明, 苗艳超, 刘新春, 邵宗有
Owner: SUGON INFORMATION IND