Distributed data storage system expansion method based on data distribution

A distributed data storage system technology, applied in the field of distributed data storage system expansion, that addresses problems such as frequent data movement, poor load balance, and low utilization rate.

Active Publication Date: 2014-05-21
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0014] The purpose of the present invention is to propose a distributed data storage system expansion method based on data distribution, in view of problems such as poor load balance, low utilization rate, and frequent data movement between nodes in current distributed storage system expansion methods.



Examples


Embodiment Construction

[0070] The present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0071] This embodiment uses the method proposed by the present invention to build a simulated data storage system. The CPU of the test machine is an AMD Athlon(TM) II X3 435 Processor (3 CPUs), 2.90 GHz. The memory is 8.00 GB and its frequency is 2.67 GHz. The capacity of each storage node in this embodiment is set to 2,097,152 records, so as to simulate an actual hardware environment. The experimental data set is the user query log provided by Sogou Labs for the two days 2011-12-30 and 2011-12-31; it contains 43,545,444 query click records, of which 31,552,843 start with Chinese characters. The experiment first uses the hash value of the initial character of the query word of this part of the data as the identifier of the non-uniformly distributed data to ...
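
As an illustration of the identifier described above, the following is a minimal sketch of deriving an identifier from the hash of a query's initial character. The source does not name the hash function or the identifier range; MD5 and the normalization to [0, 1) are assumptions made here, and `initial_char_identifier` is a hypothetical helper name. Because every query that shares an initial character maps to the same identifier, frequent initial characters concentrate records on a few identifiers, which is one way such identifiers end up non-uniformly distributed.

```python
import hashlib


def initial_char_identifier(query: str) -> float:
    """Map a query to an identifier in [0, 1) using only its first character.

    Assumption: MD5 is used as the hash and the first 48 bits are
    normalized to [0, 1); the source does not specify either choice.
    """
    if not query:
        raise ValueError("query must be non-empty")
    digest = hashlib.md5(query[0].encode("utf-8")).digest()
    # 48 bits fit exactly in a double, so the result is strictly below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48
```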


Abstract

The invention belongs to the field of databases and relates to a distributed data storage system expansion method, in particular to a distributed data storage system expansion method based on data distribution. In this method, the generation probability of the data is fitted, the storage node with the maximum overflow probability is computed from that probability, and a new storage node is added on the data storage section of that node to share its load. Regardless of whether the data are evenly distributed and whether the storage nodes have the same capacity, the method can improve system utilization, maintain load balance, and reduce the number of data movements between nodes.
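
The steps in the abstract can be sketched roughly as follows. This is a hedged illustration, not the patented procedure: the text shown here does not say how the generation probability is fitted, how overflow probability is defined, or where the new node's section begins. The sketch assumes identifiers in [0, 1), an empirical fit from a sorted sample of past identifiers, overflow probability defined as a binomial tail over the next `horizon` records, and a split at the empirical median of the chosen node's section. `Node`, `interval_mass`, `overflow_probability`, and `expand` are hypothetical names.

```python
import bisect
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: float                 # inclusive lower bound of the node's section
    hi: float                 # exclusive upper bound of the node's section
    capacity: int             # maximum number of records the node can hold
    keys: list = field(default_factory=list)  # identifiers currently stored


def interval_mass(sample, lo, hi):
    """Empirical probability that a new identifier falls in [lo, hi).

    `sample` must be a sorted, non-empty list of past identifiers.
    """
    left = bisect.bisect_left(sample, lo)
    right = bisect.bisect_left(sample, hi)
    return (right - left) / len(sample)


def overflow_probability(node, sample, horizon):
    """Assumed definition: probability that more records than the node has
    free slots arrive in its section among the next `horizon` records."""
    p = interval_mass(sample, node.lo, node.hi)
    free = node.capacity - len(node.keys)
    if free < 0:
        return 1.0
    if free >= horizon:
        return 0.0
    # Binomial tail: 1 - P(X <= free) with X ~ Binomial(horizon, p).
    within = sum(
        math.comb(horizon, k) * p**k * (1 - p) ** (horizon - k)
        for k in range(free + 1)
    )
    return max(0.0, 1.0 - within)


def expand(nodes, sample, horizon, new_capacity):
    """Add one node on the section of the node most likely to overflow."""
    target = max(nodes, key=lambda n: overflow_probability(n, sample, horizon))
    left = bisect.bisect_left(sample, target.lo)
    right = bisect.bisect_left(sample, target.hi)
    # Assumed split rule: the empirical median of the section, so that both
    # halves carry roughly equal fitted mass; fall back to the midpoint.
    split = sample[(left + right) // 2] if right - left >= 2 else target.lo
    if not (target.lo < split < target.hi):
        split = (target.lo + target.hi) / 2
    moved = [k for k in target.keys if k >= split]
    target.keys = [k for k in target.keys if k < split]
    new_node = Node(split, target.hi, new_capacity, moved)
    target.hi = split
    nodes.append(new_node)
    return new_node
```

In this sketch only the records in the upper part of one section move, and only to the new node; the other nodes are untouched. The exact binomial sum is practical only for small `horizon` values; a normal approximation would be one alternative for larger ones.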

Description

Technical field

[0001] The invention relates to a capacity expansion method for a distributed data storage system, in particular to a capacity expansion method for a distributed data storage system based on data distribution, and belongs to the field of databases.

Background technique

[0002] With the development of computer science and Internet technology, the amount of data in information retrieval systems has grown larger and larger. To ensure scalability, reliability, high performance, and high applicability, distributed storage systems are usually used to store massive amounts of data.

[0003] Capacity expansion is one of the important issues in distributed storage systems, and technologies such as partitioning, sharding, or distributed hash tables are usually used at present. When dealing with non-uniformly distributed data, these methods cannot guarantee high system utilization, good load balancing performance, and small number of data movement between nod...


Application Information

IPC(8): G06F17/30, G06F3/06
CPC: G06F16/22
Inventors: Niu Zhendong (牛振东), Shu Bo (束博)
Owner: BEIJING INSTITUTE OF TECHNOLOGY