Splitting and moving ranges in a distributed system

A technique for splitting and moving ranges in a distributed database system. It targets the splitting of large groups and addresses the problem that splits growing well beyond the size threshold are not uncommon.

Active Publication Date: 2017-11-02
GOOGLE LLC

AI Technical Summary

Benefits of technology

[0006] According to certain embodiments, the distributed transaction is executed according to a two-phase commit protocol comprising a voting phase and a commit phase. A majority of tablets in each group must commit in the voting phase for the distributed transaction to complete. In response to a vote to abort in the voting phase, each group undoes the transaction.
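The voting rule in [0006] can be illustrated with a minimal sketch. It is not the patented implementation: the `Group` class, its `tablet_votes` field, and `run_two_phase_commit` are hypothetical names, and each tablet's vote is modeled as a plain boolean rather than a network message.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Group:
    """Hypothetical stand-in for a replicated group of tablets."""
    name: str
    tablet_votes: List[bool]  # True means this tablet replica votes to commit
    state: str = "initial"

    def majority_commits(self) -> bool:
        # Strict majority: more than half of the tablets vote to commit.
        return 2 * sum(self.tablet_votes) > len(self.tablet_votes)


def run_two_phase_commit(groups: List[Group]) -> bool:
    """Return True if the transaction commits in every group."""
    # Voting phase: each group needs a majority of its tablets to commit.
    votes = {g.name: g.majority_commits() for g in groups}

    # Commit phase: commit everywhere only if every group voted to commit.
    if all(votes.values()):
        for g in groups:
            g.state = "committed"
        return True

    # Any abort vote means every group undoes the transaction.
    for g in groups:
        g.state = "aborted"
    return False
```

With a three-tablet source group voting two to one in favor and a unanimous target group, the transaction commits. If any participating group lacks a majority, all groups end in the aborted state.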

Problems solved by technology

The time and resources required by current repartitioning implementations often cause problems, particularly when splitting large groups under heavy write loads. As a result, splits that grow many times larger than the size threshold are not uncommon.



Embodiment Construction

[0014] When repartitioning data in a distributed database, standard implementations copy the entirety of data to be moved. For example, splitting data or changing a replication level requires making a whole new copy of the data in the new configuration. Embodiments described herein provide a mechanism to avoid this extra copy by sharing on-disk copies of the data whenever possible. For example, when splitting data, rather than making a new copy of the two partitions, a virtual view of the existing partition may be provided that makes the existing partition usable as two separate portions. Only when a new copy of the data would otherwise be made, for example when rewriting data into a more compact form (i.e., a “compaction”), does the virtual copy need to be resolved into a real copy of the data. According to certain embodiments, a database is partitioned into groups, where each group is a replicated set of tablets. A tablet includes a list of immutable files, also called layers, and ...
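The idea of a virtual view that is resolved only at compaction can be sketched in memory as follows. This is an illustration under stated assumptions, not the patent's data layout: `Layer`, `LayerView`, `split`, and `compact` are hypothetical names, a layer is modeled as a tuple of sorted key/value pairs, and the split key is assumed to lie inside the range each view already covers.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Layer:
    """Hypothetical immutable file: sorted (key, value) pairs."""
    rows: Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class LayerView:
    """Virtual copy: shares the layer and exposes only keys in [lo, hi)."""
    layer: Layer
    lo: Optional[str] = None  # None means unbounded below
    hi: Optional[str] = None  # None means unbounded above

    def scan(self) -> Iterator[Tuple[str, str]]:
        for key, value in self.layer.rows:
            above = self.lo is None or key >= self.lo
            below = self.hi is None or key < self.hi
            if above and below:
                yield key, value


def split(views: List[LayerView], split_key: str):
    """Split without copying: both halves reference the same layers."""
    left = [LayerView(v.layer, v.lo, split_key) for v in views]
    right = [LayerView(v.layer, split_key, v.hi) for v in views]
    return left, right


def compact(views: List[LayerView]) -> Layer:
    """Resolve views into a real copy; later views overwrite earlier ones."""
    merged = {}
    for view in views:
        merged.update(view.scan())
    return Layer(tuple(sorted(merged.items())))
```

After `split`, both halves still point at the same `Layer` object, so no row data is duplicated. Only `compact` materializes a new layer, and it contains just the rows visible through the views it was given.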



Abstract

Methods and systems for a distributed transaction in a distributed database system are described. One example includes identifying a request to insert a split point in a source group comprising one or more tablet replicas, each tablet including at least a portion of data from a table in the distributed database system, and the split point splitting data in the source group into a first range and a second range different than the first range; in response to the request: sending a list of filenames in the first range of the source group to a first target group comprising one or more tablet replicas; and creating, at the first target group, a virtual copy of files represented by the list of filenames in the first range, the virtual copy making data of the files available, each using a new name, without duplicating the data of the files.
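One way to picture "making data of the files available, each using a new name, without duplicating the data" is a filesystem hard link. The abstract does not say that hard links are the mechanism, so the sketch below is only an analogy under that assumption. The function `create_virtual_copy`, the `virtual-` naming scheme, and the directory layout are hypothetical, and the code requires a filesystem that supports hard links.

```python
import os
from typing import Dict, List


def create_virtual_copy(source_dir: str, target_dir: str,
                        filenames: List[str]) -> Dict[str, str]:
    """Expose the listed source files in target_dir under new names.

    Assumption: hard links stand in for the virtual copy, so each new
    name refers to the same on-disk data as the original file.
    """
    os.makedirs(target_dir, exist_ok=True)
    mapping: Dict[str, str] = {}
    for index, name in enumerate(filenames):
        # Hypothetical naming scheme for the target group's view of the file.
        new_name = f"virtual-{index:04d}-{name}"
        os.link(os.path.join(source_dir, name),
                os.path.join(target_dir, new_name))
        mapping[name] = new_name
    return mapping
```

The source group would send only the list of filenames for the first range. The target group then calls `create_virtual_copy` and can read the files under its own names while the bytes on disk remain shared.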

Description

BACKGROUND

[0001] This specification generally relates to splitting large groups in a distributed database system.

[0002] When repartitioning data in a distributed database, large chunks of data are often copied to be moved. The time and resources required by current repartitioning implementations often cause problems, in particular when trying to split large groups with heavy write loads and as a result, splits that grow many times larger than a size threshold are not uncommon. Therefore, a need has arisen for a mechanism to quickly and efficiently split large groups in a distributed database.

SUMMARY

[0003] In general, one aspect of the subject matter described in this specification may be embodied in systems, and methods performed by data processing apparatuses that include actions for a distributed transaction in a distributed database system, including identifying a request to insert a split point in a source group, the source group comprising one or more tablet replicas, each tablet ...


Application Information

Patent Type & Authority: Applications (United States)
IPC: IPC(8): G06F17/30
CPC: G06F17/30138, G06F17/30575, G06F17/30194, G06F16/1727, G06F16/27, G06F16/182, G06F16/278
Inventor: KANTHAK, SEBASTIAN; FREY, CLIFFORD ARTHUR
Owner: GOOGLE LLC