Co-range partition for query plan optimization and data-parallel programming model

A range partitioning and data partitioning technology, applied in the fields of electrical digital data processing, digital data information retrieval, and special data processing applications.

Status: Inactive
Publication Date: 2012-12-19
MICROSOFT TECH LICENSING LLC

AI Technical Summary

Problems solved by technology

However, such an automatic determination cannot be made for multi-source operators (e.g., join, groupjoin, zip, and set operators such as union, intersect, and except).

Examples

Embodiment Construction

[0018] Figure 1 shows an exemplary data-parallel computing environment 100, comprising a co-range partition manager 110, a distributed execution engine 130 (e.g., MapReduce, Dryad, Hadoop, etc.) with high-level language support 120 (e.g., Sawzall, Pig Latin, SCOPE, DryadLINQ, etc.), and a distributed file system 140. In one embodiment, the distributed execution engine 130 may include Dryad, and the high-level language support 120 may include DryadLINQ.

[0019] The distributed execution engine 130 may include a job manager 132 responsible for spawning vertices (V) 138a, 138b ... 138n on available computers with the assistance of remote-execution and monitoring daemons (PD) 136a, 136b ... 136n. The vertices 138a, 138b ... 138n exchange data via files, TCP pipes, or shared-memory channels as part of the distributed file system 140.

[0020] Job execution on the distributed execution engine 130 is coordinated by the job manager 132, which may do one or more of the following: ins...

Abstract

The invention relates to co-range partitioning for query plan optimization and a data-parallel programming model. A co-range partitioning scheme divides multiple static or dynamically generated datasets into balanced partitions using a common set of automatically computed range keys. A co-range partition manager minimizes the number of data partitioning operations for a multi-source operator (e.g., join) by applying a co-range partition to a pair of its predecessor nodes as early as possible in the execution plan graph, which reduces the amount of data transferred. Using automatic range and co-range partitioning for data partitioning tasks enables a programming API that hides explicit data partitioning from users and provides a sequential programming model for data-parallel programming on a computer cluster.
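The idea of a common set of automatically computed range keys can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`compute_co_range_keys`, `range_partition`, `partitionwise_join`), the pooled-sample quantile heuristic, and the toy datasets are all assumptions made for illustration. The sketch samples keys from both inputs, derives one shared list of range boundaries, partitions both datasets with those boundaries, and then joins matching partitions pairwise.

```python
import bisect
import random


def compute_co_range_keys(datasets, key_fn, num_partitions, sample_size=1000, seed=0):
    """Derive one shared list of range boundaries from samples of all datasets.

    Assumption: boundaries are evenly spaced quantiles of the pooled key sample.
    """
    rng = random.Random(seed)
    sample = []
    for data in datasets:
        k = min(sample_size, len(data))
        sample.extend(key_fn(rec) for rec in rng.sample(data, k))
    sample.sort()
    # num_partitions - 1 boundaries split the key space into num_partitions ranges.
    return [sample[(i * len(sample)) // num_partitions] for i in range(1, num_partitions)]


def range_partition(data, key_fn, range_keys):
    """Place each record in the partition whose key range contains its key."""
    parts = [[] for _ in range(len(range_keys) + 1)]
    for rec in data:
        parts[bisect.bisect_right(range_keys, key_fn(rec))].append(rec)
    return parts


def partitionwise_join(left_parts, right_parts, key_fn):
    """Equi-join partition i of the left input with partition i of the right input."""
    out = []
    for lp, rp in zip(left_parts, right_parts):
        index = {}
        for r in rp:
            index.setdefault(key_fn(r), []).append(r)
        for l in lp:
            for r in index.get(key_fn(l), []):
                out.append((l, r))
    return out


# Hypothetical toy inputs: (key, payload) records.
rng = random.Random(42)
left = [(rng.randrange(1000), "L%d" % i) for i in range(2000)]
right = [(rng.randrange(1000), "R%d" % i) for i in range(1000)]
key_of = lambda rec: rec[0]

range_keys = compute_co_range_keys([left, right], key_of, num_partitions=4)
lparts = range_partition(left, key_of, range_keys)
rparts = range_partition(right, key_of, range_keys)
joined = partitionwise_join(lparts, rparts, key_of)
```

Because both inputs share the same boundaries, records with equal keys always fall into partitions with the same index, so each pair of partitions can be joined independently without moving data between them.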

Description

Technical field

[0001] This application relates to co-range partitioning for query plan optimization and a data-parallel programming model.

Background

[0002] Data partitioning is an important aspect of large-scale distributed data-parallel computing. A good data partitioning scheme splits a dataset into multiple balanced partitions to avoid data and/or computation skew, resulting in improved performance. For multi-source operators (e.g., join), existing systems require users to manually specify the number of partitions in a hash partitioner, or the range keys in a range partitioner, so that the partitions of the multiple input datasets are balanced and coherent for good data parallelism. Such manual data partitioning requires the user to have knowledge of both the input datasets and the resources available in the computer cluster, which is often difficult or even impossible when the dataset to be partitioned is generated by an intermediate stage during runtime ...
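The cost of manual partitioning can be seen in a small Python sketch. Everything here is a hypothetical illustration rather than any system's API: the helper names, the skewed toy dataset, and the hand-picked range keys are assumptions. The user guesses range keys as if the keys were uniformly distributed, and the `skew` helper reports how far the largest partition is from an ideal even split.

```python
import bisect


def hash_partition(data, key_fn, num_partitions):
    """Hash partitioner: the user must choose num_partitions by hand."""
    parts = [[] for _ in range(num_partitions)]
    for rec in data:
        parts[hash(key_fn(rec)) % num_partitions].append(rec)
    return parts


def manual_range_partition(data, key_fn, range_keys):
    """Range partitioner: the user must supply the range keys by hand."""
    parts = [[] for _ in range(len(range_keys) + 1)]
    for rec in data:
        parts[bisect.bisect_right(range_keys, key_fn(rec))].append(rec)
    return parts


def skew(parts):
    """Largest partition size divided by the ideal even size (1.0 is balanced)."""
    total = sum(len(p) for p in parts)
    return max(len(p) for p in parts) * len(parts) / total


# Hypothetical skewed input: 90% of the keys lie below 100, the rest spread up to 991.
records = [i % 100 for i in range(900)] + [100 + 9 * i for i in range(100)]
identity = lambda k: k

# Range keys guessed without looking at the data, assuming uniform keys in [0, 1000).
guessed = manual_range_partition(records, identity, [250, 500, 750])
hashed = hash_partition(records, identity, 4)

print("range partition sizes:", [len(p) for p in guessed], "skew:", skew(guessed))
print("hash partition sizes:", [len(p) for p in hashed], "skew:", skew(hashed))
```

With the guessed range keys, most records land in the first partition, so one worker would do most of the work. Choosing better keys requires inspecting the data, which is not possible ahead of time when the dataset is produced by an intermediate stage at runtime.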

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F8/453, G06F17/30463, G06F17/30584, G06F2209/5017, G06F9/44, G06F16/24542, G06F16/278
Inventors: 柯启发, Y·余
Owner: MICROSOFT TECH LICENSING LLC