
Shuffle data caching method based on mapping-reduction calculation model

A data caching technique for the map-reduce computing model, applied in the areas of non-redundancy-based fault handling, computing, and electrical digital data processing. It addresses problems such as fault-tolerance mechanisms degrading the performance of the computation itself and failing to fully exploit hardware features, with the effects of improving robustness and avoiding manually set checkpoints.

Active Publication Date: 2017-02-01
SHANGHAI JIAO TONG UNIV


Problems solved by technology

Because these fault-tolerance mechanisms are interleaved with the computation logic, they not only fail to make full use of existing hardware features, but their presence throughout the computation process also significantly degrades the performance of the computation itself.
[0004] Although some memory-based distributed file systems exist, they mainly target the data blocks themselves, and a data block is often much larger than the shuffle data, so a large amount of memory is required as support.



Embodiment Construction

[0043] Embodiments of the present invention are described in detail below in conjunction with the accompanying drawings. This embodiment is implemented on the premise of the technical solution and algorithm of the present invention and provides a detailed implementation and a specific operating process, but the applicable platform is not limited to the following embodiment. The specific operating platform of this example is a small cluster composed of two ordinary servers, each running Ubuntu Server 14.04.1 LTS 64-bit and equipped with 8 GB of memory. The implementation is based on the source code of Apache Spark 1.6 as an illustration; other map-reduce distributed computing frameworks such as Hadoop are also applicable. First, the Spark source code must be modified so that shuffle data is transmitted through the interface of this method.
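The modification described above routes shuffle data through a cache interface instead of the local disk. The following is a minimal sketch of what such an interface might look like; the class and method names are illustrative assumptions, not Spark's actual ShuffleWriter/ShuffleReader API or the patent's concrete code.

```python
class ShuffleCacheClient:
    """Hypothetical client-side interface. A modified shuffle writer in the
    map-reduce framework would call put_block() instead of persisting shuffle
    output to local disk, and the reduce side would call get_block()."""

    def __init__(self):
        # In-memory store keyed the way shuffle blocks are commonly
        # addressed: (shuffle id, map task id, reduce partition id).
        self._store = {}

    def put_block(self, shuffle_id, map_id, reduce_id, payload):
        # Keep the serialized map output in memory rather than on disk.
        self._store[(shuffle_id, map_id, reduce_id)] = payload

    def get_block(self, shuffle_id, map_id, reduce_id):
        # Serve the block back to the requesting reduce task.
        return self._store[(shuffle_id, map_id, reduce_id)]

client = ShuffleCacheClient()
client.put_block(0, 1, 2, b"serialized map output")
block = client.get_block(0, 1, 2)
```

In a real deployment the store would live in the distributed cache system rather than in the caller's process; this sketch only shows the shape of the read/write interface.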

[0044] The present invention deploys a cache system in a distributed co...


Abstract

The invention discloses a shuffle data caching method based on the map-reduce computing model. The method comprises the following steps: the map-reduce computing framework sends the per-task partitioning of a map-reduce job to a shuffle caching host through an interface; after the shuffle caching host receives the task partition data, it adds a timestamp to the data and stores the timestamped data in local memory; the shuffle caching host then uses a random algorithm to perform a one-to-three mapping between the reduce tasks in the task partition data and the nodes of the cluster, and stores the mapping in its memory in the form of a hash table. The method improves the computing performance of distributed computing frameworks based on the map-reduce model, avoids the inefficiency of users manually setting checkpoints, and improves the robustness of the distributed computing framework.
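The steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, and "one-to-three mapping" is read here as assigning each reduce task to three distinct cluster nodes.

```python
import random
import time

class ShuffleCacheHost:
    """Sketch of the caching host from the abstract: it receives a job's
    per-task partition data over an interface, timestamps it, keeps it in
    local memory, and randomly maps each reduce task onto three cluster
    nodes, recording the result in an in-memory hash table (a dict)."""

    def __init__(self, cluster_nodes):
        self.cluster_nodes = cluster_nodes
        self.partitions = {}        # job id -> (timestamp, task partition data)
        self.reduce_placement = {}  # reduce task id -> list of assigned nodes

    def receive_task_partition(self, job_id, partition_data):
        # Step 1: timestamp the received task partition data, keep in memory.
        self.partitions[job_id] = (time.time(), partition_data)
        # Step 2: randomly map each reduce task onto three distinct nodes.
        for task in partition_data["reduce_tasks"]:
            self.reduce_placement[task] = random.sample(self.cluster_nodes, 3)
        return self.reduce_placement

host = ShuffleCacheHost(["node-a", "node-b", "node-c", "node-d"])
placement = host.receive_task_partition("job-1", {"reduce_tasks": ["r0", "r1", "r2"]})
```

Mapping each reduce task to more than one node is what would let the framework recover shuffle data from a surviving replica after a node failure, which matches the abstract's robustness claim.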

Description

technical field
[0001] The invention relates to the field of computer distributed systems and distributed computing frameworks. Specifically, it provides a memory-based distributed shuffle data cache for the map-reduce computing model, thereby improving the performance and robustness of the computing framework.
background technique
[0002] The map-reduce computing model, and the distributed computing systems designed on this model, are currently the mainstream big data distributed systems; examples include Spark and Hadoop. Computation based on this model includes a shuffle stage between the map and reduce stages, which isolates mapping from reduction. All current designs persist shuffle data by writing it to disk before transmitting it. However, disk performance is far inferior to memory performance, so this imposes a large performance overhead on the computing system. [0003] At the same time, this type of computing framewo...
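The shuffle stage described in [0002] sits between the map and reduce phases: map output is grouped by key and repartitioned before reduction. A minimal word-count example makes the stage concrete; the grouped data produced by the shuffle step below is exactly what Spark and Hadoop persist to disk, and what this invention proposes to hold in memory instead.

```python
from collections import defaultdict

records = ["a b", "b c"]

# Map phase: emit (key, 1) pairs from each input record.
mapped = [(word, 1) for line in records for word in line.split()]

# Shuffle phase: group intermediate pairs by key so that all values
# for one key land at the same reducer. This intermediate data is the
# "shuffle data" the patent targets; note it is typically much smaller
# than the original input data blocks.
shuffled = defaultdict(list)
for key, value in mapped:
    shuffled[key].append(value)

# Reduce phase: aggregate the grouped values per key.
counts = {key: sum(values) for key, values in shuffled.items()}
```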


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F9/50, G06F11/07
CPC: G06F9/5016, G06F11/0709, G06F11/073
Inventors: 付周望, 王一丁, 戚正伟, 管海兵
Owner: SHANGHAI JIAO TONG UNIV