
A shuffle method for non-volatile memory

A Shuffle method for non-volatile memory and persistence technology, applied in the field of big data processing. It addresses the problems of high memory-performance requirements, large time overhead, and dependence on network performance, and achieves improved efficiency, fast positioning, and improved space utilization.

Active Publication Date: 2020-06-05
INST OF COMPUTING TECH CHINESE ACAD OF SCI
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] Themis (Proceedings of the 3rd ACM Symposium on Cloud Computing, SoCC 2012) proposes a dynamic memory allocation strategy in the Shuffle stage that keeps data in memory during processing, so that data is read from and written to disk only twice and the rest of the process does not interact with the disk. SpongeFiles (Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data) proposes sharing the unused memory space of Tasks. Both methods achieve acceleration only through memory and therefore place high demands on memory performance.
[0006] In addition, Sailfish (Proceedings of the 3rd ACM Symposium on Cloud Computing, SoCC 2012) proposes that, when Shuffle data is written, the data of each partition produced by the Map Tasks is aggregated and stored in a distributed file system. Hadoop-A (Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis) exploits a high-speed network (RDMA) and uses the Network-Levitated Merge algorithm to execute the Shuffle stage. The drawback of these two methods is that they depend too heavily on network performance, and accessing data through a file system incurs a relatively large time overhead.

Method used



Examples


Embodiment Construction

[0028] In order to make the object, technical solution and advantages of the present invention clearer, the Shuffle method for non-volatile memory provided in the embodiment of the present invention will be described below with reference to the accompanying drawings.

[0029] To study how Shuffle performance affects overall performance, the inventors took the Sort application as an example and measured its running time on Spark as the amount of Shuffle data varied.

[0030] Figure 2 is a graph of the influence of Shuffle data volume on Sort execution time. As shown in Figure 2, the performance of Spark drops significantly as the amount of Shuffle data increases. This is because the data exchanged between Map tasks and Reduce tasks is partitioned; therefore, for a given Reduce task, the amount of data read from one Map task is inversely proportional to the total number of Reduce tasks. This wil...
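The partitioning relationship can be checked with simple arithmetic. The sketch below assumes that each Map task's output is split evenly across all R Reduce tasks; the 128 MiB output size and the task counts are hypothetical values chosen for illustration, not measurements from the patent.

```python
def per_pair_read_bytes(map_output_bytes: int, num_reduce_tasks: int) -> float:
    """Bytes one Reduce task reads from one Map task.

    Assumes (hypothetically) even partitioning: each Map task's output is
    divided equally among all Reduce tasks, so each pair moves 1/R of it.
    """
    if num_reduce_tasks <= 0:
        raise ValueError("num_reduce_tasks must be positive")
    return map_output_bytes / num_reduce_tasks


def shuffle_fetch_count(num_map_tasks: int, num_reduce_tasks: int) -> int:
    """Number of (Map, Reduce) fetches in an all-to-all Shuffle."""
    return num_map_tasks * num_reduce_tasks


if __name__ == "__main__":
    map_output = 128 * 1024 * 1024  # hypothetical 128 MiB output per Map task
    num_maps = 100                  # hypothetical number of Map tasks
    for reducers in (10, 100, 1000):
        size_kib = per_pair_read_bytes(map_output, reducers) / 1024
        fetches = shuffle_fetch_count(num_maps, reducers)
        print(f"R={reducers:5d}: {size_kib:10.1f} KiB per fetch, {fetches} fetches")
```

Each tenfold increase in R cuts the data moved per (Map, Reduce) fetch by a factor of ten while multiplying the number of fetches by ten.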



Abstract

The invention relates to a Shuffle method for non-volatile memory. The method includes the following steps: writing the output data of a Map task into a persistent buffer according to its partition ID; and pulling, for a Reduce task, the data in the persistent buffer that corresponds to that Reduce task.
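The two steps in the abstract can be sketched as a small single-process model. This is a hypothetical illustration, not the patented implementation: `PersistentShuffleBuffer`, `run_map_task`, and `run_reduce_task` are made-up names, a Python `bytearray` stands in for the non-volatile memory region (no persistence or flushing is modeled), and the partition ID is assumed to be the key's hash modulo the number of Reduce tasks. The per-partition offset index is likewise an assumed way for a Reduce task to locate its data without scanning the whole buffer.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class PersistentShuffleBuffer:
    """Hypothetical in-memory stand-in for a persistent (NVM) Shuffle buffer.

    A bytearray plays the role of the persistent region. A real NVM version
    would map persistent memory and flush each write for durability; that is
    not modeled here.
    """

    def __init__(self) -> None:
        self.region = bytearray()  # append-only data area
        # Per-partition index: partition ID -> [(offset, length), ...]
        self.index: Dict[int, List[Tuple[int, int]]] = defaultdict(list)

    def write(self, partition_id: int, record: bytes) -> None:
        """Append one record and register its location under partition_id."""
        offset = len(self.region)
        self.region.extend(record)
        self.index[partition_id].append((offset, len(record)))

    def pull(self, partition_id: int) -> List[bytes]:
        """Return every record of one partition, located via the index."""
        return [bytes(self.region[off:off + n])
                for off, n in self.index.get(partition_id, [])]


def run_map_task(records: Iterable[Tuple[str, str]], num_reduce_tasks: int,
                 buf: PersistentShuffleBuffer) -> None:
    """Write Map output into the buffer, using hash(key) mod R as partition ID."""
    for key, value in records:
        partition_id = zlib.crc32(key.encode("utf-8")) % num_reduce_tasks
        # Tab-separated encoding; assumes keys contain no tab characters.
        buf.write(partition_id, f"{key}\t{value}".encode("utf-8"))


def run_reduce_task(partition_id: int,
                    buf: PersistentShuffleBuffer) -> List[Tuple[str, str]]:
    """Pull and decode the records belonging to one Reduce task."""
    pulled = []
    for raw in buf.pull(partition_id):
        key, value = raw.decode("utf-8").split("\t", 1)
        pulled.append((key, value))
    return pulled


if __name__ == "__main__":
    buf = PersistentShuffleBuffer()
    run_map_task([("a", "1"), ("b", "2"), ("a", "3")], num_reduce_tasks=2, buf=buf)
    for pid in range(2):
        print(pid, run_reduce_task(pid, buf))
```

Appending all records to one shared region and keeping only (offset, length) pairs per partition avoids reserving a fixed-size slot for each partition; whether the patent uses this particular layout is not stated in the abstract.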

Description

Technical Field

[0001] The invention relates to the technical field of big data processing, in particular to a Shuffle method for non-volatile memory.

Background

[0002] With the development of science and technology, the world has entered the era of big data. MapReduce is a popular programming model for large-scale data-parallel computing, and optimizing MapReduce performance has long been a hot topic in industry.

[0003] Shuffle is the stage between the Map stage and the Reduce stage in the MapReduce framework. Figure 1 is a schematic diagram of the MapReduce process. As shown in Figure 1, Shuffle is the process by which, when the output of Map is to be consumed by Reduce, that output is hashed by key and distributed to the Reduce tasks. Because Shuffle involves disk reads and writes as well as network transmission, its performance directly affects the efficiency of the entire program. [...
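The key-hashing step described in [0003] can be illustrated with a word-count sketch. It is a single-process model, not Hadoop or Spark code: `partition_for`, `shuffle`, `word_count_map`, and `word_count_reduce` are hypothetical names. `zlib.crc32` is used as the hash because Python's built-in `hash()` for strings is randomized per process, which would route the same key to different Reduce tasks in different workers.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_for(key: str, num_reduce_tasks: int) -> int:
    """Deterministic hash partitioning: the same key always maps to the same task."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def shuffle(map_outputs: List[List[Tuple[str, int]]],
            num_reduce_tasks: int) -> List[Dict[str, List[int]]]:
    """Route every (key, value) from every Map task to one Reduce task's input."""
    inputs = [defaultdict(list) for _ in range(num_reduce_tasks)]
    for output in map_outputs:  # one list of pairs per Map task
        for key, value in output:
            inputs[partition_for(key, num_reduce_tasks)][key].append(value)
    return [dict(d) for d in inputs]


def word_count_map(line: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for each word in the line."""
    return [(word, 1) for word in line.split()]


def word_count_reduce(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


if __name__ == "__main__":
    lines = ["a b a", "b c"]
    map_outputs = [word_count_map(line) for line in lines]
    reduce_inputs = shuffle(map_outputs, num_reduce_tasks=2)
    result: Dict[str, int] = {}
    for grouped in reduce_inputs:
        # Each key lands in exactly one Reduce task, so merging cannot collide.
        result.update(word_count_reduce(grouped))
    print(sorted(result.items()))  # [('a', 2), ('b', 2), ('c', 1)]
```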

Claims


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F8/30
CPC: G06F8/31
Inventors: 潘锋烽 (Pan Fengfeng), 熊劲 (Xiong Jin)
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI