Cache optimization method for in-memory computing

A cache optimization method for in-memory computing, applied in the fields of computing, memory addressing/allocation/relocation, and memory systems. It addresses the inability to improve memory utilization and reduces the programming burden.

Active Publication Date: 2014-03-12
清能艾科(深圳)能源技术有限公司

AI Technical Summary

Problems solved by technology

[0003] At present, programmers must call the cache() method in Spark to explicitly load RDDs into memory. This requires programmers to have some understanding of memory usage, and this meth...




Embodiment Construction

[0017] The cache optimization method for in-memory computing of the present invention (hereinafter the "memory optimization method") optimizes the existing open-source project Spark. Applying it improves memory utilization and thereby further increases the speed of parallel processing of large data sets.

[0018] Spark is an efficient, general-purpose framework, implemented in Scala, that allows interactive analysis of data sets on clusters. Scala is a statically typed, functional, object-oriented language that runs on the JVM (Java Virtual Machine). The RDD (Resilient Distributed Dataset) is a central abstraction in Spark: a read-only collection of records, partitioned across cluster nodes, that is fault tolerant and can be operated on in parallel. All data can be loaded into memory, which allows memory-based computa...



Abstract

The invention provides a cache optimization method for in-memory computing. The method comprises the following steps: monitoring code is inserted into the Spark source program, and dynamic semantic analysis is performed on the application to construct a DAG; the out-degree of every vertex in the DAG is calculated, and the RDDs at vertices whose out-degree is greater than one are selected as the RDDs to be cached in memory; the execution order of Actions is adjusted according to a greedy algorithm so as to optimize the order in which RDD data is accessed during computation; the weights of the RDDs are calculated, and the RDDs to be evicted from memory are determined according to a memory replacement algorithm; and a multi-level caching algorithm determines how the evicted RDDs are handled. With this method, the programmer no longer needs to weigh memory usage or explicitly specify which RDDs to load into memory while programming, which reduces the programming burden, improves memory utilization, and in turn increases the speed of big data processing.
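The screening step in the abstract (select the vertices whose out-degree is greater than one) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the function name `rdds_to_cache`, the edge-list representation, and the example lineage are hypothetical, and an edge `parent -> child` is assumed to mean that `child` is computed from `parent`.

```python
from collections import defaultdict


def rdds_to_cache(edges):
    """Return the vertices whose out-degree is greater than one.

    `edges` is an iterable of (parent, child) pairs. An edge parent -> child
    is ASSUMED to mean that `child` is computed from `parent`, so a vertex
    with out-degree > 1 is reused by more than one downstream RDD.
    """
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)   # a set ignores duplicate edges
        children[child]               # register leaf vertices with out-degree 0
    return sorted(v for v, kids in children.items() if len(kids) > 1)


# Hypothetical lineage: `parsed` feeds two downstream RDDs, so it is reused.
edges = [
    ("raw", "parsed"),
    ("parsed", "errors"),
    ("parsed", "warnings"),
    ("errors", "error_count"),
]
selected = rdds_to_cache(edges)
print(selected)  # ['parsed']
```

In this sketch only `parsed` is selected, because it is the only vertex consumed by more than one downstream computation; caching the other vertices would not save any recomputation.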

Description

Technical field

[0001] The invention relates to the field of distributed big data processing, and in particular to a cache optimization method for in-memory computing.

Background

[0002] Like Hadoop, Spark is an open-source cluster computing system. It is based on in-memory computing and can process computations over large data sets in parallel, supporting interactive queries on big data and optimizing iterative workloads. Spark caches data sets in memory, which reduces I/O access latency. However, as data volumes surge and the demands on task execution speed keep rising, improving the performance of Spark has become an urgent need. One way to improve the performance of Spark is to raise the efficiency of memory usage and thereby the running speed.

[0003] At present, programmers must call the cache() method in Spark to explicitly load RDDs into memory. This requires programmers to have some understanding of memory usage, and this ...
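The burden described in paragraph [0003] can be illustrated with a toy model in plain Python. This is a hedged sketch and not Spark itself: `LazyDataset`, `build`, and `compute_count` are hypothetical names, and only the method name `cache()` mirrors the Spark call mentioned above. The model assumes a dataset that is rebuilt from its source on every action unless the programmer has explicitly asked for it to be kept in memory.

```python
class LazyDataset:
    """Toy stand-in for an RDD: lazily evaluated, optionally cached."""

    def __init__(self, compute):
        self._compute = compute   # function that rebuilds the data from its source
        self._cached = None
        self._use_cache = False
        self.compute_count = 0    # how many times the data was actually rebuilt

    def cache(self):
        """Mark the dataset to be kept in memory after its first computation."""
        self._use_cache = True
        return self

    def collect(self):
        """Return the data, rebuilding it unless a cached copy exists."""
        if self._use_cache and self._cached is not None:
            return self._cached
        self.compute_count += 1
        data = self._compute()
        if self._use_cache:
            self._cached = data
        return data

    # Two "actions" that each need the full data.
    def count(self):
        return len(self.collect())

    def total(self):
        return sum(self.collect())


def build():
    """Hypothetical expensive computation."""
    return [x * x for x in range(5)]


uncached = LazyDataset(build)
uncached.count()
uncached.total()

cached = LazyDataset(build).cache()
cached.count()
cached.total()

print(uncached.compute_count, cached.compute_count)  # 2 1
```

Without the explicit `cache()` call the data is rebuilt once per action; with it, the data is built once and reused. Deciding where to place such calls is the judgment the programmer has to make by hand.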


Application Information

IPC(8): G06F12/12, G06F12/123
Inventor 陈康, 艾智远, 冯琳, 周佳祥
Owner 清能艾科(深圳)能源技术有限公司