Data processing system and method based on distributed caching

A data processing system using distributed caching technology, applied in the field of data processing. It addresses two problems: data access statistics are distorted by computing resource allocation strategies and real-time load, so they fail to reflect true data access characteristics, and the prior art offers no good solution. The effects are improved execution efficiency, reduced data transmission, and an increased data hit rate.

Active Publication Date: 2015-12-09
GUILIN UNIV OF ELECTRONIC TECH
4 Cites, 12 Cited by

AI Technical Summary

Problems solved by technology

However, memcache also has shortcomings. Distributed storage systems in the field of file backup rarely use memcache for distributed caching.
[0007] These mechanisms are all oriented to the traditional data center platform architecture. On the map/reduce platform, however, computing and storage resources are deployed in a tightly coupled way and data is processed locally. As a result, access statistics gathered per data block are affected by computing resource allocation strategies and real-time load, which makes it difficult for them to reflect data access characteristics completely and accurately.
[0008] A large amount of data must be read during the execution of a mapreduce task, and the storage and transmission of intermediate processing results put great pressure on network transmission and I/O bandwidth. The prior art offers no good solution to this problem.


Embodiment Construction

[0050] The principles and features of the present invention are described below in conjunction with the accompanying drawings, and the examples given are only used to explain the present invention, and are not intended to limit the scope of the present invention.

[0051] There are generally three important roles in HDFS: namenode, datanode, and client. The namenode can be regarded as the manager of the distributed file system. It is mainly responsible for managing the file system namespace, cluster configuration information, and the replication of stored data blocks. The namenode keeps the file system metadata in memory, which mainly includes file information, the data blocks corresponding to each file, and the datanodes on which each data block is located. The datanode is the basic unit of file storage. It stores data blocks in its local file system, saves the metadata of those data blocks, and periodically sends the informati...
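The metadata kept by the namenode, as described in [0051], can be pictured with a small in-memory model. This is a minimal sketch and not HDFS code: the class `NameNodeIndex`, its methods, and the block and datanode identifiers are hypothetical names chosen for illustration only.

```python
from collections import defaultdict


class NameNodeIndex:
    """Toy model of the metadata a namenode keeps in memory (hypothetical)."""

    def __init__(self):
        # file path -> ordered list of block ids that make up the file
        self.file_blocks = {}
        # block id -> set of datanode ids holding a replica of that block
        self.block_locations = defaultdict(set)

    def add_file(self, path, block_ids):
        """Register a file and the blocks it consists of."""
        self.file_blocks[path] = list(block_ids)

    def report_block(self, datanode_id, block_id):
        """Record a datanode's report that it stores the given block."""
        self.block_locations[block_id].add(datanode_id)

    def locate(self, path):
        """Return, for each block of the file, the datanodes that hold it."""
        return [(b, sorted(self.block_locations[b])) for b in self.file_blocks[path]]


index = NameNodeIndex()
index.add_file("/logs/a.txt", ["blk_1", "blk_2"])
index.report_block("dn1", "blk_1")
index.report_block("dn2", "blk_1")
index.report_block("dn2", "blk_2")
print(index.locate("/logs/a.txt"))
```

A client in this model would call `locate` first and then read each block from one of the listed datanodes, which is the lookup that data-local scheduling relies on.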


Abstract

The invention relates to a data processing system based on distributed caching. The data processing system comprises a mapreduce data processing module, a map task memory processing module, and a reduce distributed caching module. The mapreduce data processing module decomposes submitted user jobs into multiple map tasks and multiple reduce tasks, the map task memory processing module processes the map tasks, and the reduce distributed caching module processes the output of the map tasks through the reduce tasks. The invention further relates to a data processing method based on distributed caching. The system and method mainly serve the map tasks: they optimize the data that map tasks process, ensure that a map task can find its target data in the shortest time, and transmit intermediate processing results at the highest speed. The amount of data transmitted is reduced, data is processed locally, and the data hit rate is increased, which improves the execution efficiency of data processing.
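The three modules named in the abstract can be illustrated with a word-count job. This is a minimal single-process sketch under stated assumptions, not the patented implementation: every function and class name here (`decompose_job`, `run_map_task`, `IntermediateCache`, `run_reduce_task`, `run_job`) is hypothetical, the cache is an ordinary in-memory structure standing in for a distributed one, and CRC32 partitioning is an illustrative choice.

```python
import zlib
from collections import defaultdict


def decompose_job(records, num_map_tasks):
    """Job decomposition: split the input into one slice per map task."""
    return [records[i::num_map_tasks] for i in range(num_map_tasks)]


def run_map_task(lines):
    """Map task processed in memory: emit a (word, 1) pair for every word."""
    return [(word, 1) for line in lines for word in line.split()]


class IntermediateCache:
    """In-memory stand-in for a distributed cache of map output (assumption)."""

    def __init__(self, num_partitions):
        # one key -> values mapping per reduce task
        self.partitions = [defaultdict(list) for _ in range(num_partitions)]

    def put(self, key, value):
        # CRC32 gives a partition choice that is stable across runs
        part = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[part][key].append(value)


def run_reduce_task(partition):
    """Reduce task: sum the cached values for each key in its partition."""
    return {key: sum(values) for key, values in partition.items()}


def run_job(records, num_map_tasks=2, num_reduce_tasks=2):
    cache = IntermediateCache(num_reduce_tasks)
    for task_input in decompose_job(records, num_map_tasks):
        for key, value in run_map_task(task_input):
            cache.put(key, value)  # intermediate results go to the cache
    result = {}
    for partition in cache.partitions:
        result.update(run_reduce_task(partition))
    return result


counts = run_job(["a b a", "b c", "a"])
print(counts)
```

Keeping intermediate pairs in a cache keyed by reduce partition means each reduce task reads only its own partition, which is the kind of saving in data transmission the abstract refers to.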

Description

Technical field

[0001] The present invention relates to the technical field of data processing, in particular to a data processing system and processing method based on distributed caching.

Background

[0002] Apache Hadoop (generally abbreviated as hadoop) is an open source distributed data processing platform. Its core consists mainly of two parts: the distributed file system (HDFS) and the mapreduce computing model.

[0003] HDFS shows great advantages in the storage of large-scale data; however, it has significant shortcomings in real-time data reading. Because a large amount of data needs to be read during the execution of mapreduce tasks, heavy pressure is placed on network transmission and disk I/O (Input/Output) bandwidth. It is therefore necessary to set up a cache system on top of HDFS to reduce the amount of data transmitted and improve the execution efficiency of mapreduce.

[0004] The mapreduce data calculation process can be divided into two stages: map and red...
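The argument in [0003] for placing a cache in front of HDFS can be illustrated with a small block cache that counts hits and misses. This is a sketch under stated assumptions: the least-recently-used eviction policy, the class `BlockCache`, and the `slow_read` function standing in for an HDFS read are all hypothetical and are not taken from the patent text.

```python
from collections import OrderedDict


class BlockCache:
    """LRU cache in front of a slow block store (policy is an assumption)."""

    def __init__(self, capacity, read_from_storage):
        self.capacity = capacity
        self.read_from_storage = read_from_storage
        self.blocks = OrderedDict()  # block id -> data, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        self.misses += 1
        data = self.read_from_storage(block_id)  # the expensive path
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        return data

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


storage_reads = []


def slow_read(block_id):
    """Hypothetical stand-in for reading a block from HDFS."""
    storage_reads.append(block_id)
    return f"data-for-{block_id}"


cache = BlockCache(capacity=2, read_from_storage=slow_read)
for block_id in ["b1", "b2", "b1", "b3", "b1"]:
    cache.read(block_id)
print(storage_reads, cache.hit_rate())
```

In this run, five reads cause only three reads from storage. Each hit is a block that did not have to cross the network or touch disk, which is how a higher hit rate translates into less data transmission.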


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/172, G06F16/182
Inventor: 蔡晓东, 王丽娟, 陈超村, 赵勤鲁, 吕璐, 甘凯今, 王迪, 杨超, 宋宗涛, 刘馨婷
Owner: GUILIN UNIV OF ELECTRONIC TECH