Parallel programming method for data-intensive applications based on a multi-data-center architecture

A parallel programming technique for data-intensive applications, applicable to multiprogramming devices, resource allocation, and related areas. It addresses problems such as unnecessary movement of data blocks or data sets, achieves high parallelization efficiency, and improves processing performance.

Status: Inactive. Publication date: 2013-01-16
CENT FOR EARTH OBSERVATION & DIGITAL EARTH CHINESE ACADEMY OF SCI
Cites: 1 | Cited by: 24

AI Technical Summary

Problems solved by technology

Using the workflow model across multiple data centers has the following limitations:
1) Workflows provide only coarse-grained parallelism and cannot meet the needs of high-throughput data processing, which often requires massive parallelism.
2) Typical data-intensive computational workflow systems transfer large amounts of data between tasks, sometimes moving data blocks or data sets unnecessarily.
3) Workflow systems must handle fault tolerance for both task execution and data transmission, and fault tolerance is an important issue in implementing data-intensive computing.



Detailed Description of Embodiments

[0021] The parallel programming method for data-intensive applications based on a multi-data-center architecture, according to an embodiment of the present invention, comprises the following steps:

[0022] 1) Build the master node of the system architecture. The master node receives jobs submitted by users, divides each job into sub-level tasks, and distributes those tasks to the sub-nodes. The master node is divided into an upper layer and a lower layer. First, install the Hadoop and Gfarm software packages on the computer that serves as the system's master node. Hadoop is a distributed computing platform that includes a distributed file system; Gfarm is also a distributed file system. Hadoop works in the upper layer of the master node and is responsible for job submission and tracking; Gfarm works in the lower layer and is responsible for managing the storage system. Secondly, a GfarmFS Hadoop-Plugin plug-in software program...
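The division of labor in step 1 (an upper layer that accepts and tracks jobs, a lower layer that knows where data is stored, and sub-level tasks sent to sub-nodes) can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not the patent's implementation and not the Hadoop or Gfarm API: the class names `StorageLayer`, `MasterNode`, and `Task`, the method names, and the node and file names are all hypothetical, and the "one task per input file, placed on the node that stores it" splitting rule is an assumption chosen to show data locality.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StorageLayer:
    """Lower layer (hypothetical stand-in for Gfarm's role): records which
    sub-node stores each input file."""
    locations: dict[str, str] = field(default_factory=dict)

    def register(self, path: str, node: str) -> None:
        self.locations[path] = node

    def locate(self, path: str) -> str:
        return self.locations[path]


@dataclass
class Task:
    """One sub-level task: process a single input file on a given sub-node."""
    job_id: str
    path: str
    node: str


class MasterNode:
    """Upper layer (hypothetical stand-in for Hadoop's role): accepts jobs,
    splits them into sub-level tasks, and tracks them."""

    def __init__(self, storage: StorageLayer) -> None:
        self.storage = storage
        self.jobs: dict[str, list[Task]] = {}

    def submit(self, job_id: str, inputs: list[str]) -> list[Task]:
        # Assumed splitting rule: one task per input file, placed on the
        # sub-node that already holds that file, so the data does not move.
        tasks = [Task(job_id, p, self.storage.locate(p)) for p in inputs]
        self.jobs[job_id] = tasks
        return tasks

    def dispatch_plan(self, job_id: str) -> dict[str, list[str]]:
        # Group a job's tasks by the sub-node they will run on.
        plan: dict[str, list[str]] = {}
        for task in self.jobs[job_id]:
            plan.setdefault(task.node, []).append(task.path)
        return plan


if __name__ == "__main__":
    storage = StorageLayer()
    storage.register("/dc1/a.dat", "dc1-node1")
    storage.register("/dc2/b.dat", "dc2-node1")
    storage.register("/dc1/c.dat", "dc1-node1")

    master = MasterNode(storage)
    master.submit("job-1", ["/dc1/a.dat", "/dc2/b.dat", "/dc1/c.dat"])
    print(master.dispatch_plan("job-1"))
    # {'dc1-node1': ['/dc1/a.dat', '/dc1/c.dat'], 'dc2-node1': ['/dc2/b.dat']}
```

Keeping the location lookup in a separate storage object mirrors the text's split between job management (upper layer) and storage management (lower layer): the job layer never needs to know how files are stored, only which node holds them.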


Abstract

The invention relates to a parallel programming method for data-intensive applications based on a multi-data-center architecture. The method comprises steps such as constructing the master node of the system architecture, constructing the sub-nodes, loading, and execution. Its advantages are as follows. Practitioners working with large-scale, data-intensive scientific data do not need to be familiar with parallel computing models that span multiple data centers, nor with the MapReduce and Message Passing Interface (MPI) parallel programming techniques used in high-performance computing; they simply configure several distributed clusters and load a MapReduce computing task onto them. The hardware and software configuration of existing cluster systems does not need to change, and the architecture can quickly parallelize MapReduce-based data-intensive applications across multiple data centers. As a result, relatively high parallelization efficiency is achieved, and the capacity to process large-scale, distributed, data-intensive scientific data can be greatly improved.
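The abstract says a MapReduce task is loaded onto several distributed clusters, but this excerpt does not say how results from different data centers are combined. The sketch below assumes one common two-level scheme: each data center runs map, shuffle, and reduce on its own local data, and only the small partial results are merged in a final global reduce. This is an illustrative assumption, written in plain Python without Hadoop; the function names (`local_map_reduce`, `multi_dc_map_reduce`, `word_mapper`, `sum_reducer`) and the word-count example are hypothetical.

```python
from typing import Callable, Iterable, Iterator

# A mapper turns one record into (key, value) pairs; a reducer folds all
# values for one key into a single value.
Mapper = Callable[[str], Iterator[tuple[str, int]]]
Reducer = Callable[[str, list[int]], int]


def local_map_reduce(records: Iterable[str], mapper: Mapper,
                     reducer: Reducer) -> dict[str, int]:
    """Run map, shuffle (group by key), and reduce inside one cluster."""
    groups: dict[str, list[int]] = {}
    for record in records:
        for key, value in mapper(record):
            groups.setdefault(key, []).append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def multi_dc_map_reduce(datacenters: dict[str, list[str]], mapper: Mapper,
                        reducer: Reducer) -> dict[str, int]:
    """Run the job in every data center on its own data, then merge only the
    partial results. Correct only if `reducer` is associative and
    commutative (e.g. a sum), so that reducing partial results gives the
    same answer as reducing all values at once."""
    merged: dict[str, list[int]] = {}
    for records in datacenters.values():
        for key, partial in local_map_reduce(records, mapper, reducer).items():
            merged.setdefault(key, []).append(partial)
    return {key: reducer(key, partials) for key, partials in merged.items()}


def word_mapper(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for each whitespace-separated word.
    for word in line.split():
        yield word, 1


def sum_reducer(_key: str, values: list[int]) -> int:
    return sum(values)


if __name__ == "__main__":
    data = {"dc1": ["a b a"], "dc2": ["b c"]}
    print(multi_dc_map_reduce(data, word_mapper, sum_reducer))
    # {'a': 2, 'b': 2, 'c': 1}
```

Only the per-key partial counts cross data-center boundaries, not the raw records, which is the kind of saving in data movement the text aims for. The associativity requirement is the main design constraint: a reducer such as "median" would not combine correctly under this scheme.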

Description

Technical field
[0001] The invention relates to the technical field of large-scale, data-intensive scientific data processing, and in particular to a parallel programming method for data-intensive applications based on a multi-data-center architecture.

Background
[0002] The rapid growth of the World Wide Web has made a vast amount of information available online. In addition, social, scientific, and engineering applications generate large amounts of structured and unstructured information that must be processed, analyzed, and linked. Typical data-intensive computing currently relies on data center architectures and large-scale data processing models. The invention studies a large-scale data processing model based on multiple data centers.
[0003] The need for data-intensive scientific data analysis across multiple distributed clusters or data centers has grown significantly in recent years. A good example of data-intensive analysis is the field of high ...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F9/46; G06F9/50
Inventor: 王力哲 (Wang Lizhe)
Owner: CENT FOR EARTH OBSERVATION & DIGITAL EARTH CHINESE ACADEMY OF SCI