Cross-multi-data-center data distributed processing acceleration method and system

A distributed processing technology for multiple data centers, applied in the field of data analysis, that addresses problems such as insufficient consideration of site heterogeneity.

Active Publication Date: 2021-03-19

NAT UNIV OF DEFENSE TECH

Cites: 6. Cited by: 0.

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a method and system for accelerating distributed data processing across multiple data centers, so as to remedy the technical defect in the prior art of not fully considering the heterogeneity between sites.

Examples

Embodiment 1

[0205] This embodiment evaluates the performance of SDTP by comparing it with several classic task placement methods in terms of average response time and average slowdown. Conclusions are drawn from how much SDTP reduces both metrics relative to each method. Here, slowdown is defined as the reduction rate of the response time of a single job compared with another method. For example, the slowdown of job A relative to In-Place is the rate by which the response time of job A using SDTP is reduced compared with the response time of job A using In-Place. The average slowdown is the sum of the slowdowns of all jobs divided by the number of jobs.
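The slowdown metric described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's own formula: it assumes that "reduction rate" means (baseline time minus SDTP time) divided by baseline time, and the function names and sample response times are hypothetical.

```python
from typing import Sequence


def slowdown(baseline_time: float, sdtp_time: float) -> float:
    """Reduction rate of one job's response time relative to a baseline method.

    Assumption: reduction rate = (baseline - SDTP) / baseline.
    """
    if baseline_time <= 0:
        raise ValueError("baseline_time must be positive")
    return (baseline_time - sdtp_time) / baseline_time


def average_slowdown(baseline_times: Sequence[float],
                     sdtp_times: Sequence[float]) -> float:
    """Sum of per-job slowdowns divided by the number of jobs."""
    if not baseline_times or len(baseline_times) != len(sdtp_times):
        raise ValueError("need two non-empty sequences of equal length")
    rates = [slowdown(b, s) for b, s in zip(baseline_times, sdtp_times)]
    return sum(rates) / len(rates)


# Hypothetical response times (seconds) for two jobs under In-Place and SDTP.
in_place = [100.0, 200.0]
sdtp = [60.0, 150.0]
print(slowdown(in_place[0], sdtp[0]))
print(average_slowdown(in_place, sdtp))
```

Under this reading, a positive value means SDTP finished the job faster than the baseline, and a negative value means it was slower.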

[0206] Figure 9(a) shows SDTP's improvement in average job response time with varying numbers of sites. Clearly, SDTP significantly outperforms the other baseline methods. In particular, when the number of sites is 10, our method reduces the average job response time of all job type...

Embodiment 2

[0214] This embodiment quantifies the effect of various parameters on SDTP, including the number of compute instances and the ratio of the intermediate data volume to the data volume of the input stage.

[0215] Figure 12(a) depicts the impact of the ratio of intermediate data volume to input data volume. The figure shows the response time for different values of this ratio. It can be seen that the job response time increases as the ratio increases. This is because a larger ratio generates more intermediate data. Transmitting this intermediate data during the shuffle phase and processing it during the reduce phase may increase the overall response time.

[0216] Figure 12(b) illustrates the reduction in average response time compared with In-Place, Iridium, and Tetrium for different parameter values. It can be observed that as q increases, the reduction in average response time increases compared to Tetrium, while the reduction in average response time is rela...

Embodiment 3

[0220] This embodiment considers the impact of the degree of parallelism in parallel computing. First, the effect of the prediction method on the response time of different stages is evaluated. Then, taking the degree of parallelism into account, the impact of computational properties on the different methods and the improvement in their average response time are evaluated.

[0221] This embodiment uses BigDataBench to measure the time of multiple queries running on Spark with different data volumes and degrees of parallelism. Based on the results, it uses a multiple linear regression algorithm to build a prediction model for the computation time of each stage. The results show that the R² statistic is greater than 0.9, where R is the correlation coefficient. The value of the F statistic is greater than the critical value from the F distribution table, and the probabilities p corresponding to the F statistic are all less than 0.0001. That is...
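The kind of model fit described above can be sketched with an ordinary least-squares regression that also reports R² and the overall F statistic. This is an illustration under stated assumptions, not the patent's model: the choice of data volume and degree of parallelism as the two predictors follows the text, but the function names, the linear form, and the sample measurements are hypothetical and synthetic. Only the Python standard library is used.

```python
from typing import List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_stage_time(features: Sequence[Sequence[float]],
                   times: Sequence[float]) -> Tuple[List[float], float, float]:
    """Fit time = b0 + b1*x1 + ... + bk*xk; return (coefficients, R^2, F)."""
    n, k = len(times), len(features[0])
    if n - k - 1 <= 0:
        raise ValueError("need more observations than parameters")
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k + 1)]
           for i in range(k + 1)]
    xty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(k + 1)]
    coeffs = solve_linear_system(xtx, xty)  # normal equations
    preds = [sum(c * v for c, v in zip(coeffs, r)) for r in rows]
    mean_t = sum(times) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(times, preds))
    ss_tot = sum((t - mean_t) ** 2 for t in times)
    r2 = 1.0 - ss_res / ss_tot
    f_stat = float("inf") if ss_res == 0 else (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return coeffs, r2, f_stat


# Synthetic measurements: (data volume, degree of parallelism) -> stage time.
features = [(1, 2), (2, 4), (3, 2), (4, 8), (5, 4), (6, 8), (7, 2), (8, 4)]
times = [4.1, 5.8, 10.2, 9.9, 15.1, 16.2, 21.8, 24.1]
coeffs, r2, f_stat = fit_stage_time(features, times)
print(coeffs, r2, f_stat)
```

The F statistic here is the standard overall-significance statistic, (R²/k) / ((1 − R²)/(n − k − 1)); it would be compared against the critical value of the F distribution with (k, n − k − 1) degrees of freedom.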

Abstract

The invention provides a data distributed processing acceleration method across multiple data centers. With this method, each site can execute its computing task as soon as it obtains the required input data: the input data loading, map calculation, buffer transmission, and reduce calculation of each site do not need to wait for other sites to complete the preceding stage. In addition, the method provides accurate calculation time estimation and adapts to dynamic wide-area network bandwidth, which improves the practicability of SDTP and can greatly shorten job response time. The invention further provides a corresponding data distributed processing acceleration system across multiple data centers, which makes full use of the network and computing resources of geographically distributed sites, so that geographically distributed data can be analyzed effectively without waiting for the bottleneck site of the previous stage to complete its data transmission or computing task.
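The benefit of removing per-stage waiting can be illustrated with a toy timing model. This is a deliberately simplified sketch and not the SDTP algorithm: it assumes each site's four stages (load, map, transfer, reduce) have fixed durations, and it ignores the fact that a site's reduce stage normally depends on data sent by other sites, so it gives an optimistic bound on the gain. All names and numbers are hypothetical.

```python
from typing import Dict, List

STAGES = ["load", "map", "transfer", "reduce"]


def barrier_response_time(stage_times: Dict[str, List[float]]) -> float:
    """Every stage starts only after the slowest site finishes the previous one.

    Response time = sum over stages of the maximum per-site stage time.
    """
    return sum(max(stage_times[stage]) for stage in STAGES)


def pipelined_response_time(stage_times: Dict[str, List[float]]) -> float:
    """Each site advances through its own stages without waiting for others.

    Response time = maximum over sites of that site's total stage time.
    """
    n_sites = len(stage_times[STAGES[0]])
    return max(sum(stage_times[stage][i] for stage in STAGES)
               for i in range(n_sites))


# Hypothetical per-site stage durations (seconds) for three heterogeneous sites.
stage_times = {
    "load": [4.0, 2.0, 1.0],
    "map": [1.0, 3.0, 2.0],
    "transfer": [2.0, 1.0, 4.0],
    "reduce": [1.0, 2.0, 1.0],
}
print(barrier_response_time(stage_times))    # 13.0
print(pipelined_response_time(stage_times))  # 8.0
```

In this model the pipelined time can never exceed the barrier time, and the gap grows when different sites are the bottleneck in different stages, which is the heterogeneity the abstract refers to.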

Description

Technical field

[0001] The invention relates to the field of data analysis, and specifically discloses a data distributed processing acceleration method across multiple data centers and a system thereof.

Background

[0002] Cloud providers such as Google, Amazon, and Alibaba have deployed data centers around the world to provide instant services. These services generate large amounts of data globally, including transaction data, user logs, and performance logs. Mining this geographically distributed data (also known as wide-area analytics) is critical for business recommendations, anomaly detection, performance upgrades, and system maintenance. A distributed computing framework such as Map-Reduce is usually used to mine such massive datasets. The main challenge of this computing method is the heterogeneity of hardware resources among geographically distributed sites, mainly including computing, uplink bandwidth and downlink b...

Application Information

IPC(8): H04L12/24, H04L29/08

CPC: H04L41/0823, H04L67/10, H04L67/60

Inventor: 郭得科, 陈亦婷, 袁昊, 郑龙, 罗来龙

Owner: NAT UNIV OF DEFENSE TECH
