A method and system for accelerating data distributed processing across multiple data centers

What is Al technical title?
Al technical title is built by PatSnap Al team. It summarizes the technical point description of the patent document.
A distributed processing, multi-data technology, applied in the field of data analysis, can solve problems such as insufficient consideration of site heterogeneity, and achieve the effect of reducing job processing time, accurate time estimation, and job response time.

Active Publication Date: 2021-05-11

NAT UNIV OF DEFENSE TECH

View PDF6 Cites 0 Cited by

Summary
Abstract
Description
Claims
Application Information

AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

[0005] The purpose of the present invention is to provide a method and system for accelerating data distributed processing across multiple data centers, so as to solve the technical defects in the prior art that do not fully consider the heterogeneity between sites

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine

Image

Smart Image Click on the blue labels to locate them in the text.

Viewing Examples

Smart Image

Examples

Experimental program

Comparison scheme

Effect test

Embodiment 1

[0205] This example evaluates the performance of SDTP by comparing SDTP with several classical task placement methods in terms of average response time and average slowdown, mainly by reducing average response time and average slowdown compared to various methods to draw conclusions. Among them, slowdown is defined as the response time reduction rate of a single job compared to other methods. For example, the response time for job A using In-Place is , the response time of job A using SDTP is ; so the slowdown for job A compared to using the In-Place response time is . Average slowdown is the sum of all slowdowns for each job divided by the number of jobs.

[0206] Figure 9 (a) shows the SDTP improvement in average job response time with different number of sites. Clearly, SDTP significantly outperforms other benchmark methods. In particular, when the number of sites is 10, the method of the present invention reduces the average job response time of all job types by...

Embodiment 2

[0214] This example will quantify the impact of various parameters on SDTP, including and count the number of instances, is the ratio of the amount of intermediate data to the input data import stage.

[0215] Figure 12 (a) depicts Impact. The figure indicates the different The response time of the value is the same as ratio, where Yes response time. It can be seen that the job response time increases with increases with the increase. This is because the larger More intermediate data will be generated. Both transmitting this intermediate data during the shuffle phase and processing it during the reduce phase may increase the overall response time.

[0216] Figure 12 (b) illustrates the difference compared to In-Place, Iridium and Tetrium The reduction in the average response time of the value. It can be observed that as q increases, the decrease in mean response time increases compared to Tetrium, while the decrease in mean response time is relatively...

Embodiment 3

[0220] This embodiment considers the influence of parallelism in parallel computing. The effect of the forecasting method on the response time of different stages is first evaluated. Thereafter, the impact of computational properties on different methods in parallel computing and the improvement of the average response time of the methods are evaluated under consideration of the degree of parallelism.

[0221] This example uses BigDataBench to measure the time of multiple queries running on Spark with varying amounts of data and degrees of parallelism. According to the results, the present embodiment uses the multiple linear regression algorithm to construct a prediction model of the computation time of each stage. The results show that the R2 statistics are all greater than 0.9, where R is the correlation coefficient. The value of the F statistic is greater than the value according to the F distribution table. The probabilities p corresponding to the F statistic are all le...

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine

Login to View More

PUM

Login to View More

Abstract

The invention proposes a method for accelerating data distributed processing across multiple data centers. In this method, each station can perform corresponding computing tasks as long as it obtains the required input data. The input data loading, map calculation, shuffle transfer, and reduce calculation processes of each site do not need to wait for the previous processes of other sites to complete the corresponding operations. At the same time, the present invention provides accurate calculation time estimation, and makes the method of the present invention adapt to the dynamic wide area network bandwidth to improve the practicability of SDTP, and can greatly reduce the response time of the job. The present invention also proposes a data distributed processing acceleration system across multiple data centers. Corresponding to the above method, the network and computing resources of cross-regional distribution sites can be fully used, thereby effectively analyzing cross-regional distributed data without waiting The bottleneck site in the previous stage completes the corresponding data transmission or computing tasks.

Description

technical field [0001] The invention relates to the field of data analysis, and specifically discloses a data distributed processing acceleration method and system across multiple data centers. Background technique [0002] Cloud providers such as Google, Amazon and Alibaba have deployed data centers around the world to provide instant services. These services generate massive amounts of data on a global scale, including transaction data, user logs, and performance logs, among others. Mining this geographically distributed data (also known as wide-area analysis) is critical for business advice, anonymous detection, performance upgrades, and system maintenance, among others. Distributed computing frameworks such as Map-Reduce are often implemented to mine such massive datasets. The main challenge of this computing approach is the heterogeneity of hardware resources among geographically distributed sites, mainly including computation, uplink bandwidth and downlink bandwidth....

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine

Login to View More

Application Information

Patent Timeline

Login to View More

Patent Type & Authority Patents(China)

IPC IPC(8): H04L12/24H04L29/08

CPCH04L41/0823H04L67/10H04L67/60

Inventor 郭得科陈亦婷袁昊郑龙罗来龙

Owner NAT UNIV OF DEFENSE TECH

Features

R&D
Intellectual Property
Life Sciences
Materials
Tech Scout

Why Patsnap Eureka

Unparalleled Data Quality
Higher Quality Content
60% Fewer Hallucinations

Social media

Patsnap Eureka Blog

Learn More

Browse by: Latest US Patents, China's latest patents, Technical Efficacy Thesaurus, Application Domain, Technology Topic, Popular Technical Reports.

A method and system for accelerating data distributed processing across multiple data centers

AI Technical Summary This helps you quickly interpret patents by identifying the three key elements: Problems solved by technologyMethod usedBenefits of technology

Problems solved by technology

Method used

Image

Examples

Embodiment 1

Embodiment 2

Embodiment 3

PUM

Abstract

Description

Claims

Application Information

AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology