A cascade fault-tolerant processing method adapted to CPU and GPU heterogeneous clusters

A processing method for heterogeneous clusters, applied in the computer field, that addresses problems such as long running times, heavy data access, and high computational complexity, and that reduces losses and allows services to be reset quickly.

Active Publication Date: 2020-05-15
北京丁牛科技有限公司

AI Technical Summary

Problems solved by technology

Large-scale data analysis and processing applications have high computational complexity, use large-scale computing resources, run for a long time, and require highly reliable computing resources. Some application services access large amounts of data and need large-scale read-write storage devices; some are communication-intensive and place high demands on the system's network transmission; some occupy a large amount of GPU space, perform frequent read and write operations, and require high GPU stability; and some have complex processes with many intermediate steps, where a single-step error affects the entire service process. All of these impose high fault tolerance requirements.




Embodiment Construction

[0043] Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with aspects of the present disclosure as recited in the appended claims.

[0044] Below, the present invention will be described in detail in conjunction with specific implementation examples. The following implementation examples will help those skilled in the art to further understand the present invention, but do not limit the present invention in any form. It should be noted that those skilled in the art can make several changes and improvements without d...



Abstract

The invention relates to a cascade fault-tolerant processing method adapted to CPU and GPU heterogeneous clusters. The method comprises the following steps: a data transmission consistency detection model is constructed to detect the consistency of data transmission; a data access consistency model is constructed to keep data access consistent between the CPU and the GPU; a data operation result correctness detection model is constructed to detect the correctness of thread data operation results; a service backup model is constructed at the application layer to back up the historical records of service runs; and a service job information backup model is constructed at the system layer to back up the job information of service runs. In this way, when a non-physical-damage fault occurs in the CPU and GPU heterogeneous cluster, the service fault can be located quickly for business personnel, the pre-fault state can be extracted, the service can be reset quickly, and losses are reduced.
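The abstract names the checks and backups but does not say how they are built. The following is only a minimal sketch of the general idea, under loud assumptions: transmission consistency is approximated by comparing SHA-256 digests, the application-layer backup is an in-memory history of serialized states, and every name here (`transfer_is_consistent`, `JobBackup`, `run_pipeline`) is hypothetical and not taken from the patent.

```python
import hashlib
import json
from dataclasses import dataclass, field


def transfer_is_consistent(sent: bytes, received: bytes) -> bool:
    """Assumed check: a transfer is consistent if both sides hash identically."""
    return hashlib.sha256(sent).digest() == hashlib.sha256(received).digest()


@dataclass
class JobBackup:
    """Hypothetical backup of service history: one entry per completed step."""
    history: list = field(default_factory=list)

    def record(self, step: str, state: dict) -> None:
        # Store a serialized copy so later mutations cannot alter the backup.
        self.history.append((step, json.dumps(state, sort_keys=True)))

    def last_good_state(self):
        # Return (step name, state) of the most recent completed step, if any.
        if not self.history:
            return None
        step, blob = self.history[-1]
        return step, json.loads(blob)


def run_pipeline(steps, state: dict, backup: JobBackup) -> dict:
    """Run steps in order; on failure, report the failing step and the pre-fault state."""
    for name, fn in steps:
        try:
            new_state = fn(dict(state))
        except Exception as exc:
            return {
                "failed_step": name,
                "error": str(exc),
                "restored": backup.last_good_state(),
            }
        backup.record(name, new_state)
        state = new_state
    return {"failed_step": None, "state": state}


if __name__ == "__main__":
    steps = [
        ("load", lambda s: {**s, "n": 3}),
        ("square", lambda s: {**s, "n": s["n"] ** 2}),
        ("boom", lambda s: 1 / 0),  # simulated non-physical fault
    ]
    print(run_pipeline(steps, {}, JobBackup()))
```

In this sketch a failing step is identified by name and the last recorded state is returned, which mirrors the abstract's goal of locating the fault and extracting the pre-fault state; a real cluster would need durable storage and device-side checks rather than an in-memory list.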

Description

Technical field

[0001] The present disclosure relates to the field of computer technology, and in particular to a cascade fault-tolerant processing method adapted to CPU and GPU heterogeneous clusters.

Background

[0002] Since the birth of the first computer, computer technology has developed rapidly, with machines ranging from personal computers to supercomputers in various forms. Human demand for computing is endless, and to meet that demand for computing power, computers are being built to run faster and faster. Machines occupying the first and second places in the world's supercomputer rankings have, for the first time, reached a computing capacity of 100P (peta scale), reflecting the comprehensive capabilities of our country. But at the same time, supercomputing capability is accompanied by an extremely complex and huge computer architecture. For example, the main computing capability of "Tianhe-2" is provided by GPU (Graphics Processing Unit, Graphics Processing ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F11/14, G06F9/54
CPC: G06F9/546, G06F11/1443, G06F11/1458
Inventor: 姜海, 王忠儒, 李海磊
Owner: 北京丁牛科技有限公司