Method for broadcasting matrix data in parallel processing

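A processor and shared-data technique, applied in the field of computing systems, that can address problems such as performance degradation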

Pending Publication Date: 2021-06-22
AMD SHANGHAI

AI Technical Summary

Problems solved by technology

Performance degrades when memory bandwidth is constrained and an application reuses data heavily, because the same data is then fetched from memory multiple times.

Embodiment Construction

[0036] In the following description, numerous specific details are set forth in order to provide a thorough understanding of the methods and mechanisms presented herein. However, one of ordinary skill in the art would recognize that various embodiments may be practiced without these specific details. In some instances, well-known structures, components, signals, computer program instructions and techniques have not been shown in detail to avoid obscuring the methods described herein. It should be understood that, for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some elements may be exaggerated relative to other elements.

[0037] Various systems, apparatus, methods, and computer-readable media are disclosed for efficiently executing multiple units of work in a processor in parallel by reducing the number of memory accesses. In various implementations, a computing system includes ...

Abstract

The invention relates to a computing system, in particular to a method for broadcasting matrix data in parallel processing in the computing system. Systems, apparatuses, and methods are disclosed for efficiently executing multiple work units in parallel in a processor by reducing the number of memory accesses. A computing system includes a processor core with a parallel data architecture. A software application, firmware, or both implement matrix operations and support broadcasting shared data to multiple compute units of the processor core. The application creates thread groups by matching a compute kernel of the application with data items and grouping the resulting work units into thread groups. The application assigns thread groups to compute units based on detecting data shared between the compute units. Rather than sending multiple read requests to the memory subsystem to obtain the shared data, a single access request is generated. The single access request includes information identifying the multiple compute units that are to receive the shared data when it is broadcast.
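
The request-merging idea in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: it assumes a tiled matrix multiply in which each compute unit produces one output tile C[i][j] and therefore reads row block A[i] and column block B[j], and it assumes the "information identifying the receiving units" is encoded as a bitmask. The names `BroadcastRequest`, `cu_mask`, `assign_tiles`, and the block labels are hypothetical and do not come from the patent text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BroadcastRequest:
    """One memory read whose result is delivered to several compute units."""
    address: str  # hypothetical label for the shared data block
    cu_mask: int  # assumed encoding: bit i set => compute unit i receives the data


def assign_tiles(grid_rows, grid_cols):
    """Map each output tile C[i][j] to one compute unit and list the blocks it reads.

    Assumption: tile C[i][j] needs row block A[i] and column block B[j].
    """
    needs = {}
    cu = 0
    for i in range(grid_rows):
        for j in range(grid_cols):
            needs[cu] = [f"A[{i}]", f"B[{j}]"]
            cu += 1
    return needs


def naive_requests(needs):
    """Every compute unit issues its own read for every block it needs."""
    return [(cu, block) for cu, blocks in needs.items() for block in blocks]


def broadcast_requests(needs):
    """Merge reads of the same block into one request carrying a receiver mask."""
    masks = defaultdict(int)
    for cu, blocks in needs.items():
        for block in blocks:
            masks[block] |= 1 << cu
    return [BroadcastRequest(block, mask) for block, mask in masks.items()]


needs = assign_tiles(2, 2)           # 4 compute units, one per output tile
naive = naive_requests(needs)        # one read per (compute unit, block) pair
merged = broadcast_requests(needs)   # one read per distinct block
print(len(naive), len(merged))
for request in merged:
    print(request.address, format(request.cu_mask, "04b"))
```

In this toy 2x2 case, each input block is shared by two compute units, so merging halves the number of reads sent to memory. The saving grows with the number of units that share a block.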

Description

Technical field

[0001] The invention relates to a computing system, in particular to a method for broadcasting matrix data in parallel processing in the computing system.

Background

[0002] Parallelization of tasks is used to increase the throughput of computing systems. To do this, the compiler extracts parallelized tasks from the program code for parallel execution on the system hardware. Processor cores include deep pipelines configured to execute multiple threads. To further increase parallel execution on hardware, multi-core architectures include multiple processor cores. Computing systems overcome the performance limitations of conventional general-purpose cores by offloading specific tasks to specialized hardware. Some types of special-purpose hardware include single-instruction multiple-data (SIMD) parallel architectures, other types include field-programmable gate arrays (FPGAs), and still others include other special-purpose types of processing co...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/41, G06F9/54, G06F15/78
CPC: G06F8/45, G06F9/542, G06F15/7807, G06F12/0831, G06F12/084, G06F12/0813, G06F2212/507, G06F9/4881, G06F12/0815
Inventor: 彭莉杨建汤迟
Owner: AMD SHANGHAI