Combining compute tasks for a graphics processing unit

Graphics processing unit and central processing unit technology, applied in the field of general-purpose computing, that addresses the problem of unpredictable capabilities of the runtime hardware

Active Publication Date: 2017-02-22
APPLE INC

AI Technical Summary

Problems solved by technology

It is not always practical for developers to tune for the "best" parameters for a given hardware platform, since d



Examples


Detailed Description

[0011] The present disclosure relates to systems, methods, and computer-readable media that improve hardware utilization. In general, techniques are disclosed for combining multiple work items into a single work item by adding code to the newly formed work item that "unrolls" the kernel so that it acts across more instances than the developer defined. Additionally, multiple workgroups can be combined into a single workgroup to reduce the total number of workgroups that must be initialized on given hardware. More specifically, the techniques disclosed herein can alter the designation of work items, the instances of work items, and the total number of workgroups to more closely match the performance characteristics of the runtime hardware.
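The work-item coalescing described in [0011] can be illustrated with a small CPU-side simulation. This is a minimal sketch under stated assumptions, not the patented implementation: `kernel`, `launch`, `coalesce`, and `run` are hypothetical names, and a plain Python loop stands in for a GPU dispatch. It shows how added wrapper code lets one work item cover several developer-defined instances, so fewer work items need to be initialized.

```python
from typing import Callable, List, Tuple


def kernel(gid: int, src: List[float], dst: List[float]) -> None:
    """Developer-written kernel body: one instance handles one element."""
    dst[gid] = src[gid] * 2.0


def launch(global_size: int, body: Callable[[int], None]) -> int:
    """Stand-in for a GPU dispatch (assumption): run `body` once per work
    item and return how many work items had to be started."""
    for gid in range(global_size):
        body(gid)
    return global_size


def coalesce(body: Callable[[int], None], factor: int,
             original_size: int) -> Callable[[int], None]:
    """Wrap `body` so that one new work item covers `factor` original
    instances, much like unrolling a loop around the kernel."""
    def combined(new_gid: int) -> None:
        base = new_gid * factor
        for i in range(factor):
            gid = base + i
            # Guard the tail when the size is not a multiple of `factor`.
            if gid < original_size:
                body(gid)
    return combined


def run(factor: int) -> Tuple[List[float], int]:
    """Run the kernel over 10 elements with the given coalescing factor."""
    src = [float(i) for i in range(10)]
    dst = [0.0] * len(src)
    body = lambda gid: kernel(gid, src, dst)
    new_size = (len(src) + factor - 1) // factor  # ceiling division
    started = launch(new_size, coalesce(body, factor, len(src)))
    return dst, started
```

Calling `run(1)` reproduces the developer-defined launch, while `run(4)` produces the same results from fewer work items. How a driver would actually pick the factor for a given GPU is not modeled here.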

[0012] In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of inventive concepts. As part of this description, some of the drawings of the present...


Abstract

Methods, systems and devices are disclosed to examine developer-supplied graphics code and attributes at run-time. The graphics code is designed for execution on a graphics processing unit (GPU) and is written in a coding language, such as OpenCL or OpenGL, that provides for run-time analysis by a driver, code generator, and compiler. Developer-supplied code and attributes can be analyzed and altered based on the execution capabilities and performance criteria of the GPU on which the code is about to be executed. In general, reducing the number of developer-defined work items or workgroups can reduce the initialization cost of the GPU with respect to the work to be performed and result in an overall optimization of the machine code. Manipulation code can be added to adjust the supplied code in a manner similar to unrolling a loop to improve execution performance.

Description

Background

[0001] This disclosure relates generally to the field of general-purpose computing on graphics processing units (GPGPU) and to optimizing developer-defined workgroup characteristics. More specifically, but without limitation, this disclosure relates to techniques for coalescing (e.g., combining) work items in a workgroup when the workgroup size appears too large, and for combining workgroups, so that work items from different workgroups are coalesced, when the workgroup size appears too small. In some cases, these two techniques can be used together to reduce the overall overhead associated with a work task.

[0002] In the field of parallel computing utilizing graphics processing units (GPUs), several computing languages are available. For example, OpenCL and OpenGL are standards that developers use to interface with GPUs. A GPU can have multiple cores running in parallel to process programs calle...
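The second technique, combining workgroups that are too small, can be sketched in the same simulated style. The names `kernel`, `dispatch`, `merge_groups`, and `run_merged` are hypothetical, and the nested Python loops are only an assumed stand-in for a real GPU dispatch. The added remapping code lets each merged workgroup recover the original group and local identifiers, so the developer's kernel body runs unchanged.

```python
from typing import Callable, List, Optional, Tuple

Cell = Optional[Tuple[int, int]]


def kernel(group_id: int, local_id: int, local_size: int,
           out: List[Cell]) -> None:
    """Developer-written kernel: records which (group, local) pair ran."""
    out[group_id * local_size + local_id] = (group_id, local_id)


def dispatch(num_groups: int, local_size: int,
             body: Callable[[int, int], None]) -> int:
    """Stand-in for a GPU dispatch (assumption): returns the number of
    workgroups that had to be initialized."""
    for g in range(num_groups):
        for l in range(local_size):
            body(g, l)
    return num_groups


def merge_groups(body: Callable[[int, int], None], k: int,
                 local_size: int, num_groups: int) -> Callable[[int, int], None]:
    """Wrap `body` so one merged workgroup does the work of `k` originals."""
    def merged(new_group: int, new_local: int) -> None:
        # Remap the merged ids back to the developer-defined ids.
        orig_group = new_group * k + new_local // local_size
        orig_local = new_local % local_size
        if orig_group < num_groups:  # skip padding in the last merged group
            body(orig_group, orig_local)
    return merged


def run_merged(num_groups: int, local_size: int,
               k: int) -> Tuple[List[Cell], int]:
    out: List[Cell] = [None] * (num_groups * local_size)
    body = lambda g, l: kernel(g, l, local_size, out)
    new_groups = (num_groups + k - 1) // k  # ceiling division
    started = dispatch(new_groups, local_size * k,
                       merge_groups(body, k, local_size, num_groups))
    return out, started
```

With five workgroups of two work items each, `run_merged(5, 2, 2)` yields the same output as `run_merged(5, 2, 1)` while initializing fewer workgroups. The sketch ignores barriers and workgroup-local memory, which a real transformation would also have to account for.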


Application Information

IPC(8): G06F9/445
CPC: G06F9/44505, G06F8/4441, G06F9/445
Inventors: G·阿维卡罗古拉里, A·K·侃, K·C·崔
Owner APPLE INC