
Policies for Shader Resource Allocation in a Shader Core

A shader core and shader technology, applied in the field of computing systems, can solve problems such as poor shader core performance, memory latency and power consumption overheads, and the GPU's comparatively limited programming ecosystem.

Status: Inactive · Publication Date: 2013-06-20
Assignee: ADVANCED MICRO DEVICES INC
Cites: 8 · Cited by: 34

AI Technical Summary

Benefits of technology

This patent describes a method for efficiently processing data in devices with multiple compute pipelines. The method selects among queues of work based on priority criteria and determines which queue to process at a given time, allowing efficient use of resources and faster processing. The invention is not limited to specific embodiments and can be applied to a variety of devices with accelerated processing capabilities; the technical effects are improved efficiency and productivity in accelerated processing devices.
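
As a concrete illustration of the queue-selection step, the following minimal sketch models each compute-pipeline queue and picks the highest-priority ready queue. It is an interpretation of the summary above, not the patent's implementation; the ComputeQueue fields, the selectQueue() name, and the pendingItems bookkeeping are assumptions introduced for illustration.

#include <cstdint>
#include <vector>

// Hypothetical descriptor for one compute-pipeline queue. The field
// names are illustrative and not taken from the patent.
struct ComputeQueue {
    uint32_t id;            // queue identifier
    int      priority;      // higher value = higher priority
    uint32_t pendingItems;  // number of queued work items
    bool     ready;         // true while pendingItems > 0
};

// Select the ready queue with the highest priority, mirroring the
// "selects queues ... based on priority criteria" step described above.
// Returns nullptr when no queue has pending work.
ComputeQueue* selectQueue(std::vector<ComputeQueue>& queues) {
    ComputeQueue* best = nullptr;
    for (auto& q : queues) {
        if (q.ready && (best == nullptr || q.priority > best->priority)) {
            best = &q;
        }
    }
    return best;
}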

Problems solved by technology

These constraints on GPU computing arose from the fact that GPUs did not have as rich a programming ecosystem as CPUs.
Their use, therefore, has been mostly limited to two dimensional (2D) and three dimensional (3D) graphics and a few leading edge multimedia applications, which are already accustomed to dealing with graphics and video application programming interfaces (APIs).
Although OpenCL and DirectCompute are a promising start, many hurdles remain to creating an environment and ecosystem that allows the combination of the CPU and GPU to be used as fluidly as the CPU for most programming tasks.
Both of these arrangements, however, still include significant challenges associated with (i) separate memory systems, (ii) efficient scheduling, (iii) providing quality of service (QoS) guarantees between processes, (iv) programming model, and (v) compiling to multiple target instruction set architectures (ISAs)—all while minimizing power consumption.
While these external interfaces (e.g., chip to chip) negatively affect memory latency and power consumption for cooperating heterogeneous processors, the separate memory systems (i.e., separate address spaces) and driver managed shared memory create overhead that becomes unacceptable for fine grain offload.
By way of example, computational commands (e.g., physics or artificial intelligence commands) often cannot be sent to the GPU for execution.
This limitation exists because the CPU may relatively quickly require the results of the operations performed by these computational commands.
However, because of the high overhead of dispatching work to the GPU in current systems and the fact that these commands may have to wait in line for other previously-issued commands to be executed first, the latency incurred by sending computational commands to the GPU is often unacceptable.
Given that a traditional GPU may not efficiently execute some computational commands, those commands must instead be executed on the CPU. This increases the processing burden on the CPU and can hamper overall system performance.
Although GPUs provide excellent opportunities for computational offloading, traditional GPUs may not be suitable for system-software-driven process management that is desired for efficient operation in some multi-processor environments.
These limitations can create several problems.
For example, since processes cannot be efficiently identified and/or preempted, a rogue process can occupy the GPU hardware for arbitrary amounts of time.
In other cases, the ability to context switch off the hardware is severely constrained—occurring at very coarse granularity and only at a very limited set of points in a program's execution.
This constraint exists because saving the necessary architectural and microarchitectural states for restoring and resuming a process is not supported.
Lack of support for precise exceptions prevents a faulted job from being context switched out and restored at a later point, resulting in lower hardware utilization, as the faulted threads occupy hardware resources and sit idle during fault handling.
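
To make the context-switch limitation concrete, the following is a minimal hypothetical sketch of the save/restore hooks a scheduler would need before it could switch a faulted job off the hardware; the WavefrontContext fields and both stub functions are assumptions for illustration, not drawn from the patent.

#include <cstdint>
#include <vector>

// Hypothetical snapshot of the architectural and microarchitectural
// state that would have to be captured to resume a process later.
struct WavefrontContext {
    uint64_t programCounter;           // where execution stopped
    uint64_t execMask;                 // active-lane mask
    std::vector<uint32_t> scalarRegs;  // scalar register file contents
    std::vector<uint32_t> vectorRegs;  // vector register file contents
};

// Stubs: real hardware would snapshot and reload the register files.
// The text's point is that traditional GPUs expose no such operations,
// so a faulted job holds its resources until it completes or is killed.
WavefrontContext saveContext() { return WavefrontContext{}; }
void restoreContext(const WavefrontContext& /*ctx*/) {}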


Embodiment Construction

[0027] In the detailed description that follows, references to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

[0028]The term “embodiments of the invention” does not require that all embodiments of the invention include the discussed feature, advantage or mode of operation. Alternate embodiments may be devised without departing from the scope of the invention, and well-known elements ...


Abstract

A method of determining priority within an accelerated processing device is provided. The accelerated processing device includes compute pipeline queues that are processed in accordance with predetermined criteria. The queues are selected based on priority characteristics and the selected queue is processed until a time quantum lapses or a queue having a higher priority becomes available for processing.
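
Read as an algorithm, the abstract suggests an arbitration loop like the minimal sketch below: process the selected queue until a time quantum lapses or a higher-priority queue becomes ready. It reuses the hypothetical ComputeQueue and selectQueue() from the earlier sketch; dispatchOneItem() and the 5 ms quantum are likewise assumed placeholders, not values from the patent.

#include <chrono>
#include <vector>

// Placeholder for submitting one work item to the hardware; marks the
// queue not-ready once it drains. Purely illustrative.
void dispatchOneItem(ComputeQueue& q) {
    if (q.pendingItems > 0 && --q.pendingItems == 0) {
        q.ready = false;
    }
}

// Arbitration loop sketched from the abstract: run the selected queue
// until its quantum lapses, it drains, or a higher-priority queue
// becomes ready (preemption).
void arbitrate(std::vector<ComputeQueue>& queues) {
    using clock = std::chrono::steady_clock;
    const auto quantum = std::chrono::milliseconds(5);  // assumed quantum length

    while (ComputeQueue* current = selectQueue(queues)) {
        const auto start = clock::now();
        while (current->ready && clock::now() - start < quantum) {
            dispatchOneItem(*current);
            ComputeQueue* top = selectQueue(queues);
            if (top != nullptr && top->priority > current->priority) {
                break;  // a higher-priority queue arrived; reselect
            }
        }
    }
}

On this reading the preemption check happens after each dispatched work item; the excerpt does not specify the granularity at which real hardware would re-evaluate priorities.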

Description

BACKGROUND

[0001] 1. Field of the Invention

[0002] The present invention is generally directed to computing systems. More particularly, the present invention is directed to arbitration policies for allocating graphics processing unit resources among multiple pipeline inputs.

[0003] 2. Background Art

[0004] The desire to use a graphics processing unit (GPU) for general computation has become much more pronounced recently due to the GPU's exemplary performance per unit power and/or cost. The computational capabilities of GPUs, generally, have grown at a rate exceeding that of the corresponding central processing unit (CPU) platforms. This growth, coupled with the explosion of the mobile computing market and its necessary supporting server/enterprise systems, has been used to provide a specified quality of desired user experience. Consequently, the combined use of CPUs and GPUs for executing workloads with data-parallel content is becoming a volume technology.

[0005] However, GPUs have tradition...


Application Information

IPC(8): G06T1/20
CPC: G06F9/3851; G06T1/20; G06F9/3879; G06F9/4881; G06F9/3888
Inventors: HARTOG, ROBERT SCOTT; LEATHER, MARK; MANTOR, MICHAEL; MCCRARY, REX; NUSSBAUM, SEBASTIEN; ROGERS, PHILIP J.; TAYLOR, RALPH CLAY; WOLLER, THOMAS
Owner: ADVANCED MICRO DEVICES INC