Distributed Compute Work Parser Circuitry using Communications Fabric

A compute work and circuit technology, applied in the field of parallel processing, that addresses the problem of compute work fetching and distribution substantially affecting the performance and power consumption of compute tasks.

Active Publication Date: 2020-03-26
APPLE INC

AI Technical Summary

Problems solved by technology

Fetching and distributing compute work efficiently may substantially affect performance and power consumption for compute tasks.



Examples


Example Front-End Circuitry for Control Stream

[0050] FIG. 3 is a block diagram illustrating example circuitry configured to fetch compute control stream data, according to some embodiments. In the illustrated embodiment, front-end circuitry 300 includes stream fetcher 310, control stream data buffer 320, fetch parser 330, indirect fetcher 340, execute parser 350, and execution packet queue 360. In some embodiments, decoupling fetch parsing from execution parsing may advantageously allow fetching to make substantial forward progress in the context of links and redirects, for example, relative to the point reached by actual execution.
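The benefit of decoupling fetch parsing from execution parsing can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the disclosed circuitry: the control stream encoding (hypothetical Kernel, Link, and End entries), the class names FetchParser and ExecuteParser (loosely standing in for fetch parser 330 and execute parser 350), and the bounded deque (standing in for buffering such as execution packet queue 360) are all illustrative. Indirect fetcher 340 is not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Union


# Hypothetical control stream entries (illustrative only; the real encoding is not given).
@dataclass
class Kernel:
    name: str


@dataclass
class Link:
    target: int  # index in the stream that execution is redirected to


@dataclass
class End:
    pass


Entry = Union[Kernel, Link, End]


class FetchParser:
    """Walks the stream ahead of execution, resolving links as it goes."""

    def __init__(self, stream: List[Entry], capacity: int):
        self.stream = stream
        self.pc = 0                   # next stream index to fetch
        self.buffer: deque = deque()  # stand-in for buffered execution packets
        self.capacity = capacity      # assumed >= 1
        self.done = False

    def step(self) -> None:
        # Keep fetching until the buffer is full or the stream ends.
        while not self.done and len(self.buffer) < self.capacity:
            entry = self.stream[self.pc]
            if isinstance(entry, Link):
                self.pc = entry.target  # follow the redirect without waiting for execution
            elif isinstance(entry, End):
                self.done = True
            else:
                self.buffer.append(entry)
                self.pc += 1


class ExecuteParser:
    """Consumes one buffered kernel per step, in program order."""

    def __init__(self, fetch: FetchParser):
        self.fetch = fetch
        self.executed: List[str] = []

    def step(self) -> None:
        if self.fetch.buffer:
            self.executed.append(self.fetch.buffer.popleft().name)


def run(stream: List[Entry], capacity: int = 2) -> List[str]:
    fetch = FetchParser(stream, capacity)
    execute = ExecuteParser(fetch)
    while not (fetch.done and not fetch.buffer):
        fetch.step()
        execute.step()
    return execute.executed


stream = [Kernel("a"), Link(3), Kernel("skipped"), Kernel("b"), Kernel("c"), End()]
print(run(stream))  # ['a', 'b', 'c']
```

With a buffer capacity of 2, the fetch parser has already followed the link and buffered kernels "a" and "b" before the execute parser consumes anything, and the entry skipped by the link is never fetched.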

[0051] Note that, in some embodiments, output data from circuitry 300 (e.g., in execution packet queue 360) may be accessed by global workload parser 210 for distribution.

[0052] In some embodiments, the compute control stream (which may also be referred to as a compute command stream) includes kernels, links (which may redirect execution and may or may not include ...

Example Device

[0114] Referring now to FIG. 11, a block diagram illustrating an example embodiment of a device 1100 is shown. In some embodiments, elements of device 1100 may be included within a system on a chip. In some embodiments, device 1100 may be included in a mobile device, which may be battery-powered. Therefore, power consumption by device 1100 may be an important design consideration. In the illustrated embodiment, device 1100 includes fabric 1110, compute complex 1120, input/output (I/O) bridge 1150, cache/memory controller 1145, graphics unit 150, and display unit 1165. In some embodiments, device 1100 may include other components (not shown) in addition to and/or in place of the illustrated components, such as video processor encoders and decoders, image processing or recognition elements, computer vision elements, etc.

[0115] Fabric 1110 may include various interconnects, buses, MUX's, controllers, etc., and may be configured to facilitate communication between various elements of devic...


Abstract

Techniques are disclosed relating to distributing work from compute kernels using a distributed hierarchical parser architecture. In some embodiments, an apparatus includes a plurality of shader units configured to perform operations for compute workgroups included in compute kernels processed by the apparatus, a plurality of distributed workload parser circuits, and a communications fabric connected to the plurality of distributed workload parser circuits and a master workload parser circuit. In some embodiments, the master workload parser circuit is configured to iteratively determine a next position in multiple dimensions for a next batch of workgroups from the kernel and send batch information to the distributed workload parser circuits via the communications fabric to assign the batch of workgroups. In some embodiments, the distributed parsers maintain coordinate information for the kernel and update the coordinate information in response to the batch information, even when the distributed parsers are not assigned to execute the batch.
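The abstract's batching scheme can be modeled in software to show why distributed parsers track coordinates even for batches they do not execute. In this minimal sketch, which is an illustration under assumptions rather than the disclosed design, the broadcast batch information is assumed to carry only a target parser index and a workgroup count (no coordinates), batches are assumed to be assigned round-robin and allowed to wrap across dimensions, and workgroups are ordered x-fastest. BatchInfo, DistributedParser, and master_dispatch are hypothetical names; the loop over parsers stands in for the communications fabric.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Coord = Tuple[int, int, int]


def linear(pos: Coord, dims: Coord) -> int:
    """Flatten an (x, y, z) workgroup position, x varying fastest."""
    x, y, z = pos
    return x + dims[0] * (y + dims[1] * z)


def advance(pos: Coord, n: int, dims: Coord) -> Coord:
    """Move n workgroups forward, carrying from x into y and from y into z."""
    lin = linear(pos, dims) + n
    dx, dy, _ = dims
    return (lin % dx, (lin // dx) % dy, lin // (dx * dy))


@dataclass
class BatchInfo:
    # Assumed message format: no coordinates are sent over the fabric.
    target: int  # index of the distributed parser assigned this batch
    count: int   # number of workgroups in the batch


@dataclass
class DistributedParser:
    pid: int
    dims: Coord
    pos: Coord = (0, 0, 0)  # local copy of the kernel's current position
    assigned: List[Coord] = field(default_factory=list)

    def receive(self, info: BatchInfo) -> None:
        if info.target == self.pid:
            p = self.pos
            for _ in range(info.count):
                self.assigned.append(p)  # workgroup to hand to local shader units
                p = advance(p, 1, self.dims)
        # Update coordinates even when this parser is not assigned the batch.
        self.pos = advance(self.pos, info.count, self.dims)


def master_dispatch(dims: Coord, batch: int, parsers: List[DistributedParser]) -> Coord:
    """Iteratively pick the next batch position and broadcast batch info."""
    total = dims[0] * dims[1] * dims[2]
    pos: Coord = (0, 0, 0)
    turn = 0
    while pos[2] < dims[2]:  # stop once z has moved past the last slice
        count = min(batch, total - linear(pos, dims))
        info = BatchInfo(target=turn % len(parsers), count=count)
        for p in parsers:  # stand-in for the communications fabric broadcast
            p.receive(info)
        pos = advance(pos, count, dims)
        turn += 1
    return pos


parsers = [DistributedParser(pid=i, dims=(3, 2, 1)) for i in range(2)]
master_dispatch((3, 2, 1), batch=2, parsers=parsers)
print(parsers[0].assigned)  # [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0)]
print(parsers[1].assigned)  # [(2, 0, 0), (0, 1, 0)]
```

Because every parser applies every batch's count to its local position, all copies stay in sync, and an assigned parser can expand a compact batch message into explicit workgroup coordinates without the master sending them.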

Description

BACKGROUND

Technical Field

[0001] This disclosure relates generally to parallel processing and more particularly to distributing compute kernels to processing elements (e.g., GPU shader cores) in distributed architectures.

Description of the Related Art

[0002] Given their growing compute capabilities, graphics processing units (GPUs) are now being used extensively for large-scale compute workloads. APIs such as Metal and OpenCL give software developers an interface to access the compute power of the GPU for their applications. In recent times, software developers have been moving substantial portions of their applications to using the GPU. Furthermore, GPUs are becoming more powerful in new generations.

[0003] Compute work is often specified as kernels that are multi-dimensional aggregations of compute workgroups. For example, a program executed by a central processing unit may use one or more compute kernels that are compiled for another processor such as a GPU or digital signal processor ...
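As a concrete illustration of a kernel as a multi-dimensional aggregation of workgroups, the sketch below assumes the kernel is described by a total thread-grid size and a workgroup size per dimension, as compute dispatches in APIs such as Metal and OpenCL commonly are. The function names workgroup_grid and workgroups are hypothetical. Rounding up covers grids that are not an exact multiple of the workgroup size.

```python
from itertools import product
from typing import Iterator, Tuple

Dims = Tuple[int, int, int]


def workgroup_grid(grid_threads: Dims, group_threads: Dims) -> Dims:
    """Workgroups per dimension, rounding up so partial workgroups are counted."""
    gx, gy, gz = ((g + w - 1) // w for g, w in zip(grid_threads, group_threads))
    return (gx, gy, gz)


def workgroups(grid: Dims) -> Iterator[Dims]:
    """Yield every workgroup ID (x, y, z), with x varying fastest."""
    gx, gy, gz = grid
    for z, y, x in product(range(gz), range(gy), range(gx)):
        yield (x, y, z)


grid = workgroup_grid((1000, 600, 1), (16, 16, 1))
print(grid)                              # (63, 38, 1)
print(sum(1 for _ in workgroups(grid)))  # 2394
```

How the partially filled edge workgroups are masked is API-specific and is not modeled here.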


Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06T15/00, G06F9/50
CPC: G06T15/005, G06F9/5027, G06T2200/28
Inventors: HAVLIR, ANDREW M.; BOWMAN, BENJAMIN; BRADY, JEFFREY T.
Owner: APPLE INC