Mask operation method of explicit independent mask register in GPU

A mask register technology and operation method, applied in the field of image processing units. It addresses problems such as frequent movement of data between the implicit mask register and general-purpose registers, increased power consumption of the programmable processor cores, and increased program execution delay. It achieves the effects of avoiding invalid operand reads and pipeline operations, optimizing the instruction issue process, and reducing power consumption.

Pending Publication Date: 2020-12-15
HUAXIA GENERAL PROCESSOR TECH INC
0 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a mask operation method of an explicit independent mask register in a GPU, addressing the following problem described in the background. A graphics processor combines multiple vertices, or multiple work items in a kernel, into threads and executes the corresponding shader or kernel program in single-instruction, multiple-data (SIMD) mode. When conditional control code appears in the program, one SIMD approach is to use an execution mask to control the writing of results: a destination operand is rewritten only when the corresponding bit in the mask is 0x1. To save encoding space in the GPU instruction word, graphics processors often use a single implicit mask register ($exec). When the conditional control statements in shader or kernel code are numerous or nested, an instruction cannot select a different mask register, so data must be moved frequently between the implicit mask register and the general-purpose registers. This increases the instruction count and execution delay of the program, and the additional register reads and writes increase the power consumption of the programmable processor core.
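The masked-write rule described above can be illustrated with a small software model. This is a minimal sketch rather than the patented hardware: the function name `masked_write`, the 4-lane width, and the sample condition are assumptions chosen only for illustration.

```python
def masked_write(dest, result, mask):
    """Return the new destination vector.

    Lane i takes result[i] only when bit i of mask is 1;
    otherwise the old value dest[i] is kept.
    """
    return [r if (mask >> i) & 1 else d
            for i, (d, r) in enumerate(zip(dest, result))]


# Illustrative 4-lane version of: if (x > 2) y = x * 10
x = [1, 5, 2, 7]
y = [0, 0, 0, 0]

# Build the branch mask: bit i is set when lane i satisfies the condition.
mask = sum(1 << i for i, v in enumerate(x) if v > 2)

# Every lane computes the result, but only enabled lanes are written back.
y = masked_write(y, [v * 10 for v in x], mask)
```

Lanes whose mask bit is 0 still occupy the pipeline in this model, which is why a mask that is known to be all zero is an opportunity to skip work.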


Examples


Embodiment Construction

[0022] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0023] The present invention provides a technical solution: a mask operation method of an explicit independent mask register in a GPU, comprising the following steps:

[0025] S1: Each GPU hardware thread can access its own eight 128-bit-wide independent mask registers, denoted $m0 to $m7;

[0026] S2: The data in $m0 defaults to the execution mask ...
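The per-thread register set in step S1 can be modeled in software as follows. This is a sketch under stated assumptions: the class name `MaskRegisterFile`, its methods, and the zero reset value are hypothetical; only the register count, the 128-bit width, and the role of $m0 as the default execution mask come from the text.

```python
MASK_BITS = 128       # width of each mask register (from S1)
NUM_MASK_REGS = 8     # $m0 .. $m7 (from S1)
FULL_MASK = (1 << MASK_BITS) - 1


class MaskRegisterFile:
    """Software model of one hardware thread's mask registers $m0..$m7."""

    def __init__(self):
        # Assumption: registers reset to zero; the source does not say.
        self.regs = [0] * NUM_MASK_REGS

    def write(self, index, value):
        if not 0 <= index < NUM_MASK_REGS:
            raise IndexError("mask register index must be 0..7")
        # Truncate to 128 bits, as a fixed-width register would.
        self.regs[index] = value & FULL_MASK

    def read(self, index):
        if not 0 <= index < NUM_MASK_REGS:
            raise IndexError("mask register index must be 0..7")
        return self.regs[index]

    def exec_mask(self):
        # $m0 holds the default execution mask (from S2).
        return self.regs[0]
```

Each hardware thread would own a separate instance, so writing a mask in one thread does not affect another.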


Abstract

The invention belongs to the technical field of image processing units, and in particular relates to a mask operation method of an explicit independent mask register in a GPU (Graphics Processing Unit), which comprises the following steps: S1, each GPU hardware thread can access its own eight 128-bit-wide independent mask registers, denoted m0-m7. With the mask operation instructions for the explicit independent mask registers, each hardware thread in the GPU can access its own eight 128-bit-wide independent mask registers, and users can use four groups of mask operation instructions to perform, respectively, reduction operations, extension operations, and logic operations on the mask registers, and data movement between the mask registers and the general-purpose vector registers. The instructions can generate the branch masks needed for conditional control and, at the same time, evaluate the mask value, so that the instruction issue process in the programmable core is optimized, invalid operand reads and pipeline operations are avoided, and the power consumption of the programmable core is reduced.
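The four instruction groups named in the abstract can be sketched as plain functions on 128-bit integers. The text available here gives no mnemonics or encodings, so every function name below is hypothetical. Reading "extension" as broadcasting one bit to all lanes, modeling data movement as four 32-bit words, and skipping issue when the mask is all zero are assumptions, not the patent's stated behavior.

```python
MASK_BITS = 128
FULL_MASK = (1 << MASK_BITS) - 1


# Group 1, reduction: collapse a mask to a scalar.
def mask_any(m):
    return (m & FULL_MASK) != 0


def mask_popcount(m):
    return bin(m & FULL_MASK).count("1")


# Group 2, extension (assumed meaning): broadcast one bit to all lanes.
def mask_expand(bit):
    return FULL_MASK if bit else 0


# Group 3, logic: combine masks bitwise.
def mask_and(a, b):
    return a & b & FULL_MASK


def mask_or(a, b):
    return (a | b) & FULL_MASK


def mask_not(a):
    return ~a & FULL_MASK


# Group 4, data movement (assumed layout): mask <-> four 32-bit words.
def mask_to_words(m):
    return [(m >> (32 * i)) & 0xFFFFFFFF for i in range(MASK_BITS // 32)]


def words_to_mask(words):
    m = 0
    for i, w in enumerate(words):
        m |= (w & 0xFFFFFFFF) << (32 * i)
    return m & FULL_MASK


# Assumed issue rule: an instruction whose mask is all zero is skipped.
def should_issue(exec_mask):
    return mask_any(exec_mask)


# Nested condition: lanes enabled by the outer branch, split by the inner one.
outer = 0b1111
inner = 0b0101
then_mask = mask_and(outer, inner)
else_mask = mask_and(outer, mask_not(inner))
```

With several explicit registers, `then_mask` and `else_mask` could stay in separate mask registers instead of being saved to and restored from general-purpose registers around a single implicit mask.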

Description

Technical field

[0001] The invention relates to the technical field of image processing units, in particular to a mask operation method of an explicit independent mask register in a GPU.

Background

[0002] Modern graphics processors contain many programmable processor cores for executing shader code, along with graphics-specific hardware acceleration units. Initially, these processor cores were used to execute shader code in graphics applications; it was later found that they can also handle non-graphics, compute-intensive applications very well, and they developed into general-purpose graphics processors.

[0003] In graphics applications, the GPU has to process a huge number of vertices and fragments. It is impossible to write code for each of these objects individually. Therefore, the GPU programming model defines shaders for processing vertices and shaders for processing fragments. Users can describe the algorithms for processing vertices and fragments in a shader. A s...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/30
CPC: G06F9/30018, G06F9/30036, G06F9/30105, G06F9/3887, G06F9/3851, G06F9/38
Inventors: 殷诚信, 王磊
Owner: HUAXIA GENERAL PROCESSOR TECH INC