Hardware accelerator capable of configuring sparse attention mechanism

A hardware accelerator technology for the attention mechanism, applicable to digital computer components, instruments, and computers. It addresses the lack of hardware acceleration capability in prior work, and achieves the effects of maintained accuracy, low chip area, and high throughput.

Pending Publication Date: 2022-01-07
PEKING UNIV

AI Technical Summary

Problems solved by technology

However, that work only designed a few dot-product operators at the hardware structure level, and therefore did not provide strong hardware acceleration capability.




Embodiment Construction

[0040] The present invention is further described below through embodiments in conjunction with the accompanying drawings, without limiting the scope of the invention in any way. The present invention provides a hardware accelerator with a configurable sparse attention mechanism, comprising a sampled dense matrix multiplication module, a mask segmentation and packaging module, and a configurable sparse matrix multiplication module. In natural language processing and computer vision tasks, when performing inference with an attention-based artificial neural network containing a Transformer structure, the three input matrices Q, K, and V can be sent to the hardware accelerator provided by the present invention, and the accelerator's output matrix received, thereby improving operation speed.

[0041] In natural language processing, the Q and K matrices are the encoding matrices of the words in the text, and V is the encoding ma...
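For reference, the standard scaled dot-product attention from [1] that the accelerator takes Q, K, and V as inputs for can be sketched in NumPy. The function name and toy matrix sizes below are illustrative, not from the patent:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Dense attention: softmax(Q @ K.T / sqrt(d)) @ V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # n x n score matrix
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # output matrix

# Toy example: 4 tokens, each with an 8-dimensional encoding
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 4, 8))  # unpacks into three 4x8 matrices
out = scaled_dot_product_attention(Q, K, V)
```

The n-by-n score matrix Q K^T is the quadratic-cost part of this computation; sparse attention accelerators such as the one described here aim to skip most of its entries.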



Abstract

The invention discloses a hardware accelerator with a configurable sparse attention mechanism. The hardware accelerator comprises a sampled dense matrix multiplication module, a mask block packaging module, and a configurable sparse matrix multiplication module. The sampled dense matrix multiplication module adopts a systolic-array hardware structure; the mask block packaging module comprises a column-number counter, a row-activation-unit counter, and a buffer; the configurable sparse matrix multiplication module comprises configurable arithmetic units (PEs), a register array, and a divider, with the arithmetic units decoupled from the register array. The method efficiently and dynamically determines the sparsity pattern of the score matrix according to the characteristics of the input matrices, maintains high throughput even under high sparsity, and efficiently and dynamically accelerates the computation of the sparse attention mechanism.
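The three-stage dataflow in the abstract (sampled score computation, mask generation, sparse matrix multiplication) can be illustrated with a minimal functional sketch. This excerpt does not specify how the sparsity pattern is derived from the scores, so the sketch assumes a simple per-row top-k selection; the function and parameter names are hypothetical:

```python
import numpy as np

def sparse_attention(Q, K, V, keep_ratio=0.25):
    """Functional sketch of the three-stage dataflow:
    1) dense matmul producing the score matrix (the accelerator samples this),
    2) mask generation keeping the top-scoring entries per row,
    3) softmax and weighted sum restricted to the masked positions.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # stage 1: score matrix
    k = max(1, int(keep_ratio * scores.shape[-1]))
    # Stage 2: boolean mask marking the k largest scores in each row
    idx = np.argpartition(scores, -k, axis=-1)[:, -k:]
    mask = np.zeros_like(scores, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    # Stage 3: softmax over unmasked entries only (masked entries -> weight 0)
    masked = np.where(mask, scores, -np.inf)
    masked -= masked.max(axis=-1, keepdims=True)
    w = np.exp(masked)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V, mask

rng = np.random.default_rng(1)
Q, K, V = rng.standard_normal((3, 8, 16))  # 8 tokens, 16-dim encodings
out, mask = sparse_attention(Q, K, V, keep_ratio=0.25)
```

In hardware, stage 1 would use the sampled dense (systolic-array) module, stage 2 the mask block packaging module, and stage 3 the configurable sparse matrix multiplication module, which only computes the positions the mask retains.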

Description

Technical Field

[0001] The invention relates to a hardware accelerator for artificial intelligence applications, and in particular to a hardware accelerator with a configurable sparse attention mechanism: a multi-stage systolic-array hardware accelerator that can be configured for the sparse attention mechanism.

Background Technique

[0002] Artificial neural networks based on the attention mechanism have played an important role in machine learning in recent years. Literature [1] (A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 6000–6010.) introduced the attention mechanism. Building on it, the Transformer structure uses the attention mechanism as its basic component and performs excellently in various artificial intelligence tasks, such as language models in the field of natural language processi...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F30/331; G06F17/16; G06F15/78; G06F7/523
CPC: G06F30/331; G06F15/7871; G06F17/16; G06F7/523
Inventors: 梁云, 卢丽强, 罗梓璋, 金奕成
Owner: PEKING UNIV