Data access method, device and system and AI accelerator

A data access method and device, applied in memory systems, machine execution devices, electrical digital data processing, and related fields. It addresses the problems that the number of multipliers cannot be increased indefinitely and that the overall performance of existing AI accelerators is too poor to meet user demand. The scheme saves clock cycles, reduces the difficulty of data read control, and improves overall performance.

AI Technical Summary

Problems solved by technology

Because chip area and power consumption are limited, the number of multipliers cannot be increased without limit. Making full use of a limited number of multipliers is therefore the key to improving chip performance.
[0004] However, the overall performance of existing AI accelerators is poor and cannot meet users' needs.

Detailed Description of Embodiments

[0022] At present, PE arrays are usually synchronous digital sequential circuits in which all elements are driven by the same clock. In the ideal working state, every multiplier performs an effective operation in each clock cycle, which requires that each multiplier be supplied with two valid input operands in every clock cycle.
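The ideal state described in [0022] can be expressed as a utilization figure. The following is a minimal sketch, not part of the patent text: the function name `multiplier_utilization` and the per-cycle trace format are assumptions made for illustration only.

```python
def multiplier_utilization(valid_pairs_per_cycle, num_multipliers):
    """Fraction of multiplier-cycles that performed an effective operation.

    valid_pairs_per_cycle: for each clock cycle, how many multipliers
    received two valid operands (hypothetical trace format).
    num_multipliers: number of multipliers in the PE array.
    """
    if num_multipliers <= 0:
        raise ValueError("num_multipliers must be positive")
    cycles = len(valid_pairs_per_cycle)
    if cycles == 0:
        return 0.0
    # A cycle cannot keep more multipliers busy than the array contains.
    busy = sum(min(v, num_multipliers) for v in valid_pairs_per_cycle)
    return busy / (cycles * num_multipliers)


ideal = multiplier_utilization([4, 4, 4], 4)    # every multiplier fed every cycle
starved = multiplier_utilization([4, 2, 0], 4)  # operands arrive late in cycles 2 and 3
```

A value of 1.0 corresponds to the ideal working state; anything lower means some multipliers sat idle because their operands were not ready.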

[0023] Usually, the input to a parallel computation in the algorithm is a set of multidimensional data whose volume is much larger than the number of multipliers. The input data must therefore be partitioned and assigned to each multiplier in the PE array according to a specific format and order.
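The partitioning described in [0023] might look like the sketch below. The excerpt does not specify the actual format or order, so this assumes the simplest case: two flat operand streams split into per-cycle batches, one operand pair per multiplier. The name `schedule_operands` is hypothetical.

```python
def schedule_operands(a, b, num_multipliers):
    """Split two equal-length operand streams into per-cycle batches.

    Each batch holds at most num_multipliers (a_i, b_i) pairs, i.e. the
    two inputs each multiplier needs in that clock cycle. The flat,
    in-order split is an illustrative assumption, not the patent's format.
    """
    if num_multipliers <= 0:
        raise ValueError("num_multipliers must be positive")
    if len(a) != len(b):
        raise ValueError("operand streams must have equal length")
    pairs = list(zip(a, b))
    return [pairs[i:i + num_multipliers]
            for i in range(0, len(pairs), num_multipliers)]


cycles = schedule_operands([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], 2)
```

With five operand pairs and two multipliers, the last cycle is only partly filled, which is one way the array falls short of the ideal state.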

[0024] At present, the following two schemes are usually used to allocate data for each multiplier in the PE array:

[0025] Scheme 1: Arrange the input data in advance according to the format and order required for parallel computing and store it in the chip's memory, so that the AI accelerator can easily read the data and distribute...

Abstract

The invention discloses a data access method, device and system, and an AI accelerator. The method comprises the following steps: reading the data to be calculated from a memory; writing the read data into preset cache block groups based on the number of multipliers in the processing array, so that the data at the same write address in each cache block group corresponds to different read addresses in the memory and no two cache block groups store the same data; and having the processing array read the data to be calculated from the preset cache block groups for parallel calculation. This scheme can improve the overall performance of the AI accelerator.
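One way to picture the write step in the abstract is a round-robin interleave. This is only a sketch under stated assumptions: the excerpt does not give the actual address mapping, so the rule `read_addr % num_groups`, the choice of one group per multiplier, and the names `fill_cache_block_groups` and `read_row` are all hypothetical.

```python
def fill_cache_block_groups(memory, num_groups):
    """Distribute memory words over cache block groups (assumed round-robin).

    Memory read address r goes to group r % num_groups and lands at
    write address r // num_groups inside that group. Each entry keeps
    (read_addr, value) so the mapping can be inspected.
    """
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    groups = [[] for _ in range(num_groups)]
    for read_addr, value in enumerate(memory):
        groups[read_addr % num_groups].append((read_addr, value))
    return groups


def read_row(groups, write_addr):
    """Read the same write address from every group, as one parallel fetch."""
    return [g[write_addr][1] for g in groups if write_addr < len(g)]


memory = list(range(100, 108))           # eight words at read addresses 0..7
groups = fill_cache_block_groups(memory, 4)
row1 = read_row(groups, 1)               # one word per group, same write address
```

Under this assumed mapping, the same write address in each group points back to a different memory read address, and the groups hold disjoint sets of addresses, so one access per group can feed the array in a single cycle.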

Description

Technical field

[0001] The present invention relates to the field of AI acceleration technology, in particular to a data access method, device, system, and AI accelerator.

Background

[0002] Artificial intelligence (AI) technology is increasingly applied in daily life, for example in face recognition, image segmentation, speech recognition, and speech synthesis. The development of AI technology is inseparable from progress in AI algorithms. Chips are the necessary carrier for implementing AI algorithms, and increasingly complex AI algorithms place ever higher performance demands on chips, chiefly in the form of greatly increased computing power. System-on-chip (SOC) designs based on traditional central processing units (CPUs) and graphics processing units (GPUs) have struggled to meet these algorithm requirements. Heterogeneous SOC...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/30, G06F12/0877, G06F7/523
CPC: G06F9/3004, G06F12/0877, G06F7/523, Y02D10/00
Inventor: 刘聪