Hardware architecture for accelerating an artificial intelligence processor

A hardware architecture technology, applied in the field of artificial intelligence, that addresses problems such as the inapplicability of existing solutions, achieves high performance, improves scalability, and accelerates artificial intelligence work.

Inactive Publication Date: 2019-01-11
NANJING ILUVATAR COREX TECH CO LTD (DBA ILUVATAR COREX INC NANJING)

AI Technical Summary

Problems solved by technology

As for CPU and DSP solutions, the core of their computer is a...


Embodiment Construction

[0023] The present invention is now described in further detail with reference to the accompanying drawings.

[0024] As shown in Figure 1, an artificial intelligence feature map can usually be described as a four-dimensional tensor [N, C, Y, X]. The four dimensions are the feature map dimensions X and Y, the channel dimension C, and the batch dimension N. A kernel can be a 4D tensor [K, C, S, R]. The AI job is, given the input feature map tensor and the kernel tensor, to compute the output tensor [N, K, Y, X] according to the formula in Figure 1.
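The formula of Figure 1 is not reproduced in this text, so the following is only a minimal sketch of what such a job might compute, assuming a standard direct convolution with stride 1 and zero padding (so that the output keeps the input's Y and X extents). The function name `conv2d_nchw` and the padding choice are illustrative assumptions, not taken from the patent.

```python
def conv2d_nchw(feature, kernel):
    """Direct convolution over nested lists.

    feature: [N][C][Y][X], kernel: [K][C][S][R] -> output: [N][K][Y][X].
    Assumption (not from the source): stride 1, zero padding of S//2 and R//2.
    """
    N, C = len(feature), len(feature[0])
    Y, X = len(feature[0][0]), len(feature[0][0][0])
    K, S, R = len(kernel), len(kernel[0][0]), len(kernel[0][0][0])
    assert len(kernel[0]) == C, "kernel and feature map must share the channel dimension C"
    pad_y, pad_x = S // 2, R // 2
    out = [[[[0.0] * X for _ in range(Y)] for _ in range(K)] for _ in range(N)]
    for n in range(N):
        for k in range(K):
            for y in range(Y):
                for x in range(X):
                    acc = 0.0
                    # Reduce over input channels and the S x R kernel window.
                    for c in range(C):
                        for s in range(S):
                            for r in range(R):
                                iy, ix = y + s - pad_y, x + r - pad_x
                                if 0 <= iy < Y and 0 <= ix < X:  # outside = zero padding
                                    acc += feature[n][c][iy][ix] * kernel[k][c][s][r]
                    out[n][k][y][x] = acc
    return out


# One 3x3 single-channel feature map and one 3x3 all-ones kernel.
feature = [[[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]]   # [N=1][C=1][Y=3][X=3]
kernel = [[[[1, 1, 1], [1, 1, 1], [1, 1, 1]]]]    # [K=1][C=1][S=3][R=3]
result = conv2d_nchw(feature, kernel)
print(result[0][0][1][1])  # centre sees all nine inputs: 45.0
```

The seven nested loops make the available parallelism visible: every (n, k, y, x) output element is independent, while the c, s, r loops form a reduction.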

[0025] Another important operation in AI is matrix multiplication, which can also be mapped to feature map processing. In Figure 2, matrix A can be mapped to tensor [1, K, 1, M], matrix B can be mapped to tensor [N, K, 1, 1], and the result C is tensor [1, N, 1, M].
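The mapping can be illustrated with a short sketch that treats the matrix product as a convolution with a 1 x 1 kernel. The text does not state the orientation of the matrices, so the code assumes A is M x K, B is K x N, and C is M x N; the function name `matmul_as_1x1_conv` is hypothetical.

```python
def matmul_as_1x1_conv(A, B):
    """Compute C = A @ B by laying the operands out as feature map and kernel.

    Assumed shapes (not stated in the source): A is M x K, B is K x N.
    """
    M, K = len(A), len(A[0])
    N = len(B[0])
    assert len(B) == K, "inner dimensions must match"

    # A -> feature map [1][K][1][M]: K channels, one row, M columns.
    feature = [[[[A[m][k] for m in range(M)]] for k in range(K)]]
    # B -> kernel [N][K][1][1]: N output channels, K input channels, 1x1 window.
    kernel = [[[[B[k][n]]] for k in range(K)] for n in range(N)]

    # Output tensor [1][N][1][M]; a 1x1 convolution only reduces over channels.
    out = [[[[0.0] * M] for _ in range(N)]]
    for n in range(N):
        for m in range(M):
            out[0][n][0][m] = sum(
                feature[0][k][0][m] * kernel[n][k][0][0] for k in range(K)
            )

    # Read the result back as an M x N matrix.
    return [[out[0][n][0][m] for n in range(N)] for m in range(M)]


A = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
B = [[7, 8], [9, 10]]          # 2 x 2
C = matmul_as_1x1_conv(A, B)
print(C)  # [[25, 28], [57, 64], [89, 100]]
```

Under this layout the shared dimension K plays the role of the channel dimension, which is why hardware built for feature map processing can also serve matrix multiplication.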

[0026] In addition, there are other operations, such as normalization and activation, which can be supported in general-purpose hardware operators.

[0027] We propose a hardwar...

Abstract

A hardware architecture for accelerating an artificial intelligence processor includes a host, a frontal lobe engine, a parietal lobe engine, a renderer engine, an occipital lobe engine, a temporal lobe engine, and a memory. The frontal lobe engine obtains a 5D tensor from the host, divides it into several groups of tensors, and sends these groups to the parietal lobe engine. The parietal lobe engine takes a group of tensors, divides it into a plurality of tensor waves, and sends the tensor waves to the renderer engine, which executes an input feature renderer and outputs partial tensors to the occipital lobe engine. The occipital lobe engine accumulates the partial tensors and executes an output feature renderer to obtain a final tensor, which is sent to the temporal lobe engine. The temporal lobe engine compresses the data and writes the final tensor to memory. In the invention, artificial intelligence work is divided into a plurality of highly parallel parts, and the parts are allocated to engines for processing. The number of engines is configurable, which improves scalability, and all work partitioning and distribution are implemented in the architecture, thereby obtaining high performance efficiency.
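The partition-and-accumulate flow described in the abstract can be pictured with a small software sketch. The abstract does not say along which dimensions the tensor is split, so this example assumes, purely for illustration, that groups are slices of the batch and waves are slices of the channel range. The names `chunks`, `run_pipeline`, `group_size`, and `wave_size` are hypothetical, and the final compression and memory write are omitted.

```python
def chunks(seq, size):
    """Split a sequence into consecutive pieces of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def run_pipeline(batch, weights, group_size, wave_size):
    """Weighted channel sum per sample, computed in groups and waves.

    Assumptions (not from the source): groups split the batch, waves split
    the channel range, and each wave yields one partial sum.
    """
    results = []
    # First stage: divide the incoming work into groups.
    for group in chunks(batch, group_size):
        for sample in group:
            # Second stage: divide one group's work into waves.
            waves = chunks(list(zip(sample, weights)), wave_size)
            acc = 0.0
            for wave in waves:
                # Compute stage: each wave produces a partial result.
                partial = sum(x * w for x, w in wave)
                # Accumulation stage: partial results are summed.
                acc += partial
            results.append(acc)
    return results


batch = [[1, 2, 3, 4], [5, 6, 7, 8]]   # two samples, four channels each
weights = [1, 0, -1, 2]
outputs = run_pipeline(batch, weights, group_size=1, wave_size=2)
print(outputs)  # [6.0, 14.0]
```

Because the partial sums are independent until accumulation, changing `group_size` or `wave_size` changes how the work is divided but not the result, which is the property that lets the number of engines stay configurable.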

Description

Technical field

[0001] The invention belongs to the field of artificial intelligence, and in particular relates to a hardware architecture for accelerating an artificial intelligence processor.

Background technique

[0002] Artificial intelligence (AI) processing, a hot topic these days, is both compute- and memory-intensive and requires high performance-power efficiency. Accelerating it with current devices such as CPUs and GPUs is not easy, and many solutions such as GPU + TensorCore, TPU, CPU + FPGA, and AI ASIC try to solve these problems. GPU + TensorCore mainly focuses on compute-intensive problems, TPU focuses on computation and data reuse, and CPU + FPGA / AI ASIC focuses on improving performance-power efficiency.

[0003] However, only one-third of the logic of the GPU is used for AI, so higher performance efficiency cannot be obtained. TPUs require more software work to reshape the data layout, split up jobs, and send them to the computing cores. As for CPU and D...

Application Information

IPC(8): G06T1/20
CPC: G06T1/20, G06N3/063, G06F13/124, G06N3/048, G06F9/5027, G06F13/00, G06N3/04
Inventor: 李云鹏, 倪岭, 邵平平, 刘伟栋, 蔡敏
Owner NANJING ILUVATAR COREX TECH CO LTD (DBA ILUVATAR COREX INC NANJING)