Deep learning calculation method and device, chip and medium

A deep learning computing method, applied in the field of computer technology. It addresses the problem that existing chips and computing frameworks do not support on-chip distributed deep learning computing and therefore cannot fully utilize the computing performance of the chip, and it improves processing efficiency.

Active Publication Date: 2021-08-31
上海燧原科技有限公司

AI Technical Summary

Problems solved by technology

[0003] Currently, existing ASIC (Application Specific Integrated Circuit) chips and computing frameworks (such as TensorFlow or PyTorch) do not support on-chip distributed deep learning computing, including training and inference, and therefore cannot fully utilize the computing performance of the chip.

Examples

Embodiment 1

[0028] Figure 1 is a flow chart of the deep learning calculation method provided by Embodiment 1 of the present invention. This embodiment is applicable to implementing distributed computing on a chip. The method can be executed by the deep learning computing device provided by the embodiments of the present invention; the device can be implemented in software and/or hardware and can generally be integrated in a chip.

[0029] As shown in Figure 1, the deep learning calculation method provided in this embodiment is applied to a chip and includes:

[0030] S110. Obtain an initial calculation graph.

[0031] The core of a machine learning task is the definition of the model and the method used to solve for the model's parameters. Once these two are abstracted, a unique calculation logic is determined; representing this logic as a graph yields what is called a calculation graph. Calculation graphs are directed acyclic graphs, which define the flow of...
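
As an illustration of a calculation graph as a directed acyclic graph, the following minimal sketch evaluates a toy graph in dependency order. The node names, the dictionary layout, and the `evaluate` helper are assumptions made for this example and are not taken from the patent.

```python
from graphlib import TopologicalSorter
import operator

# Hypothetical toy graph: node name -> (operation, names of input nodes).
# Nodes with operation None are data inputs whose values are fed in.
graph = {
    "a": (None, []),
    "b": (None, []),
    "c": (None, []),
    "sum": (operator.add, ["a", "b"]),
    "out": (operator.mul, ["sum", "c"]),
}

def evaluate(graph, feeds):
    """Evaluate every node so that each node runs after its inputs."""
    # TopologicalSorter expects a mapping of node -> predecessors and
    # raises graphlib.CycleError if the graph is not acyclic.
    predecessors = {name: inputs for name, (_, inputs) in graph.items()}
    values = dict(feeds)
    for name in TopologicalSorter(predecessors).static_order():
        op, inputs = graph[name]
        if op is not None:
            values[name] = op(*(values[i] for i in inputs))
    return values

result = evaluate(graph, {"a": 2, "b": 3, "c": 4})
print(result["out"])  # (2 + 3) * 4
```

Because the graph is acyclic, a topological order always exists, which is what lets a framework schedule each operation only after its inputs are ready.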

Embodiment 2

[0049] Figure 2 is a flow chart of the deep learning calculation method provided by Embodiment 2 of the present invention. This embodiment refines the foregoing embodiments; specifically, generating a reconstructed calculation graph based on the initial calculation graph can be:

[0050] Determining the data input nodes, calculation node groups, and trainable variable nodes in the initial calculation graph;

[0051] Adjusting the input subgraph structure corresponding to the data input nodes and the variable subgraph structure corresponding to the trainable variable nodes, copying the calculation subgraph structure corresponding to the calculation node group, and adding a calculation result summary subgraph structure to obtain the reconstructed calculation graph;

[0052] Wherein the execution devices corresponding to the calculation subgraph structure in the initial calculation graph and to the copied calculation subgraph structure are different computation ...
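
The reconstruction described in [0050] to [0052] can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the `Node` class, the `reconstruct` function, the `/copyN` naming, and the `cluster:N` device labels are all hypothetical. How the input and variable subgraphs are adjusted is not spelled out in this excerpt, so the sketch simply leaves those nodes shared by all copies.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str                        # "input", "variable", "compute", or "summary"
    inputs: list = field(default_factory=list)
    device: str = "host"             # hypothetical execution-device label

def reconstruct(nodes, num_clusters):
    """Keep the original compute subgraph on cluster 0, copy it onto the
    remaining clusters, and add one summary node that gathers the results."""
    compute_names = {n.name for n in nodes if n.kind == "compute"}
    # Outputs of the compute group: compute nodes no other compute node consumes.
    consumed = {i for n in nodes if n.kind == "compute" for i in n.inputs}
    outputs = sorted(compute_names - consumed)

    # Assumption: input and variable nodes are kept once and shared.
    rebuilt = [n for n in nodes if n.kind != "compute"]
    summary_inputs = []
    for c in range(num_clusters):
        def rename(name, c=c):
            # Only compute nodes are copied; the original keeps its name.
            if name in compute_names and c > 0:
                return f"{name}/copy{c}"
            return name
        for n in nodes:
            if n.kind == "compute":
                rebuilt.append(Node(rename(n.name), "compute",
                                    [rename(i) for i in n.inputs],
                                    device=f"cluster:{c}"))
        summary_inputs += [rename(o) for o in outputs]
    rebuilt.append(Node("summary", "summary", summary_inputs))
    return rebuilt

initial = [
    Node("x", "input"),
    Node("w", "variable"),
    Node("matmul", "compute", ["x", "w"]),
    Node("relu", "compute", ["matmul"]),
]
new_graph = reconstruct(initial, num_clusters=2)
for n in new_graph:
    print(n.name, n.kind, n.inputs, n.device)
```

In this sketch the original and the copied calculation subgraphs carry different device labels, mirroring the requirement that they execute on different computing clusters, while the summary node stands in for the calculation result summary subgraph.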

Embodiment 3

[0096] Figure 5 is a flow chart of the deep learning calculation method provided by Embodiment 3 of the present invention. This embodiment provides a specific implementation on the basis of the foregoing embodiments.

[0097] As shown in Figure 5, the deep learning calculation method provided in this embodiment is applied to a chip and includes:

[0098] S310. Obtain an initial calculation graph.

[0099] S320. Determine whether the initial calculation graph and the chip hardware structure meet the on-chip distributed computing condition, if yes, execute S330, and if not, execute S360.

[0100] After the chip obtains the initial calculation graph, if it determines that the chip driver is in the on-chip distributed multi-computing-cluster mode, that the deep learning calculation type of the initial calculation graph is training or inference, and that the computing node deployment belongs to heterogeneous computing, then determine the initial calculation graph an...
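
The check in S320 can be sketched as a simple predicate over the three conditions listed in [0100]. This is a minimal sketch that assumes the truncated sentence concludes that the on-chip distributed computing condition is met; the `ChipState` and `GraphInfo` types, their field names, and the `"multi_cluster"` mode label are hypothetical, and the contents of S330 and S360 are not shown in this excerpt.

```python
from dataclasses import dataclass

@dataclass
class ChipState:
    driver_mode: str        # hypothetical label, e.g. "multi_cluster"

@dataclass
class GraphInfo:
    task_type: str          # e.g. "training" or "inference"
    heterogeneous: bool     # whether node deployment is heterogeneous computing

def meets_on_chip_distributed_condition(chip, graph):
    """Return True only if all three conditions from [0100] hold."""
    return (
        chip.driver_mode == "multi_cluster"
        and graph.task_type in {"training", "inference"}
        and graph.heterogeneous
    )

def next_step(chip, graph):
    # Per S320: continue with S330 if the condition is met, otherwise S360.
    return "S330" if meets_on_chip_distributed_condition(chip, graph) else "S360"

print(next_step(ChipState("multi_cluster"), GraphInfo("training", True)))
print(next_step(ChipState("single_cluster"), GraphInfo("training", True)))
```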

Abstract

The embodiments of the invention disclose a deep learning calculation method and device, a chip, and a medium. The method comprises: obtaining an initial calculation graph; generating a reconstructed calculation graph from the initial calculation graph, wherein the reconstructed calculation graph comprises a plurality of calculation node groups and the execution devices corresponding to different calculation node groups are different calculation clusters in the chip; and processing the reconstructed calculation graph through the plurality of calculation clusters in the chip. This technical scheme achieves distributed calculation within the chip, makes full use of the calculation and storage performance of all calculation clusters in the chip, and improves the chip's efficiency in processing the initial calculation graph.

Description

Technical field

[0001] Embodiments of the present invention relate to the field of computer technology, and in particular to a deep learning computing method, device, chip, and medium.

Background

[0002] With the development of deep learning, deep learning models can be trained or used for inference on multiple computing devices, realizing distributed deep learning computing among computing devices.

[0003] Currently, existing ASIC (Application Specific Integrated Circuit) chips and computing frameworks (such as TensorFlow or PyTorch) do not support on-chip distributed deep learning computing, including training and inference, and cannot give full play to the computing performance of the chip.

Summary of the invention

[0004] Embodiments of the present invention provide a deep learning computing method, device, chip, and medium, so as to realize distributed deep learning computing in the chip and make full use of the computing performance and storage performance of th...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/50, G06N20/00
CPC: G06F9/5027, G06N20/00, Y02D10/00
Inventors: 方智毅, 丁圣阁, 贾明桥, 程伟, 王皓, 陶芝伟
Owner: 上海燧原科技有限公司