
Universal CPU-oriented deep learning calculation acceleration method and system

A deep learning and CPU technology, applied in the computer field, that addresses problems such as the lack of support for CPU architectures other than ARM.

Pending Publication Date: 2021-07-30
北京睿芯高通量科技有限公司

AI Technical Summary

Problems solved by technology

However, this instruction set acceleration applies only to the ARM platform and does not support CPUs of other architectures, such as x86, so the system cannot be used on devices with x86 CPUs.



Examples


Embodiment 1

[0052] Figure 2 is a flowchart of a deep learning calculation acceleration method according to an embodiment of the present invention. As shown in Figure 2, this embodiment provides a general-purpose CPU-oriented deep learning calculation acceleration method, which includes the following steps:

[0053] Step 1: After the system is initialized, obtain the number of cores of the CPU and the instruction set supported by the CPU through assembly instructions;

[0054] In this embodiment, step 1 specifically comprises:

[0055] Step 11: When the system is initialized, first initialize the CPU architecture acquisition module;

[0056] Step 12: The CPU architecture acquisition module obtains the number of CPU cores and the instruction sets supported by the CPU through assembly instructions, and verifies the result.

[0057] Here, if the CPU architecture acquisition module of this embodiment determines that the CPU is a quad-core CPU of ...
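
Step 1 can be illustrated with a minimal sketch. Note the substitution: the embodiment queries the CPU through assembly instructions, whereas this sketch asks the operating system instead (`os.cpu_count()` and the Linux `/proc/cpuinfo` file), which is an assumption made for portability of the example. The names `KNOWN_SETS`, `parse_cpu_flags`, and `detect_cpu` are hypothetical and do not come from the patent.

```python
import os

# Hypothetical subset of instruction-set names to look for (not from the patent).
KNOWN_SETS = ("sse2", "sse4_2", "avx", "avx2", "avx512f", "neon", "asimd")


def parse_cpu_flags(cpuinfo_text):
    """Collect feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux lists features under "flags"; ARM Linux uses "Features".
        if sep and key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def detect_cpu(cpuinfo_path="/proc/cpuinfo"):
    """Return (core_count, supported_instruction_sets).

    Assumption: the OS is queried here; the patent uses assembly instructions.
    """
    cores = os.cpu_count() or 1  # fall back to 1 if the count is unknown
    try:
        with open(cpuinfo_path) as f:
            flags = parse_cpu_flags(f.read())
    except OSError:
        flags = set()  # file missing (non-Linux): report no known sets
    supported = [name for name in KNOWN_SETS if name in flags]
    # Simple verification in the spirit of step 12: the core count must be positive.
    if cores < 1:
        raise RuntimeError("invalid CPU core count")
    return cores, supported
```

Calling `detect_cpu()` on a Linux machine returns the logical core count and whichever of the listed instruction sets the kernel reports; on other systems the list is simply empty.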

Embodiment 2

[0082] Figure 3 is an architecture diagram of a deep learning calculation acceleration system according to an embodiment of the present invention. As shown in Figure 3, this embodiment provides a general-purpose CPU-oriented deep learning calculation acceleration system for implementing the method of Embodiment 1, which includes:

[0083] A CPU architecture acquirer (301), configured to acquire the CPU architecture;

[0084] An instruction set analyzer (302), connected to the CPU architecture acquirer (301), for sorting the instruction sets;

[0085] A model configuration pool (303), connected to the instruction set analyzer (302), for storing the configuration of each model;

[0086] A simulation reasoner (304), connected to the model configuration pool (303), for obtaining the optimal configuration of the input model through simulation reasoning;

[0087] A model reasoner (305), connected to the model configuration pool (303), for obtaining the opt...
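
How the model configuration pool (303) and the simulation reasoner (304) might interact can be sketched as follows. The text does not say how "simulation reasoning" selects the optimal configuration, so this sketch assumes one plausible approach: run a trial inference for each candidate in the global configuration, time it, and keep the fastest. The class `ModelConfigPool`, the function `simulate_best_config`, and the `run_trial` callback are all hypothetical names, and the model reasoner (305) is left out because its description is truncated.

```python
import time


class ModelConfigPool:
    """Holds the global configuration and one chosen configuration per model."""

    def __init__(self, global_config):
        self.global_config = list(global_config)  # e.g. sorted instruction sets
        self._per_model = {}

    def put(self, model_name, config):
        self._per_model[model_name] = config

    def get(self, model_name):
        return self._per_model.get(model_name)


def simulate_best_config(model_name, run_trial, pool, repeats=3,
                         clock=time.perf_counter):
    """Time run_trial(candidate) for each candidate and store the fastest.

    Assumption: "optimal" means lowest measured trial latency.
    """
    best, best_time = None, float("inf")
    for candidate in pool.global_config:
        elapsed = float("inf")
        for _ in range(repeats):
            start = clock()
            run_trial(candidate)  # one dummy inference with this candidate
            elapsed = min(elapsed, clock() - start)
        if elapsed < best_time:
            best, best_time = candidate, elapsed
    pool.put(model_name, best)
    return best
```

Taking the minimum over several repeats is a common way to reduce timing noise; the `clock` parameter exists only so the selection logic can be exercised with a deterministic clock.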


Abstract

The invention discloses a universal CPU-oriented deep learning calculation acceleration method and system. The method comprises the following steps: 1, after the system is initialized, obtaining the number of CPU cores and the instruction sets supported by the CPU through assembly instructions; 2, sorting the obtained instruction sets according to a ranking, built into a database, of the acceleration effects of different instruction sets, and generating a sorted list; 3, placing the list into a model configuration pool as a global configuration, inputting the model, obtaining the optimal configuration of the model, and sending the optimal configuration and the data to a model reasoning module; and 4, performing model reasoning in the model reasoning module and outputting the final reasoning result.
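
Step 2 of the abstract can be sketched in a few lines. The ranking table below stands in for the "database" of acceleration effects; its contents and ordering are illustrative assumptions, not values from the patent, and the names `ACCELERATION_RANK` and `sort_instruction_sets` are hypothetical.

```python
# Hypothetical built-in ranking: a lower number means a larger expected speed-up.
ACCELERATION_RANK = {
    "avx512f": 0,
    "avx2": 1,
    "avx": 2,
    "sse4_2": 3,
    "sse2": 4,
}


def sort_instruction_sets(detected, rank=ACCELERATION_RANK):
    """Return the detected instruction sets ordered by the built-in ranking.

    Names missing from the ranking are dropped, since their effect is unknown.
    """
    known = [name for name in detected if name in rank]
    return sorted(known, key=rank.__getitem__)
```

For example, `sort_instruction_sets(["sse2", "avx2", "fpu", "avx"])` yields `["avx2", "avx", "sse2"]`, which is the kind of sorted list that step 3 would place into the model configuration pool as the global configuration.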

Description

Technical field

[0001] The present invention relates to the field of computer technology, and in particular to a general-purpose CPU-oriented deep learning calculation acceleration method and system.

Background

[0002] In many application scenarios, the hardware available for deep learning inference is restricted (for example, only a general-purpose CPU (central processing unit) and no GPU (graphics processing unit)), yet the requirements for inference speed remain relatively high, as in face recognition and speech semantic recognition on mobile terminals, smoke alarms in the security field, and so on. In these fields, inference speed not only directly affects the effectiveness and user experience of the software, but also determines whether a product can reach a broader market. Therefore, how to optimize deep learning inference on general-purpose CPUs to obtain faster inference has become one of the hottest directions in the field of artificial intelligence. ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/50, G06F16/21
CPC: G06F9/5027, G06F16/211, G06F2209/5011, G06F2209/5018
Inventors: 琚午阳, 罗鑫
Owner: 北京睿芯高通量科技有限公司