
Deployment method and system for an inference server supporting multiple models and multiple chips, and electronic device

An inference-server and multi-chip technology in the computer field, addressing the problems that existing inference servers support only limited model types and accelerated processors and are not easy to deploy on other accelerator chips, and achieving the effect of improved operating performance and good scalability.

Pending Publication Date: 2022-01-28
INSPUR SUZHOU INTELLIGENT TECH CO LTD


Problems solved by technology

The emergence of these inference servers makes it easier to deploy models in the production environment, but they have two major defects: first, they support only a very limited set of model types; second, they support only a limited set of accelerated processors, usually CPUs and GPUs, and deploying other accelerator chips is not easy. For example, the TensorFlow Serving inference server supports only CPU, GPU, and TPU, and the Triton inference server supports only CPU and GPU.



Examples


Embodiment 1

[0034] Figure 1 is a flowchart of a method for deploying an inference server supporting multiple models and multiple chips in Embodiment 1 of the present invention. This embodiment is applicable where the inference service framework is deployed in a unified way, and the method can be executed by a deployment system for an inference server supporting multiple models and multiple chips. The system can be implemented in software and/or hardware and integrated in a server, and the inference server can be configured in that server.

[0035] For convenience of description and understanding, the deployment process of the present invention is described in detail below taking the Triton inference server as an example. The method is not limited to the Triton inference server, however; inference servers such as TensorFlow Serving or TorchServe can also be used.

[0036] Triton Inference Server is an open source inference framework...
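The patent text does not include source for the plug-in, but the shape of such a TVM back-end can be sketched with Triton's Python backend, which accepts any model implemented as a `TritonPythonModel` class. This is a minimal sketch, not the patent's implementation: the tensor names (INPUT0/OUTPUT0, "data"), shapes, and module path are illustrative assumptions, and the `model.so` artifact is produced by a compile step sketched under Embodiment 2 below.

```python
# model.py -- a minimal sketch of a Triton Python-backend model that
# delegates inference to a TVM-compiled module (illustrative only).
import tvm
from tvm.contrib import graph_executor
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Load the ahead-of-time TVM-compiled artifact and wrap it in a
        # graph executor bound to the target device (CPU here; swap in
        # tvm.cuda(0) or another TVM device for other accelerator chips).
        dev = tvm.cpu(0)
        lib = tvm.runtime.load_module("/models/resnet_tvm/1/model.so")
        self.module = graph_executor.GraphModule(lib["default"](dev))

    def execute(self, requests):
        responses = []
        for request in requests:
            # Copy the request tensor into TVM, run the graph, and wrap
            # the result back into a Triton response.
            inp = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            self.module.set_input("data", tvm.nd.array(inp.as_numpy()))
            self.module.run()
            out = self.module.get_output(0).numpy()
            responses.append(pb_utils.InferenceResponse(
                output_tensors=[pb_utils.Tensor("OUTPUT0", out)]))
        return responses
```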

Embodiment 2

[0079] Figure 2 is a schematic structural diagram of a deployment system for an inference server supporting multiple models and multiple chips in Embodiment 2 of the present application. As shown in Figure 2, the deployment system includes a back-end service plug-in generation unit, a plug-in access unit, and an execution unit.

[0080] The back-end service plug-in generation unit is used to generate a TVM compiler back-end service plug-in that conforms to the format specified for back-end access by the inference server;
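As a concrete illustration of this generation step, a model can be compiled with TVM ahead of time and exported as the shared library the plug-in loads. The ONNX file name, input name and shape, and target string below are assumptions for the sketch, not details from the patent:

```python
# Sketch of the plug-in generation step: compile a model with TVM and
# export a shared library for the back-end service plug-in to load.
import onnx
import tvm
from tvm import relay

onnx_model = onnx.load("resnet50.onnx")    # hypothetical model file
shape_dict = {"data": (1, 3, 224, 224)}    # assumed input name and shape
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# The target string is where multi-chip support enters: "llvm" for CPU,
# "cuda" for NVIDIA GPUs, or a vendor-provided TVM target for a DSP.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

lib.export_library("model.so")  # artifact consumed by the plug-in
```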

[0081] The plug-in access unit is configured to connect the back-end service plug-in to the inference server;
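For Triton specifically, "connecting" a model to the server means placing it in a model repository whose layout and config.pbtxt format the server prescribes. A minimal illustration, with hypothetical names and shapes matching the sketches above:

```
model_repository/
└── resnet_tvm/
    ├── config.pbtxt
    └── 1/
        ├── model.py      # the Python-backend plug-in sketch above
        └── model.so      # the TVM-compiled artifact

# config.pbtxt
name: "resnet_tvm"
backend: "python"
max_batch_size: 0
input  [ { name: "INPUT0",  data_type: TYPE_FP32, dims: [ 1, 3, 224, 224 ] } ]
output [ { name: "OUTPUT0", data_type: TYPE_FP32, dims: [ 1, 1000 ] } ]
```

The server is then started against the repository (e.g. `tritonserver --model-repository=/models`); this is Triton's standard loading path rather than anything specific to this patent.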

[0082] The execution unit is used to call the TVM compiler through the back-end service plug-in to perform inference operations on the specified accelerator chip according to the client's inference request. When a DSP loaded with TVM is running, the DSP chip is used for inference, ...
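The execution unit's per-chip dispatch can be pictured as choosing the TVM runtime device that matches how the module was compiled. The mapping below uses TVM's real device constructors; a DSP entry would come from the vendor's TVM port, which is vendor-specific and not shown, and the chip-name keys are assumptions:

```python
# Sketch of device dispatch in the execution unit: the same graph-executor
# pattern serves different accelerator chips once a matching TVM device
# is selected.
import tvm


def device_for(chip: str, index: int = 0) -> tvm.runtime.Device:
    table = {
        "cpu": tvm.cpu,        # host CPU
        "gpu": tvm.cuda,       # NVIDIA GPU via the CUDA runtime
        "opencl": tvm.opencl,  # OpenCL-capable accelerators
    }
    return table[chip](index)


# e.g. dev = device_for("gpu"); module = graph_executor.GraphModule(lib["default"](dev))
```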

Embodiment 3

[0085] According to the embodiments disclosed in the present application, the present application also provides an electronic device, a readable storage medium, and a computer program product.

[0086] Figure 3 shows a schematic block diagram of an example electronic device 700 that may be used to implement embodiments disclosed herein. Electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

[0087] Such ...



Abstract

The invention belongs to the technical field of computers and discloses a deployment method and system for an inference server supporting multiple models and multiple chips. The method comprises: customizing a back-end service plug-in for the TVM compiler, whose file format meets the file format specified for back-end access by the inference server; connecting the back-end service plug-in to the inference server; and, upon the inference server receiving an inference request from a client, calling the TVM compiler through the back-end service plug-in to perform the inference operation on a specified accelerator chip. The method realizes rapid deployment of different types of models on different accelerator chips within the same inference framework.

Description

Technical Field

[0001] The invention belongs to the field of computer technology, and in particular relates to a deployment method, system, storage medium, and electronic device for an inference server supporting multiple models and multiple chips.

Background Technique

[0002] With the rise of artificial intelligence technology, various models have reached practical usefulness, and how to deploy them in the production environment has become a problem that plagues technicians. To facilitate model deployment, inference servers such as TensorFlow Serving and Triton have appeared on the market. The emergence of these inference servers makes it easier to deploy models in production, but they have two major defects: first, they support only a very limited set of model types; second, they support only a limited set of accelerated processors, usually CPUs and GPUs, and deploying other accelerator chips is not easy. For example, the TensorFlow Serving inference server supports only CPU, GPU, and TPU, and the Triton inference server supports only CPU and GPU...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N5/04, G06F8/41, G06F9/445, G06F8/60
CPC: G06N5/04, G06F9/44505, G06F8/41, G06F8/60
Inventor: 李柏宏
Owner: INSPUR SUZHOU INTELLIGENT TECH CO LTD