Quantized model deployment method and system, storage medium, and equipment

A quantized-model deployment technology, applied in the server field, that addresses problems such as slow inference speed and the lack of support for low-bit model inference deployment, with the effect of improving inference speed and avoiding bottlenecks in data transmission.

Pending Publication Date: 2022-01-04
INSPUR SUZHOU INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

[0007] In view of this, the object of the present invention is to propose a method, system, storage medium and equipment for deploying a trained model, so as to solve problems in the prior art such as traditional inference frameworks not supporting inference deployment of low-bit models, slow inference speed, and bottlenecks caused by data transmission.

Examples

Detailed Description of the Embodiments

[0055] In order to make the object, technical solution and advantages of the present invention clearer, the embodiments of the present invention will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0056] It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities with the same name or different parameters. "First" and "second" are therefore used only for convenience of expression and should not be understood as limiting the embodiments of the present invention. Furthermore, the terms "comprising" and "having", as well as any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or elements may also include other steps or elements inherent in that process, method, system, product or device.

[0057] Based on the above purpo...

Abstract

The invention provides a quantized model deployment method and system, a storage medium, and equipment. The method comprises the following steps: performing classification neural network retraining on a model through a quantization-aware training module to obtain a pseudo-quantized model; reading the pseudo-quantized model into a deep learning compilation framework, which parses the pseudo-quantized model, identifies each of its nodes, and performs convolution calculation to obtain a quantized four-bit model; compiling a backend of the deep learning compilation framework so that the backend supports inference of the quantized four-bit model generated by the framework; and placing the quantized four-bit model in a model repository of a server and creating a configuration file for calling the quantized four-bit model. According to embodiments of the invention, the problem of incompatible model deployment is solved; at the same time, data transmission is reduced and the data transmission bottleneck is avoided. In addition, the invention also relates to a method for performing inference based on the quantized model.
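The abstract does not state which quantization scheme is used, so the following is only a minimal sketch of the general idea, assuming symmetric per-tensor quantization to signed four-bit integers. The function names (`fake_quantize`, `pack_int4`, `unpack_int4`) are hypothetical and do not come from the patent. The first function illustrates what a pseudo-quantized value looks like (quantize, then dequantize back to float); the packing functions illustrate why four-bit data can reduce transmission volume, since two values fit in one byte.

```python
from typing import List, Tuple

QMIN, QMAX = -8, 7  # value range of a signed four-bit integer


def fake_quantize(values: List[float]) -> Tuple[List[int], List[float], float]:
    """Quantize floats to signed 4-bit codes, then dequantize them again.

    Assumption: one symmetric scale per tensor, derived from the largest
    absolute value. Python's round() uses round-half-to-even.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    codes = [min(QMAX, max(QMIN, round(v / scale))) for v in values]
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale


def pack_int4(codes: List[int]) -> bytes:
    """Pack signed 4-bit codes two per byte, low nibble first."""
    nibbles = [c & 0xF for c in codes]  # two's-complement nibble
    if len(nibbles) % 2:
        nibbles.append(0)  # pad to an even number of nibbles
    return bytes(
        nibbles[i] | (nibbles[i + 1] << 4) for i in range(0, len(nibbles), 2)
    )


def unpack_int4(packed: bytes, count: int) -> List[int]:
    """Inverse of pack_int4: recover `count` signed 4-bit codes."""
    nibbles: List[int] = []
    for byte in packed:
        nibbles.extend((byte & 0xF, byte >> 4))
    # Nibbles 8..15 represent the negative values -8..-1.
    return [n - 16 if n >= 8 else n for n in nibbles[:count]]


if __name__ == "__main__":
    weights = [1.75, -1.0, 0.5, 0.0, -1.75]
    codes, approx, scale = fake_quantize(weights)
    packed = pack_int4(codes)
    print("codes:", codes, "scale:", scale)
    print("packed bytes:", len(packed), "for", len(codes), "values")
    print("round trip ok:", unpack_int4(packed, len(codes)) == codes)
```

In this sketch, five weights occupy three bytes once packed, compared with twenty bytes as 32-bit floats. A real deployment would perform the packing and the four-bit convolution inside the compiled backend rather than in Python.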

Description

Technical field

[0001] The present invention relates to the technical field of servers, in particular to a method, system, storage medium and equipment for quantized model deployment.

Background

[0002] As early as 2016, Google launched TensorFlow Serving, a serving framework for TensorFlow that exposes TensorFlow models as web services: it accepts request data from clients through network requests, computes forward inference results, and returns them. Triton's functionality is similar to that of TensorFlow Serving.

[0003] Model building and training are usually time-consuming and labor-intensive, and algorithm engineers need to do a lot of work to complete the building and training of a relatively complete model. The main purpose of a trained model is to solve practical problems more effectively, so deployment is a very important stage. Currently, however, the deployment of models is often problematic. For exampl...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/60, G06F8/71, G06F8/41, G06N3/08, G06N5/04
CPC: G06F8/60, G06F8/71, G06F8/41, G06N3/08, G06N5/041, H04L69/04
Inventor: 王曦辉
Owner: INSPUR SUZHOU INTELLIGENT TECH CO LTD