Quantitative model deployment method and system, storage medium, and equipment

AI technical title (generated by the PatSnap AI team to summarize the technical points of the patent document):
A technology for deploying quantized models, applied in the server field, which addresses problems such as slow inference speed and the lack of support for inference deployment of low-bit models, thereby improving inference speed and avoiding data-transmission bottlenecks.

Pending Publication Date: 2022-01-04

INSPUR SUZHOU INTELLIGENT TECH CO LTD

Cites: 0; Cited by: 0


AI Technical Summary

Problems solved by technology

[0007] In view of this, the object of the present invention is to propose a method, system, storage medium and equipment for deploying a trained model, so as to solve the problems in the prior art that traditional inference frameworks do not support inference deployment of low-bit models, that inference is slow, and that data transmission causes bottlenecks.
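The data-transmission argument can be made concrete with simple arithmetic: a 4-bit weight occupies one eighth of the bytes of an FP32 weight. The sketch below packs signed 4-bit values two per byte and compares payload sizes. The nibble layout (low nibble first, two's-complement) and the helper names `pack_int4`, `unpack_int4` and `payload_bytes` are assumptions for illustration; the patent does not specify a storage format.

```python
from typing import List

FP32_BYTES = 4  # bytes per 32-bit float weight


def pack_int4(values: List[int]) -> bytes:
    """Pack signed 4-bit integers (range -8..7) two per byte, low nibble first."""
    if any(v < -8 or v > 7 for v in values):
        raise ValueError("value outside signed 4-bit range")
    padded = values + [0] * (len(values) % 2)  # pad to an even count
    out = bytearray()
    for lo, hi in zip(padded[0::2], padded[1::2]):
        # Masking with 0x0F keeps the low 4 bits of the two's-complement value.
        out.append((lo & 0x0F) | ((hi & 0x0F) << 4))
    return bytes(out)


def unpack_int4(data: bytes, count: int) -> List[int]:
    """Inverse of pack_int4; `count` drops the padding nibble, if any."""
    values = []
    for byte in data:
        for nibble in (byte & 0x0F, byte >> 4):
            # Sign-extend: nibbles 8..15 encode -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]


def payload_bytes(num_weights: int) -> dict:
    """Bytes needed to ship `num_weights` weights as FP32 vs packed INT4."""
    return {"fp32": num_weights * FP32_BYTES, "int4": (num_weights + 1) // 2}


if __name__ == "__main__":
    weights = [7, -3, 1, 0, -8]
    packed = pack_int4(weights)
    print(len(packed), unpack_int4(packed, len(weights)))
    print(payload_bytes(25_000_000))
```

For 25 million weights this gives 100,000,000 bytes as FP32 versus 12,500,000 bytes packed as INT4, an 8x reduction in the data that must be moved between host memory and the accelerator.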


Embodiment Construction

[0055] In order to make the object, technical solution and advantages of the present invention clearer, the embodiments of the present invention will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0056] It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two non-identical entities with the same name, or two non-identical parameters. "First" and "second" are used only for convenience of expression and should not be understood as limiting the embodiments of the present invention. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or elements is not limited to the listed steps or elements, but may include other steps or elements that are not listed or that are inherent to such a process, method, system, product, or device.

[0057] Based on the above purpo...


Abstract

The invention provides a quantized model deployment method and system, a storage medium, and equipment. The method comprises the following steps: retraining a classification neural network model through a quantization-aware training module to obtain a pseudo-quantized model; reading the pseudo-quantized model into a deep learning compilation framework, which parses the pseudo-quantized model, identifies each of its nodes, and performs the convolution computation to obtain a quantized four-bit model; compiling the backend of the deep learning compilation framework so that the backend supports inference of the quantized four-bit model generated by the framework; and placing the quantized four-bit model in the model repository of a server and creating a configuration file for calling the quantized four-bit model. The embodiments of the invention solve the problem of incompatible model deployment, while reducing data transmission and thereby avoiding the data-transmission bottleneck. In addition, the invention relates to a method for performing inference based on a quantized model.
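The pseudo-quantized model produced by quantization-aware training typically contains quantize-dequantize ("fake quantization") steps: values are rounded to the 4-bit grid and immediately mapped back to floats, so training sees the rounding error while still working in floating point. The sketch below shows that forward step, assuming a symmetric per-tensor scheme with a signed 4-bit range of -8..7; the patent does not state its exact scheme, and the function names are hypothetical. During real training, gradients are usually passed through the rounding with a straight-through estimator, which this forward-only sketch omits.

```python
from typing import List, Tuple

QMIN, QMAX = -8, 7  # signed 4-bit integer range (assumed symmetric scheme)


def compute_scale(weights: List[float]) -> float:
    """Per-tensor symmetric scale: the largest |w| maps to QMAX."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    return max_abs / QMAX if max_abs > 0 else 1.0


def quantize(weights: List[float], scale: float) -> List[int]:
    """Round to the nearest step (Python rounds ties to even) and clamp."""
    return [max(QMIN, min(QMAX, round(w / scale))) for w in weights]


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integer codes back to floats."""
    return [v * scale for v in q]


def fake_quantize(weights: List[float]) -> Tuple[List[float], List[int], float]:
    """Quantize-dequantize ("pseudo-quantization") as in a QAT forward pass.

    The float output carries the 4-bit rounding error; the integer codes and
    scale are what a compiler would keep for the real four-bit model.
    """
    scale = compute_scale(weights)
    q = quantize(weights, scale)
    return dequantize(q, scale), q, scale


if __name__ == "__main__":
    deq, codes, scale = fake_quantize([1.4, -0.6, 0.25, 0.0])
    print(codes, scale, deq)
```

Here the codes are `[7, -3, 1, 0]` and 0.25 comes back as roughly 0.2: that rounding error is exactly what retraining teaches the network to tolerate.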

Description

Technical Field
[0001] The present invention relates to the technical field of servers, in particular to a method, system, storage medium and equipment for quantized model deployment.
Background
[0002] As early as 2016, Google launched TensorFlow Serving, a serving framework for TensorFlow that exposes TensorFlow models to the outside world as web services: it accepts request data from clients over the network, computes the forward inference results, and returns them. Triton's functionality is similar to that of TensorFlow Serving.
[0003] Model building and training are usually time-consuming and labor-intensive, and algorithm engineers must do a great deal of work to build and train a reasonably complete model. Since the main purpose of a trained model is to solve practical problems more effectively, deployment is a very important stage. Currently, however, model deployment is often problematic. For exampl...
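To illustrate the request/response pattern described for TensorFlow Serving, the sketch below builds (without sending) a request to its REST predict endpoint, `POST /v1/models/<name>[/versions/<n>]:predict`, whose JSON body carries an `"instances"` list and whose response carries `"predictions"`. Port 8501 is TensorFlow Serving's default REST port; the model name `my_model`, the input values and the helper `build_predict_request` are hypothetical.

```python
import json
import urllib.request
from typing import Any, List, Optional


def build_predict_request(host: str, model_name: str, instances: List[Any],
                          version: Optional[int] = None) -> urllib.request.Request:
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"  # pin a specific model version
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}{path}:predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_predict_request("localhost:8501", "my_model", [[1.0, 2.0, 3.0]])
    print(req.get_method(), req.full_url)
    # Sending requires a running server:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["predictions"])
```

Keeping request construction separate from sending makes the client easy to test without a live server; the forward inference itself runs entirely on the serving side.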


Application Information


Patent type & authority: Applications (China)

IPC(8): G06F8/60; G06F8/71; G06F8/41; G06N3/08; G06N5/04

CPC: G06F8/60; G06F8/71; G06F8/41; G06N3/08; G06N5/041; H04L69/04

Inventor: 王曦辉 (Wang Xihui)

Owner: INSPUR SUZHOU INTELLIGENT TECH CO LTD
