Method and device for deploying multi-model inference service based on k8s cluster

Active Publication Date: 2022-07-08
SUZHOU METABRAIN INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

Currently, the main way to deploy multiple models is to deploy a service that supports multi-model loading, such as TensorFlow Serving, Triton Serving, or AWS Multi-Model Serving. However, these are traditional services: they do not support elastic scaling in a cluster, and they are complex to operate.


Embodiment Construction

[0038] It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two entities that share a name but are not the same, or two parameters that are not identical. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention; subsequent embodiments will not repeat this note.

[0039] In one embodiment, as shown in Figure 1, the present invention provides a method for deploying a multi-model inference service based on a k8s cluster. The method includes the following steps:

[0040] S100: Deploy a scheduling service in the minimum scheduling unit of the k8s cluster, and configure memory, computing resources, and scheduling policies for the scheduling service, where the minimum scheduling unit is a pod;

[0041] S200: Deploy a plurality of model inference se...
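As a rough illustration of step S100, the pod that hosts the scheduling service could be described by a manifest such as the one built below. This is only a sketch under assumptions: the pod name, the container image, the resource values, and the `example.com/scheduling-policy` annotation are hypothetical and do not come from the patent text; only the standard Pod fields (`apiVersion`, `kind`, `metadata`, `spec.containers[].resources`) are Kubernetes conventions.

```python
import json


def build_scheduler_pod(name, memory, cpu, policy):
    """Build a Pod manifest (as a dict) for a hypothetical scheduling service."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "labels": {"app": name},
            # Assumption: the scheduling policy is carried as an annotation.
            "annotations": {"example.com/scheduling-policy": policy},
        },
        "spec": {
            "containers": [
                {
                    "name": "scheduler",
                    # Placeholder image name, not a real artifact.
                    "image": "example.com/scheduler:latest",
                    # Memory and computing resources configured for the pod.
                    "resources": {
                        "requests": {"memory": memory, "cpu": cpu},
                        "limits": {"memory": memory, "cpu": cpu},
                    },
                }
            ]
        },
    }


pod = build_scheduler_pod("model-scheduler", "8Gi", "4", "round-robin")
print(json.dumps(pod, indent=2))
```

The printed JSON could be saved to a file and submitted to a cluster, for example with `kubectl apply -f`.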

Abstract

The invention discloses a method and device for deploying a multi-model inference service based on a k8s cluster. The method includes: deploying a scheduling service in the minimum scheduling unit of the k8s cluster, and configuring memory, computing resources, and scheduling policies for the scheduling service; deploying multiple model inference services according to the memory of the scheduling service, with each model inference service configured to use the computing resources of the scheduling service and to be associated with the scheduling service; and having the scheduling service invoke the model inference services to process inference tasks according to the scheduling policy. The solution lets multiple model inference services share one minimum scheduling unit, lets the multi-model inference service scale elastically with service load, and keeps the deployment operation relatively simple.
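The flow summarized in the abstract can be sketched in a few lines of Python. This is a minimal in-process model, not the patented implementation: the class names, the per-model memory figures, and the round-robin policy are all assumptions chosen for illustration, since the text shown here does not name a specific scheduling policy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a policy picks one service name, given all names and the task index.
Policy = Callable[[List[str], int], str]


def round_robin(names: List[str], task_index: int) -> str:
    """Hypothetical policy: rotate through the associated services in order."""
    return names[task_index % len(names)]


@dataclass
class ModelService:
    name: str
    memory_mb: int                    # memory this model needs once loaded
    handler: Callable[[str], str]     # stands in for the real inference call


@dataclass
class SchedulingService:
    memory_mb: int                    # memory configured for the shared pod
    policy: Policy
    services: Dict[str, ModelService] = field(default_factory=dict)

    def deploy(self, service: ModelService) -> bool:
        """Associate a model service only if it fits in the remaining memory."""
        used = sum(s.memory_mb for s in self.services.values())
        if used + service.memory_mb > self.memory_mb:
            return False
        self.services[service.name] = service
        return True

    def run(self, tasks: List[str]) -> List[str]:
        """Invoke the associated services according to the scheduling policy."""
        names = list(self.services)
        return [
            self.services[self.policy(names, i)].handler(task)
            for i, task in enumerate(tasks)
        ]


scheduler = SchedulingService(memory_mb=1000, policy=round_robin)
deployed = [
    scheduler.deploy(ModelService("model-a", 400, lambda t: "a:" + t)),
    scheduler.deploy(ModelService("model-b", 400, lambda t: "b:" + t)),
    scheduler.deploy(ModelService("model-c", 400, lambda t: "c:" + t)),
]
results = scheduler.run(["x", "y", "z"])
print(deployed, results)
```

In this sketch the third model is rejected because it would exceed the memory configured for the scheduling service, which mirrors the idea of deploying model inference services according to that memory budget.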

Description

Technical field

[0001] The invention belongs to the field of cloud computing, and in particular relates to a method, device, computer equipment, and storage medium for deploying a multi-model inference service based on a k8s cluster.

Background

[0002] As machine learning methods are used more widely in production, the number of models that need to be deployed in production systems keeps growing. For example, a machine learning application that provides a personalized experience often requires training many models: a news classification service trains a custom model for each news category, and a recommendation service trains a model on each user's usage history to personalize its recommendations. The main reason for training so many models is to protect the privacy of users' models and data.

[0003] In a k8s cluster, the number of pods is limited (by default, each node can run at most 110 pod instances). By default, in a cluster of 100 ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F9/455, G06N5/04
CPC: G06F9/45558, G06N5/04, G06F2009/45595, G06F2009/45583, G06F2009/4557
Inventor: 陈清山
Owner: SUZHOU METABRAIN INTELLIGENT TECH CO LTD