Method and device for deploying multi-model inference service based on k8s cluster

Active Publication Date: 2022-07-08

SUZHOU METABRAIN INTELLIGENT TECH CO LTD


AI Technical Summary

Problems solved by technology

Currently, multiple models are mainly deployed by running a service that supports multi-model loading, such as TensorFlow Serving, Triton Serving, or AWS Multi-Model Serving. However, these are traditional services: they do not support elastic scaling in a cluster, and they are complex to operate.



Embodiment Construction

[0038] It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used only to distinguish two entities or parameters that share a name but are not identical. "First" and "second" are used purely for convenience of expression and should not be construed as limiting the embodiments of the present invention; subsequent embodiments will not repeat this note.

[0039] In one embodiment, as shown in Figure 1, the present invention provides a method for deploying a multi-model inference service based on a k8s cluster. The method specifically includes the following steps:

[0040] S100: Deploy a scheduling service in the minimum scheduling unit of the k8s cluster, and configure memory, computing resources, and scheduling policies for the scheduling service, where the minimum scheduling unit is a pod;

[0041] S200: Deploy a plurality of model inference se...
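Step S100 can be illustrated with a minimal sketch, assuming the scheduling service runs as a single container in a pod. The pod name, image, resource values, and the `SCHEDULING_POLICY` environment variable below are all hypothetical; only the standard Pod manifest fields (`resources.requests`, `resources.limits`, `env`) come from Kubernetes itself, and the patent does not state how the policy is actually passed to the service.

```python
import json


def build_scheduler_pod(name, memory, cpu, policy):
    """Return a Kubernetes Pod manifest (as a plain dict) for a scheduling service.

    All argument values are illustrative; the patent does not specify them.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {
                    "name": "scheduler",
                    # Hypothetical image reference, not taken from the source.
                    "image": "example.invalid/scheduler:latest",
                    # Memory and computing resources reserved for the pod.
                    "resources": {
                        "requests": {"memory": memory, "cpu": cpu},
                        "limits": {"memory": memory, "cpu": cpu},
                    },
                    # Assumption: the scheduling policy is passed as an env var.
                    "env": [{"name": "SCHEDULING_POLICY", "value": policy}],
                }
            ]
        },
    }


manifest = build_scheduler_pod("model-scheduler", "8Gi", "4", "round_robin")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`; setting requests equal to limits is one possible choice that gives the pod a fixed budget for the model inference services to share.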


Abstract

The invention discloses a method and device for deploying a multi-model inference service based on a k8s cluster. The method includes: deploying a scheduling service in the minimum scheduling unit of the k8s cluster, and configuring memory, computing resources, and scheduling policies for the scheduling service; deploying multiple model inference services according to the memory of the scheduling service, where each model inference service is configured to use the computing resources of the scheduling service and to be associated with the scheduling service; and having the scheduling service invoke the model inference services to process inference tasks according to the scheduling policy. The solution allows multiple model inference services to share one minimum scheduling unit, lets the multi-model inference service scale elastically with the service load, and keeps the deployment operation relatively simple.
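The flow in the abstract can be sketched roughly as follows. This is a toy in-process model, not the patented implementation: the class names, the memory-budget admission check, and the round-robin policy are all assumptions chosen for illustration, since the source does not say which scheduling policies are supported.

```python
from dataclasses import dataclass, field


@dataclass
class ModelService:
    """Hypothetical stand-in for one model inference service."""
    name: str
    memory_mb: int

    def infer(self, payload):
        # Placeholder for real model inference.
        return f"{self.name}:{payload}"


@dataclass
class SchedulingService:
    """Hypothetical scheduling service that owns the pod's memory budget."""
    memory_mb: int
    services: list = field(default_factory=list)
    _cursor: int = 0

    def used_memory(self):
        return sum(s.memory_mb for s in self.services)

    def deploy(self, service):
        """Admit a model service only if it fits in the remaining memory."""
        if self.used_memory() + service.memory_mb > self.memory_mb:
            return False
        self.services.append(service)
        return True

    def dispatch(self, payload):
        """Assumed policy: round-robin over the associated model services."""
        if not self.services:
            raise RuntimeError("no model inference service deployed")
        target = self.services[self._cursor % len(self.services)]
        self._cursor += 1
        return target.infer(payload)


scheduler = SchedulingService(memory_mb=1024)
deployed = [
    scheduler.deploy(ModelService("news-classifier", 400)),
    scheduler.deploy(ModelService("recommender", 500)),
    scheduler.deploy(ModelService("too-large", 300)),  # exceeds the budget
]
results = [scheduler.dispatch(p) for p in ("a", "b", "c")]
print(deployed, results)
```

In this sketch the third deployment is rejected because 400 + 500 + 300 MB exceeds the 1024 MB budget, and requests alternate between the two admitted services.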

Description

Technical field

[0001] The invention belongs to the field of cloud computing, and in particular relates to a method, device, computer equipment, and storage medium for deploying a multi-model inference service based on a k8s cluster.

Background

[0002] As machine learning methods are more widely used in production, the number of models that must be deployed in production systems keeps growing. For example, a machine learning application that provides a personalized experience often requires training many models: a news classification service may train a custom model for each news category, and a recommendation service may train a model on each user's usage history to personalize its recommendations. The main reason for training so many models is to protect the privacy of users' models and data.

[0003] In a k8s cluster, the number of pods is limited (by default, each node can start 110 pod instances). By default, in a cluster of 100 ...
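The pod limit mentioned in [0003] can be turned into a rough capacity estimate. The sketch below assumes only the default of 110 pods per node stated above; the node count and the number of models packed into one pod are hypothetical inputs, and the estimate ignores pods consumed by system components.

```python
DEFAULT_MAX_PODS_PER_NODE = 110  # default per-node pod limit cited in the text


def max_pods(nodes, pods_per_node=DEFAULT_MAX_PODS_PER_NODE):
    """Upper bound on the number of pods the cluster can start."""
    return nodes * pods_per_node


def max_models(nodes, models_per_pod=1, pods_per_node=DEFAULT_MAX_PODS_PER_NODE):
    """Upper bound on deployable models when each pod hosts `models_per_pod`."""
    return max_pods(nodes, pods_per_node) * models_per_pod


nodes = 10  # hypothetical cluster size
one_per_pod = max_models(nodes)               # one model inference service per pod
shared = max_models(nodes, models_per_pod=8)  # hypothetical packing factor
print(one_per_pod, shared)
```

With one model per pod, the number of deployable models is capped by the pod count; sharing a pod among several model inference services raises that cap by the packing factor.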


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06F9/455, G06N5/04

CPC: G06F9/45558, G06N5/04, G06F2009/45595, G06F2009/45583, G06F2009/4557

Inventor: 陈清山

Owner: SUZHOU METABRAIN INTELLIGENT TECH CO LTD
