Method and device for deploying multi-model inference service based on k8s cluster

CN112231054B · Active · Publication Date: 2022-07-08 · SUZHOU METABRAIN INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SUZHOU METABRAIN INTELLIGENT TECH CO LTD
Publication Date
2022-07-08


Abstract

The invention discloses a method and device for deploying multi-model inference services based on a k8s cluster. The method includes: deploying a scheduling service in the smallest scheduling unit of the k8s cluster, and configuring memory, computing resources, and a scheduling policy for the scheduling service; deploying multiple model inference services according to the memory of the scheduling service, where each model inference service is configured to use the computing resources of the scheduling service and to be associated with the scheduling service; and having the scheduling service invoke the multiple model inference services to process inference tasks according to the scheduling policy. The solution allows multiple model inference services to share one smallest scheduling unit, lets the multi-model inference services scale elastically with the service load, and keeps the deployment operation relatively simple.
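
The flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names `SchedulingService` and `ModelService`, the memory figures, and the round-robin policy are all assumptions introduced here, since the abstract does not say which scheduling policy is used or how services are registered.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Callable, Dict, Iterator, List


@dataclass
class ModelService:
    """Hypothetical stand-in for one model inference service."""
    name: str
    memory_mb: int
    handler: Callable[[str], str]


@dataclass
class SchedulingService:
    """Hypothetical scheduler that owns the memory budget of one scheduling unit."""
    memory_limit_mb: int
    services: Dict[str, List[ModelService]] = field(default_factory=dict)
    _cursors: Dict[str, Iterator[ModelService]] = field(
        default_factory=dict, init=False, repr=False
    )

    def used_memory_mb(self) -> int:
        # Total memory claimed by every registered model service.
        return sum(s.memory_mb for group in self.services.values() for s in group)

    def register(self, model: str, service: ModelService) -> bool:
        # Deploy a model service only if it fits in the scheduler's memory.
        if self.used_memory_mb() + service.memory_mb > self.memory_limit_mb:
            return False
        self.services.setdefault(model, []).append(service)
        # Assumed policy: round-robin over the services registered for a model.
        self._cursors[model] = cycle(self.services[model])
        return True

    def infer(self, model: str, payload: str) -> str:
        # Pick the next service for this model and hand it the inference task.
        service = next(self._cursors[model])
        return service.handler(payload)
```

For example, registering two hypothetical 400 MB services under a 1024 MB budget succeeds, a third is rejected, and requests for that model alternate between the two registered services.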

Description

Technical field

[0001] The invention belongs to the field of cloud computing, and in particular relates to a method, device, computer equipment and storage medium for deploying a multi-model inference service based on a k8s cluster.

Background

[0002] As machine learning methods are used more widely in actual production, the number of models that must be deployed in production systems keeps growing. For example, a machine learning application that provides a personalized experience often requires training many models: a news classification service may train a custom model for each news category, and a recommendation service may train a model on each user's usage history to personalize its recommendations. The main reason for training so many separate models is to protect the privacy of users' models and data.

[0003] In a K8S cluster, the number of POD resources is limited (by default, each Node can start 110 POD instances). By default, in a cluster of 100 ...
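
The per-node default cited above puts a ceiling on how many Pods, and therefore how many single-model deployments, a cluster can host. A rough sketch of that arithmetic follows; the function name `max_models`, the node count, and the models-per-Pod figure are illustrative assumptions, not values taken from the patent.

```python
DEFAULT_MAX_PODS_PER_NODE = 110  # default per-node Pod limit cited in the text


def max_models(
    nodes: int,
    models_per_pod: int = 1,
    pods_per_node: int = DEFAULT_MAX_PODS_PER_NODE,
) -> int:
    """Upper bound on deployable models, ignoring Pods used by system components."""
    return nodes * pods_per_node * models_per_pod


# Hypothetical 10-node cluster: one model per Pod versus four models sharing a Pod.
one_per_pod = max_models(nodes=10)
four_per_pod = max_models(nodes=10, models_per_pod=4)
```

Under these assumed numbers, sharing a Pod among several models raises the ceiling in direct proportion to the number of models per Pod.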

Claims
