Model deployment method, electronic device, storage medium, and program product

Splitting the neural network model into operator modules and deploying them according to the capability indicators of the computing power cards avoids the increase in execution time caused by capability differences between cards, so the model executes efficiently across multiple computing power cards and computing power resources are used rationally.

CN122240133A · Pending · Publication Date: 2026-06-19 · INSPUR SUZHOU INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
INSPUR SUZHOU INTELLIGENT TECH CO LTD
Filing Date
2026-05-21
Publication Date
2026-06-19

Smart Images

  • Figure CN122240133A_ABST

Abstract

This application discloses a model deployment method, electronic device, storage medium, and program product in the field of computer technology. The method includes: splitting the model to be deployed into multiple operator modules; obtaining the capability indicators of a computing power card; determining the operators supported by the computing power card based on the capability indicators; determining the target computing power card corresponding to each operator module based on the supported operators; and deploying the operator module to the target computing power card. This addresses the problem that, when a neural network model is deployed across multiple computing power cards, capability differences between the cards are hard to distinguish, which increases the model's overall execution time. When splitting the model to be deployed, the method merges related operators into operator modules, avoiding time lost to data transmission. It uses the capability indicators to determine each card's supported operators, energy consumption cost, and communication cost, and then selects the most suitable target computing power card to execute each operator module. The model can thus be deployed across multiple computing power cards with reduced execution time and improved computing power utilization.
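The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the class names, the `select_target_card` and `deploy` functions, and the weighted sum of energy and communication cost are all assumptions made for illustration, since the abstract does not state how the two costs are combined. The sketch also assumes the capability indicators have already been resolved into a set of supported operators and two scalar costs per card.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorModule:
    """A group of related operators merged into one deployable unit (hypothetical type)."""
    name: str
    operators: frozenset[str]


@dataclass(frozen=True)
class ComputingPowerCard:
    """A card described by values assumed to be derived from its capability indicators."""
    name: str
    supported_ops: frozenset[str]
    energy_cost: float
    comm_cost: float


def select_target_card(module, cards, energy_weight=1.0, comm_weight=1.0):
    """Pick the cheapest card that supports every operator in the module.

    The weighted sum is an assumed scoring rule, not one taken from the patent.
    """
    candidates = [c for c in cards if module.operators <= c.supported_ops]
    if not candidates:
        raise ValueError(f"no card supports all operators of module {module.name!r}")
    return min(
        candidates,
        key=lambda c: energy_weight * c.energy_cost + comm_weight * c.comm_cost,
    )


def deploy(modules, cards):
    """Return a placement plan mapping each module name to its target card name."""
    return {m.name: select_target_card(m, cards).name for m in modules}
```

Calling `deploy` with a list of modules and cards returns a dictionary from module name to card name. A module whose operators no card fully supports raises `ValueError`, so a real system would need a fallback such as splitting the module further.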