Deployment method, inference method and electronic device of neural network model

By defining a set of interfaces for dynamic libraries at the platform and device layers, a unified interface is provided, which solves the problem of repeatedly deploying neural network models on different chip platforms, simplifies the process, and reduces development costs.

CN115562677B · Active · Publication Date: 2026-06-19 · BEIJING SUNNIWELL DIGITAL S&T CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: BEIJING SUNNIWELL DIGITAL S&T CO LTD
Filing Date: 2022-10-26
Publication Date: 2026-06-19

Patent Timeline

26 Oct 2022: Application
19 Jun 2026: Publication (CN115562677B)

IPC: G06F8/60; G06F8/41; G06F8/71; G06F9/445; G06F9/54; G06N3/06; G06N5/04

AI Tagging

Application Domain

Interprogram communication; Version control

Technical Efficacy Phrases

Simple process; Reduce workload


AI Technical Summary

Technical Problem

Deploying neural network models on different chip platforms requires connecting to the APIs of each platform separately, resulting in a large amount of repetitive and tedious work and a high learning cost.

Method used

By defining a set of interfaces for platform-level dynamic libraries and device-level dynamic libraries, a unified set of interfaces is provided. Developers only need to interface once to deploy models on multiple chip platforms, simplifying the process and reducing learning difficulty and cost.

Benefits of technology

It implements an interface for uniformly deploying neural network models on multiple chip platforms, reducing repetitive workload and lowering the learning difficulty and cost for developers.

✦ Generated by Eureka AI based on patent content.


Abstract

This disclosure provides a method for deploying a neural network model, an inference method, and an electronic device. The deployment method includes: determining the chip platform corresponding to the neural network model; compiling a platform-level dynamic library and a device-level dynamic library corresponding to the chip platform, wherein the device-level dynamic library is configured to provide a first set of interfaces to the platform-level dynamic library, and the first set of interfaces is configured to call hardware-related interfaces of the chip platform; the platform-level dynamic library is configured to provide a second set of interfaces to the application layer, and the second set of interfaces is configured to call the first set of interfaces of the device-level dynamic library; and deploying the neural network model, the platform-level dynamic library, and the device-level dynamic library to the chip platform. This disclosure improves model deployment efficiency.


Description

Technical Field

[0001] This invention relates to the field of computers, and more particularly to a method for deploying a neural network model, an inference method, and an electronic device.

Background Technology

[0002] Artificial intelligence technology, due to its versatility and its characteristics of standardization, automation, and modularity, has entered the stage of large-scale implementation and industrial mass production after years of vigorous development based on deep learning frameworks.

[0003] Meanwhile, with the gradual implementation of artificial intelligence applications, various smart hardware devices such as smart cameras, smart speakers, and smart robots are emerging in large numbers. For developers of smart devices (individuals or enterprises), applying artificial intelligence algorithms to terminal devices requires the edge deployment of neural network models. However, devices using different chips have different edge deployment APIs provided by their chip manufacturers, resulting in the same neural network model needing to interface with multiple sets of APIs, leading to a large amount of repetitive work.

[0004] For example, suppose an existing object detection model M1 needs to be deployed on three devices, A, B, and C, which use different chip platforms. Since the edge deployment APIs of these three devices differ, M1 must interface with the APIs of each of the three platforms to achieve edge deployment. If a new model M2 is developed, M2 must repeat the entire process that M1 went through, which is tedious and repetitive.

Summary of the Invention

[0005] According to one aspect of this disclosure, a method for deploying a neural network model is provided, comprising:

[0006] Determine the chip platform corresponding to the neural network model;

[0007] Compile a platform-level dynamic library and a device-level dynamic library corresponding to the chip platform. The device-level dynamic library is configured to provide a first set of interfaces to the platform-level dynamic library, and the first set of interfaces is configured to call the hardware-related interfaces of the chip platform. The platform-level dynamic library is configured to provide a second set of interfaces to the application layer, and the second set of interfaces is configured to call the first set of interfaces of the device-level dynamic library.

[0008] Deploy the neural network model, the platform-level dynamic library, and the device-level dynamic library onto the chip platform.

[0009] In some embodiments:

[0010] The second set of interfaces includes a model inference request interface, which is configured to: respond to a model inference request from the application layer, obtain model input data and model input description information provided by the application layer; encapsulate the model input data and model input description information according to the specification of the model inference request interface of the device layer dynamic library; and use the encapsulated data to call the model inference request interface of the device layer dynamic library.

[0011] The first set of interfaces includes a model inference request interface, which is configured to: in response to a call from the platform-level dynamic library, decapsulate the encapsulated data; and use the decapsulated data to call the hardware-related interfaces of the chip platform for model inference.
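The encapsulation and decapsulation described in the two paragraphs above could look roughly like the following Python sketch. The buffer layout (a length-prefixed JSON description followed by the raw input bytes), all function names, and `fake_chip_infer` as a stand-in for a vendor API are assumptions made for illustration; the patent text does not specify a data format.

```python
import json
import struct


def encapsulate(input_data: bytes, description: dict) -> bytes:
    """Platform layer: pack input data and its description into one buffer."""
    desc = json.dumps(description).encode("utf-8")
    # Assumed layout: 4-byte little-endian description length, description, raw data.
    return struct.pack("<I", len(desc)) + desc + input_data


def decapsulate(packet: bytes) -> tuple[bytes, dict]:
    """Device layer: recover the input data and description from the buffer."""
    (desc_len,) = struct.unpack_from("<I", packet, 0)
    desc = json.loads(packet[4:4 + desc_len].decode("utf-8"))
    return packet[4 + desc_len:], desc


def fake_chip_infer(data: bytes, shape: list) -> int:
    """Hypothetical stand-in for a hardware-related interface of a chip platform."""
    return sum(data)


def device_forward(packet: bytes) -> int:
    """Device-layer model inference request interface (first interface set)."""
    data, desc = decapsulate(packet)
    return fake_chip_infer(data, desc["shape"])


def platform_forward(input_data: bytes, description: dict) -> int:
    """Platform-layer model inference request interface (second interface set)."""
    return device_forward(encapsulate(input_data, description))


result = platform_forward(bytes([1, 2, 3, 4]), {"shape": [1, 4], "dtype": "uint8"})
```

Keeping the description next to the data in one buffer means the device layer needs no out-of-band information to interpret the input, which is one plausible reason for an encapsulation step between the two layers.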

[0012] In some embodiments:

[0013] The second set of interfaces includes an inference result retrieval interface, which is configured to: in response to an application layer inference result retrieval request, call the inference result retrieval interface of the device layer dynamic library; and return the inference result to the application layer.

[0014] The first set of interfaces includes an inference result retrieval interface, which is configured to: in response to a call from the platform-layer dynamic library, call the hardware-related interfaces of the chip platform to obtain the inference result; package the obtained inference result; and return the packaged inference result to the platform-layer dynamic library.
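As a rough illustration of the result path just described, the sketch below has the device layer fetch a vendor-specific output, package it into a uniform structure, and hand it up through the platform layer. `InferenceResult`, the field names, and `fake_chip_get_output` are hypothetical; the patent does not define the packaged format.

```python
from dataclasses import dataclass


@dataclass
class InferenceResult:
    """Assumed unified result format shared by all device-layer libraries."""
    values: list
    shape: tuple


def fake_chip_get_output() -> dict:
    """Hypothetical vendor call that returns output in a vendor-specific form."""
    return {"buf": [0.1, 0.7, 0.2], "dims": [1, 3]}


def device_get_output() -> InferenceResult:
    """Device layer: obtain the raw result from hardware and package it."""
    raw = fake_chip_get_output()
    return InferenceResult(values=list(raw["buf"]), shape=tuple(raw["dims"]))


def platform_get_output() -> InferenceResult:
    """Platform layer: forward the request and return the packaged result."""
    return device_get_output()


# Application layer: parse the result into something it understands (here, argmax).
result = platform_get_output()
best_class = max(range(len(result.values)), key=result.values.__getitem__)
```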

[0015] In some embodiments:

[0016] The second set of interfaces includes a model loading interface, which is configured to: in response to a model loading request from the application layer, call the first set of interfaces of the device layer dynamic library to load the neural network model;

[0017] The first set of interfaces includes a model loading interface, which is configured to: in response to a call from the platform-level dynamic library, call the hardware-related interfaces of the chip platform to load the neural network model.

[0018] According to another aspect of this disclosure, an inference method for a neural network model is provided, comprising:

[0019] Load the pre-deployed device-layer dynamic library;

[0020] When using a neural network model at the application layer, the second set of interfaces of the platform layer dynamic library responds to the application layer's call and calls the first set of interfaces of the device layer dynamic library.

[0021] The first set of interfaces of the device layer dynamic library responds to the calls of the platform layer dynamic library, calling the hardware-related interfaces of the chip platform.

[0022] In some embodiments:

[0023] The second set of interfaces in the platform-layer dynamic library responds to calls from the application layer by calling the first set of interfaces in the device-layer dynamic library, including:

[0024] The model inference request interface in the second interface set responds to the model inference request from the application layer and obtains the model input data and model input description information provided by the application layer.

[0025] The model inference request interface in the second interface set encapsulates the model input data and model input description information according to the specification of the model inference request interface of the device layer dynamic library;

[0026] The model inference request interface in the second set of interfaces uses the encapsulated data to call the model inference request interface of the device layer dynamic library;

[0027] The first set of interfaces in the device-layer dynamic library responds to calls from the platform-layer dynamic library, calling the hardware-related interfaces of the chip platform, including:

[0028] The model inference request interface in the first interface set responds to the call of the platform layer dynamic library and decapsulates the encapsulated data;

[0029] The model inference request interface in the first interface set uses the decapsulated data to call the hardware-related interfaces of the chip platform to perform model inference.

[0030] In some embodiments:

[0031] The second set of interfaces in the platform-layer dynamic library responds to calls from the application layer by calling the first set of interfaces in the device-layer dynamic library, including:

[0032] The inference result retrieval interface in the second interface set responds to the inference result retrieval request from the application layer and calls the inference result retrieval interface of the device layer dynamic library;

[0033] The first set of interfaces in the device-layer dynamic library responds to calls from the platform-layer dynamic library, calling the hardware-related interfaces of the chip platform, including:

[0034] The inference result retrieval interface in the first interface set responds to the call of the platform-level dynamic library and calls the hardware-related interfaces of the chip platform to obtain the inference result;

[0035] The inference result retrieval interface in the first interface set packages the retrieved inference results;

[0036] The inference result retrieval interface in the first interface set returns the packaged inference result to the platform layer dynamic library;

[0037] The method further includes: the inference result retrieval interface in the second interface set returns the inference result to the application layer.

[0038] In some embodiments:

[0039] The second set of interfaces in the platform-layer dynamic library responds to calls from the application layer by calling the first set of interfaces in the device-layer dynamic library, including:

[0040] The model loading interface in the second interface set responds to the application layer's model loading request and calls the first interface set of the device layer dynamic library to load the neural network model.

[0041] The first set of interfaces in the device-layer dynamic library responds to calls from the platform-layer dynamic library, calling the hardware-related interfaces of the chip platform, including:

[0042] The model loading interface in the first interface set responds to the call from the platform-level dynamic library by calling the hardware-related interfaces of the chip platform to load the neural network model.

[0043] According to another aspect of this disclosure, an electronic device is provided, comprising:

[0044] a processor; and

[0045] a memory storing a program,

[0046] wherein the program includes instructions that, when executed by the processor, cause the processor to perform the methods of embodiments of this disclosure.

[0047] According to another aspect of this disclosure, a non-transitory computer-readable storage medium is provided storing computer instructions, wherein the computer instructions are used to cause the computer to perform the methods of embodiments of this disclosure.

[0048] In one or more technical solutions provided by the embodiments of this application, the device-level dynamic library provides a first set of interfaces to the platform-level dynamic library, and this first set of interfaces is configured to call the hardware-related interfaces of the chip platform. The platform-level dynamic library provides a second set of interfaces to the application layer, and this second set of interfaces is configured to call the first set of interfaces of the device-level dynamic library. The application thus sees a unified interface, and developers can deploy edge models on multiple chip platforms by interfacing with it once. The overall process is simplified, the workload is greatly reduced, and the learning difficulty and cost are lowered.

Brief Description of the Drawings

[0049] Further details, features, and advantages of this disclosure are disclosed in the following description of exemplary embodiments in conjunction with the accompanying drawings, in which:

[0050] Figure 1 shows a schematic block diagram of the interface structure of a neural network model according to an exemplary embodiment of the present disclosure;

[0051] Figure 2 shows a flowchart of a method for deploying a neural network model according to an exemplary embodiment of the present disclosure;

[0052] Figure 3 shows a flowchart of an inference method for a neural network model according to an exemplary embodiment of the present disclosure;

[0053] Figure 4 shows a schematic diagram of a processing framework for a neural network model according to an exemplary embodiment of the present disclosure;

[0054] Figure 5 shows a structural block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.

Detailed Description

[0055] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of this disclosure. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.

[0056] It should be understood that the steps described in the method embodiments of this disclosure may be performed in different orders and/or in parallel. Furthermore, the method embodiments may include additional steps and/or omit some of the steps shown. The scope of this disclosure is not limited in this respect.

[0057] The term "comprising" and its variations as used herein are open-ended, meaning "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms will be given in the description below. It should be noted that the concepts of "first", "second", etc., used in this disclosure are only used to distinguish different devices, modules, or units, and are not intended to limit the order of functions performed by these devices, modules, or units or their interdependencies.

[0058] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0059] The names of messages or information exchanged between multiple devices in the embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

[0060] The present disclosure is described below with reference to the accompanying drawings.

[0061] Figure 1 shows a schematic block diagram of an interface structure according to an exemplary embodiment of the present disclosure. As shown in Figure 1, the structure includes multiple device-layer dynamic libraries 10 and a platform-layer dynamic library 20. Each device-layer dynamic library 10 is configured to provide a first interface set 101 to the platform-layer dynamic library 20, and the first interface set 101 of each device-layer dynamic library 10 is configured to call the hardware-related interfaces of a chip platform. The platform-layer dynamic library 20 is configured to provide a second interface set 201 to the application layer, and the second interface set 201 is configured to call the first interface set 101 of the device-layer dynamic libraries 10.

[0062] The application layer uses the neural network model by calling the second set of interfaces 201. Examples include: Load Model, Model Forward, Get Model Output, Show Model Info, and Model Destroy. The application layer also parses the inference results to obtain outputs that it can understand.
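The two-layer structure and the five example operations named above (Load Model, Model Forward, Get Model Output, Show Model Info, Model Destroy) can be sketched in Python as follows. This is only an illustrative model of the layering: the class and method names, and the toy "chip" that normalizes bytes, are assumptions rather than anything defined in the patent.

```python
from typing import Protocol


class DeviceLibrary(Protocol):
    """First interface set: what each device-layer library is assumed to expose."""
    def load_model(self, path: str) -> None: ...
    def forward(self, data: bytes) -> None: ...
    def get_output(self) -> list: ...
    def show_info(self) -> str: ...
    def destroy(self) -> None: ...


class ToyChipLibrary:
    """Hypothetical device-layer library for an imaginary chip platform."""
    def __init__(self) -> None:
        self.path = ""
        self.output = []

    def load_model(self, path: str) -> None:
        self.path = path

    def forward(self, data: bytes) -> None:
        # Stand-in for real inference: scale each byte to the range [0, 1].
        self.output = [b / 255 for b in data]

    def get_output(self) -> list:
        return self.output

    def show_info(self) -> str:
        return f"toy-chip model: {self.path}"

    def destroy(self) -> None:
        self.path, self.output = "", []


class PlatformLibrary:
    """Second interface set: the unified API seen by the application layer."""
    def __init__(self, device: DeviceLibrary) -> None:
        self._device = device

    def load_model(self, path: str) -> None:
        self._device.load_model(path)

    def forward(self, data: bytes) -> None:
        self._device.forward(data)

    def get_output(self) -> list:
        return self._device.get_output()

    def show_info(self) -> str:
        return self._device.show_info()

    def destroy(self) -> None:
        self._device.destroy()


app = PlatformLibrary(ToyChipLibrary())
app.load_model("m.bin")
app.forward(bytes([0, 255]))
```

Supporting another chip in this sketch means writing one more class that satisfies `DeviceLibrary`; `PlatformLibrary` and the application code stay unchanged.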

[0063] As shown in Figure 1, the second interface set 201 includes a model inference request interface 2011, and the first interface set 101 includes a model inference request interface 1011. The model inference request interface 2011 of the second interface set 201 is configured to, in response to a model inference request from the application layer, obtain model input data and model input description information provided by the application layer; encapsulate the model input data and model input description information according to the specification of the model inference request interface of the device layer dynamic library 10; and use the encapsulated data to call the model inference request interface 1011 of the device layer dynamic library 10. The model inference request interface 1011 in the first interface set 101 is configured to, in response to a call from the platform layer dynamic library 20, decapsulate the encapsulated data; and use the decapsulated data to call the hardware-related interfaces of the chip platform for model inference.

[0064] As shown in Figure 1, the second interface set 201 includes an inference result acquisition interface 2012, and the first interface set 101 includes an inference result acquisition interface 1012. The inference result acquisition interface 2012 in the second interface set 201 is configured to, in response to an application layer inference result acquisition request, call the inference result acquisition interface 1012 of the device layer dynamic library 10; and, after obtaining the inference result, return the inference result to the application layer. The inference result acquisition interface 1012 in the first interface set 101 is configured to, in response to a call from the platform layer dynamic library 20, call the hardware-related interfaces of the chip platform to obtain the inference result; package the obtained inference result; and return the packaged inference result to the platform layer dynamic library 20.

[0065] As shown in Figure 1, the second interface set 201 includes a model loading interface 2013, and the first interface set 101 includes a model loading interface 1013. The model loading interface 2013 in the second interface set 201 is configured to, in response to a model loading request from the application layer, call the first interface set 101 of the device layer dynamic library 10 to load a neural network model; the model loading interface 1013 in the first interface set 101 is configured to, in response to a call from the platform layer dynamic library 20, call the hardware-related interfaces of the chip platform to load a neural network model.

[0066] As shown in Figure 1, the second interface set 201 includes a model unloading interface 2014, and the first interface set 101 includes a model unloading interface 1014. The model unloading interface 2014 in the second interface set 201 is configured to, in response to a model unloading request from the application layer, call the first interface set 101 of the device layer dynamic library 10 to unload the neural network model; the model unloading interface 1014 in the first interface set 101 is configured to, in response to a call from the platform layer dynamic library 20, call the hardware-related interfaces of the chip platform to unload the neural network model.

[0067] As shown in Figure 1, the second interface set 201 includes a model structure display interface 2015, and the first interface set 101 includes a model structure display interface 1015. The model structure display interface 2015 in the second interface set 201 is configured to, in response to a model structure display request from the application layer, call the first interface set 101 of the device layer dynamic library 10 to display the model structure of the neural network model; the model structure display interface 1015 in the first interface set 101 is configured to, in response to a call from the platform layer dynamic library 20, call the hardware-related interfaces of the chip platform to display the model structure of the neural network model.

[0068] It should be understood that the above interfaces are for illustrative purposes only.

[0069] In this embodiment, a first set of interfaces is provided to the platform-level dynamic library through the device-level dynamic library. This first set of interfaces is configured to call the hardware-related interfaces of the chip platform. A second set of interfaces is provided to the application layer through the platform-level dynamic library. This second set of interfaces is configured to call the first set of interfaces of the device-level dynamic library. This provides a unified interface to the application, allowing developers to deploy the edge model on multiple chip platforms simply by interfacing with this unified interface. The overall process is simplified, the workload is greatly reduced, and the learning difficulty and cost are relatively lowered.

[0070] Figure 2 shows a flowchart of a method for deploying a neural network model according to an exemplary embodiment of the present disclosure. As shown in Figure 2, the method includes steps S201 to S203.

[0071] Step S201: Determine the chip platform corresponding to the neural network model.

[0072] In one implementation, when a neural network model is deployed, the target chip platform is specified by a chip platform identifier.
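One way to picture selection by chip platform identifier is a lookup from identifier to the device-layer library built for that chip. The identifiers and file names in this Python sketch are invented for illustration and do not come from the patent.

```python
# Hypothetical identifiers and library file names, for illustration only.
PLATFORM_LIBRARY = "libplatform.so"
DEVICE_LIBRARIES = {
    "chip_a": "libdevice_chip_a.so",
    "chip_b": "libdevice_chip_b.so",
}


def resolve_libraries(platform_id: str) -> tuple[str, str]:
    """Return the (platform-layer, device-layer) library names for a chip platform."""
    try:
        return PLATFORM_LIBRARY, DEVICE_LIBRARIES[platform_id]
    except KeyError:
        raise ValueError(f"unsupported chip platform: {platform_id!r}") from None


selected = resolve_libraries("chip_a")
```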

[0073] Step S202: Compile the platform-level dynamic library and the device-level dynamic library corresponding to the chip platform.

[0074] Specifically, the device-layer dynamic library is configured to provide a first set of interfaces to the platform-layer dynamic library, and the first set of interfaces is configured to call the hardware-related interfaces of the chip platform; the platform-layer dynamic library is configured to provide a second set of interfaces to the application layer, and the second set of interfaces is configured to call the first set of interfaces of the device-layer dynamic library.

[0075] In this embodiment, a device-layer dynamic library is pre-configured for each chip platform. Each device-layer dynamic library calls the hardware-related interfaces of its chip platform and provides a unified first set of interfaces to the platform-layer dynamic library. The platform-layer dynamic library calls the first set of interfaces and provides a second set of interfaces to the application layer, wherein interfaces in the second set of interfaces call interfaces in the first set of interfaces. The application layer uses the neural network model by calling interfaces in the second set of interfaces.

[0076] Step S203: Deploy the neural network model, platform-layer dynamic library, and device-layer dynamic library to the chip platform.
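Steps S201 to S203 can be sketched end to end as below. This is a simplified illustration under several assumptions: `fake_compile` writes placeholder files where a real build would invoke the chip vendor's toolchain, a local directory stands in for the device's file system, and all names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def fake_compile(platform_id: str, out_dir: Path) -> list:
    """Stand-in for step S202; a real build would run a cross-compiler."""
    libs = [out_dir / "libplatform.so", out_dir / f"libdevice_{platform_id}.so"]
    for lib in libs:
        lib.write_bytes(b"placeholder")
    return libs


def deploy(model: Path, platform_id: str, target: Path) -> list:
    """S201: platform_id names the chip. S202: build libraries. S203: copy all artifacts."""
    with tempfile.TemporaryDirectory() as build_dir:
        libs = fake_compile(platform_id, Path(build_dir))
        target.mkdir(parents=True, exist_ok=True)
        for artifact in [model, *libs]:
            shutil.copy(artifact, target / artifact.name)
    # Report what ended up on the (simulated) device.
    return sorted(p.name for p in target.iterdir())


with tempfile.TemporaryDirectory() as work:
    model_file = Path(work) / "detector.model"
    model_file.write_bytes(b"weights")
    deployed = deploy(model_file, "chip_a", Path(work) / "device_root")
```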

[0077] In this embodiment, a first set of interfaces is provided to the platform-level dynamic library through the device-level dynamic library. This first set of interfaces is configured to call the hardware-related interfaces of the chip platform. A second set of interfaces is provided to the application layer through the platform-level dynamic library. This second set of interfaces is configured to call the first set of interfaces of the device-level dynamic library. This provides a unified interface to the application, allowing developers to deploy the edge model on multiple chip platforms simply by interfacing with this unified interface. The overall process is simplified, the workload is greatly reduced, and the learning difficulty and cost are relatively lowered.

[0078] For example, developers can deploy neural network model M using the second set of interfaces provided by the platform-level dynamic library. With the deployment method of this embodiment, neural network model M can be deployed to multiple chip platforms without modifying its application-level deployment code.
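The claim that the application-level code stays unchanged across chips can be illustrated with two imaginary vendor APIs that differ in name and signature. Everything here (`vendor_a_open`, `vendor_b_create_context`, the class names) is hypothetical; the point is only that `deploy_model_m` is written once.

```python
def vendor_a_open(path: str) -> str:
    """Hypothetical vendor A API."""
    return f"A-handle({path})"


def vendor_b_create_context(model_file: str, core_id: int) -> str:
    """Hypothetical vendor B API with a different name and signature."""
    return f"B-handle({model_file}, core={core_id})"


class ChipADevice:
    """Device-layer library for chip A: adapts vendor A's call."""
    def load_model(self, path: str) -> str:
        return vendor_a_open(path)


class ChipBDevice:
    """Device-layer library for chip B: adapts vendor B's call."""
    def load_model(self, path: str) -> str:
        return vendor_b_create_context(path, core_id=0)


class Platform:
    """Platform-layer library: the single interface the application sees."""
    def __init__(self, device) -> None:
        self._device = device

    def load_model(self, path: str) -> str:
        return self._device.load_model(path)


def deploy_model_m(platform: Platform) -> str:
    """Application-layer deployment code; identical for every chip platform."""
    return platform.load_model("model_m.bin")


handles = [deploy_model_m(Platform(dev)) for dev in (ChipADevice(), ChipBDevice())]
```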

[0079] In some embodiments, the second interface set includes a model inference request interface, configured to, in response to a model inference request from the application layer, obtain model input data and model input description information provided by the application layer; encapsulate the model input data and model input description information according to the specification of the model inference request interface of the device layer dynamic library; and use the encapsulated data to call the model inference request interface of the device layer dynamic library. Correspondingly, the first interface set includes a model inference request interface, configured to, in response to a call from the platform layer dynamic library, decapsulate the encapsulated data; and use the decapsulated data to call the hardware-related interfaces of the chip platform for model inference.

[0080] In some embodiments, the second interface set includes an inference result acquisition interface, configured to, in response to an application-layer inference result acquisition request, invoke the inference result acquisition interface of the device-layer dynamic library; and return the inference result to the application layer. Correspondingly, the first interface set includes an inference result acquisition interface, configured to, in response to a call from the platform-layer dynamic library, invoke the hardware-related interfaces of the chip platform to acquire the inference result; package the acquired inference result; and return the packaged inference result to the platform-layer dynamic library.

[0081] In some embodiments, the second interface set includes a model loading interface, configured to invoke the first interface set of the device layer dynamic library to load the neural network model in response to a model loading request from the application layer. Correspondingly, the first interface set includes a model loading interface, configured to invoke the hardware-related interfaces of the chip platform to load the neural network model in response to a call from the platform layer dynamic library.

[0082] Figure 3 shows a flowchart of an inference method for a neural network model according to an exemplary embodiment of the present disclosure. As shown in Figure 3, the method includes steps S301 to S303.

[0083] Step S301: Load the pre-deployed device layer dynamic library.

[0084] In this embodiment, the neural network model is deployed using the neural network model deployment method of this disclosure.

[0085] Step S302: When the application layer uses the neural network model, the second set of interfaces of the platform layer dynamic library responds to the application layer's call and calls the first set of interfaces of the device layer dynamic library.

[0086] Step S303: The first set of interfaces of the device layer dynamic library responds to the call from the platform layer dynamic library and calls the hardware-related interfaces of the chip platform.
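The call order in steps S301 to S303 can be made visible with a small trace, as in this Python sketch. `load_device_library` merely constructs an object where a real system would load a shared library from disk, and `chip_run` is a hypothetical hardware-related interface.

```python
trace: list[str] = []


def chip_run(data: list) -> list:
    """Hypothetical hardware-related interface of the chip platform."""
    trace.append("hardware")
    return [2 * x for x in data]


class DeviceLibrary:
    """Device layer: its interface responds to the platform layer (S303)."""
    def forward(self, data: list) -> list:
        trace.append("device layer")
        return chip_run(data)


def load_device_library() -> DeviceLibrary:
    """S301 stand-in; a real system would load a pre-deployed shared library."""
    return DeviceLibrary()


class PlatformLibrary:
    """Platform layer: its interface responds to the application layer (S302)."""
    def __init__(self) -> None:
        self._device = load_device_library()

    def forward(self, data: list) -> list:
        trace.append("platform layer")
        return self._device.forward(data)


# Application layer issues a single call; the layers below fan it out in order.
output = PlatformLibrary().forward([1, 2, 3])
```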

[0087] In some embodiments, the use of a neural network model includes processes such as model loading, model inference, inference result acquisition, model structure display, and model unloading. These processes are described below.

[0088] Model inference includes:

[0089] The model inference request interface in the second interface set responds to the model inference request from the application layer and obtains the model input data and model input description information provided by the application layer.

[0090] The model inference request interface in the second interface set encapsulates the model input data and model input description information according to the specification of the model inference request interface of the device layer dynamic library;

[0091] The model inference request interface in the second set of interfaces uses the encapsulated data to call the model inference request interface of the device layer dynamic library;

[0092] The model inference request interface in the first interface set responds to the call of the platform layer dynamic library and decapsulates the encapsulated data;

[0093] The model inference request interface in the first interface set uses the decapsulated data to call the hardware-related interfaces of the chip platform to perform model inference.

[0094] Inference result retrieval includes:

[0095] The inference result retrieval interface in the second interface set responds to the inference result retrieval request from the application layer and calls the inference result retrieval interface of the device layer dynamic library;

[0096] The inference result retrieval interface in the first interface set responds to the call from the platform layer dynamic library and calls the hardware-related interfaces of the chip platform to obtain the inference result;

[0097] The inference result retrieval interface in the first interface set packages the retrieved inference results;

[0098] The inference result retrieval interface in the first interface set returns the packaged inference result to the platform layer dynamic library;

[0099] The inference result retrieval interface in the second interface set returns the inference result to the application layer.
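The return path in [0095] to [0099] can be sketched in the same way. The struct layouts are again hypothetical (the source does not define `ModelOutput` or `SModelOutput`), and the `chip_output` member merely stands in for output memory that a real vendor SDK would own.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical handle; chip_output stands in for vendor-owned output memory.
struct HiwkModel { std::vector<float> chip_output; };

// Hypothetical device-layer package: raw bytes plus an element count.
struct ModelOutput {
    std::vector<unsigned char> bytes;
    std::size_t elem_count = 0;
};

// Hypothetical platform-layer result handed back to the application layer.
struct SModelOutput { std::vector<float> values; };

// Device layer: fetch the result from the chip and package it.
int SocGetModelOutput(HiwkModel* pstModel, ModelOutput* pstOutputs) {
    if (!pstModel || !pstOutputs) return -1;
    const std::vector<float>& src = pstModel->chip_output;
    pstOutputs->elem_count = src.size();
    pstOutputs->bytes.resize(src.size() * sizeof(float));
    if (!src.empty()) {
        std::memcpy(pstOutputs->bytes.data(), src.data(), pstOutputs->bytes.size());
    }
    return 0;
}

// Platform layer: call the device layer, unpack, and return to the caller.
int PlatformGetModelOutput(HiwkModel* model, SModelOutput* out) {
    if (!out) return -1;
    ModelOutput packed;
    const int rc = SocGetModelOutput(model, &packed);
    if (rc != 0) return rc;
    out->values.resize(packed.elem_count);
    if (packed.elem_count != 0) {
        std::memcpy(out->values.data(), packed.bytes.data(), packed.bytes.size());
    }
    return 0;
}
```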

[0100] Model loading includes:

[0101] The model loading interface in the second interface set responds to the application layer's model loading request and calls the first interface set of the device layer dynamic library to load the neural network model.

[0102] The model loading interface in the first interface set responds to the call from the platform layer dynamic library and calls the hardware-related interfaces of the chip platform to load the neural network model.

[0103] Model unloading includes:

[0104] The model unloading interface in the second interface set responds to the model unloading request from the application layer and calls the first interface set of the device layer dynamic library to unload the neural network model.

[0105] The model unloading interface in the first interface set responds to the call from the platform layer dynamic library and calls the hardware-related interfaces of the chip platform to unload the neural network model.

[0106] Model structure display includes:

[0107] The model structure display interface in the second interface set responds to the application layer's model structure display request by calling the first interface set of the device layer dynamic library to display the model structure of the neural network model;

[0108] The model structure display interface in the first interface set responds to the call from the platform layer dynamic library and calls the hardware-related interfaces of the chip platform to display the model structure of the neural network model.

[0109] The following example illustrates an embodiment of this application.

[0110] As shown in Figure 4, this example includes an application layer, a platform layer, and a device layer.

[0111] The Device Layer is used to interface with the Application Programming Interfaces (APIs) provided by different chip manufacturers. It encapsulates the interfaces according to the specifications and provides them to the platform layer for use.

[0112] For the device layer, an example API is as follows:

[0113] The API definition for model loading and initialization is:

[0114] int SocLoadModel(HiwkModel **pstModel, const char *pModelFilePath);

[0115] The API definition for displaying model structure information is:

[0116] int SocShowModelInfo(HiwkModel *pstModel);

[0117] The API definition for model inference is:

[0118] int SocModelForward(HiwkModel *pstModel, ModelInput *pstInputs);

[0119] The API definition for obtaining the model inference output is:

[0120] int SocGetModelOutput(HiwkModel *pstModel, ModelOutput *pstOutputs);

[0121] The API definition for unloading the model and releasing its resources is:

[0122] int SocModelDestroy(HiwkModel *pstModel);
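To make the handle lifecycle behind `SocLoadModel` and `SocModelDestroy` concrete, the following is a minimal stub of a device-layer backend. It is a sketch under assumptions, not the patented implementation: the `HiwkModel` fields and the return codes are hypothetical, no file is actually opened, and `extern "C"` is used only on the assumption that the symbols are meant to be looked up by name from a shared library.

```cpp
#include <new>
#include <string>

// Hypothetical handle layout; the source treats HiwkModel as opaque.
struct HiwkModel {
    std::string path;  // where the model was loaded from
    bool loaded;       // set once the (stubbed) vendor load has succeeded
};

// The double pointer lets the device layer allocate the handle and hand it back.
extern "C" int SocLoadModel(HiwkModel** pstModel, const char* pModelFilePath) {
    if (!pstModel || !pModelFilePath || pModelFilePath[0] == '\0') return -1;
    // A real backend would call the chip vendor's model-loading API here.
    HiwkModel* model = new (std::nothrow) HiwkModel{pModelFilePath, true};
    if (!model) return -2;
    *pstModel = model;
    return 0;
}

// Releases everything SocLoadModel allocated.
extern "C" int SocModelDestroy(HiwkModel* pstModel) {
    if (!pstModel) return -1;
    // A real backend would release vendor-side resources before freeing the handle.
    delete pstModel;
    return 0;
}
```

Each chip platform would supply its own bodies for these functions while keeping the signatures identical, which is what allows the platform layer to stay unchanged.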

[0123] Each chip platform compiles its device layer dynamic library in its own compilation environment, and the resulting files follow a unified naming rule. For example, chip platforms A, B, and C correspond to libsw_A_deploy.so, libsw_B_deploy.so, and libsw_C_deploy.so, respectively. Because the device layer provides a unified API, it absorbs the manufacturer-specific deployment API details and the hardware differences between chip manufacturers.
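The naming rule in the example above can be expressed as a small helper. The function name `DeviceLibraryName` is hypothetical; only the `libsw_<platform>_deploy.so` pattern comes from the text. On a POSIX system the resulting name could then be passed to a loader such as `dlopen`, although the source does not say which loading mechanism is used.

```cpp
#include <string>

// Builds the device-layer library file name for a chip platform identifier,
// following the pattern libsw_<platform>_deploy.so from the example.
// Returns an empty string for an empty identifier.
std::string DeviceLibraryName(const std::string& chipPlatform) {
    if (chipPlatform.empty()) return std::string();
    return "libsw_" + chipPlatform + "_deploy.so";
}
```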

[0124] The platform layer is used to shield the differences between different chip SDKs through a unified interface, and to provide the application layer with basic capabilities such as device model loading, inference and unloading.

[0125] For the platform layer, an example API is as follows:

[0126] The API definition for model loading and initialization is:

[0127] int SWModelEngine::LoadModel(std::string modelFilePath);

[0128] The API definition for displaying model structure information is:

[0129] int SWModelEngine::ShowModelInfo();

[0130] The API definition for model inference is:

[0131] int SWModelEngine::ModelForward(SModelInput stInputs);

[0132] The API definition for obtaining the model inference output is:

[0133] int SWModelEngine::GetModelOutput(SModelOutput *pstOutputs);

[0134] The API definition for unloading the model and releasing its resources is:

[0135] int SWModelEngine::ModelDestroy();

[0136] The platform layer selects and loads the corresponding chip platform's device layer dynamic library based on the type of the model object, while providing a unified API to the upper layer. This shields the differences between different device libraries, so that the application layer does not need to care about the underlying implementation details, reducing the learning cost and development workload for developers.
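One way to picture the selection described in [0136] is a table of device-layer entry points chosen per chip platform. The sketch below is an assumption-laden illustration: `DeviceApi`, `FindDeviceApi`, the `chip_a` and `chip_b` stubs, the `handle()` accessor, and the return codes are all hypothetical. In an actual build the function pointers would presumably be resolved from the device layer dynamic library rather than from an in-process table.

```cpp
#include <map>
#include <string>

struct HiwkModel { std::string backend; std::string path; };  // hypothetical layout

// Entry points the platform layer needs from a device-layer library.
struct DeviceApi {
    int (*load)(HiwkModel**, const char*);
    int (*destroy)(HiwkModel*);
};

// Two stub backends standing in for two chip platforms' device layers.
namespace chip_a {
int Load(HiwkModel** m, const char* p) { *m = new HiwkModel{"A", p}; return 0; }
int Destroy(HiwkModel* m) { delete m; return 0; }
}
namespace chip_b {
int Load(HiwkModel** m, const char* p) { *m = new HiwkModel{"B", p}; return 0; }
int Destroy(HiwkModel* m) { delete m; return 0; }
}

// Selects the device-layer API for a chip platform; nullptr if unsupported.
const DeviceApi* FindDeviceApi(const std::string& chip) {
    static const std::map<std::string, DeviceApi> table = {
        {"A", {chip_a::Load, chip_a::Destroy}},
        {"B", {chip_b::Load, chip_b::Destroy}},
    };
    const auto it = table.find(chip);
    return it == table.end() ? nullptr : &it->second;
}

// Platform layer: the same calls for the application regardless of chip platform.
class SWModelEngine {
public:
    explicit SWModelEngine(const std::string& chip) : api_(FindDeviceApi(chip)) {}
    ~SWModelEngine() { ModelDestroy(); }
    SWModelEngine(const SWModelEngine&) = delete;
    SWModelEngine& operator=(const SWModelEngine&) = delete;

    int LoadModel(std::string modelFilePath) {
        if (!api_) return -1;   // no device layer for this platform
        if (model_) return -2;  // a model is already loaded
        return api_->load(&model_, modelFilePath.c_str());
    }
    int ModelDestroy() {
        if (!api_ || !model_) return -1;
        const int rc = api_->destroy(model_);
        model_ = nullptr;
        return rc;
    }
    const HiwkModel* handle() const { return model_; }  // for inspection only

private:
    const DeviceApi* api_;
    HiwkModel* model_ = nullptr;
};
```

The application code is identical for platform A and platform B; only the string used to construct the engine differs.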

[0137] The Application Layer obtains the model's inference results through the unified interface of the platform layer, parses the binary data of the inference results, and outputs structured information that humans can understand.

[0138] The application layer creates corresponding model objects based on the needs of the actual task, and then calls the unified API provided by the platform layer to implement the deployment of models on multiple chip platforms.

[0139] In some examples, the application layer is also used for model algorithm parsing tasks. After obtaining the output results of model inference through the platform layer's API, the obtained data is parsed according to the specific model structure and the relevant algorithms used. Only after the parsing is completed can the final prediction result be obtained.
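As an illustration of the parsing step, the sketch below assumes a classification model whose raw output is a contiguous array of 32-bit float scores, one per class. The source does not specify any output format, so both that layout and the `Prediction` and `ParseClassification` names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Structured result produced by the application layer (hypothetical).
struct Prediction {
    int   class_id;  // index of the highest-scoring class
    float score;     // its score
};

// Interprets raw output bytes as float scores and picks the best class.
// Returns false if the buffer is missing, empty, or not a whole number of floats.
bool ParseClassification(const unsigned char* bytes, std::size_t byteCount,
                         Prediction* out) {
    if (!bytes || !out || byteCount == 0 || byteCount % sizeof(float) != 0) {
        return false;
    }
    const std::size_t n = byteCount / sizeof(float);
    std::vector<float> scores(n);
    std::memcpy(scores.data(), bytes, byteCount);  // avoids unaligned float reads
    std::size_t best = 0;
    for (std::size_t i = 1; i < n; ++i) {
        if (scores[i] > scores[best]) best = i;
    }
    out->class_id = static_cast<int>(best);
    out->score = scores[best];
    return true;
}
```

A detection or segmentation model would need a different parser, which is why this step stays in the application layer rather than in the platform layer.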

[0140] This example demonstrates that developers only need to connect to the relevant APIs at the platform layer to deploy the edge model on multiple chip platforms. The overall process is simplified, the workload is greatly reduced, and the learning difficulty and cost are also relatively lowered.

[0141] Exemplary embodiments of this disclosure also provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to cause the electronic device to perform a method according to an embodiment of this disclosure.

[0142] Exemplary embodiments of this disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a computer's processor, is used to cause the computer to perform a method according to embodiments of this disclosure.

[0143] Exemplary embodiments of this disclosure also provide a computer program product, including a computer program, wherein, when executed by a processor of a computer, the computer program is used to cause the computer to perform a method according to an embodiment of this disclosure.

[0144] Referring to Figure 5, a structural block diagram of an electronic device 500 that can serve as a server or client of the present disclosure will now be described. It is an example of a hardware device to which various aspects of the present disclosure can be applied. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the present disclosure described and / or claimed herein.

[0145] As shown in Figure 5, the electronic device 500 includes a computing unit 501, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 may also store various programs and data required for the operation of the electronic device 500. The computing unit 501, ROM 502, and RAM 503 are interconnected via a bus 504. An input / output (I / O) interface 505 is also connected to the bus 504.

[0146] Multiple components in electronic device 500 are connected to I / O interface 505, including: input unit 506, output unit 507, storage unit 508, and communication unit 509. Input unit 506 can be any type of device capable of inputting information to electronic device 500. Input unit 506 can receive input digital or character information and generate key signal inputs related to user settings and / or function control of electronic device. Output unit 507 can be any type of device capable of presenting information and may include, but is not limited to, a display, speaker, video / audio output terminal, vibrator, and / or printer. Storage unit 508 may include, but is not limited to, disk and optical disk. Communication unit 509 allows electronic device 500 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers, and / or chipsets, such as Bluetooth devices, WiFi devices, WiMax devices, cellular communication devices, and / or the like.

[0147] The computing unit 501 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods and processes described above. For example, in some embodiments, the deployment and inference methods of a neural network model can be implemented as a computer software program tangibly contained in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program can be loaded and / or installed on the electronic device 500 via ROM 502 and / or communication unit 509. In some embodiments, the computing unit 501 can be configured to perform the deployment and inference methods of a neural network model by any other suitable means (e.g., by means of firmware).

[0148] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0149] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0150] As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device, and / or apparatus (e.g., disk, optical disk, memory, programmable logic device (PLD)) for providing machine instructions and / or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal for providing machine instructions and / or data to a programmable processor.

[0151] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0152] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with embodiments of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

[0153] Computer systems can include clients and servers. Clients and servers are generally located far apart and typically interact through communication networks. Client-server relationships are created by computer programs running on the respective computers and having a client-server relationship with each other.

Claims

1. A method for deploying a neural network model, characterized in that the method comprises:
determining the chip platform corresponding to the neural network model;
compiling a platform layer dynamic library and a device layer dynamic library corresponding to the chip platform, wherein the device layer dynamic library is configured to provide a first set of interfaces to the platform layer dynamic library, and the first set of interfaces is configured to call the hardware-related interfaces of the chip platform; the platform layer dynamic library is configured to provide a second set of interfaces to the application layer, and the second set of interfaces is configured to call the first set of interfaces of the device layer dynamic library; and
deploying the neural network model, the platform layer dynamic library, and the device layer dynamic library to the chip platform;
wherein:
the second set of interfaces includes a model inference request interface configured to: in response to a model inference request from the application layer, obtain model input data and model input description information provided by the application layer; encapsulate the model input data and model input description information according to the specification of the model inference request interface of the device layer dynamic library; and use the encapsulated data to call the model inference request interface of the device layer dynamic library; and
the first set of interfaces includes a model inference request interface configured to: decapsulate the encapsulated data in response to a call from the platform layer dynamic library; and use the decapsulated data to call the hardware-related interfaces of the chip platform to perform model inference.

2. The method according to claim 1, characterized in that:
the second set of interfaces includes an inference result retrieval interface configured to: in response to an inference result retrieval request from the application layer, call the inference result retrieval interface of the device layer dynamic library; and return the inference result to the application layer; and
the first set of interfaces includes an inference result retrieval interface configured to: in response to a call from the platform layer dynamic library, call the hardware-related interfaces of the chip platform to obtain the inference result; package the obtained inference result; and return the packaged inference result to the platform layer dynamic library.

3. The method according to claim 1, characterized in that:
the second set of interfaces includes a model loading interface configured to call, in response to a model loading request from the application layer, the first set of interfaces of the device layer dynamic library to load the neural network model; and
the first set of interfaces includes a model loading interface configured to call, in response to a call from the platform layer dynamic library, the hardware-related interfaces of the chip platform to load the neural network model.

4. An inference method for a neural network model, characterized in that the method comprises:
loading the pre-deployed device layer dynamic library;
when a neural network model is used at the application layer, the second set of interfaces of the platform layer dynamic library responding to the call from the application layer and calling the first set of interfaces of the device layer dynamic library; and
the first set of interfaces of the device layer dynamic library responding to the call from the platform layer dynamic library and calling the hardware-related interfaces of the chip platform;
wherein the second set of interfaces of the platform layer dynamic library responding to the call from the application layer and calling the first set of interfaces of the device layer dynamic library includes:
the model inference request interface in the second interface set responding to a model inference request from the application layer and obtaining the model input data and model input description information provided by the application layer;
the model inference request interface in the second interface set encapsulating the model input data and the model input description information according to the specification of the model inference request interface of the device layer dynamic library; and
the model inference request interface in the second interface set using the encapsulated data to call the model inference request interface of the device layer dynamic library;
and wherein the first set of interfaces of the device layer dynamic library responding to the call from the platform layer dynamic library and calling the hardware-related interfaces of the chip platform includes:
the model inference request interface in the first interface set responding to the call from the platform layer dynamic library and decapsulating the encapsulated data; and
the model inference request interface in the first interface set using the decapsulated data to call the hardware-related interfaces of the chip platform to perform model inference.

5. The method according to claim 4, characterized in that:
the second set of interfaces of the platform layer dynamic library responding to the call from the application layer and calling the first set of interfaces of the device layer dynamic library includes:
the inference result retrieval interface in the second interface set responding to an inference result retrieval request from the application layer and calling the inference result retrieval interface of the device layer dynamic library;
the first set of interfaces of the device layer dynamic library responding to the call from the platform layer dynamic library and calling the hardware-related interfaces of the chip platform includes:
the inference result retrieval interface in the first interface set responding to the call from the platform layer dynamic library and calling the hardware-related interfaces of the chip platform to obtain the inference result;
the inference result retrieval interface in the first interface set packaging the obtained inference result; and
the inference result retrieval interface in the first interface set returning the packaged inference result to the platform layer dynamic library;
and the method further includes: the inference result retrieval interface in the second interface set returning the inference result to the application layer.

6. The method according to claim 4, characterized in that:
the second set of interfaces of the platform layer dynamic library responding to the call from the application layer and calling the first set of interfaces of the device layer dynamic library includes:
the model loading interface in the second interface set responding to a model loading request from the application layer and calling the first set of interfaces of the device layer dynamic library to load the neural network model; and
the first set of interfaces of the device layer dynamic library responding to the call from the platform layer dynamic library and calling the hardware-related interfaces of the chip platform includes:
the model loading interface in the first interface set responding to the call from the platform layer dynamic library and calling the hardware-related interfaces of the chip platform to load the neural network model.

7. An electronic device, comprising: a processor; and a memory storing a program, wherein the program includes instructions that, when executed by the processor, cause the processor to perform the method according to any one of claims 1-6.

8. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-6.

Citation Information

Patent Citations

CN113298259A: CNN (Convolutional Neural Network) inference framework design method supporting multi-core parallelism of embedded platform
CN114168186A: Embedded artificial intelligence implementation method for inference deployment and hardware platform
