Inference method and device of NVDLA software stack, equipment and storage medium

By receiving inference datasets and calling pre-stored algorithms in the NVDLA software stack, and performing inference in a preset order, the problem of slow inference speed for single sets of data in existing technologies is solved, achieving more efficient data processing.

CN115249068B · Active · Publication Date: 2026-06-19 · PENG CHENG LAB

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
PENG CHENG LAB
Filing Date
2022-08-11
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

The existing NVDLA software stack has a low average inference speed for a single set of data when performing inference on multiple sets of data, and there are lengthy recognition steps.

Method used

By receiving the inference dataset and calling pre-stored algorithms from the preset algorithm library, the inference data is inferred sequentially in combination with the preset inference order. The algorithm library is a combination of pre-stored algorithms based on the preset model algorithms after segmentation, association, and format conversion.

Benefits of technology

This reduces the processing time of the algorithm and improves the average inference speed of a single set of data.

✦ Generated by Eureka AI based on patent content.

Abstract

This application discloses an inference method, apparatus, device, and storage medium for the NVDLA software stack. The method includes: receiving an inference dataset and calling, from a preset algorithm library, a pre-stored algorithm for performing inference on the inference dataset, wherein the algorithm library is a combination of pre-stored algorithms obtained from preset model algorithms after segmentation, association, and format conversion; and performing inference sequentially on the inference data in the inference dataset based on the pre-stored algorithms and a preset inference order. In this application, the processed model algorithms are saved to obtain an algorithm library. When inference is performed on the inference dataset, the pre-stored algorithms in the algorithm library are called directly to perform inference sequentially on the inference data in the inference dataset, which reduces the processing time of the algorithms and improves the average inference speed of a single set of data when performing inference on multiple sets of data.

Description

Technical Field

[0001] This application relates to the field of deep learning model technology, and in particular to an inference method, apparatus, device, and storage medium for an NVDLA software stack.

Background Technology

[0002] As deep learning technology becomes more widely used, its application in artificial intelligence technology is also becoming more and more common. Deep learning technology is mainly divided into training models and deployment models. Through deep learning models, artificial intelligence can have the ability to analyze and learn like humans, and can solve many complex pattern recognition problems.

[0003] To accelerate deep learning, NVDLA (NVIDIA Deep Learning Accelerator) has emerged. NVDLA is a free and open architecture that promotes a standard way of designing deep learning inference accelerators. The NVDLA software stack enables the rapid and efficient deployment of deep learning models on embedded devices. However, the NVDLA software stack currently has lengthy recognition steps when performing inference on recognition data, which reduces the average inference speed of a single set of data when performing inference on multiple sets of data.

Summary of the Invention

[0004] The main objective of this application is to provide an inference method, apparatus, device, and storage medium for the NVDLA software stack, aiming to solve the technical problem that the existing NVDLA software stack has lengthy recognition steps when performing inference on recognition data, which reduces the average inference speed of a single set of data when performing inference on multiple sets of data.

[0005] To achieve the above objectives, this application provides an inference method for the NVDLA software stack, the inference method for the NVDLA software stack comprising:

[0006] Receive the inference dataset and call a pre-stored algorithm from a preset algorithm library to infer the inference dataset;

[0007] The algorithm library is a combination of pre-stored algorithms based on preset model algorithms after being segmented, associated, and format-converted;

[0008] Based on the pre-stored algorithm and combined with the preset inference order, inference is performed sequentially on the inference data in the inference dataset.

[0009] Optionally, before the step of receiving the inference dataset and calling a pre-stored algorithm for inference on the inference dataset from a preset algorithm library, the method includes:

[0010] Receive the inference dataset and perform step segmentation on the model algorithm to obtain an NPU-type neural algorithm;

[0011] The neural algorithm and the preset execution module are associated to obtain the association algorithm between the neural algorithm and the execution module;

[0012] Based on the application scenario of the inference data, the association algorithm is converted to obtain the corresponding format algorithm;

[0013] Based on the preset information segments, the format algorithm is encapsulated and saved to obtain the pre-stored algorithm, and the pre-stored algorithm is combined into an algorithm library.

[0014] Optionally, the step of encapsulating and saving the format algorithm based on a preset information segment to obtain a pre-stored algorithm includes:

[0015] Based on a preset information segment, the format algorithm is classified to determine the field algorithm in the format algorithm that corresponds to the information segment;

[0016] The field algorithm is encapsulated and saved to obtain the pre-stored algorithm.

[0017] Optionally, before the step of converting the association algorithm to a corresponding format algorithm based on the application scenario of the inference data, the method includes:

[0018] The application scenarios for the association algorithm are determined;

[0019] Based on the application scenario, the format that the association algorithm needs to convert is determined.

[0020] Optionally, the step of sequentially performing inference on the inference data in the inference dataset based on a pre-stored algorithm and a preset inference order includes:

[0021] A save check is performed on the pre-stored algorithm;

[0022] If the pre-stored algorithm is not saved, the neural algorithm and the preset execution module are recompiled until the pre-stored algorithm is saved.

[0023] After the pre-stored algorithm is saved, it is called to perform inference on the inference data in the inference dataset in sequence.

[0024] Optionally, the step of calling the pre-stored algorithm after saving it, and sequentially performing inference on the inference data in the inference dataset, includes:

[0025] After the pre-stored algorithm is saved, the pre-stored algorithm is called again.

[0026] The pre-stored algorithm is parsed to obtain the identification information in the pre-stored algorithm;

[0027] Based on the identification information, inference is performed sequentially on the inference data in the inference dataset.

[0028] Optionally, before the step of sequentially performing inference on the inference data in the inference dataset according to a preset inference order, the method includes:

[0029] Extract the reception time of the inference data;

[0030] Based on the receiving time, the inference data is sorted to obtain the inference order.
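
As a minimal sketch of the ordering step in [0029] and [0030], the following Python sorts inference items by reception time to obtain the inference order. The `InferenceItem` type and its field names are hypothetical and are introduced only for illustration; the text does not specify a data structure.

```python
from dataclasses import dataclass


@dataclass
class InferenceItem:
    """Hypothetical container for one piece of inference data."""
    name: str
    received_at: float  # reception time, e.g. seconds since the epoch


def inference_order(items):
    """Return the items sorted by reception time, earliest first."""
    # sorted() is stable, so items with equal timestamps keep their
    # original relative order.
    return sorted(items, key=lambda item: item.received_at)


items = [
    InferenceItem("img_c", 3.0),
    InferenceItem("img_a", 1.0),
    InferenceItem("img_b", 2.0),
]
ordered_names = [item.name for item in inference_order(items)]
```

Here `ordered_names` lists the items from earliest to latest reception time, which is the order in which inference would then be performed.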

[0031] This application also provides an inference apparatus for an NVDLA software stack, the inference apparatus for the NVDLA software stack comprising:

[0032] The calling module is used to receive the inference dataset and call a pre-stored algorithm from a preset algorithm library to infer the inference dataset;

[0033] The algorithm library is a combination of pre-stored algorithms based on preset model algorithms after being segmented, associated, and format-converted;

[0034] The inference module is used to perform inference on the inference data in the inference dataset sequentially based on a pre-stored algorithm and a preset inference order.

[0035] This application also provides an inference device for an NVDLA software stack, wherein the NVDLA software stack inference device is a physical node device, and the NVDLA software stack inference device includes: a memory, a processor, and a program for the NVDLA software stack inference method stored in the memory and executable on the processor. When the program for the NVDLA software stack inference method is executed by the processor, it can implement the steps of the NVDLA software stack inference method as described above.

[0036] This application also provides a storage medium storing a program that implements the inference method of the NVDLA software stack described above. When the program of the inference method of the NVDLA software stack is executed by a processor, it implements the steps of the inference method of the NVDLA software stack described above.

[0037] This application provides an inference method, apparatus, device, and storage medium for the NVDLA software stack. In the prior art, the NVDLA software stack has lengthy recognition steps when performing inference on recognition data, which reduces the average inference speed of a single set of data when performing inference on multiple sets of data. By contrast, in this application, an inference dataset is received, and a pre-stored algorithm for performing inference on the dataset is called from a preset algorithm library, wherein the algorithm library is a combination of pre-stored algorithms obtained from preset model algorithms after segmentation, association, and format conversion; based on the pre-stored algorithms and a preset inference order, inference is performed sequentially on the inference data in the inference dataset. In other words, after the inference dataset is received, a pre-stored algorithm is called directly from the algorithm library obtained by segmenting, associating, and format-converting a preset model algorithm, and inference is then performed sequentially on the inference data in the preset inference order. That is, the processed model algorithm is saved to obtain an algorithm library, and when inference is performed on the inference dataset, the pre-stored algorithm in the algorithm library is called directly, which reduces the processing time of the algorithm and improves the average inference speed of a single set of data when performing inference on multiple sets of data.

Attached Figure Description

[0038] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0039] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0040] Figure 1 is a flowchart illustrating the first embodiment of the inference method for the NVDLA software stack of this application;

[0041] Figure 2 is a schematic diagram of the module flow of the first embodiment of the inference method of the NVDLA software stack of this application;

[0042] Figure 3 is a schematic diagram of the device structure of the hardware operating environment involved in the embodiments of this application;

[0043] Figure 4 is a schematic diagram of a preset information sub-segment in the embodiments of this application;

[0044] Figure 5 is a schematic diagram of the encapsulation and storage process of the association algorithm in the embodiments of this application;

[0045] Figure 6 is a schematic diagram of the reading process of the pre-stored algorithm in the embodiments of this application;

[0046] Figure 7 is a schematic diagram of the compilation process in the embodiments of this application.

[0047] The purpose, features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0048] It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to limit this application.

[0049] This application provides an inference method for the NVDLA software stack. In the first embodiment of the inference method for the NVDLA software stack of this application, referring to Figure 1, the inference method of the NVDLA software stack includes:

[0050] Step S10: Receive the inference dataset and call a pre-stored algorithm from the preset algorithm library to infer the inference dataset;

[0051] The algorithm library is a combination of pre-stored algorithms based on preset model algorithms after being segmented, associated, and format-converted;

[0052] Step S20: Based on the pre-stored algorithm and combined with the preset inference order, inference is performed sequentially on the inference data in the inference dataset.

[0053] This embodiment aims to reduce the number of recognition steps in the NVDLA software stack when performing inference on the recognition data, thereby improving the average inference speed of a single set of data when performing inference on multiple sets of data.

[0054] In this embodiment, it should be noted that the inference method of the NVDLA software stack can be applied to the inference device of the NVDLA software stack, which is subordinate to the inference equipment of the NVDLA software stack, and the inference equipment of the NVDLA software stack is subordinate to the inference system of the NVDLA software stack.

[0055] NVDLA, a deep learning accelerator, is a free and open architecture that promotes a standard approach to designing deep learning inference accelerators. With its modular architecture, NVDLA is scalable, highly configurable, and designed to simplify integration and portability.

[0056] In this embodiment, the NVDLA software stack is based on Tengine for inference. Tengine enables the rapid and efficient deployment of deep learning neural network models on embedded devices.

[0057] The software stack can be a set of components that provide different services from top to bottom. For example, a VM (Virtual Machine) consists of components from bottom to top: an OS (Operating System), a hypervisor, the VM itself, and the software running on the VM.

[0058] In this embodiment, the inference data can be images, text data, video data, etc., and there is no specific limitation. In this embodiment, images are used as an example for specific illustration.

[0059] In this embodiment, refer to Figure 2, which is a schematic diagram of the module flow of the first embodiment of the inference method of the NVDLA software stack of this application.

[0060] The specific steps are as follows:

[0061] Step S10: Receive the inference dataset and call a pre-stored algorithm from a preset algorithm library to infer the inference dataset.

[0062] The inference dataset corresponds to the image set in Figure 2. The image set must contain at least one image; that is, the inference dataset must contain at least one item of inference data.

[0063] The algorithm library is a combination of pre-stored algorithms based on preset model algorithms after being segmented, associated, and formatted.

[0064] In this embodiment, when inference is performed sequentially on the inference data in the inference dataset, it is only necessary to call the pre-stored algorithm from the algorithm library. This avoids repeating the segmentation, association, and format conversion of the model algorithm for each item of inference data.

[0065] Step S20: Based on the pre-stored algorithm and combined with the preset inference order, inference is performed sequentially on the inference data in the inference dataset.

[0066] In this embodiment, inference is performed on the images in the image set sequentially, in the preset inference order, using the pre-stored algorithm. It should be noted that because the inference result for each image needs to be confirmed after inference, there is a time interval between the inference of any two consecutive images. During this interval, the image inference execution module enters standby mode. When inference is performed on the next image, the pre-stored algorithm needs to be retrieved again from the algorithm library.

[0067] In this embodiment, when inference is performed on the first image, the reception of the image and the processing of the model algorithm can be carried out simultaneously. The neural algorithm in the model algorithm is converted into a pre-stored algorithm before inference is performed on the first image. That is, inference on the first image executes steps "1", "2", "3", "4", "+1", "5", "6", "7", "8", and "9" in Figure 2. When inference is performed on the second and subsequent images, the pre-stored algorithm can be invoked directly; that is, only steps "+2", "5", "6", "7", "8", and "9" in Figure 2 are executed. This reduces the number of steps and improves the average inference speed of a single image.
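
The saving described in [0067] can be illustrated with simple arithmetic. The sketch below counts Figure 2 steps per image under the assumption that every numbered step counts as one unit; the text gives no timings, so this shows only the step count, not a measured speed-up. The function name is hypothetical.

```python
# Figure 2 step labels as listed in the text.
FIRST_IMAGE_STEPS = ["1", "2", "3", "4", "+1", "5", "6", "7", "8", "9"]
LATER_IMAGE_STEPS = ["+2", "5", "6", "7", "8", "9"]


def average_steps_per_image(num_images):
    """Average number of Figure 2 steps executed per image."""
    if num_images < 1:
        raise ValueError("the image set must contain at least one image")
    # The first image pays for processing the model algorithm; every later
    # image only calls the pre-stored algorithm and runs inference.
    total = len(FIRST_IMAGE_STEPS) + (num_images - 1) * len(LATER_IMAGE_STEPS)
    return total / num_images


avg_1 = average_steps_per_image(1)      # 10 steps for a single image
avg_100 = average_steps_per_image(100)  # (10 + 99 * 6) / 100 = 6.04
```

As the image set grows, the average approaches the six steps of a subsequent image, which is the sense in which the average inference speed of a single image improves.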

[0068] In this embodiment, refer to Figure 6, which is a schematic diagram of the reading process of the pre-stored algorithm in the embodiments of this application. The application scenario is determined by the user. If the user selects the batch image processing scenario, the memory-format pre-stored algorithm is called when inference is performed sequentially on the images in the image set, and the pre-stored algorithm needs to be called once for each image. If the user selects the real-time image acquisition scenario, the file-format pre-stored algorithm is called when inference is performed on the acquired images.

[0069] In this embodiment, the model algorithm can also be processed first to obtain a pre-stored algorithm, and inference can then be performed on the images. In that case, inference on the first image also executes only steps “+2”, “5”, “6”, “7”, “8”, and “9” in Figure 2.

[0070] In this embodiment, step "5" in Figure 2 refers to receiving image data. After the image data is received, it must be standardized, including the size, mean value, and other properties of the images in the image set, so that the pre-stored algorithm can perform inference on the images.
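
A minimal sketch of such standardization, assuming a grayscale image held as a list of rows, nearest-neighbor resizing, and subtraction of a fixed mean. The resize method, target size, and mean value are all assumptions made for illustration; the text does not state how the standardization is performed.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbor resize of a 2-D list of pixel values."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def standardize(image, out_h, out_w, mean):
    """Resize to a fixed shape, then subtract a fixed mean value."""
    resized = resize_nearest(image, out_h, out_w)
    return [[pixel - mean for pixel in row] for row in resized]


image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
# Hypothetical target: a 2 x 2 input with a mean of 10.
result = standardize(image, out_h=2, out_w=2, mean=10)
```

Every image in the set would pass through the same function, so the pre-stored algorithm always receives input of the same shape and value range.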

[0071] Specifically, before the step of receiving the inference dataset and calling a pre-stored algorithm for inference on the inference dataset from a preset algorithm library, the method includes:

[0072] Step S100: Receive the inference dataset and perform step segmentation on the model algorithm to obtain an NPU-type neural algorithm;

[0073] In this embodiment, before the model algorithm for the inference data is segmented, Tengine converts the model algorithm into a unified format supported by Tengine, such as the tmfile format. The converted model algorithm is then parsed by the Serializer to extract the network model parameters in the converted format, thereby extracting the general steps for the CPU (general-purpose processor), the graphics steps for the GPU (graphics processor), and the neural algorithm for the NPU (neural network processor).

[0074] Tengine supports model algorithms from various frameworks, including TensorFlow, PyTorch, Caffe, and Darknet. Therefore, it is necessary to unify the format of these model algorithms so that the Serializer can parse them accurately with less computation.

[0075] In this embodiment, the model algorithm is divided according to the processor type by the Scheduler to obtain general steps for CPU, graphics steps for GPU, and neural algorithms for NPU, and the model algorithms are allocated accordingly. The general steps are executed by the CPU, the graphics steps are executed by the GPU, and the neural algorithms are executed by the NPU.

[0076] It should be noted that NVDLA belongs to the NPU series, and it requires the use of neural algorithms in the model algorithm to perform inference on inference data.

[0077] It should be noted that the general steps, graphics steps, and neural algorithm are each a run of consecutive steps in the model algorithm. For example, if the model algorithm has 100 steps, the general steps can be steps 1 to 30, the graphics steps can be steps 31 to 60, and the neural algorithm can be steps 61 to 100.
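
The 100-step example can be sketched as follows. The function name and the `(processor, last_step)` input shape are hypothetical; the sketch only illustrates that each processor type receives a consecutive range of steps, and it is not the Scheduler's actual implementation.

```python
def split_by_processor(boundaries):
    """Map each processor to its consecutive, 1-based step range.

    `boundaries` is a list of (processor, last_step) pairs in execution order.
    """
    ranges = {}
    start = 1
    for processor, last_step in boundaries:
        if last_step < start:
            raise ValueError("step ranges must be consecutive and non-empty")
        ranges[processor] = range(start, last_step + 1)
        start = last_step + 1
    return ranges


# The example from the text: 100 steps split into 1-30, 31-60, and 61-100.
ranges = split_by_processor([("CPU", 30), ("GPU", 60), ("NPU", 100)])
npu_steps = ranges["NPU"]
```

Only the `NPU` range is handed on to the NVDLA compiler in the following step.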

[0078] Step S200: Associate the neural algorithm with the preset execution module to obtain the association algorithm between the neural algorithm and the execution module;

[0079] The association corresponds to the NVDLA compiler in Figure 2, which compiles the neural algorithm together with the preset execution module.

[0080] In this embodiment, refer to Figure 7, which is a schematic diagram of the compilation process in the embodiments of this application. After the Scheduler completes tasks such as splitting and allocating the model algorithm, the NPU transmits the neural algorithm to the NVDLA compiler (the compiler of the deep learning accelerator). The NVDLA compiler compiles the neural algorithm together with the preset execution module, binding each step of the neural algorithm to the corresponding execution module so that each step in the neural algorithm can be carried out by a corresponding execution module, thus obtaining the association algorithm.
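
The binding described in [0080] can be sketched as a lookup from each neural-algorithm step to an execution module. The op types, module names, and tuple layout below are hypothetical placeholders; the real NVDLA compiler output is considerably richer than this.

```python
def associate(neural_steps, execution_modules):
    """Bind every neural-algorithm step to the module that executes it.

    `neural_steps` is a list of (step_id, op_type) pairs; `execution_modules`
    maps an op type to a module name. Both shapes are assumptions.
    """
    association = []
    for step_id, op_type in neural_steps:
        if op_type not in execution_modules:
            raise KeyError(f"no execution module for op type {op_type!r}")
        association.append((step_id, op_type, execution_modules[op_type]))
    return association


# Hypothetical op types and module names, for illustration only.
modules = {"conv": "conv_unit", "pool": "pooling_unit"}
steps = [(61, "conv"), (62, "pool"), (63, "conv")]
association_algorithm = associate(steps, modules)
```

Failing loudly on an unbound step mirrors the requirement that every step in the neural algorithm has a corresponding execution module.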

[0081] Step S300: Based on the application scenario of the inference data, the association algorithm is format-converted to obtain the corresponding format algorithm;

[0082] Based on the application scenarios of the inference data, the association algorithm is saved to obtain a pre-stored algorithm;

[0083] In this embodiment, the application scenarios are divided into batch image processing and real-time image acquisition. In the batch image processing scenario, at least one image needs to be input at a time, that is, the input of an image set. In the real-time image acquisition scenario, one image is input at a time, that is, the input of a single image.

[0084] In this embodiment, refer to Figure 5, which illustrates the encapsulation and storage process of the association algorithm in the embodiments of this application. After compilation, the application scenario is determined. If the scenario is batch image processing, memory space is allocated and the association algorithm is temporarily stored in that memory space; in this scenario, the association algorithm is converted into a pre-stored algorithm in memory format. If the scenario is real-time image acquisition, disk space is allocated and the association algorithm is serialized using FlatBuffers; that is, the result encapsulation frame is reformatted according to the FlatBuffers protocol, and the serialized algorithm is written to a file. In this scenario, the association algorithm is converted into a pre-stored algorithm in file format. In Figure 5, Scenario 1 corresponds to the batch image processing scenario in this embodiment, and Scenario 2 corresponds to the real-time image acquisition scenario in this embodiment.
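
The two storage paths can be sketched as below. Note the substitution: JSON from the Python standard library is used here only as a stand-in for FlatBuffers serialization, and the dictionary `MEMORY_STORE`, the scenario strings, and the function name are hypothetical.

```python
import json
import os
import tempfile

MEMORY_STORE = {}  # stand-in for the allocated memory space


def save_prestored(association, scenario, key, directory):
    """Save the association algorithm in the format the scenario calls for."""
    if scenario == "batch":
        # Scenario 1: keep the object in memory for the current image set.
        MEMORY_STORE[key] = association
        return "memory"
    if scenario == "realtime":
        # Scenario 2: serialize to disk. JSON is only a stand-in here;
        # the text serializes with FlatBuffers.
        path = os.path.join(directory, key + ".json")
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(association, handle)
        return path
    raise ValueError(f"unknown scenario: {scenario!r}")


with tempfile.TemporaryDirectory() as tmp:
    where_batch = save_prestored({"steps": [61, 62]}, "batch", "model_a", tmp)
    where_file = save_prestored({"steps": [61, 62]}, "realtime", "model_a", tmp)
    file_exists = os.path.exists(where_file)
```

The memory path trades persistence for speed of access, while the file path survives a restart, which matches the reasoning given for the two scenarios.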

[0085] FlatBuffers is an open-source, cross-platform, high-efficiency serialization library that provides interfaces for multiple programming languages. It serializes the association algorithm and stores it in a buffer. The association algorithm can be stored in a file or transmitted over the network as is without any parsing overhead.

[0086] In this embodiment, since batch processing of images involves processing the image set in a concentrated manner, the extraction time of the pre-stored algorithm is concentrated and regular. It is only necessary to temporarily store the association algorithm in memory to obtain the pre-stored algorithm. After the image set is processed, the pre-stored algorithm in memory is deleted to free up memory storage space and speed up the segmentation and compilation speed of the model algorithm when inferring the image set next time.

[0087] In this embodiment, real-time image acquisition processes images one at a time, and the acquisition times are irregular. If the association algorithm were only temporarily stored in memory in this scenario, the pre-stored algorithm might be deleted before all images have been acquired. Continuing processing would then require re-splitting and reprocessing the model algorithm; in other words, steps “1”, “2”, “3”, “4”, and “+1” in Figure 2 would need to be executed again and the result re-saved, which increases the processing time of the model algorithm.

[0088] In this embodiment, if the association algorithm is stored in disk space to obtain the pre-stored algorithm, the pre-stored algorithm can be kept for a long time, so irregular intervals between image acquisitions no longer matter. Because the pre-stored algorithm is executed in memory, keeping the association algorithm in disk space for a long time does not affect the computation speed of the pre-stored algorithm. By contrast, if the association algorithm were kept only in memory space, it could be lost: Tengine may temporarily shut down or enter standby during an acquisition interval, and when Tengine restarts, the data stored in memory space may be gone. The association algorithm would then need to be reprocessed, which increases the inference time for an image and reduces the inference speed of a single image.

[0089] It should be noted that memory space can temporarily store association algorithms. After the pre-stored algorithm extracted from memory space has finished inferring the image set, it will be automatically deleted within a preset time to avoid occupying memory space and causing the pre-stored algorithm to run slower. Disk space can store association algorithms for a longer period of time, which can adapt to irregular collection intervals. Moreover, the association algorithm can be stored in disk space for a long time without affecting the speed of the pre-stored algorithm running in memory space.

[0090] Step S32: Based on the preset information segments, the format algorithm is encapsulated and saved to obtain the pre-stored algorithm, and the pre-stored algorithm is combined into an algorithm library.

[0091] In this embodiment, refer to Figure 4, which illustrates the preset information segments in the embodiments of this application. The information segments include a version number, task list, memory list, address list, event list, stored data list, tensor list, redirection list, and submission list. After the association algorithm has been format-converted according to the application scenario, the converted association algorithm is encapsulated, based on the information segments, into nine clearly delimited pre-stored algorithms. This makes it easier to perform inference on image information based on these segments when the pre-stored algorithms are called. The pre-stored algorithms are then combined into an algorithm library.

[0092] The version number can be the version number recorded during compilation.

[0093] The task list includes two tasks: one is for accelerating the execution module, and the other is for the simulator.

[0094] The memory list can be a format converted by the association algorithm, such as a memory format or a file format. Specifically, the memory format can be FAT32, NTFS, EXFAT, etc.; the file format can be FlatBuffers, etc.

[0095] The address list can be the location where the association algorithm is stored and the location that needs to be used when it is called. For example, the storage location of a part of the association algorithm is the specific location of the storage area in memory space, and the location of the execution module associated with this part of the algorithm.

[0096] The stored data list can include the binding relationship between the associated algorithm and the corresponding execution module.

[0097] Specifically, the step of encapsulating and saving the format algorithm based on preset information segments to obtain a pre-stored algorithm includes:

[0098] Step S321: Based on the preset information segments, classify the format algorithm to determine the field algorithm in the format algorithm that corresponds to the information segments;

[0099] Step S322: Encapsulate and save the field algorithm to obtain the pre-stored algorithm.

[0100] In this embodiment, the format algorithm is classified according to the version number and eight lists in the information sub-segment. That is, the corresponding field algorithm is extracted from the format algorithm, and the information sub-segment corresponding to the field algorithm is encapsulated and saved to obtain the pre-stored algorithm.

[0101] In this embodiment, the information in each step of the association algorithm is classified according to information segments, the field algorithm of the corresponding information segment is extracted from each step, and the field algorithm is encapsulated and saved to obtain the pre-stored algorithm.
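
Steps S321 and S322 can be sketched as grouping records by the nine information segments. The segment keys follow the list given for Figure 4, but the identifier spellings, the `(segment, value)` record shape, and the sample values are hypothetical; the actual field contents are not given in the text.

```python
SEGMENTS = (
    "version", "task_list", "memory_list", "address_list", "event_list",
    "data_list", "tensor_list", "redirection_list", "submission_list",
)


def encapsulate(format_algorithm):
    """Group (segment, value) records into the nine information segments."""
    packed = {name: [] for name in SEGMENTS}
    for segment, value in format_algorithm:
        if segment not in packed:
            raise KeyError(f"unknown information segment: {segment!r}")
        packed[segment].append(value)
    return packed


# Hypothetical records, for illustration only.
records = [
    ("version", "1.0"),
    ("task_list", "accelerator"),
    ("task_list", "simulator"),
    ("address_list", 0x1000),
]
prestored = encapsulate(records)
```

Keeping every segment present, even when empty, gives a reader of the pre-stored algorithm a fixed layout to parse.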

[0102] Specifically, before the step of converting the format of the association algorithm based on the application scenario of the inference data to obtain the format algorithm, the method includes:

[0103] Step A10: Determine the application scenario of the association algorithm;

[0104] Step A20: Based on the application scenario, determine the format that the association algorithm needs to convert.

[0105] In this embodiment, the application scenario for which the association algorithm runs is determined based on the application scenario selected by the user. Based on different application scenarios, the format that the association algorithm needs to convert is determined. If the application scenario is batch processing of images, the format that the association algorithm needs to convert is determined to be memory format; if the application scenario is real-time image acquisition, the format that the association algorithm needs to convert is determined to be file format.
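
Steps A10 and A20 amount to a fixed mapping from scenario to storage format, sketched here with enumerations. The class and constant names are hypothetical; only the two scenarios and the two formats come from the text.

```python
from enum import Enum


class Scenario(Enum):
    BATCH_PROCESSING = "batch image processing"
    REALTIME_ACQUISITION = "real-time image acquisition"


class StorageFormat(Enum):
    MEMORY = "memory format"
    FILE = "file format"


FORMAT_FOR_SCENARIO = {
    Scenario.BATCH_PROCESSING: StorageFormat.MEMORY,
    Scenario.REALTIME_ACQUISITION: StorageFormat.FILE,
}


def target_format(scenario):
    """Step A20: pick the format the association algorithm is converted to."""
    return FORMAT_FOR_SCENARIO[scenario]


chosen = target_format(Scenario.REALTIME_ACQUISITION)
```

Using a lookup table rather than branching keeps the rule in one place if further scenarios are ever added.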

[0106] In this embodiment, the model algorithm is segmented and compiled before or during inference on the first image to obtain the association algorithm. The association algorithm is saved in the format corresponding to the application scenario selected by the user. When inference is performed on subsequent images, the pre-stored algorithm can be called directly, which reduces the time spent on repeated processing of the model algorithm and improves the inference speed for a single image.

[0107] This application provides an inference method, apparatus, device, and storage medium for the NVDLA software stack. In the prior art, the NVDLA software stack has lengthy recognition steps when performing inference on recognition data, which reduces the average inference speed of a single set of data when performing inference on multiple sets of data. By contrast, in this application, the inference dataset is received, and a pre-stored algorithm for performing inference on the dataset is called from a preset algorithm library; based on the pre-stored algorithm and a preset inference order, inference is performed sequentially on the inference data in the dataset, wherein the algorithm library is a combination of pre-stored algorithms obtained from preset model algorithms after segmentation, association, and format conversion. In other words, after the inference dataset is received, a pre-stored algorithm is called directly from the algorithm library obtained by segmenting, associating, and format-converting a preset model algorithm, and inference is then performed sequentially on the inference data in the preset inference order. That is, the processed model algorithm is saved to obtain an algorithm library, and when inference is performed on the inference dataset, the pre-stored algorithm in the algorithm library is called directly, which reduces the processing time of the algorithm and improves the average inference speed of a single set of data when performing inference on multiple sets of data.

[0108] Furthermore, based on the above embodiments of this application, another embodiment of this application is provided. In this embodiment, the step of sequentially reasoning on the reasoning data in the reasoning dataset based on a pre-stored algorithm and a preset reasoning order includes:

[0109] Step B10: Perform a save check on the pre-stored algorithm;

[0110] Step B20: If the pre-stored algorithm is not saved, the neural algorithm and the preset execution module are recompiled until the pre-stored algorithm is saved.

[0111] Step B30: After the pre-stored algorithm is saved, the pre-stored algorithm is called to perform inference on the inference data in sequence.

[0112] In this embodiment, since Tengine may be in a standby state, the pre-stored algorithm may be automatically deleted after the standby state ends. Therefore, when calling the pre-stored algorithm, it is necessary to first check whether the pre-stored algorithm has been automatically deleted in the standby state. If the check result is that the pre-stored algorithm has not been saved, the neural algorithm and execution module are recompiled until the compiled association algorithm is converted and saved as the pre-stored algorithm, and then the inference data is inferred through the pre-stored algorithm.

[0113] In this embodiment, refer to Figure 6, which is a schematic diagram of the pre-stored algorithm reading process in the embodiment of this application. If the user selects scenario one, it is determined whether there is a pre-stored algorithm in the memory space. If there is a pre-stored algorithm, it is read directly from memory. If there is no pre-stored algorithm, the compiler task is re-executed to recompile the neural algorithm and the corresponding execution module to obtain the association algorithm. The association algorithm is then converted back into a pre-stored algorithm and saved. If the user selects scenario two, it is determined whether there is a pre-stored algorithm in the disk space. If there is a pre-stored algorithm, the pre-stored algorithm in FlatBuffer format is deserialized and then run through the runtime. If there is no pre-stored algorithm, the compiler task is re-executed.
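The reading process described for Figure 6 can be illustrated with a minimal Python sketch. It is not the patented implementation: `compile_association_algorithm`, `load_prestored`, and the scenario labels `"one"` and `"two"` are hypothetical names introduced here, and JSON is used only as a stand-in for the FlatBuffer serialization named in the text.

```python
import json
import tempfile
from pathlib import Path


def compile_association_algorithm():
    """Hypothetical stand-in for the compiler task that recompiles the
    neural algorithm and its execution module."""
    return {"ops": ["conv", "relu"], "target": "npu"}


# Scenario one: pre-stored algorithms held in memory space.
_memory_store = {}


def load_prestored(scenario, key, disk_dir=None):
    """Return (algorithm, recompiled) after checking whether a
    pre-stored algorithm already exists for the chosen scenario."""
    if scenario == "one":
        if key in _memory_store:
            return _memory_store[key], False       # read directly from memory
        algorithm = compile_association_algorithm()
        _memory_store[key] = algorithm             # save for later calls
        return algorithm, True
    if scenario == "two":
        path = Path(disk_dir) / f"{key}.json"
        if path.exists():
            # Deserialize (JSON here, FlatBuffer in the text).
            return json.loads(path.read_text()), False
        algorithm = compile_association_algorithm()
        path.write_text(json.dumps(algorithm))     # serialize to disk space
        return algorithm, True
    raise ValueError(f"unknown scenario: {scenario}")


# Usage: the first call recompiles, the second finds the saved copy.
with tempfile.TemporaryDirectory() as tmp:
    first = load_prestored("two", "model", tmp)
    second = load_prestored("two", "model", tmp)

mem_first = load_prestored("one", "model")
mem_second = load_prestored("one", "model")
```

In both scenarios the compile step runs only when the check finds nothing saved, which is the behavior the embodiment relies on to avoid repeated processing.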

[0114] Specifically, the step of calling the pre-stored algorithm after it has been saved, and then sequentially performing reasoning on the reasoning data, includes:

[0115] Step B31: After the pre-stored algorithm is saved, the pre-stored algorithm is called again;

[0116] Step B32: Parse the pre-stored algorithm to obtain the identification information in the pre-stored algorithm;

[0117] Step B33: Based on the identification information, perform inference on the inference data in the inference dataset in sequence.

[0118] In this embodiment, the application scenarios include real-time processing scenarios and batch centralized processing scenarios. When the application scenario is a real-time processing scenario, the recompiled association algorithm is serialized, converted into the pre-stored algorithm format, and stored in disk space; the pre-stored algorithm is then called and run through the runtime.

[0119] In this embodiment, if it is determined that the pre-stored algorithm does not exist before reasoning the Nth set of reasoning data, the neural algorithm and execution module are recompiled, and the resulting association algorithm format is converted into a pre-stored algorithm, which can be directly applied to the Nth set of reasoning data. The pre-stored algorithm can also be stored during the reasoning process of the Nth set of reasoning data.

[0120] It should be noted that, in this embodiment, the batch centralized processing scenario corresponds to scenario one in Figure 6, and the real-time processing scenario corresponds to scenario two in Figure 6.

[0121] In this embodiment, before calling the pre-stored algorithm, it is first determined whether the pre-stored algorithm is saved, so as to monitor the storage status of the pre-stored algorithm in a timely manner. This can avoid the situation where there is no available algorithm to reason about the reasoning data when there is a storage error in the pre-stored algorithm.

[0122] Furthermore, based on the above embodiments of this application, another embodiment of this application is provided. In this embodiment, before the step of sequentially reasoning the reasoning data in conjunction with a preset reasoning order, the method includes:

[0123] Step C10: Extract the reception time of the inference data;

[0124] Step C20: Based on the receiving time, sort the inference data to obtain the inference order.

[0125] The reception time is the time when inference data reception begins.

[0126] In this embodiment, in order to meet the inference requirements, it is necessary to extract the reception time of the inference data, and then to sequentially mark the inference data according to the reception time. That is, to sort the inference data to obtain the inference order, and then to infer the inference data sequentially based on the inference order.

[0127] In this embodiment, refer to Figure 2. After the Scheduler completes the invocation of the pre-stored algorithm and the receipt of inference data, it sends the NPU task to the NVDLA runtime through the NPU. The NVDLA runtime analyzes the inference data according to the pre-stored algorithm and submits the inference task to the module in the KMD (kernel mode driver). The KMD then drives the device to complete the inference work on the inference data.

[0128] The NVDLA runtime is primarily responsible for loading pre-stored algorithms, reading inference data, binding input / output tensors and memory locations, and submitting inference tasks to the KMD kernel module.
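The four responsibilities listed above (load, read, bind, submit) can be pictured with a small Python mock. This is only an illustrative sketch: `MockRuntime` and `MockKernelDriver` are hypothetical classes invented for this example and do not reproduce the actual NVDLA runtime or KMD interfaces; the "hardware" simply doubles each input value.

```python
class MockKernelDriver:
    """Hypothetical stand-in for the KMD: records tasks and fakes a result."""

    def __init__(self):
        self.submitted = []

    def submit(self, task):
        self.submitted.append(task)
        # Pretend the device wrote its result into the bound output buffer.
        task["output"][:] = [x * 2 for x in task["input"]]


class MockRuntime:
    """Hypothetical runtime: loads a pre-stored algorithm, binds tensors
    to memory locations, and submits the task to the kernel driver."""

    def __init__(self, kmd):
        self.kmd = kmd
        self.algorithm = None
        self.input = None
        self.output = None

    def load(self, prestored_algorithm):
        self.algorithm = prestored_algorithm

    def bind_input(self, buffer):
        self.input = buffer

    def bind_output(self, buffer):
        self.output = buffer

    def submit(self):
        if self.algorithm is None or self.input is None or self.output is None:
            raise RuntimeError("load the algorithm and bind tensors before submit")
        self.kmd.submit(
            {"algorithm": self.algorithm, "input": self.input, "output": self.output}
        )


# Usage: one inference task flows from the runtime to the driver.
kmd = MockKernelDriver()
runtime = MockRuntime(kmd)
runtime.load({"name": "demo"})
inp, out = [1, 2, 3], [0, 0, 0]
runtime.bind_input(inp)
runtime.bind_output(out)
runtime.submit()
```

The point of the sketch is the ordering: the runtime refuses to submit until the pre-stored algorithm is loaded and both tensors are bound, and the result appears in the caller's output buffer after submission.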

[0129] In this embodiment, the NVDLA runtime uses a pre-stored algorithm to complete the operation control instructions between devices, so that the pre-stored algorithm can actually drive the devices, and then the KMD completes the control of the devices.

[0130] In this embodiment, since the inference data is received in the form of a dataset, the user may have already prioritized it before it is received. To maintain the original order of the inference data during inference, the receiving order needs to be determined. Because data receiving speeds differ, data whose reception begins earlier may finish being received later than data whose reception begins later. For example, according to the pre-arranged order, reception of the first inference data begins before reception of the second. However, if the second inference data is received faster than the first, its reception may complete before that of the first. Therefore, the receiving order is determined based on the time when reception of the inference data began.

[0131] In this embodiment, inference data is received in the order specified by the user. Reception of the first piece of inference data begins, then reception of the second piece, and so on, until all inference data is received. It should be noted that it is not necessary to wait until the first piece of inference data has been fully received before starting to receive the next piece; reception only needs to begin in the order specified by the user. In this embodiment, sorting the inference data based on the reception time avoids confusion in the inference order.
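Steps C10 and C20 can be illustrated with a short Python sketch. The `InferenceItem` type, its field names, and the timestamps are assumptions made for this example; the only point taken from the text is that the sort key is the time reception began, not the time it finished.

```python
from dataclasses import dataclass


@dataclass
class InferenceItem:
    name: str
    receive_start: float  # time at which reception began (seconds)
    receive_end: float    # time at which reception finished (seconds)


def inference_order(items):
    """Step C20: sort by the time reception began (step C10's extracted value)."""
    return sorted(items, key=lambda item: item.receive_start)


# The second item starts later but finishes earlier than the first.
items = [
    InferenceItem("second", receive_start=1.0, receive_end=1.5),
    InferenceItem("first", receive_start=0.0, receive_end=2.0),
]

order = [item.name for item in inference_order(items)]
by_completion = [
    item.name for item in sorted(items, key=lambda item: item.receive_end)
]
```

Sorting by completion time would put the faster second item ahead of the first, which is the confusion the embodiment avoids by using the start time.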

[0132] Refer to Figure 3, which is a schematic diagram of the device structure of the hardware operating environment involved in the embodiments of this application.

[0133] As shown in Figure 3, the inference device of this NVDLA software stack may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to establish communication between the processor 1001 and the memory 1005. The memory 1005 may be high-speed RAM or stable non-volatile memory, such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.

[0134] Optionally, the inference device of this NVDLA software stack may also include a rectangular user interface, a network interface, a camera, RF (Radio Frequency) circuitry, sensors, audio circuitry, a WiFi module, etc. The rectangular user interface may include a display screen and an input submodule such as a keyboard. Optionally, the rectangular user interface may also include standard wired or wireless interfaces. The network interface may optionally include standard wired or wireless interfaces (such as a Wi-Fi interface).

[0135] Those skilled in the art will understand that the inference device structure of the NVDLA software stack shown in Figure 3 does not constitute a limitation on the inference device of the NVDLA software stack. It may include more or fewer components than shown, or combine certain components, or have different component arrangements.

[0136] As shown in Figure 3, the memory 1005, serving as a storage medium, may include an operating system, a network communication module, and the inference program for the NVDLA software stack. The operating system is a program that manages and controls the hardware and software resources of the NVDLA software stack inference device, supporting the operation of the NVDLA software stack inference program and other software and / or programs. The network communication module is used to enable communication between the various components within the memory 1005, as well as communication with other hardware and software in the NVDLA software stack inference system.

[0137] In the NVDLA software stack inference device shown in Figure 3, processor 1001 is used to execute the NVDLA software stack inference program stored in memory 1005 to implement the steps of the NVDLA software stack inference method described in any of the above claims.

[0138] The specific implementation of the inference device of the NVDLA software stack in this application is basically the same as the various embodiments of the inference method of the NVDLA software stack described above, and will not be repeated here.

[0139] This application also provides an inference apparatus for an NVDLA software stack, the inference apparatus for the NVDLA software stack comprising:

[0140] The calling module is used to receive the inference dataset and call a pre-stored algorithm from a preset algorithm library to infer the inference dataset;

[0141] The reasoning module is used to perform reasoning on the reasoning data in the reasoning dataset sequentially based on a pre-stored algorithm and a preset reasoning order.

[0142] The algorithm library is a combination of pre-stored algorithms based on preset model algorithms after being segmented, associated, and formatted.

[0143] Optionally, the inference device of the NVDLA software stack further includes:

[0144] The segmentation module is used to receive the inference dataset and segment the model algorithm into steps to obtain an NPU-type neural algorithm.

[0145] An association module is used to associate the neural algorithm with a preset execution module to obtain an association algorithm between the neural algorithm and the execution module;

[0146] The conversion module is used to convert the format of the association algorithm based on the application scenario of the inference data to obtain the corresponding format algorithm.

[0147] The storage module is used to encapsulate and save the format algorithm based on a preset information segment to obtain a pre-stored algorithm, and combine the pre-stored algorithms into an algorithm library.
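The four modules above form a pipeline that can be sketched in Python under loudly stated assumptions. Every function name here is hypothetical. In particular, segmentation is modeled as simply keeping the operators an NPU can run, which is an assumption of this sketch rather than the procedure defined in the earlier embodiments, and JSON stands in for the serialized on-disk format.

```python
import json


def segment(model_ops, npu_supported):
    """Assumed segmentation: keep the operators the NPU can execute."""
    return [op for op in model_ops if op in npu_supported]


def associate(neural_ops, execution_modules):
    """Pair each neural operator with its preset execution module."""
    return [(op, execution_modules[op]) for op in neural_ops]


def convert(association, scenario):
    """Pick the storage format from the application scenario."""
    if scenario == "batch":
        return association  # kept as an in-memory object
    if scenario == "realtime":
        return json.dumps(association).encode("utf-8")  # bytes for a file
    raise ValueError(f"unknown scenario: {scenario}")


def build_library(model_ops, npu_supported, execution_modules, scenarios):
    """Run segmentation, association, and conversion, then collect the
    results into an algorithm library keyed by scenario."""
    association = associate(segment(model_ops, npu_supported), execution_modules)
    return {scenario: convert(association, scenario) for scenario in scenarios}


# Usage with made-up operator and module names.
library = build_library(
    model_ops=["conv", "softmax", "pool"],
    npu_supported={"conv", "pool"},
    execution_modules={"conv": "conv_unit", "pool": "pool_unit"},
    scenarios=["batch", "realtime"],
)
```

Building the library once and reusing it for every later set of inference data is what lets subsequent calls skip the segmentation and association work.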

[0148] Optionally, the storage module includes:

[0149] The classification module is used to classify information based on preset information segments and determine the field algorithm in the format algorithm that corresponds to the information segment.

[0150] The storage unit is used to encapsulate and save the field algorithm to obtain the pre-stored algorithm.

[0151] Optionally, the inference device of the NVDLA software stack further includes:

[0152] The scenario determination module is used to determine the application scenario of the association algorithm;

[0153] The format determination module is used to determine the format that the association algorithm needs to convert based on the application scenario.

[0154] Optionally, the inference module includes:

[0155] The judgment module is used to judge whether the pre-stored algorithm is saved.

[0156] The compilation submodule is used to recompile the neural algorithm and the preset execution module if the pre-stored algorithm is not saved, until the pre-stored algorithm is saved.

[0157] The inference submodule is used to call the pre-stored algorithm after it has been saved, and to perform inference on the inference data in the inference dataset in sequence according to the preset inference order.

[0158] Optionally, the inference submodule includes:

[0159] The calling module is used to call the pre-stored algorithm again after the pre-stored algorithm has been saved;

[0160] The parsing module is used to parse the pre-stored algorithm to obtain the identification information in the pre-stored algorithm;

[0161] The reasoning unit is used to perform reasoning on the reasoning data in the reasoning dataset sequentially based on the identification information.

[0162] Optionally, the inference module includes:

[0163] The extraction module is used to extract the reception time of the inference data;

[0164] The sorting module is used to sort the inference data based on the receiving time to obtain the inference order.

[0165] The specific implementation of the inference device of the NVDLA software stack in this application is basically the same as the various embodiments of the inference method of the NVDLA software stack described above, and will not be repeated here.

[0166] This application provides a storage medium that stores one or more programs, which can be executed by one or more processors to implement the steps of the inference method of the NVDLA software stack described in any of the above claims.

[0167] The specific implementation of the storage medium in this application is basically the same as the various embodiments of the inference method of the NVDLA software stack described above, and will not be repeated here.

[0168] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.

[0169] The sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0170] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of the present invention.

[0171] The above are merely preferred embodiments of the present invention and do not limit the scope of the patent. Any equivalent structural or procedural transformations made based on the description and drawings of the present invention, or direct or indirect applications in other related technical fields, are similarly included within the scope of patent protection of the present invention.

Claims

1. An inference method of an NVDLA software stack, characterized in that the inference method of the NVDLA software stack includes: receiving the inference dataset and performing step segmentation on the preset model algorithm to obtain an NPU-type neural algorithm; associating the neural algorithm with the preset execution module to obtain the association algorithm between the neural algorithm and the execution module; based on the application scenario of the inference data in the inference dataset, format-converting the association algorithm to obtain the corresponding format algorithm, wherein the application scenarios include batch processing and real-time acquisition; if the application scenario is batch processing, memory space is allocated and the association algorithm is temporarily stored in the memory space; if the application scenario is real-time acquisition, disk space is allocated and the association algorithm is serialized and written to a file; based on the preset information segments, encapsulating and saving the format algorithm to obtain the pre-stored algorithm, and combining the pre-stored algorithms into an algorithm library; calling, from the algorithm library, the pre-stored algorithm for inference on the inference dataset; and, based on the pre-stored algorithm and the preset inference order, performing inference on the inference data sequentially.

2. The inference method of the NVDLA software stack according to claim 1, wherein, The step of encapsulating and saving the format algorithm based on a preset information segment to obtain a pre-stored algorithm includes: Based on a preset information segment, the format algorithm is classified to determine the field algorithm in the format algorithm that corresponds to the information segment; The field algorithm is encapsulated and saved to obtain the pre-stored algorithm.

3. The inference method of the NVDLA software stack according to claim 1, wherein, Before the step of converting the association algorithm to obtain the corresponding format algorithm in the application scenario based on the inference data in the inference dataset, the method includes: The application scenarios for the association algorithm are determined; Based on the application scenario, the format that the association algorithm needs to convert is determined.

4. The inference method of the NVDLA software stack according to claim 1, wherein the step of performing inference on the inference data in the inference dataset sequentially based on a pre-stored algorithm and a preset inference order includes: performing a save check on the pre-stored algorithm; if the pre-stored algorithm is not saved, recompiling the neural algorithm and the preset execution module until the pre-stored algorithm is saved; and after the pre-stored algorithm is saved, calling the pre-stored algorithm and, in conjunction with the preset inference order, performing inference on the inference data in the inference dataset sequentially.

5. The inference method for the NVDLA software stack as described in claim 4, characterized in that, The step of calling the pre-stored algorithm after it has been saved, and performing inference on the inference data in the inference dataset in sequence according to a preset inference order, includes: After the pre-stored algorithm is saved, the pre-stored algorithm is called again. The pre-stored algorithm is parsed to obtain the recognition information in the pre-stored algorithm; Based on the identification information, inference is performed sequentially on the inference data in the inference dataset.

6. The inference method of the NVDLA software stack according to claim 1, wherein, Before the step of sequentially reasoning the reasoning data in the reasoning dataset according to a preset reasoning order, the method includes: Extract the reception time of the inference data; Based on the receiving time, the inference data is sorted to obtain the inference order.

7. An inference apparatus of an NVDLA software stack, characterized in that, The inference device of the NVDLA software stack includes: The segmentation module is used to receive the inference dataset and perform step segmentation on the preset model algorithm to obtain NPU-type neural algorithms; An association module is used to associate the neural algorithm with a preset execution module to obtain an association algorithm between the neural algorithm and the execution module; The conversion module is used to convert the format of the association algorithm based on the application scenarios of the inference data in the inference dataset to obtain the corresponding format algorithm. The application scenarios include batch processing and real-time acquisition. If the application scenario is batch processing, memory space is allocated to temporarily store the association algorithm in the memory space. If the application scenario is real-time acquisition, disk space is allocated to serialize the association algorithm and write it to a file. The storage module is used to encapsulate and save the format algorithm based on a preset information segment to obtain a pre-stored algorithm, and combine the pre-stored algorithms into an algorithm library; The calling module is used to call a pre-stored algorithm from the algorithm library for inference on the inference dataset; The inference module is used to perform inference on the inference data in the inference dataset sequentially based on a pre-stored algorithm and a preset inference order.

8. An inference device of an NVDLA software stack, characterized in that, The inference device of the NVDLA software stack includes: a memory, a processor, and a program stored in the memory for implementing the inference method of the NVDLA software stack. The memory is used to store programs that implement the inference method of the NVDLA software stack; The processor is configured to execute a program that implements the inference method of the NVDLA software stack to implement the steps of the inference method of the NVDLA software stack as claimed in any one of claims 1 to 6.

9. A storage medium, characterized in that the storage medium stores a program implementing an inference method for the NVDLA software stack, which is executed by a processor to implement the steps of the inference method for the NVDLA software stack as described in any one of claims 1 to 6.