Method and apparatus for testing the integrity of algorithms in generated program code for computing neural networks in a hardware environment.
The code generator addresses the challenge of verifying algorithmic integrity in neural networks by creating and testing program code in hardware environments, ensuring error-free and compatible neural network implementations.
Patent Information
- Authority / Receiving Office: JP · JP
- Patent Type: Applications
- Current Assignee / Owner: ROBERT BOSCH GMBH
- Filing Date: 2025-12-19
- Publication Date: 2026-07-02
AI Technical Summary
Existing methods fail to automatically verify the algorithmic integrity of neural networks implemented in hardware environments, particularly in safety-critical applications, due to the inability to check for errors such as division by zero, invalid mathematical operations, and invalid memory access.
A code generator that creates and tests program code for neural networks in a hardware environment, ensuring algorithmic integrity by executing test code on the hardware and checking for compliance with defined criteria, including memory allocation and numerical output matching, to ensure the program code accurately represents the original neural network.
Ensures the program code for neural networks is error-free and compatible with the hardware environment, preventing potential errors and ensuring correct functionality in safety-critical applications.
Abstract
Description
Technical Field
[0001] The present invention relates to the implementation of program code in a hardware environment, such as a microcontroller-based control device or the like. The present invention further relates to a method for inspecting program code generated for computing a neural network.
Background Art
[0002] Neural networks have come to form a large class of algorithms in both academia and industry over the past decade. An important point for their use in production is to ensure that these algorithms function properly when installed in the hardware environment for which they are generated. This is required in particular for use in safety-critical applications, such as in the automotive field, medical devices, or aerospace.
[0003] The way neural networks are developed is essentially highly exploratory. In product development, many neural network architectures must therefore be installed on the target hardware, and this can also take place in the form of software updates during the product life cycle. Consequently, it is not feasible to manually inspect every neural network in use, regardless of whether it is implemented as high-level code such as C, as low-level code such as LLVM, or as calls to a pre-implemented library. Automatic inspection of the program code is therefore required.
[0004] The checks required to ensure that a neural network implementation developed for the hardware environment fulfills the desired functionality concern data integrity and algorithmic integrity. While data integrity should typically be checked independently of the hardware implementation, algorithmic integrity includes measures to ensure that the program code of the implemented neural network substantially represents the originally trained network. That is, for all input data, the output of the created neural network is substantially identical to the output of the originally trained neural network. This also includes checks to ensure that the neural network code does not contain faults that could cause errors in the hardware environment. Such errors can be caused by division by zero, other invalid mathematical operations, or invalid memory access, such as accessing an array outside its valid range.
Summary of the Invention
Problem to Be Solved by the Invention
[0005] The object of the present invention is to provide a code generator that creates program code for computing a neural network in a hardware environment, and that automatically checks the program code for the relevant hardware environment.
Means for Solving the Problem
Disclosure of the Invention
[0006] The above problem is solved by the method according to claim 1 for operating a code generator that creates and implements tested program code for computing a neural network in a hardware environment, and by the code generator according to the further independent claim.
[0007] Further configurations are described in each dependent claim.
[0008] According to a first aspect, a computer-implemented method is proposed for operating a code generator that creates and implements tested program code for computing a neural network in a hardware environment, comprising the following steps:
- creating program code for computing a neural network in the hardware environment based on a defined neural network;
- providing test code in the created program code, wherein the test code, when executed, checks the execution of the program code in the hardware environment based on at least one test criterion;
- implementing and executing the program code with the test code in the hardware environment;
- creating program code that does not contain the test code, provided that the at least one test criterion is confirmed to be met;
- implementing the program code that does not contain the test code in the hardware environment.
[0009] In the code generation framework, program code for computing a neural network is created using a code generator, such as an embedded software encoder. Here, the neural network is created as program code that can be executed in a hardware environment such as a microprocessor, a microcontroller, a GPU, or a dedicated neural network accelerator. The hardware environment may be the unit on which the program code is ultimately executed or an equivalent environment, for example an x86 architecture.
[0010] The program code is based on a defined neural network that is provided in a general definition description, for example as an ONNX data file, a model stored in Keras, a TensorFlow-Lite model, or an exported PyTorch data file. The program code can be written in a high-level programming language such as C, in a lower-level representation such as LLVM or inline assembly, or as a combination of these representations that calls pre-implemented library functions, for example from C code.
[0011] Before implementing the created program code in a hardware environment, especially when used in safety-critical applications, it is crucial to verify the integrity of the algorithm. For this purpose, the method described above aims to automatically verify the created program code based on one or more verification criteria. Only if all verification criteria are met will the created program code represent a valid program code and be implemented in the hardware environment.
[0012] In other words, the core of the method for operating the code generator described above is to automatically check the program code for the implemented neural network for algorithmic integrity. In this case, the implemented program code can be tested as valid or error-free for the corresponding hardware environment, thereby demonstrating whether the implemented program code satisfies all requirements for algorithmic integrity for the configured hardware environment. If errors are identified, the program code can be configured not to be used in the hardware environment.
[0013] To verify the inspection criteria, the created program code is executed in the hardware environment together with the corresponding inspection code.
[0014] The at least one test criterion can include one or more of the following checks:
- When the created program code is executed in the hardware environment, the outputs of the layers are written only to the memory area set aside for the operation of the created program code.
- To verify algorithmic integrity, the numerical output of the calculation of each layer of the neural network by the created program code matches the numerical output of the calculation of the corresponding layer of the original neural network.
- The calculation of each layer of the created program code writes the corresponding output only to the memory area allocated for the calculation of that layer.
[0015] The criterion for verifying global memory integrity guarantees that the created program code does not write to memory areas outside the memory allocated to it. For example, the main memory used to compute the neural network can be placed inside a larger auxiliary memory area. The entire auxiliary memory area is initialized with defined values, and then the program code is executed. After the execution of the created program code, only the values within the part of the auxiliary memory area that corresponds to the main memory may have changed. Otherwise, the criterion is not met.
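The global memory check described above can be illustrated with a minimal sketch. It is not the patented implementation: the fill value `FILL`, the function `check_global_memory`, and the two example programs are hypothetical stand-ins, and a Python `bytearray` plays the role of the auxiliary memory area.

```python
FILL = 0xA5  # assumed "defined value" used to initialise the auxiliary area


def check_global_memory(run, aux_size, main_start, main_size):
    """Return the offsets outside the main-memory window that `run` modified."""
    aux = bytearray([FILL]) * aux_size          # initialise the whole auxiliary area
    run(aux, main_start, main_size)             # execute the program under test
    main_end = main_start + main_size
    return [
        offset
        for offset, value in enumerate(aux)
        if value != FILL and not (main_start <= offset < main_end)
    ]


def good_program(mem, base, size):
    # Hypothetical program that stays inside its main memory.
    for i in range(size):
        mem[base + i] = i % 251


def bad_program(mem, base, size):
    # Same program, plus one stray write directly behind the main memory.
    good_program(mem, base, size)
    mem[base + size] = 0x00


ok = check_global_memory(good_program, aux_size=64, main_start=16, main_size=32)
bad = check_global_memory(bad_program, aux_size=64, main_start=16, main_size=32)
print(ok, bad)
```

A stray write that happens to store the fill value itself would go unnoticed by this sketch; running the check with two different fill values is one way to narrow that gap.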
[0016] The criterion for checking local algorithmic integrity guarantees that the results of the calculations of each layer in the created program code correspond to the calculations of the corresponding layers in the original neural network. During this check, it is verified that each layer in the created program code produces a correct numerical output. If this criterion is met, each layer is numerically correct. The calculation of each layer in the created program code produces an output tensor, which is temporarily stored for comparison with the result of the corresponding layer calculation in the original neural network.
[0017] In this case, the numerical output of the original network is computed either using a defined standard library, such as CMSIS-NN, which provides an implementation of the neural network layers, in combination with an underlying tool such as ONNX Runtime for ONNX data files, the TensorFlow Lite interpreter for TensorFlow models, or PyTorch, or using a tool that maps the original neural network onto a standard library implementation. Such a tool is, for example, TensorFlow Lite Micro.
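The layer-by-layer comparison can be sketched as follows. The layer functions, the tolerances, and the name `compare_layers` are assumptions made for illustration; in practice the reference outputs would come from a tool such as those named above, not from plain Python functions.

```python
import math


def scale(xs, factor=0.5):
    return [factor * x for x in xs]


def relu(xs):
    return [max(0.0, x) for x in xs]


def faulty_relu(xs):
    # Hypothetical defect in the generated code: abs() instead of max(0, x).
    return [abs(x) for x in xs]


def compare_layers(reference_layers, generated_layers, x, rel_tol=1e-6, abs_tol=1e-9):
    """Return the index of the first deviating layer, or None if all layers match."""
    ref = gen = list(x)
    for index, (ref_layer, gen_layer) in enumerate(zip(reference_layers, generated_layers)):
        ref = ref_layer(ref)   # output tensor of the original network
        gen = gen_layer(gen)   # output tensor of the generated code
        same_shape = len(ref) == len(gen)
        if not same_shape or not all(
            math.isclose(r, g, rel_tol=rel_tol, abs_tol=abs_tol) for r, g in zip(ref, gen)
        ):
            return index
    return None


reference = [scale, relu, scale]
first_bad_ok = compare_layers(reference, [scale, relu, scale], [-2.0, 4.0])
first_bad_faulty = compare_layers(reference, [scale, faulty_relu, scale], [-2.0, 4.0])
print(first_bad_ok, first_bad_faulty)
```

Reporting the first deviating layer, instead of only comparing the final output, is what makes the check local: a defect is attributed to a specific layer.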
[0018] To prevent intermediate output tensors from being overwritten by further calculations, comparisons can be performed immediately after the execution of the corresponding layer of the generated program code. This is feasible in the hardware environment or in the code generator hardware.
[0019] Alternatively, the output tensor of the intermediate results of the execution of the generated program code can be copied to a separate memory area after execution and analyzed using the code generator's hardware after the execution of the generated program code is complete.
[0020] Alternatively, a memory layout can be used for the execution of the generated program code that does not overwrite the intermediate results of the individual layers. A local algorithmic integrity test with such a simplified memory layout should always be supplemented by a test with the production memory layout, in order to detect errors related to the memory layout created before code generation, because the memory planning algorithm may itself contain errors. Here, immediately after the execution of each layer's computation, the corresponding output is copied to a separate memory area. A comparison against the simplified memory layout can then be performed.
[0021] On devices whose main memory is tightly constrained, such strategies cannot be carried out for on-device inspection because of the large amount of memory required, in particular because the required memory grows with each additional layer of the created program code. Performing the inspection in an emulation with an identical memory layout means accepting a small residual uncertainty, because the inspection is not performed in the hardware environment.
[0022] Alternatively, the inspection of algorithmic integrity can be performed step by step for one or more specific layers at a time. In this case, the inspection code is inserted into the original program code for only one layer of the neural network at a time, and this is repeated step by step for all layers of the neural network. For each resulting version, the algorithmic integrity of the corresponding layer is inspected.
[0023] The inspection of algorithmic integrity may require temporary storage of the output tensors of partial computations when tiling is applied, because otherwise the partial results are overwritten by subsequent computations and can no longer be used for the inspection of algorithmic integrity.
[0024] Therefore, a defined neural network can include a serial sequence of multiple layers that are calculated in the created program code in partial computations using tiling. Here, the layers to which tiling is applied are calculated per thread, providing a partial output tensor for each thread. The partial output tensors are integrated into the output tensor of the last layer to which tiling is applied. In this case, the inspection of algorithmic integrity is performed by integrating, across the threads, the numerical outputs of the output tensors of the specific layer to which tiling is applied and comparing the integrated output tensor with the numerical output of the original neural network.
[0025] Tiling is a strategy for reducing the required main memory as far as possible by splitting several consecutive layers within a network. Layers calculated using tiling can include additional slicing operators that are not present in the neural network as defined by the network definition. These operators divide the input tensor of the tiled section into partial input tensors, one for each thread. Tiling also introduces additional concatenation operators that are not present in the defined neural network. During tiling, the split layers are calculated thread by thread until each thread has produced one output tensor as a partial result; after all threads have been calculated, the partial output tensors are combined into the complete output tensor by the additional concatenation. The partial results of the layers inside a tiled thread never exist in memory in complete form, and they never exist in memory in an integrated state, even when a memory layout is used that does not overwrite results.
[0026] Therefore, inspecting the above criterion for a layer to which tiling is applied requires that the output tensors of the partial calculations of that layer by the generated program code be temporarily stored, so that they can be integrated into the output tensor of the layer (in particular by concatenation), inside or outside the hardware environment, or are already stored in an integrated state, and so that the integrated output tensor can be compared with the calculation of the corresponding layer of the original neural network.
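As a rough illustration of tiling and of the check on the integrated output, the sketch below tiles two serial one-dimensional convolutions into threads, concatenates the partial output tensors, and compares the result with the untiled computation. The kernels `K1` and `K2`, the function names, and the use of 1-D integer data are assumptions chosen to keep the example small; they do not come from the source.

```python
def conv1d_valid(x, kernel):
    """1-D convolution without padding (output is shorter than the input)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


K1 = [1, 2, 1]    # hypothetical weights of the first tiled layer
K2 = [1, 0, -1]   # hypothetical weights of the second tiled layer


def reference(x):
    """Untiled computation, standing in for the original network."""
    return conv1d_valid(conv1d_valid(x, K1), K2)


def tiled(x, n_tiles=2):
    """Compute both layers thread by thread and return the partial output tensors."""
    halo = (len(K1) - 1) + (len(K2) - 1)      # extra input samples each thread needs
    out_len = len(x) - halo
    bounds = [out_len * t // n_tiles for t in range(n_tiles + 1)]
    partial_outputs = []
    for start, stop in zip(bounds, bounds[1:]):
        x_slice = x[start:stop + halo]         # additional slicing operator
        partial_outputs.append(conv1d_valid(conv1d_valid(x_slice, K1), K2))
    return partial_outputs


def check_tiled(x, n_tiles=2):
    """Concatenate the partial outputs and compare with the untiled result."""
    merged = [value for part in tiled(x, n_tiles) for value in part]
    return merged == reference(x)


sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
print(check_tiled(sample, 2), check_tiled(sample, 3))
```

Checking the first tiled layer as well would require keeping each thread's intermediate result of `conv1d_valid(x_slice, K1)` instead of discarding it, which corresponds to the temporary storage discussed above.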
[0027] Prior to code generation, it is possible to check whether the neural network has characteristics that could give rise to implementation errors or safety issues. This can be the case, for example:
- when the neural network includes layers that are not supported;
- when the neural network includes configurations that are not supported, for example layers with a kernel size that is not supported in convolution;
- when the neural network uses a data type that is not supported;
- when the generated program code is not compatible with the data memory of the hardware environment.
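A check of this kind before code generation might look like the following sketch. The capability tables, the dictionary-based network description, and the function `precheck` are hypothetical; a real code generator would derive them from its own layer support and from the target's memory map.

```python
# Hypothetical capabilities of the code generator and target; not from the source.
SUPPORTED_LAYERS = {"conv2d", "dense", "relu", "maxpool"}
SUPPORTED_CONV_KERNELS = {(1, 1), (3, 3), (5, 5)}
SUPPORTED_DTYPES = {"int8", "float32"}


def precheck(layers, required_bytes, data_memory_bytes):
    """Return a list of problems; an empty list means code generation may proceed."""
    problems = []
    for index, layer in enumerate(layers):
        kind = layer["type"]
        if kind not in SUPPORTED_LAYERS:
            problems.append(f"layer {index}: unsupported layer type {kind!r}")
        elif kind == "conv2d" and tuple(layer["kernel"]) not in SUPPORTED_CONV_KERNELS:
            problems.append(f"layer {index}: unsupported kernel size {tuple(layer['kernel'])}")
        if layer["dtype"] not in SUPPORTED_DTYPES:
            problems.append(f"layer {index}: unsupported data type {layer['dtype']!r}")
    if required_bytes > data_memory_bytes:
        problems.append(f"needs {required_bytes} bytes, target has {data_memory_bytes}")
    return problems


network = [
    {"type": "conv2d", "kernel": (3, 3), "dtype": "int8"},
    {"type": "relu", "dtype": "int8"},
    {"type": "conv2d", "kernel": (7, 7), "dtype": "int8"},
    {"type": "softmax", "dtype": "float16"},
]
problems = precheck(network, required_bytes=4096, data_memory_bytes=2048)
for line in problems:
    print(line)
```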
[0028] Two consecutive operations can be combined into a fused layer in which the intermediate result is not written to memory. For example, ReLU, Leaky-ReLU, and other activation functions can be integrated into the preceding layer. Pooling layers can also be integrated into the preceding convolutional layer.
[0029] Because the intermediate results of fused layers are not available for checking, layer fusion can be made optional within the code generator. The program code is first generated without layer fusion; then, if all of the above criteria are met, the program code is generated again with layer fusion, and all criteria are checked again.
[0030] Furthermore, a check criterion may be established to ensure that the computations of each layer of the created program code write only to the memory area allocated for that layer's computation. Each layer of the created program code may write only to a weight prefetch buffer, which is used to preload computation parameters before the layer's own computation, and, during the computation itself, only to the scratchpad buffer and to the intermediate buffer that holds the output tensor. To check whether a layer writes to any other memory area, the data in global main memory before weight prefetching is compared with the data after weight prefetching, and the data in main memory before layer execution is compared with the data in main memory after layer execution.
[0031] These checks are important for identifying potential errors in the layer implementation and can be performed in two ways:
1. After each layer of the created program code has been computed, the entire memory area of the neural network is copied to a separate auxiliary memory area, and all copies are compared after the created program code has been fully executed. The memory areas that changed can then be identified and compared with the memory layout of the created program code, which reveals memory writes outside the memory area that the memory layout assigns to each layer.
2. For each layer, the main memory is copied before weight prefetching and compared with its state after weight prefetching. It is then copied again after weight prefetching, that is, before the layer is computed, and compared with its state after the layer has been computed. As described above, the memory regions containing errors can then be identified by comparison with the memory layout.
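The second variant can be sketched as a snapshot-and-diff around the prefetch and compute phases of each layer. The layout dictionary, the callables `prefetch` and `compute`, and the function names are assumptions for illustration; on a real target the snapshots would be taken by the inserted test code.

```python
def changed_offsets(before, after):
    """Offsets whose byte value differs between two equally sized memory images."""
    return {i for i, (a, b) in enumerate(zip(before, after)) if a != b}


def check_layer_writes(memory, layers, layout):
    """Return (layer, phase, offsets) for every write outside the planned regions."""
    violations = []
    for name, prefetch, compute in layers:
        regions = layout[name]

        snapshot = bytes(memory)                      # copy before weight prefetch
        prefetch(memory)
        stray = changed_offsets(snapshot, memory) - set(regions["prefetch"])
        if stray:
            violations.append((name, "prefetch", sorted(stray)))

        snapshot = bytes(memory)                      # copy again before the computation
        compute(memory)
        allowed = set(regions["scratch"]) | set(regions["output"])
        stray = changed_offsets(snapshot, memory) - allowed
        if stray:
            violations.append((name, "compute", sorted(stray)))
    return violations


def prefetch_conv(mem):
    mem[0:4] = b"\x01\x02\x03\x04"       # weights land in the prefetch buffer


def compute_conv(mem):
    mem[4] = 9                            # scratchpad use
    mem[8:12] = b"\x05\x05\x05\x05"       # output tensor
    mem[12] = 7                           # hypothetical defect: one byte too far


layout = {"conv": {"prefetch": range(0, 4), "scratch": range(4, 8), "output": range(8, 12)}}
violations = check_layer_writes(bytearray(16), [("conv", prefetch_conv, compute_conv)], layout)
print(violations)
```

A write that stores the value already present at an address produces no difference and is therefore not detected by a snapshot comparison of this kind.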
[0032] To avoid overlooking potential problems in weight prefetching, it is desirable that the weight prefetch buffer be inspected after weight prefetching is performed and before the computation of the specific layer is invoked.
[0033] Because of its high memory demand, this strategy is not feasible for on-device testing on devices with severely limited main memory unless it is modified, since the method according to point 2 itself doubles the required main memory. In this case, a stepwise approach is necessary. For example, 10% of the main memory can be temporarily stored in each step, and the process repeated 10 times. Because the memory layout of the implementation is completely static, this procedure is equivalent to a single test of the entire memory.
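The stepwise approach can be sketched as follows, assuming the program under test is deterministic so that every repetition touches the same addresses. The names `chunked_write_check`, `make_memory`, and `run_layer` are hypothetical.

```python
def chunked_write_check(make_memory, run_layer, allowed, n_chunks=10):
    """Repeat the run once per chunk, snapshotting only that chunk each time."""
    size = len(make_memory())
    stray = []
    for chunk in range(n_chunks):
        lo = size * chunk // n_chunks
        hi = size * (chunk + 1) // n_chunks
        memory = make_memory()                 # identical initial state for every run
        snapshot = bytes(memory[lo:hi])        # only 1/n_chunks of memory is duplicated
        run_layer(memory)
        for offset, (old, new) in enumerate(zip(snapshot, memory[lo:hi])):
            if old != new and (lo + offset) not in allowed:
                stray.append(lo + offset)
    return stray


def run_layer(mem):
    mem[20:30] = bytes([1] * 10)   # planned output region
    mem[95] = 1                    # hypothetical stray write


stray = chunked_write_check(lambda: bytearray(100), run_layer, allowed=set(range(20, 30)))
print(stray)
```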
[0034] The embodiments will be described in more detail below with reference to the attached drawings. The drawings show the following:
Brief Description of the Drawings
[0035]
[Figure 1] This is a schematic diagram showing a platform for code generation and implementation in a hardware environment.
[Figure 2] This flowchart outlines a method for checking the algorithmic integrity of program code generated to compute neural networks in a hardware environment.
[Figure 3a] This figure shows the portion of a defined and created neural network that has layers to which tiling is applied, and the memory plan that stores the output tensors of each layer.
[Figure 3b] This figure shows the portion of a defined and created neural network that has layers to which tiling is applied, and the memory plan that stores the output tensors of each layer.
[Figure 3c] This figure shows the portion of a defined and created neural network that has layers to which tiling is applied, and the memory plan that stores the output tensors of each layer.
Modes for Carrying Out the Invention
Description of the Embodiments
[0036] Figure 1 shows a block diagram of a platform 1 having a code generator that performs code generation to produce program code and implements the created program code in a hardware environment 2. The hardware environment corresponds to a control device including, for example, a microcontroller, a microprocessor, or the like. Code generation is performed on a conventional computer 3 or workstation, on which the configuration of the neural network is provided. Computer 3 is configured to perform memory planning and code generation, where memory planning first arranges the memory areas that accommodate at least one input data block, a scratchpad block, an optional weight prefetch block, and at least one output data block for each computation step of the neural network.
[0037] A scratchpad block is a memory area that the implementation of a neural network layer needs in order to store intermediate results during its computation. It does not contain the actual results of the layer, and its contents are not used by subsequent layers.
[0038] An output data block contains the output of a layer within the network. Unless the block is the output block of the entire network, it is subsequently used as input by later layers of the neural network.
[0039] The weight prefetch block contains trained weights of the neural network, which are preloaded from data memory into main memory to reduce processing time.
[0040] Main memory is used to store all input data blocks, output data blocks, scratchpad blocks, and weight prefetch blocks for the duration required for the computation of one or more layers. The same memory region can be reused to store different blocks during the execution of later layers.
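A memory plan of this kind can be modeled as blocks with an offset, a size, and a lifetime. The sketch below is one possible representation, not the one used by the code generator in the source; the `Block` fields and the function `find_conflicts` are assumptions. It checks that no two blocks that are alive during the same computation step overlap and that every block fits into main memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    offset: int       # start address inside main memory
    size: int         # size in bytes
    first_step: int   # first computation step that uses the block
    last_step: int    # last computation step that uses the block (inclusive)


def find_conflicts(blocks, main_memory_size):
    """Return blocks that leave main memory and pairs that collide while both are alive."""
    conflicts = []
    for block in blocks:
        if block.offset < 0 or block.offset + block.size > main_memory_size:
            conflicts.append((block.name, "outside main memory"))
    for i, a in enumerate(blocks):
        for b in blocks[i + 1:]:
            alive_together = a.first_step <= b.last_step and b.first_step <= a.last_step
            overlapping = a.offset < b.offset + b.size and b.offset < a.offset + a.size
            if alive_together and overlapping:
                conflicts.append((a.name, b.name))
    return conflicts


good_plan = [
    Block("in0", 0, 64, 0, 0),
    Block("scratch0", 64, 32, 0, 0),
    Block("out0", 96, 64, 0, 1),
    Block("out1", 0, 64, 1, 2),      # reuses the region of in0 once in0 is dead
]
bad_plan = good_plan[:3] + [Block("out1", 100, 32, 1, 2)]  # collides with out0 at step 1

print(find_conflicts(good_plan, 160), find_conflicts(bad_plan, 160))
```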
[0041] The code generator is used to create program code adapted to the hardware environment. Once the code is generated, it is transmitted to the hardware environment 2, as in the conventional method, where it is implemented or executed.
[0042] However, it must be guaranteed that the program code will be reliably executed in the hardware environment, and for this purpose, an automated method is constructed that is performed by a code generator.
[0043] The method described below is used to check the algorithmic integrity of the program code generated for neural network computation before implementing the program code in hardware environment 2.
[0044] The flow of this method will be explained in more detail according to the flowchart in Figure 2.
[0045] Here, in step S1, program code created to compute the neural network in the hardware environment is prepared based on the defined neural network. The defined neural network can be provided, for example, as an ONNX data file, as a Keras model, as a TensorFlow-Lite model, or as a PyTorch model. It can be implemented in a higher-order language, such as C, or in a lower-order representation, such as LLVM or inline assembly, or in a combination of these representations that call pre-implemented library functions, such as C code.
[0046] Following or simultaneously with step S1, in step S2, a test code is provided in the created program code, where the test code is used to test the execution of the program code in the hardware environment based on at least one test criterion when the test code is executed.
[0047] Next, in step S3, the program code containing the test code is implemented and executed within the hardware environment.
[0048] In step S4, it is checked whether at least one inspection criterion is met. If it is met (option: yes), the method can proceed to step S5; otherwise, an error is reported in step S8 and the method terminates.
[0049] When multiple inspection criteria are tested, the inspection code required for each criterion is newly added to the program code each time, and the resulting program code is made implementable in the hardware environment in which the inspection is performed.
[0050] In step S5, program code is generated or created without check code.
[0051] In step S6, the program code created without test code is implemented in the hardware environment.
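The flow of steps S1 to S6, with the error exit S8, can be summarized in the following sketch. The callables `generate` and `deploy_and_run` and the criterion names are hypothetical placeholders for the code generator and the target hardware; only the control flow is taken from the description above.

```python
def run_workflow(network, criteria, generate, deploy_and_run):
    """Return ("deployed", build) on success or ("error", criterion) on failure."""
    for criterion in criteria:
        # S1 + S2: create program code and provide test code for one criterion.
        test_build = generate(network, test_code=criterion)
        # S3: implement and execute the instrumented build in the hardware environment.
        passed = deploy_and_run(test_build)
        # S4: evaluate the criterion; S8: report the error and terminate.
        if not passed:
            return ("error", criterion)
    # S5: create the program code again, this time without test code.
    release = generate(network, test_code=None)
    # S6: implement the release build in the hardware environment.
    deploy_and_run(release)
    return ("deployed", release)


def fake_generate(network, test_code):
    # Stand-in for the code generator: a build is just a small dictionary.
    return {"network": network, "test_code": test_code}


def make_fake_target(failing=()):
    # Stand-in for the hardware environment; records every deployed build.
    deployed = []

    def deploy_and_run(build):
        deployed.append(build)
        return build["test_code"] not in failing

    return deploy_and_run, deployed


CRITERIA = ["global_memory", "layer_output", "layer_memory"]
target_ok, log_ok = make_fake_target()
status_ok = run_workflow("net.onnx", CRITERIA, fake_generate, target_ok)
target_bad, log_bad = make_fake_target(failing={"layer_output"})
status_bad = run_workflow("net.onnx", CRITERIA, fake_generate, target_bad)
print(status_ok[0], status_bad)
```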
[0052] The tests here can be conducted based on various test criteria.
[0053] For example, the calculations of the created program code can be executed with its main memory placed inside an auxiliary memory area. If the calculations write to memory areas outside the main memory that is set aside for the operation of the created program code, an error in the algorithm can be detected.
[0054] Furthermore, by calculating the numerical output of each layer of the defined neural network and comparing it with the numerical output of the corresponding layer of the created neural network, the created program code can be checked for algorithmic integrity. For example, if a deviation occurs that exceeds the order of magnitude of the deviation expected from the finite precision of floating-point numbers, an error in the algorithm can be detected.
[0055] Checking the integrity of the algorithm may require temporary storage of the output tensors of partial computations when tiling is applied, because these partial results would otherwise be overwritten by subsequent computations in the layers to which tiling is applied and would then no longer be available for checking the integrity of the algorithm.
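What a deviation "expected from finite precision" can mean is illustrated below for a single dot product. The sketch emulates binary32 arithmetic with the `struct` module and uses the standard textbook rounding-error bound for recursive summation as the deviation budget; both the bound and the function names are assumptions added for illustration and are not taken from the source. Overflow and subnormal numbers are ignored.

```python
import math
import struct

U32 = 2.0 ** -24  # unit roundoff of IEEE 754 binary32


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_f32(xs, ws):
    """Dot product with every product and every sum rounded to binary32."""
    acc = 0.0
    for x, w in zip(xs, ws):
        acc = f32(acc + f32(x * w))
    return acc


def deviation_within_budget(xs, ws, result):
    """Compare `result` with a binary64 reference using a rounding-error budget."""
    n = len(xs)
    reference = math.fsum(x * w for x, w in zip(xs, ws))
    gamma = n * U32 / (1 - n * U32)
    budget = gamma * math.fsum(abs(x * w) for x, w in zip(xs, ws))
    return abs(result - reference) <= budget


xs = [f32(0.1 * i) for i in range(1, 9)]
ws = [f32(1.0 / i) for i in range(1, 9)]

correct = dot_f32(xs, ws)
truncated = dot_f32(xs[:-1], ws[:-1])   # hypothetical defect: last term is skipped

print(deviation_within_budget(xs, ws, correct), deviation_within_budget(xs, ws, truncated))
```

The budget scales with the number of accumulated terms, so the same check tolerates ordinary rounding in large layers while still flagging a missing or wrong term.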
[0056] Figure 3a shows an example of a portion from a given neural network 10 having three serial convolutional layers 11-13 and a subsequent fully connected layer 14. Figure 3b shows the structure of the neural network using the generated program code, in which tiling is applied to two of the convolutional layers.
[0057] Layers 12 and 13, to which tiling is applied, are computed sequentially for each thread in operations OP1, OP2, OP3, and OP4, thereby providing partial output tensors TA1 and TA2, one for each thread. These partial output tensors TA1 and TA2 are then integrated into the output tensor A of the last layer to which tiling is applied. The integrity of the algorithm is checked by integrating, across the threads, the numerical outputs of the partial output tensors of the last layer to which tiling is applied and comparing the integrated output tensor with the numerical output of the computation of the corresponding layer of the neural network as computed using the standard library.
[0058] To also examine the calculations of layer 12 to which tiling is applied, the partial output tensors of operations OP1 and OP3 are temporarily stored so that these partial output tensors are not overwritten by subsequent operations. The temporarily stored partial output tensors are combined and compared to the numerical output of the output tensor of the corresponding layer in the original neural network.
[0059] Figure 3c shows the memory plan with the memory regions to which the output tensors of operations OP0 to OP6 are written. It can be seen that the output tensors of operations OP1 and OP3 are present separately in memory after the execution of operation OP3. This allows the integrity of the algorithm to be checked after the execution of operation OP3.
[0060] Furthermore, the generated program code can be inspected by executing the calculations of each layer of the program code. If it is detected that a write operation has occurred to a memory area not allocated to that layer, an error in the algorithm can be identified.
[0061] Before code generation, it is possible to check whether the neural network has any characteristics that could lead to implementation errors or safety issues. This can be the case, for example:
- if the neural network includes layers that are not supported;
- if the neural network has an unsupported configuration, for example if it includes layers with kernel sizes that are not supported in convolution;
- if the neural network uses a data type that is not supported;
- if the generated program code is not compatible with the data memory of the hardware environment.
Claims
1. A computer-implemented method for operating a code generator that creates and implements tested program code for computing a neural network in a hardware environment (2), comprising the following steps:
- step (S1) of creating program code for computing a neural network in the hardware environment based on a defined neural network;
- step (S2) of providing test code in the created program code, wherein the test code, when executed, checks the execution of the program code in the hardware environment (2) based on at least one test criterion;
- step (S3) of implementing and executing the program code provided with the test code in the hardware environment (2);
- step (S5) of creating program code that does not include the test code, if it is confirmed that the at least one test criterion is met;
- step (S6) of implementing the program code that does not include the test code in the hardware environment (2).
2. The method according to claim 1, wherein the at least one test criterion includes the following checks:
- the execution of the created program code in the hardware environment (2) results in writing the output of the layers only to the memory area provided for the operation of the created program code;
- in order to verify the integrity of the algorithm, the numerical output of the calculations of each layer of the neural network by the created program code matches the numerical output of the calculation of the original neural network by the tool provided for the calculation;
- the calculation of each layer of the created program code writes the corresponding output only to the memory area allocated for the calculation of that layer.
3. The aforementioned program code is generated from defined program code using a code generator, particularly an embedded AI coder, and the defined program code describes, in general definition descriptions, a neural network provided as an ONNX data file, a model stored in Keras, a TensorFlow-Lite model, or an exported PyTorch data file. The method according to claim 2.
4. In order to verify the generated program code, the output of the layer is written to a memory area based on the calculations of the generated program code, and it is checked whether the memory area is located outside the memory area provided for the relevant layer of the neural network. The method according to claim 2 or 3.
5. For testing purposes, calculations for each layer of the created program code are performed based on the numerical output of each layer of the defined neural network, and the layers of the neural network are calculated in accordance with the network definition using a tool that executes the original neural network or standard library, and the output tensor is calculated and temporarily stored. Each layer of the created program code is computed, the corresponding output tensor is obtained, and this output tensor is compared with the output tensor of the layer computed using the neural network executed with the tool or the standard library, thereby confirming that each layer in the created program code generates the correct numerical output. The method according to any one of claims 2 to 4.
6. The defined neural network includes a serial sequence of multiple layers to be computed in the created program code using tiling in the partial computation, wherein the layer to which tiling is applied is computed for each thread, thereby providing a partial output tensor for each thread, and the partial output tensor is integrated into the output tensor of the last layer to which tiling is applied. The algorithm's integrity is checked by integrating the numerical outputs of the output tensors of the specific layer to which tiling is applied for each thread, and comparing the resulting integrated output tensor with the numerical output of the computation of the corresponding layer of the neural network, which is created using tools or according to the standard library. The method according to claim 5.
7. The method according to any one of claims 1 to 6, wherein a check is performed before the program code is created, and the program code is created only if none of the following cases applies:
- the defined neural network includes layers that are not supported;
- the defined neural network includes a configuration that is not supported, particularly layers with kernel sizes that are not supported in convolutions;
- the defined neural network uses a data type that is not supported; and/or
- the generated program code requires more memory than the data memory of the hardware environment provides.
8. A code generator for carrying out one of the methods described in any one of claims 1 to 7.
9. A computer program product comprising, when the program is executed by at least one data processing device, instructions for causing the data processing device to perform a step of the method according to any one of claims 1 to 7.
10. A machine-readable storage medium that, when executed by at least one data processing device, includes instructions causing the data processing device to perform a step of the method according to any one of claims 1 to 7.