Verification method and device for AI computing
By using a heterogeneous parallel verification method to verify AI chips, the problem of the lack of error detection mechanism in AI chips is solved, ensuring the real-time performance and safety of intelligent driving systems, reducing hardware costs, and improving system reliability and availability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- YINWANG INTELLIGENT TECHNOLOGIES CO LTD
- Filing Date
- 2021-08-12
- Publication Date
- 2026-06-26
AI Technical Summary
Existing AI chips lack effective error detection mechanisms, resulting in insufficient reliability of the computing platform of intelligent driving systems under harsh environments and electromagnetic interference, failing to meet the requirements of vehicle safety integrity level. Furthermore, traditional error detection methods are not applicable to AI chips, affecting computing real-time performance and security.
The application presents a heterogeneous parallel verification method in which verification is performed by a computing unit different from the AI computing unit. Heterogeneous computing units apply a different verification scheme to each type of processing layer in the AI model, which reduces the computational load and hardware cost while keeping the verification results accurate and available in real time.
This approach improves the reliability and security of AI chips without compromising the real-time performance of AI computing, reduces hardware costs, avoids resource waste, and enhances system availability.
Figure CN115705487B_ABST
Description
Technical Field
[0001] This application relates to the field of artificial intelligence, and more specifically, to a method and apparatus for verifying AI computation.
Background Technology
[0002] Artificial intelligence (AI) is the theory, methods, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. AI has a wide range of applications; data processing in fields such as transportation, healthcare, and security can all be accomplished through AI neural networks. The more data that needs to be analyzed and processed, the greater the computational load on the neural network. Taking autonomous driving scenarios as an example, high-level autonomous vehicles are often equipped with multiple cameras, lidar, ultrasonic radar, and other sensors to achieve comprehensive perception of their surroundings. This generates a large amount of information that needs to be processed. Furthermore, autonomous vehicles have very high real-time requirements for neural network inference and computation. If the neural network's inference and computation lags, it cannot provide timely environmental information for subsequent control decisions, reducing the safety of autonomous driving. Traditional central processing units (CPUs) cannot handle the inference calculations of such massive neural networks. Therefore, artificial intelligence chips are used as hardware acceleration units specifically for neural network inference calculations. AI chips are faster and more energy-efficient than traditional chips when performing neural network inference calculations. Currently commonly used AI chips include graphics processing units (GPUs), field programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs).
[0003] Autonomous vehicles operate outdoors and may encounter severe weather and electromagnetic interference, so the computing platform of the autonomous driving system must be extremely reliable. Traditional components such as CPUs and memory have error detection and tolerance mechanisms, such as program flow monitoring, data flow monitoring, memory error checking and correction (ECC), and parity checking, which ensure that data in the CPU and memory is not affected by soft failures. AI chips, in contrast, are designed for high-speed computation and generally lack effective error detection mechanisms. Furthermore, because the computing architecture of AI chips differs from that of traditional chips such as CPUs, the error detection mechanisms of traditional chips cannot be applied directly to AI chips.
[0004] To ensure the safety of intelligent driving, the computing platform needs to meet the requirements of the automotive safety integrity level (ASIL). An error detection method is therefore needed that checks the AI chip in real time, so that the use of the AI chip meets the needs of the application scenario.
Summary of the Invention
[0005] This application provides a verification method and apparatus for AI computing. The verification method can be executed by a computing unit other than the computing unit that processes AI computing, without affecting the processing of AI computing. Compared with redundant verification, the verification method for AI computing in this application embodiment has a very small amount of computation and low performance requirements for the computing unit used for verification, thereby reducing hardware costs and ensuring the reliability of AI chips.
[0006] In a first aspect, a verification method for AI computation is provided. The method is executed by a first computing unit and includes: obtaining parameters of an AI model that a second computing unit uses to process AI computation, the AI model including one or more first processing layers; performing the following verification processing on each of the one or more first processing layers to obtain a verification flag bit for each of them: obtaining input data of the first processing layer from the second computing unit, and performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the verification flag bit of the first processing layer, wherein the computational amount of the verification processing of the first processing layer is less than the computational amount of the second computing unit processing the input data through the first processing layer; and determining, based on a verification result, whether the output result of the second computing unit processing the AI computation is correct, the verification result including the verification flag bit of each of the one or more first processing layers.
[0007] The AI computation verification method in this application embodiment is executed by a computing unit other than the computing unit that performs AI computation. Compared with the verification method that periodically runs the self-test library, the AI computation verification method in this application embodiment does not interfere with the normal inference computation of the AI model. Therefore, it does not affect the acceleration performance of the AI computing unit and avoids the same AI computing unit performing verification while performing AI computation. It ensures the correctness of the AI model output results while ensuring the real-time performance of the AI model's inference computation. Furthermore, since the computational amount of verification processing is less than that of AI computation, the performance requirements of the computing unit used for verification processing can be no higher than those of the computing unit used for AI computation. Compared with redundant computing units performing verification, the heterogeneous verification method provided in this application embodiment can save power consumption and reduce costs.
[0008] In some possible implementations, the AI model further includes one or more second processing layers, and the method further includes: performing redundancy verification on each of the one or more second processing layers to obtain a verification flag bit for each of the one or more second processing layers; the verification result further includes the verification flag bit of each of the one or more second processing layers.
[0009] The second processing layer includes a pooling layer and an activation layer. Since pooling and activation calculations only consume a small amount of resources, even using redundant verification will not consume too many resources.
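The redundancy verification of a second processing layer can be sketched as follows. This is a minimal illustration, not the patented implementation: the helper names `relu`, `max_pool_2x2`, and `redundancy_flag` are hypothetical, and the sketch assumes the verifying unit simply recomputes the cheap layer and compares its result with the output reported by the AI computing unit.

```python
def relu(feature_map):
    """Activation layer: clamp negative values to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Pooling layer: 2x2 max pooling with stride 2 (trailing odd row/column dropped)."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def redundancy_flag(layer_fn, layer_input, reported_output):
    """Recompute the layer on the verifying unit and compare with the reported output.

    Returns True when both results match (flag indicates consistency).
    """
    return layer_fn(layer_input) == reported_output
```

Because pooling and activation touch each element only a constant number of times, repeating them costs far less than repeating a convolution or fully connected layer.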
[0010] In some possible implementations, the parameters of the AI model include a weight matrix, the input data of the first processing layer includes a feature map matrix, and performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the verification flag bit of the first processing layer includes:
[0011] obtaining a first verification flag bit, which is obtained by performing a first verification calculation on the weight matrix; obtaining a second verification flag bit, which is obtained by performing a second verification calculation on the feature map matrix; obtaining a pre-calculation verification flag bit based on the first and second verification flag bits; obtaining an output matrix from the second computing unit, which the second computing unit computes from the weight matrix and the feature map matrix in the first processing layer; performing a third verification calculation on the output matrix to obtain a post-calculation verification flag bit; and obtaining the verification flag bit based on the pre-calculation verification flag bit and the post-calculation verification flag bit.
[0012] The aforementioned weight matrix and feature map matrix can be calculated offline and stored in memory. Furthermore, the matrix calculations used to obtain the different flag bits differ. The AI computation verification method in the embodiments of this application designs different verification methods for different processing layers in the AI model, which conserves computing resources as far as possible and allows verification to run on computing units with lower computing power, thus reducing verification costs.
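The pre-calculation and post-calculation flag bits can be illustrated with a checksum-style sketch for a fully connected layer computing Y = W·X. This section does not define the three verification calculations, so the choices below are assumptions made only for illustration: column sums of W as the first calculation, row sums of X as the second, their dot product as the pre-calculation value, and the sum of all elements of Y as the post-calculation value. All function names are hypothetical, and integer matrices are used so that the comparison is exact.

```python
def matmul(a, b):
    """Reference product standing in for the second computing unit's layer computation."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def weight_flag(weights):
    """Assumed first verification calculation: column sums of W (can be done offline)."""
    return [sum(col) for col in zip(*weights)]


def feature_map_flag(feature_map):
    """Assumed second verification calculation: row sums of X."""
    return [sum(row) for row in feature_map]


def pre_calc_flag(w_flag, x_flag):
    """Combine both flags: equals the sum of all elements of W*X for a correct product."""
    return sum(w * x for w, x in zip(w_flag, x_flag))


def post_calc_flag(output):
    """Assumed third verification calculation: sum of all elements of the output matrix."""
    return sum(sum(row) for row in output)


def verify_layer(weights, feature_map, output):
    """Return True when the pre- and post-calculation values are consistent."""
    pre = pre_calc_flag(weight_flag(weights), feature_map_flag(feature_map))
    return pre == post_calc_flag(output)
```

Under these assumptions, for an m×k weight matrix and a k×n feature map the full product needs m·k·n multiplications, whereas the check needs only k multiplications plus additions over the m·k + k·n + m·n matrix elements. A single-sum check of this kind can miss errors that cancel each other out.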
[0013] In some possible implementations, the verification flag bit indicates whether the pre-calculation verification flag bit and the post-calculation verification flag bit are consistent. Determining, based on the verification result, whether the output result of the second computing unit processing the AI computation is correct includes: if at least one verification flag bit in the verification result indicates that the pre-calculation verification flag bit and the post-calculation verification flag bit are inconsistent, the output result is incorrect.
[0014] In some possible implementations, the first processing layer is a convolutional layer or a fully connected layer. Multiple first processing layers may include multiple convolutional layers, or multiple fully connected layers, or one or more convolutional layers and one or more fully connected layers.
[0015] In some possible implementations, when the output result is determined to be incorrect, the possible states of the second computing unit include transient failure and permanent failure.
[0016] In some possible implementations, when the output result is determined to be incorrect, the method also includes: determining whether the state of the second computing unit is transient or permanent failure by running a self-test library.
[0017] In some possible implementations, the method further includes: when the state of the second computing unit is permanent failure, reporting the failure state of the second computing unit.
[0018] The AI computing verification method in this application embodiment can further determine the specific failure state of the hardware after determining that the hardware has failed. If the computing unit only experiences a transient failure, it can continue to be used, thereby avoiding waste of resources and improving the availability of the AI chip.
[0019] In a second aspect, a verification method for AI computation is provided. The method is executed by a first computing unit and includes: obtaining a verification result for the output result of an AI model that a second computing unit uses to process AI computation, wherein the verification result indicates that the output result is incorrect; and running a self-test library to determine whether the state of the second computing unit is transient failure or permanent failure.
[0020] In some possible implementations, running the self-test library to determine whether the state of the second computing unit is transient failure or permanent failure includes: when the result of running the self-test library is no fault, the state of the second computing unit is transient failure; when the result of running the self-test library is a fault, the state of the second computing unit is permanent failure.
[0021] In some possible implementations, the method further includes: discarding the output result when the state of the second computing unit is transient failure; and reporting the failure state of the second computing unit when the state of the second computing unit is permanent failure.
[0022] The AI computation verification method in this application embodiment uses CPU-executed system scheduling to call a self-test library to perform a self-test on AI cores that have experienced hardware failures. It determines whether the AI core has experienced a permanent or transient failure. If the self-test does not detect a fault, it indicates a transient failure of the AI core, which does not affect subsequent computation, and the AI core can continue to participate in system operations. If the self-test detects a fault, it indicates a permanent failure of the AI core, in which case the AI core cannot continue to participate in computation and a fault report is required. This avoids directly disabling failed AI cores, reducing resource waste and improving the availability of the AI chip.
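The failure classification described above can be sketched as follows. This is an illustrative sketch only: `run_self_test` and `report_fault` are hypothetical callables standing in for the self-test library invocation and the fault-reporting path, neither of which is specified in this section.

```python
from enum import Enum


class UnitState(Enum):
    TRANSIENT_FAILURE = "transient failure"
    PERMANENT_FAILURE = "permanent failure"


def classify_failure(run_self_test):
    """Run the self-test after an incorrect output has been detected.

    run_self_test() is assumed to return True when the self-test finds a fault.
    No fault found -> transient failure; fault found -> permanent failure.
    """
    fault_found = run_self_test()
    return UnitState.PERMANENT_FAILURE if fault_found else UnitState.TRANSIENT_FAILURE


def handle_incorrect_output(run_self_test, report_fault):
    """Return (state, unit_still_usable) for a unit that produced an incorrect output."""
    state = classify_failure(run_self_test)
    if state is UnitState.PERMANENT_FAILURE:
        report_fault(state)  # permanent failure: report it and stop using the unit
        return state, False
    # Transient failure: the incorrect output is discarded and the unit keeps working.
    return state, True
```

The caller is expected to drop the incorrect output in both cases; only the permanent case removes the unit from further computation.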
[0023] In a third aspect, an AI computation verification device is provided, comprising: a transceiver unit, configured to acquire parameters of an AI model that a second computing unit uses to process AI computation, the AI model including one or more first processing layers, wherein the following verification processing is performed on each of the one or more first processing layers to obtain a verification flag bit for each of them: the transceiver unit is further configured to acquire input data of the first processing layer from the second computing unit; and a processing unit, configured to perform verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the verification flag bit of the first processing layer, wherein the computational amount of the verification processing on the first processing layer is less than the computational amount of the second computing unit processing the input data through the first processing layer; the processing unit is further configured to determine, based on a verification result, whether the output result of the second computing unit processing the AI computation is correct, the verification result including the verification flag bit of each of the one or more first processing layers.
[0024] In some possible implementations, the AI model further includes one or more second processing layers, and the processing unit is further configured to: perform redundancy verification on each of the one or more second processing layers to obtain a verification flag bit for each of the one or more second processing layers; the verification result also includes the verification flag bit of each of the one or more second processing layers.
[0025] In some possible implementations, the parameters of the AI model include a weight matrix, and the input data of the first processing layer includes a feature map matrix. Specifically, the processing unit is configured to: obtain a first verification flag bit, which is obtained by performing a first verification calculation on the weight matrix; obtain a second verification flag bit, which is obtained by performing a second verification calculation on the feature map matrix; obtain a pre-calculation verification flag bit based on the first and second verification flag bits; obtain an output matrix from the second computing unit, which the second computing unit computes from the weight matrix and the feature map matrix in the first processing layer; perform a third verification calculation on the output matrix to obtain a post-calculation verification flag bit; and obtain the verification flag bit based on the pre-calculation and post-calculation verification flag bits.
[0026] In some possible implementations, the verification flag bit indicates whether the pre-calculation verification flag bit and the post-calculation verification flag bit are consistent. The processing unit is specifically configured to: if at least one verification flag bit in the verification result indicates that the pre-calculation verification flag bit and the post-calculation verification flag bit are inconsistent, determine that the output result is incorrect.
[0027] In some possible implementations, the first processing layer is a convolutional layer or a fully connected layer.
[0028] In some possible implementations, when the output result is determined to be incorrect, the possible states of the second computing unit include transient failure and permanent failure.
[0029] In some possible implementations, when the output result is determined to be incorrect, the processing unit is also used to: determine whether the state of the second computing unit is transient or permanent failure by running the self-test library.
[0030] In some possible implementations, the transceiver unit is also used to: report the failure status of the second computing unit when the second computing unit is in a state of permanent failure.
[0031] In a fourth aspect, an AI computation verification device is provided, comprising: a transceiver unit for acquiring a verification result for the output result of an AI model that a second computing unit uses to process AI computation, wherein the verification result indicates that the output result is incorrect; and a processing unit for running a self-test library to determine whether the state of the second computing unit is transient failure or permanent failure.
[0032] In some possible implementations, when the result of running the self-test library is no fault, the state of the second computing unit is transient failure; when the result of running the self-test library is a fault, the state of the second computing unit is permanent failure.
[0033] In some possible implementations, the device is also used to: discard the output result when the state of the second computing unit is transient failure; and report the failure state of the second computing unit when the state of the second computing unit is permanent failure.
[0034] In a fifth aspect, a chip is provided, including a first computing unit, which is used to execute the method in any possible implementation of the first and second aspects described above.
[0035] In some possible implementations, the chip also includes a second computing unit for performing AI calculations.
[0036] In a sixth aspect, a computer-readable medium is provided, characterized in that the computer-readable medium stores program code that, when the computer program code is run on a computer, causes the computer to perform the method in any of the possible implementations of the first and second aspects described above.
[0037] In a seventh aspect, a computing device is provided, including a first computing unit and a second computing unit, wherein the second computing unit is used to process AI calculations based on an AI model, and the first computing unit executes the method in any possible implementation of the first and second aspects described above.
[0038] In some possible implementations, the processing power of the first computing unit is less than or equal to the processing power of the second computing unit.
[0039] In some possible implementations, the first computing unit is at least one of a computing unit in an AI chip, a computing unit in a CPU chip, or a computing unit in a GPU chip, and the second computing unit is a computing unit in an AI chip.
Brief Description of the Drawings
[0040] Figure 1 is a schematic diagram of the verification method of periodically running the self-test library according to an embodiment of this application;
[0041] Figure 2 is a schematic diagram of the system architecture of a possible application of the AI calculation verification method according to an embodiment of this application;
[0042] Figure 3 is a schematic diagram of the hardware units that may be involved when the AI calculation verification method of this application is applied to an intelligent driving computing platform;
[0043] Figure 4 is a system architecture diagram of the AI calculation verification method applied in the embodiments of this application;
[0044] Figure 5 is a schematic flowchart of the AI calculation verification method according to an embodiment of this application;
[0045] Figure 6 is a schematic diagram illustrating the calculation of the offline verification flag bit in an embodiment of this application;
[0046] Figure 7 is a schematic diagram of calculating the feature map verification flag bit in an embodiment of this application;
[0047] Figure 8 is a schematic diagram of calculating the pre-calculation verification flag bit in an embodiment of this application;
[0048] Figure 9 is a schematic diagram illustrating how the AI model processes data according to an embodiment of this application;
[0049] Figure 10 is a schematic diagram of the calculation and verification of the flag bit in an embodiment of this application;
[0050] Figure 11 is a schematic diagram illustrating the calculation of the verification flag bit in an embodiment of this application;
[0051] Figure 12 is a schematic diagram illustrating the further detection of the specific failure state of the second computing unit after a failure is detected in an embodiment of this application;
[0052] Figure 13 is a schematic diagram illustrating the determination of the specific failure state of the second computing unit according to an embodiment of this application;
[0053] Figure 14 is a schematic flowchart of another AI calculation verification method according to an embodiment of this application;
[0054] Figure 15 is a schematic block diagram of an AI calculation verification device provided in an embodiment of this application;
[0055] Figure 16 is a schematic structural diagram of an AI calculation verification device according to an embodiment of this application.
Detailed Description
[0056] The technical solutions in this application will now be described with reference to the accompanying drawings.
[0057] Currently, AI chips lack a dedicated verification mechanism for validating the inference results of neural networks. Neural network inference is a form of AI computation, and ensuring the correctness of its results is crucial. For example, in autonomous driving scenarios, decisions based on erroneous neural network outputs could seriously endanger drivers. Several verification methods have therefore been proposed to guarantee the correctness of neural network inference results, including redundancy verification and periodically running a self-test library (STL). Redundancy verification includes dual modular redundancy (DMR) and triple modular redundancy (TMR), in which two or three identical computing chips or units perform the same computation simultaneously and the results are compared. If the results match, the computation is correct; otherwise, it is incorrect. Although redundancy verification can in theory verify neural network outputs, it also multiplies the cost by two or three.
[0058] The STL verification method is described below with reference to Figure 1. Figure 1(a) is a schematic diagram of an AI core in an AI chip performing four AI calculations in chronological order. Each AI calculation runs a neural network to perform inference and obtain an output result. To ensure the accuracy of the output result, the calculation result needs to be checked. Figure 1(b) shows STL detection applied to the four AI calculations. STL detection is periodic. Ideally, the period of STL detection equals the time taken by one AI calculation; that is, an STL detection is performed after each AI calculation, and the next AI calculation is performed only if the STL detection result is normal. Figure 1(c) shows the situation where STL detects a hardware failure. As shown in Figure 1(c), during the third detection cycle, STL detection finds that the AI chip has a permanent failure. The AI chip therefore stops performing AI calculations and the fault is reported.
[0059] As described above, STL detection has several shortcomings. First, STL detection is performed with a fixed detection period, but the AI computation time is uncertain. For example, the preset detection period may be 10 milliseconds while a particular AI computation takes longer than 10 milliseconds. Running STL detection may then interrupt the AI computation and delay decisions. Moreover, AI computation must be stopped while STL detection runs, so the real-time nature of decisions cannot be guaranteed. Second, with the same preset detection period of 10 milliseconds, the AI computation time may instead be less than 10 milliseconds. For example, in the third detection cycle of Figure 1(c), two AI calculations are performed within a single detection cycle, and a failure of the AI chip is detected in that cycle. The failure might have occurred during the first AI calculation in the third detection cycle, but STL detection did not detect it in time. The erroneous calculation result then leads to incorrect decisions, potentially threatening the safety of the device. Third, STL detection primarily identifies permanent failures and is weaker at detecting transient failures. If a transient failure occurs, STL detection also classifies it as a permanent failure and reports the fault. However, AI chips contain numerous multiply-accumulate units, so they occupy a large area and in theory are more likely to be struck by high-energy particles. The probability of a transient failure in an AI chip is therefore higher than that of a permanent failure. Because STL detection classifies a transient failure as a permanent failure and reports it as such, the system deactivates the reported AI chip, wasting hardware resources and reducing system availability.
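Redundancy verification as described above can be sketched in a few lines. The sketch is illustrative only: `redundancy_verify` and the example units are hypothetical names, and each "unit" is modeled as a plain function that receives the same input.

```python
def redundancy_verify(units, data):
    """Run the same computation on every redundant unit and compare the results.

    units: list of callables standing in for identical computing units
           (two for DMR, three for TMR).
    Returns (results_match, results).
    """
    results = [unit(data) for unit in units]
    results_match = all(result == results[0] for result in results[1:])
    return results_match, results


def healthy_unit(values):
    """Example computation: sum of squares."""
    return sum(v * v for v in values)


def faulty_unit(values):
    """Same computation with an injected error, standing in for a hardware fault."""
    return healthy_unit(values) + 1
```

Every additional unit repeats the full computation, which is why the cost grows with the number of redundant units.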
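The late-detection problem of a fixed STL period can be made concrete with a small timing sketch. It assumes, for illustration only, that STL detection runs exactly at integer multiples of the detection period and that a fault persists until the next STL run; the function name and the millisecond values are hypothetical.

```python
def outputs_before_detection(fault_time_ms, completion_times_ms, period_ms):
    """List the AI calculations that finish after a fault occurs but before the
    next periodic STL run can detect it.

    All times are integers in milliseconds measured from the same start point.
    """
    # First multiple of the period at or after the fault (integer ceiling division).
    next_check_ms = -(-fault_time_ms // period_ms) * period_ms
    return [t for t in completion_times_ms if fault_time_ms <= t < next_check_ms]
```

With a 10 ms period, a fault at 22 ms, and AI calculations finishing at 8, 18, 24, 29, and 38 ms, the calculations finishing at 24 ms and 29 ms both produce outputs before the STL run at 30 ms, which matches the situation of two AI calculations inside one detection cycle.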
[0060] Therefore, this application provides an AI computation verification method that can be executed by a computing unit other than the computing unit that processes AI computations, without affecting the processing of AI computations. The verification is performed synchronously with the inference computation of the neural network, ensuring the correctness and real-time performance of the neural network's computation results. Compared with redundant verification, the AI computation verification method of this application has a very small computational load and low performance requirements for the computing unit used for verification, thereby reducing hardware costs and ensuring the reliability of AI chips.
[0061] The AI computation verification method of the embodiments of this application can be applied to any neural network inference scenario, including the following. First, scenarios with high security requirements for neural network inference. For example, in 5G smart industry, massive amounts of industrial data are analyzed, processed, and optimized through AI neural networks, which requires more secure and reliable inference. The method can therefore be applied to devices in 5G smart industry on which AI chips are deployed, such as control equipment and servers, to improve the reliability of neural network inference. Second, scenarios requiring large-scale deployment of AI chips, such as cloud computing servers, smart security cameras, autonomous vehicles, and terminal devices. These devices are deployed at large scale, operate for long periods, and are prone to hardware failure. Hardware failure degrades the accuracy of the AI chip's computation results and the availability of the system, which worsens the user experience. The method can therefore be applied to cloud computing servers, smart security cameras, and autonomous vehicles to verify the correctness of AI calculation results in real time. Third, scenarios involving numerous matrix operations, such as smartphones and smart TVs, which are typically equipped with graphics processing units (GPUs). For cost reasons, these GPUs are manufactured to relatively low requirements, so they have a high failure rate and often lack error detection capabilities. However, devices equipped with these GPUs need to process large amounts of image data, which primarily involves matrix operations. When a failure occurs, it may show up as garbled images on the smartphone or smart TV, degrading the user experience. Therefore, as a low-cost verification method, the AI calculation verification method described in this application can be applied to smart devices equipped with such GPUs, reducing the impact of hardware failures on data processing and improving the user experience.
[0062] Figure 2 shows the hardware units that may be involved in applying the AI computation verification method according to the embodiments of this application, including AI chips, CPU chips, memory chips, buses, and GPUs. As shown in Figure 2, neural network inference is performed by an AI chip, which may include one or more AI cores. AI computation can be performed by each AI core in the AI chip. An AI core is the smallest computing unit for inference in the AI chip, so an AI core can be called an AI computing unit. On-chip storage units cache the parameters of the AI model, intermediate computation results, and inference results. Each AI core is connected to the on-chip storage unit via a bus. CPUs and GPUs may also each include one or more computing units.
[0063] The neural network computation verification method of this application verifies each computation of the neural network. Each verification can be performed by the AI core of the AI chip, or by the computing unit in the CPU chip or GPU. The verification result is stored in the memory chip. Finally, the CPU chip reads the verification result from the memory chip, performs logical judgment, and determines whether the computation result of the neural network is correct. The AI chip, CPU chip, memory chip, and GPU are connected via a bus.
[0064] Taking intelligent driving scenarios as an example, Figure 3 illustrates a system architecture in which the AI computation verification method according to an embodiment of this application may be applied. As shown in Figure 3, for intelligent driving vehicles, data acquired by sensors such as cameras, lidar, and ultrasonic radar needs to be processed by an intelligent driving computing platform to generate a series of execution commands, which are then sent to specific actuators for execution. For example, the steering actuator controls the vehicle's steering based on the execution commands, the throttle actuator controls the vehicle's acceleration, and the brake actuator controls the vehicle's deceleration. If the intelligent driving computing platform makes a calculation error and generates incorrect execution commands, the actuators executing these incorrect commands will cause driving hazards. It is therefore particularly necessary to verify the inference calculations of the neural network in the intelligent driving computing platform. The AI calculation verification method of this application can be applied to the intelligent driving computing platform in Figure 3 to verify the inference calculations of the neural network in the platform, ensuring the correctness of the calculation results and the safety of intelligent driving. The intelligent driving computing platform in Figure 3 includes the hardware units shown in Figure 2.
[0065] Figure 4 shows a system architecture diagram of the AI computation verification method according to an embodiment of this application. As shown in Figure 4, the AI computation verification method in this embodiment is a heterogeneous parallel verification method, in contrast to traditional redundant verification computation. Redundant verification computation determines whether the verified process is correct by performing the same computation as the process being verified and comparing the computation results. For the AI computing unit being verified in Figure 4, redundant verification would require another AI computing unit to perform the same convolution, pooling, activation, and fully connected computations on the input data and output the computation results. Whether the computation of the AI computing unit being verified is correct is then determined by comparing the computation results of the two AI computing units. This method requires the processing power of the AI computing unit used for verification to be the same as or higher than that of the AI computing unit being verified. By contrast, the AI computation verification method in this embodiment is heterogeneous: the computational processing used for verification differs from that of the AI computing unit being verified. For example, different verification methods are designed for different structures within the neural network, such as convolution, fully connected, pooling, and normalization, making the computation heterogeneous. Since the computation is heterogeneous, the computing unit used for verification can also differ from the AI computing unit being verified; the two can be hardware with different structures and capabilities. 
In this embodiment, heterogeneous parallel verification can use computational processing different from that of the AI computing unit being verified while running on the same computing unit as the AI computation being verified. Alternatively, heterogeneous parallel verification can use different computational processing and also run on a verification unit whose structure or processing capability differs from that of the AI computing unit being verified. In other words, heterogeneous parallel verification of AI computation can be performed by the AI computing unit itself or by other computing units. Here, the computing units that verify AI inference computation are all referred to as heterogeneous computing units. The computing power of a heterogeneous computing unit can be the same as or lower than that of the computing unit performing AI inference computation, because the verification method in this embodiment consumes far fewer resources than the AI computation itself. Verification can therefore be performed on ordinary computing units with lower computing power, such as GPU computing units or lower-powered CPUs, which lowers the verification threshold and saves cost. Parallelism means that each calculation within the neural network is verified while the AI computation is in progress, generating verification flags. Each calculation in the neural network refers to the computation performed by each data processing layer during data processing. Once the neural network completes its computation and outputs the result, the correctness of the result can be judged based on all the verification flags. 
This process does not require stopping the AI computation, and the verification is performed in real time, meeting the real-time requirements of neural network inference computation in application scenarios. The AI computation verification method in the embodiments of this application is described in detail below with reference to Figure 4.
[0066] First, the AI computing unit performs the neural network's inference calculations normally. When the data to be processed is input into the AI computing unit, it undergoes preprocessing and is input into the AI kernel in the form of tensors. Simultaneously, the neural network model is also loaded into the AI kernel. The neural network model has different structures; the data to be processed is processed through multiple processing layers in the neural network model to obtain the inference result and output it. The processing layers in the neural network model can be categorized by structure as convolutional layers, pooling layers, activation layers, and fully connected layers. Typically, a neural network model can contain multiple processing layers, such as one or more convolutional layers, one or more pooling layers, one or more activation layers, and one or more fully connected layers. The calculations in convolutional and fully connected layers involve matrix multiplication. When the matrix dimension is high, matrix multiplication introduces a large amount of computation, consuming significant computing resources.
[0067] Figure 4 uses a typical neural network model as an example. This model includes convolutional layers, pooling layers, activation layers, and fully connected layers. It should be understood that the AI computation verification method in this application is not limited to the neural network model structure shown in Figure 4; it is also applicable to other neural network models, and each structure can have multiple identical processing layers. The preprocessed data undergoes computation through convolutional layers, pooling layers, activation layers, and fully connected layers to obtain the output calculation result. Then, the AI core transmits the output calculation result to the CPU, which executes the corresponding decision based on the result. As can be seen from the above process, the inference computation process of traditional neural networks does not include a verification mechanism, and the correctness of the calculation result transmitted to the CPU cannot be guaranteed. The AI computation verification method in this application can verify the inference computation of the neural network to ensure the correctness of the output calculation result and prevent the CPU from making incorrect decisions based on incorrect calculation results, which could lead to serious consequences.
[0068] Unlike redundancy checks, which repeat one or more calculations of the neural network's inference computation, and unlike the self-test library's verification method, which performs verification after the inference computation is complete, the AI computation verification method of this embodiment has the heterogeneous computing unit synchronously verify one or more calculations that the neural network performs within the AI core. As shown in Figure 4, the first convolution calculation of a convolutional layer is verified to obtain verification flag 1. Since a convolutional layer may perform multiple convolution calculations, the second convolution calculation is verified to obtain verification flag 2, the third convolution calculation to obtain verification flag 3, and so on. The first calculation of a pooling layer is verified to obtain verification flag 2a, where a is a positive integer, the second calculation of the pooling layer is verified to obtain verification flag 2a+1, and so on. The first calculation of an activation layer is verified to obtain verification flag 3b, where b is a positive integer, the second calculation of the activation layer is verified to obtain verification flag 3b+1, and so on. The first calculation of a fully connected layer is verified to obtain verification flag 4c, where c is a positive integer, the second calculation of the fully connected layer is verified to obtain verification flag 4c+1, and so on. This verification is performed in real time and does not interfere with the normal inference calculation process of the neural network. When the neural network completes its inference computation and outputs the computation result, the synchronous verification of the one or more computations within the neural network is also complete, and the one or more verification flags generated are stored in memory. 
The CPU verifies the output result based on these one or more verification flags, determining whether the computation result output by the neural network is correct. Furthermore, the AI computation verification method in this embodiment designs different verification methods for different internal structures of the neural network. As shown in Figure 4, for processing layers such as convolutional layers and fully connected layers, which consume significant computational resources due to matrix multiplication, matrix computation verification methods are used. For example, verification can be performed using dimension-reduced matrix computations, including vector-matrix multiplication or vector multiplication. The computational cost of such dimension-reduced matrix computation is lower than that of the matrix multiplication in redundant verification methods, where redundant verification refers to performing the same calculation as the corresponding processing layer. Since over 99% of the computation in a neural network comes from matrix multiplication operations in structures such as convolutional layers and fully connected layers, and less than 1% comes from operations in structures such as pooling layers and activation layers, using a relatively low-computation method to verify the matrix multiplication operations that consume the majority of computational resources can significantly reduce the computational cost of inference verification for the neural network. For operations in structures such as pooling layers and activation layers, which consume very few computational resources, redundant verification can still be used to simplify the verification process. Alternatively, low-computation verification methods can be used for these layers as well to further reduce the computational cost of inference verification.
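The cost argument above can be illustrated with a rough operation count. The sketch below is an assumption-laden estimate, not a figure from the application: it assumes a weight matrix of size m x k, a feature map matrix of size k x n, all-ones check vectors, and a hypothetical layer shape, and it counts each multiply-accumulate or addition as one operation.

```python
def full_matmul_macs(m, k, n):
    """Approximate multiply-accumulate count for W (m x k) times F (k x n)."""
    return m * k * n


def online_check_ops(m, k, n):
    """Approximate per-inference cost of the dimension-reduced check.

    The offline flag for the weights is assumed to be precomputed,
    so it is not counted here.
    """
    cf_adds = k * n      # row sums of F (F times an all-ones column vector)
    cb_in_macs = k       # dot product of two length-k check vectors
    cb_out_adds = m * n  # sum of every element of the output matrix
    return cf_adds + cb_in_macs + cb_out_adds


if __name__ == "__main__":
    m, k, n = 64, 576, 1024  # hypothetical layer shape, for illustration only
    print("redundant matmul:", full_matmul_macs(m, k, n))
    print("reduced check:   ", online_check_ops(m, k, n))
```

For very small matrices the two counts are similar; the saving grows with the matrix dimensions, which is the case the paragraph above is concerned with.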
[0069] According to the AI calculation verification method of this application embodiment, after the neural network completes the inference calculation and outputs the calculation result, the synchronous verification of one or more calculations in the neural network is also completed, and one or more verification flag bits are generated. The CPU reads one or more verification flag bits in memory and determines whether the calculation result output by the neural network is correct based on the one or more verification flag bits. If the calculation result is determined to be correct, the corresponding decision is executed. If the calculation result is determined to be incorrect, a fault is reported.
[0070] Figure 5 shows a schematic flowchart of the AI computation verification method according to an embodiment of this application. As shown in Figure 5, the method includes steps S501 to S504, which are described below.
[0071] S501, Obtain the parameters of the AI model for AI computation processed by the second computing unit. The AI model includes one or more first processing layers.
[0072] The second computing unit is the AI core that performs neural network inference calculations. The AI computation verification method shown in Figure 5 is executed by a first computing unit, which may be different from the second computing unit. The first computing unit can be another AI core on the AI chip where the second computing unit resides, or another computing unit on that AI chip with lower computing power; this other computing unit may or may not be designed for AI acceleration. The first computing unit can also be a computing unit on another AI chip, or a CPU core in a CPU chip, where a CPU core is the smallest computing unit in a CPU chip. In one possible implementation, the first computing unit may also be the same computing unit as the second computing unit, meaning that this computing unit performs both the AI computation and the AI computation verification method of this embodiment.
[0073] Since the second computing unit performs the inference computation of the neural network, the required AI model is loaded into the second computing unit. The AI model may include one or more processing layers for processing the data input to the AI model. The first processing layer consumes more computing resources; for a convolutional neural network model with a typical structure, it may be a convolutional layer or a fully connected layer. The AI model may include one or more first processing layers. For example, the AI model may include convolutional layers and fully connected layers, or it may include multiple convolutional layers and multiple fully connected layers. The above are just examples and are not intended to be limiting. Of course, the first processing layer can also be a pooling layer or an activation layer.
[0074] The first computing unit can obtain the AI model's parameters from the second computing unit. These parameters include weights, biases, etc., and may be, for example, a weight matrix. The AI model's parameters can be stored in system memory after offline training. The first computing unit can also obtain the AI model's parameters from system memory.
[0075] Steps S502 to S503 are performed on each of the one or more first processing layers:
[0076] S502, obtain the input data of the first processing layer from the second computing unit.
[0077] The first computing unit obtains the input data of the first processing layer from the second computing unit. Taking the convolutional neural network model with a typical structure in S501 as an example for processing image data, when the first processing layer is a convolutional layer, the input data of the first processing layer is the preprocessed image data; when the first processing layer is a fully connected layer, the input data of the first processing layer is the output data of the activation layer.
[0078] S503, the first processing layer is verified based on the parameters of the AI model and the input data of the first processing layer to obtain the verification flag bit of the first processing layer. The computational amount of the verification processing of the first processing layer is less than the computational amount of the second computing unit processing the input data through the first processing layer.
[0079] Specifically, taking convolutional and fully connected layers as examples in the first processing layer of the AI model, the data processing is converted into matrix calculations. Therefore, the parameters of the AI model obtained above include the weight matrix of the AI model, and the input data of the first processing layer includes the feature map matrix. The weight matrix includes multiple row vectors or multiple column vectors, and the feature map matrix includes multiple row vectors or multiple column vectors, each vector including multiple elements. The dimension of the matrix can be the number of rows or columns; the higher the dimension, the higher the complexity of the matrix calculation and the higher the computational resources consumed. The AI calculation verification method in this embodiment uses a dimension-reduced matrix calculation verification method to verify the calculation of the first processing layer to obtain the verification flag bit of the first processing layer. For example, it can be calculated using one or more matrices with fewer rows or columns than the weight matrix or feature map matrix. One possible implementation is as follows:
[0080] First, the first computing unit performs a first verification calculation on the weight matrix to obtain the first verification flag. Since the AI model's parameters are trained offline and stored in system memory, the parameters of the AI model participating in data processing can be verified offline; that is, the first verification flag can be an offline verification flag. The first verification calculation can multiply a matrix that is dimension-reduced relative to the feature map matrix (a matrix with fewer rows or columns than the feature map matrix) by the weight matrix to obtain the first verification flag. For example, as shown in Figure 6, an all-ones row vector is multiplied by the weight matrix of the AI model to obtain the offline verification flag (OC) for the weight matrix. The calculated offline verification flag OC can be stored in memory and retrieved for use in subsequent verification. It should be understood that matrix multiplication in this embodiment requires the number of columns of the left matrix to equal the number of rows of the right matrix; that is, in Figure 6 the number of columns of the all-ones row vector should equal the number of rows of the weight matrix, and other matrix multiplication operations in this embodiment should also satisfy this rule. Here, the all-ones row vector has only one row, which is fewer than the number of rows of the feature map matrix. The matrix operation between the all-ones row vector and the weight matrix is therefore a dimension-reduced matrix operation compared with the matrix operation between the weight matrix and the feature map matrix. It should be noted that this is only an example and is not a limitation.
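A minimal sketch of this offline step, assuming the weight matrix is held as a list of rows and using a small hypothetical 2 x 3 matrix; the function name is illustrative and not taken from the application.

```python
def offline_check_vector(weight):
    """Multiply an all-ones row vector by the weight matrix.

    The result has one entry per weight column (the column sums).
    """
    rows, cols = len(weight), len(weight[0])
    ones_row = [1] * rows  # its length must equal the number of weight rows
    return [sum(ones_row[i] * weight[i][j] for i in range(rows))
            for j in range(cols)]


W = [[1, 2, 3],
     [4, 5, 6]]  # hypothetical 2 x 3 weight matrix
OC = offline_check_vector(W)
print(OC)  # [5, 7, 9]
```

Because the weights do not change between inferences, this value could be computed once and reused, which is what makes it an offline flag.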
[0081] Taking the typical convolutional neural network model in S501 processing image data as an example, as shown in Figure 7, each frame of image is converted into a feature map matrix after it is input into the second computing unit. The first computing unit performs a second verification calculation on the feature map matrix to obtain a second verification flag, also called the feature map verification flag (CF). The second verification calculation can multiply the feature map matrix by a matrix that is dimension-reduced relative to the weight matrix (a matrix with fewer rows or columns than the weight matrix) to obtain the second verification flag. As shown in Figure 7, the feature map matrix is multiplied by an all-ones column vector to obtain the feature map verification flag. Here, the all-ones column vector has only one column, which is fewer than the number of columns of the weight matrix. Therefore, the matrix operation between the feature map matrix and the all-ones column vector is a dimension-reduced matrix operation compared with the matrix operation between the weight matrix and the feature map matrix. It should be noted that this is merely an example and is not a limitation.
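The online counterpart for the feature map can be sketched in the same way. The 3 x 2 feature map below is hypothetical, and the function name is an illustrative assumption.

```python
def feature_check_vector(feature_map):
    """Multiply the feature map matrix by an all-ones column vector.

    The result has one entry per feature map row (the row sums).
    """
    cols = len(feature_map[0])
    ones_col = [1] * cols  # its length must equal the number of feature map columns
    return [sum(row[j] * ones_col[j] for j in range(cols))
            for row in feature_map]


F = [[1, 0],
     [2, 1],
     [0, 3]]  # hypothetical 3 x 2 feature map matrix
CF = feature_check_vector(F)
print(CF)  # [1, 3, 3]
```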
[0082] The first computing unit obtains the pre-calculation verification flag from the first verification flag (offline verification flag OC) and the second verification flag (feature map verification flag CF). For example, as shown in Figure 8, vector multiplication of the offline verification flag OC and the feature map verification flag CF yields the pre-calculation verification flag (CB_in).
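As an illustration of this vector multiplication, the sketch below treats OC as a row vector and CF as a column vector of the same length and takes their inner product. The two input vectors are hypothetical values, and the function name is an assumption.

```python
def pre_computation_flag(oc, cf):
    """Inner product of the offline flag OC and the feature map flag CF."""
    if len(oc) != len(cf):
        raise ValueError("OC and CF must have the same length")
    return sum(a * b for a, b in zip(oc, cf))


OC = [5, 7, 9]  # hypothetical offline verification flag
CF = [1, 3, 3]  # hypothetical feature map verification flag
CB_in = pre_computation_flag(OC, CF)
print(CB_in)  # 53
```

The cost is one multiply-accumulate per shared dimension, independent of how many rows the weight matrix or columns the feature map matrix has.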
[0083] As shown in Figure 9, the second computing unit performs a convolution operation on the weight matrix and the feature map matrix to obtain the output matrix. This computation is the normal data processing of the first processing layer in the AI model, not a verification process.
[0084] The following is a brief introduction to convolution operations in convolutional layers. Convolution operations are performed by convolution operators, and a convolutional layer can contain multiple convolution operators, also known as convolution kernels. In image processing, a convolution operator acts as a filter to extract specific information from the input image matrix. Essentially, a convolution operator can be a weight matrix, which is usually predefined. During the convolution operation, the weight matrix typically processes the input image pixel by pixel (or two pixels by two pixels, depending on the stride) along the horizontal direction, thus extracting specific features from the image. The size of the weight matrix should be related to the image size. It's important to note that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a single-depth convolutional output. However, in most cases, a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns) are applied, i.e., multiple identical matrices. The outputs of each weight matrix are stacked to form the depth dimension of the convolutional image. This depth can be understood as being determined by the "multiple" weights mentioned above. Different weight matrices can be used to extract different features from the image. For example, one weight matrix might be used to extract edge information, another to extract specific colors, and yet another to blur unwanted noise. These multiple weight matrices have the same size (rows × columns), and the resulting convolutional feature maps are also of the same size. These extracted feature maps are then merged to form the output of the convolution operation. 
The weight values in these weight matrices require extensive training in practical applications. The weight matrices formed by these trained weight values can be used to extract information from the input image, enabling the convolutional neural network to make correct predictions.
[0085] The first computing unit obtains the output matrix from the second computing unit and then performs a third verification calculation (also a matrix multiplication) on the output matrix to obtain the post-calculation verification flag. Specifically, as shown in Figure 10, a matrix multiplication operation is performed between an all-ones matrix and the output matrix. To perform the matrix multiplication, the number of columns of the all-ones matrix must equal the number of rows of the output matrix. This yields the post-calculation verification flag (CB_out).
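A minimal sketch of this step follows. It assumes that the result is reduced to a single scalar so it can be compared with a scalar pre-calculation flag: an all-ones row vector is multiplied by the output matrix, and the resulting row is then summed. That final summation is an assumption made here for illustration; the text above only specifies the left multiplication.

```python
def post_computation_flag(output):
    """Reduce the output matrix to a scalar using all-ones vectors."""
    rows, cols = len(output), len(output[0])
    ones_row = [1] * rows  # columns of the all-ones vector == rows of the output
    column_sums = [sum(ones_row[i] * output[i][j] for i in range(rows))
                   for j in range(cols)]
    return sum(column_sums)  # assumed final reduction to a scalar


O = [[5, 11],
     [14, 23]]  # hypothetical 2 x 2 output matrix
CB_out = post_computation_flag(O)
print(CB_out)  # 53
```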
[0086] The first computing unit obtains the verification flag (CB) from the pre-calculation verification flag CB_in and the post-calculation verification flag CB_out. CB can be obtained by subtracting CB_in from CB_out or by taking the quotient of CB_in and CB_out; in either case, the purpose is to compare CB_in with CB_out.
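Why this comparison detects an error can be made explicit with a short derivation. The notation is an assumption introduced for illustration and is not taken from the application: W is the m x k weight matrix, F is the k x n feature map matrix, O is the output matrix actually produced by the second computing unit, and 1_m and 1_n are all-ones vectors. The derivation also assumes CB_out is reduced to a scalar with an all-ones vector on each side of O.

```latex
\begin{aligned}
OC &= \mathbf{1}_m^{\top} W, \qquad CF = F\,\mathbf{1}_n, \\
CB_{in} &= OC \cdot CF = \mathbf{1}_m^{\top} W F\,\mathbf{1}_n, \\
CB_{out} &= \mathbf{1}_m^{\top} O\,\mathbf{1}_n, \\
CB &= CB_{out} - CB_{in} = \mathbf{1}_m^{\top}\,(O - W F)\,\mathbf{1}_n .
\end{aligned}
```

When O = WF the difference is zero, and the quotient is 1 provided CB_in is non-zero. The converse does not hold strictly: errors whose contributions cancel in the sum leave CB unchanged, so the flag behaves as a checksum rather than a proof of correctness. With floating-point data, a small tolerance would presumably be needed in place of an exact comparison.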
[0087] For the convolutional layers and fully connected layers in Figure 4, the matrix computation verification method shown in Figures 6 to 11 is used to obtain the verification flag CB as the result of the verification calculation, for example verification flag 1 and verification flag 4c in Figure 4.
[0088] S504, determine whether the output of the AI model is correct based on the verification result. The verification result includes one or more verification flag bits of the first processing layer obtained according to steps S502 to S503.
[0089] Each verification flag indicates whether the pre-calculation and post-calculation verification flags are consistent. If at least one verification flag in the verification result indicates that they are inconsistent, the output result is incorrect. As shown in Figure 11, taking the subtraction of CB_in and CB_out to obtain the verification flag CB as an example, CB_in is obtained from the offline verification flag OC and the feature map verification flag CF and represents the theoretical matrix computation verification result of the output matrix, while CB_out is the actual matrix computation verification result of the output matrix. If CB is 0, then CB_in and CB_out are the same, indicating that the theoretical and actual matrix computation verification results of the output matrix agree and that no error occurred in the first processing layer during the calculation based on the weight matrix and the feature map matrix. Conversely, if CB is not 0, then CB_in and CB_out are different, indicating that the theoretical and actual matrix computation verification results of the output matrix differ and that an error occurred in the first processing layer during the calculation based on the weight matrix and the feature map matrix.
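The following end-to-end sketch shows the subtraction form of the flag on a small hypothetical layer, once with a correct output and once with a single injected bit flip standing in for a hardware fault. All names, matrix values, and the choice of which bit to flip are assumptions for illustration; integer data is used so the comparison is exact.

```python
def matmul(a, b):
    """Plain matrix product, standing in for the layer's normal computation."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def check_flag(weight, feature_map, output):
    """CB = CB_out - CB_in using all-ones check vectors."""
    oc = [sum(col) for col in zip(*weight)]      # all-ones row times W
    cf = [sum(row) for row in feature_map]       # F times all-ones column
    cb_in = sum(x * y for x, y in zip(oc, cf))   # expected checksum
    cb_out = sum(sum(row) for row in output)     # checksum of actual output
    return cb_out - cb_in


W = [[1, 2, 3], [4, 5, 6]]    # hypothetical weight matrix
F = [[1, 0], [2, 1], [0, 3]]  # hypothetical feature map matrix

good = matmul(W, F)
faulty = [row[:] for row in good]
faulty[0][1] ^= 1 << 3        # flip one bit of one output element

print(check_flag(W, F, good))    # 0
print(check_flag(W, F, faulty))  # -8
```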
[0090] The process in Figures 6 to 11 yields one or more verification flags for the first processing layers, which together form the verification result. If at least one verification flag in the verification result indicates a discrepancy between the pre-calculation and post-calculation verification flags (for example, if at least one of the verification flags of the one or more first processing layers is non-zero), an error occurred during the calculation based on the weight matrix and feature map matrix, and the final output of the AI model is incorrect. Conversely, if all verification flags in the verification result indicate that the pre-calculation and post-calculation verification flags are consistent (for example, if all verification flags of the one or more first processing layers are zero), no error occurred during the calculation based on the weight matrix and feature map matrix, and the final output of the AI model is correct.
[0091] Optionally, the verification flags of the one or more first processing layers are stored in memory. After the AI model outputs the calculation result, the CPU reads the verification flags of the one or more first processing layers from memory and then determines, based on these verification flags, whether the calculation result output by the AI model is correct. If the calculation result output by the AI model is determined to be correct, the corresponding decision is executed; for example, as in Figure 2, a series of execution instructions is generated based on the calculations of the intelligent driving computing platform, and the actuators control the vehicle's steering, acceleration, deceleration, and so on according to the execution instructions. If the AI model's output calculation result is determined to be incorrect, a fault is reported to the second computing unit.
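The CPU-side judgment can be sketched as follows, assuming the subtraction form of the flag, so that zero means consistent. The function names and the returned action labels are hypothetical.

```python
def output_is_correct(flags):
    """True only if every stored verification flag is zero."""
    return all(flag == 0 for flag in flags)


def handle_result(flags, result):
    """Decide what to do with the model output based on the flags."""
    if output_is_correct(flags):
        return ("execute", result)   # act on the inference result
    return ("report_fault", None)    # do not act on a suspect result


print(handle_result([0, 0, 0], "steer_left"))   # ('execute', 'steer_left')
print(handle_result([0, -8, 0], "steer_left"))  # ('report_fault', None)
```

A single non-zero flag is enough to reject the output, because an error in any one layer can propagate to the final result.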
[0092] The dimension-reduced matrix computation verification method shown in Figures 6 to 11 (hereinafter referred to as matrix computation verification) can be used to verify the computation of convolutional layers and fully connected layers, because both convolution and fully connected computation are converted into matrix computation, to which this method applies.
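The application does not state how a convolution is converted into a matrix computation. One common lowering, often called im2col, is assumed in the sketch below purely for illustration: each kernel is flattened into one row of a weight matrix, and each image patch is unrolled into one column of a feature map matrix. The sketch uses a single-channel image, stride 1, and no padding, and computes cross-correlation (no kernel flip), as is usual in convolutional networks.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2-D image into one column."""
    h, w = len(image), len(image[0])
    patches = [[image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
               for i in range(h - kh + 1)
               for j in range(w - kw + 1)]
    return [list(col) for col in zip(*patches)]  # (kh*kw) x number of patches


def conv_as_matmul(image, kernels):
    """Convolution expressed as weight matrix times feature map matrix."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    weight = [[v for row in kernel for v in row] for kernel in kernels]
    feature = im2col(image, kh, kw)
    return [[sum(wv * fv for wv, fv in zip(w_row, col))
             for col in zip(*feature)]
            for w_row in weight]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]              # hypothetical 3 x 3 input
kernels = [[[1, 0],
            [0, 1]]]             # one hypothetical 2 x 2 kernel
print(conv_as_matmul(image, kernels))  # [[6, 8, 12, 14]]
```

Once the layer is in this form, the same all-ones checks apply to the weight matrix, the unrolled feature map matrix, and the output matrix.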
[0093] For one or more second processing layers in the AI model that are not verified using matrix calculations with dimensionality reduction, the AI calculation verification method in this application embodiment further includes performing redundancy verification on each of the one or more second processing layers to obtain verification flag bits for the second processing layers. Thus, the verification result also includes verification flag bits for one or more second processing layers. When determining whether the AI model output calculation result is correct, it is necessary to combine the verification flag bits of one or more first processing layers and the verification flag bits of one or more second processing layers for a comprehensive judgment. Specifically, redundancy verification involves performing the same calculations of one or more second processing layers on another chip or another computing unit based on the same input data and the same AI model, and then comparing whether the original calculation results of one or more second processing layers are consistent with the redundant calculation results to obtain verification flag bits for one or more second processing layers. The verification flag bits for one or more second processing layers can be the difference or quotient between the original calculation results and the redundant calculation results of one or more second processing layers. If the original calculation result and the redundant calculation result of one or more second processing layers are consistent, it means that no error occurred in the processing of data by one or more second processing layers. If the original calculation result and the redundant calculation result of one or more second processing layers are inconsistent, it means that an error occurred in the processing of data by one or more second processing layers, which will lead to an error in the final calculation result output by the AI model. 
Since pooling calculation and activation calculation only occupy a very small portion of resources, even using redundancy verification will not consume too many resources. The basis for determining that an error occurred in the processing of data by the second processing layer, rather than an error occurred in the redundant calculation, when the original calculation result and the redundant calculation result are inconsistent can be: in the redundancy verification, it can be set that as long as the original calculation result and the redundant calculation result are inconsistent, the original calculation is considered to have an error; or, the original calculation is performed by a normal chip or normal computing unit, and the redundant calculation is performed by a high-reliability chip or high-reliability computing unit. When the original calculation result and the redundant calculation result are inconsistent, since the chip or computing unit performing the redundant calculation is more reliable, the possibility of an error is lower, while the chip or computing unit performing the original calculation is less reliable, and the possibility of an error is higher. Therefore, when the original calculation result and the redundant calculation result are inconsistent, it is considered that the original calculation has an error.
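Redundancy verification for the lightweight layers can be sketched as a straightforward recomputation and comparison. The example assumes a ReLU activation and a one-dimensional max pooling as stand-ins for the second processing layers, and it uses the sum of absolute element-wise differences as the flag, so zero means the original and redundant results agree. These choices and all names are illustrative assumptions.

```python
def relu(values):
    """Element-wise ReLU activation."""
    return [max(0, v) for v in values]


def max_pool_1d(values, size):
    """Non-overlapping 1-D max pooling."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def redundancy_flag(original_result, layer_fn, layer_input):
    """Recompute the layer and return the total absolute difference."""
    redundant = layer_fn(layer_input)
    if len(redundant) != len(original_result):
        raise ValueError("result shapes differ")
    return sum(abs(a - b) for a, b in zip(original_result, redundant))


x = [-1, 2, -3, 4]                            # hypothetical layer input
print(redundancy_flag([0, 2, 0, 4], relu, x))  # 0
print(redundancy_flag([0, 2, 1, 4], relu, x))  # 1
print(redundancy_flag([2, 4], lambda v: max_pool_1d(v, 2), [0, 2, 0, 4]))  # 0
```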
[0094] When the output of the AI model is determined to be incorrect, it indicates that the AI chip containing the second computing unit has failed. This failure can be either transient or permanent. A transient failure only affects the calculation at the time of the transient failure but not subsequent calculations. A permanent failure, on the other hand, affects all calculations after the permanent failure. If the distinction between transient and permanent failures is not made, the system's security mechanism will automatically disable any second computing unit that is detected as having failed, even if it is still usable. This results in wasted resources and reduced system availability.
[0095] Therefore, the AI computation verification method in this embodiment further includes: when the output of the AI model is determined to be incorrect based on the method in Figure 5, using a self-test library to determine whether the failure of the second computing unit is transient or permanent. For example, as shown in Figure 12, while the second computing unit is performing AI calculations, the first computing unit simultaneously performs heterogeneous parallel verification. When the output result of the AI model is determined to be incorrect, that is, when the second computing unit is detected to have failed, the self-test library is run to determine whether the second computing unit has experienced a transient or permanent failure. If the second computing unit is determined to have experienced a transient failure, the output result of the AI model for that computation is discarded, and the second computing unit continues to be used for AI calculations. If the second computing unit is determined to have experienced a permanent failure, the failure of the second computing unit is reported.
[0096] Figure 13 shows a schematic diagram of how the AI calculation verification method of this application determines the specific failure state of the second computing unit. As shown in Figure 13, the CPU determines whether the AI model's output is erroneous based on multiple verification flag bits in memory. The specific determination method is described in the embodiment of Figure 5 above and is not repeated here. If the output result of the AI model is determined to be incorrect, the hardware has failed, and the self-test library (STL) is run to further determine whether the hardware has permanently failed. As mentioned above, the STL is mainly used to identify permanent failures, so the STL is called to perform a self-test on the failed hardware. If the self-test finds a fault, the second computing unit has permanently failed. In this case, the second computing unit can no longer participate in the operation, and the fault needs to be reported. If the self-test does not find a fault, the second computing unit has experienced a transient failure, which does not affect subsequent calculations. In this case, the erroneous output result only needs to be discarded, and the second computing unit can continue to perform subsequent calculations.
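The decision flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: `run_self_test` is a hypothetical callable standing in for the self-test library, and `report_fault` is a hypothetical callback for fault reporting.

```python
from enum import Enum


class FailureState(Enum):
    TRANSIENT = "transient"
    PERMANENT = "permanent"


def classify_failure(self_test_finds_fault):
    """Map the self-test result to a failure state.

    A fault found by the self-test means permanent failure; no fault
    found means the earlier error was a transient failure.
    """
    if self_test_finds_fault:
        return FailureState.PERMANENT
    return FailureState.TRANSIENT


def handle_incorrect_output(run_self_test, report_fault):
    """Called once the AI model's output has been judged incorrect.

    run_self_test: hypothetical stand-in for the self-test library;
        returns True when the self-test finds a fault.
    report_fault: hypothetical callback that reports a permanent failure.
    Returns True when the second computing unit may keep computing.
    """
    state = classify_failure(run_self_test())
    if state is FailureState.PERMANENT:
        report_fault()
        return False
    # Transient failure: the caller discards the erroneous output
    # and lets the second computing unit continue.
    return True


reports = []
print(handle_incorrect_output(lambda: False, lambda: reports.append("fault")))
print(handle_incorrect_output(lambda: True, lambda: reports.append("fault")))
print(reports)
```

The self-test runs only after an error has been detected, so it does not compete with normal inference for the AI core.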
[0097] The AI computation verification method in this application uses a heterogeneous parallel verification approach to verify the AI model's computation. First, the verification is performed by a computing unit other than the one executing the AI computation. Compared to verification methods that periodically run a self-check library, the AI computation verification method in this application does not interfere with the normal inference computation of the AI model, thus not affecting the acceleration performance of the AI core. It also avoids the same AI core performing verification while executing AI computation, ensuring both the correctness of the AI model's output and the real-time performance of the AI model's inference computation. Second, the AI computation verification method in this application designs different verification methods for different processing layers in the AI model, maximizing the conservation of computing resources. This allows verification to be performed in computing units with lower computing power, reducing verification costs. After determining hardware failure, the AI computation verification method in this application can further determine the specific failure state of the hardware. If the computing unit only experiences a transient failure, it can continue to be used, thereby avoiding resource waste and improving the availability of the AI chip.
[0098] Figure 14 shows a schematic flowchart of another AI calculation verification method according to an embodiment of this application. As shown in Figure 14, the method includes steps S1401 and S1402.
[0099] S1401: Obtain the verification result for the output result of the AI model processed by the second computing unit. The verification result indicates that the output result is incorrect.
[0100] S1402: Run the self-test library to determine whether the state of the second computing unit is transient failure or permanent failure.
[0101] The method of Figure 14 is executed by a first computing unit. The first computing unit can be another AI core on the AI chip where the second computing unit is located, or another computing unit with lower computing power on the AI chip where the second computing unit is located. This other computing unit may or may not be designed for AI acceleration. The first computing unit can also be a computing unit on another AI chip, or it can be a CPU core in a CPU chip, where a CPU core is the smallest computing unit in a CPU chip. In one possible implementation, the first computing unit can also be the same computing unit as the second computing unit, meaning that this computing unit performs both the AI computation and the verification method of the AI computation in this embodiment. The verification result for the output result of the AI model processed by the second computing unit can be the verification result obtained by the method shown in Figure 5, or it can be obtained by any existing verification method.
[0102] When the self-test library's execution result is no fault, the second computing unit's state is transient failure; when the self-test library's execution result is a fault, the second computing unit's state is permanent failure. When the second computing unit's state is transient failure, the output result is discarded; when the second computing unit's state is permanent failure, the failure status of the second computing unit is reported.
[0103] The AI computation verification method in this application embodiment uses CPU-executed system scheduling to call STL to perform a self-check on AI cores experiencing hardware failures. It determines whether the AI core has experienced a permanent or transient failure. If the self-check detects no fault, it indicates a transient failure that does not affect subsequent computation, and the AI core can continue to participate in system operations. If the self-check detects a fault, it indicates a permanent failure, and the AI core cannot continue to participate in computation, requiring a fault report. This avoids directly disabling failed AI cores, reducing resource waste and improving the availability of the AI chip.
[0104] The verification method for AI calculations in the embodiments of this application has been described in detail above with reference to Figures 1 to 14. The AI calculation verification device provided in the embodiments of this application is described in detail below with reference to Figure 15 and Figure 16. The AI calculation verification device can be the first computing unit described above, used to perform verification of the AI calculation. It should be understood that the description of the device embodiments corresponds to the description of the method embodiments; therefore, any content not described in detail can be found in the above method embodiments and, for the sake of brevity, is not repeated here.
[0105] Figure 15 is a schematic block diagram of an AI calculation verification device provided in an embodiment of this application. The device 1500 can specifically be a chip, an intelligent driving hardware platform, etc. The device 1500 includes a transceiver module 1510 and a processing module 1520. The transceiver module 1510 can implement corresponding communication functions, and the processing module 1520 is used for data processing. The transceiver module 1510 can also be referred to as a communication interface or a communication unit.
[0106] Optionally, the device 1500 may further include a storage module for storing instructions and / or data, and the processing module 1520 may read the instructions and / or data from the storage module to enable the device to implement the aforementioned method embodiments.
[0107] The device 1500 can be used to perform the actions in the above method embodiments. Specifically, the transceiver module 1510 is used to perform the transceiver-related operations in the above method embodiments, and the processing module 1520 is used to perform the processing-related operations in the above method embodiments.
[0108] The device 1500 can implement steps or processes corresponding to those in the method embodiments according to the present application, and the device 1500 may include modules for performing the methods of Figure 5 and Figure 14. Furthermore, the modules in the device 1500 and the other operations and/or functions described above are respectively intended to implement the corresponding flows of the method embodiments of Figure 5 and Figure 14.
[0109] When the device 1500 is used to perform method 500 of Figure 5, the transceiver module 1510 can be used to execute steps 501 and 502 in method 500, and the processing module 1520 can be used to execute processing steps 503 and 504 in method 500.
[0110] Specifically, the transceiver module 1510 is used to obtain the parameters of the AI model with which the second computing unit processes the AI calculation, where the AI model includes one or more first processing layers. The following verification process is performed on each of the one or more first processing layers to obtain a verification flag bit for each of them: the transceiver module 1510 is also used to obtain the input data of the first processing layer from the second computing unit, and the processing module 1520 is used to perform verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the verification flag bit of the first processing layer. The computational amount of the verification processing on the first processing layer is less than the computational amount of the second computing unit processing the input data through the first processing layer. The processing module 1520 is also used to determine, based on the verification result, whether the output result of the second computing unit's AI calculation is correct. The verification result includes the verification flag bit of each of the one or more first processing layers.
[0111] In some possible implementations, the AI model further includes one or more second processing layers, and the processing module 1520 is further configured to: perform redundancy checks on each of the one or more second processing layers to obtain a check flag bit for each of the one or more second processing layers; the check result also includes the check flag bit for each of the one or more second processing layers.
[0112] In some possible implementations, the parameters of the AI model include a weight matrix, the input data of the first processing layer includes a feature map matrix, and the processing module 1520 is specifically used for: obtaining a first verification flag bit, which is obtained by performing a first verification calculation on the weight matrix; obtaining a second verification flag bit, which is obtained by performing a second verification calculation on the feature map matrix; obtaining a pre-calculation verification flag bit based on the first and second verification flag bits; obtaining an output matrix from the second computing unit, where the output matrix is obtained by the second computing unit performing a calculation on the weight matrix and the feature map matrix in the first processing layer; performing a third verification calculation on the output matrix to obtain a post-calculation verification flag bit; and obtaining the verification flag bit of the first processing layer based on the pre-calculation and post-calculation verification flag bits.
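One way to realize the three verification calculations is a sum-based checksum, sketched below. This is an assumption made for illustration: the sketch takes the first processing layer to be a plain matrix product of an m x k weight matrix and a k x n feature map matrix, uses the column sums of the weight matrix and the row sums of the feature map matrix as the first and second verification calculations, and uses the sum of all output elements as the third. All function names are hypothetical.

```python
def first_check(weights):
    """Column sums of the m x k weight matrix (a length-k vector)."""
    return [sum(col) for col in zip(*weights)]


def second_check(features):
    """Row sums of the k x n feature map matrix (a length-k vector)."""
    return [sum(row) for row in features]


def pre_calc_flag(weights, features):
    """Dot product of the two check vectors: the expected sum of the output."""
    return sum(w * f for w, f in zip(first_check(weights), second_check(features)))


def post_calc_flag(output):
    """Sum of every element of the m x n output matrix."""
    return sum(sum(row) for row in output)


def layer_flag(weights, features, output, tol=0):
    """True when pre- and post-calculation flags agree within tol.

    tol is an assumed parameter; floating-point data would need tol > 0.
    """
    return abs(pre_calc_flag(weights, features) - post_calc_flag(output)) <= tol


def matmul(a, b):
    """Reference product, standing in for the second computing unit."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


weights = [[1, 2], [3, 4]]
features = [[5, 6], [7, 8]]
output = matmul(weights, features)
print(layer_flag(weights, features, output))

corrupted = [row[:] for row in output]
corrupted[0][0] += 1              # simulate a single faulty element
print(layer_flag(weights, features, corrupted))
```

The checks need on the order of mk + kn + mn operations, while the product itself needs on the order of mkn, which is consistent with the requirement that verification cost less than the layer. A sum-based check of this kind cannot detect errors that cancel each other out in the total.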
[0113] In some possible implementations, the verification flag bit indicates whether the pre-calculation verification flag bit and the post-calculation verification flag bit are consistent. The processing module 1520 is specifically used to: determine that the output result is incorrect if at least one verification flag bit in the verification result indicates that the pre-calculation verification flag bit and the post-calculation verification flag bit are inconsistent.
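The decision rule can be sketched as follows. The sketch assumes the verification result is held as a mapping from a layer name to a boolean that is True when that layer's flags are consistent; the layer names and function names are hypothetical.

```python
def inconsistent_layers(verification_result):
    """Names of layers whose pre- and post-calculation flags disagree."""
    return [name for name, consistent in verification_result.items() if not consistent]


def output_is_correct(verification_result):
    """The output is accepted only if no layer reports an inconsistency."""
    return not inconsistent_layers(verification_result)


result = {"conv1": True, "conv2": False, "fc1": True}
print(output_is_correct(result))
print(inconsistent_layers(result))
```

A single inconsistent layer is enough to reject the output.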
[0114] In some possible implementations, the first processing layer is a convolutional layer or a fully connected layer.
[0115] In some possible implementations, when the output result is determined to be incorrect, the state of the second computing unit includes transient failure and permanent failure.
[0116] In some possible implementations, when the output result is determined to be incorrect, the processing module 1520 is also used to: determine whether the state of the second computing unit is transient failure or permanent failure by running the self-test library.
[0117] In some possible implementations, the transceiver module 1510 is also used to: report the failure status of the second computing unit when the status of the second computing unit is permanent failure.
[0118] When the device 1500 is used to perform method 1400 of Figure 14, the transceiver module 1510 can be used to execute step 1401 in method 1400, and the processing module 1520 can be used to execute processing step 1402 in method 1400.
[0119] Specifically, the transceiver module 1510 is used to obtain the verification result of the output result of the AI model processed by the second computing unit, and the verification result determines that the output result is incorrect; the processing module 1520 is used to run the self-test library to determine whether the state of the second computing unit is transient failure or permanent failure.
[0120] In some possible implementations, when the result of running the self-test library is no fault, the state of the second computing unit is transient failure; when the result of running the self-test library is a fault, the state of the second computing unit is permanent failure.
[0121] In some possible implementations, the device 1500 is also used to: discard the output result when the state of the second computing unit is transient failure; and report the failure state of the second computing unit when the state of the second computing unit is permanent failure.
[0122] It should be understood that the specific process of each module performing the above-mentioned steps has been described in detail in the above method embodiments, and will not be repeated here for the sake of brevity.
[0123] As shown in Figure 16, this application embodiment also provides an AI calculation verification device 1600. The device 1600 shown in Figure 16 may include a memory 1610, a processor 1620, and a communication interface 1630, which are connected via an internal interconnection. The memory 1610 stores instructions, and the processor 1620 executes the instructions stored in the memory 1610 to control the communication interface 1630 to receive input samples or send prediction results. Optionally, the memory 1610 may be coupled to the processor 1620 via an interface, or it may be integrated with the processor 1620.
[0124] It should be noted that the aforementioned communication interface 1630 uses a transceiver device, such as, but not limited to, a transceiver, to enable communication between the device 1600 and other devices or communication networks. The aforementioned communication interface 1630 may also include an input/output interface.
[0125] In implementation, each step of the above method can be completed by the integrated logic circuitry of the processor 1620 or by software instructions. The method disclosed in the embodiments of this application can be directly implemented by a hardware processor, or by a combination of hardware and software modules within the processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. This storage medium is located in memory 1610, and the processor 1620 reads information from memory 1610 and, in conjunction with its hardware, completes the steps of the above method. To avoid repetition, detailed descriptions are omitted here.
[0126] It should be understood that in the embodiments of this application, the processor can be a central processing unit (CPU), or it can be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor can be a microprocessor or any conventional processor.
[0127] It should also be understood that, in embodiments of this application, the memory may include read-only memory and random access memory, and provides instructions and data to the processor. A portion of the processor may also include non-volatile random access memory. For example, the processor may also store device type information.
[0128] This application also provides a chip, characterized in that the chip includes a first computing unit, the first computing unit being used to perform the method of Figure 5 or Figure 14 described above.
[0129] Optionally, the chip also includes a second computing unit for performing AI calculations.
[0130] The embodiments also provide a computer-readable medium, characterized in that the computer-readable medium stores program code which, when run on a computer, causes the computer to perform the method of Figure 5 or Figure 14 described above.
[0131] The embodiments also provide a computing device, including a first computing unit and a second computing unit. The second computing unit is used to process AI calculations based on an AI model, and the first computing unit performs the method of Figure 5 or Figure 14 described above. The processing power of the first computing unit is less than or equal to the processing power of the second computing unit. The first computing unit is at least one of a computing unit in an AI chip, a computing unit in a CPU chip, or a computing unit in a GPU chip, and the second computing unit is a computing unit in an AI chip.
[0132] As used in this specification, the terms "component," "module," "system," etc., are used to refer to computer-related entities, hardware, firmware, combinations of hardware and software, software, or software in execution. For example, a component can be, but is not limited to, a process running on a processor, a processor, an object, an executable file, an execution thread, a program, and / or a computer. As illustrated, applications running on computing devices and computing devices can both be components. One or more components may reside in a process and / or an execution thread, and components may be located on a single computer and / or distributed among two or more computers. Furthermore, these components can be executed from various computer-readable media on which various data structures are stored. Components can communicate, for example, via local and / or remote processes based on signals having one or more data packets (e.g., data from two components interacting with another component between a local system, a distributed system, and / or a network, such as the Internet interacting with other systems via signals).
[0133] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
[0134] Those skilled in the art will understand that, for convenience and brevity, the specific working processes of the systems, devices, and units described above correspond to the processes in the foregoing method embodiments and are not repeated here.
[0135] In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.
[0136] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0137] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0138] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0139] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A verification method for artificial intelligence (AI) calculations, characterized in that the method is executed by a first computing unit and includes: obtaining parameters of an AI model with which a second computing unit processes the AI calculation, wherein the AI model includes one or more first processing layers; performing the following verification process on each of the one or more first processing layers to obtain a verification flag bit of each of the one or more first processing layers: obtaining input data of the first processing layer from the second computing unit, and performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the verification flag bit of the first processing layer, wherein the computational amount of the verification processing of the first processing layer is less than the computational amount of the second computing unit processing the input data through the first processing layer; and determining, based on a verification result, whether the output result of the AI calculation processed by the second computing unit is correct, wherein the verification result includes the verification flag bit of each of the one or more first processing layers; wherein the first computing unit is at least one of a computing unit in an AI chip, a computing unit in a CPU chip, or a computing unit in a GPU chip, and the second computing unit is a computing unit in an AI chip.
2. The method according to claim 1, characterized in that the AI model further includes one or more second processing layers, and the method further includes: performing a redundancy check on each of the one or more second processing layers to obtain a verification flag bit of each of the one or more second processing layers; wherein the verification result also includes the verification flag bit of each of the one or more second processing layers.
3. The method according to claim 1 or 2, characterized in that the parameters of the AI model include a weight matrix, the input data of the first processing layer includes a feature map matrix, and the step of performing verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the verification flag bit of the first processing layer includes: obtaining a first verification flag bit, which is obtained by performing a first verification calculation on the weight matrix; obtaining a second verification flag bit, which is obtained by performing a second verification calculation on the feature map matrix; obtaining a pre-calculation verification flag bit based on the first verification flag bit and the second verification flag bit; obtaining an output matrix from the second computing unit, wherein the output matrix is obtained by the second computing unit performing a calculation on the weight matrix and the feature map matrix in the first processing layer; performing a third verification calculation on the output matrix to obtain a post-calculation verification flag bit; and obtaining the verification flag bit based on the pre-calculation verification flag bit and the post-calculation verification flag bit.
4. The method according to claim 3, characterized in that the verification flag bit indicates whether the pre-calculation verification flag bit and the post-calculation verification flag bit are consistent, and determining, based on the verification result, whether the output result of the AI calculation processed by the second computing unit is correct includes: determining that the output result is incorrect if at least one of the verification flag bits in the verification result indicates that the pre-calculation verification flag bit and the post-calculation verification flag bit are inconsistent.
5. The method according to claim 1 or 2, characterized in that the first processing layer is a convolutional layer or a fully connected layer.
6. The method according to claim 1 or 2, characterized in that, when the output result is determined to be incorrect, the state of the second computing unit includes transient failure and permanent failure.
7. The method according to claim 6, characterized in that, when the output result is determined to be incorrect, the method further includes: determining, by running a self-test library, whether the state of the second computing unit is transient failure or permanent failure.
8. The method according to claim 7, characterized in that the method further includes: reporting the failure state of the second computing unit when the state of the second computing unit is permanent failure.
9. The method according to claim 7 or 8, characterized in that the step of determining, by running the self-test library, whether the state of the second computing unit is transient failure or permanent failure includes: when the result of running the self-test library is no fault, the state of the second computing unit is transient failure; and when the result of running the self-test library is a fault, the state of the second computing unit is permanent failure.
10. The method according to claim 7 or 8, characterized in that the method further includes: discarding the output result when the state of the second computing unit is transient failure.
11. A verification device for AI calculations, characterized in that the device is applied to a first computing unit and includes: a transceiver unit, used to obtain parameters of an AI model with which a second computing unit processes an AI calculation, wherein the AI model includes one or more first processing layers; wherein the following verification process is performed on each of the one or more first processing layers to obtain a verification flag bit of each of the one or more first processing layers: the transceiver unit is further configured to obtain input data of the first processing layer from the second computing unit, and a processing unit is used to perform verification processing on the first processing layer based on the parameters of the AI model and the input data of the first processing layer to obtain the verification flag bit of the first processing layer, wherein the computational amount of the verification processing of the first processing layer is less than the computational amount of the second computing unit processing the input data through the first processing layer; the processing unit is further configured to determine, based on a verification result, whether the output result of the AI calculation processed by the second computing unit is correct, wherein the verification result includes the verification flag bit of each of the one or more first processing layers; wherein the first computing unit is at least one of a computing unit in an AI chip, a computing unit in a CPU chip, or a computing unit in a GPU chip, and the second computing unit is a computing unit in an AI chip.
12. The apparatus according to claim 11, characterized in that the AI model further includes one or more second processing layers, and the processing unit is further configured to: perform a redundancy check on each of the one or more second processing layers to obtain a verification flag bit of each of the one or more second processing layers; wherein the verification result also includes the verification flag bit of each of the one or more second processing layers.
13. The apparatus according to claim 11 or 12, characterized in that the parameters of the AI model include a weight matrix, the input data of the first processing layer includes a feature map matrix, and the processing unit is specifically used for: obtaining a first verification flag bit, which is obtained by performing a first verification calculation on the weight matrix; obtaining a second verification flag bit, which is obtained by performing a second verification calculation on the feature map matrix; obtaining a pre-calculation verification flag bit based on the first verification flag bit and the second verification flag bit; obtaining an output matrix from the second computing unit, wherein the output matrix is obtained by the second computing unit performing a calculation on the weight matrix and the feature map matrix in the first processing layer; performing a third verification calculation on the output matrix to obtain a post-calculation verification flag bit; and obtaining the verification flag bit based on the pre-calculation verification flag bit and the post-calculation verification flag bit.
14. The apparatus according to claim 13, characterized in that the verification flag bit indicates whether the pre-calculation verification flag bit and the post-calculation verification flag bit are consistent, and the processing unit is specifically used for: determining that the output result is incorrect if at least one of the verification flag bits in the verification result indicates that the pre-calculation verification flag bit and the post-calculation verification flag bit are inconsistent.
15. The apparatus according to claim 11 or 12, characterized in that the first processing layer is a convolutional layer or a fully connected layer.
16. The apparatus according to claim 11 or 12, characterized in that, when the output result is determined to be incorrect, the state of the second computing unit includes transient failure and permanent failure.
17. The apparatus according to claim 16, characterized in that, when the output result is determined to be incorrect, the processing unit is further configured to: determine, by running a self-test library, whether the state of the second computing unit is transient failure or permanent failure.
18. The apparatus according to claim 17, characterized in that the transceiver unit is also used for: reporting the failure state of the second computing unit when the state of the second computing unit is permanent failure.
19. The apparatus according to claim 17 or 18, characterized in that, when the result of running the self-test library is no fault, the state of the second computing unit is transient failure; and when the result of running the self-test library is a fault, the state of the second computing unit is permanent failure.
20. The apparatus according to claim 17 or 18, characterized in that the device is also used for: discarding the output result when the state of the second computing unit is transient failure.
21. A chip, characterized in that the chip includes a first computing unit, which is used to perform the method as described in any one of claims 1 to 10.
22. The chip according to claim 21, characterized in that the chip also includes a second computing unit, which is used to perform AI calculations.
23. A computer-readable medium, characterized in that the computer-readable medium stores program code that, when run on a computer, causes the computer to perform the method as described in any one of claims 1 to 10.
24. A computing device, characterized in that the computing device includes a first computing unit and a second computing unit, the second computing unit being used to process AI calculations based on an AI model, and the first computing unit performing verification of the second computing unit using the method described in any one of claims 1 to 10.
25. The computing device according to claim 24, characterized in that the processing power of the first computing unit is less than or equal to the processing power of the second computing unit.
26. The computing device according to claim 24 or 25, characterized in that the first computing unit is at least one of a computing unit in an AI chip, a computing unit in a central processing unit (CPU) chip, or a computing unit in a graphics processing unit (GPU) chip, and the second computing unit is a computing unit in an AI chip.