A memory anomaly detection method and related device

By dividing the memory anomaly detection process for an AI model into two stages (whole-network operator delimitation and single-operator detection) and using the anomaly detection system to obtain each operator's actual memory access and memory allocation information, the method addresses memory exhaustion and complex anomaly localization and achieves efficient, lightweight memory anomaly detection.

CN122309200A · Pending · Publication Date: 2026-06-30 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HUAWEI TECH CO LTD
Filing Date
2024-12-28
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies require a large amount of additional memory for memory anomaly detection in AI models, which can lead to memory exhaustion, affecting model training or inference, and the localization process is complex and time-consuming.

Method used

The memory anomaly detection process is divided into two stages: identifying anomalous operators in the whole network and detecting anomalies in a single operator. The anomaly detection system obtains the actual memory access and memory allocation information of the operators, thereby confining the memory pressure to the whole-network operator delimitation stage and reducing the risk of overall memory exhaustion.

Benefits of technology

It enables timely detection and handling of abnormal operators without increasing additional memory usage, reducing the risk of memory exhaustion, simplifying the localization process, and improving detection efficiency and accuracy.



Abstract

A memory anomaly detection method includes: sending a startup command to an artificial intelligence (AI) application runtime system to launch an AI application; during the AI application's operation, identifying anomalous operators based on the actual memory access information and memory allocation information of at least one operator among multiple operators; detecting the memory access behavior of the anomalous operators based on their input/output information during the AI application's operation; and obtaining detection results. The method divides memory anomaly detection into two stages, operator delimitation and single-operator detection, transforming the whole-network problem into a single-operator problem. The entire detection process requires no compiler or hardware support, is independent of source code, and is applicable to scenarios where only binary code is available, giving it high availability. Furthermore, the method converts the memory pressure of performing memory anomaly detection on the entire network into the memory pressure of the operator delimitation stage, minimizing the impact of detection on the network's overall memory.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence (AI) technology, and in particular to a memory anomaly detection method, anomaly detection system, computing device cluster, computing card, computer-readable storage medium, and computer program product.

Background Technology

[0002] With the rapid development of AI technology, new AI models (especially large AI models) or AI networks are constantly emerging. AI models, represented by large AI models, can usually be built using AI frameworks. An AI framework is a set of standard interfaces, feature libraries, and toolkits for designing, training, and validating AI models, integrating algorithm encapsulation, data retrieval, and the ability to utilize computing resources. To improve the performance of AI models, high-performance operators can be introduced on top of AI frameworks during AI model development. For example, a large operator formed by fusing multiple independent "small operators" can be called a fusion operator.

[0003] While introducing fusion operators improves performance, it can also introduce functional issues, such as computational errors, leading to a decrease in the accuracy of AI models. Related research indicates that over 50% of accuracy problems are caused by operators, and over 40% of operator-induced accuracy problems are due to memory issues. When AI models exhibit accuracy anomalies, multi-domain collaboration is required for anomaly localization, a complex and time-consuming process.

[0004] Related technologies offer memory anomaly detection solutions, but these solutions require significant additional memory and may even lead to memory exhaustion. AI model training and inference also use as much memory as possible to improve concurrency. Enabling memory anomaly detection reduces the memory available for AI model training and inference, which in turn affects the training or inference process. Therefore, minimizing the memory usage introduced by memory anomaly detection has become a key concern.

Summary of the Invention

[0005] This application provides a memory anomaly detection method that divides the memory anomaly detection process into two stages: identifying anomalous operators in the AI ​​model and detecting single-operator anomalies, thereby transforming the entire network problem into a single-operator problem. The memory pressure of performing memory anomaly detection on the entire network is converted into the memory pressure of the stage where anomalous operators are identified within the entire network. By ensuring that the memory usage of the stage where anomalous operators are identified within the entire network is kept lightweight, the risk of memory exhaustion in the entire process can be significantly reduced. This application also provides an anomaly detection system, computing device cluster, computing card, computer-readable storage medium, and computer program product corresponding to the above method.

[0006] In a first aspect, this application provides a memory anomaly detection method. The method is applied to an anomaly detection system, which is used to detect memory anomalies such as out-of-bounds access or other memory anomalies. The anomaly detection system can also be used for memory anomaly localization. It can be standalone software with memory anomaly detection functionality, or it can be integrated into other software as a plugin, component, applet, or functional module. It can also be a computing device cluster or computing card with anomaly detection capabilities that executes the memory anomaly detection method of this application at runtime.

[0007] The anomaly detection system can collaborate with the AI ​​application runtime system to detect memory anomalies. The AI ​​application runtime system can be the system that runs the AI ​​application. Similar to the anomaly detection system, the AI ​​application runtime system can include software systems, such as training platforms, inference platforms, or AI platforms that integrate training and inference. In some examples, the AI ​​application runtime system can also include hardware systems, such as computing cards or computing device clusters used to train AI models, or computing cards or computing device clusters used for inference using AI models.

[0008] Specifically, the anomaly detection system sends a startup command to the AI ​​application's runtime system to launch the AI ​​application. The AI ​​application is built upon an AI model, which includes multiple operators. During the AI ​​application's operation, the anomaly detection system identifies anomalous operators based on the actual memory access information and memory allocation information of at least one of the operators. The actual memory access information includes the address range of the actually accessed memory space, and the memory allocation information includes the address range of the allocated memory space. Then, the anomaly detection system can detect the memory access behavior of the anomalous operators based on their input and output information during the AI ​​application's operation, thus obtaining the detection results.
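The comparison at the core of this step can be illustrated with a minimal sketch. It assumes that each operator's allocation information and actual access information are available as lists of half-open address ranges; the names `AddressRange` and `find_anomalous_operators`, and the example addresses, are hypothetical and do not come from this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRange:
    start: int  # first byte address, inclusive
    end: int    # one past the last byte address, exclusive

    def contains(self, other: "AddressRange") -> bool:
        return self.start <= other.start and other.end <= self.end


def find_anomalous_operators(allocations, accesses):
    """Return names of operators with an access that no single allocated range covers.

    allocations: dict mapping operator name -> list of allocated AddressRange
    accesses:    dict mapping operator name -> list of actually accessed AddressRange
    """
    anomalous = []
    for name, accessed_ranges in accesses.items():
        allocated = allocations.get(name, [])
        for accessed in accessed_ranges:
            if not any(block.contains(accessed) for block in allocated):
                anomalous.append(name)
                break
    return anomalous


# Hypothetical operators and addresses, for illustration only.
allocations = {
    "matmul": [AddressRange(0x1000, 0x2000)],
    "softmax": [AddressRange(0x2000, 0x2400)],
}
accesses = {
    "matmul": [AddressRange(0x1000, 0x1800)],
    "softmax": [AddressRange(0x2000, 0x2410)],  # runs 16 bytes past its allocation
}
anomalous_ops = find_anomalous_operators(allocations, accesses)
print(anomalous_ops)
```

In this simplified form, an access that spans two adjacent allocated blocks is also flagged; a fuller implementation would need to decide how to treat that case.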

[0009] This method divides the memory anomaly detection process into two stages: identifying anomalous operators in the AI model (also called network-wide operator delimitation) and single-operator anomaly detection. Single-operator anomaly detection examines the memory access behavior of the anomalous operators, for example by applying multiple anomaly detection algorithms to detect other memory anomalies and obtain more detailed detection results. This transforms the network-wide problem into a single-operator problem, and the memory pressure of network-wide memory anomaly detection is converted into the memory pressure of the network-wide operator delimitation stage. Keeping the memory usage of that stage lightweight significantly reduces the risk of memory exhaustion for the entire process. If the operator delimitation stage uses no additional working memory, the entire process can even be guaranteed not to run out of memory.

[0010] In some possible implementations, the anomaly detection system can obtain memory allocation information for at least one of the multiple operators and provide this information to the AI application runtime system. The AI application runtime system then determines the anomalous operators based on the memory allocation information of the at least one operator and the actual memory access information during the operation of that operator. Accordingly, the anomaly detection system can obtain the anomalous operators from the AI application runtime system.

[0011] In this method, the anomaly detection system can provide the memory allocation information of the obtained operators to the AI ​​application runtime system, which then collaborates with the anomaly detection system to detect abnormal operators. In this way, the AI ​​application runtime system can detect abnormal operators in real time during the operation of operators, thus enabling timely anomaly handling.

[0012] In some possible implementations, the anomaly detection system can transmit memory allocation information and out-of-bounds information (such as out-of-bounds identifiers) through shared storage. Specifically, the anomaly detection system can write the memory allocation information of at least one operator into the shared storage, enabling the AI ​​application runtime system to retrieve the memory allocation information from the shared storage. Correspondingly, the anomaly detection system can read the out-of-bounds identifiers of the at least one operator written by the AI ​​application runtime system in the shared storage, and then determine the anomaly operator based on the out-of-bounds identifiers of the at least one operator.

[0013] In this method, the anomaly detection system and the AI application runtime system exchange memory allocation information and out-of-bounds information through shared storage, thereby realizing operator delimitation. The method is easy to implement and has high availability.
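As a rough illustration of this exchange, the sketch below uses a `bytearray` as a stand-in for the shared storage and an assumed fixed record layout (start address, end address, out-of-bounds flag) per operator. The layout and the function names `write_allocations`, `report_access`, and `read_flagged` are illustrative assumptions, not a format defined in this application.

```python
import struct

# Assumed record layout, one per operator: start address, end address, out-of-bounds flag.
RECORD = struct.Struct("<QQB")


def write_allocations(shared, ranges):
    """Detection-system side: publish the allocated range of each operator."""
    for index, (start, end) in enumerate(ranges):
        RECORD.pack_into(shared, index * RECORD.size, start, end, 0)


def report_access(shared, index, accessed_start, accessed_end):
    """Runtime side: compare an actual access with the published range and set the flag."""
    start, end, flag = RECORD.unpack_from(shared, index * RECORD.size)
    if accessed_start < start or accessed_end > end:
        flag = 1
    RECORD.pack_into(shared, index * RECORD.size, start, end, flag)


def read_flagged(shared, count):
    """Detection-system side: collect the indices of operators flagged as out of bounds."""
    return [i for i in range(count)
            if RECORD.unpack_from(shared, i * RECORD.size)[2] == 1]


ranges = [(0x1000, 0x2000), (0x2000, 0x2400)]   # hypothetical allocations
shared = bytearray(RECORD.size * len(ranges))   # stand-in for the shared storage region
write_allocations(shared, ranges)
report_access(shared, 0, 0x1000, 0x1800)  # within bounds
report_access(shared, 1, 0x23F0, 0x2410)  # crosses the end of the allocation
flagged = read_flagged(shared, len(ranges))
print(flagged)
```

Because the helpers only need a writable buffer, a real cross-process region (for example the `buf` of Python's `multiprocessing.shared_memory.SharedMemory`) could take the place of the `bytearray`.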

[0014] In some possible implementations, the anomaly detection system can receive anomaly operators returned by the AI ​​application runtime system through an application programming interface (API). Alternatively, the anomaly detection system can receive error messages or interruption messages reported by the AI ​​application runtime system. The error messages or interruption messages are used to indicate the anomaly operators.

[0015] In this method, the anomaly detection system obtains anomaly operators from the AI ​​application runtime system through APIs or error / interruption information, without consuming a large amount of resources and has high availability.

[0016] In some possible implementations, the anomaly detection system can obtain memory allocation information of at least one of multiple operators, obtain actual memory access information of at least one operator during the operation of the AI ​​application running system, and determine the abnormal operator based on the memory allocation information and the actual memory access information.

[0017] In this method, the anomaly detection system itself detects anomalies based on the memory allocation information and actual memory access information of the operators, which avoids burdening the AI application runtime system with anomaly detection. For example, it prevents anomaly detection from consuming a large amount of the AI application runtime system's resources and thereby affecting the operation of the AI application.

[0018] In some possible implementations, the anomaly detection system can read the actual memory access information of at least one operator written by the AI ​​application runtime system during operation in the shared storage. In this method, the anomaly detection system and the AI ​​application runtime system transmit the actual memory access information through the shared storage, thereby enabling the anomaly detection system to perform operator delimitation. The entire method is easy to implement and has high availability.

[0019] In some possible implementations, when the AI ​​application starts successfully, the kernel function of at least one operator of the AI ​​model runs in the AI ​​application runtime system. The actual memory access information of at least one operator is provided by the kernel function of at least one operator.

[0020] This method provides actual memory access information for at least one operator through a kernel function, enabling real-time out-of-bounds detection during operator execution and timely detection of abnormal operators.

[0021] In some possible implementations, the kernel function can provide actual memory access information through static instrumentation. Specifically, the kernel function's source code includes detection code used to detect abnormal operators or retrieve the actual memory access information of at least one operator to detect abnormal operators. Thus, after the kernel function is compiled and executed on the computing card, the detection code can be executed to detect abnormal operators or retrieve the actual memory access information of at least one operator to detect abnormal operators.
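A toy sketch of static instrumentation follows, with Python standing in for device kernel code. The detection helper `checked_index`, the log `out_of_bounds_log`, and the kernel `shift_left_kernel` are all hypothetical; the point is only that the check is written into the kernel source itself, so it runs whenever the kernel runs.

```python
out_of_bounds_log = []  # hypothetical sink for detection results


def checked_index(op_name, index, length):
    """Detection code written into the kernel: record any index outside [0, length)."""
    if not 0 <= index < length:
        out_of_bounds_log.append((op_name, index))
        return False
    return True


def shift_left_kernel(src, dst):
    """Toy kernel: dst[i] = src[i + 1]. The last iteration reads one element past src."""
    for i in range(len(dst)):
        if checked_index("shift_left", i + 1, len(src)):
            dst[i] = src[i + 1]


src = [1.0, 2.0, 3.0, 4.0]
dst = [0.0] * 4
shift_left_kernel(src, dst)
print(out_of_bounds_log)
```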

[0022] In some possible implementations, the kernel function can provide actual memory access information through dynamic instrumentation. Specifically, the anomaly detection system detects that the kernel function of at least one operator of the AI ​​model has been scheduled from the AI ​​application runtime system, and integrates the detection function into the kernel function through dynamic instrumentation. The detection function is used to detect anomalous operators or provide actual memory access information for at least one operator to detect anomalous operators.

[0023] This allows for flexible activation of the detection function, meeting diverse business needs and offering high flexibility.
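The flexible activation described for dynamic instrumentation can be sketched as a scheduler hook that wraps an unmodified kernel only when detection is switched on. This is a loose Python analogy, not binary instrumentation: `RecordingBuffer`, `instrument`, and `schedule` are hypothetical names, and the kernel's buffers are replaced by recording proxies rather than its machine code being patched.

```python
class RecordingBuffer:
    """Proxy that forwards in-range accesses and records out-of-range ones."""

    def __init__(self, data):
        self.data = data
        self.violations = []

    def __len__(self):
        return len(self.data)

    def _in_range(self, index):
        if 0 <= index < len(self.data):
            return True
        self.violations.append(index)
        return False

    def __getitem__(self, index):
        return self.data[index] if self._in_range(index) else 0.0

    def __setitem__(self, index, value):
        if self._in_range(index):
            self.data[index] = value


def instrument(kernel):
    """Return a wrapper that runs `kernel` on recording proxies instead of raw buffers."""
    def wrapper(*buffers):
        proxies = [RecordingBuffer(b) for b in buffers]
        kernel(*proxies)
        # Report (argument position, offending index) pairs.
        return [(pos, i) for pos, p in enumerate(proxies) for i in p.violations]
    return wrapper


def schedule(kernel, buffers, detect=False):
    """Hypothetical scheduler hook: instrument only when detection is switched on."""
    if detect:
        return instrument(kernel)(*buffers)
    kernel(*buffers)
    return []


def copy_kernel(src, dst):
    """Unmodified kernel source: copies len(src) elements, even if dst is shorter."""
    for i in range(len(src)):
        dst[i] = src[i]


src = [1.0, 2.0, 3.0, 4.0]
dst = [0.0, 0.0, 0.0]
violations = schedule(copy_kernel, [src, dst], detect=True)
print(violations)
```

The kernel source is left untouched, and the same `schedule` call with `detect=False` runs it without any detection overhead.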

[0024] In some possible implementations, the AI application runtime system includes a computing card used to run the kernel function of at least one operator of the AI model. The actual memory access information of the at least one operator includes the address range accessed when the kernel function of the at least one operator issues a memory access instruction. The computing card stores detection code. In response to a memory access instruction issued by the kernel function of the at least one operator, the detection code obtains the actual memory access information of the at least one operator, and either detects anomalous operators based on that information or provides that information so that anomalous operators can be detected.

[0025] Alternatively, the detection code can be embedded in the computing card as firmware. The computing card can trigger abnormal operator detection for memory access instructions initiated by the kernel function, or provide actual memory access information for detecting abnormal operators. This enables real-time detection of abnormal operators and timely handling of anomalies.
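The card-side variant can be pictured with a small simulation in which every load and store passes through detection code held by the card, so the kernel needs no changes. `ComputingCard`, its methods, and the example addresses are hypothetical stand-ins for firmware behavior; this is a sketch under those assumptions rather than a description of any real device.

```python
class ComputingCard:
    """Toy device: every load/store passes through detection code held by the card."""

    def __init__(self, memory_size):
        self.memory = [0] * memory_size
        self.accessed = None  # [low, high) range actually touched by the running kernel

    def _on_access(self, address):
        """Detection code: widen the accessed range on each memory access instruction."""
        if self.accessed is None:
            self.accessed = (address, address + 1)
        else:
            low, high = self.accessed
            self.accessed = (min(low, address), max(high, address + 1))

    def load(self, address):
        self._on_access(address)
        return self.memory[address]

    def store(self, address, value):
        self._on_access(address)
        self.memory[address] = value

    def run(self, kernel, allocated):
        """Run a kernel and report whether its accesses left the `allocated` [start, end)."""
        self.accessed = None
        kernel(self)
        if self.accessed is None:
            return False
        low, high = self.accessed
        return low < allocated[0] or high > allocated[1]


def fill_kernel(card):
    for address in range(8, 17):  # writes 9 cells, one more than the 8 allocated
        card.store(address, 1)


card = ComputingCard(memory_size=32)
is_anomalous = card.run(fill_kernel, allocated=(8, 16))
print(is_anomalous, card.accessed)
```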

[0026] In some possible implementations, the anomaly detection system can also deduplicate the anomaly operators based on their input and output information during the AI ​​application's operation, obtaining deduplicated anomaly operators. Correspondingly, the anomaly detection system can detect the memory access behavior of the deduplicated anomaly operators and obtain the detection results.

[0027] This method can further reduce the number of operators in the next stage of single-operator anomaly detection by deduplicating the anomaly operators, thereby reducing the computational pressure in the next stage and improving the efficiency of memory anomaly detection.
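One way to picture the deduplication is sketched below. It assumes that two anomaly records count as duplicates when they share the same operator type and the same input/output description (shapes and data types); that key, and the names `TensorDesc`, `AnomalyRecord`, and `deduplicate`, are illustrative assumptions rather than details stated in this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorDesc:
    shape: tuple
    dtype: str


@dataclass(frozen=True)
class AnomalyRecord:
    op_type: str
    layer: int      # where in the network the operator was invoked
    inputs: tuple   # tuple of TensorDesc
    outputs: tuple  # tuple of TensorDesc


def deduplicate(records):
    """Keep the first record for each distinct (operator type, inputs, outputs) key."""
    seen = set()
    unique = []
    for record in records:
        key = (record.op_type, record.inputs, record.outputs)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


x = TensorDesc((8, 128), "float16")
y = TensorDesc((8, 256), "float16")
records = [
    AnomalyRecord("fused_attention", 0, (x,), (x,)),
    AnomalyRecord("fused_attention", 1, (x,), (x,)),  # same I/O description as layer 0
    AnomalyRecord("fused_attention", 2, (y,), (y,)),  # different shape, kept separately
]
unique_records = deduplicate(records)
print([r.layer for r in unique_records])
```

Only two of the three flagged invocations would then go on to single-operator detection.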

[0028] In some possible implementations, the AI application runtime system includes a first computing card and a second computing card. Accordingly, the anomaly detection system can also verify the anomalous operators based on the execution results, on the first and second computing cards, of use cases reconstructed from the input/output information of the anomalous operators during the AI application's operation, thus obtaining verified anomalous operators. The anomaly detection system can then detect the memory access behavior of the verified anomalous operators based on their input/output information during the AI application's operation, thereby obtaining detection results.

[0029] This method verifies the abnormal operators determined based on memory allocation information and actual memory access information to obtain a list of abnormal operators. This can remove misidentified abnormal operators and improve the accuracy of detecting abnormal operators.

[0030] In some possible implementations, the input and output information of the anomaly operator during the operation of the AI ​​application includes input and output descriptions or full input and output information. Using input and output descriptions enables efficient memory anomaly detection, while using full input and output information enables comprehensive memory anomaly detection.

[0031] In a second aspect, this application provides an anomaly detection system. The anomaly detection system includes:

[0032] An operator delimitation subsystem is used to send a startup command to the AI ​​application runtime system. The startup command is used to start the AI ​​application, which is built based on an AI model. The AI ​​model includes multiple operators. During the operation of the AI ​​application, abnormal operators are determined based on the actual memory access information and memory allocation information of at least one of the multiple operators. The actual memory access information includes the address range of the memory space actually accessed, and the memory allocation information includes the address range of the allocated memory space.

[0033] The operator detection subsystem is used to detect the memory access behavior of the abnormal operators based on the input and output information of the abnormal operators during the operation of the AI ​​application, and obtain the detection results.

[0034] In some possible implementations, the operator delimiting subsystem is specifically used for:

[0035] Obtain the memory allocation information of at least one of the plurality of operators;

[0036] Provide the AI ​​application runtime system with memory allocation information for at least one operator;

[0037] Obtain the anomalous operators from the AI application runtime system, where the anomalous operators are determined by the AI application runtime system based on the memory allocation information of the at least one operator and the actual memory access information during the operation of the at least one operator.

[0038] In some possible implementations, the operator delimiting subsystem is specifically used for:

[0039] The memory allocation information of the at least one operator is written into the shared storage so that the AI ​​application running system can obtain the memory allocation information from the shared storage.

[0040] Read the out-of-bounds identifier of at least one operator written by the AI ​​application runtime system in the shared storage;

[0041] The abnormal operator is determined based on the out-of-bounds identifier of the at least one operator.

[0042] In some possible implementations, the operator delimiting subsystem is specifically used for:

[0043] Receive the exception operator returned by the AI ​​application runtime system through the application programming interface (API); or,

[0044] Receive error messages or interruption messages reported by the AI application runtime system, where the error messages or interruption messages indicate the anomalous operators.

[0045] In some possible implementations, the operator delimiting subsystem is specifically used for:

[0046] Obtain the memory allocation information of at least one of the plurality of operators;

[0047] Obtain the actual memory access information of the at least one operator during the operation of the AI ​​application running system;

[0048] The exception operator is determined based on the memory allocation information and the actual memory access information.

[0049] In some possible implementations, the operator delimiting subsystem is specifically used for:

[0050] Read, from the shared storage, the actual memory access information of the at least one operator written by the AI application runtime system during operation.

[0051] In some possible implementations, when the AI ​​application starts successfully, the AI ​​application runtime system runs the kernel function of at least one operator of the AI ​​model, and the actual memory access information of the at least one operator is provided by the kernel function of the at least one operator.

[0052] In some possible implementations, the source code of the kernel function includes detection code for detecting anomalous operators or providing actual memory access information of the at least one operator to detect anomalous operators.

[0053] In some possible implementations, the operator delimiting subsystem is also used for:

[0054] Detect, from the AI application runtime system, that the kernel function of at least one operator of the AI model has been scheduled, and insert a detection function into the kernel function through dynamic instrumentation. The detection function is used to detect anomalous operators, or to provide the actual memory access information of the at least one operator so that anomalous operators can be detected.

[0055] In some possible implementations, the AI ​​application running system includes a computing card for running the kernel function of at least one operator of the AI ​​model. The actual memory access information of the at least one operator includes the address range accessed by the kernel function of the at least one operator initiating a memory access instruction to the memory space. The computing card stores detection code, which is used to obtain the actual memory access information of the at least one operator in response to the memory access instruction initiated by the kernel function of the at least one operator, and to detect abnormal operators based on the actual memory access information of the at least one operator or to provide the actual memory access information of the at least one operator to detect abnormal operators.

[0056] In some possible implementations, the operator delimiting subsystem is also used for:

[0057] Based on the input and output information of the anomaly operator during the operation of the AI ​​application, the anomaly operator is deduplicated to obtain the deduplicated anomaly operator;

[0058] The operator detection subsystem is specifically used for:

[0059] The memory access behavior of the abnormal operators after deduplication is detected, and the detection results are obtained.

[0060] In some possible implementations, the AI ​​application running system includes a first computing card and a second computing card, and the operator delimiting subsystem is further used for:

[0061] Verify the anomalous operators based on the execution results, on the first computing card and the second computing card, of use cases reconstructed from the input and output information of the anomalous operators during the operation of the AI application, to obtain the verified anomalous operators.

[0062] The operator detection subsystem is specifically used for:

[0063] Based on the input and output information of the verified anomaly operator during the operation of the AI ​​application, the memory access behavior of the anomaly operator is detected, and the detection result is obtained.

[0064] In a third aspect, this application provides a computing device cluster. The computing device cluster includes at least one computing device, and the at least one computing device includes a first computing card, a second computing card, and at least one memory. The first computing card, the second computing card, and the at least one memory communicate with each other. The first computing card or the second computing card is used to execute instructions stored in the at least one memory to cause the computing device cluster to perform the memory anomaly detection method as described in the first aspect or any implementation thereof.

[0065] In a fourth aspect, this application provides a computing card. The computing card can be a neural processing unit (NPU), a graphics processing unit (GPU), or a tensor processing unit (TPU). The computing card includes a computing core and memory. The computing core is the module in the computing card that implements its computing capabilities, and the computing cores of different types of computing cards can have different structures. For example, the computing core of an NPU can be an AI core, which includes matrix computation units, vector computation units, and scalar computation units designed for neural networks. The computing core of a GPU includes multiple stream processors, which are used for parallel processing of data computation tasks. The memory of the computing card is also called device memory or video memory. The computing core is used to execute computer-readable instructions loaded into the memory to perform the memory anomaly detection method as described in the first aspect of this application or any implementation thereof.

[0066] In a fifth aspect, this application provides a computer-readable storage medium storing instructions that instruct a computing device cluster or computing card to execute the memory anomaly detection method described in the first aspect or any implementation thereof.

[0067] In a sixth aspect, this application provides a computer program product containing instructions that, when run on a computing device cluster or a computing card, cause the computing device cluster or the computing card to execute the memory anomaly detection method described in the first aspect or any implementation thereof.

[0068] Based on the implementation methods provided in the above aspects, this application can be further combined to provide more implementation methods.

Attached Figure Description

[0069] To more clearly illustrate the technical methods of this application, the accompanying drawings used will be briefly described below.

[0070] Figures 1A to 1C are schematic diagrams of deployment modes of the anomaly detection system provided in this application;

[0071] Figure 2 is a flowchart of a memory anomaly detection method provided in this application;

[0072] Figure 3 is a schematic diagram of merging the address ranges of accessed memory space provided in this application;

[0073] Figure 4 is a flowchart of full memory detection on a single card provided in this application;

[0074] Figure 5 is an interaction flowchart of network-wide operator delimitation provided in this application;

[0075] Figure 6 is a flowchart of network-wide operator delimitation provided in this application;

[0076] Figure 7 is an interaction flowchart of another network-wide operator delimitation provided in this application;

[0077] Figure 8 is a flowchart of network-wide operator delimitation provided in this application;

[0078] Figure 9 is a schematic diagram of a dynamic instrumentation method provided in this application;

[0079] Figure 10 is a schematic diagram of a memory anomaly detection algorithm embedded in a computing card provided in this application;

[0080] Figure 11 is a flowchart of a memory anomaly detection method provided in this application;

[0081] Figure 12 is a schematic diagram of the structure of a computing device provided in this application;

[0082] Figure 13 is a schematic diagram of the structure of a computing device cluster provided in this application;

[0083] Figure 14 is a schematic diagram of the structure of a computing card provided in this application.

Detailed Implementation

[0084] The terms "first" and "second" used in the embodiments of this application are for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature.

[0085] First, some technical terms involved in the embodiments of this application will be introduced.

[0086] Artificial intelligence (AI) models are mathematical models with reasoning capabilities built using techniques such as machine learning (ML) and deep learning. AI models can infer output data based on input data. For example, in image processing, an AI model can be an image classification model; the input data can be the image to be classified, and the output data can be the classification result. As another example, in natural language processing (NLP), an AI model can be an intelligent question-answering model, including but not limited to large language models (LLMs); the input data can be the question text, and the output data can be the response text.

[0087] An operator is a computational unit in an AI model, representing a mapping from one function space to another. Broadly speaking, any operation performed on a function can be considered an operator. In the field of AI, an operator can be any computational function involved in an AI model. For example, the convolution function in a convolution layer can be considered an operator, i.e., a convolution operator. Similarly, the function that computes the weighted sum in a fully connected layer (FC layer) can also be considered an operator, i.e., a fully connected operator.

[0088] The definition of an operator can include its inputs, outputs, parameters, and constraints on those parameters. The implementation of an operator describes how it is realized on a specific device and is typically represented by a kernel function. For computation operators, the kernel function is also called a computation kernel. A kernel function can be code that runs on a device to perform a specific task. This device can be a computing card, including but not limited to a central processing unit (CPU), graphics processing unit (GPU), neural processing unit (NPU), and tensor processing unit (TPU). An operator can include one or more kernel functions; for example, an operator can include kernel functions that execute on different computing cards.
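The separation between an operator's definition and its per-device kernel functions can be sketched as a small data structure. Everything here is hypothetical: the class `OperatorDef`, the `scale` operator, and the device labels `"cpu"` and `"npu"` are illustrative, and plain Python functions stand in for compiled device kernels.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class OperatorDef:
    name: str
    inputs: List[str]
    outputs: List[str]
    params: Dict[str, int]
    # Each constraint is a predicate over the parameter dictionary.
    constraints: List[Callable[[Dict[str, int]], bool]] = field(default_factory=list)
    # Maps a device label to the kernel function implementing the operator there.
    kernels: Dict[str, Callable] = field(default_factory=dict)

    def validate(self) -> bool:
        """Check every parameter constraint."""
        return all(check(self.params) for check in self.constraints)

    def launch(self, device: str, *args):
        """Run the kernel registered for `device` with the operator's parameters."""
        return self.kernels[device](*args, **self.params)


def scale_cpu(values, factor):
    return [v * factor for v in values]


def scale_npu(values, factor):
    # Stand-in for a device kernel; a real one would be compiled for the card.
    return [factor * v for v in values]


scale = OperatorDef(
    name="scale",
    inputs=["x"],
    outputs=["y"],
    params={"factor": 2},
    constraints=[lambda p: p["factor"] > 0],
    kernels={"cpu": scale_cpu, "npu": scale_npu},
)
result = scale.launch("npu", [1, 2, 3])
print(scale.validate(), result)
```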

[0089] To improve the performance of AI models, high-performance fusion operators can be introduced. A fusion operator is an operator formed by combining multiple independent operators. On one hand, fusion operators can improve performance through the parallelization of multiple small operators; on the other hand, combining multiple independent operators into a single operator can reduce kernel launch overhead, further improving performance. While fusion operators improve performance, they can also introduce functional problems, such as computational errors, leading to a decrease in the accuracy of the AI ​​model. Related research shows that more than 50% of accuracy problems are caused by operators, and more than 40% of operator-induced accuracy problems are caused by memory issues. When an AI model (e.g., an AI network, also known as the entire network) experiences accuracy anomalies, multiple domains need to collaborate on anomaly localization, a complex and time-consuming process.

[0090] Related technologies provide several memory anomaly detection tools and methods based on these tools. These tools detect memory anomalies by tracking data; however, they often require a large amount of memory. In some cases, enabling memory anomaly detection can lead to memory exhaustion. Consequently, AI applications may struggle to obtain sufficient memory for training or inference of AI models, potentially causing a crash.

[0091] In view of this, this application provides a memory anomaly detection method. The method can be executed by an anomaly detection system. The anomaly detection system detects memory anomalies, such as out-of-bounds accesses. It can also be used for memory anomaly localization, such as locating the line of code that caused the memory anomaly. The anomaly detection system can be standalone software with memory anomaly detection capabilities, or it can be integrated into other software as a plugin, component, app, or functional module. For example, the anomaly detection system can be integrated into a training platform or inference platform as a plugin. The software can be provided to customers as a client software package for self-deployment, or provided to users as a cloud service. In the latter case, the cloud platform can expose the cloud service's application programming interface (API), and users who subscribe to the cloud service can call the API to use the memory anomaly detection capability. Some or all of the software's capabilities can also be embedded in hardware, for example as firmware embedded in a chip or a computing device cluster. The chip can be a computing card such as an NPU, GPU, or TPU. The above product forms can also be combined and provided to users. In some possible implementations, the anomaly detection system can be a computing device cluster or a computing card with anomaly detection capabilities, and the memory anomaly detection method of this application is executed when the computing device cluster or computing card is running.

[0092] The anomaly detection system can collaborate with the AI ​​application runtime system to detect memory anomalies. The AI ​​application runtime system can be any system running an AI application. Similar to the anomaly detection system, the AI ​​application runtime system can include software systems, such as training platforms, inference platforms, or AI platforms that integrate training and inference. In some examples, the AI ​​application runtime system may also include hardware systems, such as computing cards or computing device clusters used for training AI models, or computing cards or computing device clusters used for inference using AI models. It should be noted that computing cards used for training are also called training cards, and clusters built based on training cards are called training clusters; similarly, computing cards used for inference are also called inference cards, and clusters built based on inference cards are called inference clusters.

[0093] In practice, a startup command is sent to launch the AI ​​application. Specifically, the anomaly detection system sends the startup command to the AI ​​application's runtime system. The AI ​​application can be built based on an AI model, including but not limited to model training applications or model inference applications. The AI ​​model includes multiple operators. During the operation of the AI ​​application, the anomaly detection system identifies the anomalous operator based on the actual memory access information and memory allocation information of at least one of the multiple operators. The actual memory access information includes the address range of the actually accessed memory space, and the memory allocation information includes the address range of the allocated memory space. Then, the anomaly detection system can detect the memory access behavior of the anomalous operator based on its input and output information during application runtime, and obtain the detection result.

[0094] This method divides the memory anomaly detection process into two stages: identifying anomalous operators in the AI ​​model (also known as bounding operators in the entire network) and single-operator anomaly detection. Single-operator anomaly detection can be used to detect the memory access behavior of anomalous operators, for example, by detecting other memory anomalies through multiple anomaly detection algorithms to obtain more detailed detection results. This transforms the entire network problem into a single-operator problem. The memory pressure of performing memory anomaly detection on the entire network is converted into the memory pressure of the bounding operator stage within the entire network. Simply ensuring the lightweight memory usage of the bounding operator stage can significantly reduce the risk of memory exhaustion in the entire process. If there is no additional working memory used in the bounding operator stage, it can even be guaranteed that there is no risk of memory exhaustion for the entire process.
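The two-stage split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the names `OperatorRecord`, `delimit_operators`, and `run_two_stage` are hypothetical, address ranges are assumed to be end-exclusive, and stage 2 is reduced to a caller-supplied placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Range = Tuple[int, int]  # (start, end) with end exclusive; an assumed convention


@dataclass
class OperatorRecord:
    """Hypothetical per-operator record collected while the application runs."""
    name: str
    allocated: Range  # address range allocated for the operator
    accessed: Range   # address range the operator actually accessed


def delimit_operators(records: List[OperatorRecord]) -> List[OperatorRecord]:
    """Stage 1: lightweight whole-network pass that only compares two ranges."""
    return [
        r for r in records
        if r.accessed[0] < r.allocated[0] or r.accessed[1] > r.allocated[1]
    ]


def run_two_stage(
    records: List[OperatorRecord],
    detect_single_operator: Callable[[OperatorRecord], str],
) -> Dict[str, str]:
    """Stage 2 runs the heavier per-operator check only on stage-1 suspects."""
    suspects = delimit_operators(records)
    return {r.name: detect_single_operator(r) for r in suspects}


records = [
    OperatorRecord("conv", (0x1000, 0x2000), (0x1000, 0x1800)),
    OperatorRecord("matmul", (0x2000, 0x3000), (0x2000, 0x3010)),
]
report = run_two_stage(
    records,
    lambda r: f"overrun by {r.accessed[1] - r.allocated[1]} bytes",
)
```

Only `matmul` reaches stage 2 in this example, which is the point of the split: the expensive check is never applied to operators that stay within their allocation.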

[0095] To make the technical solution of this application clearer and easier to understand, several illustrative deployment methods of the anomaly detection system are introduced below. When the anomaly detection system and the AI ​​application runtime system are software systems, they can be deployed in the same process to share hardware resources, achieve efficient data interaction, and reduce resource consumption caused by cross-process communication and frequent process context switching. This improves resource utilization. Alternatively, the anomaly detection system and the AI ​​application runtime system can be deployed in different processes, such as running as independent processes on different computing devices (e.g., servers) or in different process spaces on the same server. These are described in detail below with reference to the accompanying drawings.

[0096] Figures 1A to 1C illustrate the deployment of an anomaly detection system. The anomaly detection system 10 can detect AI models that generate anomalies, i.e., anomaly models. In some examples, the AI model can be deployed by the AI application runtime system within a computing device cluster, which may include multiple computing devices, such as multiple servers. In other examples, the AI model can be deployed by the AI application runtime system on a server.

[0097] For cluster scenarios, the anomaly detection system 10 can be deployed on a separate server or on a server shared with the object being detected. Figure 1A illustrates the case in which the anomaly detection system 10 is deployed on a separate server. As shown in Figure 1A, the computing device cluster includes multiple servers, such as server 1 to server N, and each server includes M computing cards, specifically computing cards 1 to M. Here, M and N are positive integers greater than 1. The AI application runtime system can deploy AI applications built based on AI models on the computing device cluster, and the AI model can be trained or used for inference within the cluster. The anomaly detection system 10 can be deployed on a server independent of server 1 to server N. When the AI model generates an anomaly, such as an accuracy anomaly, the anomaly detection system 10 deployed on the independent server can send a startup command to the AI application runtime system to start the AI application on the computing device cluster (e.g., server 1 to server N). During the AI application's operation, the anomaly detection system 10 identifies the abnormal operator based on the actual memory access information and memory allocation information of at least one operator among multiple operators. Then, based on the input and output information of the abnormal operator during the AI application's operation, it detects the memory access behavior of the abnormal operator and obtains the detection result. In this way, multiple techniques are combined to solve the problem in stages, achieving lightweight localization of memory problems and increasing the detection success rate in cluster scenarios.

[0098] In server scenarios, the server can run multiple AI applications. The anomaly detection system 10 can be deployed on the same computing card as the AI applications, or on different computing cards. For example, the anomaly detection system 10 can be deployed on the same or different CPUs as the AI applications. Alternatively, the anomaly detection system 10 can be deployed on the same or different NPUs as the AI applications. Figure 1B illustrates the case in which the anomaly detection system 10 and the AI application are deployed on the same computing card. Specifically, the AI application runtime system can be deployed locally on a server, and the AI application can be deployed on computing card M. Correspondingly, both the anomaly detection system 10 and the AI application can be deployed on computing card M. In this case, the anomaly detection system 10 can enable its detection capabilities indirectly through the AI application; for example, the anomaly detection system 10 can control the activation and deactivation of its detection capabilities through the AI application. In some examples, the anomaly detection system 10 can also enable its detection capabilities directly. For instance, the anomaly detection system 10 can be embedded in hardware, such as a chip embedded in a computing card, thereby directly enabling its detection capabilities.

[0099] The following example describes deploying the out-of-bounds detection capability of the whole-network operator delimitation stage onto a computing card. As shown in Figure 1C, the detection code used to detect abnormal operators, such as the out-of-bounds detection algorithm, can be deployed on the computing card and used in conjunction with an external detection switch. The detection switch can be enabled via an API in the AI application or enabled by default. The detection switch is an optional configuration item; the anomaly detection system 10 can also omit the detection switch and enable the out-of-bounds detection capability by default. It should be noted that the anomaly detection system 10 can also control the AI application, injecting the out-of-bounds detection algorithm and other anomaly detection algorithms into the AI application. The AI application passes these algorithms to the kernel function (denoted as kernel) of the operator running on the computing card, so that the program running on the computing card (such as the kernel) has out-of-bounds detection capability. The kernel, also known as the operator kernel, is the kernel function that implements the operator, typically a code block that executes a specific computational task at the low level.

[0100] Figures 1A to 1C above illustrate the deployment of the anomaly detection system 10, taking a software system as an example. In other possible implementations of this application, the anomaly detection system 10 can also be deployed in other ways. Furthermore, the anomaly detection system 10 and the AI application runtime system can also be hardware systems. For example, in Figure 1A, the anomaly detection system 10 may include the server that is independent of server 1 to server N; in Figure 1B, the anomaly detection system 10 may include a server.

[0101] Based on the anomaly detection system 10 shown in Figures 1A to 1C, this application also provides a memory anomaly detection method. The memory anomaly detection method of this application is described in detail below with reference to the accompanying drawings.

[0102] Figure 2 is a flowchart of a memory anomaly detection method. The method can be executed by the anomaly detection system 10. The anomaly detection system 10 divides whole-network memory anomaly detection into two stages: stage 1 performs operator delimitation across the entire network, and stage 2 performs single-operator detection. The method specifically includes the following steps:

[0103] S202. The anomaly detection system 10 sends a startup command to the AI application runtime system.

[0104] The startup command is used to launch an AI application. AI applications can be built upon AI models, such as model inference applications or model training applications. An AI model includes multiple operators. An operator is a computational unit within an AI model, typically corresponding to a network layer. For example, an AI model based on a convolutional neural network (CNN) may include convolution operators, pooling operators, and fully connected operators. Here, convolution operators correspond to the convolutional layers of the AI ​​model, pooling operators correspond to the pooling layers, and fully connected operators correspond to the fully connected layers.

[0105] An AI application runtime system can be a system that runs AI applications, such as a training platform for training AI models, an inference platform for using AI models for inference, or an AI platform that integrates training and inference. Upon receiving a startup command, the AI application runtime system can execute the AI application's code, thereby launching the AI application. While running, the AI application can train AI models or use AI models for inference.

[0106] In practice, the anomaly detection system 10 can send a startup command to the AI application runtime system when an AI model generates an anomaly. Specifically, when the anomaly detection system 10 detects an abnormal model (such as an AI model that generates an accuracy error), it can cause the code of the AI application corresponding to the abnormal model to be executed, thereby starting the AI application.

[0107] S204. During the operation of the AI ​​application, the anomaly detection system 10 determines the abnormal operator based on the actual memory access information of at least one operator and the memory allocation information of at least one operator.

[0108] Abnormal operators can include operators that cause out-of-bounds access. In this application, out-of-bounds access occurs when the device side uses memory space that has not been allocated on the host side. If the address range of the memory space that an operator (e.g., the operator's kernel) actually accesses on the device side exceeds the address range of the allocated memory space, the operator is considered to have caused an out-of-bounds access and is therefore an abnormal operator.

[0109] During the operation of an AI application, the host-side service is responsible for requesting and allocating memory for the operator's execution. For out-of-bounds detection, the host side records memory allocation information. This information includes the address range of the allocated memory space, such as the address range of the memory space that the host side of the AI application allocates for the operator. The address range can be represented by a start address and a length, or by a start address and an end address. In some examples, the memory allocation information may include the start and end addresses of a series of pointers. On the device side, the operator or its kernel performs numerous memory movement operations during execution. The operator or its kernel can record actual memory access information, including the address range of the actually accessed memory space. The address range of the actually accessed memory space can be a set of address ranges accessed by multiple memory movement operations. Because some memory movement operations access consecutive address ranges, this application also supports merging consecutive address ranges. As shown in Figure 3, 7 address ranges are used during the execution of the operator, and only one merged address range needs to be recorded after the operator completes.
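The merging of consecutive address ranges can be sketched as follows. This is a minimal sketch, assuming ranges are stored as end-exclusive `(start, end)` pairs; the function name `merge_ranges` and the sample addresses are hypothetical and only mirror the seven-range case mentioned above.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end) with end exclusive; an assumed convention


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Merge overlapping or directly adjacent address ranges."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Contiguous with (or inside) the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Seven consecutive 0x100-byte accesses collapse into a single range.
accesses = [(0x1000 + i * 0x100, 0x1000 + (i + 1) * 0x100) for i in range(7)]
merged = merge_ranges(accesses)

# A gap between accesses keeps the ranges separate.
with_gap = merge_ranges([(0x1000, 0x1100), (0x1200, 0x1300)])
```

Recording one merged range instead of every individual access keeps the bookkeeping of the delimitation stage small, which matches the lightweight goal of stage 1.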

[0110] During the operation of an AI application, for at least one of multiple operators, the anomaly detection system 10 can compare the actual memory access information with the memory allocation information to detect whether an operator out-of-bounds behavior has occurred and identify the abnormal operator. Taking one operator among multiple operators as an example, the anomaly detection system 10 can compare the starting address of the actual access with the starting address of the memory allocation, and compare the ending address of the actual access with the ending address of the memory allocation. If the starting address of the actual access is less than the starting address of the memory allocation, or the ending address of the actual access is greater than the ending address of the memory allocation, then an operator out-of-bounds behavior has occurred, and the operator is an abnormal operator.
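The comparison described above can be written as a short check. The sketch below assumes inclusive end addresses; `AddressRange`, `from_start_and_length`, and `is_out_of_bounds` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class AddressRange(NamedTuple):
    start: int  # first address of the range
    end: int    # last address of the range (inclusive; an assumed convention)


def from_start_and_length(start: int, length: int) -> AddressRange:
    """Build a range from the start-address-plus-length representation."""
    return AddressRange(start, start + length - 1)


def is_out_of_bounds(accessed: AddressRange, allocated: AddressRange) -> bool:
    """True if the access starts before or ends after the allocation."""
    return accessed.start < allocated.start or accessed.end > allocated.end


allocated = from_start_and_length(0x1000, 0x100)  # covers 0x1000..0x10FF
inside = is_out_of_bounds(AddressRange(0x1000, 0x10FF), allocated)
overrun = is_out_of_bounds(AddressRange(0x1080, 0x1103), allocated)
underrun = is_out_of_bounds(AddressRange(0x0FF8, 0x1010), allocated)
```

Here `inside` is false, while `overrun` and `underrun` are both true, so the operator behind either of the last two accesses would be treated as an abnormal operator.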

[0111] In some possible implementations, the anomaly detection system 10 can identify situations where operators influence each other. When such influence exists, it indicates that the operator is abnormal. Specifically, the anomaly detection system 10 can compare the actual memory access information of one operator with the memory allocation information of other running operators to determine whether the memory space actually accessed by one operator overlaps with the memory space allocated by other running operators. If they overlap, it indicates that there is mutual influence between the operators, and the anomaly detection system 10 determines that the operator accessing the memory space allocated by other operators has committed operator out-of-bounds behavior. The anomaly detection system 10 can identify the operator accessing the memory space allocated by other operators as an abnormal operator.
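The cross-operator check can be sketched as an interval-overlap test between one operator's accessed range and the ranges allocated to the other running operators. The dictionary layout, the end-exclusive range convention, and the names `overlaps` and `find_intruders` are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (start, end) with end exclusive; an assumed convention


def overlaps(a: Range, b: Range) -> bool:
    """Two half-open ranges overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def find_intruders(
    accessed: Dict[str, Range], allocated: Dict[str, Range]
) -> List[Tuple[str, str]]:
    """Return (intruder, victim) pairs where intruder touched victim's memory."""
    hits = []
    for op, acc in accessed.items():
        for other, alloc in allocated.items():
            if other != op and overlaps(acc, alloc):
                hits.append((op, other))
    return hits


allocated = {"conv": (0x1000, 0x2000), "relu": (0x2000, 0x3000)}
accessed = {"conv": (0x1000, 0x2008), "relu": (0x2000, 0x3000)}
hits = find_intruders(accessed, allocated)
```

In this example `conv` writes 8 bytes past its own allocation into the space allocated for `relu`, so `conv` is reported as the operator that exceeded its bounds.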

[0112] In specific implementation, the anomaly detection system 10 can obtain memory allocation information for at least one of multiple operators and provide this information to the AI ​​application runtime system. The AI ​​application runtime system contains the kernel of at least one operator running an AI model. The anomaly detection system 10 can provide the memory allocation information of at least one operator to the kernel of the at least one operator. The AI ​​application runtime system can determine abnormal operators based on the memory allocation information of the at least one operator and the actual memory access information during the operation of the at least one operator. Taking the kernel of one operator as an example, this kernel obtains the memory allocation information of the operator from the anomaly detection system 10, compares the memory allocation information of the operator with the actual memory access information of the operator during operation, and thus determines whether the operator has exceeded the bounds. If it exceeds the bounds, it indicates that the operator is an abnormal operator. Then, the anomaly detection system 10 can obtain the abnormal operator from the AI ​​application runtime system.

[0113] The anomaly detection system 10 can provide the memory allocation information of at least one operator to the AI application runtime system (e.g., to the kernel of an operator running within the AI application runtime system) in various ways. For example, the anomaly detection system 10 can write the memory allocation information of at least one operator to shared storage, so that the AI application runtime system (e.g., the kernel of an operator running within it) can retrieve the memory allocation information from the shared storage. The shared storage can include, but is not limited to, shared memory, registers, and pipes. A pipe is an inter-process communication (IPC) mechanism used to directly transmit data or signals between at least two processes. As another example, the anomaly detection system 10 can pass the memory allocation information of at least one operator to the AI application runtime system via an API; specifically, it can pass the memory allocation information of each operator to the kernel of the corresponding operator within the AI application runtime system via the API.

[0114] When shared storage is used, the AI ​​application runtime system can write at least one operator's out-of-bounds identifier into the shared storage. For example, the kernel running in the AI ​​application runtime system determines that an operator is out of bounds based on the operator's memory allocation information and actual memory access information, and can write the operator's out-of-bounds identifier into the shared storage. Correspondingly, the anomaly detection system 10 can read the out-of-bounds identifier of at least one operator written by the AI ​​application runtime system in the shared storage, and determine the abnormal operator based on the out-of-bounds identifier of at least one operator.
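The exchange of out-of-bounds identifiers through shared storage can be sketched with Python's standard `multiprocessing.shared_memory` module. The layout of one flag byte per operator and the operator list are assumptions, and both sides are shown in a single process for brevity; in practice the writer would be the runtime side and the reader would be the anomaly detection system.

```python
from multiprocessing import shared_memory

OPERATORS = ["conv", "pool", "fc"]  # hypothetical operator order; index = flag slot

# Detection side: create a shared block with one flag byte per operator.
# Clear the flags explicitly rather than relying on the initial contents.
block = shared_memory.SharedMemory(create=True, size=len(OPERATORS))
block.buf[:len(OPERATORS)] = bytes(len(OPERATORS))

# Runtime side (normally another process): attach by name and flag "pool".
writer = shared_memory.SharedMemory(name=block.name)
writer.buf[OPERATORS.index("pool")] = 1
writer.close()

# Detection side: read the flags back and map them to operator names.
abnormal = [op for i, op in enumerate(OPERATORS) if block.buf[i] == 1]

# Release the block once the identifiers have been read.
block.close()
block.unlink()
```

A single byte per operator keeps the extra memory used by the delimitation stage negligible, in line with the lightweight goal stated for that stage.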

[0115] When using an API, the anomaly detection system 10 can receive anomaly operators returned by the AI ​​application runtime system via the API. It should be noted that when the AI ​​application runtime system determines that an operator has exceeded its bounds, it can also directly throw an interruption or error message. The anomaly detection system 10 can also receive error messages or interruption messages reported by the AI ​​application runtime system. These error messages or interruption messages are used to indicate the anomaly operator.

[0116] In some other possible implementations, the anomaly detection system 10 can obtain memory allocation information of at least one of the multiple operators, and obtain actual memory access information of at least one operator during operation from the AI ​​application runtime system. For example, the anomaly detection system 10 can obtain the actual memory access information of each operator during operation from the kernel of each operator of the AI ​​model running in the AI ​​application runtime system, and then determine the abnormal operator based on the memory allocation information and the actual memory access information.

[0117] The anomaly detection system 10 can obtain, from the AI application runtime system, the actual memory access information of at least one operator during operation in various ways. For example, the AI application runtime system can write the actual memory access information of at least one operator during operation to the shared storage, and the anomaly detection system 10 can read that information from the shared storage.

[0118] S206. The anomaly detection system 10 records the input and output information of the operator during the operation of the AI application.

[0119] Input / output information refers to information related to the input / output (IO) behavior of the operator during the operation of the AI application. In some possible implementations, input / output information may include input / output descriptions, which may include at least one of IO type, IO address, IO shape, IO size, and IO time. IO type may include read or write. IO address may include the IO start address. Further, the IO address may also include the IO end address. IO shape describes the pattern of data flow between the IO device and the computer system. The data flow pattern may include frequent, small-batch bursts of data transfer, or stable, large-volume block transfers. IO time may include at least one of the IO start timestamp, IO completion timestamp, or IO duration. In other possible implementations, input / output information may include full information about the input and output. Full information may include the input / output description and the data structure and attributes of the input / output data. The data attributes of the input / output data may include at least one of value range, digest, and hash value. Both the input / output descriptions and the full information about the input and output are used to reproduce the operator behavior in stage 2.

[0120] In practice, the anomaly detection system 10 can utilize the logging functionality provided by programming languages ​​to record the input and output information of operators during the operation of the AI ​​application. For example, the anomaly detection system 10 can use Python's logging module to record the input and output information of operators during application operation.
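Recording operator IO with Python's `logging` module might look like the sketch below. The logger name, the JSON-lines format, the field names, and the helper `record_io` are assumptions chosen for illustration; the log is written to an in-memory stream here so the example is self-contained, whereas a file handler would normally be used.

```python
import io
import json
import logging
import time

# Log to an in-memory stream here; logging.FileHandler would be used in practice.
stream = io.StringIO()
logger = logging.getLogger("operator_io")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the records out of the root logger
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)


def record_io(op_name: str, io_type: str, start_addr: int, size: int) -> None:
    """Write one input/output description as a JSON line."""
    logger.info(json.dumps({
        "op": op_name,
        "io_type": io_type,          # "read" or "write"
        "io_addr": hex(start_addr),  # IO start address
        "io_size": size,             # size in bytes
        "io_time": time.time(),      # IO start timestamp
    }))


record_io("conv2d", "read", 0x1000, 4096)
record_io("conv2d", "write", 0x2000, 2048)

# The recorded lines can later be parsed back to rebuild the operator's IO.
records = [json.loads(line) for line in stream.getvalue().splitlines()]
```

One record per line keeps the log easy to parse when the recorded information is later used to reproduce the operator's behavior.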

[0121] S208. The anomaly detection system 10 deduplicates the abnormal operators based on the input and output information of the abnormal operators during the operation of the AI application, and obtains the deduplicated abnormal operators.

[0122] Specifically, the AI model may include repeated operators, such as repeated convolution operators and repeated matrix multiplication operators. The anomaly detection system 10 can deduplicate the abnormal operators so that only the deduplicated abnormal operators need to be detected in stage 2. This reduces the number of operators detected in stage 2 and relieves the pressure of detecting and locating anomalies in individual operators in that stage.

[0123] The anomaly detection system 10 can extract key features of anomaly operators based on their input and output information during the AI ​​application's operation. These key features may include the operator's function, the data type and format of the input and output data, and the operator's parameter settings. The anomaly detection system 10 can then identify duplicate operators through feature matching. For each group of duplicate operators, the anomaly detection system 10 can retain one operator from the group and remove the other duplicate operators. The remaining operator in each group of duplicate operators is the deduplicated anomaly operator.
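Deduplication by feature matching can be sketched as grouping operators by a key built from the features listed above. The dictionary fields (`function`, `dtype`, `format`, `params`) and the sample operators are hypothetical; the sketch keeps the first operator of each group.

```python
from typing import Dict, List, Tuple


def feature_key(op: Dict) -> Tuple:
    """Key features named in the text: function, IO data type/format, parameters."""
    return (
        op["function"],
        op["dtype"],
        op["format"],
        tuple(sorted(op["params"].items())),
    )


def deduplicate(ops: List[Dict]) -> List[Dict]:
    """Keep the first operator of each group with identical key features."""
    seen: Dict[Tuple, Dict] = {}
    for op in ops:
        seen.setdefault(feature_key(op), op)
    return list(seen.values())


ops = [
    {"name": "conv_1", "function": "conv2d", "dtype": "float16",
     "format": "NCHW", "params": {"kernel": 3, "stride": 1}},
    {"name": "conv_5", "function": "conv2d", "dtype": "float16",
     "format": "NCHW", "params": {"stride": 1, "kernel": 3}},
    {"name": "matmul_1", "function": "matmul", "dtype": "float16",
     "format": "ND", "params": {"transpose_b": True}},
]
unique_names = [op["name"] for op in deduplicate(ops)]
```

Sorting the parameter items makes the key independent of the order in which parameters were recorded, so `conv_1` and `conv_5` fall into the same group.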

[0124] Feature matching is only one implementation of operator deduplication. In other possible implementations of this application, the anomaly detection system 10 can also deduplicate abnormal operators in other ways. For example, the anomaly detection system 10 can analyze the input and output information of abnormal operators during the operation of the AI application to obtain the call relationships and execution paths between operators, and deduplicate abnormal operators based on the call relationships and execution paths. Alternatively, the anomaly detection system 10 can calculate hash values based on the binary code of the operators, and then deduplicate abnormal operators based on the hash values.
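The hash-based alternative can be sketched with Python's standard `hashlib` module. The text does not name a hash function, so the use of SHA-256 is an assumption, as are the function name `deduplicate_by_hash` and the sample byte strings standing in for operator binaries.

```python
import hashlib
from typing import Dict, List


def deduplicate_by_hash(binaries: Dict[str, bytes]) -> List[str]:
    """Keep one operator name per distinct SHA-256 digest of its binary code."""
    first_by_digest: Dict[str, str] = {}
    for name, code in binaries.items():
        digest = hashlib.sha256(code).hexdigest()
        first_by_digest.setdefault(digest, name)
    return list(first_by_digest.values())


binaries = {
    "conv_layer1": b"\x01\x02\x03\x04",
    "conv_layer7": b"\x01\x02\x03\x04",  # same kernel binary reused in the network
    "matmul_layer3": b"\xaa\xbb",
}
unique = deduplicate_by_hash(binaries)
```

Because this variant relies only on the operator binaries, it fits the binary-only scenario in which no source code is available.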

[0125] It should be noted that S208 described above is an optional step in the embodiments of this application, and the memory anomaly detection method of this application may also omit S208. For example, the anomaly detection system 10 can directly detect the memory access behavior of the anomaly operator and obtain the detection result.

[0126] S210. The anomaly detection system 10 detects the memory access behavior of the deduplicated abnormal operators and obtains the detection results.

[0127] Taking one of the deduplicated abnormal operators as an example, the anomaly detection system 10 can construct test cases for the operator based on its input and output information, and then perform full detection of the operator's memory access behavior based on the test cases to obtain the detection results. Full detection can involve using multiple memory anomaly detection algorithms to detect various memory anomalies. In practical applications, the anomaly detection system 10 can also use one or more selected memory anomaly detection algorithms to detect the corresponding types of memory anomaly.

[0128] It should be noted that in cluster or server scenarios, for full memory detection of a single operator, the anomaly detection system 10 can use the resources of multiple computing cards for parallel scheduling. For example, it can use the resources of all computing cards in the cluster or server for parallel scheduling to improve overall efficiency.
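One possible reading of this parallel scheduling is sketched below: the abnormal operators are assigned to computing cards in round-robin order and checked concurrently. The function `detect_on_card` is a placeholder that does no real device work, and the use of a thread pool and round-robin assignment are assumptions rather than the scheduling strategy of this application.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def detect_on_card(card_id: int, operator: str) -> str:
    """Placeholder for running full detection of one operator on one card."""
    return f"{operator}: checked on card {card_id}"


def schedule(operators: List[str], num_cards: int) -> Dict[str, str]:
    """Assign operators to cards round-robin and run the checks concurrently."""
    with ThreadPoolExecutor(max_workers=num_cards) as pool:
        futures = {
            op: pool.submit(detect_on_card, i % num_cards, op)
            for i, op in enumerate(operators)
        }
        return {op: f.result() for op, f in futures.items()}


results = schedule(["conv", "matmul", "softmax"], num_cards=2)
```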

[0129] For ease of description, this application takes full memory detection on a single card as an example. Full memory detection on a cluster or server differs only in system integration, and the single-card description can be used as a reference.

[0130] Figure 4 is a flowchart of full memory detection on a single card. This full memory detection can be performed by the operator detection subsystem 104 in the anomaly detection system 10. The operator detection subsystem 104 performs full memory detection on each abnormal operator output by the operator delimitation subsystem 102. The operator detection subsystem 104 includes a unified information perception module 1042 and an anomaly detection module 1044. Furthermore, the operator detection subsystem 104 may also include a result display module 1046.

[0131] Specifically, during the execution of the test cases for an abnormal operator, the unified information perception module 1042 obtains, from the host, the operation information of the AI framework's memory operations. This operation information can include the original structure of the abnormal operator's parameters on the AI framework side. The original structure can be converted into a structure suitable for processing on the device side (e.g., a computing card). For example, when the abnormal operator has 100 parameters, the original structure can be represented by a matrix of [2, 100], which holds the starting address and length of each parameter. The original structure is converted at runtime to obtain the converted structure, which can include the starting address of the memory allocated for the parameters and the computation specification. The computation specification can be the number of computation rounds or the number of iterations; for example, 10 rounds of a for loop, with each round iterating 10 times. The unified information perception module 1042 can also record the mapping relationship between the AI framework's APIs and the abnormal operators. In addition, the unified information perception module 1042 obtains, from the computing card, the operation information of the memory operations of at least one abnormal operator. This information may include the structure of the abnormal operator's parameters after runtime conversion, such as the starting address and computation specification of the parameters.

[0132] In this system, the AI framework code runs on the host side, while the abnormal operator's code runs on the device side. Device memory can be allocated and released by the AI framework code and used by the abnormal operator's kernel. Therefore, stubs can be placed in both the AI framework code and the abnormal operator's code (such as the kernel). These stubs monitor memory operation information (memory usage, etc.) on both sides, thereby enabling anomaly detection. A stub is a program segment used to implement a specific function; it can simulate the behavior of an existing program or serve as a temporary replacement for code yet to be developed. Stubs can be software stubs or hardware stubs. Figure 4 uses a software stub as an example.
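A software stub that records memory operation information can be sketched as a wrapper around the memory routines. Everything named here is hypothetical: `device_malloc` and `device_free` are toy stand-ins rather than real framework or driver calls, and the bump allocator only simulates device addresses.

```python
import functools
from typing import Any, Callable, List, Tuple

events: List[Tuple[str, tuple, Any]] = []  # recorded memory operation information


def stub(fn: Callable) -> Callable:
    """Wrap a memory routine so every call is recorded after it runs."""
    @functools.wraps(fn)
    def wrapper(*args):
        result = fn(*args)
        events.append((fn.__name__, args, result))
        return result
    return wrapper


_next_addr = 0x1000  # toy bump allocator standing in for device memory


@stub
def device_malloc(size: int) -> int:
    """Return the next free toy address and advance the allocator."""
    global _next_addr
    addr = _next_addr
    _next_addr += size
    return addr


@stub
def device_free(addr: int) -> None:
    """Toy release routine; nothing is actually freed."""
    return None


ptr = device_malloc(256)
device_free(ptr)
```

After these two calls, `events` holds one allocation record and one release record, which is the kind of host-side operation information that can later be combined with the information collected on the operator side.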

[0133] The anomaly detection module 1044 performs anomaly detection on the abnormal operators based on the memory operation information of the AI ​​framework and the memory operation information of at least one abnormal operator, and obtains anomaly detection results. For example, the anomaly detection module 1044 summarizes the memory operation information of the AI ​​framework and the memory operation information of the abnormal operators, and inputs the summarized memory operation information into an operator detection tool for detection to obtain anomaly detection results. Furthermore, the anomaly detection module 1044 can also perform anomaly detection on the AI ​​framework based on the memory operation information of the AI ​​framework to obtain anomaly detection results. During anomaly detection, the anomaly detection module 1044 can call at least one detection algorithm, such as a memory out-of-bounds detection algorithm and a contention detection algorithm, to detect memory out-of-bounds anomalies, memory contention problems, uninitialized memory problems, or memory synchronization problems.
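Calling several detection algorithms on the summarized memory operation information can be sketched as a small registry of detector functions run over one trace. The trace format, the detector names, and the byte-level bookkeeping (simple but not efficient) are assumptions for illustration; only out-of-bounds and uninitialized-read checks are shown.

```python
from typing import Callable, Dict, List, Tuple

# A memory operation: (kind, start, end) with end exclusive; kind is
# "alloc", "write", or "read". This trace format is an assumption.
MemOp = Tuple[str, int, int]


def detect_out_of_bounds(trace: List[MemOp]) -> List[str]:
    """Report reads/writes that are not fully inside any allocated range."""
    allocs = [(s, e) for kind, s, e in trace if kind == "alloc"]
    return [
        f"out-of-bounds {kind} [{s:#x}, {e:#x})"
        for kind, s, e in trace
        if kind != "alloc" and not any(a <= s and e <= b for a, b in allocs)
    ]


def detect_uninitialized_read(trace: List[MemOp]) -> List[str]:
    """Report reads of addresses that were never written earlier in the trace."""
    written = set()
    findings = []
    for kind, s, e in trace:
        if kind == "write":
            written.update(range(s, e))
        elif kind == "read" and not set(range(s, e)) <= written:
            findings.append(f"uninitialized read [{s:#x}, {e:#x})")
    return findings


DETECTORS: Dict[str, Callable[[List[MemOp]], List[str]]] = {
    "out_of_bounds": detect_out_of_bounds,
    "uninitialized": detect_uninitialized_read,
}


def run_detectors(trace: List[MemOp], names: List[str]) -> Dict[str, List[str]]:
    """Run the selected detection algorithms over one summarized trace."""
    return {name: DETECTORS[name](trace) for name in names}


trace = [
    ("alloc", 0x100, 0x110),
    ("write", 0x100, 0x108),
    ("read", 0x100, 0x110),   # reads 8 bytes that were never written
    ("write", 0x108, 0x118),  # runs 8 bytes past the allocation
]
findings = run_detectors(trace, ["out_of_bounds", "uninitialized"])
```

Selecting detectors by name reflects the option of running either the full set of algorithms or only those relevant to a given detection task.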

[0134] The results display module 1046 can display anomaly detection results. This module supports multiple display methods. One method is to output the anomaly API to a log file; another is to present the anomaly API to the user (this method can also be called screen display); and yet another method is to provide the anomaly API to the integrated application (or upper-level integrated module) via the results display API.

[0135] In some possible implementations, the operator detection subsystem 104 may further include an operation reconstruction module 1043. The operation reconstruction module 1043 is used to extract key information from the information obtained from the unified information perception module 1042 and reconstruct the operation based on the extracted key information. Specifically, the operation reconstruction module 1043 can process or reduce the data obtained from the unified information perception module 1042 for different detection tasks to obtain key information. The key information may be related to the detection task or anomaly detection algorithm; different detection tasks or different anomaly detection algorithms may require different information. For example, memory detection only concerns memory; therefore, the operation reconstruction module 1043 can extract the memory allocated to the operator and then reconstruct the operation information of the memory operation based on the memory allocated to the operator. The reconstructed operation information of the memory operation can be used for anomaly detection, for example, as input to an operator detection tool to perform anomaly detection for heterogeneous operators, thereby identifying anomalies in the process from the AI framework to the heterogeneous operator kernel or anomalies within the heterogeneous operator kernel itself. In the example of Figure 4, the operator detection subsystem 104 achieves collaborative detection of anomalies by uniformly summarizing and processing the operation information of AI framework-side memory operations and operator-side memory operations.

[0136] The anomaly detection system 10 can add debugging information to the code of the test cases after constructing the test cases for the anomaly operator, during the compilation process. This debugging information is used to locate the line of code causing the anomaly. Alternatively, the anomaly detection system 10 can recompile the kernel so that the code line causing the anomaly can be located when detecting the memory access behavior of the operator. Accordingly, the detection results can include the line of code causing the anomaly. It should be noted that the anomaly detection system 10 can also use dynamic instrumentation technology to directly detect based on binary code (without debugging information), and the detection results may not include the line of code causing the anomaly. The anomaly detection system 10 can also report the detection results of the cluster / single machine to the maintenance platform, promptly notifying users of the anomaly, or directly using historical version fix code (e.g., patches for the anomaly operator) to achieve timely repair.

[0137] The above S208 to S210 are illustrative implementations of this application for detecting the memory access behavior of abnormal operators based on the input and output information of abnormal operators during the operation of AI applications and obtaining detection results. In actual applications, the memory access behavior of abnormal operators can also be detected by other means.

[0138] Based on the above description, the memory anomaly detection method provided in this application divides the memory anomaly detection process into two stages. Stage 1 is used to identify anomalous operators in the AI ​​model, and Stage 2 is used for single-operator anomaly detection. This transforms the entire network problem into a single-operator problem, converting the memory pressure of memory anomaly detection for the entire network into the memory pressure of the bounding operator stage within the entire network. Utilizing lightweight memory technology, lightweight memory usage or no additional working memory is achieved, minimizing the impact of the detection process on the overall network memory. Furthermore, this method does not require compiler or hardware cooperation during bounds detection, is independent of source code, and is applicable to scenarios with only binary code, exhibiting high availability.

[0139] To address the issue of boundary operators in the entire network during Phase 1, this application provides multiple implementation methods. The implementation methods for boundary operators in the entire network can be divided into two categories. One category involves the AI ​​application runtime system acquiring memory allocation information and actual memory access information, detecting whether operators have exceeded their boundaries based on this information, and returning the boundary detection result to the anomaly detection system 10. It should be noted that when the AI ​​application starts successfully, the actual memory access information of at least one operator running in the AI ​​application runtime system can be provided by the kernel of at least one operator. Based on this, the AI ​​application runtime system acquiring memory allocation information and actual memory access information can be achieved by the kernel of each operator in the AI ​​application runtime system acquiring their respective memory allocation information and actual memory access information. The other category involves the anomaly detection system 10 acquiring memory allocation information and actual memory access information from the AI ​​application runtime system, and detecting whether operators have exceeded their boundaries based on this information. Specifically, the anomaly detection system 10 acquiring actual memory access information from the AI ​​application runtime system can be achieved by acquiring the actual memory access information from the kernel of the operator running in the AI ​​application runtime system. The two implementation methods described above will be explained in detail below with reference to the accompanying drawings.

[0140] First, see Figure 5, which shows an interaction flowchart for delimiting operators in the whole network, applied to the anomaly detection system 10. The anomaly detection system 10 includes an operator delimiting subsystem 102 and an operator detection subsystem 104. For the specific implementation of the operator detection subsystem 104, refer to the description of the embodiment shown in Figure 4; the following focuses on the operator delimiting subsystem 102. The operator delimiting subsystem 102 includes an information collection module 1022 and a data analysis module 1024. Further, the operator delimiting subsystem 102 may also include a result display module 1026. The process of delimiting operators in the whole network may include the following steps:

[0141] S502, the AI application provides the kernel's memory allocation information to the operator delimiting subsystem 102 during runtime.

[0142] Specifically, when the AI ​​application on the host side is running, it can allocate memory for the kernel of the operators in the AI ​​model. This memory can be device memory. The memory allocation information can include the address range of the memory space allocated to the kernel, which includes a start address and an end address, or the address range includes a start address and a length.
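For illustration only, the two equivalent forms of the address range (start address plus end address, or start address plus length) can be sketched as follows. The type name `AllocationRange`, its methods, and the choice of an exclusive end address are assumptions of this sketch and are not specified by this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocationRange:
    """Half-open address range [start, end) allocated to a kernel (hypothetical type)."""
    start: int
    end: int  # exclusive end address (an assumption; the text does not fix this)

    @classmethod
    def from_start_and_length(cls, start: int, length: int) -> "AllocationRange":
        # The (start, length) form carries the same information as (start, end).
        return cls(start=start, end=start + length)

    @property
    def length(self) -> int:
        return self.end - self.start

    def contains(self, address: int, size: int = 1) -> bool:
        # True when the access [address, address + size) lies inside the range.
        return self.start <= address and address + size <= self.end
```

Either form can be passed between the AI application and the operator delimiting subsystem 102, since each can be derived from the other.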

[0143] Specifically, AI applications can provide kernel memory allocation information to the information collection module 1022 in the operator delimitation subsystem 102 in various ways. For example, the AI ​​application can use the memory allocation information as an interface parameter to provide the memory allocation information to the information collection module 1022 in the operator delimitation subsystem 102. Another example is that the AI ​​application can write the kernel memory allocation information to the shared storage of the application and the anomaly detection system 10, such as shared memory or registers accessible to both the AI ​​application and the anomaly detection system 10. The anomaly detection system 10 then obtains the kernel memory allocation information from the shared memory or registers.

[0144] S504, operator delimiting subsystem 102 requests additional memory and writes the kernel's memory allocation information into the additional memory.

[0145] The additional memory refers to the device memory of the computing card. This additional memory is accessible to both the kernel and the operator delimiting subsystem 102, and is shared memory between the kernel and the anomaly detection system 10 (e.g., the operator delimiting subsystem 102 within the anomaly detection system 10). For ease of understanding, this application also provides an example.

[0146] See Figure 6, which illustrates a process flow for delimiting operators in the whole network. The anomaly detection system 10 or the operator delimiting subsystem 102 detects the task queue (stream) of the computing card to which the kernel of the operator is sent. Additional memory, denoted as space X, is allocated for each stream. The AI application on the host side records the kernel's memory allocation information and writes it into space X. The AI application can also transmit the kernel's memory allocation information to the anomaly detection system 10, where the operator delimiting subsystem 102 writes the kernel's memory allocation information to a specified location in space X.

[0147] It should be noted that shared memory is only one specific implementation of shared storage. In other possible implementations of this application, the operator delimiting subsystem 102 can also write the kernel's memory allocation information into other types of shared storage. For example, the operator delimiting subsystem 102 can also write the kernel's memory allocation information into registers or other hardware information storage units.

[0148] S506: During runtime, the kernel analyzes whether memory access is out of bounds based on the actual memory access information and the memory allocation information in the additional memory. If memory access is out of bounds, S508 is executed.

[0149] In this application, the kernel supports a memory anomaly detection algorithm. Based on this, as shown in Figure 6, the kernel can read the memory allocation information from the additional memory, such as space X, during runtime. By comparing the actual memory access information during kernel runtime with this memory allocation information, the kernel can determine whether an out-of-bounds access has occurred, thereby identifying out-of-bounds exception operators.

[0150] S508, the kernel writes an out-of-bounds identifier into the additional memory.

[0151] S510, the operator delimiting subsystem 102 reads the out-of-bounds identifier in the extra memory and obtains the exception operator.

[0152] An exception operator is an operator whose out-of-bounds identifier indicates an out-of-bounds access. The operator delimiting subsystem 102 can read out-of-bounds identifiers from the additional memory to identify exception operators. For example, an out-of-bounds identifier of 1 or TRUE indicates an out-of-bounds access. The operator delimiting subsystem 102 reads out-of-bounds identifiers of 1 or TRUE and identifies the corresponding operators as exception operators. Furthermore, the operator delimiting subsystem 102 can also display exception operators. For example, the data analysis module 1024 in the operator delimiting subsystem 102 can read out-of-bounds identifiers from the additional memory to obtain exception operators, and the result display module 1026 in the operator delimiting subsystem 102 can display the exception operators.
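Steps S504 to S510 can be sketched as a host-side simulation. This is only a minimal illustration under stated assumptions: `SpaceX`, `write_allocation`, `run_kernel`, and `read_abnormal_operators` are hypothetical names, space X is modeled as two dictionaries rather than device memory, and each kernel is assumed to have a single allocated range with an exclusive end address.

```python
from dataclasses import dataclass, field


@dataclass
class SpaceX:
    """Stand-in for the additional memory shared by kernel and subsystem (hypothetical)."""
    allocations: dict = field(default_factory=dict)    # operator name -> (start, end)
    out_of_bounds: dict = field(default_factory=dict)  # operator name -> identifier


def write_allocation(space: SpaceX, op_name: str, start: int, end: int) -> None:
    # S504: the delimiting subsystem writes the kernel's allocation into space X.
    space.allocations[op_name] = (start, end)
    space.out_of_bounds[op_name] = False


def run_kernel(space: SpaceX, op_name: str, accesses) -> None:
    # S506: the kernel compares each actual access with its allocation.
    start, end = space.allocations[op_name]
    for address, size in accesses:
        if address < start or address + size > end:
            # S508: the kernel writes the out-of-bounds identifier into space X.
            space.out_of_bounds[op_name] = True
            return


def read_abnormal_operators(space: SpaceX) -> list:
    # S510: the subsystem reads the identifiers and collects the flagged operators.
    return [name for name, flag in space.out_of_bounds.items() if flag]


space = SpaceX()
write_allocation(space, "matmul", 0x1000, 0x1400)
write_allocation(space, "softmax", 0x2000, 0x2100)
run_kernel(space, "matmul", [(0x1000, 64), (0x13C0, 64)])  # stays inside its range
run_kernel(space, "softmax", [(0x20F0, 32)])               # runs past 0x2100
abnormal = read_abnormal_operators(space)
```

In this sketch only the allocation entry and a one-value identifier per operator are kept in the shared space, which mirrors the small additional memory footprint the method aims for.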

[0153] The embodiment shown in Figure 5 uses shared memory to transmit memory allocation information and out-of-bounds identifiers. Specifically, the anomaly detection system 10 writes the memory allocation information of at least one operator into the shared storage. When the kernel of at least one operator analyzes the memory allocation information and actual memory access information and finds that at least one operator has exceeded the bounds, it writes an out-of-bounds identifier. Then, the anomaly detection system 10 reads the out-of-bounds identifiers of at least one operator in the shared storage to obtain the abnormal operator among multiple operators.

[0154] In other possible implementations of this application's embodiments, the AI ​​application, kernel, and anomaly detection system 10 can also transmit memory allocation information via APIs, etc., and determine the anomaly operator by transmitting out-of-bounds analysis results or throwing error messages or interrupt messages through the API. For example, the operator delimitation subsystem 102 can receive the out-of-bounds analysis results returned by the kernel through the API and determine the anomaly operator based on the out-of-bounds analysis results. As another example, the operator delimitation subsystem 102 can receive error messages or interrupt messages reported by the kernel when it analyzes an out-of-bounds situation and determine the anomaly operator based on the error messages or interrupt messages.
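The error-message variant described above can be sketched as follows, with a Python exception standing in for the error or interrupt message reported by the kernel. The names `OutOfBoundsError`, `checked_kernel`, and `delimit_via_errors` are hypothetical, and the real reporting channel (API return value, error message, or interrupt) is not specified at this level of detail.

```python
class OutOfBoundsError(Exception):
    """Hypothetical error message reported by a kernel on an out-of-bounds access."""

    def __init__(self, op_name, address):
        super().__init__(f"{op_name}: out-of-bounds access at {address:#x}")
        self.op_name = op_name
        self.address = address


def checked_kernel(op_name, allocation, accesses):
    # Kernel side: report an error as soon as an access leaves [start, end).
    start, end = allocation
    for address, size in accesses:
        if address < start or address + size > end:
            raise OutOfBoundsError(op_name, address)
    return "ok"


def delimit_via_errors(launches):
    # Subsystem side: every reported error identifies one abnormal operator.
    abnormal = []
    for op_name, allocation, accesses in launches:
        try:
            checked_kernel(op_name, allocation, accesses)
        except OutOfBoundsError as err:
            abnormal.append(err.op_name)
    return abnormal
```

Compared with the shared-storage variant, no identifier needs to be polled; the abnormal operator is known as soon as the message arrives.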

[0155] Next, see Figure 7, which shows another interaction flowchart for delimiting operators in the whole network, applied to the anomaly detection system 10. For the structure of the anomaly detection system 10, refer to the descriptions of the embodiments shown in Figure 4 and Figure 5, which will not be repeated here. The process of delimiting operators in the whole network may include the following steps:

[0156] S702, the AI application provides the kernel's memory allocation information to the operator delimiting subsystem 102 during runtime.

[0157] S704, the operator delimiting subsystem 102 requests additional memory.

[0158] For specific implementations of S702 to S704, refer to the description of the embodiment shown in Figure 5.

[0159] S706, the kernel writes the actual memory access information to the additional memory during runtime.

[0160] In this application, the kernel supports a memory anomaly detection algorithm. Based on this, the kernel writes the actual memory access information to the additional memory during runtime, so that the operator delimiting subsystem 102 in the anomaly detection system 10 can read the actual memory access information from the additional memory and, together with the memory allocation information, analyze whether memory access is out of bounds.

[0161] S708, the operator delimiting subsystem 102 analyzes whether the memory is out of bounds based on the kernel's memory allocation information and the actual memory access information in the extra memory.

[0162] For ease of understanding, this application also provides an example.

[0163] See Figure 8, which illustrates a process flow for delimiting operators in the whole network. The anomaly detection system 10 starts the AI application. The anomaly detection system 10 (e.g., the operator delimiting subsystem 102 within the anomaly detection system 10) detects the task queue (stream) of the computing card to which the kernel is sent, and allocates additional memory, denoted as space X, for each stream. The host-side AI application records the kernel's memory allocation information. The device-side kernel monitors memory write operations in the common space and updates, in space X, the start and end addresses of the memory space accessed by the kernel, i.e., the kernel's actual memory access information. The operator delimiting subsystem 102 can obtain the kernel's actual memory access information from space X, as well as the memory allocation information recorded by the host-side AI application. By comparing the kernel's memory allocation information with the kernel's actual memory access information, it identifies out-of-bounds operators.
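The division of work in this second approach (the kernel only records the accessed range, and the operator delimiting subsystem 102 performs the comparison) can be sketched as follows. `AccessRecord` and `is_out_of_bounds` are hypothetical names, and the record layout of two addresses per stream is an assumption of this sketch.

```python
class AccessRecord:
    """Per-stream slot in space X holding the accessed range (hypothetical layout)."""

    def __init__(self):
        self.lowest = None   # lowest address accessed so far
        self.highest = None  # one past the highest address accessed so far

    def update(self, address, size):
        # Device side (S706): widen the recorded range to cover this access.
        end = address + size
        self.lowest = address if self.lowest is None else min(self.lowest, address)
        self.highest = end if self.highest is None else max(self.highest, end)


def is_out_of_bounds(allocation, record):
    # Host side (S708): compare the recorded range with the allocated range.
    if record.lowest is None:
        return False  # the kernel performed no monitored access
    start, end = allocation
    return record.lowest < start or record.highest > end
```

Recording only the lowest and highest accessed addresses keeps the additional memory constant per stream. A limitation of this particular sketch is that, if a kernel owned several disjoint allocations, an access falling in the gap between them would not be distinguished by a single range.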

[0164] The kernel can support memory anomaly detection (such as out-of-bounds detection) in several ways. These will be explained in detail below.

[0165] The first implementation method is static instrumentation. Specifically, static instrumentation refers to including code branches in the kernel source code for detecting anomalous operators. In other words, the kernel source code includes detection code. The detection code is used to detect anomalous operators or to provide actual memory access information for at least one operator to detect anomalous operators. For example, the detection code can provide the anomaly detection system 10 with actual memory access information for at least one operator to collaboratively detect anomalous operators.
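As a toy illustration of static instrumentation, the following sketch places a detection branch directly in the source of a kernel. The kernel `scale_kernel`, the switch `DETECTION_ENABLED`, the flat list used as device memory, and the decision to skip the offending write are all assumptions of this sketch rather than details given in this application.

```python
DETECTION_ENABLED = True  # hypothetical switch for the detection branch


def scale_kernel(memory, src, dst, n, dst_alloc, report):
    """Toy kernel: memory[dst + i] = 2 * memory[src + i] for i in range(n)."""
    dst_start, dst_end = dst_alloc  # allocated output range [start, end)
    for i in range(n):
        address = dst + i
        if DETECTION_ENABLED:
            # Detection code that is part of the kernel source itself.
            if not (dst_start <= address < dst_end):
                report.append(address)  # provide the offending address
                continue                # skip the illegal write in this sketch
        memory[address] = 2 * memory[src + i]


memory = [1, 2, 3, 4, 5, 0, 0, 0, 0, 99]
report = []
# The kernel is asked to write 5 elements, but only 4 slots (5..8) are allocated.
scale_kernel(memory, src=0, dst=5, n=5, dst_alloc=(5, 9), report=report)
```

Here the fifth write targets address 9, which lies outside the allocated range, so it is reported instead of being performed.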

[0166] The second implementation method is dynamic instrumentation. Specifically, the anomaly detection system 10 can detect from the AI application runtime system that the kernel of at least one operator of the AI model has been scheduled, and integrate the detection function into the kernel through dynamic instrumentation. The detection function is used to detect abnormal operators or provide the actual memory access information of the at least one operator to detect abnormal operators. See also Figure 9, which illustrates a dynamic instrumentation method. When the kernel is deployed or transmitted to the computing card, the anomaly detection system 10 can inject the memory anomaly detection algorithm into the kernel by changing the behavior of the operator deployment, such as interface hijacking or creating a code detection branch, thereby achieving the fusion of the detection function in the memory anomaly detection algorithm with the kernel.
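Interface hijacking can be illustrated by replacing a launch interface with a wrapper that fuses a detection function with each kernel at deployment time, without modifying the kernel's own source. The `Runtime` class, its `launch` method, and `install_detection` are hypothetical stand-ins; a real runtime would intercept the device launch interface rather than a Python method.

```python
class Runtime:
    """Hypothetical stand-in for the runtime interface that launches kernels."""

    def launch(self, kernel, *args):
        return kernel(*args)


def install_detection(runtime, detect):
    """Hijack runtime.launch so that `detect` runs together with every kernel."""
    original_launch = runtime.launch  # keep the unmodified interface

    def hijacked_launch(kernel, *args):
        def fused_kernel(*kernel_args):
            detect(kernel.__name__, kernel_args)  # injected detection function
            return kernel(*kernel_args)           # original kernel body, unchanged
        return original_launch(fused_kernel, *args)

    runtime.launch = hijacked_launch  # the instance attribute shadows the method
    return original_launch


def add(a, b):
    return a + b


seen = []
runtime = Runtime()
install_detection(runtime, lambda name, args: seen.append((name, args)))
result = runtime.launch(add, 2, 3)
```

The caller keeps using `runtime.launch` as before, which is why this style of instrumentation does not depend on the operator's source code.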

[0167] The third implementation method is embedding the detection code within the computing card. Specifically, the AI application runtime system includes a computing card for running the kernel of at least one operator of the AI model. The actual memory access information of the at least one operator includes the address range accessed when the kernel of the at least one operator issues a memory access instruction to the memory space. The computing card stores detection code. This detection code can be the code for a memory anomaly detection algorithm, including but not limited to out-of-bounds detection algorithms. The detection code can be embedded in the computing card as firmware. The detection code is used to obtain the actual memory access information of the at least one operator based on the memory access instruction issued by the kernel of the at least one operator, and either to detect abnormal operators based on that information or to provide that information so that abnormal operators can be detected. As shown in Figure 10, the memory anomaly detection algorithm is embedded in the computing card. When the kernel runs on the computing card, the computing card executes the detection code to perform additional calculations on the kernel's memory access instructions, in order to work with the anomaly detection system 10 to determine the anomaly operator. The memory access instructions can be memory access-related instructions, including but not limited to Direct Memory Access (DMA) instructions and memory transfer instructions (load / store).
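The idea of checking every memory access instruction on the card itself can be sketched with a toy device model. The `ComputingCard` class, its `load` and `store` methods, and the policy of blocking a violating access are assumptions of this sketch; actual firmware would operate on hardware load / store or DMA instructions.

```python
class ComputingCard:
    """Toy device whose 'firmware' checks each memory access instruction (hypothetical)."""

    def __init__(self, size):
        self.memory = [0] * size
        self.allocations = {}  # kernel id -> (start, end), end exclusive
        self.violations = []   # (kernel id, instruction, address)

    def _check(self, kernel_id, instruction, address):
        # Detection code executed for every access issued by a kernel.
        start, end = self.allocations[kernel_id]
        if not (start <= address < end):
            self.violations.append((kernel_id, instruction, address))
            return False
        return True

    def load(self, kernel_id, address):
        if self._check(kernel_id, "load", address):
            return self.memory[address]
        return 0  # blocked access returns a dummy value in this sketch

    def store(self, kernel_id, address, value):
        if self._check(kernel_id, "store", address):
            self.memory[address] = value
```

The recorded violations play the role of the actual memory access information that the card provides to the anomaly detection system 10; the kernels themselves need no modification.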

[0168] Furthermore, the anomaly detection system 10 can also verify the anomaly operators determined based on memory allocation information and actual memory access information to obtain a list of anomaly operators. This can remove misidentified anomaly operators and improve the accuracy of operator delimitation in stage 1. The AI application running system includes a first computing card and a second computing card. The first computing card can be a CPU, and the second computing card can be an NPU, GPU, or TPU (xPU). The anomaly detection system 10 can verify the anomaly operators by using the execution results on the first computing card (e.g., CPU execution results) and the execution results on the second computing card (e.g., NPU execution results) of use cases reconstructed from the input and output information of the anomaly operators during the AI application's operation. The verified anomaly operators are those confirmed as abnormal by this verification. Correspondingly, the anomaly detection system 10 can detect the memory access behavior of the verified anomaly operators based on their input and output information during the AI application's operation, and obtain detection results. The following detailed description is provided in conjunction with an embodiment.

[0169] See Figure 11 for a flowchart of a memory anomaly detection method. On the basis of the embodiment shown in Figure 2, the method may further include the following steps:

[0170] S1102, the anomaly detection system 10 distributes the input and output information of the anomaly operator to be verified during the operation of the AI ​​application to different NPUs.

[0171] The anomaly operator to be verified can be an out-of-bounds operator determined based on memory allocation information and actual memory access information. The anomaly detection system 10 can distribute the input and output information of the anomaly operator to be verified during the operation of the AI ​​application to different computing cards according to a random strategy or a load balancing strategy.
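The two distribution strategies mentioned above can be sketched as follows. The helper `distribute`, the strategy names, and the "fewest assigned cases" interpretation of load balancing are assumptions of this sketch.

```python
import random


def distribute(cases, cards, strategy="load_balance", seed=None):
    """Assign each case to a computing card; returns {card: [cases]} (hypothetical helper)."""
    assignment = {card: [] for card in cards}
    rng = random.Random(seed)
    for case in cases:
        if strategy == "random":
            card = rng.choice(cards)
        elif strategy == "load_balance":
            # Pick the card with the fewest cases so far (ties go to the first card).
            card = min(cards, key=lambda c: len(assignment[c]))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        assignment[card].append(case)
    return assignment
```

With the load-balancing strategy the per-card counts differ by at most one; with the random strategy only the total number of assigned cases is guaranteed.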

[0172] S1104, the anomaly detection system 10 obtains the execution results, in the NPU and in the host-side CPU, of the test cases regenerated based on the input and output information. If, for a target operator, the difference between the execution result in the NPU and the execution result in the host-side CPU of the regenerated test case does not meet the standard requirements, S1106 is executed.

[0173] Specifically, after the anomaly detection system 10 sends the input and output information of the anomaly operator to be verified to the NPU, each NPU can regenerate the test cases of the anomaly operator based on the input and output information using a reconstruction algorithm. These test cases can then be executed on both the NPU and the host-side CPU to reproduce the behavior of the anomaly operator. Correspondingly, the anomaly detection system 10 can obtain the execution results of the test cases in the NPU and the CPU.

[0174] The execution result of the test case in the CPU can be used as a reference. The anomaly detection system 10 can measure the accuracy of the operator's execution in the NPU by the difference between the execution result of the test case in the NPU and the execution result of the test case in the CPU. In some possible implementations, the difference in execution results can be characterized by a difference or a ratio. Accordingly, the standard requirement can be that the absolute value of the difference is less than a threshold, or the ratio is greater than a first threshold and less than a second threshold.

[0175] The difference between the execution results, in the NPU and in the host-side CPU, of the test cases generated for the target operator does not meet the standard requirements in any of the following cases: the absolute value of the difference between the two execution results is greater than or equal to the threshold; the ratio of the two execution results is less than or equal to the first threshold; or the ratio of the two execution results is greater than or equal to the second threshold. In these cases, the accuracy of the target operator's execution in the NPU is low, and the target operator is verified as an abnormal operator.
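The standard requirement can be written as a small predicate. This is a sketch under stated assumptions: execution results are treated as scalars (real operator outputs would typically be tensors compared element-wise), the function name `meets_standard` and the default threshold values are hypothetical, and the fallback for a zero CPU result is a choice of this sketch.

```python
def meets_standard(npu_result, cpu_result, mode="difference",
                   threshold=1e-3, first_threshold=0.999, second_threshold=1.001):
    """Return True when the NPU result is close enough to the CPU reference."""
    if mode == "difference":
        # Requirement: |difference| is less than the threshold.
        return abs(npu_result - cpu_result) < threshold
    if mode == "ratio":
        if cpu_result == 0:
            # The ratio is undefined; this sketch falls back to the difference form.
            return abs(npu_result - cpu_result) < threshold
        ratio = npu_result / cpu_result
        # Requirement: first threshold < ratio < second threshold.
        return first_threshold < ratio < second_threshold
    raise ValueError(f"unknown mode: {mode}")


def is_verified_abnormal(npu_result, cpu_result, **kwargs):
    # A target operator is verified as abnormal when the requirement is NOT met.
    return not meets_standard(npu_result, cpu_result, **kwargs)
```

The negation in `is_verified_abnormal` corresponds to the three failure cases listed above: a difference at or above the threshold, a ratio at or below the first threshold, or a ratio at or above the second threshold.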

[0176] In some possible implementations, the anomaly detection system 10 can compare the execution results of multiple use cases in the NPU and the execution results in the host-side CPU in parallel to improve the efficiency of anomaly operator verification.
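Parallel comparison of several use cases can be sketched with a thread pool. The tuple layout `(operator name, NPU result, CPU result)`, the function names, and the use of the difference criterion with a fixed threshold are assumptions of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def compare_case(case, threshold=1e-3):
    # case: (operator name, NPU result, CPU result); difference criterion only.
    name, npu_result, cpu_result = case
    return name, abs(npu_result - cpu_result) >= threshold


def verify_in_parallel(cases, max_workers=4):
    """Compare all cases concurrently and return the names verified as abnormal."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(compare_case, cases))  # map preserves input order
    return [name for name, abnormal in outcomes if abnormal]


cases = [("conv", 1.0, 1.0), ("gelu", 0.5, 0.75), ("add", 3.0, 3.0000001)]
anomaly_list = verify_in_parallel(cases)
```

Because the comparisons are independent of one another, they can be issued together, and the resulting list corresponds to the anomaly operator list built in the verification step.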

[0177] It should be noted that the embodiment shown in Figure 11 uses the anomaly detection system 10 to verify whether the operator is abnormal based on the execution results of the use case in the NPU and the execution results of the use case in the host-side CPU. In actual applications, the AI application runtime system can also perform anomaly operator verification based on the execution results of the use case in the NPU and the execution results of the use case in the host-side CPU. This application does not limit this.

[0178] S1106, the anomaly detection system 10 adds the target operator to the anomaly operator list.

[0179] The above S1104 to S1106 are a specific implementation of the anomaly detection system 10 determining the list of anomaly operators based on the execution results of the test cases in the NPU and the execution results of the test cases in the CPU. In actual applications, the anomaly detection system 10 can also verify whether an out-of-bounds operator is an anomaly operator through other methods.

[0180] Accordingly, the anomaly detection system 10 can detect the memory access behavior of the verified anomaly operators during the operation of the AI ​​application, and obtain detection results. For example, the anomaly detection system 10 can detect the memory access behavior of the anomaly operators in the anomaly operator list and obtain detection results.

[0181] Furthermore, for the anomaly operators selected in Phase 1, the anomaly detection system 10 can also detect the APIs associated with the anomaly operators during the non-masking communication of the AI ​​model, thereby improving the comprehensiveness and efficiency of anomaly detection.

[0182] It should be noted that the embodiments of the memory anomaly detection method provided in this application can also be combined or partially combined as needed, and this application does not impose any restrictions on this.

[0183] Based on the aforementioned memory anomaly detection method, this application also provides an anomaly detection system 10. The structure of the anomaly detection system 10 will be described below with reference to the accompanying drawings.

[0184] See Figure 4 or Figure 5 for the structure of an anomaly detection system. The anomaly detection system 10 includes:

[0185] The operator delimiting subsystem 102 is used to send a startup command to the AI ​​application running system. The startup command is used to start the AI ​​application, which is built based on an AI model. The AI ​​model includes multiple operators. During the operation of the AI ​​application, abnormal operators are determined based on the actual memory access information and memory allocation information of at least one of the multiple operators. The actual memory access information includes the address range of the memory space actually accessed, and the memory allocation information includes the address range of the allocated memory space.

[0186] The operator detection subsystem 104 is used to detect the memory access behavior of the abnormal operator based on the input and output information of the abnormal operator during the operation of the AI ​​application, and obtain the detection result.

[0187] For example, the operator delimiting subsystem 102 and the operator detection subsystem 104 described above can be implemented in hardware or in software.

[0188] When implemented in software, the operator delimitation subsystem 102 and the operator detection subsystem 104 can be applications running on computing devices, such as computing engines. These applications can also be virtualized and provided to users as virtualization services. Virtualization services can include virtual machine (VM) services, bare metal server (BMS) services, or container services. Specifically, a VM service can be a service that uses virtualization technology to create a pool of virtual machine (VM) resources on multiple physical hosts, providing VMs for users to use on demand. A BMS service is a service that uses virtualization technology to create a pool of BMS resources on multiple physical hosts, providing BMS for users to use on demand. A container service is a service that uses virtualization technology to create a pool of container resources on multiple physical hosts, providing containers for users to use on demand. A VM is a simulated virtual computer, that is, a logical computer. A BMS is a scalable, high-performance computing service with computing performance indistinguishable from traditional physical machines, featuring secure physical isolation. A container is a kernel virtualization technology that provides lightweight virtualization to isolate user space, processes, and resources. It should be understood that the VM service, BMS service, and container service mentioned above are merely specific examples. In practical applications, virtualization services can also include other lightweight or heavyweight virtualization services, which are not specifically limited here.

[0189] When implemented in hardware, the operator delimiting subsystem 102 and the operator detection subsystem 104 may include at least one computing card, such as an NPU, and may also include at least one computing device, such as a server. Alternatively, the operator delimiting subsystem 102 and the operator detection subsystem 104 may be implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.

[0190] In some possible implementations, the operator delimiting subsystem 102 is specifically used for:

[0191] Obtain the memory allocation information of at least one of the plurality of operators;

[0192] Provide the AI ​​application runtime system with memory allocation information for at least one operator;

[0193] Obtain the abnormal operator from the AI application running system, where the abnormal operator is determined by the AI application running system based on the memory allocation information of the at least one operator and the actual memory access information during the operation of the at least one operator.

[0194] In some possible implementations, the operator delimiting subsystem 102 is specifically used for:

[0195] The memory allocation information of the at least one operator is written into the shared storage so that the AI ​​application running system can obtain the memory allocation information from the shared storage.

[0196] Read the out-of-bounds identifier of at least one operator written by the AI ​​application runtime system in the shared storage;

[0197] The abnormal operator is determined based on the out-of-bounds identifier of the at least one operator.

[0198] In some possible implementations, the operator delimiting subsystem 102 is specifically used for:

[0199] Receive the exception operator returned by the AI ​​application runtime system through the application programming interface (API); or,

[0200] Receive error messages or interruption messages reported by the AI application runtime system, where the error messages or interruption messages are used to indicate abnormal operators.

[0201] In some possible implementations, the operator delimiting subsystem 102 is specifically used for:

[0202] Obtain the memory allocation information of at least one of the plurality of operators;

[0203] Obtain the actual memory access information of the at least one operator during the operation of the AI ​​application running system;

[0204] The exception operator is determined based on the memory allocation information and the actual memory access information.

[0205] In some possible implementations, the operator delimiting subsystem 102 is specifically used for:

[0206] Read, from the shared storage, the actual memory access information of the at least one operator during operation, as written by the AI application runtime system.

[0207] In some possible implementations, when the AI ​​application starts successfully, the AI ​​application runtime system runs the kernel function of at least one operator of the AI ​​model, and the actual memory access information of the at least one operator is provided by the kernel function of the at least one operator.

[0208] In some possible implementations, the source code of the kernel function includes detection code for detecting anomalous operators or providing actual memory access information of the at least one operator to detect anomalous operators.

[0209] In some possible implementations, the operator delimiting subsystem 102 is further used for:

[0210] Detect, from the AI application runtime system, that the kernel function of at least one operator of the AI model is scheduled, and instrument a detection function into the kernel function through dynamic instrumentation. The detection function is used to detect abnormal operators or to provide the actual memory access information of the at least one operator for detecting abnormal operators.
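
The idea of attaching a detection function to a kernel function when it is scheduled can be sketched in Python by wrapping the kernel at run time. This is only an analogy under stated assumptions: real dynamic instrumentation operates on device binaries, whereas here `TracingMemory`, `instrument`, and `scale_kernel` are hypothetical names, and the kernel is assumed to access memory only through `load` and `store`.

```python
import functools


class TracingMemory:
    """Hypothetical proxy that records the address range touched by a kernel."""

    def __init__(self, buffer):
        self._buffer = buffer
        self._lo = None
        self._hi = None

    def _record(self, addr):
        self._lo = addr if self._lo is None else min(self._lo, addr)
        self._hi = addr + 1 if self._hi is None else max(self._hi, addr + 1)

    def load(self, addr):
        self._record(addr)
        return self._buffer[addr]

    def store(self, addr, value):
        self._record(addr)
        self._buffer[addr] = value

    def accessed_range(self):
        """Half-open range [lo, hi) of all addresses touched so far."""
        return (self._lo, self._hi)


def instrument(kernel, report):
    """Wrap a kernel so its accessed range is reported after each run."""
    @functools.wraps(kernel)
    def wrapper(mem, *args, **kwargs):
        traced = TracingMemory(mem)
        result = kernel(traced, *args, **kwargs)
        report(kernel.__name__, traced.accessed_range())
        return result
    return wrapper


def scale_kernel(mem, base, count):
    """Toy kernel: doubles `count` bytes starting at address `base`."""
    for i in range(count):
        mem.store(base + i, (mem.load(base + i) * 2) % 256)


records = {}
traced_kernel = instrument(scale_kernel, lambda name, rng: records.update({name: rng}))
buf = bytearray(range(16))
traced_kernel(buf, 4, 8)
print(records)  # {'scale_kernel': (4, 12)}
```

The kernel body is unchanged; only the scheduled entry point is replaced, which mirrors the property that the detection does not depend on the kernel's source code.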

[0211] In some possible implementations, the AI application runtime system includes a computing card for running the kernel function of at least one operator of the AI model. The actual memory access information of the at least one operator includes the address range that the kernel function of the at least one operator accesses when it issues a memory access instruction to the memory space. The computing card stores detection code. In response to a memory access instruction issued by the kernel function of the at least one operator, the detection code obtains the actual memory access information of the at least one operator, and either detects abnormal operators based on that information or provides that information for detecting abnormal operators.

[0212] In some possible implementations, the operator delimiting subsystem 102 is further used for:

[0213] Deduplicate the abnormal operators based on the input and output information of the abnormal operators during the operation of the AI application, to obtain the deduplicated abnormal operators;

[0214] The operator detection subsystem 104 is specifically used for:

[0215] Detect the memory access behavior of the deduplicated abnormal operators, and obtain the detection results.
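
One possible way to deduplicate abnormal operators by their input and output information is sketched below. The choice of key (operator type plus the shapes and data types of inputs and outputs) is an assumption made for illustration; this application does not specify the exact fields here, and `TensorInfo` and `OperatorRecord` are hypothetical types.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TensorInfo:
    shape: Tuple[int, ...]
    dtype: str


@dataclass(frozen=True)
class OperatorRecord:
    name: str                       # instance name in the network
    op_type: str                    # operator type, e.g. "MatMul"
    inputs: Tuple[TensorInfo, ...]
    outputs: Tuple[TensorInfo, ...]


def deduplicate(records: List[OperatorRecord]) -> List[OperatorRecord]:
    """Keep the first record for each (op_type, inputs, outputs) signature."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec.op_type, rec.inputs, rec.outputs)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


fp16_4x4 = TensorInfo((4, 4), "float16")
fp16_8x8 = TensorInfo((8, 8), "float16")
abnormal = [
    OperatorRecord("layer0.matmul", "MatMul", (fp16_4x4, fp16_4x4), (fp16_4x4,)),
    OperatorRecord("layer1.matmul", "MatMul", (fp16_4x4, fp16_4x4), (fp16_4x4,)),
    OperatorRecord("layer2.matmul", "MatMul", (fp16_8x8, fp16_8x8), (fp16_8x8,)),
]
unique = deduplicate(abnormal)
print([rec.name for rec in unique])  # ['layer0.matmul', 'layer2.matmul']
```

Under this assumed key, repeated layers with identical signatures are checked once in the single-operator stage.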

[0216] In some possible implementations, the AI application runtime system includes a first computing card and a second computing card, and the operator delimiting subsystem 102 is further used for:

[0217] Obtain the execution results, on the first computing card and the second computing card, of a use case reconstructed based on the input and output information of the abnormal operator during the operation of the AI application, and then verify the abnormal operator to obtain the verified abnormal operator.

[0218] The operator detection subsystem 104 is specifically used for:

[0219] Detect the memory access behavior of the verified abnormal operator based on its input and output information during the operation of the AI application, and obtain the detection result.

[0220] This application also provides a computing device 1200. As shown in Figure 12, the computing device 1200 includes: a bus 1202, a first computing card 1204, a second computing card 1205, a memory 1206, and a communication interface 1208. The first computing card 1204, the second computing card 1205, the memory 1206, and the communication interface 1208 communicate with each other via the bus 1202. The computing device 1200 can be a server or a terminal device. It should be understood that this application does not limit the number of the first computing card 1204, the second computing card 1205, and the memory 1206 in the computing device 1200.

[0221] Bus 1202 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of representation, the bus 1202 is represented by a single line in Figure 12, but this does not mean that there is only one bus or one type of bus. The bus 1202 may include a path for transmitting information between various components of the computing device 1200 (e.g., memory 1206, first computing card 1204, second computing card 1205, communication interface 1208).

[0222] The first computing card 1204 may include a central processing unit (CPU), and the second computing card 1205 may include any one or more processors such as a graphics processing unit (GPU), a neural network processing unit (NPU), and a tensor processing unit (TPU). The computing device 1200 may include one or more second computing cards 1205. Figure 12 shows an example that includes a second computing card 1205.

[0223] The memory 1206 may include volatile memory, such as random access memory (RAM). The memory 1206 may also include non-volatile memory, such as read-only memory (ROM), flash memory, hard disk drive (HDD), or solid-state drive (SSD). The memory 1206 stores executable program code, which is executed by the first computing card 1204 or the second computing card 1205 to implement the aforementioned memory anomaly detection method. Specifically, the memory 1206 stores instructions for the anomaly detection system 10 to execute the memory anomaly detection method. For example, the memory 1206 may store instructions for implementing the functions of the operator delimiting subsystem 102 and the operator detection subsystem 104.

[0224] The communication interface 1208 uses transceiver modules such as, but not limited to, network interface cards and transceivers to enable communication between the computing device 1200 and other devices or communication networks.

[0225] This application also provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device can be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device can also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.

[0226] As shown in Figure 13, the computing device cluster includes at least one computing device 1200. The memory 1206 of one or more computing devices 1200 in the computing device cluster may store the same instructions of the anomaly detection system 10 for executing the memory anomaly detection method.

[0227] In some possible implementations, one or more computing devices 1200 in the computing device cluster can also be used to execute some of the instructions of the anomaly detection system 10 for executing the memory anomaly detection method. In other words, a combination of one or more computing devices 1200 can jointly execute the instructions of the anomaly detection system 10 for executing the memory anomaly detection method.

[0228] It should be noted that the memory 1206 in different computing devices 1200 in the computing device cluster can store different instructions for executing some functions of the anomaly detection system 10.

[0229] This application also provides a computing card. The computing card can be the second computing card 1205 in Figure 12, for example, an NPU, GPU, or TPU. The computing card may include computing cores and memory. A computing core is a module within the computing card used to implement computing capabilities. The computing cores of different types of computing cards may differ structurally. For example, the computing core of an NPU may be an AI core, which includes matrix computation units, vector computation units, and scalar computation units designed for neural networks. The computing core of a GPU includes multiple stream processors used for parallel processing of data computation tasks. The memory of the computing card, also known as device memory or video memory, typically includes global memory. The computing core is used to execute computer-readable instructions loaded into the memory to perform the memory anomaly detection method of the aforementioned embodiments.

[0230] To facilitate understanding, the following example uses an NPU (Neural Processing Unit) with accompanying diagrams to illustrate the hardware architecture of the computing card. The NPU can be a single-core or multi-core architecture. For clarity, a single-core architecture will be used as an example.

[0231] Figure 14 shows a hardware architecture of a computing card 1400, which includes an AI core 1402 and global memory 1404. The AI core 1402 (AI Core) is the computing core of the computing card 1400 and typically employs a Domain Specific Architecture (DSA) to adapt to common applications and algorithms in a specific domain. Global memory 1404 is used to store input, intermediate, or output data during AI core computation.

[0232] AI Core 1402 is responsible for executing computationally intensive operators related to scalars, vectors, and tensors. AI Core 1402 includes several basic computational units: matrix (Cube) computation units, vector (Vector) computation units, and scalar (Scalar) computation units. These units perform different types of data computations. It should be noted that these different types of computational units form multiple independent execution pipelines; through unified scheduling and mutual cooperation, computational efficiency can be optimized.

[0233] Hardware architectures are categorized into coupled and separated architectures based on whether the matrix computation unit and vector computation unit are deployed on the same core. This application uses a separated architecture as an example. In the separated architecture, the AI core 1402 is split into a matrix computation core 1402A and a vector computation core 1402B. The matrix computation core 1402A is also called AI Cube (AIC), and the vector computation core 1402B is also called AI Vector (AIV). The matrix computation core 1402A and the vector computation core 1402B are independent of each other, each having its own scalar computation unit and capable of independently loading its own code, thus achieving decoupling between matrix computation and vector computation. As shown in Figure 14, data can be transferred between the matrix computation core 1402A and the vector computation core 1402B through global memory 1404.

[0234] AI core 1402 also includes storage units (such as hardware storage and data handling units) and control units. AI core 1402 has internal storage and external storage. Global memory 1404 can serve as the external storage of AI core 1402, also known as off-core storage. The internal storage can be buffers, including but not limited to L0 buffers, L1 buffers, and Unified Buffers (UB). The L0 buffer can be further divided into L0A, L0B, and L0C. AI core 1402 can load data from external storage into internal storage to complete the corresponding computational tasks. It should be noted that in the separated architecture, the matrix computation core 1402A adds a bias table buffer (BT buffer) and a fixed pipe buffer (FP buffer) to the existing L0 and L1 buffers. The BT buffer stores the bias of the AI model, and the FP buffer stores quantization parameters and activation parameters (such as ReLU parameters).

[0235] To facilitate data transmission and handling within the AI core 1402, the AI core 1402 also includes a Bus Interface Unit (BIU), Memory Transfer Engine 1 (MTE1), Memory Transfer Engine 2 (MTE2), and Memory Transfer Engine 3 (MTE3). The BIU (not shown in Figure 14) is the interface between the AI core 1402 and the bus. The MTEs are data transfer units that move data between different buffers.

[0236] In the separated architecture, the matrix computation core 1402A can include 5 parallel execution units (transfer units and computation units) and 7 memory units. The 5 parallel execution units can be MTE1, MTE2, MTE3, and the matrix computation unit. The 7 memory units include the global memory 1404 (off-core storage) and the L1 buffer, L0A, L0B, L0C, BT buffer, and FP buffer. The vector computation core 1402B can include 4 parallel execution units and 2 memory units. The 4 parallel execution units can be MTE2, MTE3, the vector computation unit, and the scalar computation unit. The 2 memory units include the global memory 1404 and the unified buffer.

[0237] The data flow for vector computation can be represented as follows: data is moved from global memory 1404 to a unified buffer; the vector computation unit reads data from the unified buffer to perform vector computation; and the computation result is then moved back to global memory 1404. Therefore, the data flow for vector computation can be represented as GM-UB-[Vector]-UB-GM. Similarly, the data flow for matrix computation can be represented as follows: data is moved from global memory 1404 to the L1 buffer, then from the L1 buffer to L0A / L0B; the matrix computation unit reads data from L0A / L0B to perform matrix computation; the computation result is then moved to L0C, and finally, via a fixed pipe, to either global memory 1404 or the L1 buffer. Therefore, the data flow for matrix computation can be represented as GM-L1-L0A / L0B-[Cube]-L0C-FixPipe-GM, or GM-L1-L0A / L0B-[Cube]-L0C-FixPipe-L1.
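
The vector data flow GM-UB-[Vector]-UB-GM can be mimicked with a small host-side Python sketch. It only models the order of the stages with ordinary lists; the function name `vector_flow`, the element-wise callback, and the list-based buffers are assumptions for illustration and do not correspond to any device API.

```python
def vector_flow(global_memory, src, dst, count, fn):
    """Model one pass of the vector data flow and return the visited stages."""
    trace = ["GM"]

    # Transfer in: copy the input from global memory into the unified buffer.
    unified_buffer = list(global_memory[src:src + count])
    trace.append("UB")

    # Vector computation reads from and writes back to the unified buffer.
    unified_buffer = [fn(x) for x in unified_buffer]
    trace.extend(["[Vector]", "UB"])

    # Transfer out: copy the result from the unified buffer to global memory.
    global_memory[dst:dst + count] = unified_buffer
    trace.append("GM")
    return trace


gm = [1, 2, 3, 4, 0, 0, 0, 0]
trace = vector_flow(gm, src=0, dst=4, count=4, fn=lambda x: x * 2)
print("-".join(trace))  # GM-UB-[Vector]-UB-GM
print(gm)               # [1, 2, 3, 4, 2, 4, 6, 8]
```

Both transfer steps touch global memory 1404, which is why the address ranges of those transfers are the natural points at which to compare accesses against allocations.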

[0238] It should be noted that the AI core 1402 may also include a control unit (not shown in Figure 14). The control unit includes at least one of the following: System Control, Instruction Dispatch, Cube Queue, Vector Queue, and Memory Transfer Queue. The System Control module is responsible for directing and coordinating the overall operation mode of AI core 1402, configuring parameters, and implementing power consumption control. When instructions are issued in sequence through the Instruction Dispatch module, they are sent to the Cube Queue, Vector Queue, or Memory Transfer Queue, depending on the type of instruction. In this way, the Cube computation unit, Vector computation unit, and Memory Transfer Engine can execute the corresponding tasks according to the instructions in their respective queues.
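
The routing of instructions into per-unit queues can be sketched as follows. This is a simplified software model under assumptions: the instruction type tags (`"cube"`, `"vector"`, `"mte"`), the queue names, and the tuple format are hypothetical and chosen only to show that instructions keep their issue order within each queue.

```python
from collections import deque

# Assumed mapping from instruction type to queue name (illustrative only).
QUEUE_FOR_TYPE = {
    "cube": "cube_queue",
    "vector": "vector_queue",
    "mte": "mte_queue",
}


def dispatch(instructions):
    """Route (type, payload) instructions, in issue order, to their queues."""
    queues = {name: deque() for name in QUEUE_FOR_TYPE.values()}
    for kind, payload in instructions:
        queues[QUEUE_FOR_TYPE[kind]].append(payload)
    return queues


program = [
    ("mte", "copy GM->L1"),
    ("mte", "copy L1->L0A"),
    ("cube", "matmul"),
    ("vector", "relu"),
    ("mte", "copy UB->GM"),
]
queues = dispatch(program)
print(list(queues["mte_queue"]))  # ['copy GM->L1', 'copy L1->L0A', 'copy UB->GM']
```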

[0239] This application also provides a computer-readable storage medium. The computer-readable storage medium can be any available medium that a computing device can access, or a data storage device, such as a data center, containing one or more available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state drive). The computer-readable storage medium includes instructions that instruct a computing card or a computing device cluster to execute the memory anomaly detection method of the anomaly detection system 10 described above.

[0240] This application also provides a computer program product containing instructions. The computer program product may be software or program products containing instructions, capable of running on a computing card, a computing device cluster, or stored on any available medium. When the computer program product runs on the computing card or at least one computing device, it causes the computing card or at least one computing device to execute the aforementioned memory anomaly detection method.

[0241] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the protection scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for detecting memory anomalies, characterized in that, The method, applied to an anomaly detection system, includes: A startup command is sent to the AI ​​application runtime system. The startup command is used to start the AI ​​application, which is built based on an AI model, and the AI ​​model includes multiple operators. During the operation of the AI ​​application, abnormal operators are determined based on the actual memory access information of at least one of the multiple operators and the memory allocation information of the at least one operator. The actual memory access information includes the address range of the memory space actually accessed, and the memory allocation information includes the address range of the allocated memory space. Based on the input and output information of the abnormal operator during the operation of the AI ​​application, the memory access behavior of the abnormal operator is detected, and the detection result is obtained.

2. The method according to claim 1, characterized in that, Determining, during the operation of the AI application, the abnormal operator based on the actual memory access information and memory allocation information of at least one of the plurality of operators includes: Obtaining the memory allocation information of at least one of the plurality of operators; Providing the memory allocation information of the at least one operator to the AI application runtime system; Obtaining the abnormal operator from the AI application runtime system, where the abnormal operator is determined by the AI application runtime system based on the memory allocation information of the at least one operator and the actual memory access information of the at least one operator during operation.

3. The method according to claim 2, characterized in that, Providing the memory allocation information of the at least one operator to the AI application runtime system includes: Writing the memory allocation information of the at least one operator into shared storage so that the AI application runtime system can obtain the memory allocation information from the shared storage. Obtaining the abnormal operator from the AI application runtime system includes: Reading, from the shared storage, the out-of-bounds identifier of the at least one operator written by the AI application runtime system; Determining the abnormal operator based on the out-of-bounds identifier of the at least one operator.

4. The method according to claim 2, characterized in that, Obtaining the abnormal operator from the AI application runtime system includes: Receiving the abnormal operator returned by the AI application runtime system through an application programming interface (API); or, Receiving an error message or interrupt message reported by the AI application runtime system, where the error message or interrupt message indicates the abnormal operator.

5. The method according to claim 1, characterized in that, Determining, during the operation of the AI application, the abnormal operator based on the actual memory access information and memory allocation information of at least one of the plurality of operators includes: Obtaining the memory allocation information of at least one of the plurality of operators; Obtaining, from the AI application runtime system, the actual memory access information of the at least one operator during operation; Determining the abnormal operator based on the memory allocation information and the actual memory access information.

6. The method according to claim 5, characterized in that, Obtaining, from the AI application runtime system, the actual memory access information of the at least one operator during operation includes: Reading, from shared storage, the actual memory access information of the at least one operator during operation, as written by the AI application runtime system.

7. The method according to any one of claims 1 to 6, characterized in that, When the AI application starts successfully, the AI application runtime system runs the kernel function of at least one operator of the AI model, and the actual memory access information of the at least one operator is provided by the kernel function of the at least one operator.

8. The method according to claim 7, characterized in that, The source code of the kernel function includes detection code, which is used to detect abnormal operators or provide actual memory access information of the at least one operator to detect abnormal operators.

9. The method according to claim 7, characterized in that, The method further includes: The anomaly detection system detects, from the AI application runtime system, that the kernel function of at least one operator of the AI model is scheduled, and instruments a detection function into the kernel function through dynamic instrumentation. The detection function is used to detect abnormal operators or to provide the actual memory access information of the at least one operator for detecting abnormal operators.

10. The method according to any one of claims 1 to 6, characterized in that, The AI ​​application running system includes a computing card for running the kernel function of at least one operator of the AI ​​model. The actual memory access information of the at least one operator includes the address range accessed by the kernel function of the at least one operator initiating a memory access instruction to the memory space. The computing card stores detection code, which is used to obtain the actual memory access information of the at least one operator in response to the memory access instruction initiated by the kernel function of the at least one operator, and to detect abnormal operators based on the actual memory access information of the at least one operator or to provide the actual memory access information of the at least one operator to detect abnormal operators.

11. The method according to any one of claims 1 to 10, characterized in that, The step of detecting the memory access behavior of the abnormal operator based on the input and output information of the abnormal operator during the operation of the AI ​​application, and obtaining the detection result, includes: Based on the input and output information of the anomaly operator during the operation of the AI ​​application, the anomaly operator is deduplicated to obtain the deduplicated anomaly operator; The memory access behavior of the abnormal operators after deduplication is detected, and the detection results are obtained.

12. The method according to any one of claims 1 to 11, characterized in that, The AI application runtime system includes a first computing card and a second computing card, and the method further includes: Obtaining the execution results, on the first computing card and the second computing card, of a use case reconstructed based on the input and output information of the abnormal operator during the operation of the AI application, and then verifying the abnormal operator to obtain the verified abnormal operator. The step of detecting the memory access behavior of the abnormal operator based on the input and output information of the abnormal operator during the operation of the AI application, and obtaining the detection result, includes: Detecting the memory access behavior of the verified abnormal operator based on its input and output information during the operation of the AI application, and obtaining the detection result.

13. An anomaly detection system, characterized in that, The anomaly detection system includes: An operator delimitation subsystem is used to send a startup command to the AI ​​application runtime system. The startup command is used to start the AI ​​application, which is built based on an AI model. The AI ​​model includes multiple operators. During the operation of the AI ​​application, abnormal operators are determined based on the actual memory access information and memory allocation information of at least one of the multiple operators. The actual memory access information includes the address range of the memory space actually accessed, and the memory allocation information includes the address range of the allocated memory space. The operator detection subsystem is used to detect the memory access behavior of the abnormal operators based on the input and output information of the abnormal operators during the operation of the AI ​​application, and obtain the detection results.

14. The system according to claim 13, characterized in that, The operator delimiting subsystem is specifically used for: Obtaining the memory allocation information of at least one of the plurality of operators; Providing the memory allocation information of the at least one operator to the AI application runtime system; Obtaining the abnormal operator from the AI application runtime system, where the abnormal operator is determined by the AI application runtime system based on the memory allocation information of the at least one operator and the actual memory access information of the at least one operator during operation.

15. The system according to claim 14, characterized in that, The operator delimiting subsystem is specifically used for: Writing the memory allocation information of the at least one operator into shared storage so that the AI application runtime system can obtain the memory allocation information from the shared storage; Reading, from the shared storage, the out-of-bounds identifier of the at least one operator written by the AI application runtime system; Determining the abnormal operator based on the out-of-bounds identifier of the at least one operator.

16. The system according to claim 14, characterized in that, The operator delimiting subsystem is specifically used for: Receiving the abnormal operator returned by the AI application runtime system through an application programming interface (API); or, Receiving an error message or interrupt message reported by the AI application runtime system, where the error message or interrupt message indicates the abnormal operator.

17. The system according to claim 13, characterized in that, The operator delimiting subsystem is specifically used for: Obtaining the memory allocation information of at least one of the plurality of operators; Obtaining, from the AI application runtime system, the actual memory access information of the at least one operator during operation; Determining the abnormal operator based on the memory allocation information and the actual memory access information.

18. The system according to claim 17, characterized in that, The operator delimiting subsystem is specifically used for: Reading, from shared storage, the actual memory access information of the at least one operator during operation, as written by the AI application runtime system.

19. The system according to any one of claims 13 to 18, characterized in that, When the AI application starts successfully, the AI application runtime system runs the kernel function of at least one operator of the AI model, and the actual memory access information of the at least one operator is provided by the kernel function of the at least one operator.

20. The system according to claim 19, characterized in that, The source code of the kernel function includes detection code, which is used to detect abnormal operators or provide actual memory access information of the at least one operator to detect abnormal operators.

21. The system according to claim 19, characterized in that, The operator delimiting subsystem is also used for: Detecting, from the AI application runtime system, that the kernel function of at least one operator of the AI model is scheduled, and instrumenting a detection function into the kernel function through dynamic instrumentation. The detection function is used to detect abnormal operators or to provide the actual memory access information of the at least one operator for detecting abnormal operators.

22. The system according to any one of claims 13 to 18, characterized in that, The AI ​​application running system includes a computing card for running the kernel function of at least one operator of the AI ​​model. The actual memory access information of the at least one operator includes the address range accessed by the kernel function of the at least one operator initiating a memory access instruction to the memory space. The computing card stores detection code, which is used to obtain the actual memory access information of the at least one operator in response to the memory access instruction initiated by the kernel function of the at least one operator, and to detect abnormal operators based on the actual memory access information of the at least one operator or to provide the actual memory access information of the at least one operator to detect abnormal operators.

23. The system according to any one of claims 13 to 22, characterized in that, The operator delimiting subsystem is also used for: Based on the input and output information of the anomaly operator during the operation of the AI ​​application, the anomaly operator is deduplicated to obtain the deduplicated anomaly operator; The operator detection subsystem is specifically used for: The memory access behavior of the abnormal operators after deduplication is detected, and the detection results are obtained.

24. The system according to any one of claims 13 to 23, characterized in that, The AI application runtime system includes a first computing card and a second computing card. The operator delimiting subsystem is also used for: Obtaining the execution results, on the first computing card and the second computing card, of a use case reconstructed based on the input and output information of the abnormal operator during the operation of the AI application, and then verifying the abnormal operator to obtain the verified abnormal operator. The operator detection subsystem is specifically used for: Detecting the memory access behavior of the verified abnormal operator based on its input and output information during the operation of the AI application, and obtaining the detection result.

25. A computing device cluster, characterized in that, The computing device cluster includes at least one computing device, which includes a first computing card, a second computing card, and at least one memory, wherein the at least one memory stores computer-readable instructions; the first computing card or the second computing card executes the computer-readable instructions to cause the computing device cluster to perform the memory anomaly detection method as described in any one of claims 1 to 12.

26. A computing card, characterized in that, The computing card includes a computing core and memory, the computing core being used to execute computer-readable instructions loaded into the memory to perform the memory anomaly detection method as described in any one of claims 1 to 12.

27. A computer-readable storage medium, characterized in that, It includes computer-readable instructions; the computer-readable instructions are used to implement the memory anomaly detection method according to any one of claims 1 to 12.

28. A computer program product, characterized in that, It includes computer-readable instructions; the computer-readable instructions are used to implement the memory anomaly detection method according to any one of claims 1 to 12.