Multi-scale neural network for anomaly detection
By embedding training data into a multi-scale neural network at different spatial scales, extracting features, and training a lightweight DNN, the robustness and accuracy issues of anomaly detection with small data in existing technologies are solved, and efficient anomaly detection is achieved even in the absence of anomaly training data.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- INTEL CORP
- Filing Date
- 2025-11-11
- Publication Date
- 2026-06-12
AI Technical Summary
Existing anomaly detection methods require a large number of data examples, lack robustness, and are difficult to effectively calibrate anomaly types in the case of small data, especially when the anomaly data is unknown or unannotated.
A multi-scale neural network (DNN) is employed to extract embedded features by embedding training data at different spatial scales. A lightweight DNN is trained using a multi-resolution, contrastive learning paradigm to generate multi-scale embedded features. The robustness and accuracy of the detection model are improved by adjusting the internal parameters.
It enables effective detection of anomalous data even with small datasets, improving the robustness and accuracy of anomaly detection. It can perform both macroscopic and fine-grained anomaly detection even in the absence of specific anomaly training data.
Figure CN122197978A_ABST
Abstract
Description
Technical Field
[0001] This disclosure generally relates to neural networks (also known as “deep neural networks” or “DNNs”), and more specifically, to multi-scale DNNs for anomaly detection.
Background Technology
[0002] Anomaly detection is the process of identifying anomalies (such as data points, items, events, or observations that differ from expectations, standards, or norms). Automated anomaly detection is crucial in industries such as manufacturing, finance, retail, and cybersecurity. It can provide a way to automatically detect harmful anomalies and protect data or products. Many anomaly detection technologies are based on deep learning and artificial intelligence.
Summary of the Invention
[0003] According to one aspect of this disclosure, a method for anomaly detection is provided, the method comprising: embedding training data at different spatial scales in the latent space of a neural network model, the neural network model including a plurality of convolutional blocks having the different spatial scales; extracting a plurality of embedding features at the different spatial scales from the plurality of convolutional blocks; determining a loss of the neural network model based on the plurality of embedding features; training the neural network model based on the loss by updating one or more internal parameters of the neural network model; and using at least a portion of the trained neural network model to detect anomalies in new data, wherein the training data includes normal data lacking anomalies, and the new data includes anomalous data having anomalies.
[0004] According to another aspect of this disclosure, one or more non-transitory computer-readable media are provided storing instructions executable to perform operations for anomaly detection, the operations comprising: embedding training data at different spatial scales in the latent space of a neural network model, the neural network model including a plurality of convolutional blocks having the different spatial scales; extracting a plurality of embedding features at the different spatial scales from the plurality of convolutional blocks; determining a loss of the neural network model based on the plurality of embedding features; training the neural network model based on the loss by updating one or more internal parameters of the neural network model; and using at least a portion of the trained neural network model to detect anomalies in new data, wherein the training data includes normal data lacking anomalies, and the new data includes anomalous data having anomalies.
[0005] According to another aspect of this disclosure, an apparatus for anomaly detection is provided, the apparatus comprising: a computer processor for executing computer program instructions; and a non-transitory computer-readable storage memory storing computer program instructions executable by the computer processor to perform operations including: embedding training data at different spatial scales in the latent space of a neural network model, the neural network model including a plurality of convolutional blocks having the different spatial scales; extracting a plurality of embedding features at the different spatial scales from the plurality of convolutional blocks; determining a loss of the neural network model based on the plurality of embedding features; training the neural network model based on the loss by updating one or more internal parameters of the neural network model; and using at least a portion of the trained neural network model to detect anomalies in new data, wherein the training data includes normal data lacking anomalies, and the new data includes anomalous data having anomalies.
Attached Figure Description
[0006] The various embodiments will be readily understood from the following detailed description taken in conjunction with the accompanying drawings. For ease of description, similar reference numerals denote similar structural elements. In the accompanying figures, embodiments are shown by way of example rather than limitation.
[0007] Figure 1 is a block diagram of an anomaly detection system according to various embodiments.
[0008] Figure 2 illustrates example DNNs for anomaly detection according to various embodiments.
[0009] Figure 3 is a block diagram of the testing and deployment module according to various embodiments.
[0010] Figure 4A illustrates a data capture component according to various embodiments.
[0011] Figure 4B illustrates aggregated images according to various embodiments.
[0012] Figure 5 is a flowchart of an anomaly detection method according to various embodiments.
[0013] Figure 6 illustrates a CNN according to various embodiments.
[0014] Figure 7 illustrates example convolutions according to various embodiments.
[0015] Figure 8 is a block diagram of an example computing device according to various embodiments.
Detailed Implementation
Overview
[0016] The past decade has witnessed the rapid development of data processing technologies based on artificial intelligence (AI), especially deep neural networks (DNNs). DNNs are widely used in anomaly detection, computer vision, speech recognition, and image and video processing, primarily due to their ability to surpass human accuracy. A DNN typically consists of a series of layers. A DNN layer can include one or more operations, such as convolution, interpolation, layer normalization, batch normalization, SoftMax operations, pooling, element-wise operations, linear operations, and non-linear operations. These operations are known as deep learning operations or neural network operations.
[0017] Neural network operations can be tensor operations. The input or output data of a neural network operation can be arranged in a data structure called a tensor. Taking a convolutional layer as an example, the input tensors include an activation tensor (also called an "input feature map (IFM)" or "input activation tensor"), which includes one or more activations (also called "input elements"), and a weight tensor. The weight tensor can be a kernel (2D weight tensor), a filter (3D weight tensor), or a filter bank (4D weight tensor). In a convolutional layer, a convolution operation can be performed on the input activation tensor and the weight tensor to compute the output activation tensor.
[0018] A tensor is a data structure that has multiple elements along one or more dimensions. Examples of tensors include vectors (one-dimensional (1D) tensors), matrices (two-dimensional (2D) tensors), three-dimensional (3D) tensors, four-dimensional (4D) tensors, and even tensors with higher dimensions. The dimensions of a tensor can correspond to axes, such as axes in a coordinate system. Dimensions can be measured by the number of data points along an axis. The dimensions of a tensor define its shape. A DNN layer can receive one or more input tensors and compute an output tensor based on those input tensors. In some embodiments, a three-dimensional tensor can have X, Y, and Z dimensions. The X dimension of a tensor can be a horizontal dimension, the length of which can be the width of the tensor; the Y dimension can be a vertical dimension, the length of which can be the height of the tensor; and the Z dimension can be a channel dimension, the length of which can be the number of channels. The coordinates of elements along a dimension can be integers between 0 and (L-1) (inclusive), where L is the length of the tensor in that dimension. For example, the x-coordinate of the first element in a row can be 0, the x-coordinate of the second element in a row can be 1, and so on. Similarly, the y-coordinate of the first element in a column can be 0, the y-coordinate of the second element can be 1, and so on. A four-dimensional tensor can have a fourth dimension, which can indicate the number of batches in the operation.
[0019] In real-world data-driven predictive and analytics workflows, automated anomaly detection is a pervasive and crucial problem. Effective anomaly detection can help support manufacturing processes, quality control assessments, and the identification of information-rich data points within datasets. Many anomaly detection methods are based on deep learning. However, currently available anomaly detection methods suffer from various drawbacks and challenges, such as the need for a large number of data examples (typically thousands to leverage a deep learning model), the requirement to pre-specify anomalous data and anomaly category types, and a lack of robustness. Furthermore, many anomaly detection algorithms do not calibrate well for anomaly type specificity (e.g., the optimal scale for anomaly detection) without a large amount of data.
[0020] Embodiments of this disclosure can improve upon at least some of the challenges and problems described above by providing multi-scale DNNs for anomaly detection. Example multi-scale DNNs include layers at different spatial scales. These multi-scale DNNs are capable of both macroscopic and fine-grained anomaly detection even with small datasets. Training these multi-scale DNNs may not require specific anomaly training data.
[0021] In various embodiments of this disclosure, the DNN for anomaly detection may include convolutional blocks of different spatial scales. For example, convolutional blocks may generate embedded features (e.g., feature maps) with different spatial scales. The spatial scale of a convolutional block can indicate the resolution of the feature map generated by the convolutional block. Resolution can be the total number of pixels or elements in the feature map, the total number of pixels or elements per unit spatial region in the feature map, the spatial size of pixels or elements in the feature map, and so on. A convolutional block includes one or more convolutional layers. A convolutional block may also include one or more other layers, such as pooling layers. The DNN may be a lightweight CNN, meaning that the total number of layers or the total number of internal parameters in the DNN may be limited (e.g., below a threshold number).
[0022] A multi-resolution, contrastive learning paradigm can be used to train a DNN. In some embodiments, the training data can be normal data lacking anomalies. In other embodiments, the training data can include both normal and anomalous data. Normal and anomalous data can have different labels. After the input is provided to the DNN, the convolutional blocks generate multiple embedding features with different spatial scales based on the input. A distance can be determined for each embedding feature. The distance for an embedding feature can be the Euclidean distance between the embedding feature and a model embedding. The model embedding can be determined before the DNN is trained. The distances of the multiple embedding features can be summed to determine the loss of the DNN. Based on this loss, the internal parameters of the DNN can be adjusted. After training, the accuracy of the DNN can be validated. After training or validation, the DNN can be deployed for anomaly detection. During deployment, the DNN can receive input and generate an output indicating whether the input is anomalous. An example of the output can be an anomaly score. For an input with a specific spatial size, a subset of convolutional blocks (e.g., one or more convolutional blocks) can be selected from all convolutional blocks in the DNN based on the spatial scale of the input or the spatial scale of the convolutional blocks. The selected convolutional block(s) can be used to detect anomalies in the data. The unselected convolutional block(s) may not be used.
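The block selection at deployment can be illustrated with a minimal sketch. The disclosure does not state a selection rule, so the rule used here (prefer blocks whose feature-map size is closest to a fixed fraction of the input size) and the names `select_blocks`, `anomaly_score`, and `downscale` are assumptions made only for illustration.

```python
def select_blocks(block_sizes, input_size, downscale=4, max_blocks=1):
    """Pick the blocks whose feature-map size is closest to input_size // downscale.

    The target rule (input_size // downscale) is an illustrative assumption;
    the source only says selection depends on spatial scale.
    """
    target = max(1, input_size // downscale)
    ranked = sorted(range(len(block_sizes)),
                    key=lambda i: (abs(block_sizes[i] - target), i))
    return sorted(ranked[:max_blocks])


def anomaly_score(distances, selected):
    """Sum the per-block distances of the selected blocks only."""
    return sum(distances[i] for i in selected)


block_sizes = [64, 32, 16, 8]      # feature-map side length per block (hypothetical)
distances = [0.2, 1.5, 0.7, 0.1]   # hypothetical per-block embedding distances
selected = select_blocks(block_sizes, input_size=128, max_blocks=2)
score = anomaly_score(distances, selected)
```

Blocks that are not selected contribute nothing to the score, which mirrors the statement that unselected blocks may not be used.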
[0023] This disclosure presents a novel and robust anomaly detection algorithm that can effectively perform both macroscopic and fine-grained anomaly detection, even with small datasets, without requiring specific anomaly training data. This algorithm can be referred to as Multi-Resolution Deep Support Vector Data Description (MR-SVDD). MR-SVDD can be applied to a wide range of real-world anomaly detection use cases where data is scarce and anomaly examples are unknown (or unannotated) beforehand.
[0024] As mentioned above, MR-SVDD provides consistent and automated anomaly detection by training lightweight DNNs using a multi-resolution, contrastive learning paradigm. CNNs can learn to compactly embed normal training data into the model's latent space simultaneously at multiple resolutions. This multi-resolution guidance mechanism enhances the robustness of anomaly detection predictions. MR-SVDD can reduce the cost and improve the efficiency of various manufacturing processes and capabilities. Compared to currently available deep learning-based anomaly detection methods, MR-SVDD is likely more robust because it leverages a flexible, automated multi-scale resolution anomaly detection mechanism. MR-SVDD can operate effectively on single-class data (e.g., normal examples) in small datasets. With a custom loss function, MR-SVDD can still operate in supervised, multi-class settings, for example, when training data is provided for both normal and anomalous (or more specific types of anomaly categories) data simultaneously. Furthermore, due to its multi-scale aspect, MR-SVDD can be used to accurately identify specific parts/locations of anomalies.
[0025] For illustrative purposes, specific figures, materials, and configurations have been set forth to provide a thorough understanding of the illustrative implementation. However, it will be apparent to those skilled in the art that this disclosure may be practiced without specific details, and/or may be practiced using only some of the aspects described. In other instances, well-known features have been omitted or simplified so as not to obscure the illustrative embodiments.
[0026] Furthermore, reference has been made to the accompanying drawings, which form part of this disclosure, and practical embodiments are illustrated in the drawings by way of illustration. It should be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of this disclosure. Therefore, the following detailed description should not be construed as limiting.
[0027] Various operations can be described sequentially as a plurality of discrete actions or operations in a manner most conducive to understanding the claimed subject matter. However, the order of description should not be construed as implying that these operations must depend on the order. In particular, these operations may not be performed in the order presented. The described operations may be performed in a different order than in the described embodiments. Various additional operations may be performed, or the described operations may be omitted in additional embodiments.
[0028] For the purposes of this disclosure, the phrase "A or B" or the phrase "A and/or B" refers to (A), (B), or (A and B). For the purposes of this disclosure, the phrase "A, B, or C" or the phrase "A, B, and/or C" refers to (A), (B), (C), (A and B), (A and C), (B and C), or (A, B, and C). When used to refer to a measurement range, the term "between" includes the endpoints of the measurement range.
[0029] This description uses the phrases "in one embodiment" or "in an embodiment," both of which can refer to one or more of the same or different embodiments. Terms such as "comprising," "including," "having," etc., used with respect to embodiments of this disclosure are synonyms. This disclosure may use perspective-based descriptions such as "above," "below," "top," "bottom," and "side" to interpret various features of the drawings; however, these terms are merely for ease of discussion and do not imply any desired or required direction. The drawings are not necessarily drawn to scale. Unless otherwise stated, the use of ordinal adjectives such as "first," "second," and "third" to describe common objects indicates only different instances of the similar objects referred to and is not intended to imply that the objects described must be arranged in a given order, whether temporally, spatially, in rank, or otherwise.
[0030] In the following detailed description, terms commonly used by those skilled in the art will be used to describe various aspects of the illustrative implementations in order to convey the substance of their work to others skilled in the art.
[0031] The terms “substantially,” “close to,” “approximately,” “near,” and “about” generally refer to values within +/- 20% of the target value as described herein or known in the art. Similarly, terms indicating the orientation of various elements, such as “coplanar,” “perpendicular,” “orthogonal,” “parallel,” or any other angle between elements, generally refer to values within +/- 5-20% of the target value as described herein or known in the art.
[0032] Furthermore, the terms “comprising,” “including,” “having,” or any other variations thereof are intended to cover non-exclusive inclusion. For example, a method, process, apparatus, or DNN accelerator that includes a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such a method, process, apparatus, or DNN accelerator. Additionally, the term “or” refers to inclusive “or” rather than exclusive “or.”
[0033] The systems, methods, and apparatuses disclosed herein are innovative in several ways, but none of them alone is responsible for all the desired properties disclosed herein. Details of one or more implementations of the subject matter described herein are set forth in the following description and figures.
[0034] Figure 1 is a block diagram of an anomaly detection system 100 according to various embodiments. The anomaly detection system 100 can detect anomalies in images, videos, or other types of data. As shown in Figure 1, the anomaly detection system 100 includes an interface module 110, a training module 120, an anomaly detection DNN 130, a compression module 140, a layer selection module 150, a testing and deployment module 160, a compiler 170, and a data storage device 180. In other embodiments, the anomaly detection system 100 may include alternative configurations, different or additional components. Furthermore, the functions implemented by the components of the anomaly detection system 100 may be performed by other components included in the anomaly detection system 100 or by other modules or systems.
[0035] Interface module 110 facilitates communication between anomaly detection system 100 and other modules or systems. For example, interface module 110 establishes communication between anomaly detection system 100 and an external database to receive data that can be used to train anomaly detection DNN 130. Interface module 110 can also establish communication between anomaly detection system 100 and external systems or devices to receive data that can be used to test or deploy anomaly detection DNN 130 for anomaly detection. As another example, interface module 110 can distribute a portion of anomaly detection DNN 130 to other systems to perform anomaly detection tasks, for example, after anomaly detection DNN 130 has been trained, compressed, or tested.
[0036] Training module 120 trains anomaly detection DNN 130. In some embodiments, training module 120 may form one or more training datasets to train anomaly detection DNN 130. The training dataset may include training samples, each of which may be associated with a class label. Training samples may be referred to as training data or training data features. The training dataset may be represented as $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ represents a data feature, $y_i$ represents its category label, $i$ is an index identifying each data feature, and $N$ is the total number of training data. In some embodiments, the training dataset may include training data of a single category, such as normal data. Normal data may be expected, anticipated, standard, or typical data. All data in the training dataset may have the same category label. In other embodiments, the training dataset may include training data of multiple categories. For example, the training dataset may include both normal and anomalous data. Anomalous data may be unexpected, unanticipated, non-standard, or unusual data. Normal data may have a category label y=1, while anomalous data may have a category label y=-1 or y=0. In some embodiments, a portion of the training dataset may be used for initial training of the DNN, while the remainder of the training dataset may be reserved as a tuning or validation subset for the training module 120 to tune or validate the performance of the trained DNN. The portion of the training dataset that does not include the tuning or validation subset may be used to train the anomaly detection DNN 130.
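As a minimal sketch of the dataset layout and hold-out split described above, the following assumes each sample is a (feature, label) pair with label 1 for normal data and -1 for anomalous data. The helper name `split_dataset`, the 20% validation fraction, and the toy feature values are illustrative assumptions, not values taken from the disclosure.

```python
import random


def split_dataset(samples, val_fraction=0.2, seed=0):
    """Shuffle (feature, label) pairs and reserve a fraction for validation."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Eight normal samples (y = 1) and two anomalous samples (y = -1).
dataset = [([0.1 * i, 0.2 * i], 1) for i in range(8)]
dataset += [([5.0, -3.0], -1), ([4.0, -2.5], -1)]

train, val = split_dataset(dataset)
```

In the single-category setting the same helper applies unchanged; every label is then 1.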
[0037] In some embodiments, training module 120 may determine one or more hyperparameters to train anomaly detection DNN 130. Hyperparameters are variables that specify the training process. Hyperparameters differ from parameters (e.g., weights) within the anomaly detection DNN 130. In some embodiments, hyperparameters include variables that determine the architecture of the anomaly detection DNN 130, such as the number of convolutional blocks, the number of layers, spatial scale, etc. Hyperparameters also include variables that determine how the anomaly detection DNN 130 is trained, such as batch size, number of training epochs, etc. The batch size defines the number of training samples to be processed before updating the parameters of the anomaly detection DNN 130. The batch size is the same as or smaller than the number of samples in the training dataset. The training dataset may be divided into one or more batches. The number of training epochs defines the number of times the deep learning algorithm passes the entire training dataset through the network. A training epoch means that each training sample in the training dataset has a chance to update the parameters within the anomaly detection DNN 130. A training epoch may include one or more batches. The number of training epochs can be 1, 5, 10, 50, 100, 500, 1000 or more.
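The relationship between batch size, epochs, and parameter updates can be sketched as follows. The sketch assumes one parameter update per batch and a smaller final batch when the dataset size is not a multiple of the batch size; the function names are hypothetical.

```python
import math


def count_updates(num_samples, batch_size, num_epochs):
    """Return (batches per epoch, total parameter updates)."""
    if not 1 <= batch_size <= num_samples:
        raise ValueError("batch size must be between 1 and the dataset size")
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return batches_per_epoch, batches_per_epoch * num_epochs


def iterate_batches(samples, batch_size):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


per_epoch, total = count_updates(num_samples=100, batch_size=32, num_epochs=10)
batch_lengths = [len(b) for b in iterate_batches(list(range(100)), 32)]
```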
[0038] In some embodiments, training module 120 may define the architecture of anomaly detection DNN 130, for example, based on some hyperparameters. The architecture of anomaly detection DNN 130 includes multiple convolutional blocks. A convolutional block may include one or more layers. In some embodiments, a convolutional block may include at least one convolutional layer. A convolutional block may also include one or more other layers (e.g., pooling layers for reducing the spatial size of the feature map after convolution), activation functions (e.g., rectified linear unit (ReLU) activation function, hyperbolic tangent (tanh) activation function, etc.), or other types of layers or neural network operations. The convolutional block itself may be a DNN. The convolutional block may abstract its input (which may be the input of anomaly detection DNN 130) to a feature map. The feature map may be an embedding feature, which may be represented by a tensor. The tensor may be a 3D tensor. The spatial size and shape of the feature map may be defined by the height, width, and depth of the tensor.
[0039] In some embodiments, the anomaly detection DNN 130 can simultaneously embed training data (e.g., normal data) into its latent space at different spatial scales of the convolutional blocks. The training data input to the anomaly detection DNN 130 can be processed simultaneously by the convolutional blocks (e.g., within the same epoch), and the convolutional blocks can output embedding features at different spatial scales.
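The idea of one input yielding embedding features at several spatial scales can be sketched with a chain of downsampling blocks. In this sketch a fixed 2x2 average pooling stands in for each learned convolutional block, which is a simplifying assumption; the names `avg_pool_2x2` and `multi_scale_features` are hypothetical.

```python
def avg_pool_2x2(fmap):
    """Halve the height and width of a 2D feature map by 2x2 averaging."""
    h, w = len(fmap), len(fmap[0])
    return [[(fmap[r][c] + fmap[r][c + 1]
              + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
             for c in range(0, w - 1, 2)]
            for r in range(0, h - 1, 2)]


def multi_scale_features(image, num_blocks):
    """Run the blocks in sequence and keep every block's output."""
    features = []
    current = image
    for _ in range(num_blocks):
        current = avg_pool_2x2(current)   # stand-in for a learned conv block
        features.append(current)
    return features


image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
features = multi_scale_features(image, num_blocks=3)
shapes = [(len(f), len(f[0])) for f in features]
```

A single forward pass produces all three feature maps, so every scale sees the same training sample in the same step.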
[0040] After training module 120 defines the architecture of the anomaly detection DNN 130, training module 120 can input the training dataset into the anomaly detection DNN 130. Training module 120 can compute the loss from the output of the anomaly detection DNN 130 and the outputs of the convolutional blocks in the anomaly detection DNN 130. Training module 120 can modify the internal parameters of the anomaly detection DNN 130 to minimize the loss. The internal parameters include the weights of one or more convolutional layers in the anomaly detection DNN 130.
[0041] In some embodiments, the loss $L$ of the anomaly detection DNN 130 can be expressed as a sum of three terms accumulated over the training data, where $N$ represents the total number of training data and $i$ represents the index of data $x_i$. The first term is a function of $\phi(x_i)$, $\mu$, and $y_i$, where $\phi$ represents the anomaly detection DNN 130, $\phi(x_i)$ represents the latent embedding of data $x_i$, $\mu$ represents a value that is fixed during training, and $y_i$ represents the category label of data $x_i$. In some embodiments, the training module 120 may extract $\mu$ from the anomaly detection DNN 130 before the training process. $\phi(x_i)$ can be the output that the anomaly detection DNN 130 generates based on data $x_i$. $y_i$ can indicate whether data $x_i$ is normal or abnormal.
[0042] In some embodiments, $\mu$ can be the average value of the embeddings of normal training data, for example, for the untrained, initial phase of the anomaly detection DNN 130. $\mu$ can be expressed as: $\mu = \frac{1}{N} \sum_{i=1}^{N} \phi_0(x_i)$, where the average is taken over the normal training data, $\phi_0$ represents the anomaly detection DNN 130 before training (for example, with its internal parameters at their original values), and $\phi_0(x_i)$ represents the output that the anomaly detection DNN 130 generates based on data $x_i$ using the original values of its internal parameters. The training module 120 can compute $\mu$ before training the anomaly detection DNN 130. Although the internal parameters of the anomaly detection DNN 130 are modified during training, $\mu$ can remain unchanged. In some embodiments, $\mu$ can be used as an anchor embedding for training. During training, the internal parameters of the anomaly detection DNN 130 can be modified to pull the embeddings of one or more (or even all) normal training data toward $\mu$ while pushing the embeddings of one or more (or even all) anomalous training data away from $\mu$. This can produce compact and discriminative normal/abnormal embedding manifolds.
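Computing the anchor embedding once, before training, can be sketched as below. The function `initial_embed` is a hypothetical stand-in for the untrained network (a fixed linear map chosen only so the numbers are easy to follow), and `mean_embedding` is an assumed helper name.

```python
def mean_embedding(embeddings):
    """Element-wise mean of a list of equal-length embedding vectors."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / n for d in range(dim)]


def initial_embed(x):
    """Hypothetical untrained network: a fixed linear map of a 2D input."""
    return [x[0] + x[1], x[0] - x[1]]


normal_data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# The anchor is computed from the initial parameters and then frozen.
mu = mean_embedding([initial_embed(x) for x in normal_data])
```

Because the anchor is stored as plain values rather than recomputed, later weight updates leave it unchanged.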
[0043] The second term is $\sum_{s} d_s(x_i)$. The second term can accumulate the distances (e.g., Euclidean distance, also known as L2 distance) between latent embeddings at different spatial scales within the anomaly detection DNN 130. For example, $d_s(x_i) = \lVert \phi_s(x_i) - \mu_s \rVert_2$ can be the Euclidean distance between the embedding feature generated in convolutional block $s$ (represented as $\phi_s(x_i)$) and the average value at the corresponding spatial scale (represented as $\mu_s$). In some embodiments, $\mu_s$ represents the model embedding extracted from convolutional block $s$. $s$ can represent the spatial scale of convolutional block $s$ and the spatial scale of the embedding features generated by convolutional block $s$. In some embodiments, $s$ can represent the resolution of convolutional block $s$. $\phi_s(x_i)$ represents the embedding feature of data $x_i$ generated at scale $s$. It can be a feature map, or a 1D, 2D, or 3D tensor.
[0044] Training module 120 can compute $\mu_s$ before training the anomaly detection DNN 130. In some embodiments, $\mu_s$ can be obtained by averaging the latent embeddings of the training (e.g., normal category) data at scale $s$. It can be represented as: $\mu_s = \frac{1}{N} \sum_{i=1}^{N} \phi_{s,0}(x_i)$, where $\phi_{s,0}$ represents convolutional block $s$ before training (for example, with its internal parameters at their original values), and $\phi_{s,0}(x_i)$ represents the output that convolutional block $s$ generates based on data $x_i$ using the original values of its internal parameters. In some embodiments, $\mu_s$ can be a fixed value during training, which means $\mu_s$ does not change when the internal parameters of convolutional block $s$ are modified.
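The first two loss terms can be sketched for a single training sample as follows. The multi-scale term follows the description above (a sum of Euclidean distances to per-scale anchors). The exact form of the labeled first term is not recoverable from the text, so this sketch assumes a common choice: the squared distance to the anchor raised to the power of the label, which penalizes distance for normal data (y = 1) and closeness for anomalous data (y = -1). The names `sample_loss` and `eps` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def sample_loss(final_emb, scale_embs, mu, scale_mus, y, eps=1e-6):
    """Labeled anchor term plus multi-scale distance term for one sample."""
    # Assumed form of the first term: (squared distance) ** y, y in {1, -1}.
    first = (euclidean(final_emb, mu) ** 2 + eps) ** y
    # Second term: accumulate distances to the fixed per-scale anchors.
    second = sum(euclidean(e, m) for e, m in zip(scale_embs, scale_mus))
    return first + second


mu = [0.0, 0.0]
scale_mus = [[1.0, 0.0], [0.0]]      # flattened per-scale anchors (toy values)
final_emb = [3.0, 4.0]
scale_embs = [[1.0, 1.0], [2.0]]     # flattened per-scale embeddings

loss_if_normal = sample_loss(final_emb, scale_embs, mu, scale_mus, y=1)
loss_if_anomalous = sample_loss(final_emb, scale_embs, mu, scale_mus, y=-1)
```

With these toy values the same far-away embedding costs much more when labeled normal than when labeled anomalous, which is the behavior the anchor-based training is meant to encourage.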
[0045] The third term can, in some embodiments, be an L2 regularization term. The third term can mitigate model overfitting during training. In some embodiments, as the anomaly detection DNN 130 processes input data (e.g., training data), its receptive field may increase at each layer due to the combination of convolutional operations performed layer by layer. Each layer may process increasingly larger spatial scale information.
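The growth of the receptive field from layer to layer can be illustrated with the standard recurrence for stacked convolution and pooling layers: each layer adds (kernel size - 1) times the product of all earlier strides. The layer list below is a made-up example, not an architecture from the disclosure.

```python
def receptive_fields(layers):
    """Receptive field (in input pixels) after each (kernel_size, stride) layer."""
    rf, jump = 1, 1          # jump = spacing of adjacent outputs in input pixels
    result = []
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
        result.append(rf)
    return result


# Two 3x3 convolutions, a 2x2 stride-2 pooling layer, then another 3x3 convolution.
fields = receptive_fields([(3, 1), (3, 1), (2, 2), (3, 1)])
```

Deeper layers therefore summarize larger regions of the input, which is why features taken from different blocks correspond to different spatial scales.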
[0046] Training module 120 can train anomaly detection DNN 130 for a predetermined number of training epochs. The number of training epochs can be a hyperparameter, defining the number of times the deep learning algorithm processes the entire training dataset. One training epoch means that each sample in the training dataset has a chance to update the internal parameters of anomaly detection DNN 130. After training module 120 completes the predetermined number of training epochs, training module 120 can stop updating the parameters in anomaly detection DNN 130.
[0047] Compression module 140 can compress anomaly detection DNN 130. In some embodiments, compression module 140 can add one or more pruning operations to one or more layers of anomaly detection DNN 130 to reduce computational complexity or memory usage. In some embodiments, compression module 140 can determine the compression of anomaly detection DNN 130 based on one or more configurations of the hardware device on which anomaly detection DNN 130 is to be executed. Examples of such configurations may include the configuration of one or more computing resources available in the hardware device (e.g., number of processing units, number of processing elements, number of available threads, etc.) and the configuration of one or more data storage resources (e.g., memory size, memory bandwidth, etc.). Compression module 140 can compress anomaly detection DNN 130 when it determines that the available computing resources or data storage resources in the hardware device are insufficient to execute anomaly detection DNN 130 or one or more layers of anomaly detection DNN 130.
[0048] Pruning operations can be performed on the weight tensor of a layer by changing one or more non-zero value weights of the layer to zero. This modification can be performed before, during, or after training. Weights can be pruned during training, during inference, or a combination of both. Compression module 140 can determine the sparsity of the layer. The sparsity can be the ratio of the number of zero-value weights in the layer to the total number of weights. Compression module 140 can perform pruning operations until the sparsity of the layer reaches a target sparsity, such as 10%, 20%, 30%, 40%, 50%, etc. In some embodiments, compression module 140 can determine the target sparsity based on the configuration of one or more of the above-described hardware devices.
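Pruning until a target sparsity is reached can be sketched as below. The text only says that non-zero weights are changed to zero; zeroing the smallest-magnitude weights first is an assumed (though common) choice, and the function names are hypothetical.

```python
def sparsity(weights):
    """Ratio of zero-valued weights to the total number of weights."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


def prune_to_sparsity(weights, target):
    """Zero the smallest-magnitude weights until the target sparsity is met."""
    pruned = list(weights)
    order = sorted((i for i, w in enumerate(pruned) if w != 0.0),
                   key=lambda i: abs(pruned[i]))
    for i in order:
        if sparsity(pruned) >= target:
            break
        pruned[i] = 0.0
    return pruned


weights = [0.5, -0.1, 0.0, 0.8, -0.05, 0.3, -0.9, 0.2, 0.0, 0.6]
pruned = prune_to_sparsity(weights, target=0.5)
```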
[0049] In some embodiments, the compression module 140 may select a structured sparsity pattern for a layer and prune the weights of the DNN layer to achieve the structured sparsity pattern. The structured sparsity pattern can be represented by a structured sparsity ratio of N:M. During pruning, the compression module 140 may divide the kernel into weight blocks, each weight block including M consecutive weights. For each weight block, the compression module 140 may select N elements (one or more) and change the values of the unselected elements (one or more) in the weight block to zero. The compression module 140 may generate a sparse graph indicating the sparsity of the weights. In some embodiments, the compression module 140 may generate a sparse graph for each weight block. The sparse graph may include M sparse elements corresponding to the M weights in the weight block. Each sparse element may indicate whether the corresponding weight is zero. The sparse graph may be provided to a hardware device that executes the anomaly detection DNN 130 and used by the hardware device to accelerate the execution of the anomaly detection DNN 130.
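The N:M pruning and the per-block sparse graph described above can be sketched as follows. The name `prune_n_of_m` is hypothetical, the kernel is flattened to a list, and keeping the N largest-magnitude weights in each block is an assumption, because the paragraph does not say how the N elements are chosen. The encoding of each sparse element (1 for a kept non-zero weight, 0 for a zero weight) is likewise an assumption.

```python
def prune_n_of_m(kernel, n, m):
    """Keep n weights in every block of m consecutive weights; zero the rest.

    Returns the pruned weights and one sparse element per weight
    (1 = non-zero weight kept, 0 = weight is zero).
    Assumption: the n largest-magnitude weights of each block are kept.
    """
    if len(kernel) % m != 0:
        raise ValueError("kernel length must be a multiple of m")
    pruned, sparse_graph = [], []
    for start in range(0, len(kernel), m):
        block = kernel[start:start + m]
        # Positions of the n largest magnitudes inside this block.
        keep = set(sorted(range(m), key=lambda i: abs(block[i]),
                          reverse=True)[:n])
        for i, w in enumerate(block):
            kept = i in keep and w != 0.0
            pruned.append(w if kept else 0.0)
            sparse_graph.append(1 if kept else 0)
    return pruned, sparse_graph


kernel = [0.9, -0.1, 0.4, 0.05, -0.2, 0.7, 0.0, -0.8]
pruned, sparse_graph = prune_n_of_m(kernel, n=2, m=4)
print(pruned)
print(sparse_graph)
```

With a 2:4 ratio, each block of four weights keeps two, so a hardware device reading the sparse graph could skip half of the multiplications in this kernel.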
[0050] In some embodiments, the compression module 140 may select one or more layers in the anomaly detection DNN 130 and modify each selected layer using pruning operations. For example, the compression module 140 may select layers with high computational complexity, such as layers with large filters. For pruning operations on a layer or a class of layers, the compression module 140 may determine a weight threshold that will not cause the accuracy loss of the anomaly detection DNN 130 to exceed the accuracy loss constraint. The pruning operation may set weights with absolute values lower than the weight threshold to zero, while other weights remain unchanged. Weight pruning can reduce memory storage because zero-value weights do not need to be stored. Furthermore, the number of operations in a layer can be reduced because computations involving zero-value weights can be skipped without affecting the layer's output. In some embodiments, the compression module 140 may also measure the energy savings, final DNN accuracy, or layer-by-layer sparsity resulting from the pruning operation.
[0051] After compressing the anomaly detection DNN 130, the compression module 140 can fine-tune (or instruct the training module 120 to fine-tune) the anomaly detection DNN 130, for example, through a retraining process. The compression module 140 can fine-tune the DNN after the weights have been pruned. In some embodiments, the fine-tuning process is a retraining or further training process. For example, after the weights in the anomaly detection DNN 130 have been pruned, the anomaly detection DNN 130 can be further trained by inputting an adjusted dataset into the anomaly detection DNN 130. In some embodiments, the values of the pruned weights (i.e., zero) can remain unchanged during the fine-tuning process. For example, the compression module 140 can place a mask over the pruned weight block that prevents the values in the pruned weight block from being changed during the fine-tuning process. In other embodiments, the values of all weights (including the pruned weights) can be changed during the fine-tuning process.
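The masked fine-tuning described above can be sketched as a single update step. This is an illustration only: `fine_tune_step` is a hypothetical name, plain gradient descent with a fixed learning rate is an assumption, and the gradients are made-up numbers rather than values computed from a real loss.

```python
def fine_tune_step(weights, grads, mask, lr=0.1):
    """One gradient-descent update.

    mask[i] == 1 lets weight i change; mask[i] == 0 pins it at zero,
    which keeps a pruned weight pruned during fine-tuning.
    """
    return [w - lr * g if keep else 0.0
            for w, g, keep in zip(weights, grads, mask)]


weights = [0.5, 0.0, -0.3, 0.0]       # two weights were pruned earlier
grads = [0.2, 0.4, -0.1, 0.3]         # illustrative gradient values
pruned_mask = [1, 0, 1, 0]            # protect the pruned positions
masked = fine_tune_step(weights, grads, pruned_mask)

# Alternative embodiment: every weight, pruned or not, may change.
unmasked = fine_tune_step(weights, grads, [1, 1, 1, 1])
print(masked)
print(unmasked)
```

With the mask in place the sparsity reached by pruning is preserved; without it, pruned positions can become non-zero again, which trades sparsity for extra freedom during retraining.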
[0052] After one or more retraining and weight change cycles, the compression module 140 can perform a new pruning process, for example, by selecting and pruning a weight block. In some embodiments, the weight pruning process can be repeated multiple times before the fine-tuning process is complete. In some embodiments, the number of training epochs in the fine-tuning process can differ from the number of training epochs in the training process that determines the pre-pruned weight values. For example, the number of training epochs in the fine-tuning process can be fewer than the number of training epochs in the training process. In one example, the number of training epochs in the fine-tuning process can be relatively small, such as 2, 3, 4, 5, etc.
[0053] Layer selection module 150 selects layers from anomaly detection DNN 130 to perform anomaly detection tasks. For example, layer selection module 150 can select various subsets (“subsets of convolutional blocks”) of convolutional blocks in the anomaly detection DNN 130 for various applications. Subsets of convolutional blocks may include one or more, but not all, convolutional blocks in the anomaly detection DNN 130. Layer selection module 150 can select subsets of convolutional blocks based on a target spatial scale. The target spatial scale can be the spatial scale of the input data (e.g., an image), which can be the data to be fed into the subset of convolutional blocks to perform the anomaly detection task. Layer selection module 150 can also select subsets of convolutional blocks based on the spatial scale of the convolutional blocks. In one example, layer selection module 150 can select one or more convolutional blocks, each with a spatial scale no greater than the spatial scale of the input data. Furthermore, layer selection module 150 can also select at least one convolutional block with a spatial scale greater than the spatial scale of the input data.
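The scale-based selection described above can be sketched as follows. `ConvBlock`, `select_blocks`, and the `extra_larger` parameter are hypothetical names, and representing a spatial scale as a single integer (for example, a side length in pixels) is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvBlock:
    name: str
    spatial_scale: int  # assumption: one integer per block, e.g. pixels per side


def select_blocks(blocks, input_scale, extra_larger=0):
    """Select blocks whose scale does not exceed the input scale.

    Optionally also select the `extra_larger` smallest blocks whose
    scale is greater than the input scale.
    """
    selected = [b for b in blocks if b.spatial_scale <= input_scale]
    larger = sorted((b for b in blocks if b.spatial_scale > input_scale),
                    key=lambda b: b.spatial_scale)
    return selected + larger[:extra_larger]


blocks = [ConvBlock("block_a", 16), ConvBlock("block_b", 32),
          ConvBlock("block_c", 64), ConvBlock("block_d", 128),
          ConvBlock("block_e", 256)]

no_larger = select_blocks(blocks, input_scale=64)
one_larger = select_blocks(blocks, input_scale=64, extra_larger=1)
print([b.name for b in no_larger])
print([b.name for b in one_larger])
```

Both calls return a proper subset of the five blocks, consistent with the statement that a subset includes one or more, but not all, convolutional blocks. A caller would need to enforce that property separately for other inputs.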
[0054] In some embodiments, to form a subset of convolutional blocks for an anomaly detection task, the layer selection module 150 may generate multiple candidate subsets of convolutional blocks. The layer selection module 150 may evaluate the performance of the candidate subsets of convolutional blocks and select the optimal subset for the anomaly detection task. The optimal subset of convolutional blocks may be the subset with the best performance. To evaluate the performance of the subset of convolutional blocks, the layer selection module 150 may evaluate or measure the accuracy, latency, power consumption, time consumption, computational resource consumption, data storage resource consumption, or other factors of each subset of convolutional blocks.
[0055] Before or after the anomaly detection DNN 130 is trained (e.g., by training module 120), compressed (e.g., by compression module 140), or tested (e.g., by testing and deployment module 160), the layer selection module 150 may form a subset of convolutional blocks for the anomaly detection task. One or more layers in the subset of convolutional blocks will be used to perform the task, while the unselected layers (one or more) will not be used. In some embodiments, the layer selection module 150 may update the anomaly detection DNN 130 to include the selected layers (one or more), while the unselected layers (one or more) will not be included in the anomaly detection DNN 130 after the update.
[0056] The testing and deployment module 160 can test and deploy the anomaly detection DNN 130 to perform anomaly detection tasks. The testing and deployment module 160 can acquire data to be input into the anomaly detection DNN 130 for testing or deployment. For example, the testing and deployment module 160 can combine multiple images of an object to generate an aggregated image. The testing and deployment module 160 can also control the operation of components that facilitate image capture. The testing and deployment module 160 can use the aggregated image as anomaly detection data and input the anomaly detection data into the anomaly detection DNN 130 to initiate inference of the anomaly detection DNN 130. The anomaly detection DNN 130 can generate output based on the anomaly detection data. Each convolutional block in the anomaly detection DNN 130 can process the anomaly detection data and generate embedding features with the spatial scale of the convolutional block.
[0057] In some embodiments, the testing and deployment module 160 can determine anomaly scores based on the outputs of the anomaly detection DNN 130 and the convolutional blocks. The anomaly score can be expressed as a function of the input data, the output of the anomaly detection DNN 130, the output of the convolutional block at each spatial scale, and a set of anomaly score weights. In some embodiments, the testing and deployment module 160 may adjust or specify the anomaly score weights. In other embodiments, the testing and deployment module 160 may allow the user to adjust or specify the anomaly score weights. The anomaly score may include a weighted average of L2 distances relative to the normal data embedding manifold at different scale resolutions.
[0058] The testing and deployment module 160 can determine whether the anomaly detection data has any anomalies based on the anomaly score. For example, the testing and deployment module 160 can determine whether the anomaly score is greater than or equal to a threshold score. In embodiments where the anomaly score is greater than or equal to the threshold score, the testing and deployment module 160 can determine that the anomaly detection data is anomalous. In embodiments where the anomaly score is lower than the threshold score, the testing and deployment module 160 can determine that the anomaly detection data is not anomalous.
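The scoring and thresholding described in the two paragraphs above can be sketched as follows. This is one possible reading, not the disclosed formula. The names `anomaly_score` and `is_anomalous` are hypothetical, and the normal data embedding manifold is approximated by a single center vector per spatial scale, which is a simplifying assumption. The final output of the DNN could be handled as one more entry in the same dictionaries.

```python
import math


def anomaly_score(embeddings, normal_centers, weights):
    """Weighted average of per-scale L2 distances.

    All three arguments are dicts keyed by spatial scale.
    Assumption: each scale's normal manifold is summarized by one center.
    """
    total_weight = sum(weights[s] for s in embeddings)
    weighted_sum = sum(weights[s] * math.dist(embeddings[s], normal_centers[s])
                       for s in embeddings)
    return weighted_sum / total_weight


def is_anomalous(score, threshold):
    """Anomalous when the score is greater than or equal to the threshold."""
    return score >= threshold


embeddings = {32: [3.0, 4.0], 64: [0.0, 0.0]}       # new data, two scales
normal_centers = {32: [0.0, 0.0], 64: [0.0, 0.0]}   # from normal training data
score_weights = {32: 1.0, 64: 3.0}                  # adjustable by module or user

score = anomaly_score(embeddings, normal_centers, score_weights)
print(score, is_anomalous(score, threshold=1.0))
```

Raising the weight of a fine scale makes the score more sensitive to small local defects, while raising the weight of a coarse scale emphasizes macroscopic deviations.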
[0059] In some embodiments (e.g., embodiments where the testing and deployment module 160 tests the effectiveness of the anomaly detection DNN 130), the testing and deployment module 160 can verify the accuracy of the anomaly detection DNN 130 after training by the training module 120, compression by the compression module 140, or layer selection by the layer selection module 150. In some embodiments, the testing and deployment module 160 inputs one or more data points from the verification dataset into the anomaly detection DNN 130 and uses the output of the anomaly detection DNN 130 to determine the model accuracy. In some embodiments, the verification dataset may consist of some or all of the samples from the training dataset. Additionally or alternatively, the verification dataset includes additional samples beyond the training set.
[0060] In some embodiments, the testing and deployment module 160 may determine an accuracy score that measures the precision, recall, or a combination of precision and recall of the anomaly detection DNN 130. The testing and deployment module 160 may use the following metrics to determine the accuracy score: Precision = TP / (TP + FP) and Recall = TP / (TP + FN), where precision can be the number of anomalies correctly predicted by the anomaly detection DNN 130 (TP, or true positives) out of the total number of objects predicted as anomalies (TP + FP, where FP denotes false positives), and recall can be the number of anomalies correctly predicted by the anomaly detection DNN 130 out of the total number of objects that actually have anomalies (TP + FN, where FN denotes false negatives). An F-score (F-score = 2 * P * R / (P + R), where P is precision and R is recall) unifies precision and recall into a single metric. TP can represent that the anomaly detection DNN 130 predicts anomalies, and the data does indeed have anomalies. FP can represent that the anomaly detection DNN 130 predicts anomalies, but the data does not have anomalies. TN (true negatives) can represent that the anomaly detection DNN 130 predicts no anomalies, and the data is indeed anomaly-free. FN can represent that the anomaly detection DNN 130 predicts no anomalies, but the data contains anomalies.
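The metrics above can be computed as in the following sketch. The function names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this example, not something stated in the text.

```python
def confusion_counts(predicted, actual):
    """Count TP, FP, TN, FN from parallel lists of booleans (True = anomaly)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    return tp, fp, tn, fn


def accuracy_scores(tp, fp, fn):
    """Return (precision, recall, F-score); 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


predicted = [True, True, False, True, False, False]
actual = [True, False, False, True, True, False]

tp, fp, tn, fn = confusion_counts(predicted, actual)
precision, recall, f_score = accuracy_scores(tp, fp, fn)
print(tp, fp, tn, fn, precision, recall, f_score)
```

TN does not enter precision, recall, or the F-score, which is why these metrics suit anomaly detection, where anomaly-free samples usually dominate the data.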
[0061] The testing and deployment module 160 can compare the accuracy score to a threshold accuracy. In an example where the testing and deployment module 160 determines that the accuracy score of the DNN is below a threshold, the testing and deployment module 160 instructs the training module 120 to retrain the anomaly detection DNN 130. In one embodiment, the testing and deployment module 160 can instruct the training module 120 to iteratively retrain the anomaly detection DNN 130 until a stopping condition is met, such as an accuracy measurement indicating that the anomaly detection DNN 130 may be sufficiently accurate, or that it has already been trained multiple times.
[0062] In some embodiments (e.g., embodiments where the testing and deployment module 160 deploys the anomaly detection DNN 130 to perform anomaly detection tasks), the testing and deployment module 160 may generate a message indicating the presence or absence of an anomaly based on the output or anomaly score of the anomaly detection DNN 130. The testing and deployment module 160 may transmit this message to an external system or device, for example, via interface module 110. Certain aspects of the testing and deployment module 160 are described below in conjunction with Figure 3 and Figure 4.
[0063] Compiler 170 can compile information about anomaly detection DNN 130 to generate executable instructions that can be executed, for example, by one or more hardware devices (e.g., processing units), to perform neural network operations in anomaly detection DNN 130. In some embodiments, compiler 170 can generate a graph representing anomaly detection DNN 130. The graph can include nodes and edges. Nodes can represent specific neural network operations in anomaly detection DNN 130. Edges can connect two nodes and represent connections between two corresponding neural network operations. For example, edges can encode tensors from one neural network operation to another. The tensor can be the output tensor of the first neural network operation and the input tensor of the second neural network operation. Edges can encode one or more properties of the tensor (e.g., size, shape, storage format, etc.). Compiler 170 can use this graph to generate an executable version of anomaly detection DNN 130. For example, the compiler can generate computer program instructions for executing anomaly detection DNN 130.
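The graph representation described above can be sketched with nodes as operation names and edges that carry tensor properties. The `Edge` class, its fields, and the `execution_order` helper are hypothetical. Deriving an execution order with a topological sort is an assumption about how such a graph might be consumed, not a description of compiler 170.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Edge:
    src: str               # operation that produces the tensor
    dst: str               # operation that consumes the tensor
    shape: tuple           # tensor property encoded on the edge
    storage_format: str    # e.g. a layout tag; illustrative only


def execution_order(nodes, edges):
    """Order operations so that every producer runs before its consumers."""
    predecessors = {n: set() for n in nodes}
    for e in edges:
        predecessors[e.dst].add(e.src)
    return list(TopologicalSorter(predecessors).static_order())


nodes = ["conv1", "relu1", "conv2"]
edges = [Edge("conv1", "relu1", (1, 16, 64, 64), "NCHW"),
         Edge("relu1", "conv2", (1, 16, 64, 64), "NCHW")]

order = execution_order(nodes, edges)
print(order)
```

Each edge here plays the role described in the paragraph: it connects two operations and records the size, shape, and storage format of the tensor passed between them.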
[0064] In some embodiments, compiler 170 may generate configuration parameters for configuring components of one or more hardware devices to execute anomaly detection DNN 130. The configuration parameters may be stored in one or more configuration registers associated with the components of the one or more hardware devices. In some embodiments, compiler 170 may compile anomaly detection DNN 130 after compression module 140 has compressed it. For example, compiler 170 may generate configuration parameters that cause the hardware device executing anomaly detection DNN 130 to load convolution activations and weights into the processing unit in such a way that computation in the processing unit is accelerated based on the sparsity of the activations or weights. Compiler 170 may also generate configuration parameters for configuring components of one or more hardware devices to perform accelerated computation based on sparsity.
[0065] Data storage device 180 stores data received, generated, used, or otherwise related to the anomaly detection system 100. For example, data storage device 180 stores the dataset used by training module 120. Data storage device 180 may also store data generated by training module 120, such as hyperparameters used to train anomaly detection DNN 130, intrinsic parameters of anomaly detection DNN 130 (e.g., weights), etc. Data storage device 180 may also store data generated by compression module 140, such as compressed weights, sparse graphs, etc. Data storage device 180 may also store data generated by layer selection module 150 and testing and deployment module 160. Data storage device 180 may store instructions, configuration parameters, or other data generated by compiler 170. Data storage device 180 may include one or more memories. In the embodiment of Figure 1, the data storage device 180 is a component of the anomaly detection system 100. In other embodiments, the data storage device 180 may be located outside the anomaly detection system 100 and communicate with the anomaly detection system 100 via a network.
[0066] Figure 2 shows an example DNN 200 for anomaly detection according to various embodiments. The DNN 200 can be an example of the anomaly detection DNN 130 in Figure 1. As shown in Figure 2, DNN 200 includes convolutional blocks 230A-230N (collectively referred to as "convolutional blocks 230"). Each convolutional block 230 may include at least one convolutional layer. In other embodiments, DNN 200 may include fewer, more, or different components. For example, DNN 200 may include fully connected layers or SoftMax layers arranged after convolutional blocks 230. As another example, DNN 200 may have a different number of convolutional blocks.
[0067] For illustration, the embodiment of Figure 2 uses an input image 210. The input image 210 can be an image of an object, which the DNN 200 uses to detect any anomalies in the object. The input image 210 can be captured by one or more cameras. In some embodiments, the input image 210 can be generated from multiple images of the object. For example, these images can be stitched together to form the input image 210. In some embodiments, the input image 210 can be obtained by the testing and deployment module 160 in Figure 1.
[0068] The input image 210 is converted into the input tensor 220. For example, in Figure 2 the input tensor 220 is a 3D tensor, comprising data elements (e.g., activation values) arranged in a 3D structure. The input tensor 220 can be generated by encoding the input image 210. In some embodiments, the input tensor 220 can be generated by the testing and deployment module 160 in Figure 1. Input tensor 220 is fed into DNN 200. Convolutional blocks 230 process input tensor 220 and produce embedding features 235A-235N (collectively referred to as "embedding features 235"). Each embedding feature 235 is indexed by the spatial scale of the convolutional block that generates it: embedding feature 235A by the spatial scale of convolutional block 230A, embedding feature 235B by the spatial scale of convolutional block 230B, and embedding feature 235N by the spatial scale of convolutional block 230N. Each embedding feature 235 may have the spatial scale of the convolutional block 230 that generates the embedding feature 235. In some embodiments, each embedding feature 235 may be a tensor, such as a 2D or 3D tensor.
[0069] DNN 200 uses input tensor 220 to generate output 205, as shown in Figure 2. Output 205 and the embedding features 235 extracted from convolutional blocks 230 can be used to determine an anomaly score, which can indicate whether the input image 210 shows any anomalies of an object. In some embodiments, not all embedding features 235 are used to determine the anomaly score. For example, one or more convolutional blocks 230 can be selected based on the spatial scale of the input tensor 220 or the input image 210. Embedding features 235 extracted from the selected convolutional block or blocks 230 can be used to determine the anomaly score, while the other embedding features 235 may not be used. In some embodiments, convolutional blocks 230 can generate embedding features 235 in parallel or even simultaneously.
[0070] Figure 3 is a block diagram of a testing and deployment module 300 according to various embodiments. The testing and deployment module 300 may be an example of the testing and deployment module 160 in Figure 1. As shown in Figure 3, the testing and deployment module 300 includes a data capture component 310, an orientation module 320, a sensor controller 330, a deployment module 340, and a neural processing unit (NPU) 350. In other embodiments, the testing and deployment module 300 may include alternative configurations or different or additional components. Furthermore, the functions implemented by the components of the testing and deployment module 300 may be performed by other components included in the testing and deployment module 300 or by other modules or systems.
[0071] Data capture component 310 facilitates the capture of data about an object, which can be used to detect anomalies associated with that object. In some embodiments, data capture component 310 may include one or more sensors capable of detecting objects placed inside or near data capture component 310. The sensors may capture at least a portion of the object and output sensor data. Examples of the sensors (one or more) may include image sensors, depth sensors, pressure sensors, ultrasonic sensors, other types of sensors, or combinations thereof. For example, data capture component 310 may include one or more cameras for capturing images of the object. Sensors in data capture component 310 may be placed in different locations. In some embodiments, different sensors may detect or capture the object from different angles. Data capture component 310 may also include one or more other components besides the sensors (one or more). For example, data capture component 310 may include a component for holding the sensor or object. This component may be moved to change the orientation (e.g., position or direction) of the sensor or object. Certain aspects of data capture component 310 are described below in conjunction with Figure 4A.
[0072] Orientation module 320 can control the orientation (e.g., position or direction) of one or more components of data capture component 310 or objects placed inside data capture component 310. In some embodiments, orientation module 320 can detect the initial orientation of a component of data capture component 310 (e.g., a sensor) or an object inside data capture component 310. Orientation module 320 can also determine the target orientation of the sensor or object and determine whether the initial orientation of the sensor or object matches the target orientation (e.g., whether it is the same or substantially similar). If it is determined that the initial orientation does not match the target orientation, orientation module 320 can move the sensor or object to the target orientation. Additionally or alternatively, orientation module 320 can move the sensor or object from the target orientation to another target orientation, for example, to capture different features of the object. After the sensor or object reaches the target orientation, orientation module 320 can notify sensor controller 330 so that sensor controller 330 can control the sensor to start scanning the object.
[0073] Sensor controller 330 controls one or more sensors in data capture component 310. For example, sensor controller 330 may configure one or more settings for the sensors such that the sensors capture sensor data of an object according to the one or more settings. Examples of such settings may include scan speed, scan time, scan resolution, etc. In some embodiments, sensor controller 330 may configure different settings for different sensors capturing the same object. Sensor settings may affect the data to be input to a DNN (e.g., anomaly detection DNN 130) to detect anomalous data. For example, sensor controller 330 may configure a camera to produce images with a specific resolution. In some embodiments, sensor controller 330 may determine sensor settings based on user input. User input may include information about the task of detecting object anomalies. This task information may include information about the object, information about possible anomalies, information about the DNN performing the task, information about the hardware device executing the DNN (e.g., NPU 350), etc.
[0074] Deployment module 340 can deploy a DNN to perform anomaly detection tasks. Examples of the DNN include the anomaly detection DNN 130 in Figure 1 and the DNN 200 in Figure 2 described above. In some embodiments, the deployment module 340 can generate input to the DNN. The input can be data generated by the deployment module 340 based on sensor data captured by one or more sensors in the data capture component 310. For example, the deployment module 340 can receive one or more images of an object from the data capture component 310. The deployment module 340 can generate input data based on the one or more images. In embodiments where multiple images of an object exist, the deployment module 340 can combine these images into an aggregated image, for example, by stitching the images together. A portion of an image can be removed to stitch it together with one or more other images. The deployment module 340 can generate input tensors based on images (images from the data capture component 310 or aggregated images). An example of an input tensor is the input tensor 220 in Figure 2.
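The stitching step described above, including the removal of part of an image before joining, can be sketched on toy data. The function name `stitch_horizontally` and the `overlap` parameter are hypothetical. Images are modeled as lists of pixel rows, and a fixed, known overlap in columns is an assumption. Real stitching would normally estimate the alignment from image content.

```python
def stitch_horizontally(images, overlap=0):
    """Join images left to right into one aggregated image.

    Each image is a list of rows of pixel values. The first `overlap`
    columns of every image after the first are removed before joining,
    on the assumption that they repeat the previous image's right edge.
    """
    height = len(images[0])
    if any(len(img) != height for img in images):
        raise ValueError("all images must have the same number of rows")
    aggregated = [list(row) for row in images[0]]
    for img in images[1:]:
        for r in range(height):
            aggregated[r].extend(img[r][overlap:])
    return aggregated


left = [[1, 2, 3],
        [4, 5, 6]]
right = [[3, 7],
         [6, 8]]   # first column repeats the last column of `left`

aggregated_image = stitch_horizontally([left, right], overlap=1)
print(aggregated_image)
```

The aggregated image can then be encoded into an input tensor in the same way as a single captured image.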
[0075] Deployment module 340 can provide input data to NPU 350. NPU 350 can execute DNNs, including DNNs for anomaly detection. For example, NPU 350 can execute a DNN by executing neural network operations within the DNN. The process of executing neural network operations is also referred to as running neural network operations or performing neural network operations. The execution of a DNN can be for training a DNN or for performing AI tasks using a DNN. NPU 350 can be a DNN accelerator. In some embodiments, NPU 350 includes memory, one or more data processing units, and a direct memory access engine that can transfer data between memory and one or more data processing units. The data processing units can include processing elements that can be arranged in an array. The processing elements can include one or more multipliers and one or more adders. The processing elements can perform multiply-accumulate (MAC) operations. The data processing units can also include acceleration logic that can accelerate neural network operations based on data sparsity. For example, the acceleration logic can accelerate convolution based on the sparsity of the input activation tensor or weight tensor. In some embodiments, NPU 350 can operate according to instructions (e.g., configuration parameters) provided by a compiler (e.g., compiler 170 in Figure 1).
[0076] Input data from deployment module 340 can be written to the memory of NPU 350 and then transferred to one or more data processing units via a direct memory access engine. NPU 350 can run the inference process of the DNN to detect anomalies in the input data. During inference, one or more data processing units can use the input data or new data generated from the input data to perform neural network operations (e.g., convolutions, etc.) in the DNN. Although not shown in Figure 3, the DNN for anomaly detection can be executed by one or more central processing units, graphics processing units, or other types of processing units in addition to or instead of the NPU 350.
[0077] Deployment module 340 can acquire the output of the DNN and the output of the convolutional blocks from NPU 350. Deployment module 340 can determine anomaly scores based on the DNN output and the convolutional block output, as described above. Deployment module 340 can generate a message indicating the anomaly detection result. For example, the message could indicate whether an anomaly exists in an object. This message can be sent to a device or system to facilitate the device or system (or its user) in processing the object. In an example where no anomaly is detected, the object may be considered expected, anticipated, standard, or general, and may be used for manufacturing, providing services, selling, etc. In another example where an anomaly is detected, the object may be discarded or repaired before being used.
[0078] Figure 4A shows a data capture component 400 according to various embodiments. The data capture component 400 may be an example of the data capture component 310 in Figure 3. As shown in Figure 4A, the data capture component 400 includes a housing 410, cameras 420A-420C, and a station 430. In other embodiments, the data capture component 400 may include fewer, more, or different components.
[0079] Housing 410 encloses cameras 420A-420C and station 430. Cameras 420A-420C can be fixed to housing 410. For illustration, cameras 420A-420B are arranged on top of housing 410. In other embodiments, cameras 420A-420B can be located at other positions within housing 410. Cameras 420A-420B are configured to capture images of objects placed on station 430 to detect anomalies in the objects. For illustration, a screw 440 with an anomaly 450 is placed on station 430. Cameras 420A-420B can capture images of screw 440 from different angles. In some embodiments, station 430 can facilitate rotation of screw 440 so that at least one of cameras 420A-420B can capture a 360-degree image of screw 440. Although Figure 4A shows three cameras, in other embodiments the data capture component 400 may include a different number of cameras. Furthermore, the camera orientations may also differ. Alternatively or additionally, the data capture component 400 may also include other types of sensors.
[0080] Figure 4B shows an aggregated image 405 according to various embodiments. Aggregated image 405 can be generated from images 415A-415C, which are images of screw 440 captured by cameras 420A-420C. Aggregated image 405 can be generated by stitching images 415A-415C together, for example, by aligning the threads on screw 440. Aggregated image 405 shows anomaly 450. In some embodiments, aggregated image 405 can be input into a DNN (e.g., the anomaly detection DNN 130 in Figure 1), and the DNN automatically detects anomaly 450. In some embodiments, the aggregated image 405 can be converted into a tensor (e.g., tensor 220 in Figure 2) that is input into the DNN. Anomaly 450 can be detected based on the output of the DNN and the output of the convolutional blocks in the DNN. In some embodiments, convolutional blocks in the DNN can be selected from a pool of convolutional blocks. The convolutional blocks can be selected based on the resolution of the aggregated image 405 or the resolution of one or more of images 415A-415C. The selection can also be based on the spatial scale of the convolutional blocks.
[0081] Images 415A-415C shown in Figure 4B are for illustration and simplicity only. Some features of screw 440 or data capture component 400 may not be shown in images 415A-415C. Additionally, other images may be used to generate aggregated image 405.
[0082] Figure 5 is a flowchart of an anomaly detection method 500 according to various embodiments. Method 500 can be executed by the anomaly detection system 100 in Figure 1. Although method 500 is described with reference to the flowchart shown in Figure 5, many other methods for anomaly detection can be used alternatively. For example, the execution order of the steps in Figure 5 can be changed. As another example, some steps can be changed, deleted, or merged.
[0083] The anomaly detection system 100 embeds (510) training data into the latent space of a DNN model at different spatial scales. The training data includes normal data lacking anomalies. The DNN model comprises multiple convolutional blocks with different spatial scales. In some embodiments, an example of the DNN model is the anomaly detection DNN 130 in Figure 1 or the DNN 200 in Figure 2. In some embodiments, the spatial scale indicates the model embedding extracted at a convolutional block of the neural network model. A convolutional block comprises one or more convolutional layers. In some embodiments, the spatial scale may indicate the resolution of the corresponding convolutional block.
[0084] In some embodiments, the anomaly detection system 100 simultaneously embeds normal data into the latent space of the DNN model at different spatial scales. In some embodiments, the training data includes normal data lacking anomalies and anomalous data containing anomalies. The normal data and anomalous data have different class labels.
[0085] Anomaly detection system 100 extracts (520) multiple embedding features from multiple convolutional blocks. In some embodiments, the multiple embedding features are generated by the multiple convolutional blocks using training data. The multiple embedding features have different spatial scales. In some embodiments, the embedding features have the spatial scale of the convolutional blocks that generated the embedding features.
[0086] Anomaly detection system 100 determines (530) a loss of the DNN model based on the multiple embedding features. In some embodiments, for each embedding feature, anomaly detection system 100 determines the distance between the embedding feature and the average value at different spatial scales. Anomaly detection system 100 sums the distances of the multiple embedding features. In some embodiments, the distance between the embedding feature and the average value at different spatial scales is the Euclidean distance.
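The loss described above can be sketched as a sum of Euclidean distances between each embedding and the average embedding at its spatial scale. This is one reading of the paragraph under stated assumptions: `multi_scale_loss` and `mean_vector` are hypothetical names, embeddings are flattened to plain vectors, and the average is taken over the training samples within each scale.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def multi_scale_loss(features_by_scale):
    """Sum, over all scales and samples, of the Euclidean distance
    between an embedding and the average embedding at its scale.

    features_by_scale maps a spatial scale to a list of embedding
    vectors, one per training sample.
    """
    loss = 0.0
    for vectors in features_by_scale.values():
        center = mean_vector(vectors)
        loss += sum(math.dist(v, center) for v in vectors)
    return loss


features_by_scale = {
    32: [[0.0, 0.0], [6.0, 8.0]],   # scale 32: center (3, 4), distances 5 + 5
    64: [[1.0], [3.0]],             # scale 64: center (2,),   distances 1 + 1
}
loss = multi_scale_loss(features_by_scale)
print(loss)
```

Minimizing such a loss pulls the embeddings of normal data toward a compact region at every scale, so data that later lands far from that region receives a large distance.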
[0087] Anomaly detection system 100 trains (540) the DNN model based on the loss by updating one or more internal parameters of the DNN model. In some embodiments, anomaly detection system 100 updates the one or more internal parameters of the DNN model to minimize the loss. In some embodiments, the one or more internal parameters of the DNN model include one or more weights in the convolutional layers of the DNN model.
[0088] Anomaly detection system 100 uses at least a portion of a trained DNN model to detect (550) anomalies in new data. The new data includes anomalous data with anomalies. In some embodiments, anomaly detection system 100 selects one or more convolutional blocks from a plurality of convolutional blocks. The one or more convolutional blocks are used to detect anomalies in the new data. In some embodiments, the new data includes images. The one or more convolutional blocks are selected based on one or more spatial scales of the one or more convolutional blocks and the spatial scale of the image. In some embodiments, anomaly detection system 100 also verifies the effectiveness of the neural network model after training by performing anomaly detection on test data with verified anomalies using the trained neural network model.
[0089] In some embodiments, the anomaly detection system 100 inputs new data into at least a portion of a trained DNN model. The anomaly detection system 100 determines an anomaly score based on the output of at least a portion of the trained DNN model. The anomaly detection system 100 determines whether the new data is anomalous based on the anomaly score. In some embodiments, the anomaly detection system 100 extracts one or more new embedding features from one or more convolutional blocks in at least a portion of the neural network model. The anomaly detection system 100 determines an anomaly score based on the one or more new embedding features. In some embodiments, the anomaly detection system 100 compares the anomaly score with a threshold score and determines that the new data is anomalous when the anomaly score is greater than the threshold score.
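The scoring and thresholding steps of paragraph [0089] might look like the sketch below. The disclosure does not state how the anomaly score is computed from the new embedding features, so this example assumes the score is the sum of per-scale Euclidean distances to centers learned from normal data. The function names, the centers, and the threshold value are all hypothetical.

```python
import math

def anomaly_score(new_embeddings, centers):
    """Assumed score: sum of per-scale distances between embeddings and centers."""
    return sum(math.dist(e, c) for e, c in zip(new_embeddings, centers))

def is_anomalous(score, threshold):
    """The data is flagged when the score is greater than the threshold."""
    return score > threshold

# Hypothetical per-scale centers obtained from normal training data.
centers = [[1.0, 0.0], [1.0, 1.0, 1.0]]
threshold = 2.0  # hypothetical threshold score

normal_like = [[1.0, 0.5], [1.0, 1.0, 1.0]]  # close to both centers
outlier = [[4.0, 4.0], [1.0, 1.0, 1.0]]      # far from the first center

normal_score = anomaly_score(normal_like, centers)
outlier_score = anomaly_score(outlier, centers)
```

Restricting `new_embeddings` and `centers` to a subset of scales would correspond to using only selected convolutional blocks for detection.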
[0090] Figure 6 shows a CNN 600 according to various embodiments. The CNN 600 can be at least a portion of a DNN capable of being used for anomaly detection (e.g., the anomaly detection DNN 130 in Figure 1 or the DNN 200 in Figure 2). For illustrative purposes, the CNN 600 includes a series of layers, including multiple convolutional layers 610 (collectively referred to as "convolutional layers 610"), multiple pooling layers 620 (collectively referred to as "pooling layers 620"), and multiple fully connected layers 630 (collectively referred to as "fully connected layers 630"). In other embodiments, the CNN 600 may include fewer, more, or different layers. During the execution of the CNN 600, the layers of the CNN 600 perform tensor computations that include a variety of tensor operations, such as convolution, interpolation, pooling operations, element-wise operations (e.g., element-wise addition, element-wise multiplication, etc.), other types of tensor operations, or some combination of these operations.
[0091] Convolutional layers 610 summarize the presence of features in the input to CNN 600 and act as feature extractors. The first layer of CNN 600 is a convolutional layer 610. In one example, a convolutional layer 610 performs a convolution operation on an input tensor 640 (also called input feature map (IFM) 640) and a filter 650. As shown in Figure 6, the IFM 640 is represented by a 7×7×3 three-dimensional (3D) matrix. The IFM 640 includes three input channels, each represented by a 7×7 two-dimensional (2D) matrix. Each row of the 7×7 2D matrix contains 7 input elements (also called input points), and each column contains 7 input elements. The filter 650 is represented by a 3×3×3 3D matrix. The filter 650 includes three kernels, each corresponding to a different input channel of the IFM 640. A kernel is a 2D matrix of weights, where the weights are arranged in columns and rows. The kernel can be smaller than the IFM. In the embodiment of Figure 6, each kernel is represented by a 3×3 2D matrix. Each row of the 3×3 kernel contains 3 weights, and each column also contains 3 weights. The weights can be initialized and then updated using gradient descent via backpropagation. The magnitude of the weights can represent the importance of filter 650 in extracting features from IFM 640.
[0092] The convolution involves multiply-accumulate (MAC) operations on the input elements in IFM 640 and the weights in filter 650. The convolution can be a standard convolution 663 or a depthwise convolution 683. In a standard convolution 663, the entire filter 650 slides over IFM 640. All input channels are combined to produce an output tensor 660 (also called an output feature map (OFM) 660). OFM 660 is represented by a 5×5 2D matrix. Each row of the 5×5 2D matrix contains 5 output elements (also called output points), and each column also contains 5 output elements. For illustration, in the embodiment of Figure 6, the standard convolution includes one filter. In an embodiment with multiple filters, the standard convolution can produce multiple output channels (OCs) in the OFM 660.
[0093] The multiplication applied between a kernel-sized local patch of IFM 640 and the kernel can be a dot product. A dot product is an element-wise multiplication between the kernel-sized local patch of IFM 640 and the corresponding kernel, followed by a summation, and it always produces a single value. Because it produces a single value, this operation is often called a "scalar product." Using a kernel smaller than IFM 640 is intentional because it allows the same kernel (a set of weights) to be multiplied by IFM 640 multiple times at different points on IFM 640. Specifically, the kernel is applied systematically, from left to right and top to bottom, to each overlapping kernel-sized local patch of IFM 640. Multiplying the kernel by a local patch of IFM 640 once produces a single value. Because the kernel is applied to IFM 640 multiple times, the results of the multiplications form a 2D matrix of output elements. Thus, the 2D output matrix from the standard convolution 663 (i.e., OFM 660) is called an OFM.
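The standard convolution described above can be illustrated with a short sketch that uses the shapes of Figure 6 (a 7×7×3 IFM and one 3×3×3 filter). It assumes a stride of 1 and no padding, which is consistent with the 5×5 OFM stated in the text. The function name `standard_conv` and the all-ones test data are hypothetical.

```python
def standard_conv(ifm, filt):
    """Standard convolution with stride 1 and no padding (assumed).

    ifm:  [C][H][W] input feature map
    filt: [C][F][F] one filter, i.e., one F x F kernel per input channel
    Returns a 2D output feature map of size (H - F + 1) x (W - F + 1).
    """
    channels, height, width = len(ifm), len(ifm[0]), len(ifm[0][0])
    f = len(filt[0])
    out_h, out_w = height - f + 1, width - f + 1
    ofm = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):              # top to bottom
        for x in range(out_w):          # left to right
            acc = 0.0
            for c in range(channels):   # all input channels are combined
                for ky in range(f):
                    for kx in range(f):
                        acc += ifm[c][y + ky][x + kx] * filt[c][ky][kx]
            ofm[y][x] = acc             # one dot product yields one output element
    return ofm

ifm = [[[1.0] * 7 for _ in range(7)] for _ in range(3)]   # 7x7x3, all ones
filt = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]  # 3x3x3, all ones
ofm = standard_conv(ifm, filt)  # 5x5; each element sums 27 products
```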
[0094] In depthwise convolution 683, the input channels are not combined. Instead, MAC operations are performed on individual input channels and individual kernels, producing output channels (OCs). As shown in Figure 6, depthwise convolution 683 produces a depthwise output tensor 680. The depthwise output tensor 680 is represented by a 5×5×3 3D matrix and includes three OCs, each represented by a 5×5 2D matrix. Each row of the 5×5 2D matrix contains 5 output elements, and each column also contains 5 output elements. Each OC is the result of MAC operations performed on one input channel of the IFM 640 and one kernel of the filter 650. For example, the first OC (dot pattern) is the result of MAC operations on the first input channel (dot pattern) and the first kernel (dot pattern); the second OC (horizontal stripe pattern) is the result of MAC operations on the second input channel (horizontal stripe pattern) and the second kernel (horizontal stripe pattern); and the third OC (diagonal stripe pattern) is the result of MAC operations on the third input channel (diagonal stripe pattern) and the third kernel (diagonal stripe pattern). In such a depthwise convolution, the number of input channels equals the number of OCs, and each OC corresponds to a different input channel. The input channels and output channels are collectively referred to as depthwise channels. Following the depthwise convolution, a pointwise convolution 693 is performed on the depthwise output tensor 680 and a 1×1×3 tensor 690 to produce OFM 660. Tensor 690 is a 1D tensor.
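A depthwise convolution followed by a pointwise convolution, as described in paragraph [0094], can be sketched as below. The sketch assumes a stride of 1 and no padding, and it treats the 1×1×3 tensor as a list of three per-channel weights. The function names and the toy values are hypothetical.

```python
def depthwise_conv(ifm, filt):
    """Depthwise convolution: channel c of ifm is convolved only with kernel c."""
    f = len(filt[0])
    out_h = len(ifm[0]) - f + 1
    out_w = len(ifm[0][0]) - f + 1
    return [
        [
            [
                sum(chan[y + ky][x + kx] * kern[ky][kx]
                    for ky in range(f) for kx in range(f))
                for x in range(out_w)
            ]
            for y in range(out_h)
        ]
        for chan, kern in zip(ifm, filt)  # channels are never mixed here
    ]

def pointwise_conv(tensor, weights):
    """Pointwise (1x1) convolution: weighted sum across channels at each position."""
    out_h, out_w = len(tensor[0]), len(tensor[0][0])
    return [
        [sum(w * chan[y][x] for w, chan in zip(weights, tensor))
         for x in range(out_w)]
        for y in range(out_h)
    ]

# Three 7x7 input channels filled with 1s, 2s, and 3s; all-ones 3x3 kernels.
ifm = [[[float(c + 1)] * 7 for _ in range(7)] for c in range(3)]
filt = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]

depth_out = depthwise_conv(ifm, filt)              # 5x5x3: 9s, 18s, and 27s
ofm = pointwise_conv(depth_out, [1.0, 1.0, 1.0])   # 5x5: 9 + 18 + 27 per element
```

With these all-ones pointwise weights the result matches what a standard convolution with the same kernels would produce, which illustrates how the two-step form can stand in for a single standard convolution.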
[0095] OFM 660 is then passed to the next layer in the sequence. In some embodiments, OFM 660 is passed through an activation function. An example activation function is the rectified linear unit (ReLU). ReLU returns its input directly when the input is positive and returns 0 when the input is 0 or less. Convolutional layer 610 can receive several images as input and compute the convolution of each of them with each kernel. This process can be repeated several times. For example, OFM 660 is passed to a subsequent convolutional layer 610 (i.e., the convolutional layer 610 that follows, in the sequence, the convolutional layer 610 that produced OFM 660). The subsequent convolutional layer 610 performs a convolution on OFM 660 with new kernels and generates a new feature map. The new feature map can also be normalized and resized. The new feature map can be convolved again by further subsequent convolutional layers 610, and so on.
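As a small illustration of the ReLU behavior described above, the following sketch applies ReLU element-wise to a 2D feature map. The names `relu` and `relu_map` and the sample values are hypothetical.

```python
def relu(x):
    """Return x when it is positive, otherwise 0."""
    return x if x > 0 else 0.0

def relu_map(feature_map):
    """Apply ReLU to every element of a 2D feature map."""
    return [[relu(v) for v in row] for row in feature_map]

activated = relu_map([[-2.0, 0.0], [1.5, -0.1]])
```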
[0096] In some embodiments, the convolutional layer 610 has four hyperparameters: the number of kernels, the kernel size (e.g., a kernel size of F×F×D pixels), the stride S with which the window corresponding to the kernel is dragged over the image (e.g., a stride of 1 means moving the window one pixel at a time), and the zero padding P (e.g., adding a black outline with a thickness of P pixels to the input image of the convolutional layer 610). The convolutional layer 610 can perform various types of convolutions, such as 2D convolution, dilated or atrous convolution, spatially separable convolution, depthwise separable convolution, transposed convolution, etc. CNN 600 includes 16 convolutional layers 610. In other embodiments, CNN 600 may include a different number of convolutional layers.
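The effect of the kernel size F, the stride S, and the zero padding P on the output size can be sketched with the usual convolution arithmetic, output = floor((W - F + 2P) / S) + 1. This relation is standard but is not stated in the disclosure, and the function name `conv_output_size` is hypothetical.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

figure6_size = conv_output_size(7, 3, stride=1, padding=0)  # 7x7 input, 3x3 kernel
same_size = conv_output_size(7, 3, stride=1, padding=1)     # padding keeps the size
strided_size = conv_output_size(7, 3, stride=2, padding=0)  # larger stride shrinks it
```

The fourth hyperparameter, the number of kernels (filters), does not change the spatial size; it sets the number of output channels.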
[0097] Pooling layer 620 downsamples the feature maps generated by the convolutional layers, for example, by summarizing the presence of features in patches of the feature maps. Pooling layer 620 is positioned between two convolutional layers 610: a preceding convolutional layer 610 (the convolutional layer 610 before pooling layer 620 in the layer sequence) and a following convolutional layer 610 (the convolutional layer 610 after pooling layer 620 in the layer sequence). In some embodiments, pooling layer 620 is added after convolutional layer 610, for example, after an activation function (e.g., ReLU, etc.) has been applied to OFM 660.
[0098] Pooling layer 620 receives feature maps generated by the preceding convolutional layer 610 and applies pooling operations to these feature maps. Pooling operations reduce the size of the feature maps while preserving their important characteristics. Therefore, pooling operations improve the efficiency of the DNN and help avoid overfitting. Pooling layer 620 can perform pooling operations using average pooling (calculating the average value of each local patch of the feature map), max pooling (calculating the maximum value of each local patch of the feature map), or a combination of both. The size of the pooling operation is smaller than the size of the feature map. In various embodiments, the pooling operation is applied to 2×2 pixel areas with a stride of 2 pixels, thereby reducing the size of the feature map by a factor of 2 in each spatial dimension, i.e., reducing the number of pixels or values in the feature map to one-quarter of the original number. In one example, pooling layer 620 applied to a 6×6 feature map produces a 3×3 pooled feature map. The output of pooling layer 620 is fed into the subsequent convolutional layer 610 for further feature extraction. In some embodiments, pooling layer 620 operates on each feature map separately to create a new set of the same number of pooled feature maps.
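The 2×2, stride-2 pooling example above (a 6×6 feature map reduced to 3×3) can be sketched as follows. The function name `pool2d`, its parameters, and the test values are hypothetical.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Max or average pooling over size x size patches of a 2D feature map."""
    if mode == "max":
        combine = max
    else:
        combine = lambda vals: sum(vals) / len(vals)
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            combine([feature_map[y * stride + dy][x * stride + dx]
                     for dy in range(size) for dx in range(size)])
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

# 6x6 feature map holding the values 0..35 in row-major order.
fmap = [[float(6 * r + c) for c in range(6)] for r in range(6)]
pooled_max = pool2d(fmap, mode="max")  # 3x3, maximum of each 2x2 patch
pooled_avg = pool2d(fmap, mode="avg")  # 3x3, average of each 2x2 patch
```

The 36 input values become 9 output values, the one-quarter reduction mentioned in the text.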
[0099] Fully connected layer 630 is the last layer of the DNN. Fully connected layer 630 may or may not be convolutional. Fully connected layer 630 receives input operands. The input operands define the outputs of convolutional layers 610 and pooling layers 620 and include the values of the final feature map generated by the last pooling layer 620 in the sequence. Fully connected layer 630 applies a linear combination and an activation function to the input operands and generates a vector. This vector can contain as many elements as there are classes: element i represents the probability that an image belongs to class i. Therefore, each element is between 0 and 1, and the sum of all elements can be 1. These probabilities are calculated by the last fully connected layer 630 using a logistic function (for binary classification) or a SoftMax function (for multi-class classification) as the activation function. In some embodiments, fully connected layer 630 multiplies each input element by a weight, sums the results, and then applies an activation function (e.g., logistic if N=2 or SoftMax if N>2, where N is the number of classes). This is equivalent to multiplying the input operands by a matrix containing the weights.
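The fully connected computation described above, a weighted sum followed by SoftMax, can be sketched as below. The weight matrix and the input values are hypothetical toy numbers, and the function names are not from the disclosure.

```python
import math

def fully_connected(inputs, weights):
    """Multiply each input by a weight and sum: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def softmax(logits):
    """Convert raw outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

features = [1.0, 2.0]                            # flattened final feature map
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three classes, two inputs
logits = fully_connected(features, weights)      # [1.0, 2.0, 3.0]
probs = softmax(logits)                          # element i: probability of class i
```

Stacking the per-class weight rows into one matrix is what makes the layer equivalent to a matrix multiplication.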
[0100] Figure 7 shows an example convolution according to various embodiments. The convolution can be a deep learning operation performed in a convolutional layer of a DNN (e.g., the convolutional layer 610 in Figure 6). The convolution can be performed on an activation tensor 710 and filters 720 (individually referred to as "filter 720"). The filters 720 can form the weight tensor of the convolution. The result of the convolution is an output tensor 730.
[0101] The activation tensor 710 can be computed in the preceding layer of the DNN. In some embodiments (e.g., where the convolutional layer is the first layer of the DNN), the activation tensor 710 can be an image. In the embodiment of Figure 7, the activation tensor 710 includes activation values (also referred to as "input activation values," "elements," or "input elements") arranged in a 3D matrix. The activation tensor 710 may be referred to as the input tensor of the convolution. Input elements are data points in the activation tensor 710. The activation tensor 710 has a spatial size $H_{in} \times W_{in} \times C_{in}$, where $H_{in}$ is the height of the 3D matrix (i.e., the length along the Y-axis, representing the number of activation values in a column of the 2D matrix of each input channel), $W_{in}$ is the width of the 3D matrix (i.e., the length along the X-axis, representing the number of activation values in a row of the 2D matrix of each input channel), and $C_{in}$ is the depth of the 3D matrix (i.e., the length along the Z-axis, representing the number of input channels). For simplicity and illustration, the activation tensor 710 has a spatial size of 7×7×3, meaning the activation tensor 710 includes three input channels, each with a 7×7 2D matrix. Each input element in the activation tensor 710 can be represented by (X, Y, Z) coordinates. In other embodiments, the height, width, or depth of the activation tensor 710 may be different.
[0102] Each filter 720 includes weights arranged in a 3D matrix. The values of the weights can be determined by training the DNN. The filter 720 has a spatial size $H_f \times W_f \times C_f$, where $H_f$ is the height of the filter (i.e., the length along the Y-axis, representing the number of weights in each column of a kernel), $W_f$ is the width of the filter (i.e., the length along the X-axis, representing the number of weights in each row of a kernel), and $C_f$ is the depth of the filter (i.e., the length along the Z-axis, representing the number of channels). In some embodiments, $C_f$ equals $C_{in}$, the number of input channels in the activation tensor 710. For simplicity and illustration, each filter 720 in Figure 7 has a spatial size of 3×3×3, meaning that filter 720 includes three convolutional kernels, each with a spatial size of 3×3. In other embodiments, the height, width, or depth of filter 720 may be different. The spatial size of the convolutional kernels is smaller than the spatial size of the 2D matrix of each input channel in activation tensor 710.
[0103] Activation values or weights can occupy one or more bytes in memory. The number of bytes for an activation value or weight can depend on the data format. For example, when activation values or weights are in INT8 format, each activation value or weight occupies one byte. When activation values or weights are in FP16 format, each activation value or weight occupies two bytes. Other data formats can be used for activation values or weights.
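To make the byte counts concrete, the following sketch computes the memory footprint of a dense tensor in the two formats named above. The dictionary and function names are hypothetical, and the sketch ignores any alignment or packing a real memory layout might add.

```python
# Bytes per element for the two formats named in the paragraph above.
BYTES_PER_ELEMENT = {"INT8": 1, "FP16": 2}

def tensor_bytes(height, width, channels, data_format):
    """Bytes needed to hold a dense height x width x channels tensor."""
    return height * width * channels * BYTES_PER_ELEMENT[data_format]

int8_bytes = tensor_bytes(7, 7, 3, "INT8")  # 7x7x3 activation tensor in INT8
fp16_bytes = tensor_bytes(7, 7, 3, "FP16")  # the same tensor in FP16
```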
[0104] In the convolution, each filter 720 slides over the activation tensor 710 and generates a 2D matrix for one output channel in the output tensor 730. In the embodiment of Figure 7, the 2D matrix has a spatial size of 5×5. The output tensor 730 includes activation values (also referred to as "output activation values," "elements," or "output elements") arranged in a 3D matrix. Output activation values are data points in the output tensor 730. The output tensor 730 has a spatial size $H_{out} \times W_{out} \times C_{out}$, where $H_{out}$ is the height of the 3D matrix (i.e., the length along the Y-axis, representing the number of output activation values in a column of the 2D matrix of each output channel), $W_{out}$ is the width of the 3D matrix (i.e., the length along the X-axis, representing the number of output activation values in a row of the 2D matrix of each output channel), and $C_{out}$ is the depth of the 3D matrix (i.e., the length along the Z-axis, representing the number of output channels). $C_{out}$ can be equal to the number of filters 720 in the convolution. $H_{out}$ and $W_{out}$ can depend on the heights and widths of the activation tensor 710 and each filter 720. In an example with a kernel size of 1×1, $H_{out}$ and $W_{out}$ can be equal to the height and width of the activation tensor 710, respectively.
[0105] As part of the convolution, MAC operations are performed on a 3×3×3 subtensor 715 in the activation tensor 710 (highlighted with a dotted pattern in Figure 7) and each filter 720. The result of the MAC operations on subtensor 715 and one filter 720 is an output activation value. In some embodiments (e.g., embodiments where the convolution is an integer convolution), the output activation value may include 8 bits, i.e., one byte. In other embodiments (e.g., embodiments where the convolution is a floating-point convolution), the output activation value may include more than one byte. For example, the output element may include two bytes.
[0106] After the MAC operations on subtensor 715 and all filters 720 are completed, a vector 735 is produced. Vector 735 is highlighted with a dotted pattern in Figure 7. Vector 735 comprises a sequence of output activation values arranged along the Z-axis. The output activation values in vector 735 have the same (X, Y) coordinates, but these output activation values correspond to different output channels and have different Z coordinates. The dimension of vector 735 along the Z-axis can be equal to the total number of output channels in output tensor 730. After vector 735 is generated, further MAC operations are performed to generate additional vectors until output tensor 730 is complete. In the embodiment of Figure 7, the output tensor 730 is computed in Z-major format. When the output tensor 730 is computed in ZXY format, the vector adjacent to vector 735 along the X-axis is computed immediately after vector 735. When the output tensor 730 is computed in ZYX format, the vector adjacent to vector 735 along the Y-axis is computed immediately after vector 735. The output tensor 730 can be arranged and stored in memory in either X-major or Y-major format.
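The difference between the ZXY and ZYX computation orders can be illustrated by listing the order in which the Z-axis vectors are produced. This is a sketch under the reading that each (X, Y) position stands for one complete vector along Z. The function name `z_major_order` is hypothetical.

```python
def z_major_order(height, width, layout="ZXY"):
    """Return the (x, y) positions of output vectors in computation order.

    Each position stands for one full vector along Z (all output channels).
    ZXY: the next vector is the neighbor along X; ZYX: the neighbor along Y.
    """
    if layout == "ZXY":
        return [(x, y) for y in range(height) for x in range(width)]
    if layout == "ZYX":
        return [(x, y) for x in range(width) for y in range(height)]
    raise ValueError(f"unknown layout: {layout}")

zxy = z_major_order(5, 5, "ZXY")  # starts (0, 0), (1, 0), (2, 0), ...
zyx = z_major_order(5, 5, "ZYX")  # starts (0, 0), (0, 1), (0, 2), ...
```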
[0107] In some embodiments, MAC operations on a 3×3×3 subtensor (e.g., subtensor 715) and a filter 720 can be performed by multiple MAC units. One or more of the MAC units can receive an input operand (e.g., the activation operand 717 shown in Figure 7) and a weight operand (e.g., the weight operand 727 shown in Figure 7). The activation operand 717 includes a sequence of activation values with the same (X, Y) coordinates but different Z coordinates. The activation operand 717 includes an activation value from each input channel in the activation tensor 710. The weight operand 727 includes a sequence of weights with the same (X, Y) coordinates but different Z coordinates. The weight operand 727 includes a weight from each channel in the filter 720. The activation values in the activation operand 717 and the weights in the weight operand 727 can be sequentially input into the MAC unit. The MAC unit receives one activation value and one weight (an "activation value-weight pair") at a time and multiplies the activation value by the weight. The position of the activation value in the activation operand 717 can match the position of the weight in the weight operand 727. The activation value and the weight can correspond to the same channel.
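The behavior of a single MAC unit on one activation operand and one weight operand can be sketched as follows. The function name `mac` and the operand values are hypothetical, and the sketch models only the arithmetic, not the hardware.

```python
def mac(activation_operand, weight_operand):
    """Multiply-accumulate over activation value-weight pairs, one pair at a time."""
    acc = 0.0
    for activation, weight in zip(activation_operand, weight_operand):
        acc += activation * weight  # matching positions, hence the same channel
    return acc

activation_operand = [1.0, 2.0, 3.0]  # same (X, Y), one value per input channel
weight_operand = [0.5, -1.0, 2.0]     # same (X, Y), one weight per filter channel
partial_sum = mac(activation_operand, weight_operand)  # 0.5 - 2.0 + 6.0
```

For a 3×3×3 subtensor, nine such operand pairs (one per (X, Y) position in the kernel window) would be accumulated to produce one output activation value.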
[0108] Activation values or weights can be floating-point numbers. Floating-point numbers can have various data formats, such as FP32, FP16, BF16, etc. A floating-point number can be a positive or negative number with a decimal point. A floating-point number can be represented by a series of bits, including one or more bits representing the sign of the floating-point number (e.g., positive or negative), bits representing the exponent, and bits representing the mantissa. The mantissa is the part of the floating-point number that represents its significant digits. Multiplying the mantissa by the base raised to the power of the exponent yields the magnitude of the floating-point number, and the sign bit determines whether the value is positive or negative.
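The sign, exponent, and mantissa fields can be inspected directly. The sketch below decodes an FP32 value, assuming the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits with a bias of 127, and 23 stored mantissa bits), and then rebuilds the value from the three fields. The function names are hypothetical, and `rebuild_fp32` handles only normal numbers.

```python
import struct

def decode_fp32(value):
    """Split an FP32 value into its sign, biased exponent, and fraction fields."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction

def rebuild_fp32(sign, exponent, fraction):
    """Rebuild a normal (not subnormal, infinite, or NaN) FP32 value."""
    mantissa = 1.0 + fraction / 2 ** 23  # implicit leading 1 plus stored bits
    return (-1.0) ** sign * mantissa * 2.0 ** (exponent - 127)

# -6.5 equals -1.625 x 2^2: sign 1, biased exponent 129, fraction 0.625 x 2^23.
fields = decode_fp32(-6.5)
rebuilt = rebuild_fp32(*fields)
```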
[0109] In some embodiments, the output activation values in the output tensor 730 may be further processed based on one or more activation functions before being written to memory or input to the next layer of the DNN. Processing based on one or more activation functions may be part of convolutional post-processing. In some embodiments, post-processing may include one or more other computations, such as offset computation, bias computation, etc. The result of post-processing may be stored in the local memory of the computation block and used as input to the next DNN layer. In some embodiments, the input activation values in the activation tensor 710 may be the result of post-processing of the previous DNN layer.
[0110] Figure 8 is a block diagram of an example computing device 2000 according to various embodiments. In some embodiments, the computing device 2000 may be used as at least a part of the anomaly detection system 100. Multiple components are shown in Figure 8 as being included in computing device 2000, but any one or more of these components may be omitted or duplicated to suit the application. In some embodiments, some or all of the components included in computing device 2000 may be attached to one or more motherboards. In some embodiments, some or all of these components are manufactured on a single system-on-a-chip (SoC) die. Furthermore, in various embodiments, computing device 2000 may not include one or more of the components shown in Figure 8, but may include interface circuitry for coupling to the one or more components. For example, the computing device 2000 may not include the display device 2006, but may include display device interface circuitry (e.g., connectors and driver circuitry) to which the display device 2006 may be coupled. In another set of examples, the computing device 2000 may not include the audio input device 2018 or the audio output device 2008, but may include audio input or output device interface circuitry (e.g., connectors and support circuitry) to which the audio input device 2018 or the audio output device 2008 may be coupled.
[0111] Computing device 2000 may include processing device 2002 (e.g., one or more processing devices). Processing device 2002 processes electronic data from registers and / or memory to convert the electronic data into other electronic data that can be stored in registers and / or memory. Computing device 2000 may include memory 2004, which itself may include one or more memory devices, such as volatile memory (e.g., DRAM), non-volatile memory (e.g., read-only memory (ROM)), high-bandwidth memory (HBM), flash memory, solid-state memory, and / or a hard disk drive. In some embodiments, memory 2004 may include memory sharing a die with processing device 2002. In some embodiments, memory 2004 includes one or more non-transitory computer-readable media storing instructions executable to perform operations for anomaly detection (e.g., the method 500 described in conjunction with Figure 5) or some operations performed by one or more components of the anomaly detection system 100. Instructions stored in the one or more non-transitory computer-readable media can be executed by the processing device 2002.
[0112] In some embodiments, computing device 2000 may include communication chip 2012 (e.g., one or more communication chips). For example, communication chip 2012 may be configured to manage wireless communication for transmitting data to and from computing device 2000. The term "wireless" and its derivatives can be used to describe circuits, devices, systems, methods, technologies, communication channels, etc., which can transmit data through a non-solid medium using modulated electromagnetic radiation. This term does not imply that the associated device does not contain any wires; however, in some embodiments they may be wire-free.
[0113] The communication chip 2012 can implement any of many wireless standards or protocols, including but not limited to Institute of Electrical and Electronics Engineers (IEEE) standards, such as Wi-Fi (IEEE 802.11 family), IEEE 802.16 standards (e.g., the IEEE 802.16-2005 amendment), and the Long Term Evolution (LTE) project, along with any amendments, updates, and / or revisions (e.g., the LTE-Advanced project, the Ultra Mobile Broadband (UMB) project (also known as "3GPP2"), etc.). Broadband Wireless Access (BWA) networks compatible with IEEE 802.16 are often referred to as WiMAX networks, an acronym for Worldwide Interoperability for Microwave Access, which is a certification mark for products that have passed conformance and interoperability testing for the IEEE 802.16 standards. The communication chip 2012 can operate according to Global System for Mobile Communication (GSM), General Packet Radio Service (GPRS), Universal Mobile Telecommunications System (UMTS), High Speed Packet Access (HSPA), Evolved HSPA (E-HSPA), or LTE networks. The communication chip 2012 can also operate according to Enhanced Data for GSM Evolution (EDGE), GSM EDGE Radio Access Network (GERAN), Universal Terrestrial Radio Access Network (UTRAN), or Evolved UTRAN (E-UTRAN). The communication chip 2012 can operate according to Code-Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Digital Enhanced Cordless Telecommunication (DECT), Evolution-Data Optimized (EV-DO) and its derivatives, as well as any other wireless protocol designated as 3G, 4G, 5G, etc. In other embodiments, the communication chip 2012 can operate according to other wireless protocols. The computing device 2000 may include an antenna 2022 to facilitate wireless communication and / or receive other wireless communications (e.g., AM or FM radio transmissions).
[0114] In some embodiments, the communication chip 2012 can manage wired communications such as electrical, optical, or any other suitable communication protocol (e.g., Ethernet). As described above, the communication chip 2012 may include multiple communication chips. For example, a first communication chip 2012 may be dedicated to short-range wireless communications such as Wi-Fi or Bluetooth, and a second communication chip 2012 may be dedicated to long-range wireless communications such as GPS, EDGE, GPRS, CDMA, WiMAX, LTE, EV-DO, or others. In some embodiments, the first communication chip 2012 may be dedicated to wireless communications, and the second communication chip 2012 may be dedicated to wired communications.
[0115] The computing device 2000 may include a battery / power circuit 2014. The battery / power circuit 2014 may include one or more energy storage devices (e.g., batteries or capacitors) and / or circuitry for coupling components of the computing device 2000 to a power source (e.g., AC line power) that is separate from the computing device 2000.
[0116] The computing device 2000 may include a display device 2006 (or the corresponding interface circuitry described above). For example, the display device 2006 may include any visual indicator, such as a head-up display, computer monitor, projector, touch screen display, liquid crystal display (LCD), light-emitting diode display, or flat panel display.
[0117] The computing device 2000 may include an audio output device 2008 (or a corresponding interface circuit as described above). For example, the audio output device 2008 may include any device that generates audible indicators, such as a speaker, headphones, or earphones.
[0118] The computing device 2000 may include an audio input device 2018 (or a corresponding interface circuit as described above). The audio input device 2018 may include any device that generates a signal representing sound, such as a microphone, microphone array, or digital musical instrument (e.g., a musical instrument with a Musical Instrument Digital Interface (MIDI) output).
[0119] The computing device 2000 may include a GPS device 2016 (or a corresponding interface circuit as described above). As is known in the art, the GPS device 2016 can communicate with a satellite-based system and can receive the location of the computing device 2000.
[0120] The computing device 2000 may include other output devices 2010 (or corresponding interface circuitry as described above). Examples of other output devices 2010 may include audio codecs, video codecs, printers, wired or wireless transmitters for providing information to other devices, or additional storage devices.
[0121] The computing device 2000 may include other input devices 2020 (or corresponding interface circuits as described above). Examples of other input devices 2020 may include accelerometers, gyroscopes, compasses, image capture devices, keyboards, cursor control devices such as mice, styluses, touchpads, barcode readers, Quick Response (QR) code readers, any sensors, or radio frequency identification (RFID) readers.
[0122] The computing device 2000 can have any desired form factor, such as a handheld or mobile computer system (e.g., a mobile phone, smartphone, mobile internet device, music player, tablet computer, laptop computer, netbook computer, ultrabook computer, personal digital assistant (PDA), ultraportable personal computer, etc.), desktop computer system, server or other networked computing component, printer, scanner, monitor, set-top box, entertainment control unit, vehicle control unit, digital camera, digital video recorder, or wearable computer system. In some embodiments, the computing device 2000 can be any other electronic device that processes data.
[0123] The following paragraphs provide various examples of the embodiments disclosed herein.
[0124] Example 1 provides a method for anomaly detection, the method comprising: embedding training data at different spatial scales in the latent space of a neural network model, the neural network model including multiple convolutional blocks with different spatial scales; extracting multiple embedding features at the different spatial scales from the multiple convolutional blocks; determining a loss of the neural network model based on the multiple embedding features; training the neural network model based on the loss by updating one or more internal parameters of the neural network model; and using at least a portion of the trained neural network model to detect anomalies in new data, wherein the training data includes normal data lacking anomalies, and the new data includes anomalous data having anomalies.
[0125] Example 2 provides the method described in Example 1, wherein the spatial scale indicates the model embedding extracted at a convolutional block of the neural network model, wherein the convolutional block comprises one or more convolutional layers.
[0126] Example 3 provides the method described in Example 1 or 2, wherein the training data further includes anomalous data with abnormalities, wherein the normal data and the anomalous data have different category labels.
[0127] Example 4 provides the method of any one of Examples 1-3, wherein determining the loss of the neural network model includes: for each embedding feature, determining the distance between the embedding feature and the average value at the different spatial scales; and accumulating the distances for the plurality of embedding features.
[0128] Example 5 provides the method described in Example 4, wherein the distance is a Euclidean distance.
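The loss in Examples 4 and 5 can be sketched as follows. This is one possible reading, not a definitive implementation: it assumes every embedding feature has already been reduced to a vector of the same length, and it takes "the average of the different spatial scales" to mean the element-wise mean of those vectors across scales. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def multiscale_distance_loss(embeddings):
    """Sum of Euclidean distances from each scale's embedding to the cross-scale mean.

    Assumption: all embeddings share one dimension (for example after
    global pooling), so an element-wise mean across scales is defined.
    """
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    # Accumulate the per-scale distances into a single scalar loss.
    return sum(euclidean(e, mean) for e in embeddings)


# Three hypothetical 2-D embeddings, one per spatial scale.
embeddings = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
loss = multiscale_distance_loss(embeddings)  # mean is [1.0, 1.0]
```

Under this reading the loss is zero only when the embeddings at all scales coincide, so minimizing it pulls the per-scale representations of normal data together.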
[0129] Example 6 provides the method of any one of Examples 1-5, further comprising: selecting one or more convolutional blocks from the plurality of convolutional blocks, wherein the one or more convolutional blocks are used to detect anomalies in the new data.
[0130] Example 7 provides the method described in Example 6, wherein the new data includes an image, and wherein the one or more convolutional blocks are selected based on one or more spatial scales of the one or more convolutional blocks and the spatial scale of the image.
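Examples 6 and 7 do not state a specific selection rule, so the sketch below uses a purely hypothetical one: each block is described by its cumulative downsampling stride, and a block is kept only if its feature map for the given image is at least `min_feature_side` pixels wide. The stride values, the threshold, and the function name are all assumptions made for illustration.

```python
def select_blocks(block_strides, image_side, min_feature_side=4):
    """Return indices of blocks whose feature map is large enough for this image.

    block_strides: cumulative downsampling factor of each block (assumed).
    image_side: side length of the (square) input image in pixels.
    """
    selected = []
    for index, stride in enumerate(block_strides):
        feature_side = image_side // stride
        if feature_side >= min_feature_side:
            selected.append(index)
    return selected


block_strides = [2, 4, 8, 16, 32]  # hypothetical five-block model
selected_small = select_blocks(block_strides, image_side=64)
selected_large = select_blocks(block_strides, image_side=256)
```

With these assumed values a 64-pixel image uses only the first four blocks, because the last block would reduce it to a 2x2 map, while a 256-pixel image uses all five.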
[0131] Example 8 provides the method described in any one of Examples 1-7, wherein detecting anomalies in the new data includes: inputting the new data into the at least a portion of the trained neural network model; determining an anomaly score based on the output of the at least a portion of the trained neural network model; and determining whether the new data is anomalous based on the anomaly score.
[0132] Example 9 provides the method of Example 8, wherein determining the anomaly score includes: extracting one or more new embedding features from one or more convolutional blocks in at least a portion of the neural network model; and determining the anomaly score based on the one or more new embedding features.
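A minimal sketch of the scoring step in Examples 8 and 9 follows. The examples do not specify how the score is computed from the new embedding features, so this sketch assumes a reference vector per selected block (here called `normal_centers`, for instance the mean embedding of normal training data at that scale), sums the Euclidean distances to those references, and compares the result with a fixed threshold. All names and values are hypothetical.

```python
import math


def anomaly_score(new_embeddings, normal_centers):
    """Sum of per-scale Euclidean distances to reference embeddings of normal data."""
    if len(new_embeddings) != len(normal_centers):
        raise ValueError("expected one embedding per reference center")
    return sum(math.dist(e, c) for e, c in zip(new_embeddings, normal_centers))


def is_anomalous(score, threshold):
    """Flag the input as anomalous when its score exceeds the threshold."""
    return score > threshold


# Hypothetical reference embeddings for two selected blocks.
normal_centers = [[0.0, 0.0], [1.0, 1.0]]
threshold = 1.0  # assumed; in practice tuned on held-out data

score_normal = anomaly_score([[0.1, 0.0], [1.0, 0.9]], normal_centers)
score_defect = anomaly_score([[3.0, 4.0], [1.0, 1.0]], normal_centers)
flags = (is_anomalous(score_normal, threshold), is_anomalous(score_defect, threshold))
```

Keeping the per-scale distances separate instead of summing them would also indicate at which scale an input deviates, which may help distinguish macroscopic from fine-grained anomalies.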
[0133] Example 10 provides the method of any one of Examples 1-9, further comprising: validating the effectiveness of the trained neural network model by performing anomaly detection on test data with verified anomalies using the trained neural network model.
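The validation step in Example 10 could be carried out along these lines. The sketch assumes that anomaly scores have already been computed for the test data, that each test item carries a verified label (1 for anomalous, 0 for normal), and that a score threshold has been chosen. The metric choice and the function name are assumptions, since the example does not prescribe how effectiveness is measured.

```python
def detection_metrics(scores, labels, threshold):
    """Compare thresholded scores with verified labels (1 = anomalous, 0 = normal)."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score > threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Hypothetical test scores with verified labels.
scores = [0.2, 0.4, 1.5, 2.0, 0.9]
labels = [0, 0, 1, 1, 1]
metrics = detection_metrics(scores, labels, threshold=1.0)
```

In this toy data the last item is a verified anomaly that scores below the threshold, so recall drops to two out of three while precision stays at 1.0.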
[0134] Example 11 provides one or more non-transitory computer-readable media storing instructions executable to perform operations for anomaly detection, the operations including: embedding training data at different spatial scales in the latent space of a neural network model, the neural network model including a plurality of convolutional blocks with different spatial scales; extracting a plurality of embedding features at the different spatial scales from the plurality of convolutional blocks; determining a loss of the neural network model based on the plurality of embedding features; training the neural network model based on the loss by updating one or more internal parameters of the neural network model; and using at least a portion of the trained neural network model to detect anomalies in new data, wherein the training data includes normal data lacking anomalies, and the new data includes anomalous data having anomalies.
[0135] Example 12 provides one or more non-transitory computer-readable media as described in Example 11, wherein the spatial scale indicates the model embedding extracted at a convolutional block of the neural network model, wherein the convolutional block comprises one or more convolutional layers.
[0136] Example 13 provides one or more non-transitory computer-readable media as described in Example 11 or 12, wherein the training data further includes anomalous data with anomalies, wherein the normal data and the anomalous data have different category labels.
[0137] Example 14 provides one or more non-transitory computer-readable media as described in any one of Examples 11-13, wherein determining the loss of the neural network model includes: for each embedding feature, determining the distance between the embedding feature and the average of the different spatial scales; and accumulating the distances for the plurality of embedding features.
[0138] Example 15 provides one or more non-transitory computer-readable media as described in Example 14, wherein the distance is a Euclidean distance.
[0139] Example 16 provides one or more non-transitory computer-readable media as described in any one of Examples 11-15, wherein the operations further include: selecting one or more convolutional blocks from the plurality of convolutional blocks based on one or more spatial scales of the one or more convolutional blocks and the spatial scale of the new data, wherein the one or more convolutional blocks are used to detect anomalies in the new data.
[0140] Example 17 provides one or more non-transitory computer-readable media as described in any one of Examples 11-16, wherein detecting anomalies in the new data includes: inputting the new data into the at least a portion of the trained neural network model; determining an anomaly score based on the output of the at least a portion of the trained neural network model; and determining whether the new data is anomalous based on the anomaly score.
[0141] Example 18 provides an apparatus for anomaly detection, the apparatus comprising: a computer processor for executing computer program instructions; and a non-transitory computer-readable storage device storing computer program instructions executable by the computer processor to perform operations including: embedding training data at different spatial scales in the latent space of a neural network model, the neural network model including a plurality of convolutional blocks with different spatial scales; extracting a plurality of embedding features at the different spatial scales from the plurality of convolutional blocks; determining a loss of the neural network model based on the plurality of embedding features; training the neural network model based on the loss by updating one or more internal parameters of the neural network model; and using at least a portion of the trained neural network model to detect anomalies in new data, wherein the training data includes normal data lacking anomalies, and the new data includes anomalous data having anomalies.
[0142] Example 19 provides the apparatus described in Example 18, wherein the spatial scale indicates the model embedding extracted at a convolutional block of the neural network model, wherein the convolutional block comprises one or more convolutional layers.
[0143] Example 20 provides the apparatus described in Example 18 or 19, wherein the operations further include: selecting one or more convolutional blocks from the plurality of convolutional blocks based on one or more spatial scales of the one or more convolutional blocks and the spatial scale of the new data, wherein the one or more convolutional blocks are used to detect anomalies in the new data.
[0144] The foregoing description of the embodiments illustrated herein, including the content described in the abstract, is not intended to be exhaustive or to limit the present disclosure to the precise forms disclosed. Although specific implementations and examples of the present disclosure have been described herein for illustrative purposes, various equivalent modifications can be made within the scope of this disclosure, as will be recognized by those skilled in the art. These modifications can be made to the present disclosure based on the foregoing detailed description.
Claims
1. A method for anomaly detection, the method comprising: embedding training data at different spatial scales in a latent space of a neural network model, the neural network model comprising a plurality of convolutional blocks having the different spatial scales; extracting a plurality of embedding features at the different spatial scales from the plurality of convolutional blocks; determining a loss of the neural network model based on the plurality of embedding features; training the neural network model based on the loss by updating one or more internal parameters of the neural network model; and using at least a portion of the trained neural network model to detect anomalies in new data, wherein the training data includes normal data lacking anomalies, and the new data includes anomalous data having anomalies.
2. The method according to claim 1, wherein a spatial scale indicates a model embedding extracted at a convolutional block of the neural network model, wherein the convolutional block comprises one or more convolutional layers.
3. The method according to claim 1 or 2, wherein the training data further includes anomalous data, and wherein the normal data and the anomalous data in the training data have different category labels.
4. The method according to any one of claims 1-3, wherein determining the loss of the neural network model includes: for each embedding feature, determining a distance between the embedding feature and an average of the different spatial scales; and accumulating the distances for the plurality of embedding features.
5. The method according to claim 4, wherein the distance is a Euclidean distance.
6. The method according to any one of claims 1-5, further comprising: selecting one or more convolutional blocks from the plurality of convolutional blocks, wherein the one or more convolutional blocks are used to detect anomalies in the new data.
7. The method according to claim 6, wherein the new data includes an image, and wherein the one or more convolutional blocks are selected based on one or more spatial scales of the one or more convolutional blocks and a spatial scale of the image.
8. The method according to any one of claims 1-7, wherein detecting anomalies in the new data includes: inputting the new data into the at least a portion of the trained neural network model; determining an anomaly score based on an output of the at least a portion of the trained neural network model; and determining whether the new data is anomalous based on the anomaly score.
9. The method according to claim 8, wherein determining the anomaly score includes: extracting one or more new embedding features from one or more convolutional blocks in the at least a portion of the trained neural network model; and determining the anomaly score based on the one or more new embedding features.
10. The method according to any one of claims 1-9, further comprising: validating the effectiveness of the trained neural network model by performing anomaly detection on test data with verified anomalies using the trained neural network model.
11. One or more non-transitory computer-readable media storing instructions executable to perform operations for anomaly detection, the operations including: embedding training data at different spatial scales in a latent space of a neural network model, the neural network model comprising a plurality of convolutional blocks having the different spatial scales; extracting a plurality of embedding features at the different spatial scales from the plurality of convolutional blocks; determining a loss of the neural network model based on the plurality of embedding features; training the neural network model based on the loss by updating one or more internal parameters of the neural network model; and using at least a portion of the trained neural network model to detect anomalies in new data, wherein the training data includes normal data lacking anomalies, and the new data includes anomalous data having anomalies.
12. The one or more non-transitory computer-readable media according to claim 11, wherein a spatial scale indicates a model embedding extracted at a convolutional block of the neural network model, wherein the convolutional block comprises one or more convolutional layers.
13. The one or more non-transitory computer-readable media according to claim 11 or 12, wherein the training data further includes anomalous data, and wherein the normal data and the anomalous data in the training data have different category labels.
14. The one or more non-transitory computer-readable media according to any one of claims 11-13, wherein determining the loss of the neural network model includes: for each embedding feature, determining a distance between the embedding feature and an average of the different spatial scales; and accumulating the distances for the plurality of embedding features.
15. The one or more non-transitory computer-readable media according to claim 14, wherein the distance is a Euclidean distance.
16. The one or more non-transitory computer-readable media according to any one of claims 11-15, wherein the operations further include: selecting one or more convolutional blocks from the plurality of convolutional blocks, wherein the one or more convolutional blocks are used to detect anomalies in the new data.
17. The one or more non-transitory computer-readable media according to claim 16, wherein the new data includes an image, and wherein the one or more convolutional blocks are selected based on one or more spatial scales of the one or more convolutional blocks and a spatial scale of the image.
18. The one or more non-transitory computer-readable media according to any one of claims 11-17, wherein detecting anomalies in the new data includes: inputting the new data into the at least a portion of the trained neural network model; determining an anomaly score based on an output of the at least a portion of the trained neural network model; and determining whether the new data is anomalous based on the anomaly score.
19. The one or more non-transitory computer-readable media according to claim 18, wherein determining the anomaly score includes: extracting one or more new embedding features from one or more convolutional blocks in the at least a portion of the trained neural network model; and determining the anomaly score based on the one or more new embedding features.
20. The one or more non-transitory computer-readable media according to any one of claims 11-19, wherein the operations further include: validating the effectiveness of the trained neural network model by performing anomaly detection on test data with verified anomalies using the trained neural network model.
21. An apparatus for anomaly detection, the apparatus comprising: a computer processor for executing computer program instructions; and a non-transitory computer-readable storage device storing computer program instructions executable by the computer processor to perform operations including: embedding training data at different spatial scales in a latent space of a neural network model, the neural network model comprising a plurality of convolutional blocks having the different spatial scales; extracting a plurality of embedding features at the different spatial scales from the plurality of convolutional blocks; determining a loss of the neural network model based on the plurality of embedding features; training the neural network model based on the loss by updating one or more internal parameters of the neural network model; and using at least a portion of the trained neural network model to detect anomalies in new data, wherein the training data includes normal data lacking anomalies, and the new data includes anomalous data having anomalies.
22. The apparatus according to claim 21, wherein a spatial scale indicates a model embedding extracted at a convolutional block of the neural network model, wherein the convolutional block comprises one or more convolutional layers.
23. The apparatus according to claim 21 or 22, wherein the operations further include: selecting one or more convolutional blocks from the plurality of convolutional blocks based on one or more spatial scales of the one or more convolutional blocks and a spatial scale of the new data, wherein the one or more convolutional blocks are used to detect anomalies in the new data.
24. The apparatus according to any one of claims 21-23, wherein the training data further includes anomalous data, and wherein the normal data and the anomalous data in the training data have different category labels.
25. The apparatus according to any one of claims 21-24, wherein determining the loss of the neural network model includes: for each embedding feature, determining a distance between the embedding feature and an average of the different spatial scales; and accumulating the distances for the plurality of embedding features.